FilmInfo+ - A German movie details scraper with auto grouping

badboyxx

Portal Pro
June 15, 2012
728
97
Home Country
Germany
RoChess, I tried your modification and it works well so far, but one problem persists: after scraping, there are still unwanted words between the writers. I think these can differ from movie to movie, and I don't know how to solve this. See the attached picture. Perhaps someone has an idea. The IMDb ID is tt2333784.
 

Attachments

  • writers.jpg
    229.9 KB

RoChess

Extension Developer
  • Premium Supporter
  • March 10, 2006
    4,434
    1,897
    Seems I was overzealous in minimizing the original expression; use the following:

    Looking at your XML mods, I see you took my instructions the wrong way. You were supposed to edit the expression itself from old into new, not adjust the code function itself.

    It should result in:

    Code:
     <set name="rx_cmnt">
     <![CDATA[
     (?:\(as[^)]+\))|(?:\([^)]+)|(?:\s*\.{2,}\s*)|(?:\sand\s)|(?:&)
     ]]>
     </set>
    ....
     <replace name="writers" input="${writers}" pattern="${rx_cmnt}" with=" " />

    Obviously the "...." means leave that old code alone, in case you take me literally again on that part :)

    The "(?:\sand\s)" gets rid of " and ".
    The "(?:&)" gets rid of the '&'.

    So that should do it, unless you have other combinations.
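
To see what the combined expression removes, here is a minimal sketch in Python. It assumes Python's `re` module treats this pattern the same way as the scraper's own regex engine, and the writers string is made up for illustration, not taken from the IMDb page. The whitespace collapsing at the end is an extra step added only to make the result easier to read.

```python
import re

# Pattern copied from the rx_cmnt block above.
RX_CMNT = r"(?:\(as[^)]+\))|(?:\([^)]+)|(?:\s*\.{2,}\s*)|(?:\sand\s)|(?:&)"


def clean_writers(raw):
    """Replace every match with a space, as the <replace> node does,
    then collapse runs of whitespace (extra step, for readability)."""
    replaced = re.sub(RX_CMNT, " ", raw)
    return " ".join(replaced.split())


# Hypothetical input string, for illustration only.
sample = "John Doe (as J. Doe) ... Jane Roe and Max Muster & Erika Muster"
print(clean_writers(sample))
```

Further unwanted tokens can be handled the same way, by appending another `|(?:...)` alternative to the expression.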
     

    badboyxx

    When I take your code from the last post and change only the line

    Code:
    <replace name="writers" input="${writers}" pattern="${rx_cmnt}" with=" " />


    into

    Code:
    <replace name="writers" input="${writers}" pattern=" (?:\(as[^)]+\))|(?:\([^)]+)|(?:\s*\.{2,}\s*)|(?:&amp;)|(?:and)" with=" " />

    then it works as it should. If new unwanted words turn up in the future, I only have to add them to the pattern. Now I no longer have to change anything manually after scraping.
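
One difference from the earlier expression may be worth checking: the inline pattern above uses `(?:and)` without the surrounding `\s`, so it can also match the letters "and" inside a name. A small Python sketch of the effect, assuming Python's `re` behaves like the scraper's engine for this pattern. The names are arbitrary examples, and `&amp;` is written as a plain `&` because an XML parser decodes it in attribute values.

```python
import re

# Inline pattern from above (&amp; shown decoded as &).
INLINE = r" (?:\(as[^)]+\))|(?:\([^)]+)|(?:\s*\.{2,}\s*)|(?:&)|(?:and)"
# Same pattern, but with the whitespace-delimited " and " used in rx_cmnt.
BOUNDED = r" (?:\(as[^)]+\))|(?:\([^)]+)|(?:\s*\.{2,}\s*)|(?:&)|(?:\sand\s)"


def strip_with(pattern, raw):
    """Replace matches with a space and collapse whitespace for display."""
    return " ".join(re.sub(pattern, " ", raw).split())


names = "Sandra Bullock and Alexander Payne"  # arbitrary example names
print(strip_with(INLINE, names))   # "and" is also cut out of the names
print(strip_with(BOUNDED, names))  # only the standalone " and " is removed
```

If names containing "and" show up damaged after scraping, switching back to `(?:\sand\s)` is one way to avoid it.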

    Big thanks to RoChess.
     

    RoChess

    Yeah, that is the same. I just expected rx_cmnt to be used elsewhere as well (directors/crew/etc.), but I guess they do not use the same structure.

    So if you do it that way you can kill the whole CDATA declaration, as it is no longer used.
     

    badboyxx

    RoChess, can you help me one more time, please?
    I changed the category "Family" in the script to "Kinder- & Familienfilm". But when a movie with this category is scraped, it gets the label "Kinder-/Familienfilm" instead of "Kinder- & Familienfilm". Do you know what the problem in my script could be?
     

    Attachments

    • FilmInfo+_V1.3.9-2.xml
      74.1 KB

    RoChess

    I would have to see 'why' it fails, which means a scraper-debug enabled log file for a movie that you expect it to work on. That way I can see the input string, the RegExp used, and the output generated, so I can pinpoint why it fails. It seems to struggle with the '&' symbol. Can you otherwise settle for "Kinder- und Familienfilme", at least as a test?
     

    badboyxx

    When I change it to "Kinder- und Familienfilme", it still gets scraped as "Kinder-/Familienfilm".

    Here is the scraper-debug enabled log file.
     

    Attachments

    • movingpictures.zip
      86.3 KB

    RoChess

    Easy solution.

    Source of data = http://ofdbgw.org/movie/252909

    01-Sep-2014 12:36:43 Debug [ ScraperNode]: Assigned variable: details[0].genre = <genre>
    <titel>Abenteuer</titel>
    <titel>Kinder-/Familienfilm</titel>
    <titel>Komödie</titel>
    <titel>Musikfilm</titel>
    </genre>

    That means you just have to add another genre replacement entry for:

    Kinder-/Familienfilm# Kinder- & Familienfilm#
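
The lookup idea behind that entry can be sketched in Python as follows. The dictionary stands in for the script's genre replacement list, whose exact syntax is not reproduced here. The `GENRE_MAP` name and the pass-through behaviour for unmapped genres are assumptions. The point is that the key must match the string the source actually delivers ("Kinder-/Familienfilm"), not the label you want to end up with.

```python
import xml.etree.ElementTree as ET

# Genre block as it appears in the debug log above.
GENRE_XML = """<genre>
<titel>Abenteuer</titel>
<titel>Kinder-/Familienfilm</titel>
<titel>Komödie</titel>
<titel>Musikfilm</titel>
</genre>"""

# Hypothetical replacement table: source label -> preferred label.
GENRE_MAP = {"Kinder-/Familienfilm": "Kinder- & Familienfilm"}


def map_genres(xml_text):
    """Read every <titel> and swap in the preferred label where one exists."""
    root = ET.fromstring(xml_text)
    titles = [t.text for t in root.findall("titel")]
    # Unmapped genres pass through unchanged (assumed behaviour).
    return [GENRE_MAP.get(title, title) for title in titles]


print(map_genres(GENRE_XML))
```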
     

    badboyxx

    I tried it exactly as you wrote it, but it doesn't work, and I don't know why.
     

    badboyxx

    Nothing has happened in this thread for a long time. The plugin is working well so far, but there is one problem: the summary for many movies can't be scraped because the source site has none. Is it possible to extend the plugin with one or more additional sites that have more summaries available? I would do it myself, but I don't have the know-how.
     
