FilmInfo+ - A German movie details scraper with auto grouping

badboyxx · August 22, 2014

RoChess, I tried your modification and so far it works well. But one problem persists: after scraping, there are still unwanted words between the writers' names. I think these can differ from movie to movie, and I don't know how to solve this. Look at the picture. Perhaps someone has an idea. The IMDb number is tt2333784.

RoChess · August 22, 2014

It seems I was overzealous in minimizing the original expression. Use the following:

Looking at your XML mods, I see you took my instructions the wrong way. You were supposed to change the expression itself from the old version to the new one, not adjust the code that uses it.

It should result in:

Code:

 <set name="rx_cmnt">
 <![CDATA[
 (?:\(as[^)]+\))|(?:\([^)]+)|(?:\s*\.{2,}\s*)|(?:\sand\s)|(?:&)
 ]]>
 </set>
....
 <replace name="writers" input="${writers}" pattern="${rx_cmnt}" with=" " />

Obviously the "...." means leave that old code alone, in case you take me literally on that part again.

The "(?:\sand\s)" part gets rid of " and ".
The "(?:&)" part gets rid of the '&'.

So that should do it, unless you have other combinations.
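As a rough illustration of what the rx_cmnt expression removes, here is a minimal Python sketch. It assumes that Python's `re` module treats this particular pattern the same way as the regex engine the scraper runs on (presumably .NET); the pattern only uses non-capturing groups, character classes and quantifiers, which both support. The writer string is made up, and the function name `strip_writer_comments` as well as the final step that collapses leftover runs of spaces are hypothetical additions, not part of the scraper.

```python
import re

# The expression from the rx_cmnt block above, unchanged.
RX_CMNT = r"(?:\(as[^)]+\))|(?:\([^)]+)|(?:\s*\.{2,}\s*)|(?:\sand\s)|(?:&)"


def strip_writer_comments(writers: str) -> str:
    """Replace every match of RX_CMNT with a single space, like the
    <replace> action does, then (hypothetical extra step) collapse the
    runs of spaces that the replacements leave behind."""
    cleaned = re.sub(RX_CMNT, " ", writers)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


if __name__ == "__main__":
    # Made-up input covering several alternatives: "...", " and ", "(as ...)", "&".
    sample = "Anna Beispiel ... Sandra Muster and Tom Test (as Thomas Test) & Lena Probe"
    print(strip_writer_comments(sample))
```

Because the expression only matches " and " with whitespace on both sides, a name such as "Sandra" is left intact.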

badboyxx · August 23, 2014

When I take the code from your last post and change only the line

RoChess said:
Code:

<replace name="writers" input="${writers}" pattern="${rx_cmnt}" with=" " />

into

Code:

<replace name="writers" input="${writers}" pattern=" (?:\(as[^)]+\))|(?:\([^)]+)|(?:\s*\.{2,}\s*)|(?:&amp;)|(?:and)" with=" " />

then it works as it should. If new unwanted words turn up in the future, I only have to add them to the pattern as well. Now I don't have to change anything manually after scraping.

Big thanks to RoChess.

RoChess · August 24, 2014

Yeah, that amounts to the same thing. I just expected rx_cmnt to be used elsewhere as well (directors/crew/etc.), but I guess those do not use the same structure.

So if you do it that way, you can remove the whole rx_cmnt CDATA declaration, as it is no longer used.
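Moving the expression out of the CDATA block and into the `pattern` attribute comes with one XML detail: inside a CDATA section a literal `&` is allowed, but inside an attribute value it has to be written as `&amp;`, which is what the inline version above does. The Python sketch below uses the standard `xml.etree.ElementTree` parser purely to illustrate this general XML rule; it is not how the scraper itself loads its file, and the snippets are trimmed-down, hypothetical versions of the ones in this thread.

```python
import xml.etree.ElementTree as ET

# Trimmed-down snippets modelled on the ones in this thread (hypothetical).
CDATA_FORM = r'<set name="rx_cmnt"><![CDATA[(?:\sand\s)|(?:&)]]></set>'
ATTR_ESCAPED = r'<replace name="writers" pattern="(?:\sand\s)|(?:&amp;)" with=" " />'
ATTR_RAW = r'<replace name="writers" pattern="(?:\sand\s)|(?:&)" with=" " />'


def pattern_from_cdata(xml_text: str) -> str:
    # Text inside CDATA is passed through literally, so a bare '&' is fine.
    return ET.fromstring(xml_text).text


def pattern_from_attribute(xml_text: str) -> str:
    # The parser decodes '&amp;' back to '&' when reading the attribute.
    return ET.fromstring(xml_text).get("pattern")


def is_well_formed(xml_text: str) -> bool:
    # A bare '&' in an attribute value makes the document not well-formed.
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(pattern_from_cdata(CDATA_FORM))
    print(pattern_from_attribute(ATTR_ESCAPED))
    print(is_well_formed(ATTR_RAW))
```

Both valid forms hand the same expression to the regex engine; only the spelling of `&` in the file differs.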

badboyxx · August 31, 2014

RoChess, can you help me one more time, please?
I changed the category "Family" in the script to "Kinder- & Familienfilm". But when a movie with this category is scraped, it gets the label "Kinder-/Familienfilm" instead of "Kinder- & Familienfilm". Do you know what the problem in my script could be?

RoChess · September 1, 2014

I would have to see 'why' it fails, which means a log file with scraper debugging enabled, for a movie that you expect it to work on. That way I can see the input string, the RegExp used and the output generated, so I can pinpoint why it fails. It seems to struggle with the '&' symbol; could you otherwise settle for "Kinder- und Familienfilme"? At least as a test.

badboyxx · September 1, 2014

When I change it to "Kinder- und Familienfilme", it still gets scraped as "Kinder-/Familienfilm".

Here is the log file with scraper debugging enabled.

RoChess · September 1, 2014

Easy solution.

Source of data = http://ofdbgw.org/movie/252909

01-Sep-2014 12:36:43 Debug [ ScraperNode]: Assigned variable: details[0].genre = <genre>
<titel>Abenteuer</titel>
<titel>Kinder-/Familienfilm</titel>
<titel>Komödie</titel>
<titel>Musikfilm</titel>
</genre>

That means you just have to add another genre replacement entry for:

Kinder-/Familienfilm# Kinder- & Familienfilm#
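The log shows that the source already delivers the German genre name "Kinder-/Familienfilm", so a replacement entry presumably has to be keyed on that exact string rather than on "Family". The Python sketch below illustrates that exact-match lookup on the genre XML from the log. The `source#replacement#` layout it parses is an assumption read off the example line above, not the scraper's documented syntax, and `parse_genre_map` and `apply_genre_map` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# Genre block as it appears in the scraper-debug log above.
GENRE_XML = """<genre>
<titel>Abenteuer</titel>
<titel>Kinder-/Familienfilm</titel>
<titel>Komödie</titel>
<titel>Musikfilm</titel>
</genre>"""

# Replacement entry from the post; the "source#replacement#" layout is assumed.
GENRE_MAP = "Kinder-/Familienfilm# Kinder- & Familienfilm#"


def parse_genre_map(spec: str) -> dict:
    """Split 'source#replacement#source#replacement#...' into a dict (hypothetical format)."""
    parts = [p.strip() for p in spec.split("#")]
    if parts and parts[-1] == "":
        parts.pop()  # the trailing '#' leaves an empty last element
    return dict(zip(parts[0::2], parts[1::2]))


def apply_genre_map(genre_xml: str, mapping: dict) -> list:
    """Return the genre titles, replacing any title that has an exact-match entry."""
    titles = [t.text.strip() for t in ET.fromstring(genre_xml).findall("titel")]
    return [mapping.get(title, title) for title in titles]


if __name__ == "__main__":
    print(apply_genre_map(GENRE_XML, parse_genre_map(GENRE_MAP)))
```

Keying the table on "Family" instead would leave the title unchanged, since no title in this source data matches it.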

badboyxx · September 1, 2014

RoChess said:
That means you just have to add another genre replacement entry for:

Kinder-/Familienfilm# Kinder- & Familienfilm#

I tried it exactly as you wrote it, but it doesn't work, and I don't know why.

badboyxx · March 26, 2015

Nothing has happened in this thread for a long time. The plugin works well so far, but there is one problem: the summaries of many movies can't be scraped because the source site has no summary for them. Is it possible to extend the plugin with one or more other sites that have more summaries available? I would do it myself, but I don't have the know-how.
