FilmInfo+ - A german movie details scraper with auto grouping
[QUOTE="RoChess, post: 1096485, member: 18896"]
movingpictures.zip:

[code]
18-Aug-2014 10:45:06 Debug [ ScraperNode]: executing parse: <parse name="writers" input="${details[0].drehbuch}" xpath="//name" />
18-Aug-2014 10:45:06 Debug [ ScraperNode]: Assigned variable: writers.count = 1
18-Aug-2014 10:45:06 Debug [ ScraperNode]: Assigned variable: writers[0] = Christopher Miller
18-Aug-2014 10:45:06 Debug [ ScraperNode]: Assigned variable: movie.writers =
18-Aug-2014 10:45:06 Debug [ ScraperNode]: Assigned variable: writer = Christopher Miller
18-Aug-2014 10:45:06 Debug [ ScraperNode]: Assigned variable: count = 0
18-Aug-2014 10:45:06 Debug [ ScraperNode]: Assigned variable: movie.writers = |Christopher Miller
[/code]

So that should result in "Christopher Miller", and it does not demonstrate the failure you mentioned.

movingpictures2.zip gives me:

[code]
18-Aug-2014 11:35:08 Debug [ ScraperNode]: Assigned variable: rx_writers_block = (?<=Writing\scredits)(?<WritersBlock>.*?)(?=</table>)
18-Aug-2014 11:35:08 Debug [ ScraperNode]: Assigned variable: rx_writers = (?:>)([^<]+?)(?:</)
18-Aug-2014 11:35:08 Debug [ ScraperNode]: Assigned variable: rx_cmnt = (\(\s*WGA\s*\))|(\(in alphabetical order\))|(\sand\s)|(&)
18-Aug-2014 11:35:08 Debug [ ScraperNode]: executing parse: <parse name="writers_block" input="${cast_page:htmldecode}" regex="${rx_writers_block}" />
18-Aug-2014 11:35:08 Debug [ ScraperNode]: name: writers_block ||| pattern: (?<=Writing\scredits)(?<WritersBlock>.*?)(?=</table>) ||| input: [not logged due to size]
18-Aug-2014 11:35:08 Debug [ ScraperNode]: Assigned variable: writers_block.count = 1
18-Aug-2014 11:35:08 Debug [ ScraperNode]: Assigned variable: movie.writers =
18-Aug-2014 11:35:08 Debug [ ScraperNode]: executing replace: <replace name="writers" input="${writers_block}" pattern="${rx_tag}" with=" " />
18-Aug-2014 11:35:08 Debug [ ScraperNode]: Assigned variable: writers = Erica Rivinoja ... (screenplay) & John Francis Daley ... (screenplay) & Jonathan M. Goldstein ... (screenplay) (as Jonathan Goldstein) Phil Lord ... (story) & Christopher Miller ... (story) & Erica Rivinoja ... (story) Judi Barrett ... (characters) & Ron Barrett ... (characters)
18-Aug-2014 11:35:08 Debug [ ScraperNode]: executing replace: <replace name="writers" input="${writers}" pattern="${rx_cmnt}" with=" " />
18-Aug-2014 11:35:08 Debug [ ScraperNode]: Assigned variable: writers = Erica Rivinoja ... (screenplay) John Francis Daley ... (screenplay) Jonathan M. Goldstein ... (screenplay) (as Jonathan Goldstein) Phil Lord ... (story) Christopher Miller ... (story) Erica Rivinoja ... (story) Judi Barrett ... (characters) Ron Barrett ... (characters)
18-Aug-2014 11:35:08 Debug [ ScraperNode]: executing replace: <replace name="writers" input="${writers}" pattern="\s+" with=" " />
18-Aug-2014 11:35:08 Debug [ ScraperNode]: Assigned variable: writers = Erica Rivinoja ... (screenplay) John Francis Daley ... (screenplay) Jonathan M. Goldstein ... (screenplay) (as Jonathan Goldstein) Phil Lord ... (story) Christopher Miller ... (story) Erica Rivinoja ... (story) Judi Barrett ... (characters) Ron Barrett ... (characters)
18-Aug-2014 11:35:08 Debug [ ScraperNode]: executing replace: <replace name="writers" input="${writers}" pattern="\)" with=")|" />
18-Aug-2014 11:35:08 Debug [ ScraperNode]: Assigned variable: writers = Erica Rivinoja ... (screenplay)| John Francis Daley ... (screenplay)| Jonathan M. Goldstein ... (screenplay)| (as Jonathan Goldstein)| Phil Lord ... (story)| Christopher Miller ... (story)| Erica Rivinoja ... (story)| Judi Barrett ... (characters)| Ron Barrett ... (characters)|
18-Aug-2014 11:35:08 Debug [ ScraperNode]: Assigned variable: movie.writers = Erica Rivinoja ... (screenplay)| John Francis Daley ... (screenplay)| Jonathan M. Goldstein ... (screenplay)| (as Jonathan Goldstein)| Phil Lord ... (story)| Christopher Miller ... (story)| Erica Rivinoja ... (story)| Judi Barrett ... (characters)| Ron Barrett ... (characters)|
[/code]

So we got a winner there.

The regular expression "rx_cmnt" is what cleans up the results (in a weird way, though), and its current value is "(\(\s*WGA\s*\))|(\(in alphabetical order\))|(\sand\s)|(&)"

No idea why Merlyn hard-coded the text between the parentheses, as I cannot think of any reason to do that.

So change "rx_cmnt" into: [B](?:\(as[^)]+\))|(?:\([^)]+)|(?:\s*\.{2,}\s*)|(?:&)[/B]

And "<replace name="writers" input="${writers}" pattern="\)" with=")|" />" into: [B]<replace name="writers" input="${writers}" pattern="\)" with="|" />[/B]

The second alternative in the new "rx_cmnt" deliberately leaves the closing parenthesis in place, so that the final replace can turn it into the "|" separator.

And you will be good to go.

It is a really weird approach to do this over multiple lines, as it can all be done in a single regular expression, but that would take more work for me to explain, so you'll have to settle for this easy 'ugly' fix.

PS: The [B]bold[/B] parts are the only values you need to change to.
[/QUOTE]
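The fix quoted above can be sanity-checked outside MediaPortal. The following is only a rough sketch: it uses Python's `re` module as a stand-in for the scraper engine's regex implementation (an assumption; the constructs in these patterns should behave the same in both), and the names `RX_CMNT` and `clean_writers` are made up for illustration. The input string is the `writers` value from the log after the `rx_tag` replace.

```python
import re

# Value of "writers" from the debug log, right after the rx_tag replace step.
raw = (
    "Erica Rivinoja ... (screenplay) & John Francis Daley ... (screenplay) & "
    "Jonathan M. Goldstein ... (screenplay) (as Jonathan Goldstein) "
    "Phil Lord ... (story) & Christopher Miller ... (story) & "
    "Erica Rivinoja ... (story) Judi Barrett ... (characters) & "
    "Ron Barrett ... (characters)"
)

# Proposed rx_cmnt. The second alternative stops before ")", so each
# remaining ")" can act as the separator in the last step.
RX_CMNT = r"(?:\(as[^)]+\))|(?:\([^)]+)|(?:\s*\.{2,}\s*)|(?:&)"


def clean_writers(text):
    """Hypothetical helper mirroring the three <replace> steps in order."""
    text = re.sub(RX_CMNT, " ", text)   # strip "(as ...)", "(role", " ... ", "&"
    text = re.sub(r"\s+", " ", text)    # collapse runs of whitespace
    text = re.sub(r"\)", "|", text)     # changed step: ")" becomes "|"
    return text


result = clean_writers(raw)
names = [part.strip() for part in result.split("|") if part.strip()]
print(result)
print(names)
```

With this input the result contains only names separated by "|". Erica Rivinoja still appears twice, because she is credited for both screenplay and story; the patterns do not remove duplicate names.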