- October 9, 2008
Denmark
Hey, where can I get a short-plot IMDb scraper that actually works?
No one!??
There was one a while ago, but I can't get it to work anymore.
I'm tired of all my movies being downloaded with a very long plot... It's impossible to read all that just to see what the movie is about!
I think I read on Google Code that a choice between short plot and long plot will be in the new version 0.8, which will come out soon.
rasmuskarlsen said: As far as I can see it's not going to be included in v. 0.8.0...
Sorry, it will be in v 0.9; look at screenshot #318.
If you look at the current IMDb scraper script you will see this section:
Code:
<!-- Plot Summary -->
<retrieve name='summary_page' url='http://www.imdb.com/title/${movie.site_id}/plotsummary'/>
<parse name="summary" input="${summary_page}" regex="${rx_plot}"/>
<set name="summary_clean" value="${summary[0][0]:striptags}" />
<set name="movie.summary" value="${summary_clean:htmldecode}" />
<!-- Plot Summary (if first method fails) -->
<if test="${movie.summary}=">
<parse name="summary2" input="${details_page}" regex="${rx_plot2}"/>
<set name="summary_clean" value="${summary2[0][0]:striptags}" />
<set name="movie.summary" value="${summary_clean:htmldecode}" />
</if>
What is happening here is that the scraper first looks for the long plot on the /plotsummary page at IMDb. If that fails to retrieve the (long) plot, it falls back to a secondary method, which is to grab the (short) plot from the film's main IMDb page.
To change this behaviour, you could edit the scraper xml and reimport it into MovingPictures. You could change the code above to this:
Code:
<!-- Do not retrieve long plot
<retrieve name='summary_page' url='http://www.imdb.com/title/${movie.site_id}/plotsummary'/>
<parse name="summary" input="${summary_page}" regex="${rx_plot}"/>
<set name="summary_clean" value="${summary[0][0]:striptags}" />
<set name="movie.summary" value="${summary_clean:htmldecode}" /> -->
<!-- Plot Summary (Retrieve short plot) -->
<parse name="summary2" input="${details_page}" regex="${rx_plot2}"/>
<set name="summary_clean" value="${summary2[0][0]:striptags}" />
<set name="movie.summary" value="${summary_clean:htmldecode}" />
</if>
This should now pick up the short plot summary for you.
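For anyone wondering what the two modifiers in those set lines are doing: judging only by their names, :striptags removes HTML tags and :htmldecode turns entities back into normal characters. Here is a rough Python sketch of that idea. The function names and the tag regex are my own assumptions for illustration; this is not how MovingPictures implements it.

```python
import html
import re

# Assumption: ":striptags" drops anything that looks like an HTML tag.
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(text: str) -> str:
    """Remove HTML tags but keep the text between them."""
    return TAG_RE.sub("", text)


def html_decode(text: str) -> str:
    """Turn entities such as &amp; back into plain characters."""
    return html.unescape(text)


def clean_summary(raw: str) -> str:
    """Apply the two steps in the same order as the script: tags first, then entities."""
    return html_decode(strip_tags(raw))


if __name__ == "__main__":
    print(clean_summary("<p>Tom &amp; Jerry</p>"))
```

The order matters a little: decoding first could turn an escaped &lt;b&gt; into a real tag that then gets stripped along with its text markers.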
Denmark
IMDB Scraper...
IMDb.xml - moving-pictures - Project Hosting on Google Code
It works if you remove the </if> at the bottom:
<!-- Do not retrieve long plot
<retrieve name='summary_page' url='http://www.imdb.com/title/${movie.site_id}/plotsummary'/>
<parse name="summary" input="${summary_page}" regex="${rx_plot}"/>
<set name="summary_clean" value="${summary[0][0]:striptags}" />
<set name="movie.summary" value="${summary_clean:htmldecode}" /> -->
<!-- Plot Summary (Retrieve short plot) -->
<parse name="summary2" input="${details_page}" regex="${rx_plot2}"/>
<set name="summary_clean" value="${summary2[0][0]:striptags}" />
<set name="movie.summary" value="${summary_clean:htmldecode}" />
But instead of "commenting out" the long plot, can't you reverse it, so that if no short plot is found, the long plot is used instead? I have tried playing with the code, but can't make it work :/
And thanks for the help so far!
<!-- Short Plot Summary -->
<parse name="summary2" input="${details_page}" regex="${rx_plot2}"/>
<set name="summary_clean" value="${summary2[0][0]:striptags}" />
<set name="movie.summary" value="${summary_clean:htmldecode}" />
<!-- Long Plot Summary (if first method fails) -->
<if test="${movie.summary}=">
<retrieve name='summary_page' url='http://www.imdb.com/title/${movie.site_id}/plotsummary'/>
<parse name="summary" input="${summary_page}" regex="${rx_plot}"/>
<set name="summary_clean" value="${summary[0][0]:striptags}" />
<set name="movie.summary" value="${summary_clean:htmldecode}" />
</if>
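The logic of the reversed script above (short plot first, long plot only when the short one comes back empty) can be sketched in Python like this. Everything here is a hypothetical stand-in: short_plot, long_plot and first_non_empty are names I made up, and the strings are dummy data, not real IMDb output.

```python
from typing import Callable, Iterable


def first_non_empty(sources: Iterable[Callable[[], str]]) -> str:
    """Call each source in order and return the first non-empty result."""
    for fetch in sources:
        text = fetch()
        if text:
            return text
    return ""


# Hypothetical stand-ins for the two lookups in the scraper script.
def short_plot() -> str:
    return "A short plot."


def long_plot() -> str:
    return "A much longer plot summary that goes on for several paragraphs."


def no_plot() -> str:
    return ""


if __name__ == "__main__":
    # Short plot preferred, long plot as the fallback.
    print(first_non_empty([short_plot, long_plot]))
    # When the short plot is missing, the long plot is used instead.
    print(first_non_empty([no_plot, long_plot]))
```

Because the later sources are only called when the earlier ones return nothing, the long-plot lookup is skipped whenever a short plot exists, which mirrors putting the retrieve of the /plotsummary page inside the if block.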