IMDb scraper request with Short summary
[QUOTE="RoChess, post: 603939, member: 18896"]
The other thread is meant for the default IMDb scrapers, so I moved you over here; otherwise things get confusing.

The MovingPictures plugin will later offer options to control, to some extent, what the scraper does, so you will be able to configure that you prefer short summaries over long ones, one director/writer over all of them, etc.

Until that update is done, I will teach you how to do it yourself.

[collapse]
Open a copy of the scraper in Notepad (you don't want to lose the original in case you make a mistake).

If you have already installed the scraper you are about to edit, you need to update the header information:

[code]  <version major="1" minor="5" point="..." />
  <published month="04" day="..." year="2010" />[/code]

Make sure the "..." parts in the section above are higher than the ones in the scraper you already installed, or the upgrade will fail.

Then locate and delete the following section:

[code]  <!-- Plot Summary -->
  <retrieve name='summary_page' url='http://www.imdb.com/title/${movie.site_id}/plotsummary'/>
  <parse name="summary" input="${summary_page}" regex="${rx_plot}"/>
  <set name="summary_clean" value="${summary[0][0]:striptags}" />
  <set name="movie.summary" value="${summary_clean:htmldecode}" />[/code]

This section retrieves the long summary; the code that follows it handles the short summary when no long summary exists. So by deleting this section you end up with the short summary only.

For a single director, you want to eliminate the 'loop' that collects all the directors and take only the first match instead. Locate:

[code]  <!-- Directors -->
  <parse name="directors_block" input="${details_page}" regex='&lt;h5&gt;Director[s]?:&lt;/h5&gt;.*?&lt;/div&gt;'/>
  <parse name="directors" input="${directors_block}" regex='&lt;a href="/name/nm\d{7}/"[^&gt;]*&gt;([^&lt;]+)&lt;/a&gt;'/>
  <set name='movie.directors' value=''/>
  <loop name='currDirector' on='directors'>
    <set name="movie.directors" value="${movie.directors}|${currDirector[0]:htmldecode}"/>
  </loop>[/code]

and change it into:

[code]  <!-- Directors -->
  <parse name="directors_block" input="${details_page}" regex='&lt;h5&gt;Director[s]?:&lt;/h5&gt;.*?&lt;/div&gt;'/>
  <parse name="directors" input="${directors_block}" regex='&lt;a href="/name/nm\d{7}/"[^&gt;]*&gt;([^&lt;]+)&lt;/a&gt;'/>
  <set name="movie.directors" value="|${directors[0][0]:htmldecode}|"/>[/code]

The same goes for writers. Locate:

[code]  <!-- Writers -->
  <parse name="writers_block" input="${details_page}" regex="${rx_writers_block}" />
  <parse name='writers' input="${writers_block}" regex='&lt;a href="/name/nm\d+/"[^&gt;]*&gt;([^&lt;]+)&lt;/a&gt;'/>
  <set name='movie.writers' value=''/>
  <loop name='currWriter' on='writers'>
    <set name='movie.writers' value='${movie.writers}|${currWriter[0]:htmldecode}'/>
  </loop>[/code]

and change it into:

[code]  <!-- Writers -->
  <parse name="writers_block" input="${details_page}" regex="${rx_writers_block}" />
  <parse name='writers' input="${writers_block}" regex='&lt;a href="/name/nm\d+/"[^&gt;]*&gt;([^&lt;]+)&lt;/a&gt;'/>
  <set name='movie.writers' value='|${writers[0][0]:htmldecode}|'/>[/code]

Then save the file and import it. For instructions on how to import a new scraper, see the FAQ.
[/collapse]

Enjoy.
[/QUOTE]
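Before importing, it may help to confirm that the edited copy is still well-formed XML and that its version really is higher than the installed one, since the post warns that the upgrade fails otherwise. The following is only a rough Python sketch of that check, not part of MovingPictures. The function names, the `<scraper>` wrapper element, and the sample `point` values are made up for illustration. It also assumes that versions compare in major, minor, point order, and it does not look at the `<published>` date.

```python
import xml.etree.ElementTree as ET


def read_version(xml_text):
    """Return (major, minor, point) from the first <version> element found.

    ET.fromstring raises ParseError if the edit left the file malformed,
    which doubles as a well-formedness check.
    """
    root = ET.fromstring(xml_text)
    node = next(root.iter("version"), None)
    if node is None:
        raise ValueError("no <version> element found")
    # Missing attributes are treated as 0 (an assumption of this sketch).
    return tuple(int(node.get(key, "0")) for key in ("major", "minor", "point"))


def is_upgrade(installed_xml, edited_xml):
    """True when the edited scraper's version is strictly higher."""
    return read_version(edited_xml) > read_version(installed_xml)


if __name__ == "__main__":
    # Placeholder documents; a real scraper file has more structure than this.
    installed = '<scraper><version major="1" minor="5" point="3" /></scraper>'
    edited = '<scraper><version major="1" minor="5" point="4" /></scraper>'
    print(is_upgrade(installed, edited))
```

To use it on real files, read the installed and edited scraper files into strings and pass them to `is_upgrade`.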