[QUOTE="RoChess, post: 603939, member: 18896"]
The other thread is meant for the default IMDb scrapers, so I moved you over; otherwise things get confusing.

The MovingPictures plugin will later offer an option to control, to some extent, what the scraper does, so you will be able to configure that you prefer short summaries over long ones, one director/writer over all of them, etc. Until that update is done, I will teach you how to do it yourself.

[collapse]
Open a copy of the scraper in Notepad (you don't want to lose the original in case you make a mistake).

If you have already installed the scraper you are about to edit, then you need to fix the header information.
[code]
    <version major="1" minor="5" point="..." />
    <published month="04" day="..." year="2010" />
[/code]
Make sure the "..." parts in the above section are higher than the ones in the version you already installed, or the upgrade will fail.

Then locate and delete the following section:
[code]
      <!-- Plot Summary -->
      <retrieve name='summary_page' url='http://www.imdb.com/title/${movie.site_id}/plotsummary'/>
      <parse name="summary" input="${summary_page}" regex="${rx_plot}"/>
      <set name="summary_clean" value="${summary[0][0]:striptags}" />
      <set name="movie.summary" value="${summary_clean:htmldecode}" />
[/code]
This section retrieves the long summary; the code that follows it handles the short summary when no long summary exists. So by deleting this section you end up with the short summary only.

For a single director, you want to eliminate the 'loop' that collects all the directors, so locate:
[code]
      <!-- Directors -->
      <parse name="directors_block" input="${details_page}" regex='&lt;h5&gt;Director[s]?:&lt;/h5&gt;.*?&lt;/div&gt;'/>
      <parse name="directors" input="${directors_block}" regex='&lt;a href="/name/nm\d{7}/"[^&gt;]*&gt;([^&lt;]+)&lt;/a&gt;'/>
      <set name='movie.directors' value=''/>
      <loop name='currDirector' on='directors'>
        <set name="movie.directors" value="${movie.directors}|${currDirector[0]:htmldecode}"/>
      </loop>
[/code]
and change it into:
[code]
      <!-- Directors -->
      <parse name="directors_block" input="${details_page}" regex='&lt;h5&gt;Director[s]?:&lt;/h5&gt;.*?&lt;/div&gt;'/>
      <parse name="directors" input="${directors_block}" regex='&lt;a href="/name/nm\d{7}/"[^&gt;]*&gt;([^&lt;]+)&lt;/a&gt;'/>
      <set name="movie.directors" value="|${directors[0][0]:htmldecode}|"/>
[/code]
The same goes for writers. Locate:
[code]
      <!-- Writers -->
      <parse name="writers_block" input="${details_page}" regex="${rx_writers_block}" />
      <parse name='writers' input="${writers_block}" regex='&lt;a href="/name/nm\d+/"[^&gt;]*&gt;([^&lt;]+)&lt;/a&gt;'/>
      <set name='movie.writers' value=''/>
      <loop name='currWriter' on='writers'>
        <set name='movie.writers' value='${movie.writers}|${currWriter[0]:htmldecode}'/>
      </loop>
[/code]
and change it into:
[code]
      <!-- Writers -->
      <parse name="writers_block" input="${details_page}" regex="${rx_writers_block}" />
      <parse name='writers' input="${writers_block}" regex='&lt;a href="/name/nm\d+/"[^&gt;]*&gt;([^&lt;]+)&lt;/a&gt;'/>
      <set name='movie.writers' value='|${writers[0][0]:htmldecode}|'/>
[/code]
Then save the file and import it. For instructions on how to import a new scraper, see the FAQ.
[/collapse]

Enjoy.
[/QUOTE]
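To see what the director/writer edit in the quoted post changes, here is a rough Python sketch. It is not part of the scraper and not how MovingPictures itself is implemented. It applies the same link regex (with the XML entities decoded) to a made-up HTML fragment and builds the value both ways: once with the loop, once taking only the first match. The sample HTML, the function names, and the empty-string result when nothing matches are assumptions for illustration only.

```python
import html
import re

# Hypothetical fragment in the style the scraper's regex expects;
# invented for this example, not copied from IMDb or from the post.
SAMPLE_BLOCK = (
    '<h5>Directors:</h5>'
    '<a href="/name/nm0000001/">Jane Doe</a>, '
    '<a href="/name/nm0000002/">John O&#39;Neil</a>'
    '</div>'
)

# Same pattern as the scraper's "directors" parse step, entities decoded.
DIRECTOR_RX = re.compile(r'<a href="/name/nm\d{7}/"[^>]*>([^<]+)</a>')


def all_directors(block: str) -> str:
    """Mimic the original loop: append every match, each prefixed with '|'."""
    result = ""
    for name in DIRECTOR_RX.findall(block):
        result = f"{result}|{html.unescape(name)}"
    return result


def first_director(block: str) -> str:
    """Mimic the edited script: keep only the first match, wrapped in '|'."""
    match = DIRECTOR_RX.search(block)
    # Assumption: return an empty string when there is no match at all.
    return f"|{html.unescape(match.group(1))}|" if match else ""


if __name__ == "__main__":
    print(all_directors(SAMPLE_BLOCK))
    print(first_director(SAMPLE_BLOCK))
```

The writers edit follows the same pattern; the only difference in the post is that the writers regex uses `\d+` instead of `\d{7}` for the name ID.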
