Getting IMDb ID

piernik

Portal Pro
October 22, 2008
141
26
Hi,

I'm making a Polish scraper, and the IMDb ID is very important for it (e.g. for getting backdrops).
I'm trying to get the ID through Wikipedia links, but that doesn't work every time.

Please post code here that gets the IMDb.com ID (nothing more) from the English title, so I can include it in the Polish scraper.
I think it could be useful to others as well, which is why I started a separate topic.
 

RoChess

Extension Developer
  • Premium Supporter
  • March 10, 2006
    4,434
    1,897

    Look at the fuzzy Google code that is used in the scope.dk scraper. You should be able to adapt it easily for your Polish scraper.
     

    piernik

    Portal Pro
    October 22, 2008
    141
    26
    Thanks for the source.
    Here is the code that fetches the IMDb ID. Maybe someone will find it useful.

    Code:
    <!-- no imdb_id yet - try to get it from google - only exact title match -->
    		<if test='${movie.imdb_id}='>
    		  <retrieve name="imdb_find" url="http://www.google.com/search?hl=en&amp;hs=a8f&amp;q=site%3Awww.imdb.com+${movie.title:safe}+(${movie.year:safe})" />
    		  <parse name="imdb_parser" input="${imdb_find}" regex="&lt;h3\sclass=&quot;r&quot;&gt;&lt;a\shref=&quot;http\://www\.imdb\.com/title/(tt\d+)\/&quot;\s.+?&gt;&lt;em&gt;${movie.title}&lt;/em&gt;\s*\(&lt;em&gt;${movie.year}&lt;/em&gt;\)" />
    		  <set name="movie.imdb_id" value="${imdb_parser[0][0]}"/>
    		  <!-- Still no IMDb ID: search with the alternate title and require an exact title match. -->
    		  <if test='${movie.imdb_id}='>
    			<if test='${movie.alternate_titles}!='>
    				<retrieve name="imdb_find" url="http://www.google.com/search?hl=en&amp;hs=a8f&amp;q=site%3Awww.imdb.com+${movie.alternate_titles:safe}+(${movie.year:safe})" />
    				<parse name="imdb_parser" input="${imdb_find}" regex="&lt;h3\sclass=&quot;r&quot;&gt;&lt;a\shref=&quot;http\://www\.imdb\.com/title/(tt\d+)\/&quot;\s.+?&gt;&lt;em&gt;${movie.alternate_titles}&lt;/em&gt;\s*\(&lt;em&gt;${movie.year}&lt;/em&gt;\)" />
    				<set name="movie.imdb_id" value="${imdb_parser[0][0]}"/>
    			</if>
    		  </if>
    		  <!-- Still no IMDb ID: search with the title again and accept the first result whose year matches, whatever its title. -->
    		  <if test='${movie.imdb_id}='>
    			<retrieve name="imdb_find" url="http://www.google.com/search?hl=en&amp;hs=a8f&amp;q=site%3Awww.imdb.com+${movie.title:safe}+(${movie.year:safe})" />
    			<parse name="imdb_parser" input="${imdb_find}" regex="&lt;h3\sclass=&quot;r&quot;&gt;&lt;a\shref=&quot;http\://www\.imdb\.com/title/(tt\d+)\/&quot;\s.+?&gt;&lt;em&gt;.+?&lt;/em&gt;\s*\(&lt;em&gt;${movie.year}&lt;/em&gt;\)" />
    			<set name="movie.imdb_id" value="${imdb_parser[0][0]}"/>
    		  </if>
    		  <!-- Still no IMDb ID: search with the alternate title and accept the first result whose year matches, whatever its title. -->
    		  <if test='${movie.imdb_id}='>
    			<if test='${movie.alternate_titles}!='>
    				<retrieve name="imdb_find" url="http://www.google.com/search?hl=en&amp;hs=a8f&amp;q=site%3Awww.imdb.com+${movie.alternate_titles:safe}+(${movie.year:safe})" />
    				<parse name="imdb_parser" input="${imdb_find}" regex="&lt;h3\sclass=&quot;r&quot;&gt;&lt;a\shref=&quot;http\://www\.imdb\.com/title/(tt\d+)\/&quot;\s.+?&gt;&lt;em&gt;.+?&lt;/em&gt;\s*\(&lt;em&gt;${movie.year}&lt;/em&gt;\)" />
    				<set name="movie.imdb_id" value="${imdb_parser[0][0]}"/>
    			</if>
    		  </if>
    		</if>
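
    A possible simplification, as an untested sketch only: the exact and non-exact title steps send the same Google query and differ only in the regex, so the result page could be retrieved once and parsed twice. This uses only the elements from the code above and assumes the engine lets a second parse read the same imdb_find variable and that a failed parse leaves ${imdb_parser[0][0]} empty, as the code above already relies on. The alternate-title steps would still need their own retrieve.

    ```xml
    <!-- Untested sketch: one request for the title, two parse attempts. -->
    <if test='${movie.imdb_id}='>
      <retrieve name="imdb_find" url="http://www.google.com/search?hl=en&amp;hs=a8f&amp;q=site%3Awww.imdb.com+${movie.title:safe}+(${movie.year:safe})" />
      <!-- Attempt 1: the result title must equal the movie title and the year must match. -->
      <parse name="imdb_parser" input="${imdb_find}" regex="&lt;h3\sclass=&quot;r&quot;&gt;&lt;a\shref=&quot;http\://www\.imdb\.com/title/(tt\d+)\/&quot;\s.+?&gt;&lt;em&gt;${movie.title}&lt;/em&gt;\s*\(&lt;em&gt;${movie.year}&lt;/em&gt;\)" />
      <set name="movie.imdb_id" value="${imdb_parser[0][0]}"/>
      <!-- Attempt 2: reuse the same page, accept any title, but still require the year. -->
      <if test='${movie.imdb_id}='>
        <parse name="imdb_parser" input="${imdb_find}" regex="&lt;h3\sclass=&quot;r&quot;&gt;&lt;a\shref=&quot;http\://www\.imdb\.com/title/(tt\d+)\/&quot;\s.+?&gt;&lt;em&gt;.+?&lt;/em&gt;\s*\(&lt;em&gt;${movie.year}&lt;/em&gt;\)" />
        <set name="movie.imdb_id" value="${imdb_parser[0][0]}"/>
      </if>
    </if>
    ```

    This would save one request to Google per movie when the exact match fails. Both patterns depend on Google's result markup (the h3 with class "r"), so they will stop matching if that markup changes.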
     
