Hi,
I'm trying to write a scraper for cinefacts.de. This is my first attempt, so for now it only contains the search part.
My regex matches fine against the search page, but I can't get the movie search itself to work. It only finds a title when I type zufaellig verheiratet, not Zufällig verheiratet, and the year is always (9999). Maybe one of you could take a look and tell me what's wrong and how to set this up.
Here's the code:
Code:
<action name="search">
  <set name="offset" value="0" />

  <!-- Regular Expressions -->
  <!-- [^/]+ keeps each URL segment from running past its slash; \. matches a literal dot -->
  <set name="rx_search_results">
    <![CDATA[
    <a href="/kino/(?<movieID>[^/]+)/(?<movieAKA>[^/]+)/filmdetails\.html">\s+<b title="(?<movieTitle>.+?)"\s.+\s+\D+(?<movieYear>\d{4})
    ]]>
  </set>

  <!-- Retrieve results using Title -->
  <retrieve name="search_page" url="http://www.cinefacts.de/suche/suche.php?name=${search.title:safe}" />

  <!-- Parse the search page. If the regex does not match, the loop below is skipped. -->
  <parse name="details_page_block" input="${search_page}" regex="${rx_search_results}"/>
  <!-- The variable must be wrapped in ${...}; otherwise the test compares a literal string and is always true -->
  <if test="${details_page_block[0][0]}!=">
    <loop name="item_return" on="details_page_block">
      <add name="counter" value1="${count}" value2="${offset}" />
      <set name="movie[${counter}].title" value="${item_return[2]:htmldecode}"/>
      <set name="movie[${counter}].alternate_titles" value="${item_return[1]:htmldecode}" />
      <!-- Test for the existence of a year before trying to put one in the movie info -->
      <if test="${item_return[3]}!=">
        <set name="movie[${counter}].year" value="${item_return[3]:htmldecode}"/>
      </if>
      <set name="movie[${counter}].site_id" value="${item_return[0]}"/>
      <set name="movie[${counter}].details_url" value="http://www.cinefacts.de/kino/${item_return[0]}/${item_return[1]}/filmdetails.html"/>
      <subtract name="movie[${counter}].popularity" value1="100" value2="${counter}"/>
    </loop>
  </if>
</action>
</ScriptableScraper>
Many thanks
Schenk