I've got some issues with the <retrieve /> node in my scraper. Hopefully someone can help...
So, the code I have is this:
Code:
<!-- OFDB Details -->
<retrieve name="ofdb_details_page" url="http://www.ofdb.de/film/${movie.ofdb_id},${ofdb_movie_url[0][1]}"/>
<set name="rx_TitelDE">
<![CDATA[
(?:<title>OFDb\s-\s)(?<Titel>.*?)\s\((?<Jahr>\d{4})\)(?=</title>)
]]>
</set>
and it produces this debug log:
Code:
16-Dec-2012 15:44:55 Debug [ ScraperNode]: executing retrieve: <retrieve name="ofdb_details_page" url="http://www.ofdb.de/film/${movie.ofdb_id},${ofdb_movie_url[0][1]}" />
16-Dec-2012 15:44:55 Debug [ ScraperNode]: Retrieving URL: http://www.ofdb.de/film/1067,Beverly-Hills-Cop-II
16-Dec-2012 15:44:55 Debug [ WebGrabber]: GetResponse: URL=http://www.ofdb.de/film/1067,Beverly-Hills-Cop-II, UserAgent=Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11, CookieHeader=ofdb_theme=0; ofdb_ret=view.php%253Fpage%253Dstart, Accept=text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
16-Dec-2012 15:44:55 Debug [ ScraperNode]: Assigned variable: urn://scraper/header/www.ofdb.de = ofdb_theme=0; ofdb_ret=view.php%253Fpage%253Dstart
16-Dec-2012 15:44:55 Debug [ WebGrabber]: GetString: Encoding=
16-Dec-2012 15:44:55 Debug [ ScraperNode]: Assigned variable: ofdb_details_page =
16-Dec-2012 15:44:55 Debug [ ScraperNode]: Assigned variable: rx_TitelDE = (?:<title>OFDb\s-\s)(?<Titel>.*?)\s\((?<Jahr>\d{4})\)(?=</title>)
So the problem is obviously the "Assigned variable: ofdb_details_page =" line: the variable is cleared (assigned an empty value) right after the retrieve is done.
That should not happen, and I cannot figure out where it is coming from.
The URL I am trying to retrieve exists, and I can copy it and open it in IE or Firefox without any problem. Does anyone have an idea what might be causing this?
No other retrieve in the scraper does this.
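One way to narrow this down outside the scraper might be to request the same URL with the headers from the WebGrabber log line and check what the server actually sends back: status, Content-Type charset, Content-Encoding and body size. Below is a minimal Python sketch using only the standard library. probe and charset_of are hypothetical helper names, and the idea that a missing charset relates to the empty "Encoding=" in the log is an assumption, not something confirmed from the MediaPortal source.

```python
import urllib.request
from email.message import Message
from typing import Optional

# Header values copied from the WebGrabber "GetResponse" line in the debug log above.
USER_AGENT = ("Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.11 "
              "(KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11")
COOKIE = "ofdb_theme=0; ofdb_ret=view.php%253Fpage%253Dstart"
ACCEPT = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"


def charset_of(content_type: str) -> Optional[str]:
    """Return the lower-cased charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def probe(url: str) -> dict:
    """Request url with the logged headers and summarise the response (hypothetical helper)."""
    req = urllib.request.Request(
        url, headers={"User-Agent": USER_AGENT, "Cookie": COOKIE, "Accept": ACCEPT}
    )
    # urlopen raises HTTPError for 4xx/5xx status codes.
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = resp.read()
        content_type = resp.headers.get("Content-Type", "")
        return {
            "status": resp.status,
            "content_type": content_type,
            # Assumption: None here may correspond to the empty "Encoding=" in the log.
            "charset": charset_of(content_type),
            # Compression applied by the server, if any (None when absent).
            "content_encoding": resp.headers.get("Content-Encoding"),
            "body_bytes": len(body),
        }
```

Usage: probe("http://www.ofdb.de/film/1067,Beverly-Hills-Cop-II"). If body_bytes is non-zero but charset is None, that would at least point at the encoding step rather than the request itself.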
Can anyone help? @fforde or @RoChess maybe?