Scraper Version & Published

mitiok2008 · March 2, 2009

Is it possible to turn off the version-number check when a scraper is loaded? I'm modifying a scraper for kinopoisk.ru, and every time I make a small change I have to bump the version number. I also have to change the published date; only then does the scraper get reloaded.

Code:

    <version major="1" minor="1" point="5"/>
    <published month="2" day="22" year="2009"/>

fforde · March 2, 2009

Put the scraper engine in debug mode. In the config screen, go to the Importer Settings tab and click the Movie Details Data Sources button. In the popup, click the gear icon and then click enable debug mode. You should now see a little green debug icon on the popup, and you should be able to load new versions of the script without having to change the version number or release date.
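If debug mode isn't an option, the other route mentioned above is to bump the version and published date on every edit. Below is a minimal Python sketch that does this, assuming the scraper file contains `<version ... point="N"/>` and `<published month=".." day=".." year=".."/>` elements like the ones in the first post. The function name `bump_scraper_header` is hypothetical. It edits the text with regular expressions rather than an XML parser, so comments and formatting in the script are left untouched.

```python
import re
from datetime import date

# Matches the point="N" attribute inside the <version .../> element.
VERSION_POINT_RE = re.compile(r'(<version\b[^>]*?\bpoint=")(\d+)(")')
# Matches the whole self-closing <published .../> element.
PUBLISHED_RE = re.compile(r'<published\b[^>]*/>')


def bump_scraper_header(script_text: str, today: date) -> str:
    """Return script_text with version/@point incremented and <published> set to today."""

    def next_point(match: re.Match) -> str:
        # Keep the surrounding attribute text, add 1 to the captured number.
        return f"{match.group(1)}{int(match.group(2)) + 1}{match.group(3)}"

    script_text, n = VERSION_POINT_RE.subn(next_point, script_text, count=1)
    if n != 1:
        raise ValueError('no <version ... point="N"/> element found')

    published = f'<published month="{today.month}" day="{today.day}" year="{today.year}"/>'
    script_text, n = PUBLISHED_RE.subn(lambda _m: published, script_text, count=1)
    if n != 1:
        raise ValueError("no <published .../> element found")
    return script_text
```

For example, `bump_scraper_header(text, date.today())` turns `point="5"` into `point="6"` and stamps today's date. Read and write the file with `encoding="utf-8"` (an assumption about how the script is saved) before reloading the scraper.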

mitiok2008 · March 2, 2009

Perfect! Thanks!

One more question. Do you have any idea what the following error is? The first cover art loads fine (with a connection timeout, but OK); the second one (from 22:28:08) doesn't. The requested URLs look very similar, but every time I try to load that movie I get the same error.

Code:

02-Mar-2009 22:25:20  Info [       MovieImporter]: User reprocessing Kamenniy.cvetok.1946.DivX.DVDRip.Kinozal.tv.avi 
02-Mar-2009 22:25:23  Info [       MovieImporter]: No exact match for Kamenniy.cvetok.1946.DivX.DVDRip.Kinozal.tv.avi 
02-Mar-2009 22:25:37  Info [       MovieImporter]: User reprocessing Kamenniy.cvetok.1946.DivX.DVDRip.Kinozal.tv.avi 
02-Mar-2009 22:25:40  Info [       MovieImporter]: No exact match for Kamenniy.cvetok.1946.DivX.DVDRip.Kinozal.tv.avi 
02-Mar-2009 22:25:50  Info [       MovieImporter]: User reprocessing Kamenniy.cvetok.1946.DivX.DVDRip.Kinozal.tv.avi 
02-Mar-2009 22:25:53  Info [       MovieImporter]: Auto-approved Kamenniy.cvetok.1946.DivX.DVDRip.Kinozal.tv.avi as Каменный цветок 
02-Mar-2009 22:25:53  Info [       MovieImporter]: Retrieving details for "Каменный цветок" 
02-Mar-2009 22:26:08 Error [          WebGrabber]: Connection failed: Reached retry limit of 10. URL=http://www.kinopoisk.ru/level/1/film/44697 System.Net.WebException: The operation has timed out
   at System.Net.HttpWebRequest.GetResponse()
   at Cornerstone.Tools.WebGrabber.GetResponse()
02-Mar-2009 22:26:11 Error [         DBMovieInfo]: Bad URL format, failed loading image:  
02-Mar-2009 22:26:11 Error [         DBMovieInfo]: Failed retrieving cover artwork for 'Каменный цветок' [] from . 
02-Mar-2009 22:26:14 Error [         DBMovieInfo]: Bad URL format, failed loading image: / 
02-Mar-2009 22:26:14 Error [         DBMovieInfo]: Failed retrieving cover artwork for 'Каменный цветок' [] from /. 
02-Mar-2009 22:26:29  Info [         DBMovieInfo]: Added cover art for 'Каменный цветок' [] from http://www.kinopoisk.ru/images/poster/734846.jpg. 
02-Mar-2009 22:26:29  Info [       MovieImporter]: Added "Каменный цветок". 
02-Mar-2009 22:27:40  Info [       MovieImporter]: User reprocessing Bratz.Superzvezdy.2008.DivX.DVDRip.avi 
02-Mar-2009 22:27:46  Info [       MovieImporter]: No exact match for Bratz.Superzvezdy.2008.DivX.DVDRip.avi 
02-Mar-2009 22:27:57  Info [       MovieImporter]: User reprocessing Bratz.Superzvezdy.2008.DivX.DVDRip.avi 
02-Mar-2009 22:28:01  Info [       MovieImporter]: No exact match for Bratz.Superzvezdy.2008.DivX.DVDRip.avi 
02-Mar-2009 22:28:08  Info [       MovieImporter]: User approved Bratz.Superzvezdy.2008.DivX.DVDRip.avias Братц: Суперзвезды (видео) 
02-Mar-2009 22:28:08  Info [       MovieImporter]: User approved Bratz.Superzvezdy.2008.DivX.DVDRip.avias Братц: Суперзвезды (видео) 
02-Mar-2009 22:28:08  Info [       MovieImporter]: Retrieving details for "Братц: Суперзвезды (видео)" 
02-Mar-2009 22:28:19 Error [         DBMovieInfo]: Bad URL format, failed loading image:  
02-Mar-2009 22:28:19 Error [         DBMovieInfo]: Failed retrieving cover artwork for 'Братц: Суперзвезды* (видео)' [] from . 
02-Mar-2009 22:28:22 Error [         DBMovieInfo]: Bad URL format, failed loading image: / 
02-Mar-2009 22:28:22 Error [         DBMovieInfo]: Failed retrieving cover artwork for 'Братц: Суперзвезды* (видео)' [] from /. 
02-Mar-2009 22:28:34  Info [         DBMovieInfo]: Added cover art for 'Братц: Суперзвезды* (видео)' [] from http://www.kinopoisk.ru/images/poster/818846.jpg. 
02-Mar-2009 22:28:34 Fatal [       MovieImporter]: Unhandled error in MediaScanner. System.NotSupportedException: The given path's format is not supported.
   at System.Security.Util.StringExpressionSet.CanonicalizePath(String path, Boolean needFullPath)
   at System.Security.Util.StringExpressionSet.CreateListFromExpressions(String[] str, Boolean needFullPath)
   at System.Security.Permissions.FileIOPermission.AddPathList(FileIOPermissionAccess access, AccessControlActions control, String[] pathListOrig, Boolean checkForDuplicates, Boolean needFullPath, Boolean copyPathList)
   at System.Security.Permissions.FileIOPermission..ctor(FileIOPermissionAccess access, String[] pathList, Boolean checkForDuplicates, Boolean needFullPath)
   at System.IO.FileInfo..ctor(String fileName)
   at MediaPortal.Plugins.MovingPictures.DataProviders.LocalProvider.getBackdropsFromMovieFolder(DBMovieInfo movie)
   at MediaPortal.Plugins.MovingPictures.DataProviders.LocalProvider.GetBackdrop(DBMovieInfo movie)
   at MediaPortal.Plugins.MovingPictures.DataProviders.DataProviderManager.GetBackdrop(DBMovieInfo movie)
   at MediaPortal.Plugins.MovingPictures.LocalMediaManagement.MovieImporter.AssignFileToMovie(IList`1 localMedia, DBMovieInfo movie, Boolean update)
   at MediaPortal.Plugins.MovingPictures.LocalMediaManagement.MovieImporter.AssignAndCommit(MovieMatch match, Boolean update)
   at MediaPortal.Plugins.MovingPictures.LocalMediaManagement.MovieImporter.ProcessNextApprovedMatches()
   at MediaPortal.Plugins.MovingPictures.LocalMediaManagement.MovieImporter.ScanMedia()

fforde · March 2, 2009

This doesn't look related to your scraper script. It looks like it's trying to load a backdrop from the movie folder, and what it loads is not a valid image file.

mitiok2008 · March 2, 2009


Thanks! I turned off "search movie folders for backdrops..." and now everything works.

But that's not my last question.

Is there any way to combine two arrays returned by a "parse" statement? For example, I would like to get cover-art links from two different pages. I can run one parse on the first page and another on the second, but is there a way to combine the results? Or (just got an idea!) can I combine the two URLs into one and then run a single parse on it?

fforde · March 2, 2009

There is no easy way to do this, but if you keep track of the last index inserted into the cover array, you can start at the next available index when adding results from your separate URL. It is a bit of a bear of a script, but take a look at the impawards.com script; it does something like this.

This section sort of does what you are describing: http://code.google.com/p/moving-pic...DataProviders/ScraperScripts/IMPAwards.xml#40
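The index bookkeeping described above can be modelled outside the scraper engine. Below is a minimal Python sketch, assuming each parse yields a plain list of URLs and that the cover_art array is indexed from 0. The function name is hypothetical, and this is not the engine's API, just the same idea in ordinary code:

```python
def combine_cover_arrays(first_urls, second_urls):
    """Fill one index-keyed cover_art map from two separate result lists."""
    cover_art = {}
    last_index = -1  # nothing inserted yet
    # First loop: the loop position doubles as the array index.
    for index, url in enumerate(first_urls):
        cover_art[index] = url
        last_index = index
    # Second loop: start at the next free index after the first loop.
    next_index = last_index + 1
    for url in second_urls:
        cover_art[next_index] = url
        next_index += 1
    return cover_art
```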

mitiok2008 · March 3, 2009


I failed. I tried to use the same logic as in your link, but it doesn't work. Below is my script. The first part takes covers from the first URL (that works!) and the second part from the second one. Both URLs exist, and each gives the expected results when run on its own.

Code:

		<!-- First check the cover-art page -->
		<retrieve name="cover_page" url="http://www.kinopoisk.ru/level/17/film/${movie.site_id}/adv_type/cover" retries="10" timeout="10000" timeout_increment="4000" />
		<!-- Make sure we are not getting the generic cover-art page. -->
		<parse name="posterverify" input="${cover_page}" regex="${rx_cover_verify}"/>
		<if test="${posterverify[0][0]}!=">
			<!-- Then get cover arts from cover-art pages -->
			<parse name="posterLinks" input="${cover_page}" regex="${rx_covers}"/>
			<loop name='cover_url' on='posterLinks'>
				<set name='cover_art[${count}].url' value='http://www.kinopoisk.ru/images/poster/${cover_url[0]}.jpg'/>
			</loop>
		</if>

		
		
		<!-- Second, check the posters page -->
		<retrieve name="cover_page" url="http://www.kinopoisk.ru/level/17/film/${movie.site_id}" retries="10" timeout="10000" timeout_increment="4000" />
		<!-- Make sure we are not getting the generic poster page. -->
		<parse name="posterverify" input="${cover_page}" regex="${rx_cover_verify}"/>
		<if test="${posterverify[0][0]}!=">
			<!-- Then get cover arts from POSTERS pages -->
			<parse name="posterLinks" input="${cover_page}" regex="${rx_covers}"/>
			<loop name='cover_url' on='posterLinks'>
				<add name='next' value1='$count' value2='1' />
				<set name='cover_art[${next}].url' value='http://www.kinopoisk.ru/images/poster/${cover_url[0]}.jpg'/>
			</loop>
		</if>
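The expressions `${posterverify[0][0]}` and `${cover_url[0]}` read a parse result as a table: one row per regex match, one column per capture group. Below is a rough Python model of that shape. It assumes this interpretation and a 0-based `count`. The page snippet and the `rx_covers` pattern are made-up stand-ins, since the real `${rx_covers}` is not shown in the thread.

```python
import re


def parse(text, pattern):
    """Model of a parse node: one row per match, one entry per capture group."""
    return [list(m.groups()) for m in re.finditer(pattern, text)]


# Made-up stand-ins for ${cover_page} and ${rx_covers}.
cover_page = '<img src="/images/poster/734846.jpg"><img src="/images/poster/818846.jpg">'
rx_covers = r'/images/poster/(\d+)\.jpg'

poster_links = parse(cover_page, rx_covers)
cover_art = {}
# Like <loop name='cover_url' on='posterLinks'>: count is the row number, from 0.
for count, cover_url in enumerate(poster_links):
    cover_art[count] = f"http://www.kinopoisk.ru/images/poster/{cover_url[0]}.jpg"
```

In this model, indexing a second page's loop by its own counter would restart at 0 (or at 1 with `count + 1`) and overwrite entries from the first page, so the second page needs an index that continues where the first left off.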

Here is part of the log file. The strangest thing is the 16:19:26 line, "Error parsing numbers". That line is ABSOLUTELY identical to the one in your script.

And one more thing: the log node doesn't work for me, and I have no idea why. If I put a log node in the scraper file, the scraper fails to load with a loading error.

Code:

03-Mar-2009 16:18:40  Info [   ScriptableScraper]: Loading scriptable scraper: Kinopoisk.ru (141521) Version 1.1.6 
03-Mar-2009 16:19:04  Info [       MovieImporter]: Failed watching: 'E:\' (NoRootDirectory) - Path is currently offline. 
03-Mar-2009 16:19:21 Error [         DBMovieInfo]: Bad URL format, failed loading image:  
03-Mar-2009 16:19:21 Error [         DBMovieInfo]: Failed retrieving cover artwork for 'Мадагаскар*2' [28] from . 
03-Mar-2009 16:19:23 Error [         DBMovieInfo]: Bad URL format, failed loading image: / 
03-Mar-2009 16:19:23 Error [         DBMovieInfo]: Failed retrieving cover artwork for 'Мадагаскар*2' [28] from /. 
03-Mar-2009 16:19:26 Error [         ScraperNode]: Error parsing numbers: <add name="next" value1="$count" value2="1" /> 
03-Mar-2009 16:19:30  Info [         DBMovieInfo]: Cover art for 'Мадагаскар*2' [28] already exists from http://www.kinopoisk.ru/images/poster/875588.jpg.

fforde · March 3, 2009

Well, the problem is that you do not have the ${count} variable properly escaped in your add node (you need the braces). But that doesn't really matter, because the way you are doing this will not work. ${count} is a system variable that represents the current iteration of the loop, and it persists until you run another loop. So what you need to do is store it in a new variable, let's call it ${next_free}, that is set to ${count} + 1 after the first loop but before the second loop. Then, inside the second loop, use ${next_free} as the index into your cover_art array. And be sure to add 1 to ${next_free} on each iteration of the second loop.

Hope that makes sense.
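The first point, that `$count` without braces is not expanded, can be illustrated with a toy substitution model. This is a hypothetical Python sketch, not the ScriptableScraper implementation. It assumes only `${name}` tokens are replaced and that an add node then parses both values as integers, so the literal text `$count` would produce an "Error parsing numbers" message like the one in the log.

```python
import re

# Only ${name} tokens are treated as variable references in this model.
VARIABLE_RE = re.compile(r"\$\{([^}]+)\}")


def substitute(template, variables):
    """Replace each ${name} with its value; any other text is kept literally."""
    return VARIABLE_RE.sub(lambda m: str(variables[m.group(1)]), template)


def add_node(value1, value2, variables):
    """Toy <add>: substitute both values, then add them as integers."""
    left = substitute(value1, variables)
    right = substitute(value2, variables)
    try:
        return int(left) + int(right)
    except ValueError:
        raise ValueError(f"Error parsing numbers: {left!r} + {right!r}") from None
```

Here `add_node("${count}", "1", {"count": 4})` returns 5, while `add_node("$count", "1", {"count": 4})` fails because `"$count"` reaches `int()` unchanged.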

mitiok2008 · March 3, 2009


Finally, I solved the problem by slightly modifying your proposal: I had to use the next_free variable in both loops (see below). I wasn't able to get a value from the ${count} variable. I put the add node in different places, including right after the first loop, but every time I used <add name="next_free" value1="${count}" value2="1" /> I got "Error parsing numbers" in the log file. But it's OK for now; the solution works.
I think the scraper just needs some polishing of data accuracy (there are small errors with directors), and then I'll send it to you.

By the way, the log node doesn't work for me; it causes a "Scraper loading error".

And one more question: is there any progress on the backdrop scraper? I'm quite sure I'm now able to do one for kinopoisk.ru.

Thanks a lot for your help.

Code:

	<set name='next_free' value='0'/> 
	
	
	<!-- Verify that Kinopoisk was used with site_id -->
	<if test="${movie.site_id}!=">
		<!-- First check the cover-art page -->
		<retrieve name="cover_page" url="http://www.kinopoisk.ru/level/17/film/${movie.site_id}/adv_type/cover" retries="10" timeout="10000" timeout_increment="4000" />
		<!-- Make sure we are not getting the generic cover-art page. -->
		<parse name="posterverify" input="${cover_page}" regex="${rx_cover_verify}"/>
		<if test="${posterverify[0][0]}!=">
			<!-- Then get cover arts from cover-art pages -->
			<parse name="posterLinks" input="${cover_page}" regex="${rx_covers}"/>
			<loop name='cover_url' on='posterLinks'>
				<set name='cover_art[${count}].url' value='http://www.kinopoisk.ru/images/poster/${cover_url[0]}.jpg'/>
				<add name='next_free' value1='${next_free}' value2='1' />
			</loop>
		</if>
				
		<!-- Second, check the posters page -->
		<retrieve name="cover_page" url="http://www.kinopoisk.ru/level/17/film/${movie.site_id}" retries="10" timeout="10000" timeout_increment="4000" />
		<!-- Make sure we are not getting the generic poster page. -->
		<parse name="posterverify" input="${cover_page}" regex="${rx_cover_verify}"/>
		<if test="${posterverify[0][0]}!=">
			<!-- Then get cover arts from POSTERS pages -->
			<parse name="posterLinks" input="${cover_page}" regex="${rx_covers}"/>
			<loop name='cover_url' on='posterLinks'>
				<set name='cover_art[${next_free}].url' value='http://www.kinopoisk.ru/images/poster/${cover_url[0]}.jpg'/>
				<add name='next_free' value1='${next_free}' value2='1' />
			</loop>
		</if>
	</if>
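The final script keeps `next_free` equal to the number of covers stored so far, so it no longer depends on `${count}` surviving between loops. Below is a small Python model of the two loops. It assumes each list holds the poster IDs captured by `${rx_covers}` on the respective page, and the function name is hypothetical.

```python
POSTER_URL = "http://www.kinopoisk.ru/images/poster/{}.jpg"


def collect_covers(cover_page_ids, poster_page_ids):
    """Mirror of the two loops: next_free always equals len(cover_art)."""
    cover_art = {}
    next_free = 0  # <set name='next_free' value='0'/>
    # First loop writes cover_art[${count}] and bumps next_free alongside it.
    for count, image_id in enumerate(cover_page_ids):
        cover_art[count] = POSTER_URL.format(image_id)
        next_free += 1
    # Second loop writes at next_free, then advances it.
    for image_id in poster_page_ids:
        cover_art[next_free] = POSTER_URL.format(image_id)
        next_free += 1
    return cover_art
```

Because `next_free` starts at 0 and grows in both loops, an empty cover-art page simply means the posters start at index 0.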

LRFalk01 · March 4, 2009

You're a good man mitiok2008. What was wrong with the directors?

-LRFalk01
