Suggestion to use XBMC's XML scrapers for HTTP scraping

Gamester17

Portal Pro
May 12, 2004
98
3
Sweden
Home Country: Sweden
FYI
Nicezia said:
Version 4.0 of the ScraperXML Web Scraper Library has been released and it should work with every XBMC scraper now.

https://sourceforge.net/forum/forum.php?forum_id=960514

List of XBMC scrapers:

Music
allmusic - allmusic
discogs - Discogs - Database and Marketplace for Music on Vinyl, CD, Cassette, MP3 and More
israel-music - Israeli & Jewish Music, DVD Movies & Films
lastfm - Last.fm - Listen to free music with internet radio and the largest music catalogue online
yahoomusic - Yahoo! Music - Internet Radio, Music Videos, Artists, Music News, Interviews and Performances

Movies
Excalibur
KinoPoisk - КиноПоиск.ru. Все фильмы планеты
MyMovies
adultdvdempire
adultfilmdatabase
allocine - Allocine.de wird zu FILMSTARTS.de
amazonuk - Amazon.co.uk: low prices in Electronics, Books, Music, DVDs & more
amazonus - Amazon.com: Online Shopping for Electronics, Apparel, Computers, Books, DVDs & more
asiandb - AsianDB.com: your connection to Asia
culturalia - CulturaliaNet
daum
filmaffinity - FilmAffinity
filmstarts - Startseite | FILMSTARTS.de
filmup - FilmUP.com - Cinema, HomeVideo e TV
filmweb - Filmweb.pl - Łeb pełen filmów!
imdb - The Internet Movie Database (IMDb)
jadedVideo
moviemaze - MovieMaze - Film, Kino, Trailer, Poster, Wallpaper und News
naver - ??? :: ??? ?? ??, ???
ofdb - http://www.ofdb.de
ptgate - :: Cinema PTGate ::
sratim - ????? - ????? ?????, ??????? ????????, ???????? ????????, ???????, ???? ???.
tmdb - themoviedb.org (TMDb) | The open movie database

TV Shows
tvcom - TV.com - Free Full Episodes & Clips, Show Info and TV Listings Guide
tvdb - Online TV Database - An open directory of television shows for HTPC software
tvrage - TVRage.Com: TV News - TV Shows - TV Forums - TV Listings - Entertainment News!
imdb tv - IMDb | IMDbTV

Music Videos
mtv - New Music Videos, Reality TV Shows, Celebrity News, Top Stories | MTV




Please note that there have been radical changes in the way it works:

There is no longer any need to call CreateSearchUrl yourself; all of that is now handled inside the GetSearchResults function.

At the moment the object that sends settings to the scrapers is disconnected, as I've put in a frontend object for each scraper type. The parser itself is now a separate entity with no public access; you access the scraper parser through the class objects (MovieScraper, TvShowScraper, AlbumArtistScraper).

The movie scraper has two calls: GetSearchResults (which you should supply with the movie name and, optionally, the year for best results) and GetDetails, which only requires that you pass back the selected search result for the item you wish to download info for.
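A rough sketch of that two-call flow, written in Python rather than the library's own VB.NET. Only the names GetSearchResults and GetDetails come from the post; the MovieScraper class below is a hypothetical in-memory stand-in, and its constructor, the SearchResult type, the `example://` URLs and the sample data are all assumptions made for illustration, not ScraperXML's real signatures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchResult:
    title: str
    year: int
    url: str


class MovieScraper:
    """Hypothetical stand-in for illustration; not ScraperXML's real class."""

    def __init__(self, catalog: dict):
        # catalog maps a made-up details URL to a made-up details record
        self._catalog = catalog

    def get_search_results(self, movie_name: str, year: Optional[int] = None) -> list:
        # The search step is self-contained: the caller never builds a
        # search URL first (the old CreateSearchUrl step is folded in here).
        hits = [
            SearchResult(rec["title"], rec["year"], url)
            for url, rec in self._catalog.items()
            if movie_name.lower() in rec["title"].lower()
        ]
        if year is not None:
            # Stable sort: exact-year matches move to the front.
            hits.sort(key=lambda hit: hit.year != year)
        return hits

    def get_details(self, selected: SearchResult) -> dict:
        # The caller passes back the entry it picked from the search results.
        return self._catalog[selected.url]


# Made-up sample data, not real scraper output.
catalog = {
    "example://film/1": {"title": "Sample Film", "year": 1981, "plot": "First version."},
    "example://film/2": {"title": "Sample Film", "year": 2004, "plot": "Remake."},
}
scraper = MovieScraper(catalog)
results = scraper.get_search_results("sample film", year=2004)
details = scraper.get_details(results[0])
print(details["plot"])
```

The point of the sketch is the shape of the exchange: one call to search, a choice made by the caller, and one call to fetch details for that choice.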

The album and artist searches are encompassed in the same scraper object (as the scrapers themselves are set up this way), each having two functions like the movie scraper object: GetAlbumSearchResults and GetAlbumDetails for albums, GetArtistSearchResults and GetArtistDetails for artists.

TV scrapers are another story, the scrapers themselves being an elegant and very versatile piece of work. There are four function calls that need to be used with them. First there are the two calls to get the TV show overview (GetSearchResults and GetDetails); then there are two more calls for episodes: one updates the list of episodes from the site (UpdateEpisodeGuide) and the other gets the actual episode details (GetEpisodeDetails).
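The four TV calls can be sketched in the same spirit. Again this is Python with a hypothetical in-memory TvShowScraper; only the four function names are taken from the post. The arguments, the return types, the rule that the guide must be updated before episode details are requested, and the sample show are assumptions for illustration.

```python
class TvShowScraper:
    """Hypothetical stand-in for illustration; not ScraperXML's real class."""

    def __init__(self, shows: dict):
        self._shows = shows   # made-up data keyed by a made-up show id
        self._guide = {}      # episode guide cached per show id

    def get_search_results(self, name: str) -> list:
        # Call 1: find candidate shows by name.
        return [sid for sid, show in self._shows.items()
                if name.lower() in show["title"].lower()]

    def get_details(self, show_id: str) -> dict:
        # Call 2: overview of the chosen show.
        show = self._shows[show_id]
        return {"title": show["title"], "overview": show["overview"]}

    def update_episode_guide(self, show_id: str) -> list:
        # Call 3: refresh the list of (season, episode) pairs for the show.
        self._guide[show_id] = sorted(self._shows[show_id]["episodes"])
        return self._guide[show_id]

    def get_episode_details(self, show_id: str, season: int, episode: int) -> dict:
        # Call 4: details for one episode that appears in the guide.
        if (season, episode) not in self._guide.get(show_id, []):
            raise KeyError("episode not in guide; call update_episode_guide first")
        return self._shows[show_id]["episodes"][(season, episode)]


# Made-up sample data, not real scraper output.
shows = {
    "show-1": {
        "title": "Sample Show",
        "overview": "A made-up series.",
        "episodes": {(1, 2): {"title": "Second"}, (1, 1): {"title": "Pilot"}},
    }
}
tv = TvShowScraper(shows)
show_id = tv.get_search_results("sample")[0]
overview = tv.get_details(show_id)
guide = tv.update_episode_guide(show_id)
episode = tv.get_episode_details(show_id, 1, 1)
print(overview["title"], guide, episode["title"])
```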

There is a console program in the SVN repository that demonstrates the usage of all types and what input should go into each.

I will be updating the Windows GUI test program as soon as I reconnect the settings objects.
 

Gamester17

FYI; Nicezia has now also released a Scraper Editor (with GUI) for Windows:
[RELEASE] Scraper Editor - XBMC Community Forum
Nicezia said:
[RELEASE] Scraper Editor

Scraper editor partially finished (finished enough to make scrapers)

Haven't added the tester yet, as I'm having problems with ScraperXML's HTTP protocol in C#.

Running on Linux via Mono (note that there were a few bugs I had to work out, namely in the bindings, but now it runs error-free on Linux)
ScraperEditorLinuxviaMono.png



Running on Windows (note the highlighting of matches and the text of match groups turning red)
ScraperEditorOnWindowsExpressionEdi.png


Download


Just a slight note: use Ctrl+V to paste text into the box below, as I haven't included a context menu for that yet. In the expression box I haven't finished the context menu either (it will include common regular expressions, configurable via an XML file).

Currently only the disk image button works for saving (the menu items don't work yet).

Just thought I'd make a quick release, since the basic functionality of the editor is finished and all I really need to add is perks.

The load image file button doesn't work yet.

But this will load, edit and save any scraper made from scratch or that comes with XBMC. Included is an enhanced version of the Excalibur scraper (which gets linked actors AND non-linked actors, and instead of guessing at image names actually goes and downloads images from the actor page).

Oh, and as a reminder: !!!!DO NOT!!!!! use XML entities in the editor. Put the pure XML in, and entities will be managed when the XML file is compiled!
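To illustrate why hand-written entities cause trouble, here is a small Python sketch using the standard library's ElementTree. It is not the editor's own save routine, which the post does not show; the sample pattern is made up, and the assumption is only that the editor escapes text when it writes the XML file, as any XML serializer does.

```python
import xml.etree.ElementTree as ET

# A made-up scraper pattern, typed as plain text the way the post recommends.
raw_pattern = '<h1 class="title">(.*?)</h1>'

node = ET.Element("expression")
node.text = raw_pattern

# Serializing escapes the markup characters in the text automatically.
saved = ET.tostring(node, encoding="unicode")
print(saved)

# Reading the file back yields the original pattern unchanged.
loaded = ET.fromstring(saved).text

# If entities are typed in by hand, the serializer escapes them a second
# time, and the pattern that gets read back is no longer the intended one.
node.text = "&lt;h1&gt;"
double_escaped = ET.tostring(node, encoding="unicode")
print(double_escaped)
```

The first print shows `&lt;` and `&gt;` in place of the angle brackets; the second shows `&amp;lt;` and `&amp;gt;`, which is the double escaping the warning is meant to prevent.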
PS! Nicezia is now also in the process of porting ScraperXML to C#, so keep an eye on:
SourceForge.net: ScraperXML
and
ScraperXML (Open Source XML Web Scraper VB.NET Library) please help verify my work... - XBMC Community Forum

ScraperXML is an Open Source Web Scraper Library compatible with all of XBMC's XML scrapers :D
 

fforde

Community Plugin Dev
June 7, 2007
2,667
1,702
42
Texas
Home Country: United States of America
I appreciate your enthusiasm Gamester17, but I am not sure anyone here is going to jump on board this project...
 

lboregard

New Member
August 15, 2010
2
0
fforde, the Cornerstone scraper engine (cornerstone.scraperengine) seems to be very cool.

The issue is that the code seems highly coupled to MediaPortal and some 3rd-party libraries (NLog, for instance).

Is it OK if an effort gets underway to decouple the scraper engine from MediaPortal?

I really dig the dictionary approach to input/output.

Please let me know your comments.
 

fforde

The Cornerstone scraping engine has zero references to MediaPortal. It does use the NLog library for logging, but that would be easy enough to change or remove. It is licensed under GPL3, so assuming what you want to use it for is under a compatible license, you should be good. Jump on IRC or send me a PM if you'd like to chat a bit more about it.
 
