Suggestion to use XBMC's XML scrapers for HTTP scraping

Gamester17

Portal Pro
May 12, 2004
98
3
Sweden
Home Country
Sweden Sweden
FYI
Nicezia said:
Version 4.0 of the ScraperXML Web Scraper Library has been released and it should work with every XBMC scraper now.

https://sourceforge.net/forum/forum.php?forum_id=960514

List of XBMC scrapers:

Music
allmusic - allmusic
discogs - Discogs - Database and Marketplace for Music on Vinyl, CD, Cassette, MP3 and More
israel-music - Israeli & Jewish Music, DVD Movies & Films
lastfm - Last.fm - Listen to free music with internet radio and the largest music catalogue online
yahoomusic - Yahoo! Music - Internet Radio, Music Videos, Artists, Music News, Interviews and Performances

Movies
Excalibur
KinoPoisk - KinoPoisk.ru. All the films of the planet
MyMovies
adultdvdempire
adultfilmdatabase
allocine - Allocine.de wird zu FILMSTARTS.de
amazonuk - Amazon.co.uk: low prices in Electronics, Books, Music, DVDs & more
amazonus - Amazon.com: Online Shopping for Electronics, Apparel, Computers, Books, DVDs & more
asiandb - AsianDB.com: your connection to Asia
culturalia - CulturaliaNet
daum
filmaffinity - FilmAffinity
filmstarts - Startseite | FILMSTARTS.de
filmup - FilmUP.com - Cinema, HomeVideo e TV
filmweb - Filmweb.pl - Łeb pełen filmów!
imdb - The Internet Movie Database (IMDb)
jadedVideo
moviemaze - MovieMaze - Film, Kino, Trailer, Poster, Wallpaper und News
naver
ofdb - http://www.ofdb.de
ptgate - :: Cinema PTGate ::
sratim
tmdb - themoviedb.org (TMDb) | The open movie database

TV Shows
tvcom - TV.com - Free Full Episodes & Clips, Show Info and TV Listings Guide
tvdb - Online TV Database - An open directory of television shows for HTPC software
tvrage - TVRage.Com: TV News - TV Shows - TV Forums - TV Listings - Entertainment News!
imdb tv - IMDb | IMDbTV

Music Videos
mtv - New Music Videos, Reality TV Shows, Celebrity News, Top Stories | MTV




Please note there have been radical changes in the way it works:

There is no need to call CreateSearchUrl yourself anymore; all of that is now handled inside the GetSearchResults function.

At the moment the object that sends settings to the scrapers is disconnected, as I've put in a frontend object for each scraper type. The parser itself is a separate entity (with no public access); you now access the scraper parser through the class objects (MovieScraper, TvShowScraper, AlbumArtistScraper).

The movie scraper has two calls: GetSearchResults (which you should supply with the movie name and, optionally, the year for best results) and GetDetails, which only requires that you pass back the selected element of the item you wish to download info for.
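As a rough illustration of that two-call flow, here is a minimal Python sketch. The real library is VB.NET, so this is not its API: the class is a hypothetical stand-in with made-up canned data, the SearchResult type and all argument names are assumptions, and only the names MovieScraper, GetSearchResults and GetDetails come from the post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchResult:
    """Hypothetical search hit; the real result type is not shown in the post."""
    title: str
    year: Optional[int]
    url: str


class MovieScraper:
    """Stand-in for the MovieScraper frontend object (assumed shape, canned data)."""

    # Made-up entries standing in for whatever a real site would return.
    _CATALOGUE = [
        SearchResult("Example Movie", 2001, "http://example.invalid/movie/1"),
        SearchResult("Example Movie II", 2004, "http://example.invalid/movie/2"),
    ]

    def GetSearchResults(self, movie_name, year=None):
        # Search URL construction would happen in here; callers no longer do it.
        hits = [r for r in self._CATALOGUE if movie_name.lower() in r.title.lower()]
        if year is not None:
            hits = [r for r in hits if r.year == year]
        return hits

    def GetDetails(self, selected):
        # Only the chosen search result is needed to fetch the details.
        return {"title": selected.title, "year": selected.year, "source": selected.url}


scraper = MovieScraper()
results = scraper.GetSearchResults("Example Movie", 2004)
details = scraper.GetDetails(results[0])
print(details["title"], details["year"])
```

Passing the year narrows the canned results from two hits to one, which is the "best results" effect the post mentions.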

The album and artist searches are encompassed in the same scraper object (as the scrapers themselves are set up this way), each having two functions like the movie scraper object: GetAlbumSearchResults and GetAlbumDetails for albums, GetArtistSearchResults and GetArtistDetails for artists.

TV scrapers are another story, the scrapers themselves being an elegant and very versatile piece of work... There are 4 function calls that need to be used with them. First there are the two calls to get the TV show overview (GetSearchResults and GetDetails); then there are two more calls for episodes, one to update the list of episodes from the site (UpdateEpisodeGuide) and the other to get the actual episode details (GetEpisodeDetails).
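The order of those four calls can be sketched in Python as follows. This is an assumed shape only, not the library's VB.NET API: the dictionaries, argument names and canned episode data are hypothetical, and only the class name TvShowScraper and the four method names are taken from the post.

```python
class TvShowScraper:
    """Stand-in for the TvShowScraper frontend object (assumed shape, canned data)."""

    def __init__(self):
        self._episodes = {}  # show id -> list of episode stubs

    def GetSearchResults(self, show_name):
        # A real scraper would query the site; here one fake hit is returned.
        return [{"id": "show-1", "title": show_name}]

    def GetDetails(self, selected):
        return {"id": selected["id"], "title": selected["title"], "plot": "(placeholder)"}

    def UpdateEpisodeGuide(self, show):
        # Refreshes the episode list "from the site"; hard-coded in this sketch.
        self._episodes[show["id"]] = [
            {"season": 1, "episode": 1, "title": "First Episode"},
            {"season": 1, "episode": 2, "title": "Second Episode"},
        ]
        return self._episodes[show["id"]]

    def GetEpisodeDetails(self, show, season, episode):
        for ep in self._episodes.get(show["id"], []):
            if (ep["season"], ep["episode"]) == (season, episode):
                return dict(ep, show=show["title"])
        raise LookupError("episode not in guide; call UpdateEpisodeGuide first")


scraper = TvShowScraper()
show = scraper.GetDetails(scraper.GetSearchResults("Example Show")[0])
guide = scraper.UpdateEpisodeGuide(show)
episode = scraper.GetEpisodeDetails(show, 1, 2)
print(len(guide), episode["title"])
```

In this sketch the episode lookup depends on the guide having been refreshed first, which is one plausible reason for keeping the two episode calls separate.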

There is a console program in the svn that demonstrates the usage of all types, and what input should go into each.

I will be updating the Windows GUI test program as soon as I reconnect the settings objects.
 

Gamester17

FYI; Nicezia has now also released a Scraper Editor (with GUI) for Windows:
[RELEASE] Scraper Editor - XBMC Community Forum
Nicezia said:
[RELEASE] Scraper Editor

Scraper editor partially finished (finished enough to make scrapers)

Haven't added the tester yet, as I'm having problems with ScraperXML's HTTP protocol in C#.

Running on Linux via Mono (note that there were a few bugs I had to work out, namely the bindings, but it now runs error-free on Linux)
ScraperEditorLinuxviaMono.png



Running on Windows (note the highlighting of matches, with the text of match groups turned red)
ScraperEditorOnWindowsExpressionEdi.png


Download


Just a slight note: use Ctrl+V to paste text into the box below, as I haven't included a context menu for that yet. The context menu for the expression box isn't finished either (it will include common regular expressions, configurable via an XML file).

Currently only the disk image works for saving (the menu items don't work yet).

I just thought I'd make a quick release, since the basic functionality of the editor is finished and all I really need to add is perks.

The load image file button doesn't work as of yet.

But this will load, edit and save any scraper, whether made from scratch or shipped with XBMC. Included is an enhanced version of the Excalibur scraper (which gets linked actors AND non-linked actors, and instead of guessing at image names actually goes and downloads images from the actor page).

Oh, and as a reminder: !!!!DO NOT!!!!! use XML entities in the editor. Put the pure XML in, and entities will be handled when the XML file is compiled!
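To show why raw text is safe to type, here is a small Python sketch of the general idea using the standard library's xml.etree.ElementTree. It is not how Scraper Editor itself works; the sample expression is made up for illustration, and the point is only that a serializer escapes special characters on write and restores them on read.

```python
import xml.etree.ElementTree as ET

# Raw expression as a user would type it: literal <, > and & characters.
raw_expression = r"<b>([^<]*) & ([^<]*)</b>"

# Store the raw text; the serializer is responsible for escaping it.
element = ET.Element("expression")
element.text = raw_expression
serialized = ET.tostring(element, encoding="unicode")
print(serialized)

# Parsing the serialized form gives back exactly what was typed.
round_tripped = ET.fromstring(serialized).text
print(round_tripped == raw_expression)
```

If the entities were typed by hand as well, they would be escaped a second time on save, which is presumably the problem the reminder above is warning about.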
PS! Nicezia is now also in the process of porting the ScraperXML to C# so keep an eye on:
SourceForge.net: ScraperXML
and
ScraperXML (Open Source XML Web Scraper VB.NET Library) please help verify my work... - XBMC Community Forum

ScraperXML is an open source web scraper library compatible with all of XBMC's XML scrapers :D
 

fforde

Community Plugin Dev
June 7, 2007
2,666
1,699
40
Texas
Home Country
United States of America United States of America
I appreciate your enthusiasm Gamester17, but I am not sure anyone here is going to jump on board this project...
 

lboregard

New Member
August 15, 2010
2
0
fforde, the cornerstone.scraperengine seems to be very cool.

The issue is that the code seems highly coupled to MediaPortal and some third-party libraries (NLog, for instance).

Is it OK if an effort gets underway to decouple the scraper engine from MediaPortal...

I really dig the dictionary approach to input/output.

Please let me know your comments.
 

fforde

The Cornerstone scraping engine has zero references to MediaPortal. It does use the NLog library for logging, but that would be easy enough to change or remove. It is licensed under GPL3, so assuming what you want to use it for is under a compatible license, you should be good. Jump on IRC or send me a PM if you'd like to chat a bit more about it.
 
