Sports webpage scraper (extension)

zag2me

Portal Pro
April 11, 2006
216
68
Home Country
England England
Evening all,

No plugin yet, but I have written a small console app that reports Premiership football scores or goals from the BBC vidiprinter website here:

http://newsimg.bbc.co.uk/sport1/hi/football/live_videprinter/default.stm

bbcvidiprinter.jpg


The idea is to write a plugin around this console app that works with any sports website that reports scores. I got tired of plugins breaking when providers change their sites, so I'm hoping to make this app configurable by the user. Keeping the grabber separate also means a site change won't break the plugin itself.

At the moment, the idea is to create an XML file with the results, which can easily be read by a MediaPortal plugin.
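
As a rough illustration of the results file idea (not the app's actual output format, which isn't shown in this thread), here is a minimal Python sketch that wraps each scraped line in an XML element. The function name `results_to_xml` and the `<results>`/`<result>` element names are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET


def results_to_xml(lines):
    """Wrap each scraped result line in a <result> element.

    The <results>/<result> schema is hypothetical; a real plugin
    would agree on its own element names.
    """
    root = ET.Element("results")
    for line in lines:
        ET.SubElement(root, "result").text = line
    # encoding="unicode" returns a str rather than bytes
    return ET.tostring(root, encoding="unicode")


xml_text = results_to_xml(["Miami 17-28 Pittsburgh", "Atlanta 20-6 Carolina"])
print(xml_text)
```

A plugin could then load the file with any XML reader and display one entry per `<result>` element.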

App 0.3
Source 0.3


Has anyone else got any sports requests or web pages that are scrapable? I need to look at them before designing the configuration file.
 

fdask

Portal Member
September 27, 2006
10
0
50
Home Country
Canada Canada
At the moment, the idea is to create an XML file with the results, which can easily be read by a MediaPortal plugin.

As brought up in the thread hobbes linked to, why not turn the data into an RSS feed, which the existing News reader can already handle?

I like the idea of this though. Any chance of you sharing your code so I can try my hand at scraping some sites?
 

zag2me

Portal Pro
April 11, 2006
216
68
Home Country
England England
Just a small update: my little app now has a config file where you can give it search strings to look for and also change the webpage that is scraped.

bbcvidiprinter2.jpg


If anyone has any other sports results HTML pages, post them and I will test the app with them. Theoretically it should be able to scrape any page that has results on separate lines. It also removes any HTML markup automatically.
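
For anyone wanting to experiment, the line-by-line approach described above could be sketched in Python roughly as follows. This is not the app's source: the function name `scrape_lines`, the tag-stripping regex, and the sample page are all invented for illustration, and the search strings are passed as a plain list rather than read from a config file.

```python
import html
import re

# Crude tag stripper; good enough for a sketch, not a full HTML parser.
TAG_RE = re.compile(r"<[^>]+>")


def scrape_lines(page_source, search_strings):
    """Return the text of every source line that contains a search string."""
    results = []
    for raw_line in page_source.splitlines():
        # Remove tags, decode entities such as &amp;, trim whitespace.
        text = html.unescape(TAG_RE.sub("", raw_line)).strip()
        if text and any(s in text for s in search_strings):
            results.append(text)
    return results


# Invented sample page (not real BBC markup or real results).
sample = """<html><body>
<p><b>GOAL</b> Arsenal 1-0 Chelsea</p>
<p>Half-time whistle at Anfield</p>
<p><b>FT</b> Everton 2-2 Fulham</p>
</body></html>"""

matches = scrape_lines(sample, ["GOAL", "FT"])
print(matches)
```

Because the match is done per source line, this only works when each result sits on a single line of the HTML source, which is exactly the limitation mentioned above.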

EDIT: I tried it on http://www.sportinglife.com/results_update/home.html and it worked great :) You could get rugby, motorcycling, athletics, snooker, tennis, speedway, golf and, strangely enough, bowls from there.
 

zag2me

Portal Pro
April 11, 2006
216
68
Home Country
England England
No, unfortunately it won't work on that website, because it has the score on a different line from the team names in the HTML source. As soon as this happens, you need to write extra code to parse the information from the specific website, which is not really what this app is aimed at.

Code:
      <b><a href="/nfl/teams/gnb">Green Bay</a></b>
      1-2 (Road: 1-0)
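
To make the problem concrete, here is a small Python sketch using the two lines above. Stripping tags line by line yields the team name and the record as two unrelated entries; the hypothetical helper `pair_name_with_next_line` shows the kind of site-specific extra code that would be needed, and it assumes names and records strictly alternate, which may not hold for the real page.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(line):
    """Remove HTML tags and surrounding whitespace from one source line."""
    return TAG_RE.sub("", line).strip()


def pair_name_with_next_line(source_lines):
    """Site-specific workaround: join each name line with the line after it.

    Assumes non-empty lines strictly alternate name, record, name, record.
    """
    cleaned = [strip_tags(line) for line in source_lines]
    cleaned = [text for text in cleaned if text]
    return [f"{name} {detail}" for name, detail in zip(cleaned[0::2], cleaned[1::2])]


snippet = [
    '      <b><a href="/nfl/teams/gnb">Green Bay</a></b>',
    "      1-2 (Road: 1-0)",
]

per_line = [strip_tags(line) for line in snippet]  # what a per-line scraper sees
paired = pair_name_with_next_line(snippet)         # what site-specific code could build
print(per_line)
print(paired)
```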

I've updated the first post with the executable, so feel free to try it on some pages.
 

hobbes487

Portal Pro
August 25, 2005
208
3
Home Country
United States of America United States of America

zag2me

Portal Pro
April 11, 2006
216
68
Home Country
England England
I will upload a new version on the first page with RSS support.

This is how the RSS feed looks in Firefox:

sports_scores_rss.jpg


I'm assuming the feed will work fine with MediaPortal.
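
For reference, a minimal RSS 2.0 feed of scraped results could be generated along these lines in Python. This is only a sketch under assumptions: the function name `build_rss`, the feed title, and the description text are made up, and the app's real feed layout may differ.

```python
import xml.etree.ElementTree as ET


def build_rss(feed_title, source_url, results):
    """Build a minimal RSS 2.0 document with one <item> per result line."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link and description on the channel.
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = source_url
    ET.SubElement(channel, "description").text = "Scraped sports results"
    for result in results:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = result
        ET.SubElement(item, "link").text = source_url
    return ET.tostring(rss, encoding="unicode")


feed = build_rss(
    "Sports scores",  # hypothetical feed title
    "http://www.sportinglife.com/results_update/home.html",
    ["Miami 17-28 Pittsburgh", "San Diego 27-0 Oakland"],
)
print(feed)
```

Any reader that accepts plain RSS 2.0 should show each result as a headline; whether a particular news plugin needs extra fields such as `pubDate` would have to be checked.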
 

trosty

MP Donator
  • Premium Supporter
  • October 6, 2004
    160
    0
    Zurich/Switzerland
    Home Country
    Switzerland Switzerland

    zag2me

    Portal Pro
    April 11, 2006
    216
    68
    Home Country
    England England
    The app will need some more work to parse those scores, because that site uses tables to separate the scores rather than putting each one on a separate line like the BBC website. I will see what I can do, though. I'm sure I could get it to show all the results.

    Code:
    <tr   align=left  class=bg4><td><b>GAME</b></td><td><b></b></td></tr><tr  height=17  class=bg2 align=right valign=middle><td  align=left>Miami 17-28 Pittsburgh</td><td  align=left></td></tr><tr align=middle class=bg1><td class=bg1 colspan=9><font class=bg1font><b>Sunday, Sep. 10</b></font></td></tr>
    <tr   align=left  class=bg4><td><b>GAME</b></td><td><b></b></td></tr><tr  height=17  class=bg2 align=right valign=middle><td  align=left>Atlanta 20-6 Carolina</td><td  align=left></td></tr><tr  height=17  class=bg2 align=right valign=middle><td  align=left>Baltimore 27-0 Tampa Bay</td><td  align=left></td></tr><tr  height=17  class=bg2 align=right valign=middle><td  align=left>Buffalo 17-19 New England</td><td  align=left></td></tr><tr  height=17  class=bg2 align=right valign=middle><td  align=left>Cincinnati 23-10 Kansas City</td><td  align=left></td></tr><tr  height=17  class=bg2 align=right valign=middle><td  align=left>Denver 10-18 St. Louis</td><td  align=left></td></tr><tr  height=17  class=bg2 align=right valign=middle><td  align=left>New Orleans 19-14 Cleveland</td><td  align=left></td></tr><tr  height=17  class=bg2 align=right valign=middle><td  align=left>N.Y. Jets 23-16 Tennessee</td><td  align=left></td></tr><tr  height=17  class=bg2 align=right valign=middle><td  align=left>Philadelphia 24-10 Houston</td><td  align=left></td></tr><tr  height=17  class=bg2 align=right valign=middle><td  align=left>Seattle 9-6 Detroit</td><td  align=left></td></tr><tr  height=17  class=bg2 align=right valign=middle><td  align=left>Chicago 26-0 Green Bay</td><td  align=left></td></tr><tr  height=17  class=bg2 align=right valign=middle><td  align=left>Dallas 17-24 Jacksonville</td><td  align=left></td></tr><tr  height=17  class=bg2 align=right valign=middle><td  align=left>San Francisco 27-34 Arizona</td><td  align=left></td></tr><tr  height=17  class=bg2 align=right valign=middle><td  align=left>Indianapolis 26-21 N.Y. Giants</td><td  align=left></td></tr><tr align=middle class=bg1><td class=bg1 colspan=9><font class=bg1font><b>Monday, Sep. 11</b></font></td></tr>
    <tr   align=left  class=bg4><td><b>GAME</b></td><td><b></b></td></tr><tr  height=17  class=bg2 align=right valign=middle><td  align=left>Minnesota 19-16 Washington</td><td  align=left></td></tr><tr  height=17  class=bg2 align=right valign=middle><td  align=left>San Diego 27-0 Oakland</td><td  align=left></td></tr></table></div>


    It's just the way they set the website up and the content management system they use to update it. There must be some score listings on the web somewhere that use simpler HTML.
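
One possible way to handle table-based pages like this, sketched in Python with the standard `html.parser` module, is to collect the text of each `<td>` cell instead of working per source line, then keep the cells that look like "Team 17-28 Team". The class `CellCollector`, the function `scores_from_table`, and the score regex are assumptions for illustration, not part of the app.

```python
import re
from html.parser import HTMLParser

# Matches "<team> <n>-<n> <team>", e.g. "N.Y. Jets 23-16 Tennessee".
SCORE_RE = re.compile(r"^(.+?) (\d+)-(\d+) (.+)$")


class CellCollector(HTMLParser):
    """Collect the text content of every <td> cell."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.cells[-1] += data


def scores_from_table(html_source):
    """Return (team, points, points, team) tuples for cells that hold a score."""
    parser = CellCollector()
    parser.feed(html_source)
    parser.close()
    scores = []
    for cell in parser.cells:
        match = SCORE_RE.match(cell.strip())
        if match:
            first, first_pts, second_pts, second = match.groups()
            scores.append((first, int(first_pts), int(second_pts), second))
    return scores


# Rows copied from the HTML quoted above.
sample = (
    "<tr  height=17  class=bg2 align=right valign=middle>"
    "<td  align=left>Miami 17-28 Pittsburgh</td><td  align=left></td></tr>"
    "<tr align=middle class=bg1><td class=bg1 colspan=9>"
    "<font class=bg1font><b>Sunday, Sep. 10</b></font></td></tr>"
    "<tr  height=17  class=bg2 align=right valign=middle>"
    "<td  align=left>N.Y. Jets 23-16 Tennessee</td><td  align=left></td></tr>"
)

scores = scores_from_table(sample)
print(scores)
```

Date header cells such as "Sunday, Sep. 10" are skipped simply because they don't match the score pattern. A different site would likely need a different pattern, so this stays site-specific.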
     
