IMDb Scraper with RottenTomatoes rating (check end of thread for final versions)

RoChess

Extension Developer
  • Premium Supporter
  • March 10, 2006
    4,345
    1,824
    • Thread starter
    • Moderator
    • #11
    Re: IMDb Scraper with RottenTomatoes rating

    Yes, but there is a bug with filters once they get complex.

    The 'all' works like AND and causes filters to cancel each other out if you add more.
    The 'none' works like AND and causes filters to cancel each other out if you add more.
    The 'any' should be the one to use, but it crashes on me when I run combinations.

    That reminds me, I need to test that and provide fforde with debug logs.
     

    Surferosa

    Portal Pro
    September 2, 2009
    55
    5
    England
    Re: IMDb Scraper with RottenTomatoes rating

    So let's take '300' as an example.
    Okay, my bad. I think 300 was actually a successful lookup: it has a rating of 6 in MP, so I presumed it had failed. The RT score is 60% and the IMDb score is 7.8, so clearly it worked.

    However, from what I can see, I'm having four distinct issues:

    1. Intermittent RT Scrape Failure
    I can only put this one down to some sort of timeout failure, but roughly 1 film in every 5 or 6 doesn't seem to have updated from my batch scrape. As soon as I manually refresh the info from the internet for that film individually, it works. It's difficult to say how widespread this is, as I have to check each movie individually to make sure it has updated correctly (i.e. I have no way of knowing whether the scrape has worked or whether I need to retry manually). A couple of thoughts on this issue:

    - Would it be possible to overwrite the existing Score field with a Null value before performing the RT Score lookup? That way I could simply look down the list of movies and count the Nulls. Failing that, could it overwrite a different field (tagline?) with 'RT Score found = x time', so that I can see that it worked, and when (i.e. on which run)?

    - If it is a timeout issue (and that's only a theory, mind), would it be possible to retry fetching the page (i.e. if no page is returned, retry up to 3 times)?

    2. 100% scores
    I think the scraper is failing when RT scores 100%. Two examples I have found are Aliens and Man on Wire. If this is the case, an easy fix, I would imagine?

    3. N/A Scores
    Some films aren't scored yet ($5 a Day) and get an RT score of N/A. I don't know what we should do with these. At present, my database has simply retained the score of 7 from IMDb that the old scraper had returned, but for someone doing a new movie import this would simply fail. I'm guessing N/A isn't a value that the MP db can hold, so I have no idea what to do here...

    4. Bad RT Links?
    I've found a film, Away We Go, that doesn't link correctly when using the RT link with the IMDb number. Using the IMDb number for this film (1176740) returns you to Farlanders. Again, I'm not sure you can fix this one, so I may contact RT and see if it's an error. Also, it is difficult to know how many of these there are without knowing which ones have and have not updated successfully.

    I guess fixing or working around these things is going to be a prerequisite before going too deep into the reviews bit. However, I really like where you're heading with this. The scrape of the critics' summary is brilliant. I had imagined a separate reviews page (an RT info page?) that could be accessed from the movie detail page (rather than replacing any of the existing information), but this would need a skin mod. I use StreamedMP, so I'll post something up there and see if there is any interest. From memory, the skin's author supports skin mods and sometimes incorporates them into future releases, so if we could get someone to write it then maybe......
     

    RoChess

    Extension Developer
  • Premium Supporter
  • March 10, 2006
    4,345
    1,824
    • Thread starter
    • Moderator
    • #13
    Re: IMDb Scraper with RottenTomatoes rating

    - If it is a timeout issue (and that's only a theory, mind), would it be possible to retry fetching the page (i.e. if no page is returned, retry up to 3 times)?
    I've noticed that the rottentomatoes.com website is at times not as fast as, say, imdb.com, and the scraper aborts a request once the default timeout expires. I believe the default is 5 seconds, so if the Retrieve node takes longer, the request gets killed. We could try raising this value if the RT website indeed has trouble keeping up.
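
    A minimal sketch of the retry idea, combining a longer timeout with up to three attempts. It is written in Python purely for illustration; the scraper itself is not written in Python, and the names fetch_page and fetch_with_retry, the 15-second timeout and the 1-second pause are all assumptions, not settings taken from the scraper.

```python
import time
import urllib.request


def fetch_page(url, timeout):
    """Download a page, giving up after `timeout` seconds."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_with_retry(url, retries=3, timeout=15.0, delay=1.0, fetch=fetch_page):
    """Try up to `retries` times; return the page text, or None if every attempt fails."""
    for attempt in range(1, retries + 1):
        try:
            page = fetch(url, timeout)
            if page:  # an empty response counts as a failed attempt
                return page
        except OSError:
            # Timeouts and URL errors are both OSError subclasses:
            # treat them as a slow or unreachable server and retry.
            pass
        if attempt < retries:
            time.sleep(delay)  # short pause before the next attempt
    return None
```

    A None result is what would tell a batch run that the film still needs a manual refresh.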

    I think the scraper is failing when RT scores 100%. Two examples I have found are Aliens and Man on Wire. If this is the case, an easy fix, I would imagine?
    Since you provided examples, I was able to see that the page uses different code in that case, so I will have to adapt the regular expressions to match.

    Some films aren't scored yet ($5 a Day) and get an RT score of N/A.
    Now that you have given me an example, I can scan for this as well (same problem as with 100%) and either keep the field blank or use a score of 0. What do you think?
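
    As a rough sketch of the handling discussed here: a percentage (including 100%) becomes a score, and 'N/A' leaves the field blank. This is Python for illustration only; rt_to_score is a hypothetical helper, the pattern is not the scraper's actual regular expression, and dividing by 10 is an assumption based on a 60% RT score showing up as 6 in MP.

```python
import re

# Accepts "60%", "100%" or a bare number; anything else (e.g. "N/A") will not match.
PERCENT_RE = re.compile(r"^\s*(\d{1,3})\s*%?\s*$")


def rt_to_score(raw):
    """Convert a scraped TomatoMeter string to a 0-10 score, or None to leave the field blank."""
    match = PERCENT_RE.match(raw or "")
    if not match:
        return None  # "N/A", empty, or unrecognized text
    percent = int(match.group(1))
    if percent > 100:
        return None  # not a valid percentage
    return percent / 10
```

    With this, rt_to_score("60%") gives 6.0, rt_to_score("100%") gives 10.0, and rt_to_score("N/A") gives None. Whether the score field accepts a value of 10 is not confirmed here.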

    I've found a film, Away We Go, that doesn't link correctly when using the RT link with the IMDb number.
    This is totally out of my control, and you will indeed need to contact RT to get the problem fixed.

    The scrape of the critics' summary is brilliant.
    So that works, as in editing the existing summary? I'm pretty sure ltfearme, who is involved with the StreamedMP skin, is too busy releasing the new version, so he probably doesn't want to bother with a custom modification like this right now. I guess you can ask; perhaps it is easy to do.
     

    Surferosa

    Portal Pro
    September 2, 2009
    55
    5
    England
    Re: IMDb Scraper with RottenTomatoes rating

    I've noticed that the rottentomatoes.com website is at times not as fast as, say, imdb.com, and the scraper aborts a request once the default timeout expires. I believe the default is 5 seconds, so if the Retrieve node takes longer, the request gets killed. We could try raising this value if the RT website indeed has trouble keeping up.
    Well, to be honest, I don't mind manually refreshing the films that haven't worked. I just need an easier way of knowing which ones have updated from RT and which ones haven't. That way, we would also have an idea of how widespread the issue is. Could you code in some sort of flag (as per my post)?

    How would I go about changing the default timeout value? I don't mind giving this a go to see if it makes a difference.

    Since you provided examples, I was able to see that the page uses different code in that case, so I will have to adapt the regular expressions to match.

    Now that you have given me an example, I can scan for this as well (same problem as with 100%) and either keep the field blank or use a score of 0. What do you think?
    I'm not sure I understand here. If the score is N/A, I would think keeping it blank is best. However, if the score is 100, are you saying that it cannot go into the score field? If so, I'd use a value of 9.9 (assuming that is acceptable).

    So that works, as in editing the existing summary? I'm pretty sure ltfearme, who is involved with the StreamedMP skin, is too busy releasing the new version, so he probably doesn't want to bother with a custom modification like this right now. I guess you can ask; perhaps it is easy to do.
    You're probably right. I was hoping that someone would be interested in doing an RT mod for StreamedMP that would call up a separate RT page from the movie detail page. If I had any technical expertise in this area I'd love to have a go. Unfortunately I have zilch!

    The summary (should) provide a good synopsis of what the film is about (though personally I don't like how sometimes you get 10 words on the film and sometimes an entire essay!), so maybe putting it there doesn't work. I don't think the tagline is used at all, so maybe that is the place for it; but as you say, we'd then need a skin mod to display it somewhere.

    What do you think?


    Edit: I have posted a bug report in the RT forum about Away We Go.
     

    mortstar

    MP Donator
  • Premium Supporter
  • January 30, 2008
    415
    41
    England
    Re: IMDb Scraper with RottenTomatoes rating

    This script works really well, thanks so much!

    Much better than having every film scored between 7.2 and 7.7!

    :D
     

    RoChess

    Extension Developer
  • Premium Supporter
  • March 10, 2006
    4,345
    1,824
    • Thread starter
    • Moderator
    • #16
    Re: IMDb Scraper with RottenTomatoes rating

    OK, I've fixed the problem with the 100% score on movies such as Aliens. Code-wise, it was easier for me to ignore the 'N/A' rating exception, so I decided not to turn it into a '0' rating; instead the field will remain blank. However, I still need to test the scraper myself, and my wife is hogging the HTPC as usual :mad:

    So be patient a little longer and I'll be able to verify that everything works. These changes only affect the 'TomatoMeter rating' scraper; I couldn't find a problem with the average version.

    In the meantime, is there any interest in the suggested modification to add the RottenTomatoes reviews to the IMDb summary? I would maintain this as a separate scraper, because not everybody will want this. But if nobody really cares for this right now and people prefer to wait until the MovingPictures plugin gets built-in support for reviews, then so do I :D
     

    Surferosa

    Portal Pro
    September 2, 2009
    55
    5
    England
    Re: IMDb Scraper with RottenTomatoes rating

    RoChess, you're a star.

    I'd definitely be interested in tagging the critics' reviews onto the summary (I'd prefer that they came after the summary, but your call). I don't mind waiting, and I appreciate everything you've done thus far.

    One thing: is there any way I could null the existing scores before I run this (so that I can tell which ones have updated and which ones haven't)? Either by SQL or within the script itself?

    Cheers.
     

    BlackdogZA

    Portal Pro
    April 22, 2007
    76
    1
    United Kingdom
    Re: IMDb Scraper with RottenTomatoes rating

    RoChess, thank you very much. I did a complete refresh with the replacement file, and it refreshed my collection of 400+ movies without issues. Running on latest SVN/Win7/Moving Pictures latest beta.

    I watched the latest Transformers movie last night, and the IMDb score of 6.1 vs. the Rotten Tomatoes score of 20% (agreed) illustrates why this is so essential to anyone who wants a meaningful rating of their film collection.
     

    kiwijunglist

    Super Moderator
  • Team MediaPortal
  • June 10, 2008
    6,746
    1,749
    New Zealand
    Re: IMDb Scraper with RottenTomatoes rating

    yea that movie was terrible, even worse than the first one!
     

    RoChess

    Extension Developer
  • Premium Supporter
  • March 10, 2006
    4,345
    1,824
    • Thread starter
    • Moderator
    • #20
    Re: IMDb Scraper with RottenTomatoes rating

    I'll try to release the updated RottenTomatoes scrapers soon. Aside from fixing the TomatoMeter 100% problem and a few other bugs, they will also include the improvements I made to the main IMDb scraper.

    @Surferosa, yes, you can erase all the existing scores via SQL.

    Code:
    UPDATE movie_info SET score = NULL;
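
    To find the movies that still need a manual refresh after a batch scrape, the reset can be paired with a query for scores that are still NULL. A minimal Python sketch, assuming the database is SQLite and that movie_info has a title column (the thread only confirms the table name and the score column). The demo uses an in-memory table rather than the real database, so back up the real file before trying anything like this on it.

```python
import sqlite3


def reset_scores(conn):
    """Blank every score so that a later scrape has to fill it in again."""
    conn.execute("UPDATE movie_info SET score = NULL")
    conn.commit()


def movies_without_score(conn):
    """Return the titles whose score is still NULL (the `title` column is an assumption)."""
    rows = conn.execute(
        "SELECT title FROM movie_info WHERE score IS NULL ORDER BY title"
    )
    return [title for (title,) in rows]


# Demo on a throwaway in-memory table standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie_info (title TEXT, score REAL)")
conn.executemany(
    "INSERT INTO movie_info VALUES (?, ?)",
    [("300", 6.0), ("Aliens", 7.0)],
)
reset_scores(conn)
# Pretend the re-scrape only succeeded for one film.
conn.execute("UPDATE movie_info SET score = 6.0 WHERE title = '300'")
print(movies_without_score(conn))  # ['Aliens']
```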
     
