IMDb+ Scraper (Force English title, Auto-Rename titles to group, and more) v3.1.7 | Page 2

Discussion in 'Moving Pictures' started by RoChess, February 23, 2011.

Should this be the default imdb scraper?

Poll closed March 25, 2011.
  1. Yes, I do not want to re-import

    19 vote(s)
    95.0%
  2. No, keep this one separate

    0 vote(s)
    0.0%
  3. Who cares, I got movies to watch

    1 vote(s)
    5.0%
  1. damaster

    damaster Portal Pro

    Joined:
    November 23, 2007
    Messages:
    412
    Likes Received:
    35
    Ratings:
    +35 / 0
    Home Country:
    Canada Canada
    Re: IMDb+ Scraper (short/long summary, imdb/RT score, US/UK rating, and more)

    Would love to try this scraper but the note above really throws me off. I have to re-scan all of my existing imported movies? That means I'll lose any custom changes, which would really suck.



    Is there a better way to integrate this scraper into existing movies? A DB hack perhaps? :)
     
  3. pirivan

    pirivan Portal Pro

    Joined:
    January 19, 2008
    Messages:
    62
    Likes Received:
    2
    Ratings:
    +2 / 0
    Re: IMDb+ Scraper (short/long summary, imdb/RT score, US/UK rating, and more)

    Actually, if I understand correctly, you don't have to re-scan all of your existing imported movies IF you modify the scraper ID so that his scraper becomes the new imdb.com scraper for existing movies... Or at least that is how I understand it. However, I didn't like the sound of doing that, so I did a full re-import. The pain in the ass was, again, fixing all the titles that didn't scan/match properly (and I am sure there are a number I missed that I will notice are matched wrong as I use MediaPortal) and then re-setting all the 'watched' flags. I can see that this would be much more of a challenge for someone who had a lot of customizations beyond simply watched or not!
     
  4. DMember 49125

    DMember 49125 Guest

    Ratings:
    +0 / 0
    Re: IMDb+ Scraper (short/long summary, imdb/RT score, US/UK rating, and more)

    I got french & russian (!) titles too.
     
  5. RoChess
    • Premium Supporter

    RoChess Extension Developer

    Joined:
    March 10, 2006
    Messages:
    4,182
    Likes Received:
    1,304
    Ratings:
    +1,683 / 2
    Re: IMDb+ Scraper (short/long summary, imdb/RT score, US/UK rating, and more)

    There are a few dedicated Danish scrapers based on other website sources that are built and supported by Danish users who know the language. The reason is that imdb.com is very inconsistent in which ratings it supports. The English-speaking markets are well covered, but it gets real iffy on the non-English ones. It would probably be easier for the Danish scraper creator to build an English version of their scraper with Danish ratings than for me to do it the other way around.

    If, however, the Danish rating is always a direct conversion of the US or UK rating, then I can add a conversion. For example, if a US rating of PG always means 'A' in Denmark, and you would be fine with that conversion, then it will indeed be easy for me to add it. But prudish American ratings don't always match the rest of the world, so let me know.

    For both movies, can you please go to the imdb webpage and copy and paste the HTML source that you get onto the paste2.org website? For example, I get Paste2: Next Generation Pastebin - Viewing Paste 1268833 when I do that from the USA, but you are getting a different result. The key in this case is that when you scroll down to line #49, you will see "US". The scraper looks for this and checks whether it is one of US, UK, CA, AU or NZ, as these are all English-speaking countries; if so, it uses the title as-is (as it will be the correct English title). If not, it will jump to the "Also known as: ..." title, which then becomes the "Laisse-moi entrer" title on an already English title page.
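
The check described above can be sketched roughly as follows. This is only an illustration in Python, not the scraper's actual XML logic; the function name, the set name and the placeholder main title are made up for the example.

```python
# Hypothetical sketch; names are illustrative and not taken from the scraper XML.
ENGLISH_COUNTRY_CODES = {"US", "UK", "CA", "AU", "NZ"}  # the list quoted in the post


def pick_title(country_code: str, main_title: str, aka_title: str) -> str:
    """Return the main title for English-speaking country codes, else the AKA title."""
    if country_code.upper() in ENGLISH_COUNTRY_CODES:
        return main_title
    return aka_title


# A recognised code keeps the main title as-is.
print(pick_title("US", "Some English Title", "Laisse-moi entrer"))
# Any code missing from the list falls through to the "Also known as" title.
print(pick_title("FR", "Some English Title", "Laisse-moi entrer"))
```

Under this rule, any country code that is missing from the list sends the scraper to the AKA title, even when the main title on the page is already English.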

    Unfortunately I am limited to what imdb.com gives me back, so you will have to help me a little to fix this problem :)

    I'll try to come up with a better explanation then. In short: open the scraper XML file in Notepad, edit the <id>...</id> part to match the official imdb.com scraper, import the scraper, and you will not have to re-import any existing movie. As far as MovingPictures knows, this scraper will then become the new imdb.com scraper. However, I had to keep this one separate, because it does a lot more than the default scraper and not everybody might want that (hence the poll). For example, the default RottenTomatoes score is a big one.
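
For anyone who would rather script the edit than do it by hand in Notepad, here is a minimal Python sketch of the same idea. The XML fragment and both ID values are made up for illustration; the real scraper file layout and the official imdb.com scraper ID are not shown in this thread, so check them in your own files first.

```python
import re


def replace_scraper_id(xml_text: str, new_id: str) -> str:
    """Swap the contents of the first <id>...</id> element, leaving the rest untouched."""
    return re.sub(
        r"<id>.*?</id>",
        lambda _match: f"<id>{new_id}</id>",  # a function avoids backslash surprises
        xml_text,
        count=1,
        flags=re.DOTALL,
    )


# Made-up fragment and IDs, for illustration only.
sample = "<details><name>IMDb+</name><id>111111</id></details>"
print(replace_scraper_id(sample, "222222"))
```

Only the first `<id>` element is touched, on the assumption that the scraper's own ID is the first one in the file. Keep a backup of the original XML before importing the edited copy.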

    So for a proper database you would have to refresh all your movies, so that they can all get this RT score (unless of course you change all the global_options). So on purpose I did not release a version with the same ID as the official imdb.com scraper, because I want the users of this one to look at all the options and be comfortable to edit the scraper settings in notepad.

    However what I will do in the next version is make it easier to switch to the official imdb.com ID by adding it in the header under the comments. That way you can copy and paste from within the scraper source to adjust.

    It would really help me (and others) if you could provide me with a list of filenames that didn't scan/match. Perhaps it will be possible for me to improve the scraper, so that this will go better next time. And I run this scraper myself with the changed ID, because I did not want to lose my watched status and all the adjustments to titles I had already made. As I explained to damaster, the next version will make it a little easier to switch. The only problem then is that you have to make those changes every time a new version comes out (unless you prefer global_options other than the default settings). But I will adjust the explanation in the first post as well when I release the next version to fix the problems reported by the other users. -- let me know if my new explanation is less confusing :D

    Hi Gix, can you please read my reply to 'mat123' above? The same goes for you: I am sure imdb.com has more country codes for English-speaking countries that I need to add, but I will need your help to find them. So please paste2.org me the HTML source of a movie that failed, so I can fix it.
     
  6. DMember 49125

    DMember 49125 Guest

    Ratings:
    +0 / 0
    Re: IMDb+ Scraper (short/long summary, imdb/RT score, US/UK rating, and more)

    Paste2: Next Generation Pastebin - Viewing Paste 1269399

    Thank you for fixing that. It is very annoying. :D
     
  7. pirivan

    pirivan Portal Pro

    Joined:
    January 19, 2008
    Messages:
    62
    Likes Received:
    2
    Ratings:
    +2 / 0
    Re: IMDb+ Scraper (short/long summary, imdb/RT score, US/UK rating, and more)

    RoChess

    I apologize if my previous post came off the wrong way; I didn't mean it was really a problem with the scraper. Every scraper I have ever used has some issues matching titles. I have always felt that this is just the way it is, given that how I have named files may or may not line up with how they are named in the database they are being matched against (whether that be IMDB, RT, thetvdb, etc.). It is not necessarily some inherent problem with what you set up. Anyhow, I wish I had written down some of the files whose names I had to 'tweak' to get them to match, but I did not. I COULD re-send it all to the scanner to find out, but I would rather not :). Anyhow, I do recall that it was a bit frustrating that I have titled all of my anime films in English, and when they are matched against the DB the "match" is their Asian-language title. So I look at it and, not knowing any Japanese etc., I either have to A) look up what the title is in Japanese online and determine that it is indeed a match, or B) just guess that it probably matched correctly (which is what I did). I don't know if this is a huge issue, but I thought I would mention it.

    So far the only real oddity I have noticed is that some movies just don't appear to be getting the correct rating for some reason. Two specific examples I have found so far are:

    Big Fan Movie Reviews, Pictures - Rotten Tomatoes
    Private Parts Movie Reviews, Pictures - Rotten Tomatoes

    In the scraper, Big Fan gets returned as having a rating of 6.8 when it should be 8.8, and Private Parts gets returned as 6.0 when it should be 7.9. I have sent both movies back to the importer and also tried refreshing from IMDB+, but I get the same results.

    There are probably other examples of this in my collection that I have yet to notice, and it is a bit laborious to search for every movie on Rottentomatoes and compare the tomatometer rating with what MP shows :).

    Anyhow, I hope the information is helpful; thanks again for the great scraper work!
     
  8. zicoz
    • Premium Supporter

    zicoz MP Donator

    Joined:
    September 3, 2006
    Messages:
    896
    Likes Received:
    53
    Ratings:
    +60 / 0
    Home Country:
    Norway Norway
    Re: IMDb+ Scraper (short/long summary, imdb/RT score, US/UK rating, and more)

    Did they move around their servers or something? For some reason True Grit is imported as Cent dollars pour un shérif.
     
  9. Matt Kirby

    Matt Kirby Portal Member

    Joined:
    June 14, 2009
    Messages:
    43
    Likes Received:
    8
    Ratings:
    +8 / 0
    Home Country:
    United Kingdom United Kingdom
    Re: IMDb+ Scraper (short/long summary, imdb/RT score, US/UK rating, and more)

    I'm in the UK and I've also had issues getting the wrong title for films, and I've come up with a work-around / fix.

    I followed RoChess's notes on looking at the HTML source from IMDB, and I was seeing "GB" for the country code. As RoChess said, this scraper checks for one of US|UK|CA|AU|NZ; if it finds one of these codes it uses the main title, and if not it uses the "Also known as" title.

    If anyone (in the UK) wants to manually fix this themselves, it's a fairly simple fix:
    Open the .xml file for this scraper
    Search for "US|UK|CA|AU|NZ", and replace with "US|UK|GB|CA|AU|NZ" (not sure if UK is even needed or used, my guess is that it doesn't hurt to leave it in!)
    Save
    You will then need to remove this scraper from MovingPictures and re-add it, I think (not too sure on this one!)
    Then, refresh movie details from internet for each of your incorrect films
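
The search-and-replace step above can also be scripted. The Python sketch below works on the file contents as a string and additionally shows why the extra code matters, on the assumption that the scraper treats the string as a regular-expression alternation (the thread does not confirm this). The function names and the sample fragment are made up for illustration.

```python
import re

OLD_CODES = "US|UK|CA|AU|NZ"
NEW_CODES = "US|UK|GB|CA|AU|NZ"


def add_gb(scraper_xml: str) -> str:
    """Replace the original country-code list with the one that includes GB."""
    return scraper_xml.replace(OLD_CODES, NEW_CODES)


def is_listed(code: str, alternation: str) -> bool:
    """Check a country code against a '|'-separated alternation, regex style."""
    return re.fullmatch(alternation, code) is not None


# Made-up fragment standing in for the relevant part of the scraper file.
fragment = 'pattern="US|UK|CA|AU|NZ"'
print(add_gb(fragment))
print(is_listed("GB", OLD_CODES), is_listed("GB", NEW_CODES))
```

Running `add_gb` a second time changes nothing, because the old list no longer appears in the edited text.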

    Now for some guess-work. For people who are not in US|UK|CA|AU|NZ who want to use the main IMDB title using this scraper, you might be able to get it to work by doing the following, but a lot of this is guess-work, and I have no idea if this will cause other issues, so do so at your own risk:
    Search for a film on IMDB, and then view the HTML source of the film's page. On line 49 you should (hopefully) find the country code that IMDB is using for your country. It should be a 2 letter code, surrounded by quote marks, and should have "title" in the line above. I am making several assumptions here- the country code might not always be on line 49, and I have no idea how IMDB's region detection works for their website!
    Assuming that you've found your country-code (and you want to use the main IMDB title rather than the "Also known as" title), edit the .xml file (as above) to enter your country code to the filter list.

    Once again, this is based on guess-work and several assumptions, so YMMV.

    RoChess: thanks for all your work on this, this scraper is great! I've always wanted my films collection to show the UK cinema rating, which this scraper does for me excellently. Could you add "GB" to the official version of this scraper, as "GB" is the correct country code for the UK and seems to be what IMDB is using. I don't think that UK would even be needed in your scraper. Many thanks.
     
  10. RoChess
    • Premium Supporter

    RoChess Extension Developer

    Joined:
    March 10, 2006
    Messages:
    4,182
    Likes Received:
    1,304
    Ratings:
    +1,683 / 2
    Re: IMDb+ Scraper (short/long summary, imdb/RT score, US/UK rating, and more)

    No problem, it didn't come off wrong; I just thought I needed to elaborate better. As for the Asian titles matching with their original name, I might be able to correct that. See, the English title only gets corrected inside the DETAILS node, but the matching occurs in the SEARCH node, and the latter relies on akas.imdb.com, which shows more original titles (the Asian title in this case). It is indeed still the correct movie based on the English title in the AKA results, but it is tough to find out if it is the right one. The problem will be complicated, as some users will have the Asian title in their filename, and then it would fail if the English one is selected.

    But I have an extensive Asian collection myself, so at least I can test this myself. Unfortunately only to some degree, with the geographic-location-based title translation being a major pain in the :D

    Ok, it looks like my assumptions on how IMDb handles the title in other countries were wrong; my only reference has been based on US (America), NL (The Netherlands) and AU (Australia). I figured I would add in New Zealand (NZ), United Kingdom (UK) and Canada (CA) for good measure, but thanks to Matt it is already clear that IMDB uses GB for Great Britain instead.

    The problem now is that a country like Greece, with code GR, is not using a translated title, which is totally throwing off the way my script works. So to avoid problems I need to switch from a blacklist to a whitelist method, meaning that I need to find out all the 2-letter country codes where IMDB uses localized titles. This is most likely a long list, and I already know it will include DE, ES, IT, etc., but it will take time to get help from other MovingPictures users to build this list.

    So for the time being I will keep using the blacklist method, replace UK with GB, and add the GR (Greece) and NO (Norway) codes as well.
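
To make the difference between the two list approaches concrete, here is a small Python sketch. It is only an illustration: the function names are made up, the first set follows the interim list described above, and the second set holds just the three codes named in the post, so it is far from complete.

```python
# Interim approach: keep the main title only for codes on this list.
KEEP_MAIN_TITLE_CODES = {"US", "GB", "CA", "AU", "NZ", "GR", "NO"}

# Planned approach: distrust the main title only for codes on this list.
# Partial on purpose; these are the only codes named in the post.
LOCALIZED_TITLE_CODES = {"DE", "ES", "IT"}


def keeps_main_title_interim(country_code: str) -> bool:
    """Unknown codes are treated as having a translated main title."""
    return country_code in KEEP_MAIN_TITLE_CODES


def keeps_main_title_planned(country_code: str) -> bool:
    """Unknown codes are treated as having an untranslated main title."""
    return country_code not in LOCALIZED_TITLE_CODES


# "XX" is a placeholder for a code that appears on neither list.
for code in ("GB", "DE", "XX"):
    print(code, keeps_main_title_interim(code), keeps_main_title_planned(code))
```

The two approaches only disagree on codes that appear on neither list, which is why the choice comes down to what the scraper should assume about countries nobody has reported yet.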
     
  11. RoChess
    • Premium Supporter

    RoChess Extension Developer

    Joined:
    March 10, 2006
    Messages:
    4,182
    Likes Received:
    1,304
    Ratings:
    +1,683 / 2
    Re: IMDb+ Scraper (short/long summary, imdb/RT score, US/UK rating, and more)

    Ok this is frustrating.

    IMDb does translate the title into Greek (lol, I hope it is Greek).

    What my script sees on your HTML code is:

    Main Title = Oi epomenes treis meres
    Original Title = The Next Three Days
    Also known as = Les trois prochains jours

    My script doesn't know what language is actually used, so it has to make big assumptions. In this case, when the country code is *NOT* that of an English-speaking country, it will assume that the main title is translated into the local language, so it takes the original title.

    Now when I test my script with Expresso on your HTML code, the result I get is "The Next Three Days", which is the correct English title. And you are saying you get the AKA French one?
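
The selection rule described in this post can be sketched in Python like this. It is an illustration only, not the scraper's regex-based XML; the function name is made up, and the country code "GR" for this page is an assumption, since the post does not quote the code it saw.

```python
# Hypothetical sketch of the rule; the real scraper does this with regex in XML.
ENGLISH_COUNTRY_CODES = {"US", "GB", "CA", "AU", "NZ"}


def choose_english_title(country_code: str, main_title: str, original_title: str) -> str:
    """Trust the main title for English-speaking codes, else fall back to the original title."""
    if country_code in ENGLISH_COUNTRY_CODES:
        return main_title
    # Assumption baked into the rule: the main title was translated locally.
    return original_title


# Titles are the ones quoted in the post; "GR" is assumed.
print(choose_english_title("GR", "Oi epomenes treis meres", "The Next Three Days"))
# prints: The Next Three Days
```

The "Also known as" title plays no part in this rule, so a French AKA result would have to come from a different branch than the one sketched here.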
     