IMDb+ Scraper (Force English title, Auto-Rename titles to group, and more) v3.1.7

Discussion in 'Moving Pictures' started by RoChess, February 23, 2011.

Should this be the default imdb scraper?

Poll closed March 25, 2011.
  1. Yes, I do not want to re-import: 19 vote(s), 95.0%
  2. No, keep this one separate: 0 vote(s), 0.0%
  3. Who cares, I got movies to watch: 1 vote(s), 5.0%
  1. RoChess
    • Premium Supporter

    RoChess Extension Developer

    Joined:
    March 10, 2006
    Messages:
    4,172
    Likes Received:
    1,301
    Ratings:
    +1,675 / 2
    NEW Version available at:



    :D

    DO NOT CONTINUE READING IF YOU ARE LOOKING FOR CURRENT IMDb+.
    BUT USE THE LINKS ABOVE TO GET THE PROPER VERSION.

    If you want all your movie titles in English, then this is the scraper for you. It is the first scraper to support custom options, allowing you to change the way this scraper works. Since MovingPictures does not yet have support for scraper options, you will have to edit this scraper in notepad to adjust them, or enjoy the default settings shown later on.

    Configuration changes needed to function exactly the same as the default imdb.com scraper:

    Change Global Options into:

    Code (Text):
    <set name="global_options_imdb_score" value="true" />
    <set name="global_options_long_summary" value="true" />

    The rename system, which groups movie series together by title, is also very useful, so consider enabling it. If anything is not self-explanatory or not explained in this first post, please let me know so I can improve the experience for the next user.

    PS: If you are dealing with an existing installation then read below on how to adjust the scraper ID to replace the default imdb.com scraper that MovingPictures normally uses.​


    Please 'star' Google issue #319, so that the developers of MovingPictures understand your desire to configure this scraper from within the plugin and not have to resort to editing in notepad.

    On the left is the default imdb.com scraper result and on the right is the IMDb+ result with the rename option enabled:

    [Image: default imdb.com scraper result] » [Image: IMDb+ result with rename enabled]

    To install the scraper:

    1. Download the IMDb+ .xml file (attachment at bottom of this post) to a location you remember. I actually put mine in the "C:\ProgramData\Team MediaPortal\MediaPortal\scripts\Moving Pictures\" folder (different location on XP), but it shouldn't matter.
    2. To use the rename system, also download the "Rename dBase IMDb+ Scraper.xml" file containing the default rename entries and store it in the "C:\" root folder.
    3. Open both XML files in notepad if you wish to adjust the default settings to your own preference. For the scraper this means you can edit the "global_options_xxxx" settings, and for the rename dbase you can verify if you agree with the new grouped titles for the movie series included (James Bond, Star Wars, etc).
    4. Open "MediaPortal - Configuration", go to the "plugins" and select "Moving Pictures" and "Config".
    5. Select the "Importer Settings" tab.
    6. In the "Data Sources:" section select the "Manually manage movie data sources" radio button.
    7. Click the "Movie Details Data Sources" button.
    8. In the popup click the arrow just to the right of the "+" button and pick "Add a New Data Source".
    9. Browse to the new .xml scraper file from where you saved it, and click OK.
    10. It should appear as 'IMDb+' in the "Source column". You may need to enable it by pressing the "+" button if it is greyed out, and move it to the position in the list that you prefer priority wise. It is best to place it in the first position to ensure that the auto-approve works.
    Note: This scraper is separate from the official imdb.com scraper, so adding it only affects new movies you import. MovingPictures remembers which scraper was used for each movie and will keep using it unless you send the movie back to the importer.

    To avoid having to re-import all your existing movies and losing your watched status, you can open the scraper XML file in notepad and change the scraper ID to <id>874902</id>. Just be aware that you then have to report any scraper import issues back to this thread and not bother the developers.
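
    If you would rather script the ID change than edit by hand, here is a minimal sketch in Python. It assumes the scraper file keeps its ID in a single <id>...</id> element, as the text above implies. The function name set_scraper_id and the sample XML string are made up for illustration, and the real scraper file has far more content around the ID.

```python
import re

# ID of the default imdb.com scraper, as quoted in the post above.
DEFAULT_IMDB_ID = "874902"


def set_scraper_id(xml_text: str, new_id: str = DEFAULT_IMDB_ID) -> str:
    """Return xml_text with the first <id>...</id> element set to new_id."""
    pattern = re.compile(r"<id>\s*[^<]*\s*</id>")
    # A callable replacement avoids any backslash handling in new_id.
    new_text, count = pattern.subn(lambda m: f"<id>{new_id}</id>", xml_text, count=1)
    if count == 0:
        raise ValueError("no <id> element found in scraper XML")
    return new_text


# Hypothetical, heavily shortened stand-in for the scraper file contents.
sample = "<details><name>IMDb+</name><id>123456</id></details>"
print(set_scraper_id(sample))
```

    Make a backup copy of the original .xml before writing the changed text back to disk.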

    ================================================================================

    Extra information:

    Technical details on scraper:

    • Gets high resolution covers from imdb.com.
    • Solves non-English title problem for users outside of the US.
    • Supports made-for-TV movies, mini-series and straight-to-video movies.
    • Configurable options to change the behaviour of the scraper script. These options have to be adjusted before you import the script, by editing the XML file in your favorite text editor, such as notepad. They let you choose between English/English+foreign title, long/short summaries, imdb/rottentomatoes score, rottentomatoes tomatometer/audience score, rottentomatoes percentage/average score, US/UK rating.
    • VITAL: Supports forced English title on foreign movies, for example Chin gei bin (2003) will be imported as 'Vampire Effect'. If you prefer the original title on foreign movies, then simply adjust the global_options_original_title setting inside the XML file. This is different from what non-USA users are experiencing, where they see a translated title on an American movie; those movies will now always show the English title.
    • Enabling UK ratings uses British-English movie titles.
    • Support for optional "English title (Foreign title)" format. This would make Chin gei bin (2003), be added to your collection as 'Vampire Effect (Chin gei bin)'.
    • NEW: Auto-rename support based on static XML file
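
    The title formats listed above can be illustrated with a small Python sketch. This is not the scraper's own code (the scraper is an XML script); the function format_title and the mode names are hypothetical and only mirror the behaviours described in this post: English only, original only, "English (Foreign)" and "Foreign (English)".

```python
def format_title(english: str, foreign: str, mode: str = "english") -> str:
    """Combine an English and a foreign title according to a display mode.

    The mode names are illustrative; they are not the scraper's option values.
    """
    # Nothing to combine when there is no distinct foreign title.
    if not foreign or foreign == english:
        return english
    if mode == "english":
        return english
    if mode == "original":
        return foreign
    if mode == "english_foreign":
        return f"{english} ({foreign})"
    if mode == "foreign_english":
        return f"{foreign} ({english})"
    raise ValueError(f"unknown mode: {mode}")


print(format_title("Vampire Effect", "Chin gei bin", "english_foreign"))
```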
    Known issues:

    • If you are bothered by the tiny delay versus the default imdb.com scraper, then please switch to IMDb rating. This removes the extra step to get info from the RottenTomatoes website. You do miss out then on all the extras, such as the much nicer RottenTomatoes rating or the ability of this scraper to use the RottenTomatoes synopsis and/or runtime info when this information is not available on the imdb website.
    • The title found during the search node will not match the final title from the details node if a forced-English title conversion is taking place. This is only a nuisance inside the "Movie Importer" tab during import of a new movie, and might lead to unexpected results where the "Possible Matches" end result after auto (or manual) approval does not match the "File(s)" input title. A solution for this problem caused severe delays to the import process (takes 3-5x longer). Please voice your opinion as a reply to this thread if you feel the delay is worth it or not.
    • If you experience unexplained delays during import, try again with your anti-virus solution temporarily disabled. If you then send the same movies back to the importer and there is no delay, modify your anti-virus solution to ignore/white-list MediaPortal.



    Default options:

    1. Short summaries (prevents unwanted spoilers)
    2. RottenTomatoes Audience Percentage score (gives a better indication if a movie is good, especially unreleased ones)
    3. Main (English) title only
    4. US Certification rating
    5. No title rename to differentiate special editions
    6. No auto-rename of the title to group movies together
    The RottenTomatoes score needs some extra explanation.

    Take for example Gnomeo and Juliet (2011). The scoring on that one is as follows:


    • All critics = 53% (5.3)
    • All critics average = 5.5/10 (5.5)
    • Top critics = 50% (5.0)
    • Top critics average = 5.7/10 (5.7)
    • Audience = 63% (6.3) (this is the score the IMDb+ scraper uses by default, unless you edit options)
    • Audience average = 3.5/5 (7.0)
    The actual score for MovingPictures is the one in parentheses, on a 10-scale, which your skin will then display in the scale it uses, usually a 5-scale.
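
    The conversion to the 10-scale shown in parentheses above is simple arithmetic. A small Python sketch, using the Gnomeo and Juliet numbers from the list (the function names are made up for illustration and are not part of the scraper):

```python
def percent_to_ten(percent: float) -> float:
    """Convert a percentage score (0-100) to a 10-scale score."""
    return round(percent / 10, 1)


def average_to_ten(value: float, scale: float) -> float:
    """Convert an average such as 3.5/5 or 5.5/10 to a 10-scale score."""
    return round(value * 10 / scale, 1)


print(percent_to_ten(63))       # audience percentage
print(average_to_ten(3.5, 5))   # audience average, given out of 5
print(average_to_ten(5.5, 10))  # all critics average, given out of 10
```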

    Now on this movie, all the scores are different, but they are relatively close together. This can be an extreme difference on other movies, for example Big Mommas: Like Father, Like Son (2011) has a score of 0.9 from the critics, versus 7.0 from the audience.

    So please look at some of your favorite movies on RottenTomatoes, and check to see which of the six different scores you agree with most. The reason that audience percentage score is default is mainly based on unreleased movies, such as Thor (2011) which as of this writing has no reviews, but it does have a 9.6 audience rating based on the people that want to see it.​



    ================================================================================

    Changelog (July 4th 2011):

    • v3.0.0 - Public release, the previous revisions were used internally.
    • v3.0.1 - Fixed non-English title problem properly (no IP tricks anymore) as well as summary issue on some movies.
    • v3.0.2 - Forgot to disable non-English title conversion for UK, Canada, Australia and New Zealand as they already show proper English titles (please let me know if I overlooked one).
    • v3.0.3 - Added option for metascore from imdb.com, this requires "global_options_imdb_score" be set to "true" as well.
    • v3.0.4 - Corrected 2-letter language code for Britain into GB, and added Norway (NO) to the blacklist.
    • v3.0.5 - Rewrote the entire English title system. Hopefully this will solve any problems now.
    • v3.0.6 - Fixed a few minor issues and enabled British-English movie titles when UK ratings are enabled.
    • v3.0.7 - Per request of zicoz, added ability to retain the original title on movies created in certain languages. This allows a Dutch user to import "Black Book (2006)" as "Zwartboek" (meaning all Dutch spoken movies will use the Dutch movie title), while all other foreign-language spoken movies get imported with an English title.
    • v3.0.8 - Fixed SortBy method to keep article removal intact. This way "Kinpeibai (1969)" gets imported as "The Concubines", but with a SortBy field of "concubines the" as per the default article removal settings.
    • v3.0.9 - Per request of vpupkin, added "English (Foreign)" title support.
    • v3.1.0 - Fixed a few bugs on rare titles, also added country filter to improve detection of foreign movies. A movie that would fail was for example Arthur and the Revenge of Maltazard (Arthur et la vengeance de Maltazard), which is a French title movie in English language. Filtering on country being USA wasn't enough, because then movies such as The Machine Girl (Kataude mashin gâru) would fail as being released in USA first, but having Japanese language. Unfortunately this new method adds a small delay, but had to be done to prevent mistakes.
    • v3.1.1 - Fixed rare AKA bug and improved speed in search node, improved speed in cover node, and RottenTomatoes synopsis is now used when summary is missing from imdb.com with RT ratings enabled (default setting).
    • v3.1.2 - Increased detection of English titles, added method to use RottenTomatoes runtime if missing from imdb.com and included an extra check to see if a new USA title was issued for an English movie. That way a movie like The Tomb (2009) imports correctly as "The Tomb" and not as the original "Ligeia" title.
    • v3.1.3 - Added support for title manipulation to indicate special editions (3D, Unrated, Extended, etc). To make this work, your filenames have to contain this text between brackets as well as the IMDb tt-ID number (or in NFO). Auto-rename support is now included to retain any manual title changes after a refresh or re-import of your collection.
    • v3.1.4 - Per suggestion of 'drealit', it is now possible to adjust the sortby title during the rename process, either by itself or alongside a title rename. You have to edit the rename XML file and add any sortby="..." values to the movies you wish to do this with. Or you can do a mass-replace on 'title=' into 'sortby=', which will leave the movie title intact as used by IMDb site, but will sort them together as a group. This will cause weird results in some cases, for example 'Casino Royale' will be sorted under the 'J' for "James Bond 21". This is why you can also rename both title and sortby title.
    • v3.1.5 - Upon request of 'ninjatobbe' added the option to use "Foreign title first (English title)", and included improvement to get English title on Canadian released movies and some Italian released movies with English language tracks.
    • v3.1.6 - Fixed imdb.com rating, also added support for "(Alternate Ending)" special editions and improved English language title detection.


    • v3.1.7 - Fixed RottenTomatoes rating for users in foreign countries who would get localized RT page with different HTML code, as well as rounded average ratings on some movies where 3.0/5 would show as 3/5 and fail to get any.
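
    The v3.0.8 entry above mentions article removal for the SortBy field. A rough Python sketch of that idea, assuming a leading English article is moved to the end and the result is lowercased, as in the "concubines the" example. The actual MovingPictures rules and article list are not shown in this thread, so ARTICLES and sort_key are illustrative only:

```python
# Assumed list of leading articles; the real plugin setting may differ.
ARTICLES = ("the", "a", "an")


def sort_key(title: str) -> str:
    """Lowercase the title and move a leading article to the end."""
    words = title.lower().split()
    if len(words) > 1 and words[0] in ARTICLES:
        words = words[1:] + [words[0]]
    return " ".join(words)


print(sort_key("The Concubines"))
```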
    To use the auto-rename system, you also have to download the "Rename dBase IMDb+ Scraper.xml" file and place it in your 'C:\' root folder, as that was the easiest common location to use. You can relocate the file, but then you have to edit the scraper to point to that new location. Please edit the rename dBase to your liking if you do not agree with the default entries, and share any series that I overlooked by replying to this thread.

    Enjoy.



     
    Last edited: June 24, 2012
  3. fforde

    fforde Community Plugin Dev

    Joined:
    June 7, 2007
    Messages:
    2,666
    Likes Received:
    1,690
    Occupation:
    Software Engineer
    Location:
    Texas
    Ratings:
    +1,696 / 0
    Home Country:
    United States of America
    Re: IMDb+ Scraper -- v3.0.0

    For what it's worth, this script does solve the English title problem, but it does so by connecting directly to an IP that currently points to an IMDb server returning English language results (but could change). This is a good solution but could be slightly less reliable as it circumvents the DNS system. For people that are very bothered by this issue and are also unwilling to use themoviedb.org, this script is a good alternative.

    Grabbing high resolution covers is a nice perk and something we will hopefully include in a future version of the official IMDb script. For the time being though, if you choose not to grab this alternate version, the themoviedb.org script provides high quality and high resolution results. This is enabled for all users by default.

    Finally, the IMDb script included in Moving Pictures 1.1.2 and later will properly retrieve made-for-TV movies, straight-to-DVD movies as well as miniseries. The lack of retrieval of these items was a temporary bug caused by an HTML change at imdb.com. It has been corrected in the primary IMDb script and if you are running an up to date version of Moving Pictures, you should not have any problems in this area.
     
  4. RoChess
    Re: IMDb+ Scraper -- v3.0.0

    This has now been (re)fixed in the latest version. It was the original method I created to solve the non-English titles for a Dutch friend, but stoopid me grabbed the wrong scraper script when starting out with this one.

    Many users will be fine keeping the default imdb.com scraper. This new scraper was originally intended to only replace the IMDb+RT scraper, as some users like me prefer the RottenTomatoes score, but things got out of hand when I started to add in the custom options.

    Also awesome work on the improvements made to the plugin starting with version 1.1.3 to auto-approve titles with weird characters in them, such as '&'.
     
  5. pirivan

    pirivan Portal Pro

    Joined:
    January 19, 2008
    Messages:
    62
    Likes Received:
    2
    Ratings:
    +2 / 0
    Re: IMDb+ Scraper with Custom Options (details+cover) -- v3.0.1

    This is amazing RoChess; I love it, thank you so much!
     
  6. drealit

    drealit Portal Pro

    Joined:
    March 15, 2008
    Messages:
    190
    Likes Received:
    17
    Ratings:
    +17 / 0
    Re: IMDb+ Scraper with Custom Options (details+cover) -- v3.0.1

    Wow RoChess just wow, this is above and beyond what I expected the RT scraper to do. Thank you very much!
     
  7. RoChess
    Re: IMDb+ Scraper (short/long summary, imdb/RT score, US/UK rating, and more)

    :D

    Added v3.0.3 which now also supports the metacritic.com metascore as provided by imdb.com. I'm sure most users of this scraper will prefer the RottenTomatoes scores, but since the data was available I figured why not :D
     
  8. Surferosa

    Surferosa Portal Pro

    Joined:
    September 2, 2009
    Messages:
    55
    Likes Received:
    5
    Ratings:
    +5 / 0
    Home Country:
    England
    Re: IMDb+ Scraper (short/long summary, imdb/RT score, US/UK rating, and more)

    This looks brilliant RoChess. My server volume is currently degraded, and I'm frantically pulling a 5.6TB backup from all over the place to rebuild it. Once I've sorted out that mess, this is top of my list.

    Thank you very much for doing this.

    Cheers
     
  9. RoChess
    Re: IMDb+ Scraper (short/long summary, imdb/RT score, US/UK rating, and more)

    I know all about the frantic mess that can cause, so good luck with it.

    Let me know how the scraper runs for you when you are ready, because you have a large collection to give it a proper test outside of the US.
     
  10. JACOB B

    JACOB B Portal Pro

    Joined:
    September 3, 2008
    Messages:
    81
    Likes Received:
    7
    Gender:
    Male
    Ratings:
    +7 / 1
    Home Country:
    Denmark
    Re: IMDb+ Scraper (short/long summary, imdb/RT score, US/UK rating, and more)

    Hi Rochess,
    How can I use Danish ratings instead of US or UK? Would that be possible?

    Thanks! :D
    Jacob
     
  11. mat123

    mat123 Portal Pro

    Joined:
    February 28, 2009
    Messages:
    102
    Likes Received:
    19
    Gender:
    Male
    Ratings:
    +19 / 0
    Home Country:
    Slovenia
    Re: IMDb+ Scraper (short/long summary, imdb/RT score, US/UK rating, and more)

    Looks great, Rochess, but I don't get the English movie titles.

    I imported the new movie LET ME IN and got a French title: Laisse-moi entrer

    Also on MONSTERS VS ALIENS I got: Monsters vs Aliens: A Monstrous IMAX 3D Experience.
    It should be without the IMAX 3D Experience part.

    Hopefully you can fix this.
     