The IMDb+ scraper-script and plugin combination for Moving-Pictures is the first scraper to support custom options, allowing you to make this scraper work exactly the way you prefer. The options can be changed from within MediaPortal for full couch control of your movie collection.

NEW: There is now multi-language support for Title, Summary, Language, Genres and Certification for: Dutch (moviemeter.nl), German (imdb.de), French (imdb.fr), Italian (imdb.it), Spanish (imdb.es), Portuguese (imdb.pt), Icelandic (kvikmyndir.is) and Swedish (filmtipset.se). The title can also be forced in English via an additional option for those foreign users who prefer this.

The favorite features of this scraper are not only forcing English titles, but also grouping series together. This is best shown visually, so on the left is the default imdb.com scraper result and on the right is the IMDb+ result with the rename option enabled:

Important info for foreign users with a comma (,) as decimal separator:

NEW: Scraper version 3.3.0+ includes a temporary fix that runs a math node test to see if things are working correctly. If the test fails, it disables any average or scoring system that requires math. You can check the movingpictures.log file for information on what was disabled.

The new score system calculates an average score based on the IMDb score, Metacritic's Metascore, RottenTomatoes All Critics, RottenTomatoes Top Critics and RottenTomatoes Audience score. If you wish to return to the old method for scores, then enable the 'Single Score' setting via the IMDb+ plugin. Users who have Windows configured to use a comma as the decimal separator get errors from the math used to calculate the average.
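To make the separator problem concrete, here is a minimal Python sketch of an averaged score. It is not the scraper-script's actual code: `parse_score` and `average_score` are hypothetical names, and the plain mean on a 0-10 scale (IMDb as-is, the Metascore and the three RottenTomatoes values assumed to be 0-100 and divided by 10) is an assumed weighting. Python's `float()` only accepts '.' as decimal separator, so "7,5" is rejected, which is the same kind of mismatch described above; normalizing the separator before doing any math avoids it.

```python
def parse_score(text: str) -> float:
    """Parse a score, accepting either '.' or ',' as the decimal separator."""
    return float(text.strip().replace(",", "."))


def average_score(imdb: str, metascore: str, rt_all: str, rt_top: str,
                  rt_audience: str, single_score: bool = False) -> float:
    """Average five scores on a 0-10 scale (hypothetical weighting: plain mean)."""
    imdb_value = parse_score(imdb)
    if single_score:
        # Loosely mirrors 'Single Score' with the IMDb score: no averaging math.
        return imdb_value
    # Assumption: Metascore and RottenTomatoes values are 0-100, so divide by 10.
    others = [parse_score(s) / 10 for s in (metascore, rt_all, rt_top, rt_audience)]
    values = [imdb_value] + others
    return round(sum(values) / len(values), 1)


if __name__ == "__main__":
    try:
        float("7,5")  # a parser that expects '.' rejects a comma decimal
    except ValueError as err:
        print("naive parse failed:", err)
    print(average_score("7,5", "70", "80", "60", "90"))  # 7.5
```

Passing `single_score=True` skips the averaging entirely, which is roughly what the 'Single Score' option is described as doing.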
For the time being, please enable the 'Single Score' option and use either the IMDb score or the RottenTomatoes average critics score, as no math is done on those.

If you upgrade to MediaPortal v1.2, you can install the IMDb+ plugin to adjust all the options in real-time. The plugin takes care of installing the IMDb+ scraper-script into MovingPictures and keeps it up-to-date as well. For an easy way to get all this goodness, please visit the wiki Install Guide, which uses pictures to explain everything as simply as possible.

The scraper forces English titles on foreign movies. For example, the movie Chin gei bin (2003) will be imported as "Vampire Effect".
- Enable the 'Original Title' option via the IMDb+ plugin to import the same movie as "Chin gei bin".
- Enable the 'Add Foreign Title' option for "Vampire Effect (Chin gei bin)".
- Enable the 'Foreign Title First' option for "Chin gei bin (Vampire Effect)".

Configuration changes needed to function exactly the same as the default imdb.com scraper:
Use the IMDb+ Plugin to enable the following options:
- Single Score
- Long Summary
- IMDb Score

If you enable 'IMDb Score' but do not enable the 'Single Score' option, then an average of the imdb.com score and the Metacritic score will be used. The rename system that groups movie series together by title is also very nice, so check out the benefits of enabling it. If anything else is not self-explanatory or explained in this first post, please let me know, so I can improve the experience for the next user.

If you are dealing with an existing collection and do not want to start over, then read the following:
The IMDb+ plugin will detect if you have an existing movie collection with movies that were imported via the default IMDb scraper.
If it detects such movies in your collection, you can use the hidden menu on the left side to find the "Force IMDb+..." option. This new option allows you to switch those movies over to the IMDb+ scraper, so that you can enjoy all the new benefits. Please see the wiki Install Guide for more information.

To install the plugin:
- Open the "MediaPortal Extension Installer" and look for the "IMDb+" plugin under "Known extensions", or download the IMDb+ plugin MPEI package manually.
- Install the MPEI package.
- Launch MediaPortal and go to the 'plugins' menu, so that you can adjust all the options to your liking.

ALERT: If you do not find the IMDb+ plugin in the list, then I'm sorry, but you have encountered a small bug in MediaPortal. It is being worked on, but in the meantime, simply open MediaPortal configuration and press 'OK' to properly close the configuration. This should correctly update the mediaportal.xml file to show the IMDb+ plugin. Please retry the previous step to verify that all the options are to your liking before proceeding.

Then close MediaPortal, and open MovingPictures configuration to set up an import path and import movies, or refresh existing movies via the IMDb+ scraper. An installation walkthrough with pictures can be found in the wiki Install Guide.

================================================================================

Extra information:
Technical details on scraper:
- Gets high resolution covers from imdb.com.
- Solves the non-English title problem for users outside of the US.
- Supports made-for-TV movies, mini-series and straight-to-video movies.
- Configurable options to change the behaviour of the scraper. Use the IMDb+ Plugin to adjust them (MediaPortal v1.2).
- Enabling UK ratings uses British-English movie titles.
- Auto-rename support based on static XML files for default and custom entries.

Known issues:
- If you are bothered by the tiny delay versus the default imdb.com scraper, then please switch to IMDb rating. This removes the extra step of getting info from the RottenTomatoes website. You will then miss out on all the extras, such as the much nicer RottenTomatoes rating or the ability of this scraper to use the RottenTomatoes synopsis and/or runtime info when this information is not available on the imdb website.
- The title found during the search node will not match the final title from the details node if a forced-English title conversion is taking place. This is only a nuisance inside the "Movie Importer" tab during import of a new movie, and might lead to unexpected results where the "Possible Matches" end result after auto (or manual) approval does not match the "File(s)" input title. A solution for this problem caused severe delays to the import process (it takes 3-5x longer). Please reply to this thread if you feel the delay is worth it or not.
- If you experience unexplained delays during import, try again with your anti-virus solution temporarily disabled. If you then send the same movies back to the importer and there is no delay, configure your anti-virus solution to ignore/white-list MediaPortal.

================================================================================

Changelog:
v3.0.0 - Public release; the previous revisions were used internally.
v3.0.1 - Fixed the non-English title problem properly (no IP tricks anymore) as well as a summary issue on some movies.
v3.0.2 - Forgot to disable non-English title conversion for UK, Canada, Australia and New Zealand, as they already show proper English titles (please let me know if I overlooked one).
v3.0.3 - Added option for metascore from imdb.com; this requires "global_options_imdb_score" to be set to "true" as well.
v3.0.4 - Corrected the 2-letter code for Britain to GB, and added Norway (NO) to the blacklist.
v3.0.5 - Rewrote the entire English title system. Hopefully this will solve any problems now.
v3.0.6 - Fixed a few minor issues and enabled British-English movie titles when UK ratings are enabled.
v3.0.7 - At the request of zicoz, added the ability to retain the original title on movies made in certain languages. This allows a Dutch user to import "Black Book (2006)" as "Zwartboek" (meaning all Dutch-language movies will use the Dutch movie title), while all other foreign-language movies get imported with an English title.
v3.0.8 - Fixed SortBy method to keep article removal intact. This way "Kinpeibai (1969)" gets imported as "The Concubines", but with a SortBy field of "concubines the" as per the default article removal settings.
v3.0.9 - At the request of vpupkin, added "English (Foreign)" title support.
v3.1.0 - Fixed a few bugs on rare titles, and added a country filter to improve detection of foreign movies. An example of a movie that would fail was Arthur and the Revenge of Maltazard (Arthur et la vengeance de Maltazard), which is a French-titled movie in the English language. Filtering on country being USA wasn't enough, because then movies such as The Machine Girl (Kataude mashin gâru) would fail, as they were released in the USA first but are in Japanese. Unfortunately this new method adds a small delay, but it had to be done to prevent mistakes.
v3.1.1 - Fixed rare AKA bug and improved speed in search node, improved speed in cover node, and the RottenTomatoes synopsis is now used when the summary is missing from imdb.com with RT ratings enabled (default setting).
v3.1.2 - Improved detection of English titles, added a method to use RottenTomatoes runtime if missing from imdb.com, and included an extra check to see if a new USA title was issued for an English movie. That way a movie like The Tomb (2009) imports correctly as "The Tomb" and not as the original "Ligeia" title.
v3.1.3 - Added support for title manipulation to indicate special editions (3D, Unrated, Extended, etc). To make this work, your filenames have to contain this text between brackets as well as the IMDb tt-ID number (or have it in an NFO file). Auto-rename support is now included to retain any manual title changes after a refresh or re-import of your collection.
v3.1.4 - At the suggestion of 'drealit', it is now possible to adjust the sortby title during the rename process, either by itself or alongside a title rename. You have to edit the rename XML file and add sortby="..." values to the movies you wish to do this with. Or you can do a mass-replace of 'title=' with 'sortby=', which will leave the movie title intact as used by the IMDb site, but will sort the movies together as a group. This will cause weird results in some cases; for example 'Casino Royale' will be sorted under 'J' for "James Bond 21". This is why you can also rename both the title and the sortby title.
v3.1.5 - At the request of 'ninjatobbe', added the option to use "Foreign title first (English title)", and included an improvement to get English titles on Canadian released movies and some Italian released movies with English language tracks.
v3.1.6 - Fixed imdb.com rating, also added support for "(Alternate Ending)" special editions and improved English language title detection.
v3.1.7 - Fixed RottenTomatoes rating for users in foreign countries who would get a localized RT page with different HTML code, as well as rounded average ratings on some movies, where 3.0/5 would show as 3/5 and no rating would be retrieved.
v3.2.0 - Internal version to test new improvements.
v3.2.1 - Internal version to test new improvements.
v3.2.2 - Improved English title detection. Correctly uses RottenTomatoes runtime info if it is missing from imdb.com details. Average score system for a nice balanced result. Option to require a minimum of 20 IMDb votes, to filter out bad ratings from movie staff voters.
Added support for custom rename entries, so that it is easier to upgrade the default one. Option, enabled by default, to only refresh empty fields plus some others. And finally, the global_options can now be adjusted on the fly via an XML file in the "C:\" root folder, to make it easier to try out different settings and retain your custom profile on a script update.
v3.2.3 - Fixed: Adjusted code to compensate for the fact that imdb.com decided to prefix movie titles with an "IMDb - " string.
v3.2.4 - Fixed: Language, due to a change at imdb.com, and a rare situation in which some short summaries would get skipped. Also made the scraper ready for integration with the new plugin for the auto-update feature and optimized the regular expression code on all the movie fields.
v3.2.5 - Fixed: RottenTomatoes critics scores, and XPATH errors in log files when not all XML files were present in 'C:\'. The special edition system has also been modified and now works on imports without an IMDb tt-ID as well. Just remember that the search process itself will get confused by the extra text in the title, so be sure to modify your Advanced Settings -> Noise Filter to compensate. For example: [\(\[\{](?!(?:\d{4}|tt\d{7})).+?[\}\]\)] would filter out anything between brackets, except a 4-digit year or IMDb tt-ID. I'll elaborate more on this after I've run more tests.
v3.3.0 - Fixed: Added loop limits to avoid delays on bad data. Modified rename system to make it much faster and use fewer resources. Added: Math node test to avoid errors. New option to limit writers and directors to just one name, to avoid horizontal scrolling in some skins. Simplified logging to make it easier to spot problems without needing scraper-debug mode. Added: Multi-language support for summary, spoken language, genres, and certifications (German, French, Icelandic, Italian, Portuguese, Spanish, and Swedish) with a fallback option to use the English summary when the non-English one is missing.
v3.3.1 - Fixed: Tagline and English title detection.
v3.3.2 - Fixed: In preparation for a new IMDb+ plugin feature, the rename title system is now always used to update a title, even with update_all_fields disabled. This is because you will soon be able to mass-update not only your entire collection, but also refresh any movie that has an entry in the rename databases.
v3.3.3 - Fixed: The new, faster rename system failed on HTML-encoded titles with characters such as '&'.
v3.3.4 - Fixed: When a foreign language is selected, the foreign title is now used. To refresh an existing collection, remember to enable the "Refresh all of the fields" setting as well, and disable the "Rename titles...." setting, which would otherwise overwrite the foreign title again.
v3.3.5 - Fixed a rare RottenTomatoes case, fixed imdb.com "TV Series" during search node and cleaned up titles with single quotes.
v3.3.6 - Added Dutch (moviemeter.nl) support, and prepared scraper-script for future updates.
v3.3.7 - Improved English title detection system on foreign/Asian movies.
v3.3.8 - Workaround for MovPic ImdbBuilder not using the IMDb tt-ID from the filename when the 'prefer foldername' setting is enabled and only the filename has the ID we want to use, and improved UK title detection.
v3.3.9 - Fixed snafu in UK title detection that messed up some US titles.
v4.0.0 - Skipped version due to bug in MovingPictures.
v4.0.1 - Added full foreign title support, as well as the ability to use English titles with foreign details. The Roman numerals for the first movie in a series can now also be removed, and user-reviews can be used when there is no summary.
v4.0.2 - Fixed Spanish title problem by adding new blacklisted entries.
v4.0.3 - Added 'fake working title' to the blacklist of English title detection.
v4.0.4 - Added a bunch more English title detection postfixes to the blacklist.
v4.0.5 - Fixed Roman numeral strip system when Part or Volume was involved, and added more English title detection postfixes to the blacklist.
v4.1.0 - Skipped due to bug in MovingPictures.
v4.1.1 - Added initial support for other English rating systems (AU/CA/NZ) and fixed bad bug in previous code (darn XML encoding).
v4.1.2 - Fixed problem with new option when it does not yet exist inside the options XML file.
v4.1.3 - Fixed problem with XML encoding of '&' char for "Director's Cut" blacklist title detection.
v4.1.4 - Added 'alternative spelling' in a double way to the blacklist title detection.
v4.1.5 - Allow for 6-digit IMDb tt-IDs in filenames.
v4.1.6 - Fixed Writers, Directors, Actors, Genre, Year and Language due to changes on the imdb.com website.
v4.1.7 - Fixed English title detection system due to changes on the imdb.com website.
v4.1.8 - Fixed English title detection system properly this time (crossing fingers), and also added safety checks on other RegExps.
v4.1.9 - Fixed Studio due to changes at IMDb website.
v4.2.0 - Skipped due to bug in MovingPictures.
v4.2.1 - Added custom 'Special Edition' tagging system via ([...]).
v4.3.1 - Adjusted the way foreign titles are handled in default rename database so that they respect the options for it.
v4.3.2 - Speed improvement on the rename database lookups.
v4.3.3 - Adjusted language declaration to 'various'.
v4.3.4 - Fixed short summary issues when the summary had a direct link to an actor.
v4.3.5 - Removed 'See Full Summary' reference introduced by previous fix.
v4.3.6 - Fixed 'Genre' problem for Swedish secondary language due to changes at Filmtipset.se website.
v4.3.7 - Fixed user review alternative for missing summary not getting HTML tags stripped.
v4.3.8 - Blacklisted "(promotional abbreviation)" for English title detection.
v4.3.9 - Fixed search node (and RottenTomatoes improvements to supplement missing info such as summary/runtime/etc).
v4.4.0 - Weird bug in the new IMDb search system where "title+(year)" can fail; added a second search with just "title".
v4.4.1 - IMDb search uses UTF-8 encoding now
v4.4.2 - Fixed Icelandic language by forcing correct encoding
v4.4.3 - Adjusted regular expression to match new IMDb results and ensured the no-year search works correctly
v4.4.4 - Fixed foreign language issues for sources that have unsafe headers
v4.4.5 - Fixed Icelandic genres
v4.4.6 - Fixed mini-series support
v4.4.7 - Fixed Geo-IP search results and added additional logging
v4.4.8 - Improved AKA auto approval results at the cost of an additional scrape
v4.4.9 - Compensating for Moving-Pictures' weird new auto-approval behavior
v4.5.1 - IMDb added link tracking for Directors/Writers/Actors, had to fix RegExp to compensate
v4.5.2 - Increased timeout for foreign details from the default 5 to 10 seconds, mainly to compensate for the Dutch MovieMeter site
v4.5.3 - Fixed a lot of other things that related to the new IMDb tracking system as well
v4.5.4 - Small tweaks to avoid HTTP 302 redirects
v4.5.5 - Fixed bug introduced by previous tweak and discovered user reviews summary fallback was also broken
v4.5.6 - Fixed small bug related to actors, and added temporary workaround for foreign users not getting English titles
v4.7.1 - Fixed foreign users not getting English titles for real this time (crossing fingers)
v4.7.2 - IMDb broke search node with HTML source changes, all fixed again
v4.7.3 - The fun with IMDb changing their HTML code never stops
v4.7.4 - IMDb uses different HTML code for actors for European users, so made both methods work
v4.7.5 - Temporarily disabled DE/ES/IT/FR/PT, as IMDb has disabled their respective localized sites
v4.7.6 - Genre translation for DE+FR added and fixed XML bug.
ES/IT/PT languages will follow soon
v4.7.7 - Genre translation for ES+IT added, only PT language left now
v4.7.8 - Fixed Spanish titles getting used as English ones
v4.7.9 - *FacePalm* typo on width= when it should be with=, that is what I get for doing HTML all day and then making adjustments to IMDb+
v4.8.1 - Fixed 'Music' genre string replacement for the DE/ES/IT/FR/PT system
v4.8.2 - Fixed RegExp character conflict as well as tweaked code on arrays and foreign IMDb summary bug
v4.8.3 - The UTF-8 format on the scraper-script got lost somewhere
v4.8.4 - Adjusted RegExp to fix long summary (and repaired versioning mishap on previous commit)
v4.8.5 - Added support for Plot Keywords that will be added to next MovPic release
v4.8.6 - Fixed cover node and added extra logging
v4.8.7 - Fixed summaries due to new "synopsis" term used by IMDb
v4.8.8 - Added support for new Moving-Pictures Collections system
v4.8.9 - Forgot to enclose StringList values to stay consistent
v4.9.1 - Fixed collections conflict with optional stripping of Roman numerals for 1st movie in a series
v4.9.2 - Fixed studios, runtime and score/votes by compensating for new IMDb HTML code
v4.9.3 - Optimized collections to merge remakes/reboots/etc
v4.9.4 - Added initial support for new Moving-Pictures 'release date' feature
v4.9.5 - Fixed bug for custom rename collections and use new 'date' function for release_date
v4.9.6 - Double enclosed quotes got me again
v4.9.9 - Auto collection system for "IMDb Top 250"
v4.9.10 - Taking out :date function, as it is not working correctly
v4.9.11 - Took out one character too much when taking out :date function
v4.9.12 - Fixed regexp to deal with collections for "Kill Bill" and "The Godfather"
v4.9.13 - Added support for collections derived from decimal numbered series

The auto-rename database is now also updated by the IMDb+ plugin to the latest version available.
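As a rough illustration of how a rename entry and the default article removal combine into a SortBy value, here is a minimal Python sketch. The `<renames>`/`<movie>` elements, the `id` attribute and the tt-IDs are invented for illustration; only the `title=` and `sortby=` attributes come from this post, and the English-only article list is an assumption (MovingPictures' own article removal settings may differ).

```python
import xml.etree.ElementTree as ET

# Assumption: English leading articles only; MovingPictures' own article
# removal settings may use a different list.
ARTICLES = ("the", "a", "an")


def default_sortby(title: str) -> str:
    """Lower-case a title and move a leading article to the end."""
    words = title.lower().split()
    if len(words) > 1 and words[0] in ARTICLES:
        words = words[1:] + words[:1]
    return " ".join(words)


# Hypothetical custom rename file: only title= and sortby= come from the post;
# the <renames>/<movie> elements, id= and the tt-IDs are invented.
CUSTOM_XML = """<renames>
  <movie id="tt0000001" title="James Bond 21: Casino Royale"/>
  <movie id="tt0000002" sortby="James Bond 22"/>
  <movie id="tt0000003" title="Fast &amp; Furious"/>
</renames>"""


def apply_rename(imdb_id: str, scraped_title: str, root: ET.Element) -> tuple[str, str]:
    """Return (title, sortby), applying a matching rename entry if there is one."""
    for entry in root.iter("movie"):
        if entry.get("id") == imdb_id:
            title = entry.get("title", scraped_title)
            # An explicit sortby= wins; otherwise derive it from the (renamed) title.
            sortby = entry.get("sortby") or title
            return title, default_sortby(sortby)
    return scraped_title, default_sortby(scraped_title)


root = ET.fromstring(CUSTOM_XML)
print(apply_rename("tt0000002", "Quantum of Solace", root))
print(apply_rename("tt9999999", "The Matrix", root))
```

Parsing also shows why '&' has to be escaped in the custom file: `ET.fromstring('<movie title="Fast & Furious"/>')` raises `xml.etree.ElementTree.ParseError`, while the `&amp;` form above is read back as a plain '&'.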
If you find that I overlooked any big English series in the default rename XML file, then please share, so I can add them. For any rename entries that you wish to add for your personal preference, please locate the "Rename dBase IMDb+ Scraper (Custom).xml" file in your MediaPortal user/data folder, inside the "IMDb+" subfolder. A direct link via the start menu will be added by the next MPEI release, but for now locate it manually via the MediaPortal start menu shortcuts to get to your "User Files". Enjoy.

Current rename list is as follows:
Normally the first movie does not need to be renamed, except for movies affected by article removal, because the group ordering is done on the SortBy title. So "The Matrix" would end up with a sortby title of "matrix the", which would order it below the 2nd and 3rd movie in the group. You can now also edit the sortby field separately from the title if you wish.

Note: If you are going to rename/add entries in the custom XML file, please remember to use proper XML format. For example, the '&' character needs to be written out as '&amp;'. You can verify that the correct XML syntax was used by copying and pasting your file contents into the tester at: XML DOM - Validate XML

The list is now available via the Google Code project SVN @ rename XML database view, showing only the changes. You can use it to quickly see all the new additions and changes to existing entries. Near the top in the middle you can switch the drop down selection from 'Single-column' to 'Side by side' to see all entries.
Near the top at the right side you can adjust the revision as well to look at older versions, but also newer ones in case I forgot to update the link here.

As the IMDb+ system has taken up a lot of my time already and will keep consuming a lot of my free time in the foreseeable future, I'm hoping that some of the more fortunate users amongst you are not opposed to a small donation. If you do not have the means and still want to show your appreciation for this scraper, use the 'Thanks' button below.

New download location: IMDb+ plugin for MediaPortal's MovingPictures plugin
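The Noise Filter expression suggested in the v3.2.5 changelog entry can be tried out before pasting it into MovingPictures. Below is a minimal Python sketch, assuming Python's `re` module handles this pattern (character classes, a negative lookahead and a lazy `.+?`) the same way MovingPictures' own regex engine does; `clean_title` is a hypothetical helper and tt1234567 is a made-up ID.

```python
import re

# The Noise Filter expression quoted in the v3.2.5 changelog entry.
NOISE_FILTER = re.compile(r"[\(\[\{](?!(?:\d{4}|tt\d{7})).+?[\}\]\)]")


def clean_title(name: str) -> str:
    """Strip bracketed text unless it starts with a 4-digit year or a tt-ID, then tidy spaces."""
    stripped = NOISE_FILTER.sub("", name)
    return " ".join(stripped.split())


if __name__ == "__main__":
    print(clean_title("Avatar (2009) [Extended] {tt1234567}"))
    print(clean_title("Kill Bill (Unrated) (2003)"))
```

Note that the lookahead only inspects the characters right after the opening bracket, so any bracketed text that starts with four digits, such as "(1080p)", is kept as well.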
Re: IMDb+ Scraper (Fix English, Rename+Group, RottenTomatoes, and more) v3.2.5
Just wanted to say I did a clean install of MovingPictures today and tried out your scraper. It works GREAT. Thanks.
Re: IMDb+ Scraper (Fix English, Rename+Group, RottenTomatoes, and more) v3.2.5
Did you run into any trouble during setup/install? I'm still working on an easy MPEI package for installation, but it is a slow process, so I want to make sure the existing manual installation method is as easy to follow as possible.
Re: IMDb+ Scraper (Fix English, Rename+Group, RottenTomatoes, and more) v3.2.5
I thought your instructions were clear and easy to follow. Very nice touch with grouping movies (James Bond, Resident Evil, etc.).
Re: IMDb+ Scraper - Movie Scraping on Steroids v3.3.0
RoChess, nice work btw. I'm using it now, so I thought I'd stop by and say so. Regards
Re: IMDb+ Scraper - Movie Scraping on Steroids v3.3.4
You're doing awesome RoChess. What started as a simple quick fix for me (English title fixing) has bloomed into an awesome beast of a scraper and a necessary add-on for mine and anyone else's MP setup that I touch. Great job and keep up the fantastic and promising work!
Re: IMDb+ Scraper - Movie Scraping on Steroids v3.3.6
One little note on the new refresh system that is included with IMDb+ plugin v1.3.0.183. When you resume a refresh, you have to manually pick the same refresh method you used before. Otherwise, if you for example pause/abort a refresh of 'all' movies and then resume by refreshing only the letter 'A', the resume flag is cleared once those A... movies are refreshed.
Re: IMDb+ Scraper - Movie Scraping on Steroids v3.3.7
Just wanted to say thanks for the work on this. I have used your IMDb+ since the first one, and man, has it come a long way. The whole sorting by series for movies is great; it has bothered me for a while that there wasn't a way to do it automatically.
Re: IMDb+ Scraper - Movie Scraping on Steroids v3.3.9
Those tasty turkeys have been in the way of doing updates sooner, but I just released rename database update v1.0.7. It is only a part of the mass-update of 350 titles, but since it also includes some new movies that people are bound to add to their collection in the following days, I could not delay an interim update. Well, I did not plan to do it right this moment, but was tricked into it. If you were waiting to mass-refresh your collection, then be patient a little bit longer, but this way any new movie you import that was added in this revision will at least get the correct title already.

Hopefully later today I can finish testing the new IMDb+ plugin, which adds IMDb+ scraper-script installation to the MPEI package and will make first-time setup easier for new users. It will also set the cover priority above the default imdb.com scraper (first-time install only, so existing users have to correct it manually, and I will update the wiki to explain this), and when you open the IMDb+ plugin and the scraper-script is not in first priority, it will ask you if you would like to correct that.

I'm also working on IMDb+ scraper-script enhancements, which require an update to the IMDb+ plugin as well to be able to configure them in the scraper-script options list. So far I got foreign titles working for the supported languages, and added an extra option for those foreign users who want to see a localized summary but keep the English title. There will also be a new option to disable the Roman numeral on the first movie in a series, and finally you will be able to fix empty summaries by forcing a user-review to be used. So if all goes well I might be able to get it done while one of the biggest shopping days in the world is happening right now.
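The three title options from the first post ('Original Title', 'Add Foreign Title' and 'Foreign Title First') boil down to how the English and original titles are combined. Here is a minimal Python sketch using the "Chin gei bin (2003)" example; `TitleMode` and `format_title` are hypothetical names, the sketch assumes the English title has already been detected, and skipping the parenthesized part when both titles are identical is an assumption rather than documented behavior.

```python
from enum import Enum


class TitleMode(Enum):
    """Hypothetical names for the title options described in the first post."""
    ENGLISH = "english"              # default: forced English title
    ORIGINAL = "original"            # 'Original Title'
    ADD_FOREIGN = "add_foreign"      # 'Add Foreign Title'
    FOREIGN_FIRST = "foreign_first"  # 'Foreign Title First'


def format_title(english: str, original: str, mode: TitleMode) -> str:
    """Combine the English and original titles according to the selected option."""
    if mode is TitleMode.ENGLISH or english == original:
        # Assumption: no "(...)" suffix when both titles are identical.
        return english
    if mode is TitleMode.ORIGINAL:
        return original
    if mode is TitleMode.ADD_FOREIGN:
        return f"{english} ({original})"
    return f"{original} ({english})"


for mode in TitleMode:
    print(mode.name, "->", format_title("Vampire Effect", "Chin gei bin", mode))
```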