IMDb+ Scraper (Force English title, Auto-Rename titles to group, and more) v3.1.7

Should this be the default imdb scraper?

  • Yes, I do not want to re-import

    Votes: 19 95.0%
  • No, keep this one separate

    Votes: 0 0.0%
  • Who cares, I got movies to watch

    Votes: 1 5.0%

  • Total voters
    20
  • Poll closed.

vpupkin

Portal Pro
March 26, 2011
84
8
Re: IMDb+ Scraper (short/long summary, imdb/RT score, US/UK rating, and more)

Great job on the scraper, RoChess; it is very useful for importing my Asian movie collection. I can see that it is possible to import titles with the foreign name or with the English name, but is it possible to have a third form? I normally import non-English titles as 'EnglishTitle (Foreign Title)', e.g. 'The Legend 2 (Fong Sai Yuk juk jaap)'. Is it possible to implement this?

Don't know if this is of value to the wider audience; I am fine doing the changes in my local scraper copy if you can point out what needs to be changed :)

Thanks! :D
 

RoChess

Extension Developer
  • Premium Supporter
  • March 10, 2006
    4,434
    1,897
    • Thread starter
    • Moderator
    • #52
    Re: IMDb+ Scraper (short/long summary, imdb/RT score, US/UK rating, and more)

    Nope, no trouble, I can actually see benefits myself, so I'll add another option for it. I'm trying to avoid adding options that will slow down the scraper, but in this case it shouldn't add any significant delay.

    But you'll have to be patient with me for a few days; I just lost my MP-TVSeries database today due to corruption in the 'online_videos' table, and I have to focus on restoring that first due to WAF :mad:
     

    vpupkin

    Re: IMDb+ Scraper (short/long summary, imdb/RT score, US/UK rating, and more)

    No problem, I totally understand what you mean in terms of priorities ;)
     

    RoChess

    • #54
    Re: IMDb+ Scraper (short/long summary, imdb/RT score, US/UK rating, and more)

    All done, please download v3.0.9 from the first post.

    In your case, edit the scraper *BEFORE* you import, and use the following setting to enable it:

    <set name="global_options_foreign_title" value="true" />

    NOTE: UK users need to be aware of a side effect if they wish to use this setting as well: for movies such as Harry Potter I, the resulting title will become "Harry Potter and the Philosopher's Stone (Harry Potter and the Sorcerer's Stone)". So if there are UK users who have a large foreign-title collection and would like to use the "English (Foreign)" title format without the "British-English (American-English)" mistake, please let me know.
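A minimal Python sketch of the "English (Foreign)" rule, assuming the scraper simply appends the original IMDb title whenever it differs from the English title. `compose_title` and its parameters are hypothetical names, not the scraper's actual code (the real scraper is an XML script). It also shows why the UK case misfires: the UK title is treated as the "English" title, and the US title is the one that differs from it.

```python
def compose_title(english_title: str, original_title: str, foreign_title_format: bool) -> str:
    """Build the display title; hypothetical helper, not the scraper's real code."""
    # Without the option, or when both titles are identical, keep the English title.
    if not foreign_title_format or not original_title or original_title == english_title:
        return english_title
    # Otherwise append the original title in parentheses: "English (Foreign)".
    return f"{english_title} ({original_title})"


print(compose_title("The Legend 2", "Fong Sai Yuk juk jaap", True))
print(compose_title("Harry Potter and the Philosopher's Stone",
                    "Harry Potter and the Sorcerer's Stone", True))
```

The second call prints the combined Harry Potter title, which is the UK side effect from the note above.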
     

    RoChess

    • #55
    Re: IMDb+ Scraper (short/long summary, imdb/RT score, US/UK rating, and more)

    Please use the first post to download v3.1.0.

    I finally got around to testing the scraper on my own collection, so I was able to test it on a much wider range of titles. A few bugs surfaced on some rare titles, so I fixed those. I also noticed that some foreign titles didn't change; it turned out those movies were made in the English language but kept their foreign title.

    So now the scraper looks for the "Country:" and "Language:" values on the imdb.com website, where it seems that "Country:" represents the area the movie was originally released in. This is why "Language:" is still required as well, because I stumbled upon some Asian movies that showed up as USA, but were in Japanese with a foreign title.

    Users who adjusted their settings to retain the foreign title for their native language will have to adjust the country value as well. They should keep the "us|gb" part and add their own country codes to it, such as "us|gb|no" for Norway. The language will still be the main factor, but since some foreign movies get released in the USA or UK first, it is important to keep those country codes included.
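One plausible reading of the country/language check, sketched in Python. It assumes the original title is kept only when both a country and a language match, and that the pipe-separated values behave like regex alternatives. The lowercase language codes ("en", "no") and the function name `keep_original_title` are hypothetical, not the scraper's real settings.

```python
import re
from typing import Iterable


def keep_original_title(countries: Iterable[str], languages: Iterable[str],
                        country_filter: str = "us|gb",
                        language_filter: str = "en") -> bool:
    """Hypothetical check: keep the IMDb title only if country AND language match."""
    # The filters are treated as regex alternations such as "us|gb|no".
    country_ok = any(re.fullmatch(country_filter, c, re.IGNORECASE) for c in countries)
    language_ok = any(re.fullmatch(language_filter, lang, re.IGNORECASE) for lang in languages)
    return country_ok and language_ok


print(keep_original_title(["us"], ["en"]))  # True: English-language US release
print(keep_original_title(["us"], ["ja"]))  # False: listed as USA but in Japanese
print(keep_original_title(["hk"], ["en"]))  # False: English-language, non-US/UK country
print(keep_original_title(["no"], ["no"], "us|gb|no", "en|no"))  # True: Norwegian user settings
```

Requiring both matches is what covers the two cases from the post: a US-listed Japanese movie and an English-language movie from another country both fall through to the English title.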
     

    vpupkin

    Re: IMDb+ Scraper (short/long summary, imdb/RT score, US/UK rating, and more)

    Awesome, this works exactly as I hoped it would! Great job! Now, I haven't tried to re-import what I've done manually already, but a few titles (Japanese, Korean, HK, French, Spanish) that I re-sent to the importer worked out perfectly. I'll keep importing my collection (about halfway there), and will report on issues if any.

    The only problem I have left is that sometimes the themoviedb.org scraper is picked before IMDb+, even though my data sources list looks like this:

    XBMC (Local)
    IMDb+
    imdb.com
    themoviedb.org
    ...

    I suspect this is due to file naming, e.g.

    Seven Swords (Uncut).avi

    shows up as 'Seven Swords' for themoviedb.org and filmtipset.se, but as 'Qi jian' for imdb.com and IMDb+ (a title that doesn't appear in the filename). After I manually select 'Qi jian' (2005) [IMDb+], it is properly renamed to 'Seven Swords (Qi jian)' (2005) [IMDb+].

    Is there anything that can be done, or should I just go ahead and disable the other scrapers? These were mostly in place in case the main scraper misses anything, but I doubt I will ever have a case like this.

    Any thoughts?

    ...and thanks again!
     

    RoChess

    • #57
    Re: IMDb+ Scraper (short/long summary, imdb/RT score, US/UK rating, and more)

    I had serious delays today as well, caused by the themoviedb scraper, even though it was enabled at the lowest priority. I had forgotten it was still enabled, even though I knew about the issues with that scraper. In the end I just deleted (actually deleted, not disabled) all the other scrapers to clean up the list, and was left with IMDb+ (1st), CDUniverse (2nd) and XBMC (3rd), but I kept themoviedb disabled as 4th.

    The reason I disabled themoviedb instead of deleting it is that I still use it for fanart and cover backup, and if you delete a scraper it is gone for all 3 methods (details, covers and fanart). So for covers I have Local Data (1st), IMDb+ (2nd), themoviedb.org (3rd), moviepostersdb.com (4th), CDUniverse.com (5th), and XBMC (6th), with the rest deleted. For fanart: Local Data (1st), themoviedb (2nd), XBMC (3rd), with the rest deleted.
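To illustrate why disabling differs from deleting, here is a simplified Python model of the priority lists above. It assumes each data type tries its sources in order and stops at the first one that returns something. MovingPictures' real cover and backdrop logic may collect more than one result, and `priorities`, `first_hit` and `delete_scraper` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical model of the setup above: one ordered list per data type
# (index 0 = 1st priority). A scraper disabled for one type is just left out of that list.
priorities: Dict[str, List[str]] = {
    "details": ["IMDb+", "CDUniverse", "XBMC"],  # themoviedb.org disabled here
    "covers": ["Local Data", "IMDb+", "themoviedb.org", "moviepostersdb.com",
               "CDUniverse.com", "XBMC"],
    "fanart": ["Local Data", "themoviedb.org", "XBMC"],
}


def first_hit(data_type: str,
              lookup: Callable[[str], Optional[str]]) -> Optional[Tuple[str, str]]:
    """Try each source in priority order; return (source, result) for the first hit."""
    for source in priorities[data_type]:
        result = lookup(source)
        if result is not None:
            return source, result
    return None


def delete_scraper(name: str) -> None:
    """Deleting (unlike disabling) removes the scraper from every data type at once."""
    for sources in priorities.values():
        if name in sources:
            sources.remove(name)


# Pretend only themoviedb.org has anything for this movie.
only_tmdb = lambda source: "backdrop.jpg" if source == "themoviedb.org" else None
print(first_hit("fanart", only_tmdb))   # found via themoviedb.org
print(first_hit("details", only_tmdb))  # None: disabled for details
```

Calling `delete_scraper("themoviedb.org")` would also remove it from the cover and fanart lists, which is exactly what the post avoids by disabling it instead.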

    However, you also need to pay attention to the "Primary Source" field when you look at a movie entry in the "Movie Manager" tab. If this is blank or lists a scraper other than "IMDb+", then the IMDb+ scraper is *NOT* used when you refresh the movie. To fix this you have to send the movie back to the importer, and as long as IMDb+ is in the 1st priority position it will be assigned as the primary source.

    You can also fix this in other ways, such as editing the .db3 database file with an SQLite editor, but it is much easier to use the methods above to avoid any problems.

    PS: All my own movies had been imported with the imdb.com scraper (<id>874902</id>, usually my own modified version), so I changed the ID of the IMDb+ scraper to match it. This way, when I updated the scraper script, it automatically changed the "Primary Source" of all my movies to "IMDb+", so a simple refresh was enough to try out the new scraper version. That is how I discovered the bugs in v3.0.9 that led to v3.1.0, while manually refreshing roughly 350 movies, a mix of foreign and English titles.

    Normally I would then select my entire collection (select the first movie, then press SHIFT+END) and refresh all of them, but I have a few series grouped by title via manual title changes. So I need to find and verify that list first, so I can redo all the title changes after a mass update and keep the series grouped.

    PPS: "Seven Swords (Uncut).avi" is somewhat of a gamble when it comes to matching. The scraper will find the imdb.com search results, and Qi jian (2005) is indeed the 1st entry, but the '(Uncut)' part of your filename could prevent an automatic match. It is extra work, but if you rename the file to "Seven Swords (Uncut) [tt0429078].avi", it will always be a 100% auto-approved match. After you have imported a movie, you can actually have MovingPictures rename all your files to such a format by editing the advanced settings. Then, if for some unforeseen reason you ever have to re-import your collection, it will not require any manual verification.
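The reason the "[tt0429078]" trick works is that an IMDb title ID in the filename is unambiguous. Below is a small Python sketch of spotting and adding such an ID. MovingPictures' own ID detection is internal, so the regex here (7 or 8 digits, in brackets or parentheses) is an assumption, and `imdb_id_from_filename` and `tag_filename` are hypothetical helpers.

```python
import re
from pathlib import PurePath
from typing import Optional

# Matches an IMDb title ID such as tt0429078, whether wrapped in [...] or (...).
IMDB_ID = re.compile(r"\btt\d{7,8}\b")


def imdb_id_from_filename(filename: str) -> Optional[str]:
    """Return the first IMDb ID embedded in a filename, or None."""
    match = IMDB_ID.search(filename)
    return match.group(0) if match else None


def tag_filename(filename: str, imdb_id: str) -> str:
    """Hypothetical helper: append ' [ttNNNNNNN]' before the file extension."""
    path = PurePath(filename)
    return f"{path.stem} [{imdb_id}]{path.suffix}"


print(imdb_id_from_filename("Seven Swords (Uncut).avi"))  # None: no ID, title search needed
print(tag_filename("Seven Swords (Uncut).avi", "tt0429078"))
```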
     

    RoChess

    • #58
    Re: IMDb+ Scraper (short/long summary, imdb/RT score, US/UK rating, and more)

    Sorry for the quick updates the last few days, but now that I can finally test the scraper on my full collection I find rare bugs a lot faster.

    Since speed is still a main concern for some users, I tried to add as much optimization as possible. The regular expression code for covers is super fast now, so that should help, and I improved the speed of the search node.

    The main delay remains the RottenTomatoes rating, because obtaining the RT webpage adds extra time. I actually considered making the imdb.com rating the default, but then today I found out that RottenTomatoes sometimes has summaries ("synopsis", as they call them) for movies that lack a summary on the imdb.com website. Unfortunately RottenTomatoes does not have short synopses, so this does conflict with the default short summary setting of the scraper, but I thought it was better to have a long synopsis than an empty short summary.

    I also did a couple of test imports to measure the speeds of the RT and imdb.com rating settings.

    Default RT rating:

    30-Mar-2011 00:45:31 Info [ MovieImporter]: Watcher queued The Matrix (tt0133093).avi for processing.
    ....
    30-Mar-2011 00:45:35 Info [ FileBasedResource]: Added cover art for "The Matrix" from: http://...
    30-Mar-2011 00:45:36 Info [ FileBasedResource]: Added backdrop for "The Matrix" from: http://...
    ....
    30-Mar-2011 00:45:36 Info [ MovieImporter]: Added "The Matrix" (1999).

    So that took 5 seconds total for details, cover and backdrop. Then I removed the movie from the import folder so it was removed from MovingPictures, deleted the thumbs folder, adjusted the scraper settings, and re-imported the scraper with imdb.com ratings enabled.

    30-Mar-2011 00:49:36 Info [ MovieImporter]: Watcher queued The Matrix (tt0133093).avi for processing.
    ....
    30-Mar-2011 00:49:38 Info [ FileBasedResource]: Added cover art for "The Matrix" from: http://...
    30-Mar-2011 00:49:39 Info [ FileBasedResource]: Added backdrop for "The Matrix" from: http://...
    ....
    30-Mar-2011 00:49:39 Info [ MovieImporter]: Added "The Matrix" (1999).

    So that time it took 3 seconds.
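The 5-second and 3-second figures are simply the difference between the "Watcher queued" and "Added" timestamps. A small Python sketch that computes them from the log lines above is shown next. The helper names are hypothetical, and because the log only has 1-second resolution, each figure can be off by up to about a second.

```python
from datetime import datetime

# Timestamp format of the log lines above, e.g. "30-Mar-2011 00:45:31".
LOG_TIME_FORMAT = "%d-%b-%Y %H:%M:%S"


def log_time(line: str) -> datetime:
    """Parse the date and time fields at the start of a log line."""
    date_part, time_part = line.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", LOG_TIME_FORMAT)


def import_seconds(queued_line: str, added_line: str) -> float:
    """Elapsed seconds between 'Watcher queued' and 'Added' (1-second log resolution)."""
    return (log_time(added_line) - log_time(queued_line)).total_seconds()


rt = import_seconds(
    "30-Mar-2011 00:45:31 Info [ MovieImporter]: Watcher queued The Matrix (tt0133093).avi for processing.",
    '30-Mar-2011 00:45:36 Info [ MovieImporter]: Added "The Matrix" (1999).',
)
imdb = import_seconds(
    "30-Mar-2011 00:49:36 Info [ MovieImporter]: Watcher queued The Matrix (tt0133093).avi for processing.",
    '30-Mar-2011 00:49:39 Info [ MovieImporter]: Added "The Matrix" (1999).',
)
print(rt, imdb)  # 5.0 3.0
```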

    Now, this is only a single-movie test, so it is not very accurate, but it gives you a rough idea of the difference. For me personally it makes no difference how long each import takes, because imports happen in the background while I watch something else. Those of you who do care might prefer the imdb.com rating, so adjust your settings accordingly.

    Oh, and in case it wasn't clear, v3.1.1 is now available to download from the first post :D
     

    RoChess

    • #59
    Re: IMDb+ Scraper (short/long summary, imdb/RT score, US/UK rating, and more)

    Seems virus scanners don't like my IMDb+ scraper, especially when scraper-debug mode is enabled.

    Microsoft Security Essentials (and I suspect other anti-virus solutions as well) put a serious delay on things, so keep that in mind if you experience slow import behaviour.

    After I disabled MSE, import times dropped below 1 second with scraper-debug enabled; before that, they would sometimes take as long as 3 minutes with the detailed scraper logging active.

    I do not like to recommend disabling any protection, so I'm going to do more testing first.
     

    RoChess

    • #60
    Re: IMDb+ Scraper (short/long summary, imdb/RT score, US/UK rating, and more)

    Ok, there are two solutions for Microsoft Security Essentials that worked for me on my setup.

    Open the Microsoft Security Essentials window and click on the "Settings" tab at the top.

    Method 1 = Exclude processes:

    1. Click on "Excluded Processes" at the left side
    2. Use the "Browse" button at the right side
    3. Navigate to the folder that contains your MediaPortal.exe and add this file via double-click or single-click + OK
    4. Use the "Add" button to confirm you want this process excluded.
    5. Repeat steps 2 to 4 for PluginConfigLoader.exe if you use that as well
    6. Save changes and you are good to go

    Method 2 = Exclude folder:

    1. Click on "Excluded files and locations" at the left side
    2. Use the "Browse" button at the right side
    3. Navigate to the folder that contains your MediaPortal data, for example: "C:\ProgramData\Team MediaPortal"
    4. Use the "Add" button to confirm you want this folder excluded.
    5. Save changes and you are good to go

    I personally prefer method 2, as it would still protect my system when navigating to a virus-infected website with a browser plugin running inside the MediaPortal.exe process.

    If you use another anti-virus or malware protection that is causing delays, then read their respective documentation on how to exclude processes/folders as stated above.

    PS: I'm working on v3.1.2 of the scraper to improve detection of English titles, and to add RottenTomatoes runtime information when this info is missing from imdb.com (I discovered that while testing on the movie Baby (2008)). I'm down to fixing very rare situations, so I'll delay the release in case I find other bugs.
     
