Article removal issues & issues with some "English origin" titles translated to localized titles

Discussion in 'IMDb+' started by Jovi, October 8, 2014.

  1. Jovi

    Jovi Portal Member

    Joined:
    October 8, 2012
    Messages:
    8
    Likes Received:
    0
    Gender:
    Male
    Ratings:
    +0 / 0
    Home Country:
    Finland Finland
    Hi RoChess!

    Thank you for your hard efforts developing this nice scraper :)



    The thread in the MovingPictures section got closed without any warning, so I'll continue it here.


    I'm not so sure IMDb+ is the root cause, as article removal does not seem to work no matter which scraper you choose; if you choose the scraper manually, article filtering just does not work.

    My other problem is indeed IMDb+ related. I would like to have Finnish, Swedish, and Mandarin/Cantonese titles with their original names (or to have both the English international title and the original-language title). "English world" movies should keep their original names, not localized names. However, no matter which IMDb+ options I choose, dozens of English titles are scraped with their Finnish titles when the options to get original-language names are enabled. Just to mention a few: Life of Pi, Apocalypse Now, As Good as It Gets, etc.
     
  3. RoChess
    • Premium Supporter

    RoChess Extension Developer

    Joined:
    March 10, 2006
    Messages:
    4,192
    Likes Received:
    1,310
    Ratings:
    +1,691 / 2
    With advanced manipulation of IMDb+ settings, this is all possible. It could also be that you stumbled on a bug though.

    I would like to focus on the bug first, meaning the titles that should have been English but got you a Finnish localized title instead.

    I just need you to import 'one' of them in a clean-slate situation with a lot of debug options enabled, and then provide me the movingpictures.log file.

    Please do the following: http://code.google.com/p/imdbplus/wiki/DebugIMDb

    IMPORTANT: For the prerequisites you also need to do http://code.google.com/p/imdbplus/wiki/WikiInstallScraper

    Such huge log files are difficult to read, especially when asynchronous entries are made (two or more things happening at the same time), so be sure to wait long enough for MovPic to be done with all its background tasks, and then just delete the log file before you do the import "test".

    Once that is done (and the bug is fixed), I will help you adjust the IMDb+ settings to allow Finnish, Swedish, and Mandarin/Cantonese titles to use the original name, or the way I have them myself: "English title (Foreign title)". Fixing the bug should actually fix that, because it is one of the core functions of IMDb+, but it only works if IMDb.com doesn't mess it up, as is the case for you.
     
  4. Jovi

    Jovi Portal Member

    Thanks for the reply... and here we go...

    I don't want to mess with my HTPC system too much; it's a frustrating job to re-install and re-import all movies and series if something gets messed up. So I installed the same versions on my desktop PC. The results are the same. I sent one of the movies I remember having title problems with to the importer. Log file attached, hopefully it helps :)

    Configuration:
    MP1.60 Final
    StreamedMP 2.2.0.3
    IMDb+ v2.0.0.281

    Used options:
    Use the original title from the movie ON
    Special editions rename tagging support ON
    Rename titles so that series are grouped together ON
    Obtain additional information in the following language ENGLISH
    Fallback to English summary if foreign one is missing ON
    Refresh all of the fields ON
    English Rating.... USA
    Advanced country filter US|CA|GB|IE|AU|NZ
    Advanced Language filter EN|FI
    All other options OFF.
     

    Attached Files:

    Last edited: October 8, 2014
  5. RoChess
    • Premium Supporter

    RoChess Extension Developer

    Switch "Use the original title from the movie" to OFF, which should import it as "As Good As It Gets".

    Those settings work as expected when IMDb gives you the content the way it would for a user with a USA IP address. So if you want the easy way out, have MediaPortal rely on a proxy to communicate with the internet, and find one that is USA based.

    A Finnish movie would then show as:

    Purge (2012)
    "Puhdistus" (original title)

    And that is when the "Use the original title from the movie" setting works as expected.

    For you, IMDb.com screws things up for me, because it would not even show the English title; it would just show:

    Puhdistus (2012)

    And for English movies with a localized/translated title it would show:

    Elämä on ihanaa (1997)
    As Good as It Gets (original title)

    But there is no easy way for me to know when/if IMDb is doing that, which has led to this difficult situation.

    Now the good news is that I know how to fool IMDb.com into doing it without a proxy, but it will require a full rewrite of the scraper script and adding a new option to the IMDb+ plugin. For the last 2 years I have struggled to find the time for that, but I'm going to try to shoot for this Christmas again.

    In the meantime, however, there are some tricks/workarounds that will fix it.

    Then all you have to do is add FI to the Advanced settings for country and language (both are needed), and it will accept the main title as-is that IMDb gives when you load the page (assuming your browser loads it the same way MovPic ends up doing). This will for sure work for Finnish movies, and you can add SE (country) + SV (language) to do the same for Swedish movies, assuming IMDb.com shows those titles as-is in Swedish; otherwise that is not going to work.
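
    As a rough illustration of that filter change, here is a minimal Python sketch. It is not the IMDb+ scraper code; it only assumes that the Advanced country/language filters are pipe-separated regular-expression alternatives, as the filter strings in this thread suggest. The helper names `extend_filter` and `in_filter` are hypothetical.

```python
import re


def extend_filter(filter_pattern: str, *codes: str) -> str:
    """Append codes that are not already present to a pipe-separated filter."""
    parts = filter_pattern.split("|")
    for code in codes:
        if code not in parts:
            parts.append(code)
    return "|".join(parts)


def in_filter(code: str, filter_pattern: str) -> bool:
    """Return True if `code` equals one of the alternatives in the filter."""
    return re.fullmatch(filter_pattern, code) is not None


# Filter values taken from the settings quoted earlier in this thread.
country_filter = extend_filter("US|CA|GB|IE|AU|NZ", "FI", "SE")
language_filter = extend_filter("EN|FI", "FI", "SV")

print(country_filter)
print(language_filter)
print(in_filter("FI", country_filter), in_filter("HK", country_filter))
```

    Both filters have to contain the matching code, so a Finnish title would need FI in the country filter and FI in the language filter.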
     
  6. Jovi

    Jovi Portal Member

    Thank you very much for your efforts and help :)

    I tried with settings below:
    Use original title OFF
    Add foreign title ON
    Start the Title with Foreign ON
    Special editions rename tagging support ON
    Rename titles so that series are grouped together ON
    Obtain additional information in the following language ENGLISH
    Fallback to English summary if foreign one is missing ON
    Refresh all of the fields ON
    English Rating.... USA
    Advanced country filter US|CA|GB|IE|AU|NZ|FI|SE|HK|CN
    Advanced Language filter EN|FI|SV|ZH

    Cantonese doesn't seem to have an ISO language code?

    There were a few weird situations when scraping Hong Kong movies. I resent movies to the importer one by one via the GUI in MovingPictures. One weird case was this result: SAP JI SANG CIU (AS CHINESE ZODIAC) (2012) (IMDB+). However, the movie list in MovingPictures shows only the Cantonese name of the movie, with no English name at the end. And (tt0097202) didn't get its original Cantonese name, only the English name The Killer. Just to mention these two. Some movies got both titles and some didn't.

    This was weird too (tt1343092):
    The Great Gatsby - Kultahattu (as Kultahattu), or was it The Great Gatsby - Kultahattu (as The Great Gatsby); I'm not sure anymore which one the result actually was.

    I tried a proxy from a free proxy list, and it fixed the issue with As Good as It Gets. However, this proxy died the same evening :( It's too much hassle if you have to find a working one on a daily basis. Is there an option in MediaPortal or MovingPictures to use a proxy? I changed the connection settings in Windows, which affects all HTTP traffic.

    I think I will set these options to OFF, as the majority of movies I store on my server are of English origin, and I will take the time to set Foreign title/Start with foreign title to ON and resend non-English movies to the importer one by one to get their English names at the end of the title.

    Keep up the good work mate :)
     
    Last edited: October 19, 2014
  7. RoChess
    • Premium Supporter

    RoChess Extension Developer

    Altering the regular expression that IMDb+ uses, which is based on how IMDb.com responds for a USA user, can lead to extremely weird results if you do not know what it is doing.

    And yeah, the world of free proxies is terrible to rely on. Have you tried http://www.vpngate.net/en/ ?

    This obviously works easiest if the HTPC is a dedicated one, but you can otherwise launch MediaPortal via a batch script that activates the VPN, and stops it once you are done with MediaPortal.
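
    The wrapper idea could look roughly like the Python sketch below (a batch file would do the same job). It is only a sketch under stated assumptions: the VPN is set up as a Windows VPN entry with saved credentials so that the built-in `rasdial` command can dial it, the entry name `VPNGate-US` is hypothetical, and the MediaPortal path is a guess at a default install location that you would need to adjust.

```python
import subprocess
from typing import Callable, Sequence

# Assumptions: a Windows VPN entry with this (hypothetical) name exists and has
# saved credentials, and MediaPortal is installed at this path. Adjust both.
VPN_ENTRY = "VPNGate-US"
MEDIAPORTAL_EXE = r"C:\Program Files (x86)\Team MediaPortal\MediaPortal\MediaPortal.exe"


def run_with_vpn(app_cmd: Sequence[str],
                 vpn_entry: str = VPN_ENTRY,
                 runner: Callable[..., object] = subprocess.run) -> None:
    """Connect the VPN, run the application until it exits, then disconnect."""
    # Dial the VPN entry; with subprocess.run, check=True raises if dialing fails.
    runner(["rasdial", vpn_entry], check=True)
    try:
        # Blocks until the application is closed.
        runner(list(app_cmd), check=False)
    finally:
        # Always hang up, even if the application crashed.
        runner(["rasdial", vpn_entry, "/disconnect"], check=False)
```

    Calling `run_with_vpn([MEDIAPORTAL_EXE])` from a shortcut script would then bring the VPN up only while MediaPortal is running. The `runner` parameter exists only so the sequence can be tried out without touching the real VPN.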
     
  8. Jovi

    Jovi Portal Member

    Do you mean that I should use these extra country/language codes only together with a proxy or VPN server located in the USA?

    Advanced country filter US|CA|GB|IE|AU|NZ|FI|SE|HK|CN
    Advanced Language filter EN|FI|SV|ZH

    Thanks for the vpngate.net link; I didn't even know that such free VPN services exist, and I might have some other use for it too. US servers are probably hosted by the NSA and they inspect your traffic lol, but who cares. My HTPC is not completely dedicated to MediaPortal; I also watch some streams on it in a browser.
     
  9. RoChess
    • Premium Supporter

    RoChess Extension Developer

    For me as a USA user, the IMDb+ system works as expected (explained in detail before). The extra codes are for when IMDb.com gives you movie titles from the USA point of view and you want to go beyond what IMDb+ does.

    For example, if I want Asian movies to be "English title (Foreign title)", but Dutch movies to be "Dutch title"... *that* is when I would add the Dutch country+language to the regular expression, to trick IMDb+ into thinking the Dutch title is an English title and accept it as-is.
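
    To make that decision rule concrete, here is a small Python sketch. It is a simplified guess at the behaviour described in this post, not the actual IMDb+ logic; `compose_title` and its parameters are hypothetical, and the example reuses the Purge/Puhdistus title from earlier in the thread.

```python
import re


def compose_title(english_title: str, original_title: str,
                  country: str, language: str,
                  country_filter: str, language_filter: str) -> str:
    """Pick a display title from the country/language filters.

    Hypothetical rule: if both the country and the language match the
    filters, the original title is accepted as-is; otherwise the result
    is "English title (Foreign title)".
    """
    accepted_as_is = (re.fullmatch(country_filter, country) is not None
                      and re.fullmatch(language_filter, language) is not None)
    if accepted_as_is or original_title == english_title:
        return original_title
    return f"{english_title} ({original_title})"


# Finnish codes not in the filters: English title first, foreign in brackets.
print(compose_title("Purge", "Puhdistus", "FI", "FI",
                    "US|CA|GB|IE|AU|NZ", "EN"))

# Finnish codes added to both filters: the Finnish title is kept as-is.
print(compose_title("Purge", "Puhdistus", "FI", "FI",
                    "US|CA|GB|IE|AU|NZ|FI", "EN|FI"))
```

    The point of the trick is that adding a country+language pair changes which branch a movie falls into, so it should only be done for languages whose titles you want to keep untranslated.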
     