IMDb tt-ID expansion

RoChess

Extension Developer
  • Premium Supporter
  • March 10, 2006
    4,434
    1,897
    @ltfearme not sure how much you want to fix the scraper-scripts caught in the search, but please find time to fix the NFO and search node populator.

    damienhaynes/moving-pictures (annoyingly, GitHub's online search filters out a query like "\d{7}", although a local git search works; searching for "tt" still finds most references online)

    IMDb is starting to use more and more 8-digit tt-IDs, and wherever \d{7} is used the ID gets clipped to its first 7 digits, so the wrong title can be matched.

    The easy fix would be to mass-replace \d{7} with \d{7,8}, which allows only 7-digit and 8-digit IMDb tt-IDs. I'll be dead by the time IMDb uses 9 digits :whistle:

    Example that fails: Dave Chappelle: Sticks & Stones (2019) - IMDb

    Its tt-ID gets clipped to "tt1081042", which then matches "All My Children - Episode dated 3 April 1998".
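    A minimal sketch of the clipping, assuming the importer runs the pattern over free text such as an NFO file. The class name, method names and the 8-digit sample ID below are placeholders for illustration, not MovingPictures code or a real title.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TtIdClippingDemo
{
    // Old pattern: stops after exactly seven digits.
    public static string MatchOld(string text) =>
        Regex.Match(text, @"tt\d{7}", RegexOptions.IgnoreCase).Value;

    // Proposed pattern: the quantifier is greedy, so an eighth digit
    // is consumed whenever one is present.
    public static string MatchNew(string text) =>
        Regex.Match(text, @"tt\d{7,8}", RegexOptions.IgnoreCase).Value;

    public static void Main()
    {
        // Placeholder 8-digit ID, made up for this sketch.
        string nfo = "IMDb: https://www.imdb.com/title/tt12345678/";

        Console.WriteLine(MatchOld(nfo));         // tt1234567  (last digit lost)
        Console.WriteLine(MatchNew(nfo));         // tt12345678
        Console.WriteLine(MatchNew("tt0123456")); // tt0123456  (7-digit IDs still match)
    }
}
```

    Note that \d{7,8} would clip a 9-digit ID in the same way, so this only moves the limit up by one digit.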
     

    fischy667

    Super User
  • Team MediaPortal
  • Super User
  • May 5, 2010
    958
    283
    41
    Rostock
    Home Country: Germany
    I have a similar problem with How to Train Your Dragon: Homecoming (TV Short 2019) - IMDb.

    I name my files the following way:
    name (year) [imdbid]

    The imdbid for this film is tt11112140. As you can see in the attached screenshot, the last digit of the imdbid is missing and another film (imdbid tt1111214) is matched.
    9d0c95c0a72a95dcc4b005447d60c47d.jpg
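    The same clipping can be reproduced outside MovingPictures with a short sketch. It assumes the ID sits in square brackets as in the naming scheme above; the helper ExtractId and the sample file name are hypothetical and are not how MovingPictures actually parses file names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FileNameIdDemo
{
    // Hypothetical helper: pulls a bracketed tt-ID out of "name (year) [imdbid]".
    // digitPattern is the quantifier under test, e.g. @"\d{7}" or @"\d{7,8}".
    public static string ExtractId(string fileName, string digitPattern)
    {
        Match m = Regex.Match(fileName, @"\[(tt" + digitPattern + ")",
                              RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Sample file name following the "name (year) [imdbid]" scheme.
        string file = "How to Train Your Dragon - Homecoming (2019) [tt11112140].mkv";

        Console.WriteLine(ExtractId(file, @"\d{7}"));   // tt1111214  (wrong film)
        Console.WriteLine(ExtractId(file, @"\d{7,8}")); // tt11112140
    }
}
```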
     

    RoChess

    @fischy667 I added full 8-digit support to IMDb+ a while ago, so you can fix it by manually forcing a new match inside the MovPic importer: start a new search and re-add the missing 8th digit (I do it inside the GUI, but the "IMDb ID" input box in your screenshot works as well). This will then import the correct movie if your scraper-script supports it. It obviously requires manual steps; at first the problem appeared rare enough that I didn't care, but it is occurring more frequently now.

    Hopefully, @ltfearme finds time amidst all the other crazy stuff, not to mention bushfires.
     

    fischy667

    I added this movie like you said some time ago, but for better understanding I replayed the scenario and made a screenshot. And yes, hopefully MovingPictures will fully support 7- and 8-digit IMDb IDs in the near future.

    Edit: It took me some time to understand the "bushfire". Hopefully we will be hearing some good news from Australia about that too.
     

    RoChess

    @ltfearme awesome job as usual, but curious on one thing:

    damienhaynes/moving-pictures

    Rusty on C#, but to me: s = "tt" + match.Value.Substring(6, 7); would always restrict it to a 7-digit tt-ID?

    Not sure if s = "tt" + match.Value.Substring(6, 8); will work as-is for 7-digit IDs, or cause a problem. Having a breakpoint and being able to look at values is much easier, but I have yet to set up my dev box again for MediaPortal. Definitely one of my New Year's resolutions though :D
     

    ltfearme

    Community Plugin Dev
  • Premium Supporter
  • June 10, 2007
    6,751
    7,196
    Sydney
    Home Country: Australia
    Hehe, I knew you might pick up on that... I saw it as a risk to change that, because Substring would throw an exception if we try to get more characters than are available.

    I don't know too much about what that function does, so I decided to leave it as is until I understand it :)
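    For reference, a small sketch of the Substring behaviour in question. It uses plain strings rather than the real NFO parser; the class name, helper name and sample values are made up for illustration.

```csharp
using System;

public static class SubstringLengthDemo
{
    // Returns true if value.Substring(6, length) throws because
    // 6 + length runs past the end of the string.
    public static bool Throws(string value, int length)
    {
        try
        {
            value.Substring(6, length);
            return false;
        }
        catch (ArgumentOutOfRangeException)
        {
            return true;
        }
    }

    public static void Main()
    {
        string seven = "title?1234567";  // 13 chars: "title?" (6) + 7 digits
        string eight = "title?12345678"; // 14 chars: "title?" (6) + 8 digits

        Console.WriteLine(seven.Substring(6, 7)); // 1234567
        Console.WriteLine(eight.Substring(6, 7)); // 1234567 (8th digit dropped)
        Console.WriteLine(Throws(seven, 8));      // True: 6 + 8 > 13
        Console.WriteLine(Throws(eight, 8));      // False

        // The one-argument overload takes everything from index 6 onward,
        // so it cannot overrun and works for both lengths.
        Console.WriteLine(seven.Substring(6));    // 1234567
        Console.WriteLine(eight.Substring(6));    // 12345678
    }
}
```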
     

    ltfearme

    Here is the full method:
    Code:
            public static string parseFile(string filePath) {
                logger.Info("Parsing NFO file: {0}", filePath);
                // Read the nfo file content into a string
                string s = File.ReadAllText(filePath);
                // Check for the existence of an IMDb id
                Match match = Regex.Match(s, @"tt\d{7,8}", RegexOptions.IgnoreCase);
                // If success return the id, on failure return empty.
                if (match.Success) {
                    s = match.Value;
                    logger.Debug("ImdbID Found: {0}", s);
                }
                else {
                    match = Regex.Match(s, @"title\?\d{7,8}", RegexOptions.IgnoreCase);
                    if (match.Success) {
                        s = "tt" + match.Value.Substring(6, 7);
                        logger.Debug("ImdbID Found: {0}", s);
                    }
                    else {
                        s = null;
                        logger.Debug("No ImdbID Found.");
                    }
                }
                // return the string
                return s;
            }

    I think in most cases it should succeed on this:
    Code:
              logger.Info("Parsing NFO file: {0}", filePath);
                // Read the nfo file content into a string
                string s = File.ReadAllText(filePath);
                // Check for the existence of an IMDb id
                Match match = Regex.Match(s, @"tt\d{7,8}", RegexOptions.IgnoreCase);
                // If success return the id, on failure return empty.
                if (match.Success) {
                    s = match.Value;
                    logger.Debug("ImdbID Found: {0}", s);
                }

    i.e. it finds an IMDb ID in an nfo file in the form ttNNNNNNN or ttNNNNNNNN.

    Just need to understand what it is trying to find when doing :
    Code:
                    match = Regex.Match(s, @"title\?\d{7,8}", RegexOptions.IgnoreCase);
                    if (match.Success) {
                        s = "tt" + match.Value.Substring(6, 7);
                        logger.Debug("ImdbID Found: {0}", s);
                    }

    i.e. what type of values does match.Value return?
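    One way to answer that without a debugger is a throwaway console sketch like the one below. The sample inputs are assumptions about what a legacy NFO line might contain, not values taken from real files, and LegacyValue / LegacyToTt are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LegacyMatchDemo
{
    // Returns the raw match.Value for the legacy pattern ("" when nothing matches).
    public static string LegacyValue(string text) =>
        Regex.Match(text, @"title\?\d{7,8}", RegexOptions.IgnoreCase).Value;

    // Length-agnostic variant of the legacy branch: skip the 6-char
    // "title?" prefix and keep however many digits were matched.
    public static string LegacyToTt(string text)
    {
        Match match = Regex.Match(text, @"title\?\d{7,8}", RegexOptions.IgnoreCase);
        return match.Success ? "tt" + match.Value.Substring(6) : null;
    }

    public static void Main()
    {
        // Assumed sample inputs for illustration only.
        string oldStyle = "http://us.imdb.com/Title?0123456";
        string eightDigits = "Title?12345678";

        Console.WriteLine(LegacyValue(oldStyle));    // Title?0123456
        Console.WriteLine(LegacyToTt(oldStyle));     // tt0123456
        Console.WriteLine(LegacyToTt(eightDigits));  // tt12345678
    }
}
```

    So match.Value is the literal "title?" prefix (in whatever case the file used) followed by the digits, which is why the existing code skips the first 6 characters.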
     

    RoChess

    After I posted it I figured it out.

    I think in a dark old past IMDb used the title?1234567 format, before moving to tt1234567 and now tt12345678.

    If it were me, I would just purge that entire block, as Amazon is never going to reintroduce it and every entry in their system has been converted.

    The following should work fine:

    Code:
            public static string parseFile(string filePath) {
                logger.Info("Parsing NFO file: {0}", filePath);
                // Read the nfo file content into a string
                string s = File.ReadAllText(filePath);
                // Check for the existence of an IMDb id
                Match match = Regex.Match(s, @"tt\d{7,8}", RegexOptions.IgnoreCase);
                // If success return the id, on failure return empty.
                if (match.Success) {
                    s = match.Value;
                    logger.Debug("ImdbID Found: {0}", s);
                }
                else {
                    s = null;
                    logger.Debug("No ImdbID Found.");
                }
                // return the string
                return s;
            }

    Another solution would be to adjust the RegExp capture group and do a RegExp replace where "title" is replaced by "tt", so it gives the same result and the RegExp itself sorts out the 7- or 8-digit length.
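    A rough sketch of that capture-group idea, untested against MovingPictures itself. ParseText and IdPattern are hypothetical names, and unlike the current method this takes whichever form appears first in the text instead of preferring "tt" over the legacy form.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NormalizeIdDemo
{
    // Accepts either the "tt" prefix or the legacy "title?" prefix
    // and captures the 7 or 8 digits in group 1.
    private static readonly Regex IdPattern =
        new Regex(@"(?:tt|title\?)(\d{7,8})", RegexOptions.IgnoreCase);

    // Hypothetical stand-in for the body of parseFile, working on text
    // that has already been read from the NFO file.
    public static string ParseText(string text)
    {
        Match match = IdPattern.Match(text);
        if (!match.Success)
            return null;

        // Replace the whole match with "tt" plus the captured digits ($1).
        // No Substring call, so the digit count never has to be hard-coded.
        return IdPattern.Replace(match.Value, "tt$1");
    }

    public static void Main()
    {
        Console.WriteLine(ParseText("Title?12345678"));     // tt12345678
        Console.WriteLine(ParseText("see TT0123456 here")); // tt0123456
        Console.WriteLine(ParseText("no id") == null);      // True
    }
}
```

    A side effect of the replace is that the prefix always comes out as lowercase "tt", whereas match.Value in the current code keeps the case found in the file.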

    My vote is to purge it, cleaner code and time to dump legacy (y)
     
