[Approved] Improve movie name and year parsing

Brownard

Development Group
  • Team MediaPortal
  • March 21, 2007
    Home Country: United Kingdom
    I've changed the name parsing for movies so that '.' and '_' are replaced with spaces (as already happens for series parsing), and changed the year regex so that years no longer have to be surrounded by parentheses. It gives much better results for me.
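A minimal Python sketch of the idea, for illustration only: it is not the code in the attached patch, and the function name `parse_movie_name` and the year pattern (a standalone 19xx/20xx number, optionally wrapped in parentheses or square brackets) are assumptions. It turns '.' and '_' into spaces, then takes the last year-like token that still leaves a non-empty title, so names that start with a number keep it.

```python
import os
import re
from typing import Optional, Tuple

# Assumed year pattern: a standalone 19xx/20xx number, optionally wrapped in () or [].
YEAR_RE = re.compile(r"[(\[]?\b((?:19|20)\d{2})\b[)\]]?")


def parse_movie_name(file_name: str) -> Tuple[str, Optional[int]]:
    """Hypothetical helper: split a movie file name into (title, year)."""
    # Assumes the input carries an extension such as ".mkv"; splitext removes it.
    stem, _ext = os.path.splitext(file_name)
    # Treat '.' and '_' as word separators, as series parsing already does.
    cleaned = re.sub(r"[._]+", " ", stem)
    # Prefer the last year-like token that still leaves a non-empty title,
    # so "2001.A.Space.Odyssey.1968" keeps "2001" as part of the title.
    for match in reversed(list(YEAR_RE.finditer(cleaned))):
        title = " ".join(cleaned[: match.start()].split())
        if title:
            return title, int(match.group(1))
    # No usable year: return the cleaned name with whitespace collapsed.
    return " ".join(cleaned.split()), None


if __name__ == "__main__":
    for name in ("The.Artist.2011.720p.BluRay.mkv", "Iron_Sky_(2012).avi", "Hangover.mkv"):
        print(name, "->", parse_movie_name(name))
```

Dropping the parentheses requirement means any standalone 19xx/20xx number is treated as a year, so a wider set of messy file-name test cases is the natural way to catch false positives.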
     

    Attachments

    • Improve-movie-name-and-year-parsing.patch
      2.6 KB

    morpheus_xx

    Retired Team Member
  • Team MediaPortal
  • March 24, 2007
    Home Country: Germany
    We have some unit tests for Movie/Series matching here:
    Could you add some example name patterns to TheMovieDb test class? Currently there are only "clean" names included, but for testing the changed regexps it would be good to consider more file names (like in SeriesNameMatcher).

    Can you create a branch for this change (including Test project), please?
     

    Brownard

    Will do, though my broadband has completely died on me today, so it could be a while :-(
     

    Brownard

    The MovieMatchesAreCorrect unit test is failing for me, even before my modifications. Can you confirm whether it's working for you?
    I modified the NoLocalization class to force the German CultureInfo, which fixed some of the matches, but these titles are still failing:

    "Unbeugsam - Defiance"
    "The Artist"
    "Quarantäne"
    "Oben"
    "Iron Sky"
    "Hangover"

    "Unbeugsam - Defiance" fails the Levenshtein check, the rest fail because there are multiple exact matches.

    Thanks
     

    morpheus_xx

    I have to admit that I haven't run them recently, so they may well have been failing already :oops:
    The problem with this "test" is the dynamic nature of online databases: changed data causes different results. This is what happened to "Unbeugsam - Defiance", which was changed online to "Defiance - Unbeugsam".

    Not sure how we should handle this...
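The Levenshtein check that "Unbeugsam - Defiance" fails compares the expected title with the one returned online. A minimal Python sketch, assuming a plain character-level edit distance and a hypothetical limit of 2 edits (the thread doesn't state the test's real threshold, and `levenshtein` / `is_close_match` are illustrative names), shows why swapping the two halves of a title is fatal while small spelling differences pass.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the row short; the distance is symmetric
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                current[j - 1] + 1,            # insertion
                previous[j] + 1,               # deletion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def is_close_match(expected: str, found: str, max_distance: int = 2) -> bool:
    """Hypothetical check; the limit of 2 edits is an assumption, not the test's value."""
    return levenshtein(expected.lower(), found.lower()) <= max_distance


if __name__ == "__main__":
    print(levenshtein("Unbeugsam - Defiance", "Defiance - Unbeugsam"))
    print(is_close_match("Quarantäne", "Quarantane"))  # True: one substitution
```

Reordering the halves changes almost every character position, so the distance comes out at ten or more edits, whereas "Quarantäne" vs "Quarantane" differs by a single substitution. A comparison that ignores word order would tolerate this kind of online change, at the cost of accepting more false matches.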
     
