[fixed] Series Plugin - Error in parsing Series Numbering

JimCatMP

Documentation Group
  • Team MediaPortal
  • April 1, 2010
    647
    43
    Leeds
    I have a small collection of Top Gear episodes.

    On import, \\BNAS\Disk1\TV\All\Comedy\Top Gear\Series 11\01 - Cop Car.mkv is identified as S01E01.

    The same happens with episodes in Series 14, 16 & 17, which are identified as S04E06, S06E02 and so on. I have NO Series 4, 6 or 7 episodes, so there is no directory to match against.

    Also, S08E03 is displayed twice: one is really Series 8/03 - Amphibious Car.mkv and the other is really Series 18/03 - Sweeny.mkv. The correct thumbnails have been generated.

    It seems my sparse directory structure is playing havoc here. Where I have all the episodes, identification is 100% solid.

    Client/Server on same unit, Win 7 64 bit.

    If you need anything more in terms of files or info, I'm happy to provide it.

    Cheers - JCMP
     


    chefkoch

    Retired Team Member
  • Premium Supporter
  • October 5, 2004
    3,130
    113
    Dresden / Munich / Maastricht
    JimCatMP said:
    On import, \\BNAS\Disk1\TV\All\Comedy\Top Gear\Series 11\01 - Cop Car.mkv is identified as S01E01.

    What exactly is your naming scheme?
    I guess you meant Season here instead of Series?

    The currently implemented regular expressions can be found in the source code: https://github.com/MediaPortal/MediaPortal-2/blob/dev/MediaPortal/Source/Extensions/MetadataExtractors/SeriesMetadataExtractor/NameMatchers/SeriesMatcher.cs#L44-L60
    Only one regex is listed as a folder + filename pattern for series/episode identification.

    While this will be configurable at some point in the future, I would suggest that you tag your mkv files properly.
    MediaPortal 2 is able to read the series name, season and episode index from mkv files.
    This way it no longer matters which folder structure you have or how you name your files.
    It is similar to your music files, which you want identified by their ID3 tags instead of the filename ;)

    If you re-muxed your files to mkv, you've already done the most important part. While you can add an XML file as tags when muxing, it is also possible to access the tags without re-muxing.
    More information about Matroska tags, including examples, can be found on the Matroska homepage.
    To make it easier to edit those tags, I recently started a small tool, MatroskaTagger.
    It does not support many tags yet, but the most important ones for identification are included.
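
    As a rough illustration of the tagging approach: the Matroska tagging specification describes a TV episode with nested targets (TargetTypeValue 70 for the collection, 60 for the season, 50 for the episode). The Python sketch below builds such a tags XML. Which of these tags MediaPortal 2 actually reads is an assumption here, and build_tags is a hypothetical helper, not part of MatroskaTagger.

```python
import xml.etree.ElementTree as ET


def add_tag(root, target_type_value, fields):
    """Append one <Tag> with a <Targets> level and its <Simple> name/value pairs."""
    tag = ET.SubElement(root, "Tag")
    targets = ET.SubElement(tag, "Targets")
    ET.SubElement(targets, "TargetTypeValue").text = str(target_type_value)
    for name, value in fields.items():
        simple = ET.SubElement(tag, "Simple")
        ET.SubElement(simple, "Name").text = name
        ET.SubElement(simple, "String").text = value


def build_tags(series, season, episode, title):
    """Hypothetical helper: series name, season index and episode index as Matroska tags."""
    root = ET.Element("Tags")
    add_tag(root, 70, {"TITLE": series})                 # 70 = collection (the series)
    add_tag(root, 60, {"PART_NUMBER": str(season)})      # 60 = season
    add_tag(root, 50, {"PART_NUMBER": str(episode),      # 50 = episode
                       "TITLE": title})
    return ET.tostring(root, encoding="unicode")


xml_text = build_tags("Top Gear", 11, 1, "Cop Car")
print(xml_text)
```

    The resulting XML can then be written into the file with a Matroska tag editor, so the folder and file names no longer matter for identification.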
     

    Smeulf

    Test Group
  • Team MediaPortal
  • October 27, 2010
    672
    43
    France
    Hi,

    IMO, there's a problem in the regular expression, as it only picks up the last digit of the season number. Please see attached picture. I changed 11 to 15 for the example, easier to see ;)

    All the best.

    [Edit] I spent half an hour trying to find out why that expression does not match correctly, and I'm definitely not good enough :( [/Edit]
     


    JimCatMP

    chefkoch said:
    What exactly is your naming scheme?
    I guess you meant Season here instead of Series?

    No, by Series I mean Series - British, don't you know! Can you infer Season from Series? Yes :)

    chefkoch said:
    The currently implemented regular expressions can be found in the source code: https://github.com/MediaPortal/MediaPortal-2/blob/dev/MediaPortal/Source/Extensions/MetadataExtractors/SeriesMetadataExtractor/NameMatchers/SeriesMatcher.cs#L44-L60
    Only one regex is listed as a folder + filename pattern for series/episode identification.

    I've not had a look at that, and I'm fairly poor at regex :whistle:. However, I had a brief discussion with Morpheus last year:

    morpheus_xx said:
    Thanks JimCatMP! I've checked your file list: your naming convention is already supported in the current version (no release yet). So after all I think we are on a good way with MP2 series support :)

    JimCatMP said:
    Great news Morpheus, look forward to new release & a chance to do some testing.

    TTFN - J

    This is not about the naming convention but a glitch in the regex parsing. If it were just down to my naming, I would not have posted it as a bug :(

    chefkoch said:
    While this will be configurable at some point in the future, I would suggest that you tag your mkv files properly.
    MediaPortal 2 is able to read the series name, season and episode index from mkv files.
    This way it no longer matters which folder structure you have or how you name your files.
    It is similar to your music files, which you want identified by their ID3 tags instead of the filename ;)

    If you re-muxed your files to mkv, you've already done the most important part. While you can add an XML file as tags when muxing, it is also possible to access the tags without re-muxing.
    More information about Matroska tags, including examples, can be found on the Matroska homepage.
    To make it easier to edit those tags, I recently started a small tool, MatroskaTagger.
    It does not support many tags yet, but the most important ones for identification are included.

    While the example is specific to MKV files, I suspect the same holds for AVI etc. I'll copy in some valid AVI files (and MP4s) from another directory and retest (running a re-import at the moment).

    Cheers - JCMP

    Update - AVI & MP4 import - still incorrectly identified.

    The Regex:

    new Regex(@"(?<series>[^\\]+) - \((?<episode>.*)\) S(?<seasonnum>[0-9]+?)[\s|\.|\-|_]{0,1}E(?<episodenum>[0-9]+?)", RegexOptions.IgnoreCase),
    // "Series 1x1 - Episode" and multi-episodes "Series 1x1_2 - Episodes"


    Should it not include something like (?<seasonnum>[0-9]{1,3})? That would allow for a season index of 100+, e.g. for Doctor Who PRE 2005 ;) As I said, I'm not a regex person, and I'm fairly sure just sticking in the above would break it :)
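
    The question can be checked with a small experiment. This is a minimal sketch in Python rather than .NET, using only the season/episode tail of the pattern quoted above; the sample strings and the CAPPED variant are made up for illustration.

```python
import re

# Tail of the quoted pattern, unchanged (lazy "+?" quantifiers included).
QUOTED = re.compile(r"S([0-9]+?)[\s|\.|\-|_]{0,1}E([0-9]+?)", re.IGNORECASE)
# Hypothetical variant with the proposed {1,3} limit on the season number.
CAPPED = re.compile(r"S([0-9]{1,3})[\s|\.|\-|_]{0,1}E([0-9]+?)", re.IGNORECASE)


def season(pattern, text):
    """Return the captured season digits, or None if the pattern does not match."""
    m = pattern.search(text)
    return m.group(1) if m else None


for sample in ["S1E02", "S11E01", "S123E04", "S1234E05"]:
    print(sample, season(QUOTED, sample), season(CAPPED, sample))
```

    Both variants capture one, two or three season digits. The {1,3} form only adds an upper bound, so it rejects the four-digit sample that the "+" form accepts.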

    TTFN
     

    morpheus_xx

    Lead Dev MP2
  • Team MediaPortal
  • March 24, 2007
    11,184
    113
    JimCatMP said:
    The Regex:
    new Regex(@"(?<series>[^\\]+) - \((?<episode>.*)\) S(?<seasonnum>[0-9]+?)[\s|\.|\-|_]{0,1}E(?<episodenum>[0-9]+?)", RegexOptions.IgnoreCase),
    // "Series 1x1 - Episode" and multi-episodes "Series 1x1_2 - Episodes"
    Should it not include .... <seasonnum>[0-9]{1,3}........... Allowed for Season index of +100 for Doctor Who PRE 2005 ;) - or something similar, as said, not a regex person - fairly sure just sticking in the above would break it:)

    The "+" after [0-9] already matches one or more digits:

    Quote from wikipedia:
    "+ Matches the preceding element one or more times. For example, ba+ matches "ba", "baa", "baaa", and so on."

    Smeulf said:
    Hi,

    IMO, there's a problem in the regular expression, as it only picks up the last digit of the season number. Please see attached picture. I changed 11 to 15 for the example, easier to see ;)

    All the best.

    [Edit] I spent half an hour trying to find out why that expression does not match correctly, and I'm definitely not good enough :( [/Edit]

    I will check this error, thanks for testing. Didn't know this tool yet, going to use it myself ;)

    Also, it seems to be time to allow custom patterns to be read from an XML configuration.
     

    morpheus_xx

    This expression fixes the series parsing for multiple digits in paths:
    Code:
    (?<series>[^\\]*)\\[^\\|\d]*(?<seasonnum>\d+)[^\\]*\\(?<episodenum>\d+)\s*-*\s*(?<episode>.*)\.
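
    To see why excluding digits matters, here is a minimal Python sketch. Two things are assumptions: the BEFORE pattern is a hypothetical reconstruction (the expression above with a plain [^\\]* in front of the season group), not a copy of the MediaPortal source, and .NET's (?<name>...) groups are rewritten as Python's (?P<name>...).

```python
import re

PATH = r"\\BNAS\Disk1\TV\All\Comedy\Top Gear\Series 11\01 - Cop Car.mkv"

# ASSUMPTION: hypothetical "before" pattern. The greedy [^\\]* in front of the
# season group may also consume digits, so it backtracks just far enough to
# leave a single digit for \d+.
BEFORE = re.compile(
    r"(?P<series>[^\\]*)\\[^\\]*(?P<seasonnum>\d+)[^\\]*\\(?P<episodenum>\d+)\s*-*\s*(?P<episode>.*)\."
)
# Expression from the post above: [^\\|\d]* stops at the first digit, so \d+
# receives the whole season number.
AFTER = re.compile(
    r"(?P<series>[^\\]*)\\[^\\|\d]*(?P<seasonnum>\d+)[^\\]*\\(?P<episodenum>\d+)\s*-*\s*(?P<episode>.*)\."
)


def parse(pattern, path):
    """Return the named groups as a dict, or None if nothing matches."""
    m = pattern.search(path)
    return m.groupdict() if m else None


print(parse(BEFORE, PATH)["seasonnum"])  # 1  (reported as S01E01)
print(parse(AFTER, PATH)["seasonnum"])   # 11
```

    Under that assumption the sketch reproduces the reported symptom: "Series 11" is read as season 1, and "Series 18" would collide with "Series 8".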
     

    JimCatMP

    morpheus_xx said:
    Quote from wikipedia:
    "+ Matches the preceding element one or more times. For example, ba+ matches "ba", "baa", "baaa", and so on."

    This regex works for all paths of the form "path\Series Name\S[eries|eason|affel|*] NN\XX - name.ext", where NN can be N and XX can be X, at least for the cases I've tested (not exhaustive) :-?



    (?<series>[^\\]*)\\[^\\]* (?<seasonnum>\d+)[^\\]*\\(?<episodenum>\d+)\s*-*\s*(?<episode>.*)\.
    ^ (single space added between [^\\]* and the seasonnum group).

    It does NOT work for "path\Series Name\NN S[eries|eason|affel|*]\XX - name.ext" but this does...

    (?<series>[^\\]*)\\(?<seasonnum>\d+)[^\\]*\\(?<episodenum>\d+)\s*-*\s*(?<episode>.*)\.

    I know that's just patching the problem, not fixing the regex, but it's the best I can do (and nice tool, Smeulf).
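
    The behaviour of the two variants can be sketched in Python as follows. The named groups are rewritten from .NET's (?<name>...) to Python's (?P<name>...), and the "11 Series" path is a made-up example of the number-first layout.

```python
import re

# Variant with the added space: expects "<word(s)> NN" as the season folder.
SEASON_LAST = re.compile(
    r"(?P<series>[^\\]*)\\[^\\]* (?P<seasonnum>\d+)[^\\]*\\(?P<episodenum>\d+)\s*-*\s*(?P<episode>.*)\."
)
# Variant for "NN <word(s)>" season folders: the number must come first.
SEASON_FIRST = re.compile(
    r"(?P<series>[^\\]*)\\(?P<seasonnum>\d+)[^\\]*\\(?P<episodenum>\d+)\s*-*\s*(?P<episode>.*)\."
)

WORD_THEN_NUMBER = r"\\BNAS\Disk1\TV\All\Comedy\Top Gear\Series 11\01 - Cop Car.mkv"
NUMBER_THEN_WORD = r"\\BNAS\Disk1\TV\All\Comedy\Top Gear\11 Series\01 - Cop Car.mkv"  # made-up path


def season(pattern, path):
    """Return the captured season number, or None if the path is not matched."""
    m = pattern.search(path)
    return m.group("seasonnum") if m else None


for path in (WORD_THEN_NUMBER, NUMBER_THEN_WORD):
    print(path, season(SEASON_LAST, path), season(SEASON_FIRST, path))
```

    Each variant handles exactly one of the two layouts, which is why neither is a general fix on its own.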

    TTFN - JCMP.
     

    morpheus_xx

    I've tried your regex; it doesn't match all of my constructed test cases:
    \\BNAS\Disk1\TV\All\Comedy\Top Gear\Series 111\01 - Cop Car.mkv
    \\BNAS\Disk1\TV\All\Comedy\Top Gear\13\01 - Cop Car.mkv
    \\BNAS\Disk1\TV\All\Comedy\Top Gear\12. Staffel\01 - Cop Car.mkv
    \\BNAS\Disk1\TV\All\Comedy\Top Gear\Temporada 11\01 - Cop Car.mkv
    whereas my version from above works on all of them:
    (?<series>[^\\]*)\\[^\\|\d]*(?<seasonnum>\d+)[^\\]*\\(?<episodenum>\d+)\s*-*\s*(?<episode>.*)\.

    Any more test cases for the pattern? If we still can improve it, I will continue :)
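
    The four constructed paths above can be turned into a small regression check. This is a sketch in Python, not the MediaPortal test code; the pattern is the one from this post with the named groups rewritten to Python's (?P<name>...) syntax, and EXPECTED_SEASONS is a name introduced here.

```python
import re

PATTERN = re.compile(
    r"(?P<series>[^\\]*)\\[^\\|\d]*(?P<seasonnum>\d+)[^\\]*\\(?P<episodenum>\d+)\s*-*\s*(?P<episode>.*)\."
)

# Test paths from the post, mapped to the season number each should yield.
EXPECTED_SEASONS = {
    r"\\BNAS\Disk1\TV\All\Comedy\Top Gear\Series 111\01 - Cop Car.mkv": "111",
    r"\\BNAS\Disk1\TV\All\Comedy\Top Gear\13\01 - Cop Car.mkv": "13",
    r"\\BNAS\Disk1\TV\All\Comedy\Top Gear\12. Staffel\01 - Cop Car.mkv": "12",
    r"\\BNAS\Disk1\TV\All\Comedy\Top Gear\Temporada 11\01 - Cop Car.mkv": "11",
}


def check(pattern, cases):
    """Return a list of (path, expected, actual) tuples for every failing case."""
    failures = []
    for path, expected in cases.items():
        m = pattern.search(path)
        actual = m.group("seasonnum") if m else None
        if actual != expected or m.group("series") != "Top Gear":
            failures.append((path, expected, actual))
    return failures


failures = check(PATTERN, EXPECTED_SEASONS)
print("all cases pass" if not failures else failures)
```

    New naming schemes can be added to EXPECTED_SEASONS as further test cases come up.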
     

    Smeulf

    morpheus_xx said:
    This expression fixes the series parsing for multiple digits in paths:
    Code:
    (?<series>[^\\]*)\\[^\\|\d]*(?<seasonnum>\d+)[^\\]*\\(?<episodenum>\d+)\s*-*\s*(?<episode>.*)\.

    That one seems really good whatever the string is; it matches fine in multiple cases.

    But it can fail in one specific case: if there is a number in the series name AND the season folder name contains nothing but the season number.

    Please see attached pic.

    JimCatMP said:
    and nice tool Smeulf

    Yep, very useful indeed :)

    Cheers.
     


    morpheus_xx

    ok, next version:
    (?<series>[^\\]*)\\[^\\|\d]*(?<seasonnum>\d{1,3})[^\\|\d]*\\(?<episodenum>\d+)\s*-*\s*(?<episode>.*)\.
    It also works for BSG 2003, but limits the "Season" to 3 digits.
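
    The difference between the previous and the new version can be sketched in Python on the failing case (a number in the series name plus a bare season number as folder). The path below is a made-up example, and the named groups are rewritten to Python's (?P<name>...) syntax.

```python
import re

# Previous version: any number of season digits, anything after them.
V2 = re.compile(
    r"(?P<series>[^\\]*)\\[^\\|\d]*(?P<seasonnum>\d+)[^\\]*\\(?P<episodenum>\d+)\s*-*\s*(?P<episode>.*)\."
)
# New version: at most three season digits, and no further digits in that folder name.
V3 = re.compile(
    r"(?P<series>[^\\]*)\\[^\\|\d]*(?P<seasonnum>\d{1,3})[^\\|\d]*\\(?P<episodenum>\d+)\s*-*\s*(?P<episode>.*)\."
)

# Made-up path: year in the series folder, season folder is just "1".
PATH = r"\\BNAS\Disk1\TV\Battlestar Galactica 2003\1\01 - Example.mkv"


def parse(pattern, path):
    """Return (series, season, episode number), or None if nothing matches."""
    m = pattern.search(path)
    return (m.group("series"), m.group("seasonnum"), m.group("episodenum")) if m else None


print(parse(V2, PATH))  # ('TV', '2003', '1')
print(parse(V3, PATH))  # ('Battlestar Galactica 2003', '1', '01')
```

    The previous version treats the year as the season and the season folder as the episode. The three-digit limit makes the four-digit year fail as a season, so the match moves one folder to the right.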
     