Making RegEx list for series/movie matching extensible for users

morpheus_xx

Retired Team Member
  • Team MediaPortal
  • March 24, 2007
    12,073
    7,459
    Home Country: Germany
    Currently all regular expressions used inside our MDEs are hard coded. There are requests to make them extensible:

    It makes no sense to hard-code them: they are very specific and will be used in only 1% of cases, and even then not always. I have a lot of rules for Russian series, and each rule applies to only one or two series. There are also replacement rules that simplify the import work.
    But it is all individual, and MP1 allows me to set it up.


    Question for @MJGraf:
    I was just about to make the regular expression I use for parsing an IMDB-ID a setting and had some interesting reading about the C# regex class. Although it is probably only interesting for developers, I want to write it down somewhere so that it is not forgotten.

    So the question was: how can we serialize a C# Regex object? The Regex class implements ISerializable, but unfortunately not IXmlSerializable, which we need for our settings system. My first thought was to simply inherit from the Regex class and implement IXmlSerializable ourselves (something like public class XmlSerializableRegex : Regex, IXmlSerializable). The problem is that once a Regex object is created, it is immutable, meaning you cannot change its regular expression anymore. During deserialization, however, an object is created first and its fields are set afterwards, which doesn't work here. According to the MS reference source, we cannot do this manually either, because the classes needed to initialize all the fields of the Regex class are "internal".

    The next idea was to just serialize the regular expression as a string and create a Regex object from it on demand. The downside of this approach is that we would serialize neither the match timeout (Regex.MatchTimeout), which IMHO would be acceptable, nor the RegexOptions (such as IgnoreCase or CultureInvariant). These would then still be hard-coded, which I don't think is a good idea.

    The best solution I can currently think of is wrapping a Regex object in our own class that implements IXmlSerializable and otherwise has only one (read-only) property: Regex. Using this regex then requires a call like OurSettingsObject.ImdbRegex.Regex.Match(...), which is not particularly beautiful, but IMHO acceptable.
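    A minimal sketch of such a wrapper, assuming a hypothetical class name SerializableRegex and hypothetical XML attribute names Pattern and Options (not necessarily what MP2 ended up using). It persists only the pattern string and the RegexOptions and rebuilds the immutable Regex in ReadXml, so each deserialized setting yields exactly one Regex instance:

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical wrapper: Regex itself is immutable and not IXmlSerializable,
// so only its pattern and options are persisted and the Regex is rebuilt on load.
public class SerializableRegex : IXmlSerializable
{
    // XmlSerializer requires a public parameterless constructor.
    public SerializableRegex() { }

    public SerializableRegex(string pattern, RegexOptions options)
    {
        Regex = new Regex(pattern, options);
    }

    // The single property callers use, e.g. settings.ImdbRegex.Regex.Match(...).
    public Regex Regex { get; private set; }

    public XmlSchema GetSchema()
    {
        return null;
    }

    public void ReadXml(XmlReader reader)
    {
        // The reader is positioned on the wrapping element; read its attributes.
        string pattern = reader.GetAttribute("Pattern");
        string options = reader.GetAttribute("Options");
        RegexOptions parsed = string.IsNullOrEmpty(options)
            ? RegexOptions.None
            : (RegexOptions)Enum.Parse(typeof(RegexOptions), options);
        Regex = new Regex(pattern ?? string.Empty, parsed);
        // Consume the whole element so the outer serializer can continue.
        reader.Skip();
    }

    public void WriteXml(XmlWriter writer)
    {
        // Regex.ToString() returns the pattern passed to the constructor.
        writer.WriteAttributeString("Pattern", Regex.ToString());
        // Flag combinations are written as e.g. "IgnoreCase, CultureInvariant".
        writer.WriteAttributeString("Options", Regex.Options.ToString());
    }
}
```

    Since the wrapper reconstructs the Regex from (pattern, options), the options survive a round trip, which addresses the concern about hard-coded RegexOptions; the match timeout is deliberately not stored in this sketch.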

    And there is another advantage of this:
    We currently create lots of Regex objects on demand as local variables. This is anything but ideal from a performance perspective. The constructor of the Regex class compiles the regular expression, which is why instantiating a Regex object is a relatively time-consuming operation (OK, we are talking about milliseconds here, but if we instantiate 10 Regex objects per MediaItem during an import and the import has tens of thousands of MediaItems, we are suddenly talking about several minutes...).
    For that reason, the Regex class has an internal cache for compiled Regex objects. According to MSDN, however, this cache is only used when calling the static methods of the Regex class (which we currently don't do); instantiated Regex objects are not cached. Calling the static methods instead might not be ideal either, because by default the cache holds at most 15 Regex objects, which might not be enough for the whole MP2 system. We can set a higher limit (Regex.CacheSize), but in the end it is quite difficult to estimate the maximum number of Regex objects in the MP2 system.
    Now if we use the last approach above, we basically use our settings cache to cache our own Regex objects (meaning: every (XmlSerializable)Regex we have in a settings file is automatically cached, so that we only have one instance of it and compilation only has to happen once). According to MSDN, the Regex class is thread safe, so the whole MP2 system can use that single instance of the Regex object without problems.
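    To illustrate the two caching options discussed above, here is a sketch contrasting one shared, reused instance with the static helper that goes through the internal cache (sized by Regex.CacheSize, 15 by default). The class name ImdbIdParser and the IMDB-ID pattern tt\d{7,8} are assumptions for this example only, not the expression MP2 actually uses:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, only to contrast the two caching strategies.
public static class ImdbIdParser
{
    // Option 1: one shared instance. The pattern is parsed once, and because
    // Regex is immutable it can be used from several threads at the same time.
    private static readonly Regex ImdbIdRegex = new Regex(@"tt\d{7,8}", RegexOptions.IgnoreCase);

    public static string FindImdbId(string text)
    {
        Match match = ImdbIdRegex.Match(text);
        return match.Success ? match.Value : null;
    }

    // Option 2: the static helper. Only calls like this one go through the
    // internal cache, which keeps at most Regex.CacheSize entries.
    public static string FindImdbIdStatic(string text)
    {
        Match match = Regex.Match(text, @"tt\d{7,8}", RegexOptions.IgnoreCase);
        return match.Success ? match.Value : null;
    }
}
```

    Both return the same result; the difference is only where the parsed expression lives. With option 1 the lifetime of the instance is under our control (here a static field, in the proposal above the settings cache), independent of how many other patterns the process uses.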

    Conclusion: I'll implement the last approach described above. For now I'll put the class in the OnlineLibraries project, as that is where we will likely need it most. We can later move it to MediaPortal.Utilities.Xml, but (if I finally understood it correctly) this requires a version bump of MediaPortal.Utilities, so it might be better to do that at the end...
    If anyone has more experience with the Regex class than me and disagrees with the above, please let me know...
    Did you already implement serialization for Regex instances?

    We would need this for putting them into settings and sending them to the server (via ServerSettings).

    I could also imagine some more attributes, like:
    XML:
    <Pattern Enabled="True" Code="(?&lt;series&gt;[^\\]*)\\[^\\]*(?&lt;seasonnum&gt;\d+)[^\\]*\\S*(?&lt;seasonnum&gt;\d+)[EX](?&lt;episodenum&gt;\d+)*(?&lt;episode&gt;.*)\." Option="IgnoreCase" />
    The "Enabled" attribute could be used for temporarily disabling patterns without needing to remove them or comment them out in code. Constructing a Regex from (string, RegexOptions) is possible when we deserialize this type. With a custom class we probably don't even need to implement our own XML serialization: just use a string and an enum and create the Regex instance once both properties are set.
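    A sketch of that idea with plain properties and XmlSerializer attributes, assuming a hypothetical class name PatternEntry mapped onto the attribute names proposed above. One detail worth knowing: XmlSerializer reads booleans as xsd:boolean, so it expects lowercase true/false (or 1/0) rather than True. The Regex is built on first access, after the serializer has set Code and Option; the lazy initialization is not synchronized, so concurrent first use could create two equivalent instances, which is harmless.

```csharp
using System.Text.RegularExpressions;
using System.Xml.Serialization;

// Hypothetical element: <Pattern Enabled="true" Code="..." Option="IgnoreCase" />
[XmlRoot("Pattern")]
[XmlType("Pattern")]
public class PatternEntry
{
    private Regex _regex;

    [XmlAttribute("Enabled")]
    public bool Enabled { get; set; }

    [XmlAttribute("Code")]
    public string Code { get; set; }

    [XmlAttribute("Option")]
    public RegexOptions Option { get; set; }

    // Not serialized; created from Code and Option on first access and reused afterwards.
    [XmlIgnore]
    public Regex Regex
    {
        get { return _regex ?? (_regex = new Regex(Code, Option)); }
    }
}
```

    No IXmlSerializable implementation is needed here, because a string and an enum are types XmlSerializer handles natively; the price is that the Regex property must not be touched before deserialization has completed.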

    How should the extension be provided to users? Once the serialization is implemented, users could manually edit the .xml files inside the config folder. Would that be enough for advanced users?
    If not, what could a GUI in MP2 for editing regexes look like?

    The change itself is quite easy to make, but a config GUI would be quite complicated I think.

    @Developers @Testers
     

    aspik

    Retired Team Member
  • Team MediaPortal
  • April 14, 2008
    1,322
    586
    How should the extension be provided to users?
    I think it would be enough to manually edit the settings .xml file. Writing regex patterns in the GUI, where you mostly (always?) operate with a remote, would not work well.
     

    ajs

    Development Group
  • Team MediaPortal
  • February 29, 2008
    15,492
    10,371
    Kyiv
    Home Country: Ukraine
    Regular expressions without replacement rules are powerless. There is no flexibility.
     

    morpheus_xx

    If we work on this feature, we can add such replacement rules as well. That's why I asked you for your examples: so that we have something to test with.
     

    ajs

    That's why I've asked you for your examples to have something to test.
    Examples of files? Examples of regular expressions? Examples of replacement rules? My regular expressions are not very beautiful, but if necessary I can give them to you. I can give examples of files, and examples of replacements as well. What exactly is needed?
     

    morpheus_xx

    Yes, all of them:
    • Full paths for file names
    • Replacement rules
    • Match regexs
    And a definition of the order in which they should be applied (I guess replacements before matching).
     

    ajs

    And a definition in what order they should be applied (I guess replacements before matching)
    This is configurable in the TV Series plugin. It can be done that way, but also the other way around. You can use regular expressions, or not use them at all.
    My substitution rules and regular expressions are neither optimal nor optimized. But when I get home, I'll send them to you. However, I can only send the file names I currently have; I probably won't be able to send those of series I have already deleted.
     

    morpheus_xx

    I'm currently implementing the classes for saving regex patterns in settings.

    Do we need to define "rules" in such a way:
    1. Replace string A by string B
    2. (optionally more replaces)
    3. Match series by RegEx
    ?

    Or is a single list of replacements and then a list of matches enough?
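    For comparison, the second variant (one list of replacements applied in order, then a list of match patterns where the first hit wins) could look roughly like this. All type and member names are hypothetical, and the sketch only distinguishes literal from regex replacements:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical replacement rule: literal text or a regular expression.
public class ReplaceRule
{
    public string Find { get; set; }
    public string ReplaceWith { get; set; }
    public bool IsRegex { get; set; }

    public string Apply(string input)
    {
        if (string.IsNullOrEmpty(Find))
            return input;
        string replacement = ReplaceWith ?? string.Empty;
        return IsRegex
            ? Regex.Replace(input, Find, replacement)
            : input.Replace(Find, replacement);
    }
}

public static class SeriesNameParser
{
    // All replacements in list order first, then the first matching pattern wins.
    public static Match Parse(string fileName, IEnumerable<ReplaceRule> replaces, IEnumerable<Regex> matches)
    {
        string normalized = fileName;
        foreach (ReplaceRule rule in replaces)
            normalized = rule.Apply(normalized);

        foreach (Regex regex in matches)
        {
            Match match = regex.Match(normalized);
            if (match.Success)
                return match;
        }
        return null; // no pattern matched
    }
}
```

    For example, with rule 29 from ajs's list below (Delo.Gastronoma.N1. replaced by Delo Gastronoma N1.) and a pattern such as ^(?<series>[^.]+)\.(?<episodenum>\d+)\., the file name Delo.Gastronoma.N1.01.DVDRip.avi yields series "Delo Gastronoma N1" and episode "01"; without the replacement the same pattern does not match at all, which is why the order matters.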

    @ajs would be good if you find a few minutes for examples! :)
     

    morpheus_xx

    The default regexes look like this when saved:
    XML:
      <Property Name="Patterns">
        <ArrayOfMatchPattern xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
          <MatchPattern>
            <Enabled>true</Enabled>
            <Pattern>(?&lt;series&gt;[^\\]*)\\[^\\]*(?&lt;seasonnum&gt;\d+)[^\\]*\\S*(?&lt;seasonnum&gt;\d+)[EX](?&lt;episodenum&gt;\d+)*(?&lt;episode&gt;.*)\.</Pattern>
            <RegexOptions>IgnoreCase</RegexOptions>
          </MatchPattern>
          <MatchPattern>
            <Enabled>true</Enabled>
            <Pattern>(?&lt;series&gt;[^\\]+) - \((?&lt;episode&gt;.*)\) S(?&lt;seasonnum&gt;[0-9]+?)[\s|\.|\-|_]{0,1}E(?&lt;episodenum&gt;[0-9]+?)</Pattern>
            <RegexOptions>IgnoreCase</RegexOptions>
          </MatchPattern>
          <MatchPattern>
            <Enabled>true</Enabled>
            <Pattern>(?&lt;series&gt;[^\\]+)\W(?&lt;seasonnum&gt;\d+)x((?&lt;episodenum&gt;\d+)_?)+ - (?&lt;episode&gt;.*)\.</Pattern>
            <RegexOptions>IgnoreCase</RegexOptions>
          </MatchPattern>
          <MatchPattern>
            <Enabled>true</Enabled>
            <Pattern>(?&lt;series&gt;[^\\]+)\WS(?&lt;seasonnum&gt;\d+)[\s|\.|\-|_]{0,1}E((?&lt;episodenum&gt;\d+)_?)+ - (?&lt;episode&gt;.*)\.</Pattern>
            <RegexOptions>IgnoreCase</RegexOptions>
          </MatchPattern>
          <MatchPattern>
            <Enabled>true</Enabled>
            <Pattern>(?&lt;series&gt;[^\\]+).(?&lt;seasonnum&gt;\d+)x((?&lt;episodenum&gt;\d+)_?)+(?&lt;episode&gt;.*)\.</Pattern>
            <RegexOptions>IgnoreCase</RegexOptions>
          </MatchPattern>
          <MatchPattern>
            <Enabled>true</Enabled>
            <Pattern>(?&lt;series&gt;[^\\]+).S(?&lt;seasonnum&gt;\d+)[\s|\.|\-|_]{0,1}E((?&lt;episodenum&gt;\d+)_?)+(?&lt;episode&gt;.*)\.</Pattern>
            <RegexOptions>IgnoreCase</RegexOptions>
          </MatchPattern>
          <MatchPattern>
            <Enabled>true</Enabled>
            <Pattern>(?&lt;series&gt;[^\\]*)\\[^\\|\d]*(?&lt;seasonnum&gt;\d+)\D*\\(?&lt;episodenum&gt;\d+)\s*-\s*(?&lt;episode&gt;[^\\]+)\.</Pattern>
            <RegexOptions>IgnoreCase</RegexOptions>
          </MatchPattern>
          <MatchPattern>
            <Enabled>true</Enabled>
            <Pattern>(?&lt;series&gt;[^\\]+).\W(?&lt;seasonnum&gt;\d{1})(?&lt;episodenum&gt;\d{2})\W(?&lt;episode&gt;.*)\.</Pattern>
            <RegexOptions>IgnoreCase</RegexOptions>
          </MatchPattern>
        </ArrayOfMatchPattern>
      </Property>

    The regex instance will be created on first demand, like:
    C#:
     Regex regex;
     if (matchPattern.GetRegex(out regex))
     {
        // Do matching, e.g. regex.Match(...)
     }

    The "GetRegex" method will consider the "Enabled" flag and will return "false" if the pattern is not enabled.
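    A sketch of such a MatchPattern class, assuming only what the XML above shows: public Enabled, Pattern and RegexOptions properties, which XmlSerializer writes as child elements (a List<MatchPattern> serializes with the root element ArrayOfMatchPattern). The body of GetRegex is an assumption, not the actual MP2 code:

```csharp
using System.Text.RegularExpressions;

// Public read/write properties become the <Enabled>, <Pattern> and <RegexOptions> elements.
public class MatchPattern
{
    private Regex _regex; // private fields are not serialized

    public bool Enabled { get; set; }
    public string Pattern { get; set; }
    public RegexOptions RegexOptions { get; set; }

    // Returns false for disabled (or empty) patterns; otherwise creates the Regex
    // on first demand and hands out the same instance on later calls.
    public bool GetRegex(out Regex regex)
    {
        regex = null;
        if (!Enabled || string.IsNullOrEmpty(Pattern))
            return false;
        if (_regex == null)
            _regex = new Regex(Pattern, RegexOptions);
        regex = _regex;
        return true;
    }
}
```

    Keeping the instance in a private field means the compiled pattern lives as long as the settings object that holds it, which is the caching effect described earlier in the thread.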
     

    ajs

    would be good if you find a few minutes for examples!
    Read: https://forum.team-mediaportal.com/threads/expressions-rules-requests.21978/ and https://forum.team-mediaportal.com/threads/expressions-rules-exchange.21977/ :)

    Code:
    ID    enabled    tagEnabled    before    toreplace    with    isRegex
    0    1    0    0    .    <space>    0
    1    1    0    0    _    <space>    0
    2    1    0    0    -<space>    <empty>    0
    3    1    1    1    720p    <empty>    0
    4    1    1    1    1080i    <empty>    0
    5    1    1    1    1080p    <empty>    0
    6    1    1    1    x264    <empty>    0
    7    1    1    0    DSR    <empty>    0
    8    1    1    0    HR-HDTV    <empty>    0
    9    1    1    0    HR.HDTV    <empty>    0
    10    1    1    0    HDTV    <empty>    0
    11    1    1    0    DVDMux    <empty>    0
    12    1    0    1    rgfootball    <empty>    0
    13    1    0    1    (?<=(\s?\.?P[ar]*t\s?)) (X)?(IX|IV|V?I{0,3})    <RomanToArabic>    1
    14    1    0    1    (?<!(?:S\d+.?E\\d+\-E\d+.*|S\d+.?E\d+.*|\s\d+x\d+.*))P[ar]*t\s?(\d+)(\s?of\s\d{1,2})?     S01E${1}     1
    15    1    0    1    (?<!(?:S\d+.?E\\d+\-E\d+.*|S\d+.?E\d+.*|\s\d+x\d+\s.*))(\d{1,2})\s?of\s\d{1,2}     S01E${1}     1
    16    1    0    1    serija.iz    <empty>    0
    17    1    0    1    (    <empty>    0
    18    1    0    1    )    <empty>    0
    19    1    0    1    Tron.Uprising    Tron Uprising    0
    20    1    0    1    Doroga.na.ostrov.Pashi.    The Road to Easter Island.    0
    21    1    0    1    Dnevnik.doktora.Zaytsevoy    Dnevnik Doktora Zaitsevoy    0
    22    1    0    1    The.Legend.of.Korra    The Legend of Korra    0
    23    1    0    1    _e    .e    0
    24    1    1    1    Бонус     Бонус.s0e    0
    25    1    1    1     - 2. Серия №    .S02E    0
    26    1    1    1    Дневник доктора Зайцевой    Dnevnik Doktora Zaitsevoy    0
    27    1    1    1    2012    0.2012    0
    28    1    1    1    Метод Фрейда.    Metod Freyda.E    0
    29    1    1    1    Delo.Gastronoma.N1.    Delo Gastronoma N1.    0
    30    1    1    1    Сваты6.    Svaty.S06E    0
    31    1    1    1    Kuhnja.2.    Кухня.S02E    0
    32    1    1    1    Jeskadrilja.Chernaja.ovca    Black Sheep Squadron    0
    33    1    1    1    obratnaya_storona_luny_    Обратная сторона луны.S01E    0
    34    1    1    1    _serya    .seriya    0
    35    1    1    1    Sindrom.drakona.    Синдром дракона.S01E    0
    36    1    1    1    Однажды.в.Одессе-Жизнь.Мишки.Япончика    Однажды в Одессе: Жизнь и приключения Мишки Япончика    0
    37    1    1    1    Пепел -     Pepel.S01E    0
    38    1    1    1     серия    .Серия    0
    39    1    1    1    Убить Сталина -     Убить Сталина.S01E    0
    40    1    1    1    SATRip.nnm-club    <empty>    0
    41    1    1    1    Восьмидесятые-5.2015.    Восьмидесятые.S05E    0
    42    1    1    1    Шерлок Холмс -     Шерлок Холмс.S01E    0
    43    1    1    1    Belka.i.Strelka.Ozornaya.semeyka    Белка и Стрелка    0
    44    1    0    1    (?-i)([A-Z])\.(?=[A-Z])    ${1}    1
    45    1    0    0    <space><space>    <space>    0
    46    1    1    1    Dragons - Defenders of Berk [Season 2] 2013-2014 WEB-DL 720p [Rus, Eng]    Драконы: Всадники Олуха    0
    47    1    1    1    James.May's.Cars.of.the.People.E    James May's Cars of the People.S01E    0
    48    1    1    1    Кухня_    Кухня.S05E    0
    49    1    1    1    Ressurection    Resurrection    0
    50    1    1    1    Драконы Всадники Олуха - Dragons Riders of Berk    Драконы: Всадники Олуха    0
    51    1    1    1    Fizruk-2.Seriya.    Физрук.S02E    0
    52    1    1    1    Legavyy-2.    Legavyy.S02E    0
    53    1    1    1    Legavyy    Легавый    0
    54    1    1    1    Интерны 12 -     Интерны.S12E    0
    55    1    1    1    .WEB.400p    <empty>    0
    56    1    1    1    .SAT.400p    <empty>    0
    57    1    1    1    Ostanniy.moskal\\Ostanniy.moskal.[0-9]+\\Ostanniy.moskal.    Последний москаль.S01E    1
    58    1    1    1    Akta.Manniskor(Real.Humans)    Akta Manniskor    0
    59    1    1    1    Akta ManniskorReal Humans    Akta Manniskor    0
    60    1    1    1    Интерны 13 -     Интерны.S13E    0
    61    1    1    1    Слуга народа 2015    Слуга народа.S01    0
    62    1    1    1    \s*(\d+).Серия    .E${1}.Серия    1
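    To use rows like these from MP2 settings, each one would have to be turned into a replacement. A hypothetical sketch, based only on a reading of the column names: <space> and <empty> are taken to stand for a single space and an empty string, isRegex selects Regex.Replace over a literal replace, and disabled rows are skipped. The tagEnabled and before columns and special values such as <RomanToArabic> (rule 13) are not handled here.

```csharp
using System.Text.RegularExpressions;

// Hypothetical model of one table row: enabled, toreplace, with, isRegex.
public class ReplacementRow
{
    public ReplacementRow(bool enabled, string toReplace, string with, bool isRegex)
    {
        Enabled = enabled;
        ToReplace = ExpandTokens(toReplace);
        With = ExpandTokens(with);
        IsRegex = isRegex;
    }

    public bool Enabled { get; private set; }
    public string ToReplace { get; private set; }
    public string With { get; private set; }
    public bool IsRegex { get; private set; }

    // Assumption: <space> means " " and <empty> means "".
    private static string ExpandTokens(string value)
    {
        return (value ?? string.Empty).Replace("<space>", " ").Replace("<empty>", string.Empty);
    }

    public string Apply(string input)
    {
        // Disabled rows and rows without a search value leave the input unchanged.
        if (!Enabled || ToReplace.Length == 0)
            return input;
        return IsRegex
            ? Regex.Replace(input, ToReplace, With)
            : input.Replace(ToReplace, With);
    }
}
```

    Under these assumptions, rule 0 turns The.Legend.of.Korra into The Legend of Korra, rule 45 collapses double spaces, and the regex rule 44 turns S.H.I.E.L.D into SHIELD.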

    Code:
    ID    enabled    type    expression
    0    1    regexp    ^.*?\\?(?<series>[^\\$]+?)[ .-]+(?:[s]?(?<season>\d+)[ .-]?[ex](?<episode>\d+)|(?:\#|\-\s)(?<season>\d+)\.(?<episode>\d+))(?:[ex+-]*(?<episode2>\d+))?[ .-]*(?<title>(?![^\\]*?sample[ .-])[^$]*?)\.(?<ext>[^.]*)$
    1    1    regexp    ^.*?\\?(?<series>[^\\$]+?)(?:s(?<season>[0-3]?\d)\s?ep?(?<episode>\d\d)|(?<season>(?:[0-1]\d|(?<!\d)\d))x?(?<episode>\d\d))(?!\d)(?:[ .-]?(?:s\k<season>e?(?<episode2>\d{2}(?!\d))|\k<season>x?(?<episode2>\d{2}(?!\d))|(?<episode2>\d\d(?!\d))|E(?<episode2>\d\d))|)[ -.]*(?<title>(?![^\\]*?sample)[^\\]*?[^\\]*?)\.(?<ext>[^.]*)$
    2    1    regexp    ^(?<series>[^\\$]+)\\[^\\$]*?(?:s(?<season>[0-1]?\d)ep?(?<episode>\d\d)|(?<season>(?:[0-1]\d|(?<!\d)\d))x?(?<episode>\d\d))(?!\d)(?:[ .-]?(?:s\k<season>e?(?<episode2>\d{2}(?!\d))|\k<season>x?(?<episode2>\d{2}(?!\d))|(?<episode2>\d\d(?!\d))|E(?<episode2>\d\d))|)[ -.]*(?<title>(?!.*sample)[^\\]*?[^\\]*?)\.(?<ext>[^.]*)$
    3    1    regexp    (?<series>[^\\\[]*) - \[(?<season>[0-9]{1,2})x(?<episode>[0-9\W]+)\](( |)(-( |)|))(?<title>(?![^\\]*?sample)[^$]*?)\.(?<ext>[^.]*)
    4    1    regexp    (?<series>[^\\$]*) - season (?<season>[0-9]{1,2}) - (?<title>(?![^\\]*?sample)[^$]*?)\.(?<ext>[^.]*)
    5    1    simple    <series> - <season>x<episode> - <title>.<ext>
    6    1    simple    <series>\Season <season>\Episode <episode> - <title>.<ext>
    7    1    simple    <series>\<season>x<episode> - <title>.<ext>
    8    1    regexp    <series>\\Book\s<season>.*\\<series>\s[<season>x<episode>]\s<title>\.<ext>
    9    1    simple    <series>\Сезон <season>\<series>.<episode>.<ext>
    10    1    simple    <series>\Сезон <season>\<series>_<episode>_<title>.<ext>
    11    1    simple    <series>\Season <season>\<title>.sezon.<episode>.<title>.<ext>
    12    1    regexp    (^.*?\\?(?<series>[^\\$]+?)[ .-]+(?<firstaired>\d{2,4}[.-]\d{2}[.-]\d{2,4})[ .-]*(?<title>(?![^\\]*?(?<!the)[ .(-]sample[ .)-]).*?)\.(?<ext>[^.]*)$)
    13    1    simple    <series>\Season <season>\<series>.DVDRip.<episode>.<ext>
    14    1    simple    <series>\Season <season>\<series>.<episode>.<title>.<ext>
    15    1    simple    <series>\Season <season>\<episode>. <title>.<ext>
    16    1    simple    <series>\Season <season>\<series>.e<episode>.<title>.<ext>
    17    1    simple    <series>\Season <season>\<series>.e<episode>.<ext>
    18    1    simple    <series>\Season <season>\S<season>E<episode> - <title>.<ext>
    19    1    simple    <series>\Season <season>\<episode>.<title>.<ext>
    20    1    simple    <series>\<series>\S<season>E<episode> - <title>.<ext>
    21    1    simple    <series>\<series>. Сезон <season> (2014)\<series>. Сезон <season>.Серия <episode> <title>.<ext>
    24    0    simple    <series>\Vosmidesjatye.<season>.2014.SATRip.NNM-CLUB\<episode>.<title>.<ext>
    25    0    simple    <series>\Season <season>\<series>.(<episode>.из<title>.<ext>
    22    1    simple    <series>.S<season>E\.E<episode>.<title>.<ext>
    23    1    simple    <series>.S<season>\<series>.E<episode>.<title>.<ext>
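    The "simple" entries are placeholder templates rather than regular expressions. One possible way to turn them into a .NET Regex is sketched below; it is not necessarily how the TV Series plugin converts them. Assumptions: literal text is matched verbatim, <season> and <episode> match digits, <ext> matches the extension, other placeholders match any text within one path segment, and a repeated placeholder (as in rules 18 and 20) must repeat the same text.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical converter for "simple" templates such as <series>\Season <season>\...
public static class SimplePatternConverter
{
    // Placeholders such as <series>, <season>, <episode>, <title>, <ext>.
    private static readonly Regex Placeholder = new Regex(@"<(?<name>\w+)>");

    public static Regex ToRegex(string simplePattern)
    {
        var regex = new StringBuilder();
        var seen = new HashSet<string>();
        int position = 0;

        foreach (Match placeholder in Placeholder.Matches(simplePattern))
        {
            // Literal text between placeholders (backslashes, dots, spaces) is escaped.
            regex.Append(Regex.Escape(simplePattern.Substring(position, placeholder.Index - position)));
            string name = placeholder.Groups["name"].Value;

            if (!seen.Add(name))
                regex.Append(@"\k<").Append(name).Append('>'); // repeated: same text again
            else if (name == "season" || name == "episode")
                regex.Append("(?<").Append(name).Append(@">\d+)");
            else if (name == "ext")
                regex.Append(@"(?<ext>[^.\\]+)");
            else
                regex.Append("(?<").Append(name).Append(@">[^\\]+?)");

            position = placeholder.Index + placeholder.Length;
        }

        regex.Append(Regex.Escape(simplePattern.Substring(position)));
        regex.Append('$'); // the template must run to the end of the path
        return new Regex(regex.ToString(), RegexOptions.IgnoreCase);
    }
}
```

    With rule 15, for instance, a path ending in \Fiksiki\Season 1\01. Компакт-диск.avi would yield season "1", episode "01" and title "Компакт-диск". The back-reference for repeated placeholders is a design choice: it rejects paths where the series folder and the file prefix differ, instead of silently taking the last value.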

    M:\Video\Series\Физрук - Fizruk\Fizruk-2.2014.SATRip\Fizruk-2.Seriya.01.SATRip.avi
    M:\Video\Series\Интерны - Interny\Интерны (13 сезон) 2015 WEB-DLRip 720\Интерны 13 - 18 серия.mkv ...
    M:\Video\Series\Восьмидесятые - Vosmidesyatye\80-e.s05.2015.SATRip.by.ivandubskoj\80-e.s05.e01.2015.SATRip.avi ...
    M:\Video\Series\Однажды в Одессе - Odnagdy v Odesse\Season 1\Однажды.в.Одессе-Жизнь.Мишки.Япончика.(01.из.12).2011.XviD.DVDRip.avi ...
    M:\Video\Series\Дело гастронома N 1 - Delo gastronoma N 1\Season 1\Delo.Gastronoma.N1.01.DVDRip.avi
    M:\Video\Series\Легавый - Legavyy\Season 2\Legavyy-2.(01.seriya).2014.WEB-DLRip.(AVC).mkv
    M:\Video\Series\Пепел - Pepel\Season 1\Пепел - 01 серия.mpg
    M:\Video\Series\Последний моcкаль - Останній моcкаль\Season 1\Ostanniy.moskal.WEB.400p\Ostanniy.moskal.01.WEB.400p\Ostanniy.moskal.01.WEB.400p.avi
    M:\Video\Series\Слуга народа - Sluga Naroda\Слуга народа (2015)\Слуга народа (01 серия) SATRip.avi
    M:\Video\Series\Фиксики - Fiksiki\Season 1\01. Компакт-диск.avi
    M:\Video\Series\Фиксики - Fiksiki\Season 2\01. Команда.avi
    M:\Video\Series\Фиксики - Fiksiki\Special\Бонус 10. Помогатор.avi

    That should be enough to start with :)
     
