Currently all regular expressions used inside our MDEs are hard-coded. There are requests to make them extensible.
Question for @MJGraf: did you already implement a serialization for Regex instances?
We would need this to put them into settings and send them to the server (via ServerSettings).
I could also imagine some more attributes, like an "Enabled" attribute, which could be used to temporarily disable patterns without having to remove them or comment them out in code. Constructing a Regex from (string, RegexOptions) is possible when we deserialize this type. With a custom class we probably do not even need to implement our own XML serialization: we just store strings and an enum, and create the Regex instance once both properties are set.
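The custom class idea could be sketched roughly as follows. This is only a minimal illustration, assuming the standard XmlSerializer is used; the class name `SerializablePattern` and the attribute names are assumptions, not existing MP2 code.

```csharp
using System.Text.RegularExpressions;
using System.Xml.Serialization;

// Hypothetical settings class: only plain values are serialized,
// the Regex itself is rebuilt from them on demand.
[XmlRoot("Pattern")]
public class SerializablePattern
{
    private string _code;
    private RegexOptions _option = RegexOptions.None;
    private Regex _regex; // created lazily, discarded when Code or Option changes

    [XmlAttribute("Enabled")]
    public bool Enabled { get; set; } = true;

    [XmlAttribute("Code")]
    public string Code
    {
        get { return _code; }
        set { _code = value; _regex = null; }
    }

    [XmlAttribute("Option")]
    public RegexOptions Option
    {
        get { return _option; }
        set { _option = value; _regex = null; }
    }

    // Not serialized; returns null until a Code has been set.
    [XmlIgnore]
    public Regex Regex
    {
        get
        {
            if (_regex == null && _code != null)
                _regex = new Regex(_code, _option);
            return _regex;
        }
    }
}
```

One caveat, as far as I know: XmlSerializer reads and writes booleans in lower case (`true`/`false`), so a hand-edited `Enabled="True"` would be rejected unless the attribute is handled as a string.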
How should the extension be provided to users? Once the serialization is implemented, users could manually edit the .xml files inside the config folder. Would that be enough for advanced users?
If not, what could a GUI in MP2 for editing regular expressions look like?
The change itself is quite easy to make, but a config GUI would be quite complicated, I think.
@Developers @Testers
That makes no sense; the patterns are very specific and would be used in only 1% of cases, and even then not always. I have a lot of rules for Russian series, and each rule applies to just one or two series. There are also replacement rules that simplify importing.
But all of that is individual, and MP1 allows me to set it up.
Question for @MJGraf: did you already implement a serialization for Regex instances?
I was just about to make the regular expression I use for parsing an IMDB ID a setting and did some interesting reading about the C# Regex class. Although it is probably only interesting for developers, I want to write it down somewhere so that it is not forgotten.
So the question was: how can we serialize a C# Regex object? The Regex class implements ISerializable, but unfortunately not IXmlSerializable, which we need for our settings system. My first thought was that we just inherit from the Regex class and implement IXmlSerializable on our own (something like `public class XmlSerializableRegex : Regex, IXmlSerializable`). The problem here is that once a Regex object is created, it is immutable, meaning you cannot change its regular expression anymore. But during deserialization, an object is created first and its fields are set afterwards, which doesn't work here. According to the MS reference source, we cannot do this manually either, because the classes we would need to initialize all the fields of the Regex class are "internal".
The next idea was to just serialize the regular expression as a string and create a Regex object from that on demand. The downside of this approach is that we serialize neither the match timeout (MatchTimeout), which IMHO would be acceptable, nor the RegexOptions (such as IgnoreCase or CultureInvariant). These would then still be hard-coded, which I don't think is a good idea.
The best solution I can currently think of is wrapping a Regex object in our own class that implements IXmlSerializable and otherwise has only one (read-only) property: Regex. Using this regex then requires a call like OurSettingsObject.ImdbRegex.Regex.Match(...), which is not extremely beautiful, but IMHO acceptable.
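A wrapper of that kind might look roughly like the sketch below. It is not the actual implementation; the attribute names `Pattern` and `Options` and the use of `Enum.Parse` for the options are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Sketch of a wrapper that owns an immutable Regex and (de)serializes
// only its pattern and options. Attribute names are assumptions.
public class XmlSerializableRegex : IXmlSerializable
{
    // The wrapped instance; replaced as a whole during deserialization.
    public Regex Regex { get; private set; }

    // Parameterless constructor required by XmlSerializer.
    public XmlSerializableRegex() { }

    public XmlSerializableRegex(string pattern, RegexOptions options)
    {
        Regex = new Regex(pattern, options);
    }

    public XmlSchema GetSchema() { return null; }

    public void ReadXml(XmlReader reader)
    {
        string pattern = reader.GetAttribute("Pattern") ?? string.Empty;
        string options = reader.GetAttribute("Options");
        RegexOptions parsed = string.IsNullOrEmpty(options)
            ? RegexOptions.None
            : (RegexOptions)Enum.Parse(typeof(RegexOptions), options);
        Regex = new Regex(pattern, parsed);

        // Consume the wrapper element so the reader ends up behind it.
        bool isEmpty = reader.IsEmptyElement;
        reader.ReadStartElement();
        if (!isEmpty)
            reader.ReadEndElement();
    }

    public void WriteXml(XmlWriter writer)
    {
        // Regex.ToString() returns the pattern passed to the constructor.
        writer.WriteAttributeString("Pattern", Regex.ToString());
        writer.WriteAttributeString("Options", Regex.Options.ToString());
    }
}
```

Because the Regex is only ever assigned as a complete object, the immutability problem described above does not arise; the match timeout is deliberately left out here.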
And there is another advantage of this:
We currently create lots of Regex objects on demand as local variables. This is anything but ideal from a performance perspective. The constructor of the Regex class compiles the regular expression, which is why instantiating a Regex object is a relatively time-consuming operation (ok, we are talking about milliseconds here, but if we instantiate 10 Regex objects per MediaItem in an import and the import has tens of thousands of MediaItems, we are suddenly talking about several minutes...).
For that reason, there is an internal cache for compiled Regex objects in the Regex class. According to MSDN, however, this cache is only used when calling the static methods of the Regex class (which we currently don't do). Instantiated Regex objects are not cached. Calling the static methods instead might not be ideal either, because by default the cache has a maximum size of 15 Regex objects, which might not be enough for the whole MP2 system. We can set a higher size limit, but in the end it is quite difficult to estimate the maximum number of Regex objects in the MP2 system.
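To illustrate the difference, the sketch below builds the Regex once in a static field instead of inside the method that runs per MediaItem. The class name and the IMDB ID pattern are assumptions for illustration, not the pattern MP2 actually uses.

```csharp
using System.Text.RegularExpressions;

public static class ImdbIdParser
{
    // Built once when the class is first used, then reused for every call.
    // The pattern is a hypothetical example.
    private static readonly Regex ImdbIdRegex =
        new Regex(@"tt\d{7,8}", RegexOptions.IgnoreCase);

    // Returns the first IMDB-like ID found in the text, or null if none.
    public static string TryGetImdbId(string text)
    {
        Match match = ImdbIdRegex.Match(text);
        return match.Success ? match.Value : null;
    }
}
```

Writing `new Regex(...)` inside `TryGetImdbId` instead would repeat the construction cost for every MediaItem.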
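For comparison, the static-method route would look something like the following sketch. `Regex.IsMatch` and `Regex.CacheSize` are part of the .NET Regex class; the helper class, the pattern, and the chosen limit are assumptions.

```csharp
using System.Text.RegularExpressions;

public static class RegexCacheExample
{
    // Static call: the parsed pattern is kept in the Regex class's internal cache.
    public static bool IsImdbId(string text)
    {
        return Regex.IsMatch(text, @"^tt\d{7,8}$", RegexOptions.IgnoreCase);
    }

    // Raises the process-wide cache limit; never lowers it.
    public static void RaiseCacheLimit(int size)
    {
        if (size > Regex.CacheSize)
            Regex.CacheSize = size;
    }
}
```

The limit applies to the whole process, which is exactly why a good value is hard to pick for a system with many plugins.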
Now if we use the last approach above, we basically use our settings cache to cache our own Regex objects (meaning: every (XmlSerializable)Regex we have in a settings file is automatically cached so that we only have one instance of it and compilation only has to happen once). According to MSDN, the whole Regex class is thread safe so that the whole MP2 system can use that single instance of the Regex object without problems.
Conclusion: I'll implement the last approach described above. I'll for now put the class in the OnlineLibraries project as there we will likely need it most. We can later move it to MediaPortal.Utilities.Xml, but this (if I finally understood it correctly) requires a version bump of MediaPortal.Utilities so that it might be better to do that in the end...
If anyone has more experience with the Regex class than me and disagrees with the above, please let me know...
A pattern carrying such additional attributes could look like this in XML:
<Pattern Enabled="True" Code="(?<series>[^\\]*)\\[^\\]*(?<seasonnum>\d+)[^\\]*\\S*(?<seasonnum>\d+)[EX](?<episodenum>\d+)*(?<episode>.*)\." Option="IgnoreCase" />