[fixed] Sorting Order is not "natural" (2 Viewers)

MJGraf

Retired Team Member
  • Premium Supporter
  • January 13, 2006
    2,478
    1,385
    That's funny - I never realized that computers can sort strings this way... Now Lenny Kravitz "5" comes before Herbert Grönemeyer "12" :D

    The import test now took about 29:30 - so four minutes longer than without the natural number sorting and between 9 and 10 minutes longer than with just bitwise sorting. It is not nice, but taking into account the overkill we are doing here (SQLite is written in C, we use System.Data.SQLite as layer to C# and now we provide a sorting algorithm from C# via a P/Invoke to native C code - what a marshalling mess :D ) it seems acceptable.

    The code is still ugly, so this is only the first test version. There will surely be another one, once the code is clean. But could you please test whether it does what it is supposed to do? In particular whether numbers are sorted 1, 2, 3, 10, 200 instead of 1, 10, 2, 200, 3, also whether casing is still ignored and umlauts or other special characters are treated as you would expect it?

    I don't have too many examples to try this. But here are some examples in the order they are sorted now and it seems quite "natural" to me:
    #1 Hits ["#" is a special character that comes before numbers]
    1
    '03 Bonnie & Clyde [Apostrophs are ignored and "03" is treated like "3"]
    10 000 Hz Legend [see below...]
    10 Things I hate about you
    A Beautiful Mind
    Bravo Hits 8 [8 comes before 12 even if it appears later in a string and the two strings are identical before the number]
    Bravo Hits 12 [yes, I'm old :D ]

    The only thing I had to look twice at was "10 000 Hz Legend" but I think it is correct anyway because there is a space between "10" and "000"...

    So happy testing for now!
    Michael

    [Edit: Binaries removed. New test version some posts down]
     
    Last edited:

    MJGraf

    Retired Team Member
  • Premium Supporter
  • January 13, 2006
    2,478
    1,385
    Hi Valk,
    Couldn't we add extra indexes for the extra cultures? I imagine a situation where we do our import and once that process is complete the culture based indexes get updated.
    Well, in theory that's possible (at least with SQLite) - but that's nothing we can do purely from a database perspective.
    In SQLite we can use COLLATE together with CREATE INDEX. So it's possible to use different collations for otherwise identical indexes if we choose different names for the indexes. But then we have to specify for every SELECT statement which index to use (which probably requires a major rewrite of the MediaLibrary and its surroundings) and then MP2 Client would have to transfer with every MediaItem query to the server which culture to use (which would require a rework on the client and UPnP system as well). So generally we can do that but that's a HELL lot of work.
    And the question is whether it is worth that. because we are talking about a case here in which a user wants to have different cultural sorting sequences for different MP2 clients using the same MP2 server. That may be necessary if MP2 Server is located in Germany, Client1 is located in Denmark and Client 2 is located in China - or if the user suffers from schizophrenia insofar as he can only follow a German sort order as long as he is in the living room but completely loses this ability and can only understand Hindu sort order as soon as he is in the bedroom... The latter case may be more likely :D
    To cut a long story short: I'm happy to do this on the database side - but I would need lots of help for all the other affected parts of MP2
     

    MJGraf

    Retired Team Member
  • Premium Supporter
  • January 13, 2006
    2,478
    1,385
    Oh and for those interested: The test version above uses

    locale = LOCALE_NAME_INVARIANT
    flags = LINGUISTIC_IGNORECASE | SORT_DIGITSASNUMBERS
    Definition as per here: http://msdn.microsoft.com/en-us/library/windows/desktop/dd317761(v=vs.85).aspx

    From the comments I see that maybe we should additionally use NORM_LINGUISTIC_CASING to take care of the "turkish i" problem. But I have no clue about all this and we would need someone to test it...
     

    Valk

    Portal Pro
    February 25, 2006
    302
    108
    Home Country
    Australia Australia
    Well I'll keep it in mind as I work on the media library as I'll be reworking the queries already for the limits. As for UPNP since the user can't specify that probably best to stick with the "default" so hopefully that'll have minimal if any changes.
     

    MJGraf

    Retired Team Member
  • Premium Supporter
  • January 13, 2006
    2,478
    1,385
    Thanks, Valk! But this one I don't understand:
    As for UPNP since the user can't specify that probably best to stick with the "default" so hopefully that'll have minimal if any changes.
    Don't we have a possibility in the MP2 Client to chose the culture - which is also responsible for changing the language? I never use this as I'm fine with English, but I thought that's implemented. Need to have a look at our localization implementation...
    At least this was my fear above: I thought the user can choose a different culture for every MP2 Client - which makes things really difficult. If there is only one culture to be chosen on the server side, things are much easier. In that case I would just recreate the indexes every time a user changes the culture. No need for multiple indexes then...
     

    morpheus_xx

    Retired Team Member
  • Team MediaPortal
  • March 24, 2007
    12,073
    7,459
    Home Country
    Germany Germany
    Yes, each MP2 component (clients + server) do have their own LocalizationSettings (not sure about right class name), and they are not synched. In fact there is no way to change the defaults for server currently, so it will take Windows default culture.
     

    Valk

    Portal Pro
    February 25, 2006
    302
    108
    Home Country
    Australia Australia
    Ah! I'm a bit out of touch (forgot we use UPNP to communicate with the client when I wrote the comment).

    Not sure if it's defined in specification for UPNP to tell the server what the local is but if there is a provision we should implement and use it. Otherwise maybe add a value to the HTTP header for the server to use.

    Anyway some proper research is required and not my rambling :p
     

    MJGraf

    Retired Team Member
  • Premium Supporter
  • January 13, 2006
    2,478
    1,385
    And every time I dig deeper into some (for me) new part of the MP2 code base I'm surprised how many things have already been thought of :D
    We even have a LocalizationMessaging channel where we broadcast LanguageChangedMessages when the user changes the culture...
     

    Valk

    Portal Pro
    February 25, 2006
    302
    108
    Home Country
    Australia Australia
    And every time I dig deeper into some (for me) new part of the MP2 code base I'm surprised how many things have already been thought of :D
    We even have a LocalizationMessaging channel where we broadcast LanguageChangedMessages when the user changes the culture...

    I get surprised by this also. So much so that I was surprised when I didn't see anything related to data virtualisation at least commented in the code.
     

    MJGraf

    Retired Team Member
  • Premium Supporter
  • January 13, 2006
    2,478
    1,385
    ok, code is cleaned up and import test ran without problems for me. New test version attached - changes will be pushed to GitHub in a few minutes.
    I have extended the LexicographicComparer in a way so that it supports different cultures. Currently, however, I still use InvariantCulture. Just wanted to make sure that it can be used also if we decide to support multiple "sort-cultures" in one MP2 system.

    Please test this version whether it does anything you want it to do for now and does not introduce any new bugs.

    Some more thoughts on cultures:
    As it seems, MP2 is designed to use its own culture system - completely independent from the operating system culture. The only overlap seems to be that if no culture has been selected in MP2, yet, MP2 gets the CultureInfo.CurrentUICulture and uses that one as "MP2 culture". This is absolutely the right way imho, because MP2 has its own rendering engine and therefore does not have to rely on anything the operating system does with respect to cultures.
    However, from the user perspective it currently seems that the user can only select the "language" - not the culture. The respective setting is under "Regional\language" in the settings and even in the code, the ILocalization interface is implemented by the StringManager class, which sounds like the respective settings would only have an influence on Strings, i.e. the language.
    This is wrong already now. I usually use my own builds of MP2 made without any transiflex languages because I'm fine with MP2 using English language. When I startup MP2, as I explained above, MP2 uses CultureInfo.CurrentUICulture, which is German (from the log: "de-DE") for me. But there are no German language files so that MP2 uses the English ones as fallback. What I realized only now, after having a closer look at it, is that MP2 Client nevertheless displays "Sonntag, 24. November 2013" in its upper right corner. That means that MP2 not only uses German language (which it cannot, because there are no such language files and therefore absolutely correctly falls back to English), it uses the German culture, which not only includes the language, but also the formatting of date/time strings and numbers as well as "CompareInfo", which is responsible for sorting.
    So already now we not really select the "language", we select the "culture". Even more if we decide to make MP2 sorting dependent on this choice, we should make clear to the user that he not only selects the "language" but that this setting has influence on more parts of the MP2 behavior.

    Now what is necessary to make sorting culture dependent (at least when SQLite is used as a database):
    Unlike other "bigger" database systems, SQLite does not have included culture sensitive collations. There is an extension to SQLite which does have such, but we cannot use them because they do not take care of natural number sorting. That is why we use our own collation now.
    Other database systems (such as MySQL and SQLCE) do have inbuilt culture-sensitive collations. But they also do not take into account natural number sorting. I have no clue whether it is possible for these other database systems to provide self-made collation sequences. So the following only applies to SQLite for now.
    In SQLite you can specify the collation (at least) in three different places:
    • In the CREATE TABLE statement for every field, a collation can be specified. This is what we currently do in the SQLiteDatabase test builds above for fields that hold a char, a VarLengthString or a FixedLengthString. If we set the collation here and later create an index without specifying the collation, the index will automatically use the collation that was specified in CREATE TABLE. Apparently, if we want to support different collations for different cultures, we cannot use this anymore, because you can only specify one collation for one field in a table.
    • In the CREATE INDEX statement. As mentioned before, we could create different indexes with the same fields, but different collations and give these indexes different names. In select statements we would then probably (I haven't found anything about multiple identical idexes with different collations in the internet so far) have to specify the index to be used for every table.
    • In the SELECT statement in the WHERE clause and after ORDER BY. This would probably be the easiest solution and from what I have found in the internet (http://sqlite.1065341.n5.nabble.com/Custom-collate-on-field-or-index-or-both-td20047.html) it seems that SQLite would then automatically use the correct index. But to be hones, I don't believe this unless either Richard Hipp (the developer of SQLite) has confirmed this personally or we have tested this on our own...
    To implement this, it is therefore probably necessary to do the following:
    • We need to tell the server somehow that a client changed its culture - most likely with a UPnP statevariable.
    • If a client (or the server) changes its culture, we need to register a new collation at SQLite for that specific culture if such is not yet present. Ideally we would also create all the indexes we have in the database once more but with a different collation.
    • If a client (or the server itself) issues a media item query to the database, we need to know the culture to be applied. Either we carry the culture info within the media item query, or we make sure that the server knows the culture of every client itself and so the media item query only needs to know the ID of the respective client and can choose the culture on its own.
    • For the database side we have to amend ISQLDatabase and extend it by something like "CreateOrderByStatement(String field, CultureInfo cultureInfo). Additionally some of the other methods of ISQLDatabase would have to be amended to also take a CultureInfo object as parameter to amend the returned strings accordingly.
    • The MediaLibrary would have to be amended to provide the respective CultureInfo to ISQLDatabase if needed.
    • And most importantly we need people to test all this - because to be honest, the current InvatiantCulture already has so many "natural" things inbuilt, that at least for me it will be hard to test whether there is any difference when choosing a different culture...
    As I said, I'm happy to help from the database side with SQLiteDatabase. But by no means I can do all this alone. To be honest, I'm not sure whether it is worth all the necessary work in the current state of MP2. There are so many other things that I would consider more important. The only important thing is IMHO that we keep this in mind and make our current implementations in a way so that this can be added later...

    Now back to basic and happy testing!
    Michael
     

    Attachments

    • SQLiteDatabase_natural number sort_v2.7z
      478.5 KB

    Users who are viewing this thread

    Top Bottom