That's funny, I never realized that computers could sort strings this way. Now Lenny Kravitz's "5" comes before Herbert Grönemeyer's "12".
The import test now took about 29:30, so four minutes longer than without the natural number sorting and between 9 and 10 minutes longer than with plain bitwise sorting. That is not nice, but considering the overkill we are doing here (SQLite is written in C, we use System.Data.SQLite as the layer to C#, and now we provide a sorting algorithm from C# via P/Invoke to native C code, what a marshalling mess) it seems acceptable.
The code is still ugly, so this is only a first test version. There will surely be another one once the code is clean. But could you please test whether it does what it is supposed to do? In particular, check whether numbers are sorted 1, 2, 3, 10, 200 instead of 1, 10, 2, 200, 3, whether casing is still ignored, and whether umlauts and other special characters are treated as you would expect.
I don't have many examples to try this with, but here are some in the order they are sorted now, which seems quite "natural" to me:
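To make the expected behaviour concrete, here is a minimal Python sketch of natural number sorting. It is not the code in the test version (that goes through C# and native code); the helper name `natural_key` is made up for illustration, and the sketch only covers digit runs and casing. Umlauts are compared by code point here, which a real collation would handle differently.

```python
import re

# Splitting on a captured group keeps the digit runs in the result,
# always at the odd indices of the list.
_DIGITS = re.compile(r"([0-9]+)")

def natural_key(text):
    """Build a sort key where digit runs compare as integers (hypothetical helper)."""
    key = []
    for i, part in enumerate(_DIGITS.split(text.casefold())):
        if i % 2 == 1:
            key.append((0, int(part)))   # digit run: "03" becomes 3
        elif part:
            key.append((1, part))        # text run, case-folded
    return key

numbers = ["1", "10", "2", "200", "3"]
plain_order = sorted(numbers)                      # character-by-character
natural_order = sorted(numbers, key=natural_key)   # digit runs as numbers

titles = ["Bravo Hits 12", "bravo hits 8", "A Beautiful Mind"]
title_order = sorted(titles, key=natural_key)

print(plain_order)
print(natural_order)
print(title_order)
```

The leading 0 or 1 in each tuple means a number is only ever compared with a number and text with text, and it puts digit runs ahead of text runs when the two meet at the same position.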
#1 Hits ["#" is a special character that comes before numbers]
1
'03 Bonnie & Clyde [Apostrophes are ignored and "03" is treated like "3"]
10 000 Hz Legend [see below...]
10 Things I hate about you
A Beautiful Mind
Bravo Hits 8 [8 comes before 12 even if it appears later in a string and the two strings are identical before the number]
Bravo Hits 12 [yes, I'm old]
The only thing I had to look twice at was "10 000 Hz Legend" but I think it is correct anyway because there is a space between "10" and "000"...
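One plausible reading of the "10 000 Hz Legend" case, sketched with Python's built-in sqlite3 module standing in for the System.Data.SQLite setup: both titles start with the number 10, and the next chunk of the first title is a lone space, which is a prefix of " Things I hate about you" and therefore sorts first. The collation name `natsort_sketch` and the functions below are assumptions for illustration only, not the comparison the test version actually calls.

```python
import re
import sqlite3

_DIGITS = re.compile(r"([0-9]+)")

def tokens(text):
    """Split a title into (0, number) and (1, text) chunks (hypothetical helper)."""
    out = []
    for i, part in enumerate(_DIGITS.split(text.casefold())):
        if i % 2 == 1:
            out.append((0, int(part)))
        elif part:
            out.append((1, part))
    return out

def natural_collation(a, b):
    """SQLite collation callback: negative, zero or positive like strcmp."""
    ka, kb = tokens(a), tokens(b)
    return (ka > kb) - (ka < kb)

conn = sqlite3.connect(":memory:")
conn.create_collation("natsort_sketch", natural_collation)
conn.execute("CREATE TABLE album (title TEXT)")
conn.executemany(
    "INSERT INTO album VALUES (?)",
    [("10 Things I hate about you",), ("10 000 Hz Legend",), ("1",)],
)
rows = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM album ORDER BY title COLLATE natsort_sketch"
    )
]

print(tokens("10 000 Hz Legend"))
print(tokens("10 Things I hate about you"))
print(rows)
```

Every comparison SQLite makes during the ORDER BY calls back into the host language, which is the same kind of round trip that makes the import slower in the C# version.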
So happy testing for now!
Michael
[Edit: Binaries removed. New test version some posts down]