| WebEPG everything related to WebEPG in here |
| | #1 (permalink) |
| Portal Member Join Date: Feb 2007 Location: Athens Age: 37
Posts: 9
Thanks: 0
Thanked 0 Times in 0 Posts
Country: | Hi,
I've been using MP for some 6 months now and I love it. I live in Greece and lately I have been creating some grabber files for Greek channels to supplement the one included (GR/www_in_gr.xml, which I have also enhanced to include descriptions and genres as well as more channels). These new grabber files use the websites of the respective broadcasters, but unfortunately the sites use all caps for the program titles. WebEPG tries to title-case those titles and uses the standard .NET ToLower() method.
Unfortunately, .NET does not correctly handle some special cases in Greek and, IIRC, some other languages. Specifically for Greek, lower-casing the letter Sigma is context sensitive: it becomes "lower case Sigma Final" if it is at the end of the word but "lower case Sigma Not_Final" otherwise (i.e. in the middle of the word). ToLower() incorrectly always turns it into "lower case Sigma Not_Final". Although the meaning is not altered (as happens in some other languages), it is still plain wrong (imagine if HELLO was title-cased as HellO: you can still understand the meaning, but it doesn't look right, does it?).
I could patch this in WebEPG (just replace the non-final sigma with the final sigma if it is at the end of a word), but since there are special cases in other languages too, perhaps there should be a more structured way to handle this (e.g. an extensible class in Utils to handle special cases of case folding).
As a side note: I noticed that (almost) all comparisons of program titles, genres and channel names are binary, which makes them fast but case and accent sensitive (e.g. if I schedule a program to record "every time" but the site then changes the case of the titles, the program is no longer considered to be the same and it is not recorded).
Regards,
Panayotis
PS: I will post the grabber files once finished for those interested. |
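Editor's note: the side note about binary comparisons can be illustrated with a culture-aware comparison. This is only a minimal sketch, not WebEPG code: the names `TitleComparer` and `TitlesMatch` are hypothetical, and it assumes that ignoring case and accents is the desired behaviour when deciding whether two program titles are the same.

```csharp
using System.Globalization;

// Hypothetical helper, not part of WebEPG or MediaPortal.
public static class TitleComparer
{
    // Returns true when two titles differ only by case or by accents
    // (non-spacing marks), according to the given culture's rules.
    public static bool TitlesMatch(string a, string b, CultureInfo culture)
    {
        CompareOptions options =
            CompareOptions.IgnoreCase | CompareOptions.IgnoreNonSpace;
        return culture.CompareInfo.Compare(a, b, options) == 0;
    }
}
```

A call such as `TitleComparer.TitlesMatch("Evening News", "EVENING NEWS", CultureInfo.InvariantCulture)` would treat the two titles as the same program. The trade-off mentioned in the post still applies: this kind of comparison is slower than a binary one.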
| | |
| | #2 (permalink) |
| Portal Developer Join Date: May 2005 Location: Switzerland
Posts: 1,348
Thanks: 4
Thanked 55 Times in 34 Posts
| Hi Panayotis,
Thanks for your detailed comments. Would you be able to test the following code with your example:
Code: CultureInfo cultureInfo = Thread.CurrentThread.CurrentCulture;
TextInfo textInfo = cultureInfo.TextInfo;
string titlecase = textInfo.ToTitleCase(uppercase);
I believe this should use the cultural information to perform the title-case conversion. If this works then I will add it to WebEPG. If you are not able to test it, can you provide me with the example text?
Thanks,
/James
Last edited by James; 2007-07-05 at 10:16. |
| | |
| | #3 (permalink) |
| Portal Member Join Date: Feb 2007 Location: Athens Age: 37
Posts: 9
Thanks: 0
Thanked 0 Times in 0 Posts
Country: | Hi James,
I tried your code this morning but it didn't work. I also remember reading an article about the Greek Final Sigma special-casing rules in Unicode. According to the article, the special-casing rules were initially included in the Unicode draft but were later dropped to avoid complexity. However, the current version of Unicode includes those rules. Anyway, it seems the .NET team decided not to implement those rules (regardless of the Unicode standard).
Actually, I think the problem is not with ToTitleCase() but with ToLower(). ToTitleCase() takes a lower-case string and uppercases the first letter of each word. If you pass it an upper-case string, it returns it unchanged. In WebEPG, when a title with only upper-case letters is found, it is first turned to lower case via ToLower() and then the result is fed to ToTitleCase(). It is ToLower() that fails to properly lower-case Greek Sigmas.
E.g. (hope you can see Greek characters) "ΙΣΩΣ" should become "ισως" but ToLower() returns "ισωσ" ("Σ" becomes "σ" in the middle of a word but "ς" at the end).
Following on from the notes above, I have also tried the following code (which didn't work either):
Code: CultureInfo cultureInfo = Thread.CurrentThread.CurrentCulture;
TextInfo textInfo = cultureInfo.TextInfo;
string titlecase = textInfo.ToTitleCase(textInfo.ToLower(uppercase));
The only way I could make it work is by using a Regex:
Code: CultureInfo cultureInfo = Thread.CurrentThread.CurrentCulture;
TextInfo textInfo = cultureInfo.TextInfo;
// Match a lower-case sigma (U+03C3) that is followed by end of text or a non-word character
Regex re = new Regex("\\u03c3(?=($|\\W))");
// Replace it with the final sigma (U+03C2) before title-casing
string titlecase = textInfo.ToTitleCase(re.Replace(textInfo.ToLower(uppercase), "\u03c2"));
Panayotis |
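Editor's note: for readers who want to try the regex workaround from the post above outside WebEPG, here is a minimal self-contained sketch. The class name `GreekTitleCase` and its method are hypothetical and are not part of WebEPG or .NET. It assumes, as described in the post, that `TextInfo.ToLower()` always produces the non-final sigma.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper that wraps the workaround described in the post.
public static class GreekTitleCase
{
    // A lower-case sigma (U+03C3) followed by end of text or a non-word character.
    private static readonly Regex FinalSigma = new Regex("\u03c3(?=$|\\W)");

    public static string ToTitleCase(string upper, CultureInfo culture)
    {
        TextInfo textInfo = culture.TextInfo;
        // Step 1: lower-case the all-caps title (every sigma becomes U+03C3).
        string lower = textInfo.ToLower(upper);
        // Step 2: turn word-final sigmas into the final form (U+03C2).
        string corrected = FinalSigma.Replace(lower, "\u03c2");
        // Step 3: upper-case the first letter of each word.
        return textInfo.ToTitleCase(corrected);
    }
}
```

With the example from the post, `GreekTitleCase.ToTitleCase("ΙΣΩΣ", Thread.CurrentThread.CurrentCulture)` should give "Ισως" rather than "Ισωσ" (that call also needs `using System.Threading;`). As noted later in the thread, this does not restore accents, which would require a dictionary.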
| | |
| | #4 (permalink) |
| Portal Developer Join Date: May 2005 Location: Switzerland
Posts: 1,348
Thanks: 4
Thanked 55 Times in 34 Posts
| Thanks for the info and test. I was wondering if it is better to leave these titles in upper case? I made this system because in many languages all upper case looks bad, but maybe that is not the case in Greek? |
| | |
| | #5 (permalink) |
| Portal Member Join Date: Feb 2007 Location: Athens Age: 37
Posts: 9
Thanks: 0
Thanked 0 Times in 0 Posts
Country: | Hi James,
Actually, it does look bad in Greek too. However, lower-casing Greek is really hard, and not just because of the final sigma. In Greek, all-uppercase text is never accented, but mixed-case and lower-case text (almost) always is. You need a dictionary to know where to put the accent, so we just settle for simple handling of the final sigma.
Anyway, I believe adding an option in the grabber XML to leave the titles as they are is a good idea. The option could be per grabber or per template. |
| | |
| | #6 (permalink) |
| Portal Developer Join Date: May 2005 Location: Switzerland
Posts: 1,348
Thanks: 4
Thanked 55 Times in 34 Posts
| Hi Panayotis,
I've modified the actions to support regex, so adding:
<Modify channel="*" field="TITLE" search="\\u03c3(?=($|\\W))" action="Replace">\u03c2</Modify>
should work. See the wiki for more details about where the modify actions are added in the grabber file.
Cheers,
/James |
| | |
| | #7 (permalink) |
| Portal Member Join Date: Feb 2007 Location: Athens Age: 37
Posts: 9
Thanks: 0
Thanked 0 Times in 0 Posts
Country: | Hi James, I am sorry I couldn't reply earlier. I just found some time to get the new version and try it out. The change you made does the job nicely. I hope I have the new grabbers ready pretty soon. Thanks, Panayotis |
| | |