MediaPortal Forums HTPC/MediaCenter

2007-07-05, 09:54   #1
Portal Member
 
Join Date: Feb 2007
Location: Athens
Age: 37
Posts: 9

WebEPG and proper Title-casing in Greek

Hi,
I've been using MP for some 6 months now and I love it.
I live in Greece and lately I have been creating grabber files for Greek channels to supplement the one included (GR/www_in_gr.xml, which I have also enhanced to include descriptions and genres as well as more channels).
These new grabber files use the websites of the respective broadcasters, but unfortunately those sites use all-caps for the program titles. WebEPG tries to title-case those titles and uses the standard .NET ToLower() method to do so. Unfortunately, .NET does not correctly handle some special cases in Greek and, IIRC, in some other languages. Specifically, lower-casing the Greek letter Sigma is context sensitive: it becomes "lower case Sigma Final" at the end of a word but "lower case Sigma Not_Final" otherwise (i.e. in the middle of a word). ToLower() incorrectly always turns it into "lower case Sigma Not_Final". Although the meaning is not altered (as happens in some other languages), it is still plain wrong (imagine if HELLO was title-cased as HellO: you can still understand the meaning, but it doesn't look right, does it?).
I could patch this in WebEPG (just replace the non-final sigma with the final sigma when it is at the end of a word), but since there are special cases in other languages too, perhaps there should be a more structured way to handle this (e.g. an extensible class in Utils to handle special cases of case folding).

As a side note: I noticed that (almost) all comparisons of program titles, genres and channel names are binary, which makes them fast but case- and accent-sensitive (e.g. if I schedule a program to record "every time" and the site then changes the case of its titles, the program is no longer considered the same and is not recorded).
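
To illustrate the side note on comparisons, here is a rough sketch of a comparison that ignores case and accents using .NET's CompareInfo. The `TitleComparer` class and `SameTitle` method are made-up names for this example and are not part of MediaPortal; using the invariant culture is also an assumption.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only; not MediaPortal code.
public static class TitleComparer
{
    // IgnoreCase folds upper/lower case; IgnoreNonSpace ignores
    // combining marks such as accents.
    private const CompareOptions Options =
        CompareOptions.IgnoreCase | CompareOptions.IgnoreNonSpace;

    // Returns true when the two titles differ at most by case or accents.
    public static bool SameTitle(string a, string b)
    {
        CompareInfo compareInfo = CultureInfo.InvariantCulture.CompareInfo;
        return compareInfo.Compare(a, b, Options) == 0;
    }

    public static void Main()
    {
        // Culture-aware comparison that ignores case.
        Console.WriteLine(SameTitle("NEWS", "News"));   // True

        // Same comparison for titles that differ by case and an accent.
        Console.WriteLine(SameTitle("CAFE", "Caf\u00e9"));

        // Binary (ordinal) comparison treats the two spellings as different.
        Console.WriteLine(string.Equals("NEWS", "News", StringComparison.Ordinal)); // False
    }
}
```

A culture-aware comparison like this is slower than a binary one, so whether it is worth it depends on how often titles are compared.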

Regards,
Panayotis

PS: I will post the grabber files once finished for those interested.
arion_p
2007-07-05, 10:13   #2
Portal Developer
 
Join Date: May 2005
Location: Switzerland
Posts: 1,348

Hi Panayotis,

Thanks for your detailed comments. Would you be able to test the following code with your example:


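// Needs: using System.Globalization; using System.Threading;
// 'uppercase' is the all-caps title string taken from the grabber.
CultureInfo cultureInfo = Thread.CurrentThread.CurrentCulture;
TextInfo textInfo = cultureInfo.TextInfo;
string titlecase = textInfo.ToTitleCase(uppercase);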


This, I believe, should use the culture information to perform the title-case conversion. If it works, I will add it to WebEPG. If you are not able to test it, can you provide me with the example text?

Thanks,

/James

Last edited by James; 2007-07-05 at 10:16.
James
2007-07-06, 10:51   #3

Hi James,

I tried your code this morning but it didn't work. I also remember reading an article about the Greek Final Sigma special-casing rules in Unicode. According to the article, the special-casing rules were initially included in the Unicode draft but were later dropped to avoid complexity. However, the current version of Unicode does include those rules. Anyway, it seems the .NET team decided not to implement them (regardless of the Unicode standard).

Actually, I think the problem is not with ToTitleCase() but with ToLower(). ToTitleCase() takes a lower-case string and upper-cases the first letter of each word. If you pass it an all-upper-case string, it returns it unchanged. In WebEPG, when a title containing only upper-case letters is found, it is first turned to lower case via ToLower() and the result is then fed to ToTitleCase(). It is ToLower() that fails to lower-case Greek Sigmas properly.
E.g. (hope you can see Greek characters)
"ΙΣΩΣ" should become "ισως" but ToLower() returns "ισωσ"
("Σ" becomes "σ" in the middle of a word but "ς" at the end)
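
The behaviour described above can be checked with a small console program. This is only a sketch: the `CasingDemo` class name is made up, and it assumes the "el-GR" culture is available on the machine. The last check prints a description instead of the Greek text so that the result is readable even if the console cannot display Greek characters.

```csharp
using System;
using System.Globalization;

// Hypothetical demo class; not part of WebEPG.
public static class CasingDemo
{
    public static void Main()
    {
        TextInfo textInfo = new CultureInfo("el-GR").TextInfo;

        // A word that is entirely upper case is returned unchanged.
        Console.WriteLine(textInfo.ToTitleCase("HELLO WORLD"));   // HELLO WORLD

        // Lower-case input gets the first letter of each word capitalised.
        Console.WriteLine(textInfo.ToTitleCase("hello world"));   // Hello World

        // Lower-case the Greek word from the example ("ΙΣΩΣ") and report
        // which kind of sigma ends up in the last position.
        string lower = textInfo.ToLower("\u0399\u03a3\u03a9\u03a3");
        char last = lower[lower.Length - 1];
        Console.WriteLine(last == '\u03c2'
            ? "last letter is final sigma (U+03C2)"
            : "last letter is non-final sigma (U+03C3)");
    }
}
```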

Following the notes above, I also tried this code (which didn't work either):
Code:
CultureInfo cultureInfo = Thread.CurrentThread.CurrentCulture;
TextInfo textInfo = cultureInfo.TextInfo;

string titlecase = textInfo.ToTitleCase(textInfo.ToLower(uppercase));
I also tried a specific culture (both "el" and LCID 1032) as well as InvariantCulture (which shouldn't work anyway).

The only way I could make it work is using RegEx:

Code:
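// Needs: using System.Globalization; using System.Text.RegularExpressions; using System.Threading;
CultureInfo cultureInfo = Thread.CurrentThread.CurrentCulture;
TextInfo textInfo = cultureInfo.TextInfo;
// Match a sigma (U+03C3) that is followed by end of string or a non-word character.
Regex re = new Regex("\\u03c3(?=($|\\W))");

// Lower-case, swap each word-final sigma for final sigma (U+03C2), then title-case.
string titlecase = textInfo.ToTitleCase(re.Replace(textInfo.ToLower(uppercase), "\u03c2"));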
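
For anyone who wants to try the regex approach outside WebEPG, here is a self-contained sketch of the same idea. The `GreekTitleCase` class and `ToGreekTitleCase` method are made-up names for illustration and are not the actual WebEPG code; the "el-GR" culture in `Main` is an assumption. Greek strings are written as Unicode escapes so the source compiles regardless of file encoding.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not the actual WebEPG code.
public static class GreekTitleCase
{
    // A non-final sigma (U+03C3) followed by end of input or a non-word character.
    private static readonly Regex TrailingSigma = new Regex(@"\u03c3(?=$|\W)");

    public static string ToGreekTitleCase(string uppercase, CultureInfo culture)
    {
        TextInfo textInfo = culture.TextInfo;

        // Step 1: lower-case the whole title.
        string lower = textInfo.ToLower(uppercase);

        // Step 2: replace each word-final sigma with final sigma (U+03C2).
        string withFinalSigma = TrailingSigma.Replace(lower, "\u03c2");

        // Step 3: upper-case the first letter of each word.
        return textInfo.ToTitleCase(withFinalSigma);
    }

    public static void Main()
    {
        // "ΙΣΩΣ" in, expecting "Ισως" out.
        string result = ToGreekTitleCase("\u0399\u03a3\u03a9\u03a3", new CultureInfo("el-GR"));
        bool ok = string.Equals(result, "\u0399\u03c3\u03c9\u03c2", StringComparison.Ordinal);
        Console.WriteLine(ok);   // True
    }
}
```

Note that a word consisting of a single sigma would also be changed to a final sigma by this pattern, which is an accepted limitation of the simple approach.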
Thanks,
Panayotis
arion_p
2007-07-06, 22:49   #4

Thanks for the info and test.

I was wondering if it is better to leave these titles in upper case?

I made this system because in many languages all upper case looks bad, but maybe that is not the case in Greek?
James
2007-07-07, 19:21   #5

Hi James,

Actually, it does look bad in Greek too. However, lower-casing Greek is really hard, and not just because of the final sigma. In Greek, all-uppercase text is never accented, but mixed-case and lower-case text (almost) always is. You need a dictionary to know where to put the accent, so we just settle for simple handling of the final sigma.

Anyway, I believe adding an option in the grabber xml to leave the titles as they are is a good idea. The option could be per grabber or per template.
arion_p
2007-07-15, 15:00   #6

Hi Panayotis,

I've modified the actions to support regex, so adding:

<Modify channel="*" field="TITLE" search="\\u03c3(?=($|\\W))" action="Replace">\u03c2</Modify>

Should work. See the wiki for more details about where the modify actions are added in the grabber file.

Cheers,

/James
James
2007-07-19, 00:12   #7

fixed

Hi James,

I am sorry I couldn't reply earlier. I just found some time to get the new version and try it out. The change you made does the job nicely. I hope I have the new grabbers ready pretty soon.

Thanks,

Panayotis
arion_p
All times are GMT +1.

