Accents/diacritics in the Freeview (UK) OTA EPG (DVB-T)

Alphathon

Portal Pro
July 15, 2010
Home Country: Scotland
Hi all. I've noticed an issue with the Freeview EPG regarding how accented characters (á, è etc.) are encoded (and, by extension, how they show up in MediaPortal). This isn't so much an issue with MediaPortal as with the EPG data that is being broadcast: it shows up on both my TV cards (Hauppauge Nova-T-500 and BlackGold BGT3620) via both MediaPortal and EPG Centre, as well as on at least one of the set-top boxes downstairs (I have only checked that one; it probably also happens on the others). However, the fact that the issue is caused by what is broadcast doesn't necessarily mean it can't/shouldn't be rectified by MediaPortal. I'm uncertain whether this applies to other Freeview transmitters throughout the country, but it certainly applies to the data transmitted from Knock More.

EDIT: This seems to be an issue with how MediaPortal handles non-UTF-8 encoded text. The Freeview EPG seems to be encoded using ISO 6937. AFAIK MP's EPG data is stored as UTF-8 in the database, which causes it to show up incorrectly. ISO 6937 uses 0xC1 - 0xCF to encode non-spacing accent characters, each sent before the letter it modifies. Read one byte at a time as Latin-1 (whose values match the first 256 Unicode code points), those bytes correspond to other characters.

As a result, when a letter is supposed to have an acute accent (such as the first e in fiancée), it is displayed with a "capital A with circumflex" (Â, i.e. 0xC2) in front of it instead (so fiancée would become fiancÂee). Similarly, letters which are supposed to have grave accents appear with a "capital A with acute" (Á, i.e. 0xC1) in front of them (e.g. the Scottish Gaelic word Fàilte would become FÁailte). This applies to both lower and upper case letters (ÁA = À etc.).
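To make the symptom concrete, here is a small Python sketch of what I think is going on. It assumes the receiving software turns each broadcast byte into a character as if it were Latin-1; I don't know that this is literally what MediaPortal does, but it reproduces the output described above.

```python
# Raw bytes as ISO 6937 would send them: the accent byte comes BEFORE the letter.
fiancee_raw = b"fianc\xc2ee"  # 0xC2 = non-spacing acute, followed by "e"
failte_raw = b"F\xc1ailte"    # 0xC1 = non-spacing grave, followed by "a"

# Assumption: the bytes are decoded one-for-one as Latin-1, so each accent
# byte becomes a separate capital letter instead of modifying the next one.
fiancee_shown = fiancee_raw.decode("latin-1")
failte_shown = failte_raw.decode("latin-1")

print(fiancee_shown)  # fiancÂee
print(failte_shown)   # FÁailte
```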

I'm pretty sure a similar rule also applies to letters with umlauts/diaereses (although those occur so infrequently that I haven't made a note of what actually happens) and likely applies to other diacritics such as circumflexes and tildes as well (although I have no evidence to support this). EDIT: it does apply. The issue affects both the program title and description (and may well affect subtitles too; I haven't checked).

I'm currently using a collection of MySQL scheduled tasks to scan the program database and correct these errors every 12 hours, which seems to be working but isn't an ideal solution.
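For anyone who wants the same kind of after-the-fact clean-up without MySQL, the idea can be sketched in Python as below. This is not my actual set of scheduled tasks, just an illustration; the names `MISREAD_TO_COMBINING` and `repair` are made up for the example, and only a handful of accents are covered.

```python
import re
import unicodedata

# Latin-1 letters that show up when an ISO 6937 accent byte is misread,
# mapped to the Unicode combining mark the byte was meant to be.
MISREAD_TO_COMBINING = {
    "\u00c1": "\u0300",  # Á (0xC1) -> combining grave
    "\u00c2": "\u0301",  # Â (0xC2) -> combining acute
    "\u00c3": "\u0302",  # Ã (0xC3) -> combining circumflex
    "\u00c4": "\u0303",  # Ä (0xC4) -> combining tilde
    "\u00c8": "\u0308",  # È (0xC8) -> combining diaeresis
}

# A misread accent letter immediately followed by an unaccented ASCII letter.
PAIR = re.compile("([" + "".join(MISREAD_TO_COMBINING) + "])([A-Za-z])")


def repair(text: str) -> str:
    """Replace pairs such as 'Âe' with the single accented letter 'é'."""
    def fix(match: re.Match) -> str:
        mark = MISREAD_TO_COMBINING[match.group(1)]
        composed = unicodedata.normalize("NFC", match.group(2) + mark)
        # Leave the pair alone if no single precomposed character exists.
        return composed if len(composed) == 1 else match.group(0)

    return PAIR.sub(fix, text)


print(repair("fianc\u00c2ee"))  # fiancée
print(repair("F\u00c1ailte"))   # Fàilte
```

The obvious weakness is that text which is already correct but happens to contain, say, a genuine Á followed by a letter would be altered too, which is part of why fixing stored data afterwards isn't ideal.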

Basically I was wondering whether there is a way to fix the issue through TV Server itself, and if not, whether such functionality could be implemented in a later version of MP (or has been in a newer version than I'm using, which I think is 1.1.3). EDIT: Is there any way to get MediaPortal to auto-convert entries from certain muxes/cards (only when the user selects this, of course) from ISO 6937 to UTF-8 before storing them? This post is as much to make sure the issue is known about as it is to get a fix (as I said, I have a functional workaround).
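The sort of conversion I have in mind would look roughly like the Python sketch below. It is only an outline under some assumptions: bytes below 0x80 are treated as plain ASCII, only a subset of the 0xC1 - 0xCF accent bytes is mapped, and any other high byte is replaced with U+FFFD. The function name `decode_iso6937` is hypothetical and has nothing to do with MediaPortal's actual code.

```python
import unicodedata

# Subset of ISO 6937 non-spacing accent bytes -> Unicode combining marks.
ACCENTS = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC4: "\u0303",  # tilde
    0xC8: "\u0308",  # diaeresis
    0xCB: "\u0327",  # cedilla
    0xCF: "\u030c",  # caron
}


def decode_iso6937(raw: bytes) -> str:
    """Decode a simplified ISO 6937 byte string to a Unicode string."""
    out = []
    pending = ""  # combining mark waiting for the letter that follows it
    for byte in raw:
        if byte in ACCENTS:
            pending = ACCENTS[byte]
        elif byte < 0x80:
            # Unicode puts the combining mark AFTER the base letter.
            out.append(chr(byte) + pending)
            pending = ""
        else:
            out.append("\ufffd")  # high byte not handled in this sketch
            pending = ""
    # NFC folds letter + mark into a single precomposed character if one exists.
    return unicodedata.normalize("NFC", "".join(out))


title = decode_iso6937(b"F\xc1ailte")
print(title)                  # Fàilte
stored = title.encode("utf-8")  # what would then be written to the database
```

Doing this once on the raw bytes avoids the guesswork of repairing text after it has already been stored.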

I'd also be curious to know exactly why this is happening - it seems like a very odd error to me. As I said, I don't know whether this applies to anyone not picking up the signal from Knock More, so it may well be a very niche issue, and as such it may not be worth the effort to work around it in MP itself. Of course if anyone else is having this issue I'd be more than happy to share the MySQL commands (I've no idea how MS SQL would differ though).
 
