DVB-S EPG wrong character encoding for Hungarian language

mm1352000 · January 6, 2014

I was wondering if it might be possible to specify the endianness (byte order) for decoding by using a BOM. It seems this is not possible for ISO 6937...

Vasilich · January 6, 2014

mm1352000 said:
I was wondering if it might be possible to specify the endianness (byte order) for decoding by using a BOM

No, it is not possible. ISO 6937 clearly specifies the byte order and does not allow both orders, and the .NET implementation simply gets it wrong.

The characters which are not represented in the primary set are coded on two bytes. The first byte, the "non-spacing diacritical mark", is followed by a letter from the base set, e.g.:
small e with acute accent (é) = [Acute]+e

taken from http://en.wikipedia.org/wiki/ISO/IEC_6937#Two_byte_characters
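To make the byte order concrete, here is a minimal Python sketch (not the TVE code; the table and the name decode_two_byte are illustrative assumptions). In ISO 6937 the diacritical mark byte comes first and the base letter second, while Unicode expects the letter followed by a combining mark, so the decoder has to swap them before composing. Emitting the combining mark in byte order would attach it to the preceding character instead. Only the well-known prefixes 0xC1..0xCF are mapped (0xC9 and 0xCC are left out); every other byte above 0x7F becomes U+FFFD in this sketch.

```python
import unicodedata

# ISO/IEC 6937 non-spacing diacritical prefixes -> Unicode combining marks.
DIACRITICS = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC4: "\u0303",  # tilde
    0xC5: "\u0304",  # macron
    0xC6: "\u0306",  # breve
    0xC7: "\u0307",  # dot above
    0xC8: "\u0308",  # diaeresis
    0xCA: "\u030A",  # ring above
    0xCB: "\u0327",  # cedilla
    0xCD: "\u030B",  # double acute (Hungarian ő, ű)
    0xCE: "\u0328",  # ogonek
    0xCF: "\u030C",  # caron
}


def decode_two_byte(data: bytes) -> str:
    """Hypothetical helper: decode ASCII plus ISO 6937 two-byte sequences."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in DIACRITICS and i + 1 < len(data):
            base = chr(data[i + 1])
            # Byte order is [mark][letter]; Unicode order is letter + mark.
            out.append(unicodedata.normalize("NFC", base + DIACRITICS[b]))
            i += 2
        elif b < 0x80:
            out.append(chr(b))
            i += 1
        else:
            # Other single-byte G2 characters are not handled in this sketch.
            out.append("\ufffd")
            i += 1
    return "".join(out)
```

For example, the Hungarian word "Híradó" arrives as b"H\xc2irad\xc2o", and decode_two_byte turns each [Acute]+letter pair into one precomposed character.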

Vasilich · January 6, 2014

So here is the code for testing. It uses a prepared text file containing almost all two-byte characters defined in ISO 6937. The conversion code isn't the latest version, but it is pretty close. Use it to see how code page 20269 decodes them incorrectly.

gurabli · January 17, 2014

Vasilich said:
So here is the code for testing. It uses a prepared text file containing almost all two-byte characters defined in ISO 6937. The conversion code isn't the latest version, but it is pretty close. Use it to see how code page 20269 decodes them incorrectly.

Hi
I'm sorry for being away for so long, but my little girl was born in December, so you can imagine how much free time I have.

I read your posts (although I don't understand much of it, but that doesn't matter). I hope this will be fixed somehow.

Can I try something with this code, or is it not intended for me? I will do my best to help, but I will definitely have less free time in the upcoming months.

gurabli · January 25, 2014

@Vasilich: I wonder if you have had any time to look into this issue?

I have just upgraded to MP 1.6, and now I'm not able to test the TVLibrary files, as they were for the 1.5 version. Of course, UPC EPG is wrong now (Digi is fine). Could you please send me the library files for 1.6?

Thanks!

mm1352000 · February 9, 2014

@Vasilich

I've been working on integrating your code into TVE 3.5. I found a few more bugs and have a few more questions...

1. Bug: the decoder used for ISO/IEC 8859-13 and 8859-15 (3-byte encoding) is wrong.
https://github.com/MediaPortal/Medi...TvLibrary.Interfaces/DvbTextConverter.cs#L135
It should use code pages 28603 and 28605, not 28591.

2. Bug (?): according to EN 300 468 annex A figure A1 character conversion for ISO/IEC 6937-1 character 0xd0 should be 0x2015 not 0x2014:
https://github.com/MediaPortal/Medi...mentations/DVB/Graphs/Iso6937ToUnicode.cs#L86

3. Bug (?): according to EN 300 468 annex A figure A1 character conversion for ISO/IEC 6937-1 character 0xe2 should be 0x0110 not 0xd0:
https://github.com/MediaPortal/Medi...entations/DVB/Graphs/Iso6937ToUnicode.cs#L128

4. Question: regarding ISO/IEC 6937-1...
I'm a bit confused about how this support should be implemented. The comment under EN 300 468 annex A table A1 says "This table is a superset of ISO/IEC 6937..." but it only shows single byte characters. Our Iso6937ToUnicode.cs class clearly treats 0xc* as two byte characters... but in EN 300 468 annex A figure A1 (version 1.13.1 or 1.14.1 are best), 0xc1..0xc9, 0xca, 0xcb and 0xcd..0xcf have a single byte mapping for diacritical characters. Which is the correct interpretation - EN 300 468 or our class?

Vasilich · February 10, 2014

mm1352000 said:
1. Bug: decoder used for ISO/IEC 8859-13 and 8859-15 (3 byte encoding) is wrong.
https://github.com/MediaPortal/Medi...TvLibrary.Interfaces/DvbTextConverter.cs#L135
Should be 28603 and 28605, not 28591.

Yes, I have also fixed it in my test code.
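For reference, EN 300 468 selects an ISO/IEC 8859 part with the three-byte prefix 0x10 0x00 0xNN, where NN is the part number. The .NET code pages for the parts discussed here follow the pattern 28590 + NN (28591 for 8859-1, 28603 for 8859-13, 28605 for 8859-15). A minimal Python sketch, assuming the text field starts with that prefix; both helper names are hypothetical and not part of TVE:

```python
def dotnet_code_page(part: int) -> int:
    """Hypothetical helper: .NET code page for ISO/IEC 8859-<part>.

    The 28590 + part pattern holds for parts 1..9, 13 and 15.
    """
    return 28590 + part


def decode_8859_field(data: bytes) -> str:
    """Hypothetical helper: decode a DVB text field with a 0x10 0x00 0xNN prefix."""
    if len(data) < 3 or data[0] != 0x10 or data[1] != 0x00:
        raise ValueError("field does not start with 0x10 0x00 0xNN")
    part = data[2]
    # The bug discussed above amounts to ignoring 'part' and always using 8859-1.
    return data[3:].decode(f"iso8859_{part}")
```

With the selector honoured, byte 0xA4 after 0x10 0x00 0x0F decodes as the euro sign (ISO/IEC 8859-15), whereas a fixed 8859-1 decoder would return the generic currency sign ¤.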

mm1352000 said:
2. Bug (?): according to EN 300 468 annex A figure A1 character conversion for ISO/IEC 6937-1 character 0xd0 should be 0x2015 not 0x2014:
https://github.com/MediaPortal/Medi...mentations/DVB/Graphs/Iso6937ToUnicode.cs#L86

The difference isn't big (U+2014 is EM DASH "—", U+2015 is HORIZONTAL BAR "―"), but yes, you are correct, it should be U+2015. My mistake: this was almost completely copied from our existing Iso6937ToUnicode.cs from TVE3.

mm1352000 said:
3. Bug (?): according to EN 300 468 annex A figure A1 character conversion for ISO/IEC 6937-1 character 0xe2 should be 0x0110 not 0xd0:
https://github.com/MediaPortal/Medi...entations/DVB/Graphs/Iso6937ToUnicode.cs#L128

Yes, I also checked ISO/IEC 6937 (2001 revision): it states "14/02 = LATIN CAPITAL LETTER D WITH STROKE", and "Đ" is U+0110. Same cause: I copied it from our existing TVE3 code.
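The two corrections above are easy to miss visually, because the old and new targets render almost identically. A small Python check using the standard unicodedata module makes the difference explicit; the OLD/NEW tables are illustrative excerpts, not the full Iso6937ToUnicode.cs mapping:

```python
import unicodedata

# Illustrative excerpt: old vs. corrected Unicode targets for two
# ISO/IEC 6937 single-byte codes (EN 300 468 annex A, figure A.1).
OLD = {0xD0: 0x2014, 0xE2: 0x00D0}
NEW = {0xD0: 0x2015, 0xE2: 0x0110}

for code in sorted(NEW):
    old, new = chr(OLD[code]), chr(NEW[code])
    print(f"0x{code:02X}: {unicodedata.name(old)} -> {unicodedata.name(new)}")
```

This prints "0xD0: EM DASH -> HORIZONTAL BAR" and "0xE2: LATIN CAPITAL LETTER ETH -> LATIN CAPITAL LETTER D WITH STROKE". Ð (U+00D0) and Đ (U+0110) look the same in most fonts but are distinct code points, so text decoded with the old table will not match text that uses the correct character.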

mm1352000 said:
The comment under EN 300 468 annex A table A1 says "This table is a superset of ISO/IEC 6937..." but it only shows single byte characters

I believe ETSI didn't want to put all 332 characters covered by ISO/IEC 6937 into their document, so they added the remark "light pink non-spacing symbols (diacritical marks)", meaning that these symbols apply their diacritical mark to the following letter. This is not very precise wording, but from what I have seen in gurabli's logs, "character code table 00" in the ETSI document is ISO/IEC 6937 with one extra character, €.

I attached two files from my tests, for comparison in case you want to implement this in a similar way.

gurabli · February 10, 2014

@Vasilich Looking at your discussion with mm1352000, it's all Greek to me.

Not that it was intended for me to understand.

Nice to see that there is progress on this issue! I very much hope it will be resolved in some of the upcoming versions.

Are the files attached to your last post ones I can also try with MP 1.6 final?

Vasilich · February 10, 2014

No, these are source files.
I still can't resolve the problems with updating my locally patched Git repository, so I'm sorry that I can't supply you with patched files for 1.6.

I will do so as soon as I get my local repo updated.
Maybe @mm1352000 can tell us whether he has already fixed the getString468A code in TsWriter's DvbUtil.cpp in TVE 3.5; we need to patch this part as well.

gurabli · February 10, 2014

Vasilich said:
No, these are source files.
I still can't resolve the problems with updating my locally patched Git repository, so I'm sorry that I can't supply you with patched files for 1.6. I will do so as soon as I get my local repo updated.
Maybe @mm1352000 can tell us whether he has already fixed the getString468A code in TsWriter's DvbUtil.cpp in TVE 3.5; we need to patch this part as well.

OK, I understand! Don't make the patch for 1.6 a priority; do it only if you have time and need some logs from my side. Thank you!
