DVB-S EPG wrong character encoding for Hungarian language

mm1352000

Retired Team Member
  • Premium Supporter
  • September 1, 2008
    21,577
    8,224
    Home Country
    New Zealand
    I was wondering if it might be possible to specify the endianness (byte order) for decoding by using a BOM. It seems that is not possible for ISO 6937... :(
     

    Vasilich

    Portal Pro
    August 30, 2009
    3,394
    1,170
    Germany, Mayence
    Home Country
    Russian Federation
    "I was wondering if it might be possible to specify the endianness (byte order) for decoding by using a BOM"
    No, it is not possible. ISO 6937 specifies a single byte order and does not allow for both, and the .NET implementation simply gets it wrong.
    The characters which are not represented in the primary set are coded on two bytes. The first byte, the "non-spacing diacritical mark", is followed by a letter from the base set, e.g.:
    small e with acute accent (é) = [Acute]+e
    taken from http://en.wikipedia.org/wiki/ISO/IEC_6937#Two_byte_characters
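    The two-byte rule quoted above can be sketched in Python. This is only an illustration, not the TVE3 code: the function name `decode_iso6937_pairs` is made up for the example, only the non-spacing marks in 0xC1..0xCF are handled, bytes below 0x80 are treated as plain ASCII, and every other byte is replaced with U+FFFD.

```python
import unicodedata

# ISO 6937 non-spacing diacritical marks (first byte of a two-byte character),
# mapped to the matching Unicode combining characters.
NON_SPACING = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC4: "\u0303",  # tilde
    0xC5: "\u0304",  # macron
    0xC6: "\u0306",  # breve
    0xC7: "\u0307",  # dot above
    0xC8: "\u0308",  # diaeresis
    0xCA: "\u030A",  # ring above
    0xCB: "\u0327",  # cedilla
    0xCD: "\u030B",  # double acute
    0xCE: "\u0328",  # ogonek
    0xCF: "\u030C",  # caron
}


def decode_iso6937_pairs(data: bytes) -> str:
    """Hypothetical helper: the mark byte comes first, the base letter second."""
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte in NON_SPACING and i + 1 < len(data):
            base = chr(data[i + 1])
            # Unicode puts the combining mark AFTER the base letter, so swap
            # the order and compose into a single code point where possible.
            out.append(unicodedata.normalize("NFC", base + NON_SPACING[byte]))
            i += 2
        elif byte < 0x80:
            out.append(chr(byte))  # assumption: treated as plain ASCII
            i += 1
        else:
            out.append("\ufffd")  # not covered by this sketch
            i += 1
    return "".join(out)


print(decode_iso6937_pairs(b"\xc2e"))        # é
print(decode_iso6937_pairs(b"\xcdo\xcdu"))   # őű
```

    Because the mark always precedes the letter, there is nothing for a BOM to switch: a decoder that reads the pair the other way round is simply wrong.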
     

    Vasilich

    So here is the code for testing. It uses a prepared text file with almost all of the two-byte characters defined in ISO 6937. The conversion code isn't the latest, but it is pretty close. Use it to see the wrong decoding with code page 20269.
     

    Attachments

    • testDVBEncoding.zip
      383.6 KB

    gurabli

    Portal Pro
    July 20, 2010
    242
    5
    Home Country
    Hungary

    Hi
    I'm sorry for being away for such a long time, but since my little girl was born in December, you can imagine how much free time I have :)
    I read your posts (although I do not understand much of them, which doesn't matter). I hope this will be fixed somehow :)

    Can I try something with this code, or is it not intended for me? I will do my best to help, but I will definitely have less free time in the upcoming months.
     

    gurabli

    @Vasilich: I wonder if you have had any time to look into this issue?

    I have just upgraded to MP 1.6, and now I'm not able to test the TVLibrary files, as they were for version 1.5. Of course, the UPC EPG is wrong now (Digi is fine). Could you please send me the library files for 1.6?

    Thanks!
     

    mm1352000

    @Vasilich

    I've been working on integrating your code into TVE 3.5. I found a few more bugs and have a few more questions...

    1. Bug: decoder used for ISO/IEC 8859-13 and 8859-15 (3 byte encoding) is wrong.
    https://github.com/MediaPortal/Medi...TvLibrary.Interfaces/DvbTextConverter.cs#L135
    Should be 28603 and 28605, not 28591.

    2. Bug (?): according to EN 300 468 annex A figure A1 character conversion for ISO/IEC 6937-1 character 0xd0 should be 0x2015 not 0x2014:
    https://github.com/MediaPortal/Medi...mentations/DVB/Graphs/Iso6937ToUnicode.cs#L86

    3. Bug (?): according to EN 300 468 annex A figure A1 character conversion for ISO/IEC 6937-1 character 0xe2 should be 0x0110 not 0xd0:
    https://github.com/MediaPortal/Medi...entations/DVB/Graphs/Iso6937ToUnicode.cs#L128

    4. Question: regarding ISO/IEC 6937-1...
    I'm a bit confused about how this support should be implemented. The comment under EN 300 468 annex A table A1 says "This table is a superset of ISO/IEC 6937..." but it only shows single byte characters. Our Iso6937ToUnicode.cs class clearly treats 0xc* as two byte characters... but in EN 300 468 annex A figure A1 (version 1.13.1 or 1.14.1 are best), 0xc1..0xc9, 0xca, 0xcb and 0xcd..0xcf have a single byte mapping for diacritical characters. Which is the correct interpretation - EN 300 468 or our class?
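    For bug 1, the effect of picking the wrong decoder can be sketched in Python. Assumptions: the three-byte prefix is taken to be 0x10 followed by a 16-bit ISO/IEC 8859 part number, as EN 300 468 annex A describes it; the helper name `decode_8859_prefixed` is hypothetical; and Python's `iso8859_13`, `iso8859_15` and `latin_1` codecs stand in for the .NET code pages 28603, 28605 and 28591.

```python
# Python codec names standing in for the .NET code pages named above.
CODEC_FOR_PART = {
    13: "iso8859_13",  # .NET code page 28603
    15: "iso8859_15",  # .NET code page 28605
}


def decode_8859_prefixed(data: bytes) -> str:
    """Decode text that starts with 0x10 plus a 16-bit ISO/IEC 8859 part number."""
    if len(data) < 3 or data[0] != 0x10:
        raise ValueError("not a three-byte-prefixed string")
    part = (data[1] << 8) | data[2]
    if part not in CODEC_FOR_PART:
        raise ValueError(f"part {part} is not covered by this sketch")
    return data[3:].decode(CODEC_FOR_PART[part])


sample = b"\x10\x00\x0f\xa4"           # prefix for part 15, then byte 0xA4
print(decode_8859_prefixed(sample))     # euro sign when decoded as ISO 8859-15
print(sample[3:].decode("latin_1"))     # currency sign when decoded as ISO 8859-1
```

    The same byte gives a different character depending on the part selected, which is why falling back to 28591 for parts 13 and 15 produces wrong text.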
     

    Vasilich

    "1. Bug: decoder used for ISO/IEC 8859-13 and 8859-15 (3 byte encoding) is wrong.
    https://github.com/MediaPortal/Medi...TvLibrary.Interfaces/DvbTextConverter.cs#L135
    Should be 28603 and 28605, not 28591."
    Yes, I have also fixed it in my test code.

    "2. Bug (?): according to EN 300 468 annex A figure A1 character conversion for ISO/IEC 6937-1 character 0xd0 should be 0x2015 not 0x2014:
    https://github.com/MediaPortal/Medi...mentations/DVB/Graphs/Iso6937ToUnicode.cs#L86"
    The difference isn't big (2014 is the em dash "—", 2015 is the horizontal bar "―"), but yes, you are correct, it should be 2015. My mistake: this was copied almost completely from our existing Iso6937ToUnicode.cs in TVE3.

    "3. Bug (?): according to EN 300 468 annex A figure A1 character conversion for ISO/IEC 6937-1 character 0xe2 should be 0x0110 not 0xd0:
    https://github.com/MediaPortal/Medi...entations/DVB/Graphs/Iso6937ToUnicode.cs#L128"
    Yes, I also checked against IEC 6937 r2001: it states "14/02 = LATIN CAPITAL LETTER D WITH STROKE", and "Đ" has code 0110. The same mistake: I copied it from our existing TVE3 code.

    "The comment under EN 300 468 annex A table A1 says "This table is a superset of ISO/IEC 6937..." but it only shows single byte characters"
    I believe that ETSI didn't want to put all 332 characters covered by IEC 6937 into their document, so they added the remark "light pink non-spacing symbols (diacritical marks)", meaning that these symbols apply their diacritical mark to the following letter. This is not very precise wording, but from what I have seen in the logs from gurabli, the encoding "character table 00" in ETSI is IEC 6937 with one extra character, €.
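    The two single-byte corrections from points 2 and 3 can be checked against the Unicode character names with a short Python sketch. The dictionary names `OLD` and `FIXED` are made up for the example and are not taken from Iso6937ToUnicode.cs; only the byte and code point values come from this thread.

```python
import unicodedata

# Single-byte mappings discussed above (ISO 6937 byte -> Unicode character).
OLD = {0xD0: "\u2014", 0xE2: "\u00D0"}    # values copied from the existing TVE3 table
FIXED = {0xD0: "\u2015", 0xE2: "\u0110"}  # values per EN 300 468 annex A

for byte in sorted(FIXED):
    print(f"0x{byte:02X}: {unicodedata.name(OLD[byte])} -> {unicodedata.name(FIXED[byte])}")
```

    The 0xE2 case is the easier one to miss, because "Ð" (eth) and "Đ" (D with stroke) look almost identical in most fonts.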

    I attached 2 files from my tests, just for comparison in case you want to implement this in a similar way.
     

    Attachments

    • DVBEnc.zip
      6.1 KB

    gurabli

    @Vasilich Looking at your discussion with mm1352000, it's all Greek to me :) Not that it was intended for me to understand :)

    Nice to see that there is progress on this issue! I very much hope it will be resolved in some of the upcoming versions.

    Are the files attached to your last post ones that I can also try with MP 1.6 final?
     

    Vasilich

    No, these are source files.
    I still can't resolve the problems with updating my local patched Git repository, so sorry that I cannot supply you with patched files for 1.6 :( I will do so as soon as I get my local repo updated.
    Maybe @mm1352000 can tell us whether he has already fixed the getString468A code in TsWriter DvbUtil.cpp in TVE 3.5; then we need to patch this part as well.
     

    gurabli


    OK, I understand! Do not make the patch for 1.6 a priority; do it only if you have time and need some logs from my side. Thank you!
     
