[further infos missing] Unicode Error in UK EPG

CyberSimian

Test Group
  • Team MediaPortal
  • June 10, 2013
    2,873
    1,801
    Southampton
    Home Country
    United Kingdom
    I suspect you'd find that the EPG data is huffman encoded as Owlsroost has said.
    It is not Huffman encoded in the TS file that I provided, as that was from an SD MUX. I chose an SD MUX because:

    (1) The EPG for the SD channel also showed the error.
    (2) An SD TS file is going to be much smaller than an HD TS file.

    However, as I mentioned earlier, my understanding is that all MUXes broadcast the EPG for all channels (except that the SD MUXes do not broadcast the EPG for the HD channels). If my understanding is true, the reason that I found an error in the EPG for an SD channel (and Caesium did not find that particular error) could be because the EPG data for my SD channel was received via the HD MUX, and hence was Huffman encoded, whereas the EPG data for Caesium's SD channel was received via an SD MUX. I don't know how MP selects which MUX to use -- I assume that is related to channels being watched or recorded (so will vary from one person to another).

    So you might need to use a TS file from an HD MUX in order to identify the source of this problem.

    -- from CyberSimian in the UK
     

    Vasilich

    Portal Pro
    August 30, 2009
    3,394
    1,170
    Germany, Mayence
    Home Country
    Russian Federation
    You mean TsWriter?
    No, I meant that TSReader.
    I'm not sure that the sample is Huffman-encoded, because the TSReader from the link above can decrypt it on my PC (and I definitely don't have any CI cards/modules, which are needed (?) to descramble Huffman encoding), and I see the properly decoded pound symbol. Could this be the case where TsWriter doesn't properly write the encoding bytes for tables > 5 (do you remember the case with the Hungarian weirdness from the sat providers?) - you fixed something for that in TVE35.
     

    mm1352000

    Retired Team Member
  • Premium Supporter
  • September 1, 2008
    21,577
    8,224
    Home Country
    New Zealand
    It is not Huffman encoded in the TS file that I provided
    Okay. Like I said, it is too big for me to download so I wasn't actually able to check.

    No, I meant that TSReader.
    I'd use Notepad++ with the HexEditor plugin.

    The TSReader from the link above can decrypt it on my PC (and I definitely don't have any CI cards/modules, which are needed (?) to descramble Huffman encoding)
    Huffman encoding is just compression (shorter codes for more frequent characters), not encryption.
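To illustrate why no CI module is involved, here is a minimal sketch of generic Huffman coding in Python. It is not the table-based variant used for the broadcast EPG (those tables are not shown in this thread), and the sample text and the function names `build_codes`, `encode` and `decode` are made up for illustration. The point is only that anyone holding the code table can decode the bits; there is no key.

```python
import heapq
from collections import Counter


def build_codes(text):
    """Return a dict mapping each character to its Huffman bit string."""
    if not text:
        return {}
    # Heap entries: (frequency, unique tie-breaker, {char: code built so far}).
    heap = [(freq, i, {ch: ""})
            for i, (ch, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        # Only one distinct character: give it a one-bit code.
        return {ch: "0" for ch in heap[0][2]}
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text, codes):
    """Concatenate the bit string of every character."""
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    """Walk the bits and emit a character whenever a full code is matched."""
    reverse = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)


# Hypothetical EPG-style description containing the pound sign (U+00A3).
sample = "Tickets cost \u00a35 per person, \u00a35 per child"
codes = build_codes(sample)
bits = encode(sample, codes)
print(len(bits), "bits instead of", 8 * len(sample))
print(decode(bits, codes) == sample)
```

Decoding needs only the same code table that the encoder used, which is why a software tool can read such data without any descrambling hardware.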

    Could this be the case where TsWriter doesn't properly write the encoding bytes for tables > 5 (do you remember the case with the Hungarian weirdness from the sat providers?) - you fixed something for that in TVE35.
    Could be... and in that case I would add some debug to TsWriter and run the dump through it to see what the output looks like.
     

    Vasilich

    hex1.png
    I'd use Notepad++ with the HexEditor plugin.
    I checked it with a hex viewer: there is no encoding byte, so it is Latin 1. The code for the pound sign there is 0xA3, and the stream contains the same code. So I assume it is some kind of problem while converting it to Unicode and storing it in the DB.
    @CyberSimian can you check this entry in your SQL DB? It is in the table 'program', field 'description'. BTW, which DB server do you use - MS SQL or MySQL?
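The hex check above can be sketched in Python. In DVB service information, a text field may start with a byte below 0x20 that selects a character table; text that starts with an ordinary character uses the default table. The sketch follows the post in treating that default as Latin-1, which is an assumption made here only because the 0xA3 pound code is all that matters. The sample bytes and the function name `has_encoding_selector` are made up for illustration.

```python
def has_encoding_selector(data: bytes) -> bool:
    """True if the text starts with a control byte (below 0x20) that
    selects a character table, as DVB text fields may do."""
    return len(data) > 0 and data[0] < 0x20


# Hypothetical description bytes: "Only", space, 0xA3, "5" with no selector byte.
sample = bytes([0x4F, 0x6E, 0x6C, 0x79, 0x20, 0xA3, 0x35])

if has_encoding_selector(sample):
    print("selector byte 0x%02X present" % sample[0])
else:
    # Assumption taken from the post: treat the default table as Latin-1.
    print(sample.decode("latin-1"))
```

If the stream really holds a bare 0xA3 with no selector byte, the pound sign is correct at that point, so the corruption would have to happen later in the chain.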
     

    CyberSimian

    @CyberSimian can you check this entry in your SQL DB? It is in the table 'program', field 'description'. BTW, which DB server do you use - MS SQL or MySQL?
    The problem is present in MP 1.8.0, but since I raised the issue I have decided to move permanently to MP (from WMC), so I did a fresh install (of MP 1.9.0 pre) in a Vista partition that had never seen any trace of MP. I accepted all of the defaults, so does that mean that I ended up with MySQL?

    Either way, I don't know how to query the DB to determine the info that you need. Can you specify the actions necessary to extract this info from the DB? Thanks

    -- from CyberSimian in the UK
     

    CyberSimian

    So I assume it is some kind of problem while converting it to Unicode and storing it in the DB.
    I noticed another occurrence of the corrupted "£" sign in the EPG, so this time I recorded the programme, as I was curious to see what would end up in the XML file containing the programme info.

    The XML file has "encoding=UTF-8", and my file editor confirms that the file contains SBCS chars (Single Byte Character Set), but the file contains these four bytes for the "£" sign:

    0xc3, 0x82, 0xc2, 0xa3.
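One explanation that fits those four bytes exactly is double encoding: the pound sign was encoded as UTF-8 once, the two resulting bytes were then misread as two single-byte characters, and those were encoded as UTF-8 again. This is a sketch of that hypothesis in Python, not a confirmed description of what MediaPortal does; the helper name `undo_double_encoding` is made up for illustration.

```python
pound = "\u00a3"                           # the pound sign, U+00A3

utf8_once = pound.encode("utf-8")          # two bytes: 0xC2 0xA3
misread = utf8_once.decode("latin-1")      # read as two characters: A-circumflex, pound
utf8_twice = misread.encode("utf-8")       # four bytes: 0xC3 0x82 0xC2 0xA3

print(", ".join(hex(b) for b in utf8_twice))


def undo_double_encoding(data: bytes) -> str:
    """Reverse one round of 'UTF-8 bytes misread as Latin-1 and re-encoded'."""
    return data.decode("utf-8").encode("latin-1").decode("utf-8")


print(undo_double_encoding(utf8_twice) == pound)
```

The intermediate string, an upper case "A" with circumflex followed by the pound sign, is what a viewer would display for the corrupted text.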

    -- from CyberSimian in the UK
     

    CyberSimian

    So I assume it is some kind of problem while converting it to Unicode and storing it in the DB.
    I was wandering around the web recently and came across this page:

    http://www.fairoak.org/properties.php

    The interesting thing about this page is that on my Windows XP laptop it exhibits the same problem as the UK EPG does on my Windows Vista HTPC, namely the UK currency symbol displays as an upper case "A" with circumflex accent, followed by the UK pound symbol. I use the Opera browser, and on the "View" drop-down menu there is an "Encoding" selection that changes the encoding used to interpret the web page. I tried several selections, but the only one that improved matters was selecting "UTF-8", which eliminated the problem!

    I think that the Opera default is "Automatic Selection" of the encoding, so I looked at the source for the web page to see if that was where Opera was getting the (incorrect) encoding. The web page source contains this line:

    Code:
    <meta http-equiv="Content-Type" content="text/html; charset=windows-1252" />
    So it looks as though the incorrect display results from Opera using "Windows 1252" as the encoding, instead of "UTF-8". Of course, we don't actually know what encoding was used for the web-page source; just because it claims "Windows 1252" does not mean that that is what it is actually using.
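The behaviour described above is what you would expect if the page bytes are really UTF-8 while the declaration says Windows-1252. A minimal Python sketch, using a made-up price as the page content (the bytes of the actual page were not inspected here):

```python
# Hypothetical page bytes: a price whose pound sign was saved as UTF-8 (0xC2 0xA3).
page_bytes = b"\xc2\xa3250,000"

shown_as_cp1252 = page_bytes.decode("cp1252")   # A-circumflex, pound, then the digits
shown_as_utf8 = page_bytes.decode("utf-8")      # pound, then the digits

print(shown_as_cp1252)
print(shown_as_utf8)


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A page genuinely saved as Windows-1252 stores the pound sign as a lone 0xA3,
# which is not valid UTF-8 on its own.
genuine_cp1252 = "\u00a3250,000".encode("cp1252")
print(is_valid_utf8(page_bytes), is_valid_utf8(genuine_cp1252))
```

So the fact that selecting "UTF-8" displays the page correctly is itself reasonable evidence that the source was saved as UTF-8, whatever the meta tag claims.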

    -- from CyberSimian in the UK
     
