Trouble with Importing

Tesla

Portal Pro
January 30, 2009
138
4
Texas
Home Country
United States of America
Thanks for the on-going assistance.

I finally got the majority of my TV series into the DB and moved on (actually got the new MP box installed).

However, I plan to re-visit this issue in a week or so. By then, I will have some new episodes that need to be added. I'll delete the DB, turn on full logging and try again from scratch. If that works, I'll also try just turning off "compressed files" in MS-SE.
 

Guzzi

Retired Team Member
  • Premium Supporter
  • August 20, 2007
    2,161
    747
    AW: Re: Trouble with Importing

    Edit the MSE settings and disable the option "scan in compressed archives"; this stops MSE from processing ZIP files.

    If you can then run import scans without a problem, the cause is a system too slow for MSE to lock the ZIP file, unzip it, scan all files, and release the lock again before MP-TVSeries hits its timeout.

    Maybe the timeout value can be increased then; otherwise keep ZIP scanning disabled.

    Hi RoChess,

    the last lines in the TVS log are as follows:

    [...]
    00000008 - 14.08.2010 18:55:29 - Found 048 episodes online for "Die Abenteuer von Paddington Bär"
    00000001 - 14.08.2010 18:55:29 - progress received: IdentifyNewEpisodes [502/2191] Die Abenteuer von Paddington Bär
    00000008 - 14.08.2010 18:55:29 - Retrieving Data from: http://thetvdb.com/api/<apikey>/series/113071/all/de.zip
    00000008 - 14.08.2010 18:55:30 - Decompressing Stream...
    00000001 - 14.08.2010 18:55:32 - Online Parsing Completed in 00:32:59.2033981


    No virus scanner is active. It seems to stop just when it should decompress. Would it help to run with Debug+SQL logs?

    Thanks,

    Guzzi
     

    RoChess

    Extension Developer
  • Premium Supporter
  • March 10, 2006
    4,434
    1,897
    Re: AW: Re: Trouble with Importing

    Looks like that show lacks some serious TheTVdB love, so sign up and edit away: Series Info

    Complete the German information (language ID 14), and preferably add some minimal info for English as well, such as "Episode 1", because leaving it empty is just a pity. You should be able to get most of the info from the tvrage or Wikipedia websites if you really want to be helpful to the rest of the community. It might also be needed to fix your own problem.

    It clearly looks like a bug somewhere in the parsing of those bad fields, but yes, that would need Debug+SQL logs.

    But if you just edit the data properly and wait roughly 15 minutes (it takes a while before the ZIP files that MP-TVSeries downloads are regenerated), it should be all good.

    To help hunt down and fix the parsing bug, rerun the import as-is with Debug+SQL enabled and post those logs.
     

    Guzzi

    AW: Re: AW: Re: Trouble with Importing

    To help hunt down and fix the parsing bug, rerun the import as-is with Debug+SQL enabled and post those logs.


    Hi RoChess,

    I have rerun the importer, with the following details:

    1.) The "hang" occurred somewhere else. Series 113071 from the last scan didn't show up in the logs at all (up to the "hang" in newepisodes); or was some cleanup done in the previous scan? This matches my experience so far: it does not always hang at the same position.

    2.) In the part "updating metadata", 4 series should have been updated. It stopped at series no. 3 with a parsing error and didn't proceed to no. 4. Nevertheless, TVSeries continued with other steps and did not "finish importer".

    Log:

    00000008 - 14.08.2010 18:42:14 - Retrieving updated Metadata for series der letzte zeuge
    00000008 - 14.08.2010 18:42:14 - Retrieving Data from: http://thetvdb.com/api/<apikey>/series/81980/all/de.zip
    00000008 - 14.08.2010 18:42:14 - Decompressing Stream...
    00000008 - 14.08.2010 18:42:14 - Decompressing Entry: de.xml
    00000008 - 14.08.2010 18:42:14 - Decompression done, now loading as XML...
    00000008 - 14.08.2010 18:42:15 - Failed to load XML: ' ', hexidezimaler Wert 0x0B, ist ein ungültiges Zeichen. Zeile 1895, Position 549.
    00000008 - 14.08.2010 18:42:15 - Decompressing Entry: banners.xml
    00000008 - 14.08.2010 18:42:15 - Decompression done, now loading as XML...
    00000008 - 14.08.2010 18:42:15 - Loaded as valid XML
    00000008 - 14.08.2010 18:42:15 - Decompressing Entry: actors.xml
    00000008 - 14.08.2010 18:42:15 - Decompression done, now loading as XML...
    00000008 - 14.08.2010 18:42:15 - Loaded as valid XML
    00000008 - 14.08.2010 18:42:15 - Decompression returned null or not the requested entry

    So it seems that the parsing error prevents further processing?

    3.) In the part "Begin Parsing action: IdentifyNewEpisodes", it stopped at series ID 79811. I have attached the last part of the log.
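
    Because several steps log into the same file, it can help to pair each "Failed to load XML" line with the download that preceded it. The following is only a rough Python sketch, not part of MP-TVSeries. It assumes the line layout seen in the excerpts here and assumes that the first column is a thread ID; `failed_downloads` is a hypothetical helper name.

```python
import re

# Assumed line layout, taken from the excerpts in this thread:
# "<thread id> - <dd.mm.yyyy hh:mm:ss> - <message>"
LOG_LINE = re.compile(
    r"^(?P<thread>\d+) - (?P<stamp>\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2}) - (?P<msg>.*)$"
)
URL_PREFIX = "Retrieving Data from: "


def failed_downloads(log_text):
    """Return (url, error message) pairs for every 'Failed to load XML' line."""
    last_url = {}   # most recent download URL seen per (assumed) thread id
    failures = []
    # split("\n") rather than splitlines(): splitlines() also breaks on 0x0B.
    for raw in log_text.split("\n"):
        match = LOG_LINE.match(raw.strip())
        if match is None:
            continue
        thread, msg = match["thread"], match["msg"]
        if msg.startswith(URL_PREFIX):
            last_url[thread] = msg[len(URL_PREFIX):]
        elif msg.startswith("Failed to load XML"):
            failures.append((last_url.get(thread), msg))
    return failures


sample = """\
00000008 - 14.08.2010 18:42:14 - Retrieving Data from: http://thetvdb.com/api/<apikey>/series/81980/all/de.zip
00000008 - 14.08.2010 18:42:14 - Decompressing Stream...
00000008 - 14.08.2010 18:42:15 - Failed to load XML: ' ', hexidezimaler Wert 0x0B, ist ein ungültiges Zeichen. Zeile 1895, Position 549.
00000008 - 14.08.2010 18:42:15 - Loaded as valid XML
"""

for url, error in failed_downloads(sample):
    print(url, "->", error)
```

    Run over a full log, this would list every series archive that failed to parse, which is easier than scrolling through 50 MB by hand.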

    Remarks:
    Thanks for the hint about "bad series info" on TheTVDB, but I should have mentioned that this DB is my test DB and uses lots of "pseudo-series" that I don't actually have. I once created a batch file that generates a directory structure for a big DB, in order to track down some problems with Damien. He should have that big test DB, and it would probably be best if he could run the importer once with it, because it is big (>100 MB) and creates huge logs (>50 MB in this example, where it didn't even complete).
    I have attached the batch file, so if you want, you can give it a try (run the batch in a directory and include that directory in the importer path). Please disable fanart/episode image/cover downloads to avoid a huge load on TheTVDB: it results in a DB with >2000 series and >100k episodes ...

    Are you sure that the problem is not related to async operations? Is there a way to disable them, i.e. to force sequential processing of all importer steps, to rule out a "lock situation"?

    Thanks for the support,

    Guzzi
     

    Attachments

    • MP-TVSeries-logdebugsql(lastpart).zip
      4.3 KB
    • TV-Series-TestDB.zip
      217.1 KB

    RoChess

    Re: AW: Re: AW: Re: Trouble with Importing

    So it seems that the parsing error prevents further processing?

    It could also be related to a fault at thetvdb.com.

    There were problems with the series 'Lost' as well, which the TheTVdB admins had to fix to resolve it.

    MP-TVSeries tries to compensate for as much as possible, but it is impossible to counteract all the screwups. I don't have time to analyze the ZIP file generated by TheTVdB that is causing the problems, but I'm sure the admins at thetvdb.com will be happy to look into it.

    Since the actual data at TheTVdB can be edited by everybody (until admins lock it), you might be able to correct the problem yourself by just reviewing all the series you import. Indeed, the asynchronous nature of the import (which is there to speed things up) sometimes makes it a bit more complicated to read the log and know which series is causing the problem, but it is not impossible. I know there is an easy config option in MovingPictures to limit the import process to a single item at a time, but I've never run into these types of problems on my system with MP-TVSeries, so frankly I don't know if it has the same option.

    Of course an easy workaround for that is to just create a new temporary setup and only import one series at a time. That way you know exactly which show is causing a problem. If all shows work individually but fail on an asynchronous mass import, then there would indeed be a problem in the MP-TVSeries plugin. In that case, I'm sure the developers would appreciate as many Debug+SQL-enabled log files as possible (and not just snippets of what you think is the cause, but the entire file).
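
    The same one-at-a-time idea can be tried outside the plugin by checking a single series archive on its own. The following is a minimal Python sketch under stated assumptions: the archive bytes are already on disk or in memory, `check_archive` and `build_demo_archive` are hypothetical helper names, and the demo archive is built locally as a stand-in for a real download. The entry names mirror the ones in the log (de.xml, banners.xml).

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def check_archive(zip_bytes):
    """Parse every entry of a ZIP archive as XML and report the outcome per entry."""
    report = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            try:
                ET.fromstring(archive.read(name))
                report[name] = "ok"
            except ET.ParseError as err:
                report[name] = "invalid: {}".format(err)
    return report


def build_demo_archive():
    """Build a small in-memory archive; a stand-in for a real download."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # "\x0b" is a literal vertical tab, which XML 1.0 does not allow.
        archive.writestr("de.xml", "<Data><Overview>freiem\x0bFu</Overview></Data>")
        archive.writestr("banners.xml", "<Banners />")
    return buffer.getvalue()


for entry, status in check_archive(build_demo_archive()).items():
    print(entry, status)
```

    If every archive checks out individually but the mass import still hangs, that points away from the data and towards the plugin.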

    PS: Giving a quick look at the series:

    http://www.thetvdb.com/?tab=seasonall&id=81980&lid=14

    I see that episode S03E09 for example lacks info. But the error in your log file seems to indicate a problem at a different level.

    http://www.thetvdb.com/?tab=series&id=79811&lid=14

    Seems to lack a lot of information as well (series info, and airdates), so would be nice to complete the data. If you complete it all, it might just make the problem go away.
     

    Guzzi

    AW: Trouble with Importing

    Hi RoChess,
    sorry, but I think you misunderstood me. I am not talking about completing missing info on TVDB (which is certainly valuable for the community, 100% ack). I am talking about the problem that the importer stops, and I would like to help find the reason for that if I can. If the problem is caused by missing or incorrect data, that should be noted in the log and the importer should continue with the next series, which is what happens in some other situations, e.g.:

    00000008 - 14.08.2010 18:42:23 - Retrieving Data from: http://thetvdb.com/api/<apikey>/series/81980/all/de.zip
    00000008 - 14.08.2010 18:42:23 - Decompressing Stream...
    00000008 - 14.08.2010 18:42:23 - Decompressing Entry: de.xml
    00000008 - 14.08.2010 18:42:24 - Decompression done, now loading as XML...
    00000008 - 14.08.2010 18:42:24 - Failed to load XML: ' ', hexidezimaler Wert 0x0B, ist ein ungültiges Zeichen. Zeile 1895, Position 549.
    00000008 - 14.08.2010 18:42:24 - Decompressing Entry: banners.xml
    00000008 - 14.08.2010 18:42:24 - Decompression done, now loading as XML...
    00000008 - 14.08.2010 18:42:24 - Loaded as valid XML
    00000008 - 14.08.2010 18:42:24 - Decompressing Entry: actors.xml
    00000008 - 14.08.2010 18:42:24 - Decompression done, now loading as XML...
    00000008 - 14.08.2010 18:42:24 - Loaded as valid XML
    00000008 - 14.08.2010 18:42:24 - Decompression returned null or not the requested entry
    00000001 - 14.08.2010 18:42:24 - progress received: IdentifyNewEpisodes [5/2191] der letzte zeuge

    So, as you also stated, the problem seems to lie between "Decompressing Stream" and "Decompressing Entry...". Maybe some exception causes the importer routine to quit completely? In that case, additional exception handling would be required.

    damien: Can you point me to the relevant part of the code? If I find the time, I would try to add some more logging to narrow it down ...

    I think fixing the info on TVDB is not a solution for this problem, because that would mean one broken series could prevent the whole import (if it comes at an early stage)...
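
    The behaviour asked for here (note the failure, then carry on with the next series) can be illustrated with a short Python sketch. This is not the MP-TVSeries code, which is C#; `import_all` and the sample payloads are made up for illustration, and counting child elements merely stands in for the real import work.

```python
import xml.etree.ElementTree as ET


def import_all(payloads):
    """Parse each series independently; a bad one is recorded and skipped."""
    imported, skipped = [], {}
    for series, xml_bytes in payloads.items():
        try:
            root = ET.fromstring(xml_bytes)
        except ET.ParseError as err:
            # Record the failure and move on instead of aborting the whole run.
            skipped[series] = str(err)
            continue
        # Placeholder for the real work: count the child elements.
        imported.append((series, len(root)))
    return imported, skipped


payloads = {
    "series A": b"<Data><Episode /><Episode /></Data>",
    "series B": b"<Data><Overview>freiem\x0bFu</Overview></Data>",  # contains 0x0B
    "series C": b"<Data><Episode /></Data>",
}
imported, skipped = import_all(payloads)
print("imported:", imported)
print("skipped:", skipped)
```

    The point of the design is that the try/except sits inside the loop, so an error in one series cannot end the loop for the others.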
     

    RoChess

    The problem is at TheTVdB: their data contains the byte 0x0B, which is not a legal character in XML 1.0 and freaks out the XML parser that MP-TVSeries uses.

    I analyzed the de.XML file and it is for:

    <Overview>An unterschiedlichen Fundorten werden zwei Frauen tot aufgefunden - jeweils nackt, an Händen und Beinen gefesselt und mit einer Plastiktüte über dem Kopf. Kripochef Joe Hoffer und die beiden Gerichtsmediziner Dr. Robert Kolmaar und Dr. Judith Sommer stehen vor einem Rätsel. Trotz völliger Übereinstimmung der Morde können sie zunächst keine Verbindung zwischen den Toten erkennen. Unter Verdacht kommt dann Torsten Kraft, ein verurteilter Sexualstraftäter, der nach 16 Jahren Haft entlassen wurde und zum Tatzeitpunkt wieder auf freiem<bh:0b>Fuß war. Doch nachdem Torsten Kraft in Untersuchungshaft genommen wurde, wird eine weitere Frauenleiche gefunden, ermordet nach dem identischen Schema.
    Die Ermittlungen erweisen sich als äußerst schwierig, zumal Kripochef Joe Hoffer überzeugt ist, dass nur jemand für die Tat in Frage kommt, der das entsprechende Hintergrundwissen hatte, um einen Mord derart inszenieren zu können. Oberstaatsanwalt Koerner beauftragt die Gerichtsmediziner zu analysieren, inwieweit die Morde Kraft als dem Hauptverdächtigen zuzuordnen sind. Das bringt<bh:0b>Dr. Judith Sommer in akute Lebensgefahr...\</Overview>

    To be exact, the "freiem<bh:0b>Fu" and "Das bringt<bh:0b>Dr. Judith" sections.

    0x0B is the vertical tab character, which is the weirdest thing I've seen at TheTVdB, so no wonder the C# XML parser freaks out over it.

    The problem is that the XML v1.0 specifications do not allow for vertical tabs to be used.

    I believe v1.1 does, though only as an escaped character reference (&#xB;) and not as a raw byte, and TheTVdB clearly identifies its output as: "<?xml version="1.0" encoding="UTF-8" ?>"

    I'm sure damienh can code in a workaround (and I'm asking already), but where does it end? The TheTVdB admins need to adhere to the correct standards and fix their code. So even though a workaround might get added, it will be much faster if you notify the TheTVdB admins so they can scan their database for vertical tab chars and replace them with the usual [CR][LF] sequence. They should also filter it then on input level, because if this is possible, it usually means an SQL injection is possible as well with the right sequence.
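
    For illustration, a client-side workaround of the kind mentioned above could look roughly like the following Python sketch (the plugin itself is C#, so this is not the actual fix). It assumes that replacing each forbidden character with a space is acceptable, which silently alters the data; `sanitize_xml10` is a hypothetical helper name, and the character class follows the XML 1.0 rule that the only control characters allowed are TAB, LF and CR.

```python
import re
import xml.etree.ElementTree as ET

# Characters XML 1.0 forbids: C0 controls other than TAB (0x09), LF (0x0A)
# and CR (0x0D), plus the non-characters U+FFFE and U+FFFF.
ILLEGAL_XML10 = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def sanitize_xml10(text, replacement=" "):
    """Replace characters that an XML 1.0 parser would reject (assumed policy)."""
    return ILLEGAL_XML10.sub(replacement, text)


raw = ('<?xml version="1.0" encoding="UTF-8" ?>'
       '<Overview>wieder auf freiem\x0bFuß war</Overview>')

try:
    ET.fromstring(raw.encode("utf-8"))
except ET.ParseError as err:
    print("raw data rejected:", err)

clean = sanitize_xml10(raw)
print(ET.fromstring(clean.encode("utf-8")).text)
```

    A workaround like this only hides the symptom on the client; the stored data at TheTVdB stays invalid until it is cleaned up there.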
     

    RoChess

    PS: In the meantime I've edited the episode summary in question (Episode 9x01), so if you wait about 15 minutes, the problem should be gone, at least for that series.

    That unfortunately doesn't mean it is fixed for all series, so once again, notify the TheTVdB admins :)

    PS2: You say that TheTVdB shouldn't bother fixing it, and that MP-TVSeries should fix its error handling around it. But that's the wrong way to go about it. Next thing you know, TheTVdB starts flipping languages around and German randomly becomes Klingon. Now, if you can invent artificial intelligence, it can be added to MP-TVSeries to automatically fix anything; in the meantime, standards are there for a reason: to be adhered to :)
     

    Guzzi

    AW: Re: Trouble with Importing

    The problem is at TheTVdB: their data contains the byte 0x0B, which is not a legal character in XML 1.0 and freaks out the XML parser that MP-TVSeries uses.

    [...]

    The problem is that the XML v1.0 specifications do not allow for vertical tabs to be used.

    [...]

    They should also filter it then on input level, because if this is possible, it usually means an SQL injection is possible as well with the right sequence.

    Strike! I agree 100% on that: it should be rejected on the input side (it probably results from users copying and pasting while editing).

    First: Thanks for the analysis - this made the circumstances and dependencies clear to me.

    So 3 things are required:

    1.) TVSeries should check for valid XML 1.0 and refuse to process invalid data (this ensures the stability of the plugin)
    2.) TVDB requires a cleanup of existing data
    3.) TVDB requires an input filter to either correct or reject invalid input data
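
    As a rough idea of what the cleanup scan (no. 2) and the input filter (no. 3) would have to check, here is a small Python sketch. It is not TheTVdB code; `is_xml10_char` and `find_illegal_chars` are hypothetical names. The allowed ranges are the ones XML 1.0 defines for a legal character.

```python
def is_xml10_char(ch):
    """True if ch is a legal character according to the XML 1.0 Char rule."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_illegal_chars(text):
    """Return (line, column, hex code) for every character XML 1.0 forbids."""
    problems = []
    # split("\n") on purpose: str.splitlines() would also split on 0x0B
    # and hide exactly the character we are looking for.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if not is_xml10_char(ch):
                problems.append((line_no, col, "0x{:02X}".format(ord(ch))))
    return problems


overview = "zum Tatzeitpunkt wieder auf\nfreiem\x0bFuß war"
problems = find_illegal_chars(overview)
if problems:
    print("reject input:", problems)
else:
    print("input accepted")
```

    An input filter could reject the edit when the list is non-empty, while a cleanup job could use the reported positions to fix existing records.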

    No. 1 is something Damien or other MPTVS devs could do, but how and to whom should No. 2 and 3 be addressed?

    If you have a contact among the people developing and administering TVDB, could you forward it to them?

    Thanks for all your help.

    Guzzi

    PS: Klingon sounds good, I always wanted to learn it ;-) And I will try fixing wrong or missing data.

    PPS: Didn't find the correct ID for Klingon .... :)

    Available Languages:
    Language Language ID (lid)
    Chinese 6
    English 7
    Svenska 8
    Norsk 9
    Dansk 10
    Suomeksi 11
    Nederlands 13
    Deutsch 14
    Italiano 15
    Español 16
    Français 17
    Polski 18
    Magyar 19
    Greek 20
    Turkish 21
    Russian 22
    Hebrew 24
    Japanese 25
    Portuguese 26
     

    Inker

    Retired Team Member
  • Premium Supporter
  • December 6, 2004
    2,055
    318
    Re: AW: Re: Trouble with Importing

    The fact that you see the XML parsing error in the log means that the exception thrown by the invalid XML is caught and the XML is rejected. If this causes other problems further down, it's simply a bug.

    TheTVDB has a forum where invalid data can be reported, or for unlocked series/episodes, you are encouraged to correct it yourself if you can. Of course input sanitizing should really be done, but they have been working on a new site for a while now, and I suspect it's on hold until then.

    Also, you asked about the code in question, have a look here:
    http://code.google.com/p/mptvseries...eries/Online Parsing Classes/OnlineAPI.cs#442
     
