SiteParser: Force a specific charset ??

doskabouter

Development Group
  • Team MediaPortal
  • September 27, 2009
    4,584
    2,978
    Nuenen
    Home Country: Netherlands
    Couldn't you add an option to the generic parser, like "forceUTF8Encoding", so that the matcher behaves the way described?
    Implementing a duplicate of the generic parser just to change 10 lines (one more loop for parse, categories, subcategories) could lead to future incompatibilities ...

    Not needed, I just figured it out:
    put
    (?<!class=2.*)(?<=class=1.*) before your regex, and it will only match between "class=1" and "class=2".
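A minimal sketch of how that prefix scopes a pattern, for readers who want to try it outside the parser. The thread's regexes run on .NET; this TypeScript version relies on JavaScript also allowing variable-length lookbehind. The helper name `scopedMatches` and the sample markup are made up for illustration, and the "s" flag is an assumption standing in for .NET's RegexOptions.Singleline so that "." can cross line breaks.

```typescript
// Prefix from the post above: a match may only start after "class=1"
// and before the first "class=2".
const SCOPE_PREFIX = "(?<!class=2.*)(?<=class=1.*)";

// Hypothetical helper: returns every match of `pattern` that starts
// inside the scoped region.
function scopedMatches(html: string, pattern: string): string[] {
  // "g" allows repeated exec calls; "s" lets "." match newlines.
  const re = new RegExp(SCOPE_PREFIX + pattern, "gs");
  const found: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    found.push(m[0]);
    // Guard against an endless loop on zero-length matches.
    if (m[0].length === 0) re.lastIndex++;
  }
  return found;
}

// Hypothetical page: the same link markup appears in three sections.
const sampleHtml = [
  '<div class=0><a href="/skip-before">x</a></div>',
  '<div class=1><a href="/keep-a">a</a><a href="/keep-b">b</a></div>',
  '<div class=2><a href="/skip-after">y</a></div>',
].join("\n");

const links = scopedMatches(sampleHtml, 'href="[^"]+"');
console.log(links);
```

Only the two links inside the class=1 block come back. Note that the prefix constrains where a match starts, not where it ends, and that it is re-evaluated at every candidate position, so it can get slow on very large pages.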
     

    ScRePt

    Portal Pro
    August 2, 2010
    170
    96
    Athens
    Home Country: Greece
    Oh, you are good!!! It worked.
    I ran into another problem: the site seems to load its content dynamically. As a result, the DOM I am trying to parse is not the same as the "live" DOM I see in the browser. Have you ever run into this problem? How did you solve it?

    Example link

    (look for addPrototypeElement)
     

    doskabouter

    You could use an HTTP sniffer like Fiddler2 to figure out which HTML documents are loaded. One of those should contain the video.
     

    ScRePt

    Even if I sniff the video source, there is no way to parse the categories for the videos since the final DOM is not built and is not available for parsing. I am wondering if this was ever an issue for other sites.
     

    offbyone

    Development Group
  • Team MediaPortal
  • April 26, 2008
    3,989
    3,712
    Stuttgart
    Home Country: Germany
    If the final page is built from secondary requests (e.g. using AJAX), and the data you need comes from those requests, you should use the URLs of those requests with your regex, since their responses contain the data you need. If you need data from the primary HTML in order to make the AJAX request, it's time to build your own util in C# that does so. This can in no way be solved with one generic util without making it overly complex.
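The two-step case described above could look roughly like the sketch below. It is written in TypeScript rather than C# and is not an OnlineVideos util. Everything site-specific is an assumption: the `loadFragment(...)` call, the URLs, and the markup are invented for illustration, and the actual HTTP request is left out so the parsing steps stay visible.

```typescript
interface Category {
  name: string;
  url: string;
}

// Hypothetical primary page: the category list is empty until a script
// requests a fragment. "loadFragment" is an invented name for illustration.
const primaryHtml =
  '<ul id="cats"></ul><script>loadFragment("/ajax/categories?page=1");</script>';

// Hypothetical body of that secondary response, as a sniffer might show it.
const fragmentHtml =
  '<li><a href="/cat/news">News</a></li><li><a href="/cat/sports">Sports</a></li>';

// Step 1: read the secondary URL out of the primary HTML.
function findFragmentUrl(html: string, baseUrl: string): string | null {
  const m = /loadFragment\("([^"]+)"\)/.exec(html);
  return m ? baseUrl + m[1] : null;
}

// Step 2: run the category regex against the fragment, not the primary page.
function parseCategories(fragment: string): Category[] {
  const re = /<a href="([^"]+)">([^<]+)<\/a>/g;
  const out: Category[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(fragment)) !== null) {
    out.push({ url: m[1], name: m[2] });
  }
  return out;
}

const fragmentUrl = findFragmentUrl(primaryHtml, "http://example.invalid");
const categories = parseCategories(fragmentHtml);
console.log(fragmentUrl, categories);
```

Running the category regex against `primaryHtml` finds nothing, which is the situation described earlier in the thread; the categories only appear once the fragment is parsed. If the secondary URL is fixed and needs nothing from the primary page, step 1 drops out and the fragment URL can be given to the regex-based parser directly.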
     
