Need help with another new WebEPG Grabber.

Juppe

Portal Pro
November 17, 2006
315
45
Home Country
Sweden Sweden
I'm trying to develop a new WebEPG grabber for Sweden, but I have a few problems parsing the HTML file to get all programs.

In the file attached to this post, the problem is that programs are marked up in a few different ways.

The first case is the program at 11:30.
The <h3> tag holds the title of the program, but as you can see there is extra content inside it: a <span> tag that I don't need and don't know how to remove.

The second case is the program at 19:30.
That program is missing the subtitle tag, that is, the <span> tag between the <h3> and the <p>.

I have a TemplateText that almost works, but I'm missing four of the programs in this file because I can't figure out the problems above.
If I make those tags optional, I get all programs, but not all info for each program.
 

Attachments

  • web-epg-grabber.txt
    11.1 KB
  • tv24_se.xml
    1.5 KB

vuego

Documentation Group
  • Team MediaPortal
  • August 5, 2006
    1,645
    764
    Göteborg
    Home Country
    Sweden Sweden
    Hi, that's a good idea since kolla.tv stopped working properly.
    However, tv24.se has a problem: no end times are listed in the TV guide, which means that the last program of the day will sometimes appear several hours long, lasting until the first show starts in the morning. The information is available when clicking the description of each TV show, so the grabber would have to open each show to grab it. I believe the function is called sublinks, but I do not have much experience with it.
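To illustrate why the last program of the day ends up so long, here is a minimal Python sketch. It assumes that a guide without end times treats the next start time as the end of the current program; that is an assumption for illustration, not confirmed WebEPG behaviour. The function names and the sample times are hypothetical.

```python
from datetime import datetime


def infer_end_times(starts):
    """Pair each start time with the next start as its assumed end time."""
    ordered = sorted(starts)
    # The final program has no successor, so its end stays unknown (None).
    return list(zip(ordered, ordered[1:] + [None]))


def duration_hours(start, end):
    """Length of a program in hours, or None when the end time is unknown."""
    if end is None:
        return None
    return (end - start).total_seconds() / 3600


# Hypothetical listing: two evening shows, then the first show next morning.
LISTING = [
    datetime(2024, 12, 15, 22, 0),
    datetime(2024, 12, 15, 23, 30),
    datetime(2024, 12, 16, 6, 0),
]
```

With this listing, the 23:30 show would be stretched until 06:00 the next morning, even though it really ended much earlier.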

    You could try my work in progress, attached here. I've modified your template and added the TV channel names; however, I think the names will have to be fixed to match MediaPortal's channel database.

    Another issue is that the "subtitle" contains the season and episode numbers and the description, which should be imported into the correct database fields to be useful for the recorder and scheduler.
     

    Attachments

    • WiP_tv24_se.xml
      16.8 KB

    Juppe

    Hi, I think I've fixed the problem with the season and episode numbers, and with the episode name as well.

    But there is still a problem with the end time. I'll see if I can fix it, but I don't think I can.

    There are still some programs that are not detected, and those have a title row that looks something like this:
    <h3>FIS Längdåkning: Världscupen<span class="live">LIVE</span></h3>

    There's an extra <span> tag in the title row, and I think that's the problem. I don't know how to make that tag optional.
    Do you have any idea how to do that?

    There is another problem too: not all programs have a <span> between the title and the description.

    One that has that <span>
    <h3>Handbolls-EM 2024 Kvinnor<span class="live">LIVE</span></h3>
    <span class="desc">Final</span>
    <p>Bevakning av handbollsturneringen med 24 lag som tävlar på arenor i Österrike, Ungern och Schweiz.</p>

    And one that doesn't:
    <h3>Handbolls-EM Studio</h3>
    <p>Handbolls-EM Studio.</p>

    If you want an example of the problems, look at tv6 on 2024-12-15.
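As an illustration of what "optional" means for these rows, here is a minimal sketch in plain Python regex. It is not WebEPG template syntax, and the pattern and function names are hypothetical. Both the LIVE span inside the <h3> and the subtitle span after it are wrapped in optional groups, so rows with and without them match the same pattern. The description in the sample is shortened.

```python
import re

# Hypothetical pattern in plain Python regex, NOT WebEPG template syntax.
PROGRAM = re.compile(
    r'<h3>(?P<title>[^<]+)'                                   # title text up to the first tag
    r'(?:<span class="live">(?P<live>[^<]*)</span>)?'         # optional LIVE marker inside <h3>
    r'</h3>\s*'
    r'(?:<span class="desc">(?P<subtitle>[^<]*)</span>\s*)?'  # optional subtitle row
    r'<p>(?P<description>[^<]*)</p>'                          # description paragraph
)


def parse_programs(html):
    """Return one dict per program row; missing optional parts are None."""
    return [match.groupdict() for match in PROGRAM.finditer(html)]


SAMPLE = """
<h3>Handbolls-EM 2024 Kvinnor<span class="live">LIVE</span></h3>
<span class="desc">Final</span>
<p>Bevakning av handbollsturneringen.</p>
<h3>Handbolls-EM Studio</h3>
<p>Handbolls-EM Studio.</p>
"""
```

Calling `parse_programs(SAMPLE)` yields both programs: the first with a LIVE marker and the subtitle "Final", the second with both of those fields set to `None`.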
     

    Attachments

    • tv24_se.xml
      17.1 KB

    Juppe

    Here is another update, where more programs are detected, but still no end time.
    The problem that I've fixed is the one with the extra span after the title row.
     

    Attachments

    • tv24_se.xml
      17 KB

    Juppe

    Does anyone know how I can make the span in this line optional:
    <h3>FIS Längdåkning: Världscupen<span class="live">LIVE</span></h3>

    I have another line that I made optional with <z(>...</z)?>, but if I add the same around the <span> in the line above, everything just gets messed up.

    It seems like you can't have two <z> tags, or am I doing something wrong?

    If I only have the z-tag around the span in the line above and remove the other line with the z-tag, I get the programs that have the line above, but only those.
     

    vuego

    Hi, I tried to make a grabber with a sublink. According to the log file it did parse every sublink; however, it didn't save the #END tag to the output file, so I gave up on it for now.

    I had the same trouble when using several Z-tags, so I ended up simplifying the template instead. When ignoring all S-tags (such as Span), it's possible to grab all three formats of programs (1. subtitle, 2. no subtitle and 3. LIVE shows). I also changed the removal of the colon, since it would remove all colons from all fields: "FIS Längdåkning: Världscupen" would become "FIS LängdåkningVärldscupen". The only small issue with this is that the live shows will have the string LIVE added without a space, for example: "FIS Längdåkning: VärldscupenLIVE". Perhaps we could find a workaround or just live with it :)
    Please try this file to see that it doesn't have any other issues.
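The fused "VärldscupenLIVE" is what you would expect whenever tags are dropped without leaving anything in their place. Here is a minimal Python sketch of that effect and of one possible workaround; it does not show how WebEPG ignores S-tags internally, and the helper names and the separator are assumptions.

```python
import re

TAG = re.compile(r"<[^>]+>")
LIVE_SPAN = re.compile(r'<span class="live">LIVE</span>')

ROW = '<h3>FIS Längdåkning: Världscupen<span class="live">LIVE</span></h3>'


def strip_tags(html):
    """Drop every tag and keep only the text, with nothing put in its place."""
    return TAG.sub("", html)


def title_with_separator(html, separator=" "):
    """Hypothetical workaround: swap the LIVE span for separator + LIVE first."""
    return strip_tags(LIVE_SPAN.sub(separator + "LIVE", html))
```

`strip_tags(ROW)` gives "FIS Längdåkning: VärldscupenLIVE", while `title_with_separator(ROW)` keeps a space before LIVE.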
     

    Attachments

    • No S-tag_tv24_se.xml
      16.9 KB

    Juppe

    Nice work.
    I did see that if the colons are removed with the #EPISODE search, then programs with no subtitle after the episode number would get the episode number as the episode name: something like "Avsnitt 2" as the episode name and no episode number. So I added another #EPISODE search like the one that was there before, and now I think that both the episode number and the episode name are working.

    Another thing I did was add a search for the live span that removes it, so now there is no LIVE after the title. I think that is better, but it can differ from person to person. If you want the LIVE after the title, you can just remove that search.

    I think the reason you can't get the sublink to work is that you get a JSON file containing an HTML part that has the end time in it, but I'm not sure about that.
     

    Attachments

    • tv24_se.xml
      17 KB

    Juppe

    Here's another update.

    I have changed it so that Live is now in the title, but instead of "FIS Längdåkning: VärldscupenLIVE" it looks like this: "FIS Längdåkning: Världscupen: Live".

    I also made another update for the case where there is only a season number and no episode number, but a subtitle.

    I think the file is as good as it can be now. Let's try it for some time, and then it can go into the official release.
     

    Attachments

    • tv24_se.xml
      17.2 KB

    vuego

    I see, the colon should be optional. We could make it optional by adding a question mark like this:
    HTML:
    <Search match="[Aa]vsnitt \d+:?" field="#EPISODE" remove="true"/>
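To show what the trailing `:?` changes, here is a small Python sketch that uses the same regular expression. It assumes that `remove="true"` cuts the matched text out of the source string; that reading, and the helper name `split_episode`, are assumptions rather than documented WebEPG behaviour.

```python
import re

# Same expression as in the Search element above; the colon is optional.
EPISODE = re.compile(r"[Aa]vsnitt \d+:?")


def split_episode(subtitle):
    """Return (episode, remaining text); episode is None when nothing matches."""
    match = EPISODE.search(subtitle)
    if match is None:
        return None, subtitle.strip()
    episode = match.group(0).rstrip(":")
    # Assumed meaning of remove="true": cut the matched text out of the source.
    rest = (subtitle[:match.start()] + subtitle[match.end():]).strip()
    return episode, rest
```

With the optional colon, both "Avsnitt 2: Finalen" and a bare "Avsnitt 2" yield the episode "Avsnitt 2"; in the second case the remaining text is empty instead of being mistaken for an episode name.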

    It seems that sublinks cannot be used for start or end times. Other fields work without problems. I just found out that I had this same problem many years ago.

    Good work with the Live shows (y)
    I double-checked that it doesn't alter other show names like "Alive" or "Livet"; it leaves them as is, so I guess the match is case sensitive. I would perhaps change the colon to a dash and maybe even leave it in caps ("FIS Längdåkning: Världscupen - LIVE"), but it's not that important.
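A quick way to see why case sensitivity protects those titles is this minimal Python sketch. It assumes the LIVE marker is matched as an upper-case suffix at the end of the title; the actual search in the grabber file may differ. `mark_live` is a hypothetical helper.

```python
import re

# Case-sensitive: only an upper-case LIVE at the very end of the title matches.
LIVE_SUFFIX = re.compile(r"LIVE$")
# Case-insensitive variant, shown only to demonstrate what would go wrong.
LIVE_SUFFIX_ANY_CASE = re.compile(r"LIVE$", re.IGNORECASE)


def mark_live(title, pattern=LIVE_SUFFIX):
    """Replace a trailing LIVE marker with ' - LIVE'."""
    return pattern.sub(" - LIVE", title)
```

`mark_live("Alive")` and `mark_live("Livet")` come back unchanged, whereas the case-insensitive variant would turn "Alive" into "A - LIVE". A title that really ends in upper-case "LIVE" would still be rewritten, so the check is not airtight.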

    I cleaned up the channel list and removed some duplicates and many channels that have no data; however, tv24.se seems to be missing BBC Earth. I also noticed that SVT Barn/SVT24 is missing all shows from 4.00 to 6.00 but they are available on SVT Barn.
     

    Attachments

    • tv24_se.xml
      14.4 KB

    Juppe

    I agree about the dash instead of colon on the LIVE thing.

    I made a small change to get the current program: I just removed the D from the tags. It's not that important, so I won't upload a new file.

    A little sad about the end time, but there's nothing to do about it.
     
