Can WebEPG grab from sites like these ...

Khurram

Portal Pro
May 12, 2008
211
5
Home Country
Pakistan Pakistan
I have been trying to make grabbers for different channels for my area but I have been stumped by the following sites:

1) A website has one page for each day of the week but gives no dates (so the URL can't be matched with the YYYY, MM or DD tokens). For example, http://<some website>/mon for Monday, http://<some website>/tue for Tuesday and so on. On any given day, all 7 pages are available. What URL do I use in the grabber?

2) A website lists data for all 7 days of the week, uses the <BR> tag to separate the programs and doesn't show any dates that could be matched with the YYYY, MM or DD tokens. Please see Complete Programme Schedule for what I am referring to. I can't figure out the template in this case.

3) A website lists programs for both today and multiple days on the same page (there seems to be no info available by browsing to a particular date). The multiple days are calculated by dividing the month into 4 parts, depending on which part today falls in: days 1 through 7 if today is in the first week, days 8 to 14 if in the second week, days 15 to 21 if in the third week, and day 22 to the last day of the month otherwise. How do I make a grabber for this site? The site URL is Sports - Ten Sports
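
To make the date rule in (3) concrete, here is a minimal Python sketch of it. It only restates the rule as described in the post; the function name `listed_day_range` is made up for illustration and has nothing to do with WebEPG itself.

```python
import calendar
from datetime import date


def listed_day_range(today: date) -> tuple:
    """Return (first_day, last_day) of the block of days the site would list.

    Hypothetical helper: it encodes the rule from the post, not site code.
    """
    last_day_of_month = calendar.monthrange(today.year, today.month)[1]
    if today.day <= 7:
        return 1, 7
    if today.day <= 14:
        return 8, 14
    if today.day <= 21:
        return 15, 21
    # The "fourth week" runs from the 22nd to the end of the month,
    # so it is 7 to 10 days long depending on the month.
    return 22, last_day_of_month
```

Note that whenever today is not the first day of its block, the listed block starts in the past.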

Thanks.
 

Khurram

Hmm, still no reply. I was hoping someone would have an idea here :(

In the meantime, can WebEPG grab a schedule if the web page contains data for 7 days or 14 days? The WebEPG examples deal with data for 1 day only. How do I extend that to cover 7 or 14 days? The website in question is the M-Net site from South Africa (MNET - WHERE MAGIC LIVES), which shows data for 1 day, 7 days and 14 days. It is easy to make a grabber for 1 day, but I would like to grab data for 7 or even 14 days, as that would gather more EPG data with fewer trips to the server. Or is WebEPG not designed for such web pages?
 

James

Retired Team Member
  • Premium Supporter
  • May 6, 2005
    1,385
    67
    Switzerland
    1) See [WEEKDAY] : MediaPortal_WebEPG_Grabber - MediaPortal Manual Documentation

    2) and 4) WebEPG is able to read multiple days on the same page. The time will be used to work out that the next day has come.

    3) I'm not sure from your description what the site does. However, as I said above, WebEPG can handle multiple days on a single page, under two conditions: first, the days must follow one another (top to bottom in the HTML source); second, the page must always start with the current day and not in the past.
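
As an illustration of the idea that the time is used to work out that the next day has come, here is a minimal Python sketch. It is an assumption about the general technique, not WebEPG's actual code: whenever a start time is earlier than the previous one, the listing is taken to belong to the following day. The name `assign_dates` is hypothetical.

```python
from datetime import date, datetime, timedelta


def assign_dates(start_times, first_day):
    """Attach dates to 'HH:MM' strings listed top to bottom on one page.

    Assumes the page starts with `first_day` (the current day) and that the
    days follow one another in source order.
    """
    current_day = first_day
    previous = None
    result = []
    for hhmm in start_times:
        t = datetime.strptime(hhmm, "%H:%M").time()
        # A time that goes backwards means midnight has been crossed.
        if previous is not None and t < previous:
            current_day += timedelta(days=1)
        result.append(datetime.combine(current_day, t))
        previous = t
    return result
```

This also shows why both conditions matter: if the page started on a past day, or the days were out of order, every date after the first rollover would be wrong.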
     

    Khurram

    I looked at the [WEEKDAY] token, but the URL uses the shortened notation; that is, mon instead of Monday. I can't find a token for the shortened weekday name.

    OK, it's much clearer now how multiple days are handled. It seems that the site in (2) can't be handled, as it shows the schedule for the entire week starting from Friday. It also seems that the website in (3) can't be handled, as it shows the entire first (or second, third or fourth) week depending on which week "today" falls in; moreover, the "4th week" is not really a week, as it runs from the 22nd to the end of the month and so can be from 7 to 10 days long.

    2 more questions:

    5) If a website shows the schedule for 1 day in multiple pages, how can I handle this?

    6) I can't seem to parse this HTML:
    Code:
    <tr class="r1">
    <td width="40" class="time">01:30<div class="interval">02:00</div>
    <div class="interval">02:30</div>
    <div class="interval">03:00</div>
    </td><td width="60" valign="top"></td><td width="470" class="programme" valign="top">Barclays Premier League 2008/9<BR>Match: Tottenham Hotspur vs. Fulham</td>
    </tr>
    The template I am using is
    Code:
    <td width="40" class="time"><#START><div class="interval">00:30</div>
    </td><td width="60" valign="top"></td><td width="470" class="programme" valign="top"><#TITLE><BR><#SUBTITLE></td>
    Tags are TB. I don't know what all those <div class="interval">...</div> tags are about, but I want to skip over them. After parsing, however, #START contains 01:3002:0002:3003:00. What am I doing wrong?

    Thanks again.
     

    James

    I looked at the [WEEKDAY] token, but the URL uses the shortened notation; that is, mon instead of Monday. I can't find a token for the shortened weekday name.

    If you read a little further down you see how to change the weekday from long to short: <Search weekday="ddd" />
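
For readers unfamiliar with the "ddd" format, it stands for the abbreviated day name (Mon, Tue, ...). A rough Python sketch of the resulting URL scheme follows; the base URL and the function name `day_url` are placeholders, and lower-casing the name is an assumption based on the /mon and /tue examples earlier in the thread.

```python
from datetime import date

# Abbreviated weekday names, indexed by date.weekday() (Monday == 0).
SHORT_DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def day_url(base_url: str, day: date) -> str:
    """Build the per-weekday page URL, e.g. <base_url>/mon for a Monday."""
    return f"{base_url}/{SHORT_DAYS[day.weekday()]}"
```

For example, `day_url("http://example.invalid", date(2008, 12, 1))` gives the /mon page, since 1 December 2008 was a Monday.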

    5) If a website shows the schedule for 1 day in multiple pages, how can I handle this?

    Have a look at: [PAGE_OFFSET]

    6) I can't seem to parse this HTML: [...] What am I doing wrong?

    I think you will need to use the <div> tags for parsing too. The <BR> tag cannot be used as a parsing tag; it is turned into formatting (a line break). It can be, and is, used as a separator for parsing data inside tag groups.

    So Tags=TD, Template=

    Code:
    <td><#START><div></div><div></div><div></div></td>
    <td></td>
    <td><#TITLE><BR><#SUBTITLE></td>
     

    Khurram

    If you read a little further down you see how to change the weekday from long to short: <Search weekday="ddd" />
    Oh, I totally missed that :oops:
    Have a look at: [PAGE_OFFSET]
    I have re-read the sections on LIST_OFFSET and PAGE_OFFSET. I am getting confused here. Could you please explain LIST_OFFSET and PAGE_OFFSET? They both talk about MaxCount. Where is that coming from?
    I think you will need to use the <div> tags for parsing too. The <BR> tag cannot be used as a parsing tag; it is turned into formatting (a line break). It can be, and is, used as a separator for parsing data inside tag groups.

    So Tags=TD, Template=

    Code:
    <td><#START><div></div><div></div><div></div></td>
    <td></td>
    <td><#TITLE><BR><#SUBTITLE></td>
    I can try that, but the number of <div></div> tags is variable: there can be 0, 1 or more. Can I use some sort of regular expression here, since the start time is always in 00:00 format? Maybe I can use <Actions><Modify ....> (is the search string a regex)? Or the <Searches><Search ...> tag?
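
To show what such a regular expression would need to do, here is a small Python sketch that keeps only the leading HH:MM from a run-together value like the one #START ends up with. This is post-processing outside WebEPG; whether WebEPG's <Modify> action accepts a regex is not established in this thread, and the name `first_time` is made up for the example.

```python
import re
from typing import Optional

# Two digits, a colon, two digits: the 00:00 format used by the site.
TIME_RE = re.compile(r"\d{2}:\d{2}")


def first_time(raw: str) -> Optional[str]:
    """Return the first HH:MM at the start of `raw`, or None if there is none."""
    match = TIME_RE.match(raw.strip())
    return match.group(0) if match else None
```

Applied to 01:3002:0002:3003:00, this returns 01:30, regardless of how many interval times follow it.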
     

    James

    If using the above template with only Tags=T doesn't work (ignoring the div tags), then try this:

    Tags=TD
    Code:
    <td><#START><z(><div></div></z)?><z(><div></div></z)?><z(><div></div></z)?></td>
    <td></td>
    <td><#TITLE><BR><#SUBTITLE></td>



    [PAGE_OFFSET] + <Search maxlistings="" startPage="" endPage="">

    Page number used starts at 0 unless "startPage" is given in which case this will be used as the starting page number.

    The page number is increased until either "endPage" is reached or the number of listings on a page is less than "maxlistings". You do not need to provide both endPage and maxlistings; which one to use depends on the site.

    Let's say you have a site that always has 3 pages. Then set endPage=2 (2 if starting from 0, or 3 if starting from 1). However, many sites have, say, 10 listings per page and as many pages as are required to hold all the listings for that day. By setting maxlistings=10, WebEPG will stop getting new pages when the number of listings found on a page is less than 10. The two can also be used together.
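
The stopping rules above can be sketched in Python roughly as follows. This is only an illustration of the described behaviour, not WebEPG's implementation: `fetch_page` is a hypothetical callback that returns the listings found on one page, and fetching a single page when neither limit is given is an assumption.

```python
def fetch_all_pages(fetch_page, start_page=0, end_page=None, max_listings=None):
    """Collect listings page by page, mirroring startPage/endPage/maxlistings."""
    listings = []
    page = start_page
    while True:
        items = fetch_page(page)
        listings.extend(items)
        # Stop once the last configured page has been read.
        if end_page is not None and page >= end_page:
            break
        # Stop when a page is not full: there is nothing left for that day.
        if max_listings is not None and len(items) < max_listings:
            break
        # Assumption: with no stop rule at all, read one page only.
        if end_page is None and max_listings is None:
            break
        page += 1
    return listings
```

With 10 listings per full page and a third page holding only 4, `max_listings=10` reads pages 0, 1 and 2 and then stops, while `end_page=1` stops after page 1 regardless of how full it is.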
     

    Khurram

    [PAGE_OFFSET] + <Search maxlistings="" startPage="" endPage="">

    Page number used starts at 0 unless "startPage" is given in which case this will be used as the starting page number.
    It's much clearer now, thanks.
    If using the above template with only Tags=T doesn't work (ignoring the div tags)....
    I am still having trouble parsing the website. I have tried all the combinations that you mentioned, but none of them work.

    i) Using this template
    Code:
    <td width="40" class="time"><#START><div class="interval">03:00</div>
    <div class="interval">03:30</div>
    <div class="interval">04:00</div>
    <div class="interval">04:30</div>
    <div class="interval">05:00</div>
    </td><td width="60" valign="top"></td><td width="470" class="programme" valign="top"><#TITLE>:<#SUBTITLE></td>
    With TAGS=T, I get 26 entries after parsing (which is correct), but some of the start times are weird, as mentioned in my earlier post.
    With TAGS=TD, I get correct start times but only 2 entries.

    ii) Using this template
    Code:
    <td width="40" class="time"><#START><z(><div class="interval">03:00</div></z)?>
    <z(><div class="interval">03:30</div></z)?><z(>
    <div class="interval">04:00</div></z)?><z(>
    <div class="interval">04:30</div></z)?><z(>
    <div class="interval">05:00</div></z)?>
    </td><td width="60" valign="top"></td><td width="470" class="programme" valign="top"><#TITLE>:<#SUBTITLE></td>
    With TAGS=T, I get 26 entries but none of the entries has any field values.
    With TAGS=TD, I get 25 entries, but none of them have start times, and some entries have titles/subtitles while others don't (all entries should have at least a title).

    The url is TV Listing | espnstar.com . Is there no way to parse this?
     
