Devs' help on Sublink search: bug or misunderstanding?

lulsam

Portal Pro
December 28, 2006
116
6
Home Country
Spain Spain
MediaPortal Version: 1.0.2 stable
MediaPortal Skin: Bluetwo
Windows Version: XP SP3
CPU Type: AMD 64 3000+
HDD: Seagate 200 GB
Memory: 1 GB DDR
Motherboard: Gigabyte
Video Card: nvidia 6100 (mb integrated)
Video Card Driver: latest
Sound Card: integrated motherboard
1. TV Card: Hauppauge HVR-1300
1. TV Card Type: Hybrid analog - DVB-T
1. TV Card Driver: latest
Remote: Hauppauge 45 buttons

I am trying to develop a new grabber for Spain. After weeks of work (I had to learn how to develop from scratch) I have almost everything working except the "Sublink search" feature. My understanding is that this feature does not work as expected under some specific conditions, but I would like to confirm this with the devs. It works when the code is something like:

<a href="whatever link to search">

but if the tag includes extra attributes, the results are somewhat weird. Here is an example of real code:

--------------real source code of a given web--------------

<a class="ee" title="Aeropuerto 77| (NO REC. MENORES DE 7 AÑOS)| M-27. 23:10h. - 01:05h. (Cine - Acción)" href="/guiatv/fichaemision.html?grepi=P&id=16308627&tipo=R">Aeropuerto 77</a>

-----------------------end of real code--------------------------

----------------------piece of code from the grabber--------------------

<Sublink search="fichaemision" template="Details" />

--------------------- end of grabber code-------------------

--------------------results (taken from webepg.log) ---------------------

2009-10-27 16:56:10.703125 [Info.][WebEPG-xmltv]: WebEPG: SubLink Request http://www.plus.es/guiatv/NO REC. MENORES DE 7 AÑOS POST:
2009-10-27 16:56:11.187500 [Warn.][WebEPG-xmltv]: WebEPG: Getting sublinked data failed

---------------------------end of webepg.log--------------

It does not matter which piece of the target link I try (fichaemision, guiatv, grepi, /guiatv/fichaemision.html, ...), the result is the same: WebEPG builds the sublink from a piece of text that is between parentheses, and if there is more than one such piece it goes for the first one. Notice that in the example above there are two different pieces of text between parentheses, but in this other case:

<a class="ee" title="New York Undercover: Venganzas raciales| Temporada: 2 / Episodio: 10| | MI-28. 03:30h. - 04:20h. (Series - Policiaca)" href="/guiatv/fichaemision.html?grepi=P&id=16308631&tipo=R">New York Undercover: Venganzas raciales</a>

where there is only one piece of text between parentheses, the sublink that WebEPG tries to request is:

http://www.plus.es/guiatv/Series - Policiaca


Anyway, that is the side effect. The fact (to be confirmed by somebody else) is that a piece of web source code like this: <a some_piece_of_additional_info_here href="whatever_link.html"> confuses WebEPG. I do not know whether this should be considered a bug, whether the site's HTML is non-standard, or whether it is simply an unusual case that the WebEPG developers did not take into account (it is impossible to take everything into account).
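
The request URL in webepg.log is consistent with the parenthesised text being resolved as a relative link against the guide page. A minimal Python sketch of that resolution follows; the base URL is an assumption inferred from the log line, and this is not WebEPG code:

```python
from urllib.parse import urljoin

# Assumed base: the directory part of the URL seen in webepg.log.
BASE = "http://www.plus.es/guiatv/"

def resolve(extracted: str) -> str:
    """Resolve whatever text was extracted from the tag against the page URL."""
    return urljoin(BASE, extracted)

# What the log shows being requested (text taken from the title attribute).
wrong = resolve("NO REC. MENORES DE 7 AÑOS")
# What the href attribute of the same tag would give.
right = resolve("/guiatv/fichaemision.html?grepi=P&id=16308627&tipo=R")

print(wrong)
print(right)
```

The first value reproduces the failing SubLink request from the log; the second is the details page the grabber was meant to fetch.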

Would somebody else confirm this investigation? Thanks in advance.

PS: By the way, I noticed that there is another thread about an issue with the "Sublink search" feature, but I am not sure whether it is fully related, because there the extra info comes after the href rather than before it: https://forum.team-mediaportal.com/webepg-136/search-sublink-extra-info-tag-69756/
 

Attachments

  • www_plus-prueba_es.xml
    30.7 KB
  • WebEPG.xml
    30.6 KB

arion_p

Retired Team Member
  • Premium Supporter
  • February 7, 2007
    3,373
    1,626
    Athens
    Home Country
    Greece Greece
    Not sure why this happens, I have to look into it and report back.
    In general what should happen is that the search string is used to identify the tag containing the link. Then WebEPG will try to extract the link. The extraction is somewhat complex because it also supports extracting some simple forms of javascript links. In your case the tag is identified correctly, but the extraction of the url fails.

    The other thread you have linked is related in the sense that the extraction process also fails but for a different reason (unquoted attribute values in the tag).
     

    arion_p

    Just had a look at the code. The problem is, as I imagined, the parsing of JavaScript links. Anything within parentheses is assumed to be the parameters of a JavaScript function call. In general, link parsing in MP's HTML parsers is rather weak. I am not sure yet how I could fix that. Unfortunately, right now you have no way to work around this.
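
The failure mode described here can be illustrated with a small Python sketch. The function `naive_extract` is a hypothetical stand-in written for this illustration, not MediaPortal's actual parser; it only assumes that parenthesised text anywhere in the tag is preferred over the plain href value:

```python
import re

TAG = ('<a class="ee" title="Aeropuerto 77| (NO REC. MENORES DE 7 AÑOS)| '
       'M-27. 23:10h. - 01:05h. (Cine - Acción)" '
       'href="/guiatv/fichaemision.html?grepi=P&id=16308627&tipo=R">')

def naive_extract(tag: str) -> str:
    """Hypothetical extractor mimicking the described behaviour (not MP's code)."""
    # Any "(...)" anywhere in the tag is taken as JavaScript call parameters.
    call = re.search(r"\(([^)]*)\)", tag)
    if call:
        return call.group(1)
    # Only when there are no parentheses is the plain href value used.
    href = re.search(r'href="([^"]*)"', tag)
    return href.group(1) if href else ""

print(naive_extract(TAG))
```

Under that assumption the first parenthesised text in the title attribute wins, which matches the "NO REC. MENORES DE 7 AÑOS" request seen in the log.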
    When I have more info I will post back.
     

    lulsam

    Let me know if I can help you test. This grabber may be valuable to the Spanish MediaPortal community because it takes its data from the official website of the only satellite provider we have here, which ensures that programs and descriptions are up to date. Unfortunately this is not the case with other websites, which is why I started developing the grabber, frustrated by inaccuracies that cause recording problems.

    By the way, and given my ignorance of how the WebEPG code is structured: would it be possible to detect the whole string between the quotes, I mean href="string_to_be_detected", regardless of what is written between the quotes?

    You have probably already taken this into account.

    Thanks in advance
     

    arion_p

    lulsam wrote: "By the way, and given my ignorance of how the WebEPG code is structured: would it be possible to detect the whole string between the quotes, I mean href="string_to_be_detected", regardless of what is written between the quotes?"

    Not sure what you mean by that. In the search string you only need to specify a string that is contained in the A tag, but not contained anywhere else. You cannot specify which part(s) of the tag will be extracted as the target url.
     

    lulsam

    I explained the idea pretty badly, excuse me. What I suggest is not a modification to the grabber, but to the WebEPG code!

    Based on what you said, it seems that WebEPG correctly identifies the tag, in my example:

    <a class="ee" title="Aeropuerto 77| (NO REC. MENORES DE 7 AÑOS)| M-27. 23:10h. - 01:05h. (Cine - Acción)" href="/guiatv/fichaemision.html?grepi=P&id=16308627&tipo=R">Aeropuerto 77</a>

    but fails to build / locate the right URL because it takes a piece of text between parentheses as the link, instead of what is between the quotes that follow the href attribute. In this specific example: (NO REC. MENORES DE 7 AÑOS)

    I don't know how WebEPG works; perhaps it tries to parse the full <a bla_bla_bla href="..."> tag. I am just wondering whether it would be possible to focus only on the href attribute. I am aware it may sound stupid, and I don't know whether the way WebEPG works requires parsing the full <a ...> tag instead of just the href.

    Do not forget that I only started with XML coding / parsing three weeks ago, so forgive me if the idea is completely wrong.
     

    arion_p

    Now I understand what you mean. However what you suggest is not possible because javascript calls can be made in numerous attributes of the A tag (e.g. onclick, onmouseover, href, etc). Restricting the search to href only will certainly break some of the existing grabbers. I can only see the following solutions:

    1. Add an option so that a grabber may choose to not search for javascript calls.
    2. Limit the search to specific attributes of the A tag (e.g. there is no point in searching the title attribute)
    3. Same as 2 but allow the grabber to specify which attribute(s) to search
    Any of the above changes need to be made to the core html parsing code of MP (used also for movie info grabbing and others). As you know we are in feature freeze right now and this change will affect and may break other parts of MP, so it is not possible to be implemented before 1.1.0 release is out.
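
As a rough illustration of option 3, the Python sketch below uses the search string only to identify the A tag and then reads the link from a configurable list of attributes. The class name `LinkExtractor` and its `attributes` parameter are hypothetical; this is not how MP's HTML parser is implemented, and JavaScript call parsing is left out entirely:

```python
from html.parser import HTMLParser

PAGE = ('<a class="ee" title="Aeropuerto 77| (NO REC. MENORES DE 7 AÑOS)| '
        'M-27. 23:10h. - 01:05h. (Cine - Acción)" '
        'href="/guiatv/fichaemision.html?grepi=P&id=16308627&tipo=R">'
        'Aeropuerto 77</a>')

class LinkExtractor(HTMLParser):
    """Collect values of chosen attributes from <a> tags matching a search string."""

    def __init__(self, search, attributes=("href",)):
        super().__init__()
        self.search = search          # identifies the tag, as the Sublink search does
        self.attributes = attributes  # hypothetical per-grabber setting (lowercase names)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # The search string only selects the tag...
        if self.search not in (self.get_starttag_text() or ""):
            return
        # ...while extraction looks at the configured attributes only.
        for name, value in attrs:
            if name in self.attributes and value:
                self.links.append(value)

parser = LinkExtractor("fichaemision", attributes=("href",))
parser.feed(PAGE)
print(parser.links)
```

With the attribute list restricted to href, the parentheses in the title attribute no longer take part in link extraction; a grabber that relies on JavaScript links could list onclick or other attributes instead.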
     

    lulsam

    All right, if we have to wait until after 1.1.0 is released, you should consider going for the most complete and flexible option rather than a short-term solution. IMHO that is option 3 as you described it. What do you think?

    By the way, is there a chance of filing this WebEPG enhancement / bug fix in Mantis under "MP Future Release" so that somebody takes care of it in 2010?

    Just a final comment: You guys are performing an absolutely G-R-E-A-T job.

    Thank you very much
     

    lulsam

    Hi again arion_p,

    After a year and a half, and given that there are not many reliable sources of web EPG data for Spanish DVB-S channels (in fact, only for a small set of channels), maybe it is time to reschedule this issue and include it in the 1.3.0 roadmap. Taking into account the alternatives analysed in the Mantis comments, it does not seem extremely complex to solve.

    What do you think arion? Is it fair to ask for this reschedule?

    Thanks in advance
     
