WebEPG pagegrab delay not working correctly

benjerry

MP Donator
  • Premium Supporter
  • September 26, 2007
    167
    10
    Home Country
    Netherlands
    It must be something client-side in WebEPG, or something WebEPG does not do, such as pretending to be an IE browser.

    The user-agent string seems outdated, but is it filtered server-side?

    HTTPTransaction.cs
    Code:
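      public class HTTPTransaction
      {
        #region Variables

        // User-agent string sent with every request
        private string _agent = "Mozilla/4.0 (compatible; MSIE 6.0;  WindowsNT 5.0; .NET CLR 1 .1.4322)";

        // ... (remaining members omitted in this excerpt)

        #endregion
      }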

    Wget didn't have any problem:
    Wget normally identifies as `Wget/version', version being the current version number of Wget.

    It might also have something to do with how the URL is processed in these classes.
     

    benjerry

    OK, I now know what the problem is. It's not the agent: I simulated the WebEPG one in Wget and there was no problem.

    I then used a packet sniffer to inspect the traffic between WebEPG-Designer and the UPC website, and it confirmed the suspected problem with "./" in the URL.

    You can see here that the dot in "Cartoon+Netw." is left out in the request:

    Code:
    GET /TV/Guide/Channel/Cartoon+Netw/Today/ HTTP/1.1
    User-Agent: Mozilla/4.0 (compatible; MSIE 6.0;  WindowsNT 5.0; .NET CLR 1 .1.4322)
    Host: tvgids.upc.nl
    Connection: Keep-Alive
    
    
    HTTP/1.1 200 Apple
    Date: Tue, 08 Jun 2010 13:36:05 GMT
    Server: Apache
    cache-control: max-age=300
    cache-control: must-revalidate
    set-cookie: wosid=zX2dZHis1a8XPk0w9EpWyg; version="1"; path=/
    set-cookie: woinst=1; version="1"; path=/
    connection: close
    content-length: 6461
    Content-Type: text/html; charset=utf-8
    
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">

    Hi.

    I've solved my problem with "./" in the URL by putting a space between the dot and the slash.

    Code:
    GET /TV/Guide/Channel/Cartoon+Netw.%20/Tomorrow/ HTTP/1.1
    User-Agent: Mozilla/4.0 (compatible; MSIE 6.0;  WindowsNT 5.0; .NET CLR 1 .1.4322)
    Host: tvgids.upc.nl
    Connection: Keep-Alive
    
    
    HTTP/1.1 200 Apple
    Date: Tue, 08 Jun 2010 14:16:33 GMT
    Server: Apache

    Luckily, the website accepted the extra space and returned all the program information.

    I've found 4 out of 147 channels in the grabber with this problem, and I will pay extra attention when testing those.
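
    For anyone who wants to apply the same workaround to several channels, here is a minimal sketch of it in C#. It is not part of WebEPG; the class and method names are made up for illustration. It only handles a single dot at the end of a path segment (the "./" case above) and assumes the website tolerates the extra %20, as this one did.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of WebEPG.
public static class TrailingDotWorkaround
{
    // Matches a "." that is directly followed by "/" and directly preceded
    // by a character other than "." or "/". This leaves the relative
    // segments "./" and "../" untouched.
    private static readonly Regex TrailingDot = new Regex(@"(?<=[^./])\.(?=/)");

    // Inserts an escaped space after a dot that ends a path segment,
    // so the dot is no longer the last character of the segment.
    public static string PadTrailingDots(string path)
    {
        return TrailingDot.Replace(path, ".%20");
    }
}
```

    Calling TrailingDotWorkaround.PadTrailingDots("/TV/Guide/Channel/Cartoon+Netw./Today/") returns "/TV/Guide/Channel/Cartoon+Netw.%20/Today/". Pass only the path part, not the full URL with host name.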
     

    arion_p

    Retired Team Member
  • Premium Supporter
  • February 7, 2007
    3,373
    1,626
    Athens
    Home Country
    Greece
    That behavior is really odd. I have taken a look at the code and I don't see anything relating to this issue, so I am inclined to believe it is the .NET Framework that removes the "." at some point while processing the URLs. Please post your grabber so I can do some debugging to find out what is going on.
     

    benjerry

    arion_p said:
      That behavior is really odd. I have taken a look at the code and I don't see anything relating to this issue. So I am inclined to believe it is .NET Framework that removes the "." at some point while processing the URLs. Please post your grabber so I do some debugging to find out what is going on.

    Makes sense. I didn't see anything either and was thinking in a similar direction at one point.

    You can find a version here: https://forum.team-mediaportal.com/webepg-136/upc-tv-guide-webepg-75516/#post625227

    The only differences from my current version are the space between "." and "/", and the delay, which is now 500 (the minimum) instead of the 12000 in that file.

    Edit:
    I've done some research on the web, and it looks like it's a bug or, I guess, unwanted behaviour in the Uri class.

    See also here:
    DotNet: Uri Bug? (dotnet.itags.org)

    Somebody there reports a similar URL with dots:
    I've encountered a problem very similar to this. In my case the URI http://www.explodedgrandad.com/Site/Exploded Grandad Welcomes You../rss.xml is converted to http://www.explodedgrandad.com/Site/Exploded Grandad Welcomes You/rss.xml

    Note the two missing dots near the end. The two dots are in the "path" part of the URI, and surely the path /Site/Exploded%20Grandad%20Welcomes%20You../rss.xml is NOT the same as the path /Site/Exploded%20Grandad%20Welcomes%20You/rss.xml

    I've tried replacing the two dots with their ASCII escape %2E, but they are still removed.

    The original URI is handled fine by IE and Firefox. With the two dots removed I, unsurprisingly, get a 404.

    Is this a bug?
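
    To check whether a given machine is affected, a small test like the one below could be used. This is only a sketch: the class and method names are made up, and whether the dot survives depends on the .NET version the program runs on, so no expected output is given here.

```csharp
using System;

// Hypothetical check, not part of WebEPG.
public static class UriDotCheck
{
    // Returns the path as System.Uri reports it after parsing the URL.
    public static string ParsedPath(string url)
    {
        return new Uri(url).AbsolutePath;
    }

    // True when the given segment still appears unchanged in the parsed path.
    public static bool SegmentSurvives(string url, string segment)
    {
        return ParsedPath(url).Contains("/" + segment + "/");
    }
}
```

    If UriDotCheck.SegmentSurvives("http://tvgids.upc.nl/TV/Guide/Channel/Cartoon+Netw./Today/", "Cartoon+Netw.") returns false, the Uri class has dropped the dot before the request is even sent.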
     

    benjerry

    About the double delay bug:

    The delay is applied a first time in HTMLPage.LoadPage and, when the internal grabbing method is used, a second time in HTTPTransaction.Transaction.

    HTMLPage.cs old:
    Code:
      // Delay before getting page
      if (page.Delay > 0)
        Thread.Sleep(page.Delay);

      bool success;

      if (page.External)
      {
        success = GetExternal(page);
      }
      else
      {
        success = GetInternal(page);
      }

    It should be okay to apply this delay for external grabbing only, because for internal grabbing the delay is already handled in GetInternal -> Page.HTTPGet(page) -> Transaction(request).

    HTMLPage.cs new:
    Code:
      bool success;

      if (page.External)
      {
        // Delay before getting page
        if (page.Delay > 0)
          Thread.Sleep(page.Delay);

        success = GetExternal(page);
      }
      else
      {
        success = GetInternal(page);
      }

    Alternatively, move the delay into the private function GetExternal.
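
    A rough sketch of that second option, with the delay inside GetExternal. PageRequest and PageLoaderSketch are simplified stand-ins I made up, not the real WebEPG classes, and the actual fetching is replaced by placeholders. The DelaysApplied counter exists only to show that the sleep happens once and only on the external path.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the request type; only the members used in the post.
public class PageRequest
{
    public int Delay;      // delay in milliseconds
    public bool External;  // true when the external grabbing method is used
}

// Hypothetical stand-in for HTMLPage.
public class PageLoaderSketch
{
    // Counts how often this class sleeps (for illustration only).
    public int DelaysApplied;

    public bool LoadPage(PageRequest page)
    {
        return page.External ? GetExternal(page) : GetInternal(page);
    }

    private bool GetExternal(PageRequest page)
    {
        // Delay before getting the page, now only on the external path.
        if (page.Delay > 0)
        {
            Thread.Sleep(page.Delay);
            DelaysApplied++;
        }
        return true; // placeholder for the real external fetch
    }

    private bool GetInternal(PageRequest page)
    {
        // No sleep here: as described above, the internal path already
        // delays inside HTTPTransaction.Transaction.
        return true; // placeholder for the real internal fetch
    }
}
```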
     
