WebGrab+Plus a new xmltv grabber

WG++Maker

Portal Pro
October 25, 2010
130
112
La Gomera, Canary Islands
Home Country
Spain Spain
I cleared my log file and re-ran with the radiotimes channel 301 and my normal channels.
I have attached my config and log for you.
Thanks for looking at this,
Rich

Rich,

The new WG++ version introduces some options in the config file that could improve the connection to sites. You were using the old one, so try the attached config file.

WG++Maker Jan

btw: Channel 301 doesn't have much content!
 

Attachments

  • WebGrab++.config.xml
    13.3 KB

richy759

MP Donator
  • Premium Supporter
  • July 16, 2007
    130
    4
    Home Country
    United Kingdom United Kingdom
    thanks, I'll give it a go.
    301 is BBC One's interactive sport channel, which is usually only active when sports events are on :)

    Rich
     

    richy759

    Fantastic, it's working. I have all my channels now and I haven't seen any more (?) since I added the index_only lines.
    Thanks a lot,
    Rich
     

    Quake505

    Portal Pro
    May 23, 2009
    68
    8
    First I would like to say that I find webgrab very easy to use, and creating my own grab .ini file was quick. I created a grab file for mydigiguide in a couple of hours, and then spent another couple modifying the output.

    Here are some things I have discovered when testing the grab file.

    Issues with webgrab: -

    1. Scrubbing the TV programs slows down dramatically when a lot of channels are being grabbed.

    I tested my grab .ini file by selecting all channels (around 650) for 10 days. The scrubbing (not the grabbing from the web page) of the program information was very fast for the first channels: the first channel took less than 1 second (337 shows). It slows down with each new channel, though, and by around the 200-channel mark webgrab took around 2 minutes to scrub 236 shows.

    It slowed down so much that I stopped webgrab after 9 hours, by which time it had done around 400 channels.

    2. Resources

    The CPU usage is always 50% when scrubbing. Is this normal, or is it related to issue 1?


    Issues with the mydigiguide grab file: -

    3. Need to use a high retry.

    Mydigiguide blocks access when too many pages are grabbed within a set period of time; I have seen the retry count go as high as 14.

    This could be easily resolved by adding a delay between page grabs.

    Is there a way to set a delay between page grabs?
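
To make the delay idea concrete, here is a rough Python sketch of what is being asked for: a fixed pause between page grabs, plus a longer pause before each retry. This is not WebGrab+Plus code. The function name `grab_pages`, its parameters, and the injected `fetch` callable are all hypothetical, chosen only for illustration.

```python
import time
from typing import Callable, Iterable, List


def grab_pages(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 2.0,
    max_retries: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch each URL in turn, pausing between pages and before each retry.

    `fetch` is any callable that returns the page text or raises IOError
    when the site refuses the request (hypothetical interface).
    """
    pages = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # fixed pause between page grabs
        for attempt in range(max_retries + 1):
            try:
                pages.append(fetch(url))
                break
            except IOError:
                if attempt == max_retries:
                    raise  # give up after the last allowed retry
                # wait longer after each failed attempt
                sleep(delay_seconds * (attempt + 2))
    return pages
```

With a pause like this the site sees a steadier request rate, so a high retry count should be needed less often. How long the pause has to be for mydigiguide is not known here and would need testing.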


    Instructions with the grab file attached (not a problem with webgrab): -

    * Channel list file.

    The channel list file will not include the last channel in WebGrab++.config.xml. For this reason I always use BBC1, which has a SiteId of 1.

    I anticipate that the channel list file creation will fail with a minor change to the web page.

    * Star rating

    The star rating is only shown with 3 stars or above.

    * Star rating

    I added ‘index_starrating.modify {replace| star rating|}’ so that the output can be changed as desired.

    Currently it will just have the number, it can be changed as follows: -

    index_starrating.modify {replace| star rating|/5}

    This would output 3/5 if the show star rating was 3.

    * Program Description

    The Description has (n) at the end.


    Hope you can help with issues 1-3 above.



    Regards,

    Q505
     

    Attachments

    • mydigiguide.com.Quake505-V1.zip
      1.3 KB

    WG++Maker

    Hi Quake505,

    Impressive piece of work!!

    I am working on a complete answer, but I need a little more time. For now, just the easy answers:

    Re 1. The scrubbing time increases with every channel added because, supposing you start with an empty xmltv file, this file grows. It is updated with every show, which has to be inserted into or appended to it. Although the .NET XML commands used for this are very efficient, the slowdown is already noticeable at around 5 to 10 thousand shows. With 650 channels for 10 days the file can grow to around 200,000 shows! This is not practical.
    Do you really want all these channels? If so, a more practical approach would be to split the job into several runs and merge the resulting files separately. The result will be an xmltv file of around 50 - 70 MB!
    Re 2. Yes the CPU consumption is caused by the xmltv file update
    Re 3.
    -- Delay ... At the moment there is no setting for a delay. Maybe something to consider for a future update.
    -- Channel list file ... I run this only once, as a quick way to compose config files. It does indeed happen that a channel is not easy to scrub (often the one that is selected). Just add it by hand and disable the scrubstrings in the siteini.
    -- Starrating ... I have a solution and will send it when I have finished evaluating your siteini.
    -- Program Description, the (n) at the end ... It indicates the update type that created the show in the xmltv file (n, c, g, r). You can disable it by adding mode n in the config file.
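
For the "split into several runs and merge" suggestion under Re 1, the merge step could look roughly like the Python sketch below. It assumes each run produced a standard XMLTV file (a `tv` root with `channel` and `programme` children). The function name `merge_xmltv` is hypothetical, and this is not a tool that ships with WebGrab+Plus.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def merge_xmltv(paths: Iterable[str], out_path: str) -> int:
    """Merge several XMLTV files into one; return the number of programmes written."""
    channels = []
    programmes = []
    seen_channel_ids = set()
    for path in paths:
        root = ET.parse(path).getroot()
        for channel in root.findall("channel"):
            channel_id = channel.get("id")
            if channel_id not in seen_channel_ids:  # keep each channel only once
                seen_channel_ids.add(channel_id)
                channels.append(channel)
        programmes.extend(root.findall("programme"))

    merged = ET.Element("tv")
    merged.extend(channels)    # XMLTV lists all channels first...
    merged.extend(programmes)  # ...followed by all programmes
    # The output is written once at the end, not once per show.
    ET.ElementTree(merged).write(out_path, encoding="utf-8", xml_declaration=True)
    return len(programmes)
```

Collecting everything in memory and writing the file once avoids the repeated update of an ever-growing file described under Re 1. A file in the 50 - 70 MB range should still fit in memory on most machines, but that has not been measured here.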

    I will send you a siteini file with remarks and alternative solutions in the coming days. It might be easier to use e-mail; if you agree, send me a mail. My address is on the cover of the documentation.

    WG++Maker Jan
     

    Quake505

    Hi WG++Maker Jan,

    Thanks for the feedback, the software was very easy to use and the documentation was great.

    1. No, I don’t normally grab 650 channels; this was for testing. I do grab 362 channels over 10 days with my current grabber, though. 362 channels over 10 days really slows the app in full update mode (or when the xml file is missing), but performance is greatly improved when the app runs in incremental update mode. I would give more feedback, but in incremental mode the app grabs the pages more quickly, so mydigiguide blocks the IP more often, which slows everything down. A delay between grabs would help here.

    2. OK

    3. A delay would be required if a user was going to grab a lot of channels over a high number of days. With the current version I would only recommend grabbing 2 or 3 days for a large number of channels.

    I was not expecting the issues below to be fixed; I knew they were issues with my .ini and not with webgrab.

     Channel List
    Yep, that’s why I always use the BBC1, makes it easy.

     Starrating
    Great. The reason it only grabs 3 stars and above is that I was scrubbing this information from the mydigiguide star picture jpg, and mydigiguide only displays a picture when the program has three stars or above. I guess you're going to grab the information from the description; it will be interesting to see how you do this.

     Program Description
    :oops:, I was scratching my head for a while on this.

    Sent you a mail,

    Great work,

    Q505,
     

    Quake505

    Hi, I have made a small change to the .ini file.

    The episode information was not being entered correctly, so I added episodesystem=onscreen.
     

    Attachments

    • mydigiguide.com.Quake505-V2.zip
      1.3 KB

    WG++Maker

    Version 1.0.7

    The new Version 1.0.7Beta is available @WebGrab+PlusV1.0.7Beta
    For the user it adds:
    *Support for about 25 new sites (see below)! Thanks to Willy de Wilde and Quake505!! It now covers most of Europe and some countries beyond that.
    *New xmltv elements: subtitles, premiere, previously shown, video aspect and quality.
    *Timeoffset channels: channels that differ from another channel only in start and stop time.
    *New parameters in the retry setting: timeout, channel-delay, index-delay, show-delay
    *Possibility to disable the skip function for very long and very short shows

    And for the siteini developer:
    *New preconditionals > < >= and <=
    *Support for fragmented multi-day index pages
    *New attribute 'force' for index_date
    *Operations addstart, addend, remove and replace now support multi-value elements
    *Site value retry, overruling the config retry for that channel

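
As an illustration of the "Timeoffset channels" item above: a channel that differs from another only in start and stop time can be derived by shifting every XMLTV timestamp by a fixed amount. The Python sketch below shows just that shift. It assumes timestamps in the usual XMLTV form `YYYYMMDDhhmmss +hhmm`; the helper name `shift_xmltv_time` is hypothetical and this is not how WebGrab+Plus itself is implemented.

```python
from datetime import datetime, timedelta

# Usual XMLTV timestamp layout, e.g. "20110201200000 +0000"
XMLTV_TIME = "%Y%m%d%H%M%S %z"


def shift_xmltv_time(stamp: str, hours: float) -> str:
    """Return the XMLTV timestamp moved by `hours`, keeping its UTC offset."""
    parsed = datetime.strptime(stamp, XMLTV_TIME)
    return (parsed + timedelta(hours=hours)).strftime(XMLTV_TIME)
```

Applying the same shift to the `start` and `stop` attributes of every programme of the source channel would give the listing for a "+1" style channel, for example `shift_xmltv_time("20110201233000 +0000", 1)` for a show that crosses midnight.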
    Supported sites for the following countries (see the full list in Siteini.Pack.7 @ http://www.servercare.nl/Docs/SiteIni-List-Pack7.pdf):
    Belgium(2), Brazil, Croatia, Czech Republic, Denmark(2), Finland, France(2), Germany(2), Hungary, Italy(3), Netherlands(3), Norway, Poland(2), Portugal, Romania(3), Serbia, Slovakia, Spain(4), Sweden(2), Switzerland(2), UK(6) and networks like DirecTV (North America), OSN (Lebanon, Saudi Arabia, North African countries etc.), Discovery.

    More information @WebGrab+Plus, an advanced XMLTV grabber - ServerCare home
     

    apuokas

    Portal Member
    February 1, 2011
    28
    4
    47
    Home Country
    AW: WebGrab+Plus a new xmltv grabber

    Please make a grabber for this site: tv24.lt. It has all of the Lithuanian programs plus some German, Russian, etc. The site is complicated, and I have no documentation on how to create a grabber file either. I can help you with translations if you need. Please write this grabber.
     
