WebGrab+Plus a new xmltv grabber

kleinerflo

Portal Member
January 25, 2009
44
6
Bayern
Home Country
Germany Germany
Hello community, I have been using WebGrab++ for quite a while and am very happy with it. However, I have a problem with one series in the results of several grabbers. "NCIS LA" should be the title, but the "LA" appears in the sub-title.
This is from the tvspielfilm.de result; the same happens with tvtoday.de.
I hope you can help me get this under control.
Code:
 <programme start="20130104160500 +0100" stop="20130104165000 +0100" channel="13th Street">
    <title lang="de">Navy CIS</title>
    <title lang="xx">NCIS: Los Angeles</title>
    <sub-title lang="de">L. A.. Der einsame Wolf</sub-title>
    <desc lang="de">Eine frühere Navy-Nachrichtendienstlerin, die für eine Friedensorganisation arbeitete, wird ermordet</desc>
    <credits>
      <director>James Whitmore Jr.</director>
      <actor>Chris O'Donnell (Special Agent G. Callen)</actor>
      <actor>LL Cool J (Special Agent Sam Hanna)</actor>
      <actor>Daniela Ruah (Special Agent Kensi Blye)</actor>
      <actor>Linda Hunt (Henrietta "Hetty" Lange)</actor>
      <actor>Peter Cambor (Nate "Doc" Getz)</actor>
      <actor>Eric Christian Olsen (Marty Deeks)</actor>
      <actor>Barrett Foa (Eric Beale)</actor>
    </credits>
    <category lang="de">USA 2011</category>
    <category lang="de">Krimiserie</category>
    <date>2011</date>
    <episode-num system="onscreen">Staffel 3|Folge 54</episode-num>
 

corporate_gadfly

Portal Pro
May 17, 2011
396
136
Home Country
Canada Canada
Thanks in advance for the wonderful program. I use tvguide.com for OTA EPG data for Buffalo (Toronto, actually - but close enough). Would someone be kind enough and willing to decipher the genre information for tvguide.com?

I have figured out the following which may be helpful:
  • In the tab-delimited file, the genre information is always the 7th element in the row.
The following lookup table can be used to figure out the genres:
  • 64 = movies
  • 1024 = sports
  • 2 = family
  • 256 = news
  • 1 = unknown?
Now, someone who knows more about "scrubs" can perhaps help out with getting genre information for tvguide.com?

Here's a small sampling of sports and news:


0 2.1 WGRZHD 5 The Tim McCarver Show 6 1024 8192 16 0 2 8 18934761 11822 201301200330 30 347390 0
0 2.1 WGRZHD 5 On the Money With Maria Bartiromo 6 256 8192 16 0 2 12 21449700 11822 201301200600 30 205344 0
0 2.1 WGRZHD 5 Sunday Daybreak 18 256 8192 16 0 2 12 12741185 11822 201301200630 90 0 0
0 2.1 WGRZHD 5 Today 12 256 8192 16 0 2 44 21428577 11822 201301200800 60 0 0
0 2.1 WGRZHD 5 Meet the Press 12 256 8192 16 0 2 44 21422376 11822 201301200900 60 203044 0
0 2.1 WGRZHD 5 Sabres Pregame 6 1024 8192 16 0 2 8 19252951 11822 201301201200 30 347088 0
0 2.1 WGRZHD 5 NHL Hockey:Flyers at Sabres 30 1024 8192 16 0 2 41 1149318 11822 201301201230 150 0 0
0 2.1 WGRZHD 5 Skiing:U.S. Freestyle World Cup 12 1024 8192 16 0 2 40 4063434 11822 201301201500 60 0 0
0 2.1 WGRZHD 5 Channel 2 News 6 256 8192 16 0 2 4 4270837 11822 201301201800 30 0 0
0 2.1 WGRZHD 5 NBC Nightly News 6 256 8192 16 0 2 44 2226933 11822 201301201830 30 0 0
0 2.1 WGRZHD 5 Channel 2 News 6 256 8192 16 0 2 4 4270838 11822 201301202300 30 0 0
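
For readers who want to try the lookup outside WebGrab+Plus, here is a minimal Python sketch of the idea described above: split a row on tabs, take the 7th element, and translate it with the table from this post. The function name `genre_of` and the `GENRES` dictionary are hypothetical, the mapping is only the poster's guess, and the sample row is rebuilt with explicit tabs on the assumption that the forum replaced the tabs with spaces.

```python
# Hypothetical helper; the code-to-genre table is the guess from the post above.
GENRES = {1: "unknown", 2: "family", 64: "movies", 256: "news", 1024: "sports"}


def genre_of(row: str) -> str:
    """Return the genre name for one tab-delimited listing row."""
    fields = row.split("\t")
    code = int(fields[6])  # 7th element, zero-based index 6
    return GENRES.get(code, f"unmapped ({code})")


# Sample row rebuilt with explicit tabs (assumed field boundaries).
sample = "\t".join([
    "0", "2.1", "WGRZHD", "5", "Sabres Pregame", "6", "1024", "8192",
    "16", "0", "2", "8", "19252951", "11822", "201301201200", "30", "347088", "0",
])

print(genre_of(sample))
```

Codes that are not in the table are reported as unmapped rather than guessed.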


Again, thanks in advance.

Cheers.
 

WG++Maker

Portal Pro
October 25, 2010
130
112
La Gomera, Canary Islands
Home Country
Spain Spain
Hi Kleinerflo,

Sorry that I didn't reply earlier; for some reason the auto notification doesn't seem to work for me.

I need a bit of time to sort it out, but will be back.

WG++Maker .. Jan

Hi corporate_gadfly,

I need a few days and will be back

WG++Maker .. Jan
 

silentbuteo2

Portal Member
January 20, 2013
8
1
Home Country
Belgium Belgium
@kleinerflo
I've tested this, and for me it is grabbed correctly. Are you using the latest version of the .ini file?
The latest version is from 01/11/2012.
Here is the link: http://webgrabplus.com/sites/default/files/download/ini/detail/de_tvspielfilm.de.zip

If the problem still occurs with the latest version, just let me know and I'll check further.

XML:
<programme start="20130122143000 +0100" stop="20130122152000 +0100" channel="13th Street Universal">
	<title lang="de">Navy CIS</title>
	<sub-title lang="de">Max Destructo</sub-title>
	<desc lang="de">Makabrer Auftakt: Gibbs (Mark Harmon) und sein Team entdecken in einer gestohlenen Damenhandtasche Fingerkuppen und Zähne eines Corporals. Der Fall führt zu einer Gruppe Computerfreaks (unter ihnen: Beth Riesgraf aus der Serie "Leverage", hier als Maxine) und zeigt, dass Gibbs mit moderner Technik immer noch auf Kriegsfuß steht.(n)</desc>
	<category lang="de">Serie</category>
	<category lang="de">Krimiserie</category>
	<episode-num system="onscreen"> Folge 178</episode-num>
  </programme>
  <programme start="20130122152000 +0100" stop="20130122160500 +0100" channel="13th Street Universal">
	<title lang="de">Navy CIS: L. A.</title>
	<sub-title lang="de">Die Koreanerin</sub-title>
	<desc lang="de">Wissenschaftler Daniel Su, der eine Hightechausrüstung fürs Marine Corps entwickelt, wird ermordet. Ein Überwachungsvideo zeigt "Die Koreanerin" Lee Wuan Kai, eine kaltblütige Auftragskillerin, mit der bereits der NCIS an der Ostküste zu tun hatte. Weshalb Laborgenie Abby (Pauley Perrette) ein Gastspiel gibt(n)</desc>
	<category lang="de">Serie</category>
	<category lang="de">Krimiserie</category>
	<episode-num system="onscreen"> Folge 5</episode-num>
  </programme>
  <programme start="20130122160500 +0100" stop="20130122165000 +0100" channel="13th Street Universal">
	<title lang="de">Navy CIS: L. A.</title>
	<sub-title lang="de">Tinte in den Adern</sub-title>
	<desc lang="de">Ein Marine fällt während einer Party von der Dachterrasse eines Hotels. Wie sich herausstellt, war er bereits ohnmächtig, als ihn jemand über die Brüstung warf. Eine Spur führt die Navy-Agenten Callen, Kensi und Sam (Chris O'Donnell, Daniela Ruah, LL Cool J) zu einer Falschgeldbande(n)</desc>
	<category lang="de">Serie</category>
	<category lang="de">Krimiserie</category>
	<episode-num system="onscreen"> Folge 6</episode-num>
  </programme>
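
If updating the .ini is not an option, the difference between the broken output and the output above could in principle be patched in a post-processing step. The following Python sketch is only an illustration, not part of WebGrab+Plus: the function name `fix_la_subtitle` and the regular expression are assumptions, and the target form "Navy CIS: L. A." is taken from the XML above.

```python
import re
import xml.etree.ElementTree as ET

# Matches a leading "L. A." plus stray dots and spaces,
# as in "L. A.. Der einsame Wolf" (assumed pattern).
LA_PREFIX = re.compile(r"^L\.\s*A\.[.\s]*")


def fix_la_subtitle(programme: ET.Element) -> None:
    """Move a leading "L. A." from the sub-title into the German title."""
    title = programme.find("title[@lang='de']")
    sub = programme.find("sub-title")
    if title is None or sub is None or not sub.text:
        return
    match = LA_PREFIX.match(sub.text)
    if match:
        title.text = f"{title.text}: L. A."
        sub.text = sub.text[match.end():]


snippet = """<programme start="20130104160500 +0100" stop="20130104165000 +0100" channel="13th Street">
  <title lang="de">Navy CIS</title>
  <sub-title lang="de">L. A.. Der einsame Wolf</sub-title>
</programme>"""

prog = ET.fromstring(snippet)
fix_la_subtitle(prog)
print(prog.find("title").text)
print(prog.find("sub-title").text)
```

Programmes whose sub-title does not start with "L. A." are left untouched.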
 

silentbuteo2

Portal Member
January 20, 2013
8
1
Home Country
Belgium Belgium
@corporate_gadfly

I have looked at this "problem" and found that the code currently grabs only the subcategory (basketball, football, comedy, ...).
I adjusted the code to also grab the main category (movie, sports, family, news).
Could you test this? I tested it with a few sites, but before I put it in the release, I would like you to test it as well.
Just change your .ini file using the code below. First find the line with "urldate.format" in your .ini and remove everything below it (including that line). Then append the code below.


Code:
urldate.format {datestring|} * no value but required by the program
index_showsplit.scrub {multi|'index_variable_element'||\n}
index_temp_3.scrub {single(separator="\t" include=11)||||} *scrubs the show_id, needed for index_urlshow
*
index_date.scrub	{single(force)|||\t|}
index_temp_1.scrub  {single(separator="\t" include=13)||||} * start in format yyyyMMddHHmm, we use substring
index_title.scrub	{single(separator="\t" include=3)||||}
index_temp_4.scrub	{single(debug separator="\t" include=5)||||} * category on the main page
*
title.scrub {single(separator="\t" include=2)||||<div style=}
subtitle.scrub {single(separator="\t" include=3)||||<div style=}
description.scrub {single(separator="\t" include=4)||||<div style=}
director.scrub {single(separator="\t" include=14)||||<div style=}
actor.scrub {single(separator="\t" include=15)||||<div style=}
temp_1.scrub {single(separator="\t" include=11 exclude="other")||||<div style=} * category (from detail page)
temp_2.scrub {single(separator="\t" include=12 exclude="other")||||<div style=} * subcategory (from detail page)
rating.scrub {single(separator="\t" include=8)||||<div style=}
productiondate.scrub {single(separator="\t" include=10)||||<div style=}
*
* operations:
*index_variable_element.modify {addstart|\t'config_xmltv_id'}
*index_variable_element.modify {substring(type=word)|0 1}
*index_variable_element.modify {addend|\t}
scope.range {(datelogo)|end}
index_variable_element.modify {addstart|'config_xmltv_id'}
index_variable_element.modify {substring(type=word)|-1 1}
* must contain a number
index_temp_6.modify {calculate(format=F0)|'index_variable_element'}
* clear if not a number
index_variable_element.modify {clear('index_temp_6' "0")}
index_variable_element.modify {addstart|\t}
index_variable_element.modify {addend|\t}
end_scope
*
scope.range {(indexshowdetails)|end}
* correct date :
index_date.modify {substring(type=char)|0 10}
* compose start :
index_temp_2.modify {substring(type=char)|'index_temp_1' 8 2} * the hours of start
index_start.modify {addstart|'index_temp_2':} * add hours minutes separator
index_temp_2.modify {substring(type=char)|'index_temp_1' -2} * the minutes of start
index_start.modify {addend|'index_temp_2'}
* compose index_urlshow :
index_urlshow.modify {addstart('index_temp_3' not "")|http://www.tvguide.com/listings/data/detailcache.aspx?Qr='index_temp_3'&tvoid=0&v2=1}
end_scope
*
title.modify {addstart(scope=showdetails "")|'index_title'}
actor.modify {replace(scope=showdetails)|,|\|} * make actor a multi element
* translate the category id to string
index_temp_4.modify {replace("1")|1|}
index_temp_4.modify {replace( "2")|2|family}
index_temp_4.modify {replace("64")|64|movie}
index_temp_4.modify {replace("256")|256|news}
index_temp_4.modify {replace("1024")|1024|sports}
* add all the categories together
category.modify {addstart(scope=showdetails 'temp_2' not "")|'temp_2'\|} * add subcategory (from the detail page)
category.modify {addstart(scope=showdetails 'temp_1' not "")|'temp_1'\|} * add category	(from detail page)
category.modify {addstart( 'index_temp_4' not "")|'index_temp_4'\|}	  * add category	(from index page)
 
category.modify {cleanup(scope=showdetails removeduplicates=equal)}
 
**  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _  _
**	  #####  CHANNEL FILE CREATION (only to create the xxx-channel.xml file)
**
** @auto_xml_channel_start
** the following 8 entries create a channel list file:
*index_site_channel.scrub {multi(separator="\t")|magic=\n|||\n}
*index_site_channel.modify {replace| |-} * replace space in channel name by -
*index_site_channel.modify {replace|\b| } * replace char U+0008 (word separator) in space
*index_site_channel.modify {substring(type=word)|0 2}
*index_site_id.scrub {multi(separator="\t")|magic=\n|||\n}
*index_site_id.modify {replace| |-}
*index_site_id.modify {replace|\b| } * replace char U+0008 (word separator) in space
*index_site_id.modify {substring(type=word)|-1}
** @auto_xml_channel_end
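
To make the "compose start" and "compose index_urlshow" steps in the code above easier to follow, here is a rough Python equivalent. It is only a sketch of what those lines appear to do (take the hours and minutes out of a yyyyMMddHHmm string, and build the detail URL only when a show id is present). The function names and the show id `12345` are hypothetical.

```python
def compose_start(raw_start: str) -> str:
    """Turn a yyyyMMddHHmm string into HH:mm, like the index_start lines."""
    hours = raw_start[8:10]   # two characters starting at position 8
    minutes = raw_start[-2:]  # last two characters
    return f"{hours}:{minutes}"


def compose_urlshow(show_id: str) -> str:
    """Build the detail page URL, or return "" when there is no show id."""
    if not show_id:
        return ""
    return (
        "http://www.tvguide.com/listings/data/detailcache.aspx"
        f"?Qr={show_id}&tvoid=0&v2=1"
    )


print(compose_start("201301201230"))
print(compose_urlshow("12345"))  # hypothetical show id
```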
 

corporate_gadfly

Portal Pro
May 17, 2011
396
136
Home Country
Canada Canada
I am so sorry @silentbuteo2. I probably made you do all that work for nothing. That's what happens when you encounter something new and unknown. I opened up my original tvguide.xml (the one processed by existing .ini files) and sure enough it has tons of <category> lines already:
Code:
	<category lang="en">sports</category>
	<category lang="en">hockey</category>
where, I guess, sports is the main category with hockey being the subcategory. So, unless there was something obviously wrong with the original .ini files, I am reluctant to change over to the new ones.

Do you still want me to test?
 

silentbuteo2

Portal Member
January 20, 2013
8
1
Home Country
Belgium Belgium
@corporate_gadfly
The original .ini file already contained the code to grab the main category, but it was commented out.
However, the main category was not always available, so I extended the code to handle that. The code now grabs the extra info.
If the old code works for you, just keep using it. If you want to test the new code, it is already released.
http://webgrabplus.com/epg-channels#co
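
As a rough illustration of the category handling in the posted .ini (index page category first, then the detail page category and subcategory, each only when non-empty, with duplicates removed), here is a small Python sketch. The function name `merge_categories` is hypothetical and the ordering is my reading of the `addstart` lines, not a statement about WebGrab+Plus internals.

```python
def merge_categories(index_cat: str, detail_cat: str, detail_subcat: str) -> list[str]:
    """Combine the three category sources, skipping empties and duplicates."""
    merged: list[str] = []
    for value in (index_cat, detail_cat, detail_subcat):
        if value and value not in merged:
            merged.append(value)
    return merged


print(merge_categories("sports", "sports", "hockey"))
print(merge_categories("", "", "comedy"))
```

With the sports and hockey example from earlier in the thread, the duplicate main category collapses into a single entry.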
 

tom78

Portal Pro
August 10, 2007
149
5
Home Country
Germany Germany
Hello.
For a while now I have had a problem with the tvtv.de.ini file. (Updating to SiteINI 11.10 doesn't solve the problem.)
WebGrab downloads the index page, but then it shows the message "no shows in index page! Cannot find any shows in the index page !"

Here's one channel for example:
<channel update="i" site="tvtv.de" site_id="11" xmltv_id="VOX">VOX</channel>

Could you please have a look at this?!
Thanks!
 

Lightning303

MP Donator
Premium Supporter
September 12, 2009
798
577
Home Country
Germany Germany
Same here :( It seems tvtv.de was bought by another company; maybe they fiddled around with their code.
It would be great if WG++Maker could fix that (also the tvtv.de.xmltv_ns.ini version, as I am using that one ;P).
Thanks
     
