Getting Data from the Web
[QUOTE="James, post: 89650, member: 12681"]
Hi patrick,

Thanks for your interest. I do plan to add detailed info to the wiki, but until I find some time I will give you the details here.

The start and end are simply search strings that quickly filter out large parts of the HTML source. This is needed because the template is sometimes not totally unique on the page, and it can also speed up parsing on large pages.

OK, the template is basically HTML tags and parser tags.

All HTML tags are supported, including comments (<!-- -->).

Parser tags are ones I invented for marking the place where the interesting data is. I have 3 at the moment: <#xxxx>, <*xxxx> and <Zxx>.

In this post I will just cover the major one, <#xxx>.

A simple template would look like this:

[CODE]
<tr>
<td><#START></td>
<td><#TITLE></td>
<td>
</tr>
[/CODE]

The parser searches for this pattern in the HTML source and reports the number of times it finds it.

When you ask it to parse a certain occurrence, it gets the text from the HTML source located where the <#START> and <#TITLE> tags are, and passes it into an IParserData object using the SetElement(string tag, string value) method.

In this case tag = "#START" or "#TITLE", and value is the text found in the HTML source at that location. Characters can be put in front of and behind the <#> tags to remove part of the text.

So "-<#START>." will search for the '-' and the '.' and pass what is between them as the value string into the SetElement method. To use more than one character in front and behind, you need to use the syntax <#TAGNAME:front,back>, where front and back are search strings (either can be empty). If no search strings/characters are given, the value runs up to the next tag.

Of course, extra parsing can be done in the IParserData object. You just need to create a new class with this interface.

The tag names can be anything; they just need to appear in your template, and your IParserData class must know what to do with them. I have made a very simple ParserData class which just stores each tag/value pair in a Dictionary, so the values can be retrieved by tag name later. It accepts any tag and value pair.

The WebEPG IParserData class, however, looks like this:

[CODE]
switch (tag)
{
    case "#START":
        BasicTime startTime = GetTime(element);
        break;
    case "#TITLE":
        _title = element.Trim(' ', '\n', '\t');
        break;
...
[/CODE]

It does extra parsing of the element values, for example trimming spaces and other junk, or parsing time values from strings.

The Tags variable tells the parser which HTML tags are interesting; all other tags are ignored. Each entry is the first character of an HTML tag name.

So:
"T" = all table tags
"I" = img
"D" = div
"!" = comment
etc.

I take all table tags as one group. Multiple tags can of course be given, e.g. "TSD" (table, span, div).

So in this example I would use "T", as all the tags are table tags (i.e. they start with the letter T). This means that the real HTML source could have other tags in it, and the parser would still match it because it would just ignore those tags.

The general idea is to use as few tags as required to make the template unique to the data. Using too many tags can mean that small changes to the page require template changes. Tags that define structure, such as table tags, are good because the structure doesn't often change.

I hope this helps. I will try to get a start on the wiki documentation soon.

/James
[/QUOTE]
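For anyone reading along, here is a rough C# sketch of the two ideas in James's post: a simple ParserData-style class that stores tag/value pairs in a Dictionary, and the effect of the <#TAGNAME:front,back> search strings. This is not the WebEPG source. The interface below is assumed to contain only the SetElement method mentioned in the post (the real one may declare more), and DictionaryParserData, GetElement, DelimiterDemo and ExtractBetween are hypothetical names made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape: the post only mentions SetElement(string tag, string value);
// the real WebEPG interface may declare more members.
public interface IParserData
{
    void SetElement(string tag, string value);
}

// Hypothetical dictionary-backed implementation: accepts any tag/value pair
// and lets the caller look the value up by tag name later.
public class DictionaryParserData : IParserData
{
    private readonly Dictionary<string, string> _elements =
        new Dictionary<string, string>();

    public void SetElement(string tag, string value)
    {
        _elements[tag] = value; // a repeated tag overwrites the earlier value
    }

    // Returns the stored value, or null if the tag was never set.
    public string GetElement(string tag)
    {
        string value;
        return _elements.TryGetValue(tag, out value) ? value : null;
    }
}

public static class DelimiterDemo
{
    // Hypothetical helper illustrating <#TAGNAME:front,back>:
    // keeps the text after the first 'front' and before the next 'back'.
    // An empty search string leaves that side untrimmed.
    // Returns null when a non-empty search string is not found
    // (an assumption; the post does not say what the parser does then).
    public static string ExtractBetween(string text, string front, string back)
    {
        int start = 0;
        if (front.Length > 0)
        {
            int i = text.IndexOf(front, StringComparison.Ordinal);
            if (i < 0) return null;
            start = i + front.Length;
        }

        int end = text.Length;
        if (back.Length > 0)
        {
            int j = text.IndexOf(back, start, StringComparison.Ordinal);
            if (j < 0) return null;
            end = j;
        }

        return text.Substring(start, end - start);
    }
}
```

With this sketch, the "-<#START>." example from the post corresponds to calling ExtractBetween(text, "-", ".") and handing the result to SetElement("#START", ...). A class like the WebEPG one would do its extra trimming or time parsing inside SetElement instead of just storing the string.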