Hi,
I was wondering why we don't just use full regular expressions for template matching of TV programs, and wherever else possible.
The current system is perhaps more user-friendly(?), but it is limited when things get complicated.
It's possible to go the full regex way by creating a template.
Example:
Part of the HTML code from a Dutch TV guide (TVGids.nl - Zoeken):
<div class="program">
<a href="/programma/9519408/Nederland_in_Beweging%21/">
<span class="time">08:45 - 09:00</span>
<span class="title">Nederland in Beweging!</span>
<span class="channel">Nederland 1</span>
</a>
The template could look like this:
TemplateProgram =
<div class="program">[^<]*
<a href="(?<SUBLINK>[^"]*)">[^<]*
<span class="time">(?<START>[^\s]*) - (?<END>[^<]*)</span>[^<]*
<span class="title">(?<TITLE>[^<]*)</span>[^<]*
<span class="channel">(?<CHANNEL>[^<]*)</span>[^<]*
</a>
Some sample source code:
// TemplateProgram holds the template above; HtmlPageText holds the downloaded page.
Regex ProgramSearch = new Regex(TemplateProgram);
MatchCollection ProgramMatches = ProgramSearch.Matches(HtmlPageText);
foreach (Match ProgramMatch in ProgramMatches)
{
    // Each named group in the template maps to one program field.
    sublink = ProgramMatch.Groups["SUBLINK"].Value;
    starttime = ProgramMatch.Groups["START"].Value;
    endtime = ProgramMatch.Groups["END"].Value;
    title = ProgramMatch.Groups["TITLE"].Value;
    extrafields.Add("CHANNEL", ProgramMatch.Groups["CHANNEL"].Value);
}
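For anyone who wants to try the idea in isolation, here is a minimal self-contained sketch. It is not how the existing grabber works. The names TemplateDemo and ParsePrograms are made up for illustration, and the sample page text is just the HTML fragment above. Two assumptions on my side: the template lines are joined without line breaks, so that [^<]* alone absorbs whatever whitespace sits between the tags, and the field names are read from the template itself via Regex.GetGroupNames() instead of being hard-coded.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TemplateDemo
{
    // The template from the post, joined into a single string without line breaks
    // (assumption: the line breaks in the post are only there for readability).
    public const string TemplateProgram =
        "<div class=\"program\">[^<]*" +
        "<a href=\"(?<SUBLINK>[^\"]*)\">[^<]*" +
        "<span class=\"time\">(?<START>[^\\s]*) - (?<END>[^<]*)</span>[^<]*" +
        "<span class=\"title\">(?<TITLE>[^<]*)</span>[^<]*" +
        "<span class=\"channel\">(?<CHANNEL>[^<]*)</span>[^<]*" +
        "</a>";

    // Hypothetical helper: returns one dictionary per matched program,
    // keyed by the named groups that the template defines.
    public static List<Dictionary<string, string>> ParsePrograms(string html)
    {
        var programSearch = new Regex(TemplateProgram);
        var programs = new List<Dictionary<string, string>>();

        foreach (Match programMatch in programSearch.Matches(html))
        {
            var fields = new Dictionary<string, string>();
            foreach (string name in programSearch.GetGroupNames())
            {
                if (name == "0") continue; // group 0 is the whole match, not a field
                fields[name] = programMatch.Groups[name].Value;
            }
            programs.Add(fields);
        }
        return programs;
    }

    public static void Main()
    {
        // Sample input: the HTML fragment from the post, with some indentation added.
        string htmlPageText =
            "<div class=\"program\">\n" +
            "  <a href=\"/programma/9519408/Nederland_in_Beweging%21/\">\n" +
            "    <span class=\"time\">08:45 - 09:00</span>\n" +
            "    <span class=\"title\">Nederland in Beweging!</span>\n" +
            "    <span class=\"channel\">Nederland 1</span>\n" +
            "  </a>";

        foreach (var program in ParsePrograms(htmlPageText))
        {
            Console.WriteLine(
                $"{program["START"]}-{program["END"]} {program["TITLE"]} ({program["CHANNEL"]})");
        }
    }
}
```

Because the field names come from the template, adding an extra field for a site would only mean adding another named group to its template, with no code change.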
Regards,
Gijs