Hi,
I've taken a look into the source of WebEPG because I had strange issues with a new grabber.
The grabber uses the Z-tag to match optional parts of the grabbed page.
The processed sublink template of this grabber looks like this:
dsSsSdddD(pP)?pPDaA(aA)?Ddu(lL)?(lsSsSsSL)?(lsSL)?(lsSL)?(lsSL)?(lsSL)?(lsSL)?
The tags between ( and )? are the optional parts created with the Z-tag.
Lowercase characters mark the start of an HTML tag and uppercase characters the end, for example d = div and D = /div.
Now suppose the linked web page contains these processed tags:
dsSsSdddDpPpPDaAaADdulLlsSLlsSLlsSLlsSLUDdDD
One would expect it to be recognised like this, with the unmatched optional parts omitted:
dsSsSdddD(pP)pPDaA(aA)Ddu(lL)(lsSL)(lsSL)(lsSL)(lsSL)
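As a quick cross-check of that expectation: the processed template happens to be valid regular-expression syntax, so it can be fed to Python's re module as it stands. This is only an illustration of the behaviour I expect from optional groups, not WebEPG code, and the names TEMPLATE, PAGE and match_template are made up for this sketch.

```python
import re

# Processed sublink template and page tags, copied from the post.
TEMPLATE = "dsSsSdddD(pP)?pPDaA(aA)?Ddu(lL)?(lsSsSsSL)?(lsSL)?(lsSL)?(lsSL)?(lsSL)?(lsSL)?"
PAGE = "dsSsSdddDpPpPDaAaADdulLlsSLlsSLlsSLlsSLUDdDD"


def match_template(template, page):
    """Match the template at the start of the page.

    Returns (matched_text, groups), or None if the required tags do not match.
    An optional group that was skipped shows up as None in the groups list.
    """
    m = re.match(template, page)
    if m is None:
        return None
    return m.group(0), list(m.groups())


matched, groups = match_template(TEMPLATE, PAGE)
print(matched)  # dsSsSdddDpPpPDaAaADdulLlsSLlsSLlsSLlsSL
print(groups)   # ['pP', 'aA', 'lL', None, 'lsSL', 'lsSL', 'lsSL', 'lsSL', None]
```

The (lsSsSsSL)? group is skipped as a whole (None), and the four list items on the page go to the following (lsSL)? groups.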
However, that is not how WebEPG handles it at all.
All WebEPG does at the moment is mark each tag (and with it the data in between) as optional or not.
It does not do anything with that information, though.
It simply tries to match ALL the template tags one by one. If a template tag matches, fine. If it doesn't match, that is fine too, and it just moves on to the next one.
Take (lsSsSsSL)? from the example:
ls - found, processed
SsSs - not found, skipped
SL - found, processed
This shouldn't happen (unless I'm completely wrong about the function of the Z-tag). The whole optional part should be skipped, and the data should be picked up by the next optional part, (lsSL)?
This leads to mismatched and unmatched fields.
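To show how the fields end up shifted, here is a toy model in Python. It is not the WebEPG source, just my reading of the behaviour described above; match_per_tag, match_whole_group and run are hypothetical names, and only the optional list groups at the end of the template are modelled.

```python
def match_per_tag(group, page, pos):
    """Toy model of the current behaviour: every tag is optional on its own.

    A template tag that does not match the page at pos is silently skipped.
    """
    matched = []
    for tag in group:
        if pos < len(page) and page[pos] == tag:
            matched.append(tag)
            pos += 1
    return "".join(matched), pos


def match_whole_group(group, page, pos):
    """Expected behaviour: the group matches completely or is skipped completely."""
    if page.startswith(group, pos):
        return group, pos + len(group)
    return "", pos


def run(matcher, groups, page):
    """Apply one matcher to each group in turn; return what each group consumed."""
    pos = 0
    consumed = []
    for group in groups:
        text, pos = matcher(group, page, pos)
        consumed.append(text)
    return consumed


# The optional list groups from the template and the matching tail of the page.
GROUPS = ["lsSsSsSL", "lsSL", "lsSL", "lsSL", "lsSL", "lsSL"]
TAIL = "lsSLlsSLlsSLlsSLUDdDD"

per_tag = run(match_per_tag, GROUPS, TAIL)
grouped = run(match_whole_group, GROUPS, TAIL)
print(per_tag)  # ['lsSL', 'lsSL', 'lsSL', 'lsSL', '', '']
print(grouped)  # ['', 'lsSL', 'lsSL', 'lsSL', 'lsSL', '']
```

In the per-tag model the first list item on the page is swallowed by the 8-tag group, so every item after it lands one group too early.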
Solving this needs quite a structural change, because individual tags are marked as optional rather than groups of tags being stored as optional. It needs an extra grouping layer.
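A rough idea of what such a grouping layer could look like, again as a Python sketch rather than a proposal for the actual C# code. Segment, parse_template, match_segments and render are names invented for this example, and the matching is deliberately simple: greedy, left to right, without backtracking.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    tags: str       # run of tag characters, e.g. "lsSL"
    optional: bool  # True if the run came from a (...)? group


def parse_template(template):
    """Split a processed template into required and optional segments."""
    segments = []
    for m in re.finditer(r"\(([^()]+)\)\?|([^()?]+)", template):
        if m.group(1) is not None:
            segments.append(Segment(m.group(1), True))
        else:
            segments.append(Segment(m.group(2), False))
    return segments


def match_segments(segments, page):
    """Greedy matching per segment: an optional segment that does not match
    is skipped as a whole; a required one that does not match fails the page.
    No backtracking, so a template like (pP)?pP against a page with a single
    pP would fail here."""
    pos = 0
    used = []
    for seg in segments:
        if page.startswith(seg.tags, pos):
            used.append(seg)
            pos += len(seg.tags)
        elif not seg.optional:
            return None
    return used


def render(segments):
    """Show the matched segments in the notation used in this post."""
    return "".join(f"({s.tags})" if s.optional else s.tags for s in segments)


TEMPLATE = "dsSsSdddD(pP)?pPDaA(aA)?Ddu(lL)?(lsSsSsSL)?(lsSL)?(lsSL)?(lsSL)?(lsSL)?(lsSL)?"
PAGE = "dsSsSdddDpPpPDaAaADdulLlsSLlsSLlsSLlsSLUDdDD"

segments = parse_template(TEMPLATE)
result = render(match_segments(segments, PAGE))
print(result)  # dsSsSdddD(pP)pPDaA(aA)Ddu(lL)(lsSL)(lsSL)(lsSL)(lsSL)
```

The point is only that optionality is stored per segment instead of per tag; a real implementation would also have to carry the data fields that sit between the tags.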
Perhaps I can do some of the programming myself, but I'm not very familiar with C# yet.
Regards,
Gijs