Hi guys,
First question: what is the 0.2.3+ version?
And second: here is my solution so far.
Thanks for the hints, JiRo. I dug a little deeper into the XML, spent some hours understanding it, and made some changes for myself.
I left the search node unchanged. In the get_details node, I changed only the section that parses the title from the CSFD movie detail page. The page has an og:title meta property (meant for Facebook), and as far as I can tell it always contains "Czech title / original title" (or only the original title, if no Czech one is available). So I parse this property out and then extract both titles from it. I added one configuration variable to the script, pref_title, which selects what goes into the title: cz (Czech title), ori (original title), firstori ("Original ( Czech )") or firstcz ("Czech ( Original )").
I don't see the AKA titles anywhere in the MP plugin, so I decided to do it this way.
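For anyone who wants to check the two regexes outside MediaPortal, here is a rough Python sketch of the same two-step parsing. It first grabs the og:title meta tag. It then splits its content at " / " into the Czech and the original title, and drops a trailing ", The"-style article from the original title. It assumes the tag looks exactly like `<meta property="og:title" content="..." />`, as the script's pattern does. The helper name split_og_title and the sample titles are made up for illustration, and `html.unescape` stands in for the script's `:htmldecode`.

```python
import html
import re

# Same patterns as rx_og_title / rx_parse_og_title in the script; the
# trailing-article pattern is the one used for title_ori.
RX_OG_TITLE = re.compile(r'<meta property="og:title" content="(.*?)" />')
RX_PARSE_OG_TITLE = re.compile(r'content="(.*?) / (.*?)\(')
RX_TRAILING_ARTICLE = re.compile(
    r"(.+?)(?:, (The|A|An|Ein|El|Das|Die|Der|Les|Un|Une))?[ \t]*$"
)


def split_og_title(page_html):
    """Hypothetical helper: return (czech_title, original_title), or None
    when either pattern fails to match."""
    og = RX_OG_TITLE.search(page_html)
    if og is None:
        return None
    # Assumption: ${og_title_all} in the script is the whole matched tag,
    # which is why the second pattern starts with 'content="'.
    parts = RX_PARSE_OG_TITLE.search(og.group(0))
    if parts is None:
        return None
    czech, original = parts.group(1), parts.group(2)
    # Group 1 is the original title without a trailing ", The" etc.
    # and without the space left before "(year)".
    ori = RX_TRAILING_ARTICLE.search(original)
    original = ori.group(1) if ori else ""
    return html.unescape(czech), html.unescape(original)
```

As written, the second pattern needs the " / " separator, so a page whose og:title carries only one title returns None here rather than a title.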
Here is the changed part of the script:
[CODE]
...
<!-- Retrieve details -->
<set name="movie.details_url" value="${site}${movie.site_id}" />
<retrieve name="details_page" url="${movie.details_url}" encoding="utf-8" retries="10" timeout_increment="3000" allow_unsafe_header="true" />
<!-- Preferred title format (Czech or original name from the CSFD DB); values: cz, ori, firstori, firstcz -->
<set name="pref_title" value="firstori" />
<!-- Regular expressions for parsing og:title property from movie detail html page -->
<set name="rx_og_title">
<![CDATA[
<meta property="og:title" content="(.*?)" />
]]>
</set>
<set name="rx_parse_og_title">
<![CDATA[
content="(.*?) / (.*?)\(
]]>
</set>
<!-- Parse the og:title meta property, then split it into the Czech and original titles -->
<parse name="og_title_all" input="${details_page}" regex="${rx_og_title}" />
<parse name="title_main" input="${og_title_all}" regex="${rx_parse_og_title}" />
<parse name="title_ori" input="${title_main[0][1]}" regex="(.+?)(?:, (The|A|An|Ein|El|Das|Die|Der|Les|Un|Une))?[ \t]*$" />
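<!-- According to the pref_title variable, set the movie title -->
<!-- ori: original title, falling back to the Czech title -->
<if test="${pref_title}=ori">
  <if test="${title_ori[0][0]}=">
    <set name="movie.title" value="${title_main[0][0]:htmldecode}" />
  </if>
  <if test="${title_ori[0][0]}!=">
    <set name="movie.title" value="${title_ori[0][0]:htmldecode}" />
  </if>
</if>
<!-- cz: Czech title only -->
<if test="${pref_title}=cz">
  <set name="movie.title" value="${title_main[0][0]:htmldecode}" />
</if>
<!-- firstori: "Original ( Czech )", or a single title if both match or there is no original -->
<if test="${pref_title}=firstori">
  <if test="${title_ori[0][0]}=">
    <set name="movie.title" value="${title_main[0][0]:htmldecode}" />
  </if>
  <if test="${title_ori[0][0]}!=">
    <if test="${title_ori[0][0]}=${title_main[0][0]}">
      <set name="movie.title" value="${title_main[0][0]:htmldecode}" />
    </if>
    <if test="${title_ori[0][0]}!=${title_main[0][0]}">
      <set name="movie.title" value="${title_ori[0][0]:htmldecode} ( ${title_main[0][0]:htmldecode} )" />
    </if>
  </if>
</if>
<!-- firstcz: "Czech ( Original )", with the same fallbacks as firstori -->
<if test="${pref_title}=firstcz">
  <if test="${title_ori[0][0]}=">
    <set name="movie.title" value="${title_main[0][0]:htmldecode}" />
  </if>
  <if test="${title_ori[0][0]}!=">
    <if test="${title_ori[0][0]}=${title_main[0][0]}">
      <set name="movie.title" value="${title_main[0][0]:htmldecode}" />
    </if>
    <if test="${title_ori[0][0]}!=${title_main[0][0]}">
      <set name="movie.title" value="${title_main[0][0]:htmldecode} ( ${title_ori[0][0]:htmldecode} )" />
    </if>
  </if>
</if>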
<!-- Title parsing (original version from Trottel, not used) -->
<!--
<parse name="titleaa" input="${details_page}" regex="<h1>(.+?)(?:, (The|A|An|Ein|El|Das|Der|Die|Les|Un|Une))?(?:\s<span.+?</span>)?.*?</h1>" />
<set name="movie.title" value="${titleaa[0][1]:htmldecode} ${titleaa[0][0]:htmldecode}" />
<replace name="movie.title" input="${movie.title}" pattern="( \(TV film\))" with="" />
-->
<!-- Alternate Titles -->
[/CODE]
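The four pref_title branches boil down to a small decision table. Here is a minimal Python sketch of it. It assumes title_cz and title_ori hold what the script keeps in title_main[0][0] and title_ori[0][0]. The name choose_title is hypothetical and not part of the scraper.

```python
def choose_title(pref_title, title_cz, title_ori):
    """Hypothetical mirror of the <if> chain that sets movie.title.

    Returns None for an unknown pref_title, because the script then sets
    nothing in this block.
    """
    if pref_title == "cz":
        return title_cz
    if pref_title not in ("ori", "firstori", "firstcz"):
        return None
    # No original title: every remaining mode falls back to the Czech one.
    if not title_ori:
        return title_cz
    if pref_title == "ori":
        return title_ori
    # Identical titles are shown only once.
    if title_ori == title_cz:
        return title_cz
    if pref_title == "firstori":
        return f"{title_ori} ( {title_cz} )"
    # firstcz
    return f"{title_cz} ( {title_ori} )"
```

The firstori and firstcz modes only add the second title in brackets when it differs, so a film whose Czech and original titles match is not shown twice.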
Attached is the result in MP.
Metelka ( Jindrich )