
Hi guys,


First question: what is the 0.2.3+ version?


And second, my solution so far.


Thanks for the hints, JiRo. I dug a little deeper into the XML, spent some hours understanding it, and made some changes for myself.


I left the search node unchanged. In the get_details node, I changed only the section that parses the title from the CSFD movie detail page. The page has an og:title property for Facebook, and it seems to me that it always contains "Czech title / original title" (or only the original title, if no Czech one is available). So I parse this property out and then parse both titles from it. I added one configuration variable to the script where I choose what I want in the title (cz, ori, firstori, firstcz).


I don't see the aka titles in the MP plugin, so I decided to do it this way.
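
In case it helps to test the idea outside MP, here is a rough Python sketch of the same title selection. It is only an illustration, not part of the scraper script: the function name `pick_title`, the three regexes, and the sample strings are my own assumptions, and it assumes the og:title content looks like "Czech title / Original title (year)". It leaves out the trailing-article handling (", The", ", A", ...) that the script does.

```python
import html
import re

# Assumed shape of the tag on the detail page (hypothetical sample):
#   <meta property="og:title" content="Czech title / Original title (2010)" />
RX_OG_TITLE = re.compile(r'<meta property="og:title" content="(.*?)"')
# Splits "czech / original (" into its two halves.
RX_SPLIT = re.compile(r'^(.*?) / (.*?)\s*\(')
# Fallback when the property holds a single title only.
RX_SINGLE = re.compile(r'^(.*?)\s*(?:\(|$)')


def pick_title(page: str, pref_title: str = "firstori") -> str:
    """Return the movie title chosen by pref_title: cz, ori, firstori, firstcz."""
    found = RX_OG_TITLE.search(page)
    if not found:
        return ""
    content = html.unescape(found.group(1))

    both = RX_SPLIT.search(content)
    if both:
        cz, ori = both.group(1).strip(), both.group(2).strip()
    else:
        # Only one title on the page: use it for every setting.
        cz, ori = RX_SINGLE.search(content).group(1).strip(), ""

    # No separate original title, or both titles identical: nothing to combine.
    if not ori or ori == cz:
        return cz
    if pref_title == "cz":
        return cz
    if pref_title == "ori":
        return ori
    if pref_title == "firstori":
        return f"{ori} ( {cz} )"
    if pref_title == "firstcz":
        return f"{cz} ( {ori} )"
    raise ValueError(f"unknown pref_title: {pref_title}")


if __name__ == "__main__":
    sample = '<meta property="og:title" content="Cesky nazev / Original Title (2010)" />'
    for pref in ("cz", "ori", "firstori", "firstcz"):
        print(pref, "->", pick_title(sample, pref))
```

Unlike the script, the sketch also covers the case where the property holds only one title, by falling back to a second regex.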


Here is the changed part of the script:


[CODE]

...


      <!-- Retrieve details -->


      <set name="movie.details_url" value="${site}${movie.site_id}" />

      <retrieve name="details_page" url="${movie.details_url}" encoding="utf-8" retries="10" timeout_increment="3000" allow_unsafe_header="true" />


     

      <!-- Choose which title from the CSFD DB to use. Values: cz, ori, firstori, firstcz -->

      <set name="pref_title" value="firstori" />


      <!-- Regular expressions for parsing og:title property from movie detail html page -->

 

       <set name="rx_og_title">

        <![CDATA[

        <meta property="og:title" content="(.*?)" />

        ]]>

      </set>


      <set name="rx_parse_og_title">

        <![CDATA[

        content="(.*?) / (.*?)\(

        ]]>

      </set>


      <!-- OG meta property title -->

      <parse name="og_title_all" input="${details_page}" regex="${rx_og_title}" />

      <parse name="title_main" input="${og_title_all}" regex="${rx_parse_og_title}" />

      <parse name="title_ori" input="${title_main[0][1]}" regex="(.+?)(?:, (The|A|An|Ein|El|Das|Die|Der|Les|Un|Une))?[ \t]*$" />

  

      <!-- According to the pref_title variable, set the movie title -->

      

      <if test="${pref_title}=ori">

        <if test="${title_ori[0][0]}=">

          <set name="movie.title" value="${title_main[0][0]:htmldecode}" />

        </if>

        <if test="${title_ori[0][0]}!=">

          <set name="movie.title" value="${title_ori[0][0]:htmldecode}" />

        </if>

      </if>

     <if test="${pref_title}=cz">

        <set name="movie.title" value="${title_main[0][0]:htmldecode}" />

     </if>

      <if test="${pref_title}=firstori">

        <if test="${title_ori[0][0]}=">

          <set name="movie.title" value="${title_main[0][0]:htmldecode}" />

        </if>

        <if test="${title_ori[0][0]}!=">

          <if test="${title_ori[0][0]}=${title_main[0][0]}">

            <set name="movie.title" value="${title_main[0][0]:htmldecode}" />

          </if>

          <if test="${title_ori[0][0]}!=${title_main[0][0]}">

            <set name="movie.title" value="${title_ori[0][0]:htmldecode} ( ${title_main[0][0]:htmldecode} )" />

          </if>

        </if>

      </if>

      <if test="${pref_title}=firstcz">

        <if test="${title_ori[0][0]}=">

          <set name="movie.title" value="${title_main[0][0]:htmldecode}" />

        </if>

        <if test="${title_ori[0][0]}!=">

          <if test="${title_ori[0][0]}=${title_main[0][0]}">

            <set name="movie.title" value="${title_main[0][0]:htmldecode}" />

          </if>

          <if test="${title_ori[0][0]}!=${title_main[0][0]}">

            <set name="movie.title" value="${title_main[0][0]:htmldecode} ( ${title_ori[0][0]:htmldecode} )" />

          </if>

        </if>

      </if>



      <!-- Title  ( original from Trottel, not used) -->

      <!--

     <parse name="titleaa" input="${details_page}" regex="&lt;h1&gt;(.+?)(?:, (The|A|An|Ein|El|Das|Der|Die|Les|Un|Une))?(?:\s&lt;span.+?&lt;/span&gt;)?.*?&lt;/h1&gt;" />

      <set name="movie.title" value="${titleaa[0][1]:htmldecode} ${titleaa[0][0]:htmldecode}" />

      <replace name="movie.title" input="${movie.title}" pattern="( \(TV film\))" with="" />

      -->

 

      <!-- Alternate Titles -->


...

[/CODE]


The result in MP is attached.


Metelka ( Jindrich )

