Short plot gets superfluous text

mortstar

MP Donator
  • Premium Supporter
  • January 30, 2008
    415
    41
    Home Country
    England
    The IMDb scraper script falls back to the short plot summary on the main IMDb page. (I've slightly edited the function so that the scraper script only picks up this short plot, as I'm not a big fan of the full synopsis, which can be full of spoilers.) The regex seems to be picking up an HTML link, so every matched film's plot ends with the link text "full summary".

    For example, Transsiberian (2008) gets the plot "A Trans-Siberian train journey from China to Moscow becomes a thrilling chase of deception and murder when an American couple encounters a mysterious pair of fellow travelers. full summary".

    Regular expressions make absolutely no sense to me, so unfortunately I couldn't edit the script myself... any help much appreciated :D
     

    fforde

    Community Plugin Dev
    June 7, 2007
    2,667
    1,702
    43
    Texas
    Home Country
    United States of America
    If you post your current regex and the block of text you're trying to match against (including some buffer on each side) I will help if I can.
     

    mortstar

    fforde said:
    > If you post your current regex and the block of text you're trying to match against (including some buffer on each side) I will help if I can.

    The regex is:
    <set name="rx_plot2">
    <![CDATA[
    <h5>Plot:</h5>\s+([^|]+)\|\s<a class="[^"]+" href="synopsis">
    ]]>
    </set>

    It is parsed by:
    <parse name="summary2" input="${details_page}" regex="${rx_plot2}"/>
    <set name="summary_clean" value="${summary2[0][0]:striptags}" />
    <set name="movie.summary" value="${summary_clean:htmldecode}" />

    The IMDb page source for the example:
    <div class="info">
    <h5>Plot:</h5>
    A Trans-Siberian train journey from China to Moscow becomes a thrilling chase of deception and murder when an American couple encounters a mysterious pair of fellow travelers. <a class="tn15more inline" href="/title/tt0800241/plotsummary" onclick="(new Image()).src='/rg/title-tease/plotsummary/images/b.gif?link=/title/tt0800241/plotsummary';">full summary</a> | <a class="tn15more inline" href="synopsis">full synopsis</a>
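To make the cause concrete, here is a minimal sketch that replays the three posted steps (the parse with rx_plot2, then :striptags, then :htmldecode) on the sample above. It uses Python's re module only as a stand-in for the scraper's regex engine (the constructs involved behave the same way in both), assumes ${summary2[0][0]} means "first capture group of the first match", and treats strip_tags as an assumed equivalent of the :striptags modifier. REVISED_RX is a hypothetical fix, not the corrected expression from the thread. The original group ([^|]+) runs all the way to the "|" separator, so it swallows the "full summary" link; :striptags then removes the tags but leaves the link text behind. Capturing [^<]+ instead stops at the first tag.

```python
import html
import re
from typing import Optional

# rx_plot2 exactly as posted in the thread.
ORIGINAL_RX = r'<h5>Plot:</h5>\s+([^|]+)\|\s<a class="[^"]+" href="synopsis">'

# Hypothetical revision: capture only up to the first "<", so the group ends
# before the "full summary" link instead of running on to the "|" separator.
REVISED_RX = r'<h5>Plot:</h5>\s+([^<]+)'

# The IMDb snippet quoted in the thread (indentation removed).
SAMPLE = (
    '<div class="info">\n'
    '<h5>Plot:</h5>\n'
    'A Trans-Siberian train journey from China to Moscow becomes a thrilling '
    'chase of deception and murder when an American couple encounters a '
    'mysterious pair of fellow travelers. '
    '<a class="tn15more inline" href="/title/tt0800241/plotsummary" '
    "onclick=\"(new Image()).src='/rg/title-tease/plotsummary/images/b.gif"
    "?link=/title/tt0800241/plotsummary';\">full summary</a> | "
    '<a class="tn15more inline" href="synopsis">full synopsis</a>'
)


def strip_tags(text: str) -> str:
    """Assumed equivalent of the scraper's :striptags modifier."""
    return re.sub(r"<[^>]+>", "", text)


def scrape_summary(page: str, rx_plot2: str) -> Optional[str]:
    """Replay parse -> :striptags -> :htmldecode on one page."""
    match = re.search(rx_plot2, page)           # <parse name="summary2" .../>
    if match is None:
        return None
    summary_clean = strip_tags(match.group(1))  # ${summary2[0][0]:striptags}
    # ${summary_clean:htmldecode}; the final strip() is an extra tidy-up step
    # that the posted script does not show.
    return html.unescape(summary_clean).strip()


if __name__ == "__main__":
    print(scrape_summary(SAMPLE, ORIGINAL_RX))  # ...fellow travelers. full summary
    print(scrape_summary(SAMPLE, REVISED_RX))   # ...fellow travelers.
```

The trade-off in the [^<]+ version is that it assumes the plot text itself contains no inline tags; if it ever did, the capture would stop early, and anchoring on the "tn15more inline" link class instead would be the more tolerant choice.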
     

    mortstar

    Works perfectly, thanks!

    Your IMDb script still has the old regex. All I did was swap the order in which film plots are grabbed, so you'll need the updated expression too.
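Why swapping the order matters can be illustrated with a first-match-wins helper. This is a sketch under assumptions: the thread doesn't show how the stock script orders its plot sources, so first_plot, RX_LONG and the toy PAGE markup are hypothetical, and only the short-plot pattern is modelled on rx_plot2. With the stock order the short plot is only a fallback, so a flaw in rx_plot2 presumably surfaces rarely; with the order swapped it is hit for every film, which is why the updated expression is needed as well.

```python
import re
from typing import Optional, Sequence


def first_plot(page: str, patterns: Sequence[str]) -> Optional[str]:
    """Try each pattern in order and return capture group 1 of the first hit."""
    for pattern in patterns:
        match = re.search(pattern, page)
        if match:
            return match.group(1).strip()
    return None  # no pattern matched


# Hypothetical long-plot pattern and toy page markup, for illustration only.
RX_LONG = r'<p class="long-plot">([^<]+)'
# Short-plot pattern modelled on rx_plot2 (capturing up to the first tag).
RX_SHORT = r'<h5>Plot:</h5>\s+([^<]+)'

PAGE = (
    '<p class="long-plot">Long, spoiler-heavy synopsis.</p>\n'
    '<h5>Plot:</h5>\nShort teaser. <a href="plotsummary">full summary</a>'
)

STOCK_ORDER = [RX_LONG, RX_SHORT]    # long plot first, short plot as fallback
SWAPPED_ORDER = [RX_SHORT, RX_LONG]  # short plot first, as in the edited script

if __name__ == "__main__":
    print(first_plot(PAGE, STOCK_ORDER))    # Long, spoiler-heavy synopsis.
    print(first_plot(PAGE, SWAPPED_ORDER))  # Short teaser.
```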
     

    fforde

    mortstar said:
    > Works perfectly, thanks!
    >
    > Your IMDb script still has the old regex. All I did was swap the order in which film plots are grabbed, so you'll need the updated expression too.

    Yeah, I noticed that. It's changed; I just haven't committed it yet. Thanks for the heads-up though, and good find. ^^
     
