Spanish Scraper FilmAffinity.com with IMDb.es bonus to get fanarts -- v2.1.0 | Page 4

Discussion in 'Moving Pictures' started by RoChess, December 28, 2009.

  1. Gixxer
    • Premium Supporter

    Gixxer Retired Team Member

    Joined:
    August 18, 2007
    Messages:
    1,383
    Likes Received:
    41
    Occupation:
    Mechanical Engineer
    Location:
    Spain
    Ratings:
    +41 / 0
    Home Country:
    Spain Spain
    RoChess, would it be possible to add a few lines of code to the scraper so that the filename is split into two parts at the first [, so that everything before the [ is used as the title and the rest is omitted?

    I currently use the noise filter to delete things such as "blueray", "spanish" and "español", but there are so many variations that it is impossible to cover them all. On the other hand, in 100% of cases everything after the [ is not useful.

    I'm not asking you to include it in your scraper release, but could you tell me how to edit the code for my personal use?

    Thanks a lot!


     
  3. Gixxer
    Filename example:

    La Cinta Blanca [DVDScreener][Spanish][2010].avi
    New York City [DVDRIP][Spanish][salkjfhsad.com].avi


    Of course the year is needed for later parts of the scraper (I think), but not for the TITLE.

    Thanks, let me know if you need more examples or information.
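
The "split at the first [" idea can be sketched outside the scraper like this. This is only an illustration in Python, not the scraper's own code; the function name `title_before_bracket` is made up for the example.

```python
def title_before_bracket(filename: str) -> str:
    """Return the part of the filename before the first '[', trimmed."""
    # partition() returns the whole string as the head when '[' is absent
    head, _, _ = filename.partition("[")
    return head.strip()


examples = [
    "La Cinta Blanca [DVDScreener][Spanish][2010].avi",
    "New York City [DVDRIP][Spanish][salkjfhsad.com].avi",
]
for name in examples:
    print(title_before_bracket(name))
```

For the two filenames above this prints "La Cinta Blanca" and "New York City". Note that the year is thrown away together with the other tags, so a later step that needs it would have to read it from the original filename.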
     
  4. RoChess
    • Premium Supporter

    RoChess Extension Developer

    Joined:
    March 10, 2006
    Messages:
    4,182
    Likes Received:
    1,304
    Ratings:
    +1,683 / 2
    The Noise filter addition you are after = \s?\[\D+\]
     
  5. Gixxer
    Where should I insert that in the noise filter? Beginning, middle, or end? Thanks for helping.
     
  6. RoChess
    I would put it at the beginning, because it matches your file naming scheme best, so you put:

    Code (Text):
    \s?\[\D+\]|
    Before the existing string. The | is important because you want to keep this RegExp separate from the rest.
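
A rough way to picture "before the existing string" is plain string concatenation. The sketch below uses Python's `re` module rather than the engine the plugin runs on, and `EXISTING` is a shortened, hypothetical stand-in for the real default filter, so treat it as an illustration only.

```python
import re

# The new rule from this thread
NEW_RULE = r"\s?\[\D+\]"

# Shortened, hypothetical stand-in for the existing noise filter
EXISTING = r"(([\(\{\[]|\b)(dvdrip|xvid|ac3)([\]\)\}]|\b))"

# "Before" = new rule, then |, then the untouched existing string
combined = NEW_RULE + "|" + EXISTING
noise = re.compile(combined, re.IGNORECASE)

sample = "New York City [DVDRIP][Spanish][salkjfhsad.com] xvid"
cleaned = noise.sub("", sample).strip()
print(cleaned)
```

With this stand-in filter the sample is reduced to "New York City": the bracketed tags are removed by the new rule and the trailing "xvid" by the existing alternative.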
     
  7. Gixxer
    Great, thanks. I will try it this afternoon and also try to understand what it means. I will report back.
     
  8. Gixxer
    This is the default noise filter:

    (([\(\{\[]|\b)((576|720|1080)[pi]|dir(ectors )?cut|dvd([r59]|rip|scr(eener)?)|(avc)?hd|wmv|Spanish|español|ntsc|pal|mpeg|dsr|r[1-5]|bd[59]|dts|ac3|blu(-)?ray|[hp]dtv|stv|hddvd|xvid|divx|x264|dxva|(?-i)FEST[Ii]VAL|L[iI]M[iI]TED|[WF]S|PROPER|REPACK|RER[Ii]P|REAL|RETA[Ii]L|EXTENDED|REMASTERED|UNRATED|CHRONO|THEATR[Ii]CAL|DC|SE|UNCUT|[Ii]NTERNAL|[DS]UBBED)([\]\)\}]|\b)(-[^\s]+$)?)

    Should it end up like this?

    (\s?\[\D+\]|([\(\{\[]|\b)((576|720|1080)[pi]|dir(ectors )?cut|dvd([r59]|rip|scr(eener)?)|(avc)?hd|wmv|Spanish|español|ntsc|pal|mpeg|dsr|r[1-5]|bd[59]|dts|ac3|blu(-)?ray|[hp]dtv|stv|hddvd|xvid|divx|x264|dxva|(?-i)FEST[Ii]VAL|L[iI]M[iI]TED|[WF]S|PROPER|REPACK|RER[Ii]P|REAL|RETA[Ii]L|EXTENDED|REMASTERED|UNRATED|CHRONO|THEATR[Ii]CAL|DC|SE|UNCUT|[Ii]NTERNAL|[DS]UBBED)([\]\)\}]|\b)(-[^\s]+$)?)
     
  9. RoChess
    No, because that is not before the existing filter but inside it. "Before" means exactly that: the new expression comes first, ahead of everything in the existing string.

    As for the meaning:

    \s = a whitespace character (such as a space)
    ? = the previous RegExp is optional
    \[ = look for the '[' character; the \ is needed to escape it, because [ is used for Regular Expression definitions
    \D = find a character that is not a digit (as in 0-9)
    + = repeat the previous RegExp one or more times, in this case grab all the non-digit characters until the closing ']' is found
    \] = look for the ']' character.

    So the result is that it will capture " [blabla]" and "[blabla]", but if it finds "[0000]" or even "[blabla0]" it will skip it. This way your "[2010]" and "[CD1]" filename stuff is not eliminated and can be used for proper identification.
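
The behaviour described above can be checked with a small script. This is a sketch using Python's `re` module, which treats `\s`, `\D`, `?` and `+` the same way for these inputs; it is not the plugin's own matching code.

```python
import re

bracket_noise = re.compile(r"\s?\[\D+\]")

samples = [" [blabla]", "[blabla]", "[0000]", "[blabla0]", "[2010]", "[CD1]"]
for text in samples:
    # sub() deletes every match; unmatched text is returned unchanged
    removed = bracket_noise.sub("", text)
    print(repr(text), "->", repr(removed))

# Adjacent tags are removed together, because \D also matches ']' and '['
title = bracket_noise.sub("", "La Cinta Blanca [DVDScreener][Spanish][2010]")
print(repr(title))
```

The first two samples become empty strings and the other four are left untouched. The last line prints 'La Cinta Blanca[2010]': the two text tags are removed in a single match and the year tag survives.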
     
  10. Gixxer
    Oops, sorry. I just tried:

    \s?\[\D+\]|(([\(\{\[]|\b)((576|720|1080)[pi]|dir(ectors )?cut|dvd([r59]|rip|scr(eener)?)|(avc)?hd|wmv|ntsc|pal|mpeg|dsr|r[1-5]|bd[59]|dts|ac3|blu(-)?ray|[hp]dtv|stv|hddvd|xvid|divx|x264|dxva|(?-i)FEST[Ii]VAL|L[iI]M[iI]TED|[WF]S|PROPER|REPACK|RER[Ii]P|REAL|RETA[Ii]L|EXTENDED|REMASTERED|UNRATED|CHRONO|THEATR[Ii]CAL|DC|SE|UNCUT|[Ii]NTERNAL|[DS]UBBED)([\]\)\}]|\b)(-[^\s]+$)?)

    but it is not working for this filename:
    Pandorum [DVDRIP][Spanish AC3 5.1][2010][newpct.com].avi

    When I hit the binoculars button, the text box shows:
    Pandorum[Spanish 5 1]

    Am I doing anything wrong?

    Thanks again.
     
  11. Gixxer
    Analyzing your explanation, I understand that the 5.1 is causing the issue, because it is treated as "maybe a year".

    Am I right? Is there any way of keeping the digits only if they come in the form of 4 consecutive digits?
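
One possible way to do that, sketched in Python and not tested inside the plugin: use a negative lookahead so that a bracketed tag is removed unless it consists of exactly four digits. The pattern `keep_year` below is a hypothetical suggestion, not something posted in this thread, and it assumes the file extension has already been stripped.

```python
import re

# Rule from earlier in the thread: skips any tag that contains a digit
original = re.compile(r"\s?\[\D+\]")

# Hypothetical alternative: drop any [...] tag unless it is exactly four digits
keep_year = re.compile(r"\s?\[(?!\d{4}\])[^\]]*\]")

name = "Pandorum [DVDRIP][Spanish AC3 5.1][2010][newpct.com]"

with_original = original.sub("", name)
with_keep_year = keep_year.sub("", name)
print(with_original)
print(with_keep_year)
```

The original rule leaves "Pandorum[Spanish AC3 5.1][2010]", because the digits in "AC3 5.1" stop `\D+` from reaching the closing bracket. The alternative leaves "Pandorum[2010]". The trade-off is that tags such as "[CD1]" are removed as well, so it only fits if nothing but the year needs to be kept.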
     