Spanish Scraper FilmAffinity.com with IMDb.es bonus to get fanarts -- v2.1.0

Gixxer

Retired Team Member
  • Premium Supporter
  • August 18, 2007
    1,383
    41
    35
    Spain
    RoChess, is it at all possible to include some lines of code in the scraper so that the filename is split into 2 parts at the first "["?

    That way, everything before the "[" would be used as the title and the rest omitted.

    I now use the noise filter to delete things such as "blueray", "spanish", "español", but there are so many variations that it's impossible. On the other hand, in 100% of the cases everything after the "[" is not useful.

    I'm not asking you to include it in your scraper release, but could you tell me how to edit the code for my personal use?

    Thanks a lot!!!
     

    Gixxer

    Filename examples:

    La Cinta Blanca [DVDScreener][Spanish][2010].avi
    New York City [DVDRIP][Spanish][salkjfhsad.com].avi


    Of course, the year is needed by later parts of the scraper (I think), but not for the TITLE.

    Thanks, let me know if you need more examples or information.
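For illustration only, the split requested here (keep everything before the first "[" as the title) can be sketched in Python. This is not the scraper's code: `title_from_filename` is a hypothetical helper, and it assumes the file extension should be dropped first. As the post itself notes, this approach also throws away the year, which later parts of the scraper may need.

```python
import os


def title_from_filename(filename: str) -> str:
    """Hypothetical helper: the title is everything before the first '['."""
    # Drop the file extension (".avi" in the examples above).
    stem, _ext = os.path.splitext(filename)
    # Split at the first "[" only, keep the left part, trim trailing spaces.
    return stem.split("[", 1)[0].strip()


for name in [
    "La Cinta Blanca [DVDScreener][Spanish][2010].avi",
    "New York City [DVDRIP][Spanish][salkjfhsad.com].avi",
]:
    print(title_from_filename(name))
# La Cinta Blanca
# New York City
```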
     

    RoChess

    Extension Developer
  • Premium Supporter
  • March 10, 2006
    4,342
    1,824
    • Thread starter
    • Moderator
    • #33
    The noise filter addition you are after is: \s?\[\D+\]
     

    Gixxer

    Where should I insert that in the noise filter? Beginning, middle, end? Thanks for helping.
     

    RoChess

    I would put it at the beginning, because it matches your file-naming scheme best, so you put:

    Code:
    \s?\[\D+\]|
    before the existing string. The | is important because it keeps this RegExp separate from the rest of the filter.
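To see why the | matters, here is a rough Python sketch, with Python's `re` module standing in for the scraper's regex engine (an assumption). `DEFAULT_EXCERPT` is a hypothetical, heavily shortened stand-in for the default filter that keeps only three keywords, and the filename is made up for the example.

```python
import re

# Addition from this thread: a bracket group with no digits inside.
NEW = r"\s?\[\D+\]"
# Hypothetical, heavily shortened stand-in for the default noise filter
# (the real one lists many more keywords).
DEFAULT_EXCERPT = r"(([\(\{\[]|\b)(xvid|divx|ac3)([\]\)\}]|\b))"

name = "New York City [DVDRIP][Spanish] XviD.avi"  # made-up example

# With "|": either alternative may match on its own, so both the
# bracket groups and the keyword are removed.
with_pipe = re.sub(NEW + "|" + DEFAULT_EXCERPT, "", name, flags=re.IGNORECASE)

# Without "|": the two patterns are glued into one sequence, so a match
# would need a bracket group immediately followed by a keyword.
without_pipe = re.sub(NEW + DEFAULT_EXCERPT, "", name, flags=re.IGNORECASE)

print(repr(with_pipe))     # 'New York City .avi'
print(repr(without_pipe))  # 'New York City [DVDRIP][Spanish] XviD.avi'
```

Gluing the two patterns together without the | silently matches nothing in this example, which is the kind of failure the separator avoids.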
     

    Gixxer

    Great, thanks. I will try it this afternoon and also try to understand what it means. I will report back.
     

    Gixxer

    This is the default noise filter:

    (([\(\{\[]|\b)((576|720|1080)[pi]|dir(ectors )?cut|dvd([r59]|rip|scr(eener)?)|(avc)?hd|wmv|Spanish|español|ntsc|pal|mpeg|dsr|r[1-5]|bd[59]|dts|ac3|blu(-)?ray|[hp]dtv|stv|hddvd|xvid|divx|x264|dxva|(?-i)FEST[Ii]VAL|L[iI]M[iI]TED|[WF]S|PROPER|REPACK|RER[Ii]P|REAL|RETA[Ii]L|EXTENDED|REMASTERED|UNRATED|CHRONO|THEATR[Ii]CAL|DC|SE|UNCUT|[Ii]NTERNAL|[DS]UBBED)([\]\)\}]|\b)(-[^\s]+$)?)

    Should it end up like this?

    (\s?\[\D+\]|([\(\{\[]|\b)((576|720|1080)[pi]|dir(ectors )?cut|dvd([r59]|rip|scr(eener)?)|(avc)?hd|wmv|Spanish|español|ntsc|pal|mpeg|dsr|r[1-5]|bd[59]|dts|ac3|blu(-)?ray|[hp]dtv|stv|hddvd|xvid|divx|x264|dxva|(?-i)FEST[Ii]VAL|L[iI]M[iI]TED|[WF]S|PROPER|REPACK|RER[Ii]P|REAL|RETA[Ii]L|EXTENDED|REMASTERED|UNRATED|CHRONO|THEATR[Ii]CAL|DC|SE|UNCUT|[Ii]NTERNAL|[DS]UBBED)([\]\)\}]|\b)(-[^\s]+$)?)
     

    RoChess

    No, because that is not before the existing filter, but inside it. "Before" means exactly that: at the start, first.

    As for the meaning:

    \s = a whitespace character, such as a space
    ? = the previous RegExp is optional
    \[ = look for the '[' character; the \ is needed to escape it, because [ is used in Regular Expression definitions
    \D = find a character that is not a digit (0-9)
    + = repeat the previous RegExp one or more times; here, grab as many non-digit characters as possible, then back off until a ']' can follow
    \] = look for the ']' character

    So the result is that it will capture " [blabla]" and "[blabla]", but if it finds "[0000]" or even "[blabla0]" it will skip it. This way your "[2010]" and "[CD1]" filename parts are not eliminated and can be used for proper identification.
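A quick way to check the behaviour described above is to run the same pattern through Python's `re` module, assuming it treats `\s`, `\D`, `?` and `+` the same way as the scraper's regex engine. `strip_bracket_noise` is a hypothetical helper, and the third filename is a made-up example for the "[CD1]" case.

```python
import re

# The noise-filter addition from this thread.
BRACKET_NO_DIGITS = re.compile(r"\s?\[\D+\]")


def strip_bracket_noise(filename: str) -> str:
    """Hypothetical helper: delete every match, as a noise filter would."""
    return BRACKET_NO_DIGITS.sub("", filename)


examples = [
    "La Cinta Blanca [DVDScreener][Spanish][2010].avi",
    "New York City [DVDRIP][Spanish][salkjfhsad.com].avi",
    "La Cinta Blanca [DVDRIP][CD1].avi",  # made-up "[CD1]" example
]
for name in examples:
    print(strip_bracket_noise(name))
# La Cinta Blanca[2010].avi
# New York City.avi
# La Cinta Blanca[CD1].avi
```

One detail worth knowing: `\D` also matches "[" and "]", so with the greedy `+` a single match can swallow several adjacent groups (in the first example, " [DVDScreener][Spanish]" goes in one match). For these filenames the end result is the same.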
     

    Gixxer

    Oops, sorry. I just tried:

    \s?\[\D+\]|(([\(\{\[]|\b)((576|720|1080)[pi]|dir(ectors )?cut|dvd([r59]|rip|scr(eener)?)|(avc)?hd|wmv|ntsc|pal|mpeg|dsr|r[1-5]|bd[59]|dts|ac3|blu(-)?ray|[hp]dtv|stv|hddvd|xvid|divx|x264|dxva|(?-i)FEST[Ii]VAL|L[iI]M[iI]TED|[WF]S|PROPER|REPACK|RER[Ii]P|REAL|RETA[Ii]L|EXTENDED|REMASTERED|UNRATED|CHRONO|THEATR[Ii]CAL|DC|SE|UNCUT|[Ii]NTERNAL|[DS]UBBED)([\]\)\}]|\b)(-[^\s]+$)?)

    but it is not working with this filename:
    Pandorum [DVDRIP][Spanish AC3 5.1][2010][newpct.com].avi

    When I hit the binoculars button, the text box shows:
    Pandorum[Spanish 5 1]

    Is there anything I'm doing wrong?

    Thanks again.
     

    Gixxer

    Analyzing your explanation, I understand that the 5.1 is causing an issue because it is identified as "maybe a year".

    Am I right? Is there any way to keep the digits only when they come as 4 consecutive digits?
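The Python sketch below, using the Pandorum filename from the earlier post, suggests the cause is broader than a year check: `\D` rejects every digit, so the "3" in "AC3" alone is enough to stop "[Spanish AC3 5.1]" from matching. The second pattern is a hypothetical variation, not something proposed in the thread: it removes every bracket group except one made of exactly four digits, assuming the scraper's regex engine supports negative lookahead `(?!...)`. `strip_noise` is a hypothetical helper and "Movie [DVDRIP][CD1].avi" is a made-up filename.

```python
import re

PANDORUM = "Pandorum [DVDRIP][Spanish AC3 5.1][2010][newpct.com].avi"

# Pattern from the thread: only bracket groups with no digits at all.
NO_DIGITS = r"\s?\[\D+\]"

# Hypothetical variation: any bracket group, unless its whole content
# is exactly four digits (e.g. a year such as [2010]).
KEEP_FOUR_DIGITS = r"\s?\[(?!\d{4}\])[^\]]*\]"


def strip_noise(pattern: str, filename: str) -> str:
    """Hypothetical helper: delete every match of pattern from filename."""
    return re.sub(pattern, "", filename)


print(strip_noise(NO_DIGITS, PANDORUM))
# Pandorum[Spanish AC3 5.1][2010].avi   ("3" and "5.1" block the match)
print(strip_noise(KEEP_FOUR_DIGITS, PANDORUM))
# Pandorum[2010].avi
print(strip_noise(KEEP_FOUR_DIGITS, "Movie [DVDRIP][CD1].avi"))
# Movie.avi   (note: "[CD1]" is removed too)
```

The trade-off is that this variation no longer spares groups like "[CD1]", which the original pattern keeps on purpose, so it is worth testing in the scraper before relying on it.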
     
