Noise Filter Exchange

RoChess

Extension Developer
  • Premium Supporter
  • March 10, 2006
    4,434
    1,897
    Some users have unusual filenames that lower the Moving Pictures plugin's auto-approval rate. To fix this, you can use the 'Noise Filter' option to clean up your filenames before they are passed on to the scrapers, such as the one for imdb.com. You can find it under the "About" tab -> "Advanced Settings" button; then locate the 'Noise Filter' entry.

    The filter I use = \s-\s\d+x\d+.+|\(Director\'s\sCut\)|\(Live\)|\(Unrated\)|\(Extended\)|\(Screener\)

    This takes care of filenames such as: "Elephants Dream [2006] - 1920x1080 S-MPEG 4.2 @ 448Kbps AC3.avi"

    But it also takes care of filenames such as "THX 1138 (Director's Cut) [2004].avi"

    The Noise Filter is a regular expression. I'm no master of them, but I'll try to explain the one I use:

    | = separates multiple patterns
    \s = whitespace character (such as a space)
    \d = a single decimal digit
    \d+ = one or more decimal digits
    \' = '
    \( = since '(' has a special meaning in an expression, you need to use '\(' when you want to match a literal '('

    So the "\(Director\'s\sCut\)" part simply means: locate "(Director's Cut)" and ignore it. I could have used a single "\(.+\)" to match and ignore all "(.....)" parts, but I also put foreign-language movie title translations between parentheses, and those are sometimes needed to match the right title.

    The more complex "\s-\s\d+x\d+.+" looks for "(space)-(space)(digits)x(digits)....." where the .+ means that any character is valid and there has to be at least one. You can also use .*, which matches any characters, even if there are none. Taking the example filename, this turns "Elephants Dream [2006] - 1920x1080 S-MPEG 4.2 @ 448Kbps AC3.avi" into "Elephants Dream [2006]" by stripping out the part that matches, which results in an auto-approved match via the imdb.com scraper.
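    For anyone who wants to try the filter outside the plugin, here is a minimal Python sketch of the stripping described above. It is not the plugin's code: the helper name clean_filename is made up, and it assumes that the file extension is dropped before the filter runs, that matching is case insensitive, and that leftover double spaces get collapsed. The plugin uses .NET regular expressions, which should behave the same as Python's for this particular pattern.

```python
import os
import re

# The filter from the post, copied verbatim.
NOISE_FILTER = r"\s-\s\d+x\d+.+|\(Director\'s\sCut\)|\(Live\)|\(Unrated\)|\(Extended\)|\(Screener\)"


def clean_filename(filename: str) -> str:
    """Drop the extension, remove every noise match, then tidy the spacing."""
    # Assumption: the extension is removed before the filter is applied.
    name, _extension = os.path.splitext(filename)
    # Assumption: the filter is applied case insensitively.
    name = re.sub(NOISE_FILTER, "", name, flags=re.IGNORECASE)
    # Assumption: runs of whitespace left behind are collapsed to one space.
    return " ".join(name.split())


print(clean_filename("Elephants Dream [2006] - 1920x1080 S-MPEG 4.2 @ 448Kbps AC3.avi"))
print(clean_filename("THX 1138 (Director's Cut) [2004].avi"))
```

    With these assumptions the two example filenames come out as "Elephants Dream [2006]" and "THX 1138 [2004]".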

    For more explanations of regular expressions, as well as the ability to test your creation, use: RegExr: Online Regular Expression Testing Tool. If you paste a listing of your filenames into the big box (you could use "dir /b/ogn > dirlist.txt" to produce one, for example), you can create and test your own 'Noise Filter' easily.
     

    RoChess

    • Thread starter
    • Moderator
    • #3
    Yes, you should be able to simply add the noise filter I posted to the end of the existing one by separating them with a |, so you get: existing|new
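    As a rough illustration of the existing|new idea, the Python sketch below joins two filters with |. The EXISTING pattern is a made-up placeholder, not the plugin's actual default filter, and combine_filters is a hypothetical helper. Wrapping each side in (?:...) is an extra precaution of this sketch; since | already has the lowest precedence, a plain existing|new works too.

```python
import re


def combine_filters(existing: str, new: str) -> str:
    """Join two noise filters so that a match of either one is stripped."""
    # The non-capturing groups keep each filter self-contained.
    return f"(?:{existing})|(?:{new})"


# Hypothetical placeholder for whatever filter is already configured.
EXISTING = r"\bxvid\b"
# A shortened piece of the filter posted in this thread.
NEW = r"\(Unrated\)|\(Extended\)"

combined = combine_filters(EXISTING, NEW)
cleaned = re.sub(combined, "", "Some Movie (Unrated) xvid", flags=re.IGNORECASE)
print(" ".join(cleaned.split()))
```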

    The default one just never worked for me, and if anybody else uses their own file naming scheme that hurts their auto-approval rate, I figure they can either share their Noise Filter or request help creating a custom one that will fix things for them :)

    Obviously the default one works for the large majority, but if you end up manually approving a lot of your filenames, then a good Noise Filter might be all that you need.
     

    Guzzi

    Retired Team Member
  • Premium Supporter
  • August 20, 2007
    2,159
    750
    AW: Noise Filter Exchange

    Hi RoChess,
    I have a question: What is the best format to give to the scraper: just the movie title, or should the year also be part of the string? If the latter, is a particular format required? Should the year be like (2008) or [2008], or just at the end of the movie title? The goal is to get the best auto-approval rate, which for a lot of movies is only possible if the year is available.

    How should I build a noisefilter to get proper cleaning for movies titled like
    Moviename (uncut) (2007) (SAT1) (AC3)
    ? (Actually this is the directoryname, that is used for parsing)

    I currently use "(\((.*?)\).*$)" - but this also cuts the year, which is in the middle, from the movie name...

    Thanks for all your work,
    Guzzi
     

    RoChess

    • Thread starter
    • Moderator
    • #5
    Re: AW: Noise Filter Exchange

    There are multiple steps to the process.

    1 = The MovingPictures plugin takes the filename and, before passing it on to the scraper, tries to obtain as much information from it as possible: "Title", "Year", and "Part number". The title is the obvious one, the year assists in proper matching when multiple movies share a title (especially remakes), and the part number is used to group files together.

    2 = The scraper(s) take all this information and run a search (via the search node). By default, if one scraper finds 3 results, no other scraper is used. This behavior can be controlled via advanced settings.

    3 = The MovingPictures plugin uses the search results from one or more scrapers to decide whether the first result from the first scraper is an exact match (which is then auto-approved); otherwise it presents the user with a drop-down box listing all the options. It will also insert other info it finds that the previous step did not supply (especially with non-IMDb scrapers), such as the IMDb ID from an NFO file, if that option is enabled and an NFO file is found.

    4 = The user verifies the right one, or the auto-approved result is used.

    5 = The MovingPictures plugin then passes the correct movie back to the scraper, to the details node.

    6 = The scraper obtains all the information: summary, rating, etc.

    7 = The MovingPictures plugin gets all this info, puts it in the database, and then initiates another scraper run to get backdrops, covers, etc.

    8 = The backdrop and cover scrapers obtain all the images.

    9 = The MovingPictures plugin makes final adjustments to the database.


    So the main problem is making step #2 work as well as possible, and providing the year is very important. The year can of course be supplied through a manual search with the binocular icon, but by then it might already be too late: the wrong movie might have been auto-approved based on popularity rankings at the movie info site.

    So yes, it is crucial that you attempt to parse the year. The MovingPictures plugin doesn't care whether it is 2010, (2010), or [2010]; it looks for a 4-digit number at the end of the filename.
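    To make the "4 digit number at the end" rule concrete, here is a small Python sketch. It is only an approximation of the behavior described above, not the plugin's real parser; the names YEAR_AT_END and split_title_year are invented for the example.

```python
import re
from typing import Optional, Tuple

# A 4-digit number at the very end, optionally wrapped in () or [].
YEAR_AT_END = re.compile(r"^(?P<title>.*?)\s*[\(\[]?(?P<year>\d{4})[\)\]]?\s*$")


def split_title_year(name: str) -> Tuple[str, Optional[int]]:
    """Return (title, year); year is None when no trailing 4-digit number exists."""
    match = YEAR_AT_END.match(name)
    # Require a non-empty title so a name that is only a number stays a title.
    if match and match.group("title"):
        return match.group("title"), int(match.group("year"))
    return name.strip(), None


for example in ("Moviename 2010", "Moviename (2010)", "Moviename [2010]", "Moviename"):
    print(example, "->", split_title_year(example))
```

    Note that in this sketch a year in the middle of the name, followed by other tags, is not picked up, which is why the noise has to be removed first.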

    Taking your own example "Moviename (uncut) (2007) (SAT1) (AC3)" you would have to eliminate "(uncut)", "(SAT1)" and "(AC3)", via:

    (([\(\{\[]|\b)((576|720|1080)[pi]|[Dd]ir(ector[']?s )?[Cc]ut|dvd([r59]|rip|scr(eener)?)|(avc)?hd|wmv|ntsc|pal|mpeg|dsr|r[1-5]|bd[59]|dts|ac3|AC3|blu(-)?ray|[hp]dtv|stv|hddvd|xvid|divx|x264|dxva|(?-i)FEST[Ii]VAL|L[iI]M[iI]TED|[WF]S|PROPER|REPACK|RER[Ii]P|REAL|RETA[Ii]L|EXTENDED|REMASTERED|UNRATED|CHRONO|THEATR[Ii]CAL|DC|SE|UNCUT|uncut|[Ii]NTERNAL|[DS]UBBED|SAT\d)([\]\)\}]|\b)(-[^\s]+$)?)
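    If you want to check this filter against your own folder names before putting it in the plugin, the Python sketch below is one way to do it. It is an adaptation, not the exact expression: Python's re module does not accept a bare (?-i) in the middle of a pattern, so the case-sensitive alternatives are wrapped in a scoped (?-i:...) group instead, which should be equivalent on the assumption that the .NET inline option lasts until the end of its enclosing group. The helper name strip_noise and the whitespace cleanup are also assumptions of this sketch.

```python
import re

# Alternatives before (?-i) in the original: matched case insensitively.
CASE_INSENSITIVE_TAGS = (
    r"(576|720|1080)[pi]|[Dd]ir(ector[']?s )?[Cc]ut|dvd([r59]|rip|scr(eener)?)"
    r"|(avc)?hd|wmv|ntsc|pal|mpeg|dsr|r[1-5]|bd[59]|dts|ac3|AC3|blu(-)?ray"
    r"|[hp]dtv|stv|hddvd|xvid|divx|x264|dxva"
)

# Alternatives after (?-i) in the original: matched case sensitively.
CASE_SENSITIVE_TAGS = (
    r"FEST[Ii]VAL|L[iI]M[iI]TED|[WF]S|PROPER|REPACK|RER[Ii]P|REAL|RETA[Ii]L"
    r"|EXTENDED|REMASTERED|UNRATED|CHRONO|THEATR[Ii]CAL|DC|SE|UNCUT|uncut"
    r"|[Ii]NTERNAL|[DS]UBBED|SAT\d"
)

NOISE_FILTER = re.compile(
    r"(([\(\{\[]|\b)("
    + CASE_INSENSITIVE_TAGS
    + r"|(?-i:" + CASE_SENSITIVE_TAGS + r")"
    + r")([\]\)\}]|\b)(-[^\s]+$)?)",
    re.IGNORECASE,
)


def strip_noise(name: str) -> str:
    """Remove every noise match and collapse the whitespace left behind."""
    return " ".join(NOISE_FILTER.sub("", name).split())


print(strip_noise("Moviename (uncut) (2007) (SAT1) (AC3)"))
# Mixed-case "Real" is kept because REAL is only matched in upper case.
print(strip_noise("The Real Thing (2007)"))
```

    In this sketch the first name is reduced to "Moviename (2007)", so the year ends up at the end where it can be picked up, and the second name is left untouched.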


    The log files will show you exactly what filename the scrapers end up working with (after noise filter), so you can verify if it needs additional adjustments.
     

    armandp

    Retired Team Member
  • Premium Supporter
  • April 6, 2008
    987
    623
    Zoetermeer
    Home Country
    Netherlands
    Note: You don't need to specify both lower case and upper case in the part before (?-i). The expression is case insensitive by default. The (?-i) modifier signals that the pattern to its right should be matched case sensitively; to turn case insensitivity back on, you can use the (?i) modifier.
     

    Guzzi

    AW: Re: AW: Noise Filter Exchange

    [...]
    So yes, it is crucial you attempt to parse the year. And MovingPictures plugin doesn't care for 2010, (2010), or [2010], it looks for a 4 digit number at the end of the filename.
    [...]

    Thanks for the quick answer - "filename" would be "foldername", if selected in the plugin, right?

    Anyway, I think it's best to follow what's happening in the log to be sure - I will set up a DB with just 1 or 2 movies to check it - otherwise there is too much data going by too fast for me to follow ;-)

    Thanks for your support,
    Guzzi
     
