Possible Improvement when scanning video database

kev160967

Portal Member
November 8, 2006
16
0
57
Home Country: England
Hi,
I'd noticed that the IMDB scan was having trouble with a lot of my movies, so I took a look at what was going on. One thing I spotted was that FuzzyMatch in IMDBFetcher.cs was being asked to match files with titles like "Stuart Little 2.AVI" (hey, I have kids :) ) against strings returned from IMDB like "Stuart Little 2 (1996) (imdb)". This was screwing up the matching results. It seemed daft to include the (imdb) in the match, and also I rarely had a year in my file names, so that threw things off as well. I added the following function and import:

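using System.Text.RegularExpressions;

// Strips the trailing "(imdb)" tag and a trailing bracketed year, e.g. "(1996)".
private string StripNameAndIMDB(string title)
{
    title = title.Trim();
    if (title.EndsWith("(imdb)"))
        title = title.Substring(0, title.LastIndexOf("(imdb)")).Trim();
    // Remove a four-digit year in parentheses at the end of the title.
    Regex bracketedYear = new Regex(@"\(\d{4}\)$");
    // Trim again so no trailing space is left where the year used to be.
    return bracketedYear.Replace(title, "").Trim();
}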

and used it to preprocess the strings being passed into Levenshtein.Match. I got a huge improvement in accuracy.
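
To illustrate why the extra suffix hurts, here is a minimal standalone sketch rather than the actual MediaPortal code. The `EditDistance` helper is a plain textbook Levenshtein distance standing in for `Levenshtein.Match`, whose real signature and scoring are not shown in this thread. It also assumes the file extension has already been removed from the filename, which is an assumption on my part.

```csharp
using System;
using System.Text.RegularExpressions;

static class StripDemo
{
    // Same idea as the routine above: drop "(imdb)" and a trailing "(yyyy)".
    static string Strip(string title)
    {
        title = title.Trim();
        if (title.EndsWith("(imdb)"))
            title = title.Substring(0, title.LastIndexOf("(imdb)")).Trim();
        return Regex.Replace(title, @"\(\d{4}\)$", "").Trim();
    }

    // Plain edit distance; a hypothetical stand-in for Levenshtein.Match.
    static int EditDistance(string a, string b)
    {
        int[,] d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;
        for (int i = 1; i <= a.Length; i++)
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                                   d[i - 1, j - 1] + cost);
            }
        return d[a.Length, b.Length];
    }

    static void Main()
    {
        string file = "Stuart Little 2";               // extension assumed already removed
        string raw = "Stuart Little 2 (1996) (imdb)";  // example string from the post
        Console.WriteLine(EditDistance(file, raw));         // 14
        Console.WriteLine(EditDistance(file, Strip(raw)));  // 0
    }
}
```

The 14 extra characters of " (1996) (imdb)" count against every candidate equally, so for short titles they can easily swamp the real differences between candidates.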

As it stands, this throws away the year even if it is included in the filename being matched, so it could do with a tweak to only lose the year if the filename doesn't include one.

Any comments?
 

kev160967

Hmm, I notice on further playing with the sources that the year and the (imdb) tag are added deliberately, rather than coming back from IMDB. Anyone know why? I can see that the year might appear in the file name (though if it doesn't it's better removed from the movie being tested), but the "(imdb)" surely can't do anything but confuse the matching process?

I've made a few changes to the routine:

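// title:  the string to clean (an IMDB result or a filename)
// search: the filename being matched, or "" when cleaning the filename itself
private string StripNameAndIMDB(string title, string search)
{
    title = title.Trim();
    if (title.EndsWith("(imdb)"))
        title = title.Substring(0, title.LastIndexOf("(imdb)")).Trim();
    // A bracketed four-digit year, either at the end or followed by any character,
    // so that "Movie Name (2005).CD1.AVI" is recognised as containing a year.
    Regex bracketedYear = new Regex(@"\(\d{4}\)($|.)");
    // Drop the year only when a filename was supplied and it has no year of its own.
    if (search.Trim().Length != 0 && !bracketedYear.IsMatch(search))
        title = bracketedYear.Replace(title, "");
    return title.Trim();
}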

I now pass the search string (the filename) in as well, so the routine can check whether the filename has a year in it and, if it does, leave the year in the comparison string. When I'm stripping the filename itself, I pass an empty string for search, which leaves any year in place. I also needed to change the regex: the original pattern only matched a year at the very end of the string, so it wasn't spotting the year in things like "Movie Name (2005).CD1.AVI". As a result the year was being removed from the IMDB result but left in the filename.
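
The three cases can be sketched in a small standalone program. This is an illustration under assumptions, not the code as it sits in IMDBFetcher.cs: the `Strip` name, the "Movie Name (2005) (imdb)" string, and the simplified pattern without the trailing `($|.)` group are all made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

static class StripWithSearchDemo
{
    // Simplified pattern (an assumption): a bracketed four-digit year anywhere.
    static readonly Regex BracketedYear = new Regex(@"\(\d{4}\)");

    static string Strip(string title, string search)
    {
        title = title.Trim();
        if (title.EndsWith("(imdb)"))
            title = title.Substring(0, title.LastIndexOf("(imdb)")).Trim();
        // Remove the year only if a filename was given and it contains no year.
        if (search.Trim().Length != 0 && !BracketedYear.IsMatch(search))
            title = BracketedYear.Replace(title, "");
        return title.Trim();
    }

    static void Main()
    {
        string imdb = "Movie Name (2005) (imdb)";  // hypothetical candidate title

        // Filename without a year: the year is dropped from the candidate.
        Console.WriteLine(Strip(imdb, "Movie Name.AVI"));            // Movie Name

        // Filename with a year: the year stays in the candidate.
        Console.WriteLine(Strip(imdb, "Movie Name (2005).CD1.AVI")); // Movie Name (2005)

        // Cleaning the filename itself: empty search, so nothing is removed.
        Console.WriteLine(Strip("Movie Name (2005).CD1.AVI", ""));   // Movie Name (2005).CD1.AVI
    }
}
```

Either both strings keep their year or neither does, so the year never counts as a difference on its own.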

Kev
 

Topher5000

Portal Pro
April 5, 2006
179
0
Hi kev.
I too have problems doing IMDB scans. Most movies show up, some don't. I have to do a manual search for only one word in the title and then pick the appropriate one from the results. Of course, this doesn't work if I scan the folder. This is one reason I haven't upgraded from 0.2.2.0 yet.
How do I get your routine to work? I'm no programmer, but I'd like to try it.
 

kev160967

Hi,
I hadn't really thought about getting this out in the wild yet, but you'd certainly be welcome to try it. I'll have to set something up, as I'm also making some changes to music scanning, which are only partially complete. Shouldn't be a big deal - I'll get back to you later this evening when I've had a chance to take a look.

Kevin
 
