Hi,
I'd noticed that the IMDB scan was having trouble with a lot of my movies, so I took a look at what was going on. One thing I spotted was that FuzzyMatch in IMDBFetcher.cs was being asked to match files with titles like "Stuart Little 2.AVI" (hey, I have kids) against strings returned from IMDB like "Stuart Little 2 (1996) (imdb)". This was screwing up the matching results. It seemed daft to include "(imdb)" in the match, and since I rarely have a year in my file names, the year threw things off as well. I added the following function and import:
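using System.Text.RegularExpressions;
// Strips a trailing "(imdb)" marker and a trailing bracketed year such as "(1996)"
// from a title returned by IMDB, so it can be fuzzy-matched against a file name.
private string StripNameAndIMDB(string title)
{
    title = title.Trim();
    if (title.EndsWith("(imdb)"))
        title = title.Substring(0, title.LastIndexOf("(imdb)")).TrimEnd();
    // Matches a four-digit year in brackets at the very end of the string.
    Regex bracketedYear = new Regex(@"[(]\d{4}[)]$");
    title = bracketedYear.Replace(title, "");
    // Trim again so "Stuart Little 2 (1996)" becomes "Stuart Little 2", not "Stuart Little 2 ".
    return title.Trim();
}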
and used it to preprocess the strings being passed into Levenshtein.Match. I got a huge improvement in accuracy.
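To show why the stripping helps, here is a standalone sketch that measures plain edit distance before and after. The `Levenshtein` method is a textbook dynamic-programming implementation written just for illustration; it is not the project's `Levenshtein.Match`, and I'm not assuming anything about that method's signature. It also assumes the file extension (".AVI") has already been removed from the file name.

```csharp
using System;

static class FuzzyMatchDemo
{
    // Textbook edit distance: the minimum number of single-character
    // insertions, deletions and substitutions needed to turn a into b.
    public static int Levenshtein(string a, string b)
    {
        var d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(
                    Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1), // deletion, insertion
                    d[i - 1, j - 1] + cost);                     // substitution or match
            }
        }
        return d[a.Length, b.Length];
    }

    static void Main()
    {
        string fromFile = "Stuart Little 2";                  // file name, extension already removed
        string rawImdb = "Stuart Little 2 (1996) (imdb)";     // title as returned by IMDB
        string strippedImdb = "Stuart Little 2";              // what StripNameAndIMDB returns for it

        Console.WriteLine(Levenshtein(fromFile, rawImdb));      // 14
        Console.WriteLine(Levenshtein(fromFile, strippedImdb)); // 0
    }
}
```

The 14 extra characters of " (1996) (imdb)" all count as edits, so a correct title can easily lose out to a shorter wrong one before stripping.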
As it stands, this throws away the year even if it is included in the filename being matched, so it could do with a tweak to only lose the year if the filename doesn't include one.
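One possible shape for that tweak, as a sketch only: `TitleCleaner` and `PrepareImdbTitle` are hypothetical names, and detecting a year as any standalone 19xx/20xx number in the file name is an assumption (a title like "2001 A Space Odyssey" would be treated as already having a year).

```csharp
using System;
using System.Text.RegularExpressions;

static class TitleCleaner
{
    // A bracketed four-digit year at the end of an IMDB title, e.g. "(1996)".
    static readonly Regex TrailingYear = new Regex(@"\(\d{4}\)$");

    // Heuristic (assumption): any standalone 19xx or 20xx number in the file name counts as a year.
    static readonly Regex YearInFileName = new Regex(@"\b(19|20)\d{2}\b");

    public static string PrepareImdbTitle(string imdbTitle, string fileName)
    {
        string title = imdbTitle.Trim();

        // Always drop the "(imdb)" suffix (case-insensitively); it never belongs in the match.
        if (title.EndsWith("(imdb)", StringComparison.OrdinalIgnoreCase))
            title = title.Substring(0, title.Length - "(imdb)".Length).TrimEnd();

        // Only drop the year when the file name has no year to compare it with.
        if (!YearInFileName.IsMatch(fileName))
            title = TrailingYear.Replace(title, "").TrimEnd();

        return title;
    }

    static void Main()
    {
        Console.WriteLine(PrepareImdbTitle("Stuart Little 2 (1996) (imdb)", "Stuart Little 2.AVI"));        // Stuart Little 2
        Console.WriteLine(PrepareImdbTitle("Stuart Little 2 (1996) (imdb)", "Stuart Little 2 (1996).AVI")); // Stuart Little 2 (1996)
    }
}
```

Keeping the year when the file name has one means it still contributes to the match, which should help separate remakes that share a title.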
Any comments?