- March 10, 2006
Analyzing your explanation, I understand that the 5.1 is causing an issue because it gets identified as "maybe a year".
Am I right? Is there any way of keeping the digits only if they come in the form of 4 consecutive digits?
Just add more RegExp noise filters.
\s?\[Spanish.+?\]|\s?\[\D+\]|......
The "\s?\[Spanish.+?\]" part is more vigorous.
\s = whitespace character (a space, but also a tab etc.)
? = previous RegExp is optional
\[ = look for '[' character, the \ is needed to escape it, because an unescaped [ starts a character class in a Regular Expression
Spanish = look for 'Spanish'
. = match any character (normally anything except a line break)
+? = match the previous RegExp one or more times, but as few times as possible, so it stops at the first match of the next character
\] = look for ']' character
So it will grab " [Spanish.......]" and "[Spanish.......]" not caring at all what follows Spanish. But since the [ and ] chars have to be around it, it will not harm a movie title such as "The Spanish Prisoner".
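To see this in action, here is a rough Python sketch. It only uses the two alternatives quoted above (the "......" part of the full filter is left out), and the helper name `strip_noise` and the sample titles are made up for illustration, so treat it as a quick test rather than the actual plugin behaviour.

```python
import re

# Only the two alternatives shown in the post; the elided "......" rest
# of the noise filter is intentionally not reproduced here.
NOISE = re.compile(r"\s?\[Spanish.+?\]|\s?\[\D+\]")

def strip_noise(title: str) -> str:
    """Hypothetical helper: remove every noise match from a title."""
    return NOISE.sub("", title)

# Made-up sample titles, purely for illustration.
samples = [
    "Some Movie [Spanish DD 5.1]",    # bracketed tag starting with Spanish
    "The Spanish Prisoner [DVDRip]",  # 'Spanish' outside of brackets
    "Some Movie [1997]",              # bracketed digits only
]

for sample in samples:
    print(repr(sample), "->", repr(strip_noise(sample)))
```

The first title loses the whole " [Spanish DD 5.1]" tag, digits included, so the 5.1 can no longer be mistaken for a year. The second keeps "Spanish" in the title and only loses " [DVDRip]" through the \[\D+\] alternative. The third is left untouched, because \D+ needs at least one non-digit character inside the brackets.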