Hi RoChess,
There might be a way to force IMDB to reply in a consistent, language-independent way: set the Accept-Language field in the HTTP header of the request to IMDB.
You can easily test this in IE: go to Settings -> Internet Options -> General tab -> Languages and set the language preferences to receive pages in a specific language if they exist on the server. If set to English-US, IE sends that language in the Accept-Language header field, which makes IMDB reply with English titles instead of guessing the user's location from the IP address and translating the titles into a local language.
I tried it with
http://akas.imdb.com/find?s=tt&q=Scent of a Woman and
http://akas.imdb.com/title/tt0105323/releaseinfo#akas
and it works like a charm! I'm located in Croatia, and when I remove all language preferences I get Croatian translations of movie titles from IMDB. But with English requested, the magic happens: the titles come back in English.
If there is a way for you to change the headers of your HTTP requests to akas.imdb.com from within the scraper, you can force the page to be retrieved in English. That greatly reduces the guesswork on your side and eliminates the need for a lot of parsing and magic - at least everyone around the globe will receive the exact same response as you do in the US. Unfortunately, it does not work the other way - if I set Accept-Language to, say, de, I still get the page with Croatian titles (and not German ones, even though the server says Content-Language: de in response to my request).
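To illustrate the idea, here is a minimal sketch in Python using only the standard urllib module. It is not the scraper's actual code: the helper names build_search_request and fetch are made up for this example, the header value "en-US,en;q=0.8" is just one reasonable choice, and whether the scraper lets you set headers like this is an assumption.

```python
import urllib.parse
import urllib.request

# Host and query parameters are taken from the search URL quoted above.
BASE_URL = "http://akas.imdb.com/find"


def build_search_request(title, accept_language="en-US,en;q=0.8"):
    """Build (but do not send) a title search that asks for English pages.

    Hypothetical helper: the default header value prefers US English and
    falls back to any other English variant.
    """
    # urlencode escapes the spaces in the title ("Scent+of+a+Woman").
    query = urllib.parse.urlencode({"s": "tt", "q": title})
    return urllib.request.Request(
        f"{BASE_URL}?{query}",
        headers={"Accept-Language": accept_language},
    )


def fetch(request, timeout=10):
    """Send the request and return the raw response body as bytes."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    req = build_search_request("Scent of a Woman")
    # Inspect what would be sent without touching the network.
    print(req.full_url)
    print(req.header_items())
```

Calling fetch(req) would actually send the request; comparing the result with and without the header should show whether the server honours it for your location.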
Hope this helps a bit - lately I keep getting messed-up (translated) titles no matter what setting I use.