Original title

Lightning303

MP Donator
  • Premium Supporter
  • September 12, 2009
    Home Country: Germany
    After your post, I felt challenged to try my luck and see whether I could get it to work.

    As I see it, at least on IMDb Germany, the page always shows me the German title. If there is no special German title, it shows me the original title, which, as I understand it, is the title the filmmakers gave the movie. Whenever the page shows the German title instead of the original one, the original title is the first entry in the AKAs section of the releaseinfo page.
    As I said, I wanted to try my luck, so I wrote a little PHP script to get the original title. In my tests it always seems to work. Of course, I don't know whether it will work for somebody in the US (maybe you can test?) or in other countries.

    Here is my PHP script:
    PHP:
    <?php
    	/*
    	Proof of concept to get the original title based on the IMDb ID.
    	Only tested from Germany. It is possible that this does not work in other countries.
    	
    	Lightning303, 06 May 2013
    	*/
    
    	// Path to the Moving Pictures database; change this to your own location
    	$MovingPicturesDB = "H:/movingpictures.db3";
    	$MovingPicturesDB = "sqlite:".$MovingPicturesDB;
    	try {
    		$dbh = new PDO($MovingPicturesDB);
    	}
    	catch(PDOException $e){
    		// Without a database connection there is nothing to do, so stop here
    		exit($e->getMessage());
    	}
    	$Start = 0;
    	$Limit = 25;
    	// LIMIT already caps the number of rows, so no extra counter is needed
    	$MovPicDB = $dbh->query("SELECT id, imdb_id, title FROM movie_info ORDER BY id LIMIT ".(int)$Start.", ".(int)$Limit);
    	// Markers in the IMDb title page HTML; strlen() of each marker gives the offset to skip
    	$TitleMarker = "<span class=\"itemprop\" itemprop=\"name\">";
    	$OriginalMarker = "<span class=\"title-extra\" itemprop=\"name\">";
    	$OriginalEnd = "<i>(original title)</i>";
    	while($MovPicDBData = $MovPicDB->fetch(PDO::FETCH_ASSOC)) {
    		// Title in MP DB and IMDb ID
    		echo $MovPicDBData["title"]." -> ".$MovPicDBData["imdb_id"]." -> ";
    		// Getting information from the IMDb page
    		$IMDBData = file_get_contents("http://www.imdb.com/title/".$MovPicDBData["imdb_id"]);
    		// Searching for the position of the title (false if the page could not be loaded)
    		$TitlePos = ($IMDBData === false) ? false : strpos($IMDBData, $TitleMarker);
    		if ($TitlePos === false) {
    			echo "(title not found)<br>\n";
    			continue;
    		}
    		// Cutting off everything before the first title
    		$TitleTemp = substr($IMDBData, $TitlePos + strlen($TitleMarker), 500);
    		// Separating the title (everything up to the closing </span>)
    		$TitleToUse = trim(substr($TitleTemp, 0, strpos($TitleTemp, "</span>")));
    		// Searching for the original title and its position
    		$OriginalTitlePos = strpos($TitleTemp, $OriginalMarker);
    		// Compare with !== false, because a match at position 0 would be falsy
    		if ($OriginalTitlePos !== false) {
    			// Cutting off everything before the original title
    			$OriginalTitleTemp = substr($TitleTemp, $OriginalTitlePos + strlen($OriginalMarker), 250);
    			// Getting the position of the end of the original title
    			$OriginalTitleTempPos = strpos($OriginalTitleTemp, $OriginalEnd);
    			if ($OriginalTitleTempPos !== false) {
    				// Separating the original title: trim whitespace and quotes
    				// instead of cutting off a fixed number of characters
    				$TitleToUse = trim(substr($OriginalTitleTemp, 0, $OriginalTitleTempPos), " \t\n\r\0\x0B\"");
    			}
    		}
    		echo $TitleToUse."<br>\n";
    	}
    ?>

    To test it, you just have to change the path to the Moving Pictures database.
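A hedged sketch of the marker-based extraction the script performs, pulled out into a reusable function. `extractBetween` is a hypothetical helper, and the sample HTML is invented to mimic the markers the script searches for; the real IMDb markup may differ by location or change over time. Taking `strlen()` of each marker avoids hard-coding character offsets.

```php
<?php
// Hypothetical helper: returns the text between $startMarker and the next
// $endMarker (whitespace and double quotes trimmed), or null if a marker is missing.
function extractBetween(string $html, string $startMarker, string $endMarker): ?string
{
    $start = strpos($html, $startMarker);
    if ($start === false) {
        return null;
    }
    // strlen() of the marker gives the number of characters to skip
    $start += strlen($startMarker);
    $end = strpos($html, $endMarker, $start);
    if ($end === false) {
        return null;
    }
    return trim(substr($html, $start, $end - $start), " \t\n\r\0\x0B\"");
}

// Invented sample HTML that mimics the markers the script looks for
$sample = '<h1><span class="itemprop" itemprop="name">Verblendung</span>'
        . '<span class="title-extra" itemprop="name">' . "\n"
        . '"Män som hatar kvinnor"' . "\n"
        . '<i>(original title)</i></span></h1>';

$title    = extractBetween($sample, '<span class="itemprop" itemprop="name">', '</span>');
$original = extractBetween($sample, '<span class="title-extra" itemprop="name">', '<i>(original title)</i>');

// Prefer the original title when the page has one
echo ($original ?? $title), "\n"; // prints "Män som hatar kvinnor"
```

Returning null instead of a garbage substring makes a missing marker (for example, a page served with different HTML) easy to detect.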

    Example of the results
    Code:
    This Is It -> tt1477715 -> This Is It
    The Hurt Locker -> tt0887912 -> The Hurt Locker
    The Girl I: with the Dragon Tattoo -> tt1132620 -> Män som hatar kvinnor
    Minority Report -> tt0181689 -> Minority Report
    Avatar -> tt0499549 -> Avatar
    Up in the Air -> tt1193138 -> Up in the Air
    Surrogates -> tt0986263 -> Surrogates
    The Stepfather -> tt0814335 -> The Stepfather
    The Game -> tt0119174 -> The Game
    Nine Miles Down -> tt0812352 -> Nine Miles Down
    Nine -> tt0875034 -> Nine
    Gamer -> tt1034032 -> Gamer
    The Boondock Saints II: All Saints Day -> tt1300851 -> The Boondock Saints II: All Saints Day
    Armageddon -> tt0120591 -> Armageddon
    Goodfellas -> tt0099685 -> Goodfellas
    Ice Age III: Dawn of the Dinosaurs -> tt1080016 -> Ice Age: Dawn of the Dinosaurs
    Schindler's List -> tt0108052 -> Schindler's List
    The Code -> tt1112782 -> Thick as Thieves

    Note that the titles on the left are from my DB, scraped with force English and rename DB. Also, "Män som hatar kvinnor" is included ;p

    Hope that helps :)
     

    RoChess

    Extension Developer
  • Premium Supporter
  • March 10, 2006
    You are seeing it from a single-user, single-location situation. In that case, making IMDb+ work is no problem for me either. Now add the fact that IMDb serves different HTML content, the GeoIP mix, not all languages being present in the AKAs, etc., and you will find out it is not that easy anymore.
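Not a fix for the GeoIP problem, but one variable that can at least be pinned down is the language the request asks for. A minimal sketch, assuming IMDb takes the `Accept-Language` request header into account when choosing the displayed title (not verified here, and GeoIP may still take precedence). `imdbContext` is a hypothetical helper that builds a PHP stream context for `file_get_contents`.

```php
<?php
// Hypothetical helper: builds a stream context that asks for a specific
// language via Accept-Language. Whether IMDb honours this header over its
// GeoIP detection is an assumption that needs testing.
function imdbContext(string $language = 'en-US')
{
    return stream_context_create([
        'http' => [
            'method'  => 'GET',
            'header'  => "Accept-Language: $language\r\n",
            'timeout' => 10, // seconds
        ],
    ]);
}

// Usage (needs network access, so it is only shown as a comment):
// $html = file_get_contents('http://www.imdb.com/title/tt1132620', false, imdbContext('en-US'));

// Inspect the options that would be sent with the request
print_r(stream_context_get_options(imdbContext('de-DE')));
```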

    Anyway, I have a plan of attack; I just need to find the time. If you are able to do what you did in PHP, you should look at http://moving-pictures.tv/wiki/Scraper_Scripts and learn how scraper scripts work.

    Then look at: http://imdbplus.googlecode.com/svn/trunk/Scraper/IMDb+.Scraper.SVN.xml and you can help me fix it :cool:
     

    Lightning303

    MP Donator
  • Premium Supporter
  • September 12, 2009
    Home Country: Germany
    Yes, you are right, I am seeing it from a single location.
    First, I reworked my little script: it now gets the information from the main IMDb title page instead of the AKA section. It also gets the "main title" now, but always chooses the original title if one is available. I updated my post.

    So I used some web proxies to see what the IMDb site looks like from other IPs (Spain, France, USA, Russia, Turkey, Mexico, Argentina). Looking at the HTML sources, the information was always in the same location and in the same format.
    Could you point me in the right direction on how to get an IMDb page with different HTML code, etc.? Maybe a web proxy is not enough?
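To compare what different locations receive, a rough sketch: route the request through an HTTP proxy using PHP's stream context (`proxy` and `request_fulluri` are standard `http` context options) and report which of the script's markers appear in the returned HTML. The proxy address is a placeholder from a documentation IP range, and `proxyContext` and `markerReport` are hypothetical names.

```php
<?php
// Markers the proof-of-concept script relies on
const TITLE_MARKERS = [
    'title'    => '<span class="itemprop" itemprop="name">',
    'original' => '<span class="title-extra" itemprop="name">',
];

// Hypothetical helper: a context that routes the request through an HTTP proxy
function proxyContext(string $proxy)
{
    return stream_context_create([
        'http' => [
            'proxy'           => $proxy, // e.g. "tcp://203.0.113.5:8080" (placeholder)
            'request_fulluri' => true,   // many HTTP proxies expect the full URI
        ],
    ]);
}

// Reports which markers are present in a page, so pages fetched
// from different locations can be compared side by side
function markerReport(string $html): array
{
    $report = [];
    foreach (TITLE_MARKERS as $name => $marker) {
        $report[$name] = strpos($html, $marker) !== false;
    }
    return $report;
}

// Usage against a live page (network access needed, so only shown here):
// $html = file_get_contents('http://www.imdb.com/title/tt1132620', false,
//                           proxyContext('tcp://203.0.113.5:8080'));
// print_r(markerReport($html));

print_r(markerReport('<span class="itemprop" itemprop="name">Avatar</span>'));
```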

    I would like to get into scraper scripting and help you. I will have a deeper look at it over the weekend. But seeing all these regular expressions makes my brain hurt; I never really understood them :(.
    First, however, it would be good to understand the exact problem.
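For comparison with the scraper scripts, a sketch of how the script's two string searches could be written as regular expressions with `preg_match`. The patterns are assumptions derived from the markers in the PHP script, not the patterns IMDb+ actually uses, and the sample HTML is invented.

```php
<?php
// Invented sample HTML, modelled on the markers the PHP script searches for
$html = '<span class="itemprop" itemprop="name">Verblendung</span>'
      . '<span class="title-extra" itemprop="name">' . "\n"
      . '"Män som hatar kvinnor"' . "\n"
      . '<i>(original title)</i></span>';

// ([^<]*) captures everything up to the next "<", i.e. the text inside the span
preg_match('/<span class="itemprop" itemprop="name">([^<]*)<\/span>/', $html, $m);
$title = $m[1] ?? null;

// \s* skips whitespace, "? makes the quotes optional, and (.*?) captures
// lazily up to the "(original title)" marker; /s lets . match newlines and
// /u treats the input as UTF-8
$pattern = '/<span class="title-extra" itemprop="name">\s*"?(.*?)"?\s*<i>\(original title\)<\/i>/su';
$original = preg_match($pattern, $html, $m) ? $m[1] : null;

// Prefer the original title when the page has one
echo ($original ?? $title), "\n"; // prints "Män som hatar kvinnor"
```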

    Again, thanks for your patience :)
     
