[Videos] - imdb script and patch to database.dll

Discussion in 'Improvement Suggestions' started by draclich, April 27, 2011.

  1. draclich

    draclich Portal Member

    Joined:
    March 4, 2007
    Messages:
    21
    Likes Received:
    3
    Location:
    melb.vic.au
    Ratings:
    +3 / 0
    Home Country:
    Hi,

    I have had good success with this amended script and the accompanying code changes: across more than 600 titles searched, the conflict (failure) rate dropped to around 2%.

    I have used the imdb MP1.2x script sections for .GetDetails() etc. without modification. The bulk of the changes are in the FindFilm section, where I return only the popular results if there are any, and otherwise the exact results. This has one annoying effect: for the titles in that 2% failure rate you can't always reach the right entry through the search, and will need to use the tt# from imdb to match.

    In addition, the check for video game or TV series entries wasn't working, so those entries were being added to the list of possible matches. The regex has been modified to extract the year and the extra details separately. I have ignored the aka entries as they didn't improve my matches (for English); adding that code back should be straightforward enough if necessary.
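
    A minimal sketch of that idea, in Python rather than the script's own language, assuming result lines of the form "Title (Year) (Extra)". The pattern, the tag names in SKIPPED_EXTRAS, and the helper names parse_result and movie_candidates are hypothetical and are not taken from the patch:

```python
import re

# Assumed result-line format: "Title (Year[/Roman numeral]) (Extra)".
# The regex used in the actual script is not shown in this thread.
RESULT_RE = re.compile(
    r"^(?P<title>.+?)\s*"
    r"\((?P<year>\d{4})(?:/[IVX]+)?\)"
    r"(?:\s*\((?P<extra>[^)]*)\))?\s*$"
)

# Hypothetical tags for entries that should not be offered as movies.
SKIPPED_EXTRAS = {"vg", "video game", "tv series", "tv mini-series"}


def parse_result(line):
    """Return (title, year, extra), or None if the line does not match."""
    m = RESULT_RE.match(line.strip())
    if m is None:
        return None
    extra = (m.group("extra") or "").strip()
    return m.group("title"), int(m.group("year")), extra


def movie_candidates(lines):
    """Keep parsed results whose extra tag is not a game or TV series."""
    kept = []
    for line in lines:
        parsed = parse_result(line)
        if parsed is None:
            continue
        if parsed[2].lower() in SKIPPED_EXTRAS:
            continue
        kept.append(parsed)
    return kept
```

    Capturing the year and the extra tag in separate groups means the filter can test the tag on its own instead of searching the whole line.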

    The changes to imdb.cs strip the trailing year from the search string, as it doesn't help and only hinders the results. Finally, the changes to imdbFetcher.cs display the movie title in the dialogue box the whole time and perform the stripping on the string used in the fuzzy match.
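
    As an illustration only (the real change is C# in imdb.cs and is not reproduced in this thread), stripping a trailing year could look like the sketch below. The 1900-2099 range, the bracket handling, and the function name strip_trailing_year are assumptions:

```python
import re

# Trailing year, optionally wrapped in brackets: "Alien (1979)", "Salt.2010".
# Restricting years to 1900-2099 is an assumption made for this sketch.
TRAILING_YEAR_RE = re.compile(r"[\s._-]*[\(\[]?\b(?:19|20)\d{2}\b[\)\]]?\s*$")


def strip_trailing_year(search):
    """Drop a year at the end of the search string, if anything is left."""
    stripped = TRAILING_YEAR_RE.sub("", search).strip()
    # A title that consists only of a year (for example "2012") must survive.
    return stripped if stripped else search.strip()
```

    Note that a title that legitimately ends in a year loses it as well, so this trades a few such titles for cleaner searches overall.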

    Overall I'm extremely happy with the hit rate. The only thing missing, which I'm thinking about doing, is to return both popular and exact results when it's a manual lookup. The auto match has issues when given both lists, and the hit rate is much worse. Note that this result was from before I had totally fixed the TV and video game issue, so it could be better now.

    Cheers


     

    Attached Files:

    • imdb.patch (8.5 KB, uploaded April 27, 2011)
    • imdb.script.patch (26.6 KB, uploaded April 27, 2011)
  3. Deda
    • Team MediaPortal

    Deda Lead Dev MP1 Videos

    Joined:
    March 18, 2009
    Messages:
    2,423
    Likes Received:
    2,098
    Gender:
    Male
    Occupation:
    IT Consultant
    Location:
    Zagreb
    Ratings:
    +2,385 / 1
    Home Country:
    Croatia Croatia
    Hi draclich, thanks for your work; I will certainly look at what you have done. Strangely, we are both trying to improve hit accuracy at the same time. I reworked the IMDB scraper a little (you can fetch it via Update from config), and for testing purposes I created a TMDB scraper. It is very fast because it uses the API, but TMDB lacks extra info, and many movies don't have an IMDB tt number, which is very important for managing movies and for other features like fanart and covers. It works without one, but with lower accuracy.

    I also made some minor tweaks to how the search text is cleaned and to the Levenshtein method, so that the strings passed to it are stripped of unwanted words.

    Concerning AKA results, I don't see the point in them: for direct hits there is no way to see them, so they are pointless and just add noise to the results. I will check with the team whether we can remove them from the script, or I will create a clean IMDB script without them.

    Anyway, you can try the patch with the updated scripts and see whether the results are better. I will also test your work so we can combine the best of both.

    Also, I am making a second major rework for 1.3.0 so that My Videos will finally look like a fun place to be for non-movie maniacs like me :).

    I will move this thread to the development "Improvement Suggestions" forum, which is a more suitable place because of the patches.
     

  4. draclich

    draclich Portal Member

    Ran your patch against TMDB with 200 entries and it ended with 29 conflicts. Some were due to TMDB returning exact matches, i.e. duplicate titles, whereas the IMDB popular list would only have the most recent one. Looking more closely at those conflicts, it was clear that replacing "remove after" with the regex has caused a large jump in conflicts, because of the less-than-standard naming convention used in the file names. At least with "remove after" you can get rid of any unusual strings further on, in addition to handling missing separator characters between words.

    I added [Ii]NT to the regex, and you might need to handle the various permutations of "read nfo" as well. In addition, the "-" expected before the end isn't always there.
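
    The token handling described here might be sketched as follows. This is not the regex from the patch, which is not quoted in the thread; the token list, the separator set, and the name cut_at_noise are assumptions for illustration. It cuts the name at the first noise token, in the spirit of "remove after":

```python
import re

# Hypothetical noise tokens. "INT" is matched in any letter case, and
# "read nfo" is accepted with a space, dot, underscore, dash, or nothing
# between the two words. The lookarounds keep words like "Interview" intact.
NOISE_RE = re.compile(
    r"(?<![A-Za-z0-9])(?:int|read[ ._-]?nfo)(?![A-Za-z0-9])",
    re.IGNORECASE,
)


def cut_at_noise(name):
    """Cut the name at the first noise token and tidy the separators."""
    m = NOISE_RE.search(name)
    head = name[: m.start()] if m else name
    return re.sub(r"[ ._-]+", " ", head).strip()
```

    Because everything after the first token is dropped, unusual strings further along the name disappear without needing their own pattern.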
     
  5. Deda
    • Team MediaPortal

    Deda Lead Dev MP1 Videos

    Thanks for your report. I tested your patches and they work impressively well, but, as you said, with some minor problems (there is no way to create something perfect, so it's not your fault). For example, the script will never return results for an original movie when there is a remake:

    Journey to the center of the earth (1959) -> we need to use tt-number for such a case
    Journey to the center of the earth (2008)

    Also, I would like to leave numbers that look like a year in the search title. This can be cleaned in the script if needed, e.g. for TMDB, which doesn't like a trailing year even when it has a movie with a year in its title, e.g.:

    Red Riding In The Year Of Our Lord 1974
    Code (Text):
    http://www.themoviedb.org/search?search=Red+Riding+In+The+Year+Of+Our+Lord+1974
    vs
    http://www.themoviedb.org/search?search=Red+Riding+In+The+Year+Of+Our+Lord

    Another good thing about keeping the year is that we can narrow the search results of the original MP IMDB script by year. I implemented that in the latest scripts, but it will work better after the cleanup patch, especially for users who don't follow a naming convention and use torrent file names. I know this is not our problem, but we must think about that too.
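
    A rough sketch of narrowing by year, assuming candidates are (title, year) pairs. The function name narrow_by_year, the tolerance parameter, and the fallback to the full list when nothing matches are choices made for this example, not behaviour confirmed for the MP script:

```python
def narrow_by_year(candidates, year, tolerance=1):
    """Keep (title, year) candidates within `tolerance` years of `year`.

    If no year is known, or nothing survives the filter, the original
    list is returned so the search is never left with no candidates.
    """
    if year is None:
        return list(candidates)
    kept = [c for c in candidates if abs(c[1] - year) <= tolerance]
    return kept or list(candidates)


results = [
    ("Journey to the center of the earth", 1959),
    ("Journey to the center of the earth", 2008),
]
print(narrow_by_year(results, 2008))
```

    With the year taken from the file name, the 1959 original and the 2008 remake no longer compete for the same match.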

    Next is the problem of using the main IMDb site (The Internet Movie Database) vs akas.imdb.com. In your case (English-speaking countries) the main site will work wonderfully, but for other countries it will in most cases return the local movie name. akas.imdb.com will return English titles, except for foreign movies; for those cases TMDB is used to try to help (this only works if the tt-number is in their database).

    Anyway, thanks for the IMDBfetcher.cs part that shows the movie title in the dialog box. Concerning the IMDB.cs part, I would like you to test, if you can, with my patch (practically you did the same thing, except for the search string cleaning part) and your script (I modified it a little to handle years outside of the MP code and changed it to search akas.imdb.com). I'm curious about the conflict rate and which titles you end up with; I'm just trying to find something that can satisfy most users.

    And finally, if you have time :), please try the latest IMDB MP script (in the attachment) and see the results with the patched MP. It should work best with file naming like:
    Moviename (year) -> will narrow the search results to that year (+- 1)
    tt1234567 -> this should be a direct hit
    Moviename tt1234567 -> this should be a direct hit

    but even something like this should work:
    2012 2009
    Red.Riding.In.The.Year.Of.Our.Lord.1974.2009.720p.BluRay.x264
    Salt.2010.R5.V2.XviD
    Alien (1979)DC
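
    One way such names could be split into a title, a year, and a tt-number is sketched below. The patterns and the rule "the last year-like number is the release year unless it is the first thing in the name" are assumptions for illustration; parse_name is a hypothetical helper, not code from the script:

```python
import re

# IMDb id such as tt1234567 (7 or 8 digits assumed), in any letter case.
TT_RE = re.compile(r"(?<![A-Za-z0-9])tt\d{7,8}(?!\d)", re.IGNORECASE)
# Four-digit number that looks like a year (1900-2099 assumed).
YEAR_RE = re.compile(r"(?<!\d)(?:19|20)\d{2}(?!\d)")


def parse_name(name):
    """Split a name into (title, year, imdb_id); missing parts are None."""
    tt = TT_RE.search(name)
    imdb_id = tt.group(0).lower() if tt else None
    if tt:
        name = name[: tt.start()] + " " + name[tt.end():]

    year = None
    years = list(YEAR_RE.finditer(name))
    if years:
        last = years[-1]
        # Treat the last year-like number as the release year, unless it
        # opens the name (then it is more likely to be the title itself).
        if name[: last.start()].strip(" ._-(["):
            year = int(last.group(0))
            name = name[: last.start()]  # drop the year and what follows

    title = re.sub(r"[ ._\-()\[\]]+", " ", name).strip()
    return (title or None), year, imdb_id
```

    For example, parse_name("2012 2009") keeps "2012" as the title and takes 2009 as the year, and a bare "tt1234567" yields only the id.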
     

  6. disaster123
    • Premium Supporter

    disaster123 MP Donator

    Joined:
    May 14, 2008
    Messages:
    3,546
    Likes Received:
    417
    Ratings:
    +431 / 2
    Home Country:
    Germany Germany
    Re: imdb script and patch to database.dll

    Deda,
    could you tell us a little bit more about your attachments? What have you modified, and which one should we try?
     
  7. Deda
    • Team MediaPortal

    Deda Lead Dev MP1 Videos

    ScrapperAccuracy.patch is a minor fix for the current 1.2.0 Beta (I hope it will go into 1.2.0 RC). It improves how the search text taken from the video file name is cleaned before it goes to the script: basically it is the Moving Pics noise filter regex with some adaptation, which replaces tons of repeated search-and-replace code with one line. It also improves the Levenshtein matching method (the method that compares strings and tries to guess the movie from the file name and the results from the script). Simply put, it does the same thing draclich did, but it leaves the year in the file name for later processing in the scripts. I could do that in the MP code, but the consequence would be that all other MP scripts stop working in 1.2.0 Beta.

    IMDB_draclich_modified.rar is draclich's script, modified a little for the above patch.

    IMDB_MP12x_ORIGINAL.rar is the script currently available for MP 1.2.0 Alpha-Beta (you can also fetch it via Update from Configuration). The improvements are in extracting the year from the file name and in narrowing the search results according to the movie year from IMDB.

    TMDB_NEW_VERSION.rar is a script for TMDB (The open movie database), which I made recently to test fetch speed. It is just a showcase of how fast fetching can be when using an API, but the results can be poor because the TMDB database lacks movie entries and information about them. It should work for all MP versions but will do a better job with the above patch.
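
    For readers unfamiliar with the Levenshtein comparison mentioned for ScrapperAccuracy.patch, a minimal sketch of the general technique follows. It is the textbook edit-distance algorithm in Python, not the MP implementation, and best_match is a hypothetical helper that assumes the file name has already been cleaned:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def best_match(cleaned_name, titles):
    """Return the title closest to the cleaned name (case-insensitive)."""
    if not titles:
        return None
    key = cleaned_name.lower()
    return min(titles, key=lambda t: levenshtein(key, t.lower()))
```

    The cleaner the search text is before this step, the smaller the distance to the right title, which is why noise words and release tags are removed first.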
     