Rename system rules + scraper-script priority + Folder bug

VdR

MP Donator
  • Premium Supporter
  • October 17, 2006
    612
    16
    Belgium
    Home Country
    Netherlands Netherlands
    This is a really great addition, especially the grouping function. Now, for instance, 'Raiders of the Lost Ark' finally sorts together with the other Indiana Jones movies :). Thanks a lot.

    I did find a few surprises when it comes to grouping, maybe a suggestion in here for your renaming database.

    X-men came out as follows:
    'X-men II: X-men United', 'X-men III: The last stand', but then 'X-men Origins: Wolverine' and 'X-men: First Class'.

    And 'Hulk' did not sort together with 'The Incredible Hulk'.

    VdR
     

    aquilan

    MP Donator
  • Premium Supporter
  • March 29, 2007
    9
    1
    Home Country
    England England
    Re: IMDb+ Scraper - Movie Scraping on Steroids v3.3.8

    Hi RoChess,

    Firstly, I must say that this is a great addition to the Moving Pictures plugin features and has answered my prayers when it comes to Japanese titled films with the foreign film naming options. It was an issue in the original IMDB scraper which bugged me greatly, as whilst it often renamed them correctly to an English title it would always leave the 'Sort By' name as the original foreign title, so the films would not appear in the correct alphabetical order.

    I don't know if anyone has reported it so forgive me if it's already been mentioned but for me at least it would appear that the IMDB naming bug is still present (or has returned) because I found that some of the new titles in my library were added to the database with the header "IMDB" in front of them so won't sort correctly (it appears on both the 'title' and 'sort by' in the Moving Pictures configuration screen). When choosing to refresh them in the configuration screen using the MovieDB then in all instances the films affected were properly renamed.

    I noticed far earlier in this thread that you responded once to a user reporting this problem as having been solved in an earlier release of your IMDB+ plugin so I'm not sure whether it has returned or never fully been cleared up. But if this happens as a result of IMDB changing their code then rather than people like yourself having to play catch-up with them constantly would it be possible to code it within the plugin to remove any instances where it finds "IMDB" within a naming convention? I'm no coder so my apologies if this is a daft suggestion but I can only imagine how much of a pain it must be for people like yourselves to try and keep scrapers running as perfectly as possible whilst being at the mercy of third party databases you have no control over.

    Finally, I would like to make a possible suggestion regarding the excellent grouping function you feature within the IMDB+ plugin. When it renames a set of films as I, II, III etc, is it possible for it not to add the numbering to the first film in a series? I know this may sound picky but it's not conventional for the first releases in a series of films to use any kind of numbering denomination within their titles so it tends to look odd. And if you happen to only have the first film in a set then it especially stands out. For instance, I have the film 'Wild Things' in my Moving Pictures library but as there was a straight to video (and likely little watched or even known about sequel) then the scraper titles it 'Wild Things I' which makes it look odd. As I said though, this is likely me being overly picky so my apologies for bringing it up if it seems that way.

    Thanks again and keep up the good work. It's the work of developers like yourself making such great extra features like your plugin which make MediaPortal such a great media player for us all.
     

    zicoz

    MP Donator
  • Premium Supporter
  • September 3, 2006
    896
    63
    Home Country
    Norway Norway

    Almost everything seems to be working perfectly here, there is only one problem and that is BR-folders. I don't know if this is a scraper problem or a problem with MovingPictures

    I have some BR-disks ripped in a folder-format

    My folders are all named "title (year)"

    But that's not what shows up in the "Title/Keywords" field when I import the movies. An example is the movie "Drive Angry".

    My folder is named "Drive Angry"

    But the Title/Keywords field says: "Drive Angry Blu-rayTM"

    I'm guessing that is the disk-name, but is there a way to have it search using the folder name instead?
     

    RoChess

    Extension Developer
  • Premium Supporter
  • March 10, 2006
    4,434
    1,897

    X-men came out as follows:
    'X-men II: X-men United', 'X-men III: The last stand', but then 'X-men Origins: Wolverine' and 'X-men: First Class'.
    And 'Hulk' did not sort together with 'The Incredible Hulk'.

    I follow the 'movieconnections' done at imdb.com as close as possible.

    X-men: First Class = IMDb movieconnections for 'First Class'

    As you can see there is no connection to the X-Men trilogy, but there is a connection to the upcoming sequel. So in less than 3 years, I'll adjust the title to become "X-men: First Class I" and then we will have "X-men: First Class II: Some Subtitle". The movies still group alongside the other X-Men movies and are not mixing themselves inside the grouping. The latter would be a mistake and those I will fix if you come across them.

    Some series however pose a real problem and "The Incredible Hulk" is one of them. See I can rename it to "Hulk (Remake): The Incredible Hulk", so that it groups alongside, but if you take a look at the movie connections page, you will see that they plan to make it a trilogy.

    So I had pushed it aside, but will look at a solution. What I was toying with before was something like:


    Avengers: The Incredible Hulk (sortby "Avengers 01")
    Avengers: The Avengers (sortby "Avengers 02")
    Avengers: Nick Fury (sortby "Avengers 03")

    But then 'Thor', 'Captain America' (which also gets a sequel in 2014) and others totally mess things up again.

    The real solution might lie in the upcoming feature of custom IMDb+ driven categories, because then it is easy to make a custom one say for example "Marvel Superheroes" or wider "Comicbook Adaptions", or even smaller with just "The Avengers" and place all the related movies in there. But it will be a little while longer before that function will be done.

    I don't know if anyone has reported it so forgive me if it's already been mentioned but for me at least it would appear that the IMDB naming bug is still present

    Please provide me with the exact folder (if you have one) and filename that is causing the problem, so I can reproduce it. I also need to know if you adjusted any IMDb+ scraper-script settings via the IMDb+ plugin, because otherwise I could end up testing with the wrong setup and not be able to reproduce it. Alternatively, you can go to the Start menu, find the MediaPortal entries and open the "User Files" folder via the shortcut. Then open the "IMDb+" subfolder, find the options XML file and attach it to your reply, so I can use that to duplicate your setup.

    As for playing catch-up, that is one of the main reasons the IMDb+ plugin now has the option to auto-update the IMDb+ scraper-script. Right now we have over 1000 users, and if one person (or myself) finds a problem and reports it in such a way that I can reproduce and fix it, then most of the other users will not even realize something broke, because by then they will have already received the update.

    As for the convention to capture anything IMDb related, that's pretty much what I tried to code already, but computers unfortunately do not possess AI, so it's still a cat-and-mouse game of playing catch-up and improving as much as possible.
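To make the "strip anything IMDb related" idea concrete, here is a minimal sketch in Python. It is not the actual scraper-script code: the function name `strip_imdb_prefix` and the pattern are assumptions for illustration, based only on the "IMDB -" prefix reported in this thread.

```python
import re

# Hypothetical pattern (an assumption, not the real scraper-script rule):
# a leading "IMDb" tag in any casing, followed by optional "-" or ":" separators.
IMDB_PREFIX = re.compile(r"^\s*imdb\b\s*[-:]*\s*", re.IGNORECASE)


def strip_imdb_prefix(title: str) -> str:
    """Return the title without a leading IMDb tag, if one is present."""
    cleaned = IMDB_PREFIX.sub("", title, count=1)
    # Never return an empty title; fall back to the original instead.
    return cleaned or title


if __name__ == "__main__":
    for raw in ["IMDb - Avatar", "IMDB: Transformers", "Dark City"]:
        print(repr(raw), "->", repr(strip_imdb_prefix(raw)))
```

A rule like this only treats the symptom. If the site changes the surrounding markup again, the prefix may show up in a different shape, which is why this stays a cat-and-mouse game.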

    Your other suggestion to not add "I" to the first title is already in the ToDo list. I actually made a post a while ago outlining what I'm working on, and it should be in there. But those type of things do end up getting lost in the sea of posts. I'm just still stuck at the monster task of those 350 titles, but once that finishes up, I can get back to improving the scraper-script with that option as well as many others.

    PS: I've created Issue #30 for this to keep me reminded and have already started on it, so it should make it into the next update.

    Almost everything seems to be working perfectly here, there is only one problem and that is BR-folders. I don't know if this is a scraper problem or a problem with MovingPictures
    I have some BR-disks ripped in a folder-format
    But the Title/Keywords field says: "Drive Angry Blu-rayTM"

    Yesterday I discovered a rare parsing bug in MovingPictures (issue #1062).

    So perhaps it relates to the same problem. You can either debug it yourself and help me, but you would need to know how to work debug mode and in some cases scraper-debug mode to see exactly where it goes wrong (plugin or scraper-script). Now I don't mind doing that, but then I need to be able to reproduce your problem on my system.

    In order to do that, I would need to import the same folder+files you did. Now the benefit is that the problem is not with MediaInfo (I hope), so I can simply make 0-byte test files that only duplicate the filenames that MovingPictures uses for import. So please provide me with something like:

    Folder Name
    Folder Name\Media File 1.ext
    Folder Name\Media File 2.ext
    Folder Name\Media File 3.ext
    etc.
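For a folder with many files, the 0-byte copies could be generated with a small script instead of by hand. The following is a sketch under assumptions: the function name `mirror_as_empty_files` and the example paths are made up for illustration and are not part of MovingPictures or IMDb+.

```python
import os
from pathlib import Path


def mirror_as_empty_files(source: Path, target: Path) -> int:
    """Recreate the folder layout of `source` under `target` using 0-byte files.

    Only names are copied, never file contents.
    Returns the number of placeholder files created.
    """
    created = 0
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = Path(dirpath).relative_to(source)
        # Recreate each (possibly empty) sub-folder first.
        (target / rel).mkdir(parents=True, exist_ok=True)
        for name in filenames:
            (target / rel / name).touch()  # creates an empty file
            created += 1
    return created


# Example call with hypothetical paths:
# mirror_as_empty_files(Path(r"D:\Movies\Folder Name"), Path(r"C:\Temp\Folder Name"))
```

The resulting tree is tiny, so it can be zipped and attached to a reply.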

    :D :D :D
     

    ltfearme

    Community Plugin Dev
  • Premium Supporter
  • June 10, 2007
    6,751
    7,196
    Sydney
    Home Country
    Australia Australia

    Finally, I would like to make a possible suggestion regarding the excellent grouping function you feature within the IMDB+ plugin. When it renames a set of films as I, II, III etc, is it possible for it not to add the numbering to the first film in a series? I know this may sound picky but it's not conventional for the first releases in a series of films to use any kind of numbering denomination within their titles so it tends to look odd. And if you happen to only have the first film in a set then it especially stands out. For instance, I have the film 'Wild Things' in my Moving Pictures library but as there was a straight to video (and likely little watched or even known about sequel) then the scraper titles it 'Wild Things I' which makes it look odd. As I said though, this is likely me being overly picky so my apologies for bringing it up if it seems that way.

    I think a setting to control that would be awesome, definitely something I would like to see as well.
     

    RoChess


    I think a setting to control that would be awesome, definitely something I would like to see as well.

    I already had it on my ToDo list when you asked and was going to implement it right after finishing those darn 350 movie titles. But I needed a break anyway, so I had a look at it and was able to make it work quite easily.

    Worked perfect, until....

    The darn article removal stuff threw a wrench in the gears. I couldn't just use the original rename database title for the sortby and strip the 'I' from the first movie, because then if a user has the article removal setting enabled; "The Matrix I" would end up with new title of "The Matrix" and sortby="The Matrix I", which then sorts it under the 'T' and not the 'M' as all the other Matrix movies do in that case.

    So I made it work via a dirty work around, you first import the movie and it uses "The Matrix I", the MovPic plugin then assigns this the article removed sortby of "matrix i the". Then you refresh the movie and I'm able to correct the title into "The Matrix" and keep the correct "matrix i the" sortby entry.
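The intended end result of that workaround can be sketched as follows. This is only an illustration in Python, not the scraper-script itself: the helper names and the English-only article list are assumptions, and the "move the article to the end, lowercase" format is taken from the "matrix i the" example above.

```python
ARTICLES = ("the", "a", "an")  # assumption: English-only list for this sketch


def article_removed_sortby(title: str) -> str:
    """Move a leading article to the end and lowercase, e.g. 'matrix i the'."""
    first, _, rest = title.partition(" ")
    if rest and first.lower() in ARTICLES:
        return f"{rest} {first}".lower()
    return title.lower()


def first_in_series(numbered_title: str) -> tuple[str, str]:
    """Return (display title, sortby) for a title known to be part I of a series.

    The sortby is derived from the numbered title, so the movie still sorts
    with its sequels; only the display title loses the trailing ' I'.
    """
    sortby = article_removed_sortby(numbered_title)
    display = numbered_title
    if numbered_title.endswith(" I"):
        display = numbered_title[: -len(" I")]
    return display, sortby


if __name__ == "__main__":
    print(first_in_series("The Matrix I"))  # ('The Matrix', 'matrix i the')
```

The trailing " I" is only safe to strip for titles that come from the rename database as the first entry of a series. Applied blindly, it would also damage a title such as "The King and I".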

    I don't like the whole extra-refresh workaround, so it looks like I'll be forced into adding my own article removal system. The reason is that at the scraper-script level I do not have access to the MovingPictures advanced configuration setting for the article-removal entries, so I will have to create a separate one.

    At least I can then enhance it with some ideas I've had regarding the article removal system to make it more international-friendly. Right now there is a single default setting of "the|a|an|ein|das|die|der|les|la|le|el|une|de|het", which combines English, German, French, Spanish and Dutch articles into one.

    The Germans indeed use 'Die' the same way 'The' is used in English, but this then means 'Die Hard' gets sorted under 'H', unless you modify this advanced setting to just "the|a|an". Since there is now foreign language support in IMDb+, it would be possible for me to integrate this in a new article removal system and make it more flexible, such as detecting spoken language of movie and then only use the article remove settings for that language.

    That does require a lot more adjustments to both the IMDb+ scraper-script and the IMDb+ plugin, so at first I'm just going to make it work like the Advanced scraper-script option for the country+language filter system, and leave it up to the user to edit it for the time being.
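A language-aware article removal could look roughly like the sketch below. The split of the default list into languages is an assumption made for illustration ("la" is placed under both French and Spanish), and the function name `sortby` is hypothetical. Nothing here reflects how the final IMDb+ implementation will work.

```python
# Assumption: the default "the|a|an|ein|das|die|der|les|la|le|el|une|de|het"
# list, split per language for illustration only.
ARTICLES_BY_LANGUAGE = {
    "English": ["the", "a", "an"],
    "German": ["ein", "das", "die", "der"],
    "French": ["les", "la", "le", "une"],
    "Spanish": ["el", "la"],
    "Dutch": ["de", "het"],
}


def sortby(title: str, language: str) -> str:
    """Move a leading article to the end, using only the movie's own language."""
    articles = ARTICLES_BY_LANGUAGE.get(language, [])
    first, _, rest = title.partition(" ")
    if rest and first.lower() in articles:
        return f"{rest} {first}".lower()
    return title.lower()


if __name__ == "__main__":
    print(sortby("Die Hard", "English"))  # die hard
    print(sortby("Das Boot", "German"))   # boot das
```

With a per-language lookup, 'Die Hard' stays under 'D' because "die" is not an English article, while German titles still lose their article.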
     

    zicoz


    Yesterday I discovered a rare parsing bug in MovingPictures (issue #1062).

    So perhaps it relates to the same problem. You can either debug it yourself and help me, but you would need to know how to work debug mode and in some cases scraper-debug mode to see exactly where it goes wrong (plugin or scraper-script). Now I don't mind doing that, but then I need to be able to reproduce your problem on my system.

    In order to do that, I would need to import the same folder+files you did. Now the benefit is that the problem is not with MediaInfo (I hope), so I can simply make 0-byte test files that only duplicate the filenames that MovingPictures uses for import. So please provide me with something like:

    Folder Name
    Folder Name\Media File 1.ext
    Folder Name\Media File 2.ext
    Folder Name\Media File 3.ext
    etc.

    :D :D :D


    This only happens with Blu-ray folders for me, and those contain a lot of files and folders. One of the movies it happens with is Indiana Jones and the Kingdom of the Crystal Skull (2008); when I import it, the keyword/title it uses is something like "Indiana Jones IV Disk I" or something similar to that.

    I could provide you with 0-byte test files, but considering the folder has 422 files and 30 folders inside it, it might be better for me to provide some logs to begin with at least. :)

    But like I said, I believe the problem is that it uses the name of the disk/the disk ID; that's the only place where some of the names make sense, to me at least. I'll try to provide you with some logs asap.


    Ok, these are the logs for me doing the following:

    - Installing Mediaportal+Moving Pictures and IMDB+
    - Open Moving Pictures configuration and making IMDB+ the only scraper
    - Adding a new folder to watch that only contains the Indiana Jones KotCS Blu-ray in folder structure.
    - Getting "Indiana Jones IV Disc 1" as title/keyword.
    - Changing that to Indiana Jones and the Kingdom of the Crystal Skull
    - Getting the correct results.

    View attachment BR-Issue.zip
     

    aquilan


    RoChess,

    Sorry for the delay in replying with the information you requested but as a friend of mine who I setup with MediaPortal has a much larger film library than me I thought it would be better if I ran IMDB+ on their setup and gave you the feedback from that to give a better indication of the titles being affected by the IMDB naming issue.

    With regards the configuration, the IMDB+ plugin was setup with all plugin GUI options left as standard and using the hidden 'Force' and 'Refresh all titles' options in the side menu to refresh the whole library via IMDB+. I've also attached the Option XML file so you can verify the configuration used for yourself.

    Here is a list of films (I've written them by their actual given Windows folder/filename to help you with verification of my findings) that were renamed incorrectly with "IMDB -" appearing in front of both the Moving Pictures plugin 'Title' and 'Sort By' fields for each film after refreshing them via IMDB+. I should also point out that the majority of these films were already incorrectly titled with IMDB in front of their names by the standard IMDB scraper.

    2 Fast 2 Furious (2003)
    300 (2006)
    Avatar (2009)
    Dark City (1998)
    Fanboys (2008)
    Galaxy Quest (1999)
    Harry Potter And The Order Of The Phoenix (2007)
    Independance Day (1996)
    Jumper (2008)
    Kiss Kiss Bang Bang (2005)
    Sahara (2005)
    The Bounty Hunter (2010)
    The Dark Knight (2008)
    The Rocker (2008)
    Transformers (2007)
    Transformers: Revenge Of The Fallen (2009)
    V For Vendetta (2006)
    Watchmen (2009)

    Of these, the following titles are ones I also have on my own system but which are renamed correctly without the IMDB tag in front of their titles. The only difference here is that I haven't had MediaPortal on since your very latest update (the one in which you said IMDB+ now removed the first numeral 'I') so I can only assume this discrepancy has been caused by the latest update and is due to the new numeral renaming issue with respect to films featuring subtitles in their name:

    2 Fast 2 Furious (2003)
    Avatar (2009)
    Sahara (2005)
    The Dark Knight (2008)
    The Rocker (2008)
    Transformers (2007)
    Transformers: Revenge Of The Fallen (2009)

    The plugin won't update on my own system when I boot up MediaPortal (have you removed the latest update by any chance?) so I can't refresh my library and check if the titles will be renamed the same as on my friend's system. I also noticed that some titles were named differently on mine as they still used numerals with subtitled films. They were listed in the Moving Pictures library as you see below:

    The Fast And The Furious II: 2 Fast 2 Furious
    Batman VI: The Dark Knight
    Transformers I
    Transformers II: Revenge Of The Fallen

    Unfortunately I never made a note of what version my friend's IMDB+ plugin was at, so I can't compare it to my own to see if they were at different versions, but the differences I'm seeing in the film titles between our two systems for films we both have (with identical filenames) are royally confusing.

    Apologies if this is all a bit confusing but I hope it's of some help. If you need any further information please let me know.
     

    Attachments

    • Options IMDb+ Scraper.rar
      438 bytes

    RoChess


    I could provide you with a 0-byte test file, but considering the folder has 422 files and 30 folders inside it, it might be better for me to provide some logs to begin with at least. :)

    Log files come in multiple variations:

    1. Info log files -> they roughly say what is going on, but they do not allow me to find out 'why'
    2. Debug log files (MediaPortal Config log mode changed to 'debug') -> gives me a much better idea as to why things happened
    3. Scraper-debug mode enabled (MovingPictures special option via 'Gear' icon inside importer tab in config) -> Allows me exactly to see everything that happened. However for 80% of the problems the regular debug mode is enough and much easier to read.

    IMPORTANT NOTE: MovingPictures is multi-threaded, meaning it will do multiple things at the same time. This makes log files extremely hard to read, so to make it easier on me, import just *ONE* movie. In scraper-debug mode this is a necessity for other reasons, because it generates almost 2MB of log PER movie.

    To illustrate multi-threaded log files:

    Log 1 (several movies imported at once):

    1. Multi-threaded log line for Movie A
    2. Multi-threaded log line for Movie B
    3. Multi-threaded log line for Movie A
    4. Multi-threaded log line for Movie C
    5. Multi-threaded log line for Movie B
    6. Multi-threaded log line for Movie A
    7. Multi-threaded log line for Movie C
    8. Multi-threaded log line for Movie A
    9. Multi-threaded log line for Movie B
    10. Multi-threaded log line for Movie B
    11. Multi-threaded log line for Movie C
    12. Multi-threaded log line for Movie C

    Log 2 (a single movie imported):

    1. Multi-threaded log line for Movie A
    2. Multi-threaded log line for Movie A
    3. Multi-threaded log line for Movie A
    4. Multi-threaded log line for Movie A

    As you can understand, reading the 2nd log is much easier for me.
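If an interleaved log is all that is available, the lines can in principle be regrouped per movie afterwards. The sketch below uses the made-up line format from the illustration above, where the movie name follows " for ". A real MediaPortal log would need a different key (whatever identifies the thread or movie on each line), so treat the parsing rule and the name `group_by_movie` as assumptions.

```python
from collections import defaultdict


def group_by_movie(lines):
    """Group interleaved log lines by the movie named at the end of each line."""
    grouped = defaultdict(list)
    for line in lines:
        # Hypothetical format: "... log line for <movie name>"
        _, _, movie = line.rpartition(" for ")
        grouped[movie].append(line)
    return dict(grouped)


if __name__ == "__main__":
    interleaved = [
        "Multi-threaded log line for Movie A",
        "Multi-threaded log line for Movie B",
        "Multi-threaded log line for Movie A",
        "Multi-threaded log line for Movie C",
        "Multi-threaded log line for Movie B",
    ]
    for movie, movie_lines in group_by_movie(interleaved).items():
        print(movie, len(movie_lines))
```

Importing a single movie remains the simpler route, since it avoids the regrouping step altogether.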

    You provided Info log files, which indeed tell me the steps you took, I see it failed to find a match, you manually approved it, it found movie, activated rename system and obtained artwork. But I can not see 'why' it did that, so please reproduce the problem and give me debug-log files (if I'm still unable to find cause I'll ask for scraper-debug ones). You do not even need to manually approve the movie, the moment MovingPictures shows you that blue info icon that awaits your manual approval, the log file at that moment already contains all I need (and is shorter, easier to read).

    The plugin won't update on my own system when I boot up MediaPortal

    Unfortunately I never made a note of what version my friends IMDB+ plugin was at so can't compare it too my own to see if they were at different versions but the differences I'm seeing in the film titles between our two systems with regards films we both have (and also identical filenames) is royally confusing.

    To see info about the IMDb+ plugin and IMDb+ scraper-script you have, use the hidden menu -> IMDb+ Info option.

    This is explained with pictures in the Wiki guide - Version Check (use 'Which version do I have now?' index, or scroll down a bit)

    Current release is IMDb+ plugin v1.3.0.188 and as you can see by the title of this thread the IMDb+ scraper-script is at v3.3.8 (changelog available near the end of first post).

    What I wrote a few posts earlier about removing the 'I' for the first movie in a series concerns the work I'm doing on my own system to develop the next version. The problem is that I can't just update only the IMDb+ scraper-script anymore; these new improvements also require a new IMDb+ plugin. So I wanted to get as much done and tested before I release, otherwise you would have to download a new plugin every hour at the rate I'm testing :cool:

    As for the "IMDb - " prefix, this should not happen on the movies you listed. So what is happening is that MovingPictures decided to ignore IMDb+ being in first position and put the default imdb.com scraper back in first position. This means any new movie will not use the IMDb+ scraper, so you still end up with the old results that contained the prefix.

    It is a weird race condition that we are aware of and trying to fix. I've installed and tested like 10x and only once did imdb.com end up back at the top, but I was unable to figure out why. The solution we will probably go for is add an extra option to the "Force IMDb+" hidden menu option that will allow you to put the IMDb+ scraper-script back in 'First' position.

    You can verify the current position via that "IMDb+ Info" screen as explained via the Wiki link. As you can see in the screenshot example, it has to show 'First' to work properly on newly imported movies. Which brings us to how you can fix it. Use the "Force IMDb+" hidden menu option in the plugin to convert all your imdb.com scraper-script imported movies over to the IMDb+ scraper-script. Then use the "Refresh" hidden menu option to fix any problems from those original imports.

    And unfortunately, you then have to open the MovingPictures config, go to the "Importer" tab, click on the "Manage manual data sources" button and put the IMDb+ scraper back in first position. Hopefully we can release the next IMDb+ plugin soon, which will allow you to correct this more easily.

    If a lot of the work you need to do to keep your rename database up to date (i.e. the 350 titles or so you mention) is simply cutting and pasting correct titles against the IMDB ID codes into an XML sheet (I assume it's what's contained within the 'Rename dBase IMDb+ Scraper.xml' file) then is there any way other MediaPortal users like myself can possibly help? Just wondering if some of us can aid in taking some of the mundane work load off your hands and give you time to get on with the coding of new features and bug tracing etc.

    That is pretty much what I'm doing.

    I could use help for the letter 'T', which are the following movies (remember to read the important notes at the bottom):

    1. Taken -- AKA search for: Taken
    2. Tarzan -- AKA search for: Tarzan
    3. Taxi Driver -- AKA search for: Taxi Driver
    4. Teacher's Pet -- AKA search for: Teacher's Pet
    5. The Adventures of Huckleberry Finn -- AKA search for: The Adventures of Huckleberry Finn
    6. The Amityville Horror -- AKA search for: The Amityville Horror
    7. The Big Sleep -- AKA search for: The Big Sleep
    8. The Black Hole -- AKA search for: The Black Hole
    9. The Blob -- AKA search for: The Blob
    10. The Bourne Identity -- AKA search for: The Bourne Identity
    11. The Boxer -- AKA search for: The Boxer
    12. The Cabinet of Dr. Caligari -- AKA search for: The Cabinet of Dr. Caligari
    13. The Company -- AKA search for: The Company
    14. The Day the Earth Stood Still -- AKA search for: The Day the Earth Stood Still
    15. The Enforcer -- AKA search for: The Enforcer
    16. The Fall -- AKA search for: The Fall
    17. The Flight of the Phoenix -- AKA search for: The Flight of the Phoenix
    18. The Fly -- AKA search for: The Fly
    19. The Front Page -- AKA search for: The Front Page
    20. The General -- AKA search for: The General
    21. The Gift -- AKA search for: The Gift
    22. The Girl Next Door -- AKA search for: The Girl Next Door
    23. The Hills Have Eyes -- AKA search for: The Hills Have Eyes
    24. The Hitcher -- AKA search for: The Hitcher
    25. The Hitchhiker -- AKA search for: The Hitchhiker
    26. The Hound of the Baskervilles -- AKA search for: The Hound of the Baskervilles
    27. The Hunchback of Notre Dame -- AKA search for: The Hunchback of Notre Dame
    28. The Invisible Man -- AKA search for: The Invisible Man
    29. The Island -- AKA search for: The Island
    30. The Italian -- AKA search for: The Italian
    31. The Italian Job -- AKA search for: The Italian Job
    32. The Karate Kid -- AKA search for: The Karate Kid
    33. The Key -- AKA search for: The Key
    34. The Kid -- AKA search for: The Kid
    35. The King and I -- AKA search for: The King and I
    36. The Ladykillers -- AKA search for: The Ladykillers
    37. The Last Frontier -- AKA search for: The Last Frontier
    38. The Last of the Mohicans -- AKA search for: The Last of the Mohicans
    39. The Last Warrior -- AKA search for: The Last Warrior
    40. The Lion in Winter -- AKA search for: The Lion in Winter
    41. The Lion, the Witch & the Wardrobe -- AKA search for: The Lion, the Witch & the Wardrobe
    42. The Longest Yard -- AKA search for: The Longest Yard
    43. The Lost World -- AKA search for: The Lost World
    44. The Man Who Came to Dinner -- AKA search for: The Man Who Came to Dinner
    45. The Man Who Knew Too Much -- AKA search for: The Man Who Knew Too Much
    46. The Manchurian Candidate -- AKA search for: The Manchurian Candidate
    47. The Mark of Zorro -- AKA search for: The Mark of Zorro
    48. The Miracle Worker -- AKA search for: The Miracle Worker
    49. The Mummy -- AKA search for: The Mummy
    50. The Nanny -- AKA search for: The Nanny
    51. The Navigator -- AKA search for: The Navigator
    52. The Navigators -- AKA search for: The Navigators
    53. The Nest -- AKA search for: The Nest
    54. The Old Curiosity Shop -- AKA search for: The Old Curiosity Shop
    55. The Old Man and the Sea -- AKA search for: The Old Man and the Sea
    56. The Other Boleyn Girl -- AKA search for: The Other Boleyn Girl
    57. The Out-of-Towners -- AKA search for: The Out-of-Towners
    58. The Outsider -- AKA search for: The Outsider
    59. The Outsiders -- AKA search for: The Outsiders
    60. The Parent Trap -- AKA search for: The Parent Trap
    61. The Patriot -- AKA search for: The Patriot
    62. The Perils of Pauline -- AKA search for: The Perils of Pauline
    63. The Phantom of the Opera -- AKA search for: The Phantom of the Opera
    64. The Pianist -- AKA search for: The Pianist
    65. The Pickwick Papers -- AKA search for: The Pickwick Papers
    66. The Pink Panther -- AKA search for: The Pink Panther
    67. The Pirates of Penzance -- AKA search for: The Pirates of Penzance
    68. The Poseidon Adventure -- AKA search for: The Poseidon Adventure
    69. The Postman Always Rings Twice -- AKA search for: The Postman Always Rings Twice
    70. The Premonition -- AKA search for: The Premonition
    71. The Prince and the Pauper -- AKA search for: The Prince and the Pauper
    72. The Producers -- AKA search for: The Producers
    73. The Prophecy -- AKA search for: The Prophecy
    74. The Quiet American -- AKA search for: The Quiet American
    75. The Raven -- AKA search for: The Raven
    76. The Return -- AKA search for: The Return
    77. The Ring -- AKA search for: The Ring
    78. The Scarlet Letter -- AKA search for: The Scarlet Letter
    79. The Shaggy Dog -- AKA search for: The Shaggy Dog
    80. The Shining -- AKA search for: The Shining
    81. The Staircase -- AKA search for: The Staircase
    82. The Taking of Pelham One Two Three -- AKA search for: The Taking of Pelham One Two Three
    83. The Taming of the Shrew -- AKA search for: The Taming of the Shrew
    84. The Ten Commandments -- AKA search for: The Ten Commandments
    85. The Texas Chainsaw Massacre -- AKA search for: The Texas Chainsaw Massacre
    86. The Thomas Crown Affair -- AKA search for: The Thomas Crown Affair
    87. The Three Musketeers -- AKA search for: The Three Musketeers
    88. The Time Machine -- AKA search for: The Time Machine
    89. The Unforgiven -- AKA search for: The Unforgiven
    90. The Vanishing -- AKA search for: The Vanishing
    91. The Warrior -- AKA search for: The Warrior
    92. The Way West -- AKA search for: The Way West
    93. The Wind in the Willows -- AKA search for: The Wind in the Willows
    94. The Wizard of Oz -- AKA search for: The Wizard of Oz
    95. Thomas Crown Affair -- AKA search for: Thomas Crown Affair
    96. Titanic -- AKA search for: Titanic
    97. To Be or Not To Be -- AKA search for: To Be or Not To Be
    98. Together -- AKA search for: Together
    99. Tom Jones -- AKA search for: Tom Jones
    100. Tom Sawyer -- AKA search for: Tom Sawyer
    101. Treasure Island -- AKA search for: Treasure Island
    102. Twilight -- AKA search for: Twilight
    103. Two Women -- AKA search for: Two Women

    Here is a small sample of some of the movies I've already done:

    <rename id="tt0400150" title="Alive (Musical)" />
    <rename id="tt0331834" title="Alive (Japan)" />
    <rename id="tt0020629" title="All Quiet on the Western Front (Original)" />
    <rename id="tt0041113" title="All the King's Men (Original)" />
    <rename id="tt0216534" title="All the King's Men (TV)" />
    <rename id="tt0220969" title="All the King's Men (UK)" />

    <rename id="tt1334515" title="Bad Company (UK)" />
    <rename id="tt0068245" title="Bad Company (Western)" />
    <rename id="tt0112443" title="Bad Company (Thriller)" />

    <rename id="tt0078879" title="Bloodline (Classic)" />
    <rename id="tt0488967" title="Bloodline (Cult)" />
    <rename id="tt0462201" title="Bloodline (Urban)" />
    <rename id="tt1190537" title="Bloodline (Docu)" />
    <rename id="tt0434805" title="Bloodlines (TV)" />

    <rename id="tt0106438" title="Blue (UK)" />
    <rename id="tt1223922" title="Blue (India)" />
    <rename id="tt0326773" title="Blue (Japan)" />
    <rename id="tt1148198" title="Blue (Urban)" />

    <rename id="tt0003772" title="Cinderella (Original)" />
    <rename id="tt0845442" title="Cinderella (Korea)" />
    <rename id="tt0128996" title="Cinderella (TV)" />
    <rename id="tt0129672" title="Cinderella (TV/Original)" />
    <rename id="tt0057950" title="Cinderella (TV/Remake)" />
    <rename id="tt0910852" title="Cinderella (Video)" />

    <rename id="tt0163596" title="Dark Water (Canada)" />
    <rename id="tt1677677" title="Dark Waters (UK)" />
    <rename id="tt0391908" title="Dark Waters (Video)" />

    <rename id="tt0821767" title="Dr. Jekyll and Mr. Hyde (Cartoon)" />
    <rename id="tt0070002" title="Dr. Jekyll and Mr. Hyde (Musical)" />
    <rename id="tt0011130" title="Dr. Jekyll and Mr. Hyde (Rare)" />
    <rename id="tt0022835" title="Dr. Jekyll and Mr. Hyde (Original)" />
    <rename id="tt0033553" title="Dr. Jekyll and Mr. Hyde (Classic)" />
    <rename id="tt1159984" title="Dr. Jekyll and Mr. Hyde (TV)" />
    <rename id="tt0346899" title="Dr. Jekyll and Mr. Hyde (Video)" />
    <rename id="tt0340083" title="Dr. Jekyll and Mr. Hyde (UK)" />


    Hopefully that makes clear which monikers I've used so far to differentiate movies. Mostly it is just adding "(Original)" or "(TV)" to the end, but a lot of times you will find the original was made in, say, 1935, then a more popular remake followed in something like 1956 (these are the ones I used "(Classic)" for), and the movie most people know came out in 1990 or later. It is also possible that the very old movie remains the most popular; this is especially the case when the other movie with the same title is a different movie (not a remake, reboot, etc.).

    NOTE #1: Only look for movies that have the *EXACT* same title, skip TV-series (you will see them listed between quotes), and skip any foreign movie where its English translated title is the same, because users can simply enable the existing IMDb+ scraper-script option for that to add Original/Foreign title after the English one.

    NOTE #2: Remember that the main movie, either most recent or most popular (it's 99% of the time the movie that shows up first on the imdb.com results) is *NOT* added to the rename database and left as-is. This way a new user who will mainly have those popular/recent versions of the movie in their collection will never even notice a difference and none of their titles are renamed for these 2+ movies with same title.
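
    As a rough illustration of the idea (not the actual scraper-script code), here is a minimal Python sketch of how such entries could be looked up. It assumes the entries sit inside a `<renames>` root element and that the lookup is keyed on the IMDb id; `load_renames` and `apply_rename` are made-up helper names. A movie whose id is not listed simply keeps its scraped title, which is the behaviour NOTE #2 describes for the main movie.

    ```python
    import xml.etree.ElementTree as ET

    # Sample entries copied from the list above. The <renames> wrapper is an
    # assumption, added only so the snippet parses as a single XML document.
    SAMPLE = """
    <renames>
      <rename id="tt1334515" title="Bad Company (UK)" />
      <rename id="tt0068245" title="Bad Company (Western)" />
      <rename id="tt0112443" title="Bad Company (Thriller)" />
    </renames>
    """

    def load_renames(xml_text):
        """Return a dict mapping IMDb id -> replacement title."""
        root = ET.fromstring(xml_text.strip())
        return {node.get("id"): node.get("title") for node in root.iter("rename")}

    def apply_rename(renames, imdb_id, scraped_title):
        """Use the database title if the id is listed, else keep the scraped title."""
        return renames.get(imdb_id, scraped_title)

    renames = load_renames(SAMPLE)
    print(apply_rename(renames, "tt0068245", "Bad Company"))
    # "tt0000000" is a placeholder id that is not in the sample entries.
    print(apply_rename(renames, "tt0000000", "Bad Company"))
    ```

    The first call picks up the "(Western)" title from the sample; the second falls through and leaves the title untouched.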

    NOTE #3: Only go for 'actual' movies, so skip shorts, unless they are popular/vital (such as the '9' original short movie that is already in the database). I came across some titles with 100+ shorts and it borders on insanity to start adding those. In the end I decided to mainly go for the movies that have a little cover image in the imdb AKA results, unless of course they are popular versions that lack a cover (so it sometimes still requires checking).

    NOTE #4: Keep in mind that as you come across titles, they might already exist in a series addition to the rename database. For example, I left 'The Bourne Identity' in the list, even though it is already part of the database. This means the 1988 version that was made for TV and has the exact same title does not need to be added (unless there are others as well), because both movies will already differ as "The Bourne I: Identity" versus "The Bourne Identity". But if you want to add the 1988 one as "The Bourne Identity (TV)", then go right ahead :cool:

    NOTE #5: If you are really stuck on some movies, add a little note then, so I can go over them.

    :D
    [/collapse]
     

    zicoz

    MP Donator
  • Premium Supporter
  • September 3, 2006
    896
    63
    Home Country
    Norway Norway
    Re: IMDb+ Scraper - Movie Scraping on Steroids v3.3.8

    I could provide you with a 0-byte test file, but considering the folder has 422 files and 30 folders inside it, it might be better for me to provide some logs to begin with at least. :)

    Log files come in multiple variations:

    1. Info log files -> they roughly say what is going on, but they do not allow me to find out 'why'
    2. Debug log files (MediaPortal Config log mode changed to 'debug') -> gives me a much better idea as to why things happened
    3. Scraper-debug mode enabled (MovingPictures special option via 'Gear' icon inside importer tab in config) -> Allows me exactly to see everything that happened. However for 80% of the problems the regular debug mode is enough and much easier to read.

    IMPORTANT NOTE: MovingPictures is multi-threaded, meaning it will do multiple things at the same time. This makes log files extremely hard to read, so to make it easier on me import just *ONE* movie. In scraper-debug mode this is a necessity for another reason as well, because it generates almost 2MB of log PER movie.

    To illustrate multi-threaded log files:


    1. Multi-threaded log line for Movie A
    2. Multi-threaded log line for Movie B
    3. Multi-threaded log line for Movie A
    4. Multi-threaded log line for Movie C
    5. Multi-threaded log line for Movie B
    6. Multi-threaded log line for Movie A
    7. Multi-threaded log line for Movie C
    8. Multi-threaded log line for Movie A
    9. Multi-threaded log line for Movie B
    10. Multi-threaded log line for Movie B
    11. Multi-threaded log line for Movie C
    12. Multi-threaded log line for Movie C

    1. Multi-threaded log line for Movie A
    2. Multi-threaded log line for Movie A
    3. Multi-threaded log line for Movie A
    4. Multi-threaded log line for Movie A

    As you understand, reading the 2nd log is much easier for me.
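
    To make the difference concrete, here is a small Python sketch that takes the interleaved placeholder lines from the first illustration and groups them per movie. It is only an illustration: the line text is the placeholder wording used above, not real MovingPictures log output, and `group_by_movie` is a made-up helper name. It works here only because each placeholder line names its movie; the sketch makes no claim about whether real log lines can be split this way.

    ```python
    from collections import defaultdict

    # Same order as the first (interleaved) illustration above.
    # The line text is placeholder wording, not real log output.
    ORDER = "ABACBACABBCC"
    interleaved = [f"Multi-threaded log line for Movie {m}" for m in ORDER]

    def group_by_movie(lines):
        """Group lines by the movie label at the end of each line."""
        groups = defaultdict(list)
        for line in lines:
            movie = line.rsplit(" ", 1)[-1]  # last word is the movie label
            groups[movie].append(line)
        return dict(groups)

    groups = group_by_movie(interleaved)
    for movie, lines in groups.items():
        print(f"Movie {movie}: {len(lines)} lines")
    ```

    Importing a single movie avoids the need for any such untangling, because every line then belongs to the same import.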

    You provided Info log files, which indeed tell me the steps you took: I see it failed to find a match, you manually approved it, it found the movie, activated the rename system and obtained artwork. But I cannot see 'why' it did that, so please reproduce the problem and give me debug log files (if I'm still unable to find the cause I'll ask for scraper-debug ones). You do not even need to manually approve the movie; the moment MovingPictures shows you that blue info icon that awaits your manual approval, the log file already contains all I need (and is shorter and easier to read).

    OK, I opened up MovingPictures config, deleted all my log files and imported one movie. These are the logs I have now:


    View attachment movingpictures.zip

    Both MP-config and scraper should now be in debug mode.
     
