Rename database expansion assistance

aquilan

MP Donator
  • Premium Supporter
  • March 29, 2007
    9
    1
    Home Country
    England England
    RoChess,

    If much of the work needed to keep your rename database up to date (i.e. the 350 or so titles you mention) is simply cutting and pasting correct titles against the IMDB ID codes into an XML sheet (I assume that's what the 'Rename dBase IMDb+ Scraper.xml' file contains), is there any way other MediaPortal users like myself can help? I'm just wondering whether some of us could take some of the mundane workload off your hands and give you time to get on with coding new features, bug tracing, etc.
     

    aquilan

    You can verify the current position via the "IMDb+ Info" screen, as explained via the Wiki link. As you can see in the screenshot example, it has to show 'First' to work properly on a newly imported movie. Which brings us to how you can fix it. Use the "Force IMDb+" hidden menu option in the plugin to convert all your imdb.com scraper-script imported movies over to the IMDb+ scraper-script. Then use the "Refresh" hidden menu option to fix any problems from those original imports.

    This part I had already done on the system that reported so many titles wrongly with the IMDB prefix, to ensure that IMDB+ had overridden the standard scraper, and it did appear to make a difference with some titles.

    And unfortunately, you then have to open the MovingPictures config, go to the "Importer" tab, click on the "Manage manual data sources" button and reposition the IMDb+ scraper in first position. Hopefully the next IMDb+ plugin release will be out soon and will make this easier to correct.

    Unfortunately, whilst I have manually adjusted this before (prior to using IMDB+), I didn't think to check it this time round, but it seems obvious now that you mention it. I'm guessing this answers my question as to why I received such different renaming results on the two different systems running Moving Pictures with the same titles. It's most likely that even though I'd ensured IMDB+ had been 'forced' in its own hidden plugin menu and performed a refresh there, the original IMDB scraper must have remained in first place on the manual settings. I'll have to check it out the next time I get the chance to pop round to my friend's place and have a look at their system to see if this is the case.

    To see info about the IMDb+ plugin and IMDb+ scraper-script you have, use the hidden menu -> IMDb+ Info option.
    This is explained with pictures in the Wiki guide - Version Check

    Thanks for the tip and the link but fortunately I'd already done my homework and made sure to read the Wiki. ;) What I meant when I said about not knowing the version of IMDB+ on the other system was that I hadn't thought to check the hidden menu version details before I left and came home to retest what I'd seen there on my own system, not that I didn't know where the option was or that it existed. Apologies for not making it clearer.

    Because of the strange differences I'd seen between their system and mine, I'd wondered whether you'd pushed out an update, since the plugin had registered an update when I re-entered it and since you'd previously said:

    As for playing catch-up, that is one of the main reasons the IMDb+ plugin now has that option to auto-update the IMDb+ scraper-script, because right now we have over 1,000 users, and if one person (or myself) finds a problem and reports it in such a way that I can reproduce and fix it, then most of the other users will not even realize something broke, because by then they will have already received the update.

    I mistakenly assumed that, as you'd mentioned this initially proving to be a simpler update than you'd anticipated, you might have pushed it out straight away. My mistake! :oops:

    With regard to the custom update for the 'T' list of films, would you want these added to the current XML or put into a new blank sheet? I'm guessing the latter, as it'll make it easier for you to review and check it meets your requirements. What I might do is sort through a few, post them over and see if you're happy with the results. But maybe not tonight, as it's a bit late on a Sunday evening for me to be doing homework. I've got films to watch. :D
     

    RoChess

    Extension Developer
  • Premium Supporter
  • March 10, 2006
    4,434
    1,897
    Re: IMDb+ Scraper - Movie Scraping on Steroids v3.3.8

    With regard to the custom update for the 'T' list of films, would you want these added to the current XML or put into a new blank sheet?

    As-is would be easiest for me, so that I can copy and paste them into the file I have. You can use paste2.org or a similar website, or even easier, just post it in here. And yes, please only give me the new entries, and I'll insert them in alphabetical order. There is a risk that you will do double work on some movies, but I have a simple method now to scan for that, so it is actually easier to correct afterwards than to verify each entry if I were to give you the rename database as I have it now.
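The scanning method mentioned above is not described in the thread. As a minimal sketch of one way such a check could work, the Python below compares the IMDb IDs in a submitted batch against those already in the database. The function name `already_covered` and the sample batch are illustrative assumptions; the entries themselves are taken from later posts in this thread.

```python
import re

# Matches the id attribute of a <rename ... /> entry, e.g. id="tt0208495".
ID_PATTERN = re.compile(r'id="(tt\d+)"')

def already_covered(existing, submitted):
    """Return submitted IMDb IDs that the existing database already contains."""
    known = set(ID_PATTERN.findall(existing))
    return [imdb_id for imdb_id in ID_PATTERN.findall(submitted) if imdb_id in known]

# Hypothetical excerpt of the existing database.
existing = '''
	<rename id="tt0208495" title="Taken (Classic)" />
	<rename id="tt0466338" title="Taken (Video)" />
'''

# Hypothetical new batch that repeats one ID by accident.
submitted = '''
	<rename id="tt0466338" title="Taken (Video)" />
	<rename id="tt0120855" title="Tarzan (Cartoon)" />
'''

print(already_covered(existing, submitted))
```

Anything the function returns is double work that can be dropped from the batch before merging.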

    I'm sure you will be able to manually correct things for your system and your friend, but hopefully we can figure out a way to prevent this from happening on a fresh install in the future, because it is very annoying :mad:
     

    aquilan


    I'm sure you will be able to manually correct things for your system and your friend, but hopefully we can figure out a way to prevent this from happening on a fresh install in the future, because it is very annoying

    To be honest, it's something I'm used to doing so it isn't really that much of a bind, especially if you're as picky about naming as I am. :) The very nature of film titles, along with the sheer number of variables to cover (not to mention people's own preferences for the formatting), means there'll never be a scraper that can cover every eventuality to suit everybody, or get it right 100% of the time, so I'm afraid manual adjusting comes with the territory. Unfortunately, it's those who expect things like this to work perfectly with no intervention on their part that we're seeking to please, and that is what tends to keep software like MediaPortal a fringe element of the HTPC world.

    But keep up the good work and keep striving for perfection. I look forward to the next major update.

    Will post some 'T' updates as soon as I can.

    :D

    Okay, a quick example for you so that I can see if I've gotten this right, although I can't believe the first 'T' on your list also happens to be possibly the hardest to organise, so I may need your opinion on this one. For the title 'Taken', even though there are many, many exact entries available on IMDB (talk about a popular title!), the majority can be discounted as they're shorts (and none are important ones as per your '9' example), thereby leaving five possibilities besides the most commonly known and popular release featuring Liam Neeson in 2008 (which I've ignored as per your guidelines). Of these five, two are straight-to-video films, so I'd expect to label these as:

    <rename title="Taken (Video)" id="tt1486820"/>
    <rename title="Taken (Video)" id="tt0466338"/>

    I haven't seen any other examples of your employing the term 'Video' in your current database though so don't know if you'd approve of that choice.

    Now of the remaining three, none are TV specials, classics, originals, black & whites, cartoons, etc., but are just plain run-of-the-mill film dramas. One could be labelled 'Horror', I guess, to differentiate it from the others (though that is probably not a good descriptive choice to use as an identifier), but I'm struggling to find other means to separate these three and give each a unique identity. Will they suffice as is, or would you have another suggestion?

    <rename title="Taken" id="tt0361038"/>
    <rename title="Taken" id="tt0208495"/>
    <rename title="Taken" id="tt0293627"/>
     

    RoChess


    Each rename database entry *HAS* to be unique on the title, otherwise there is no point in even renaming any of them as the problem of the same title is back then :)
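A minimal sketch of that uniqueness rule, written in Python purely for illustration: it parses a batch of `<rename>` lines and reports any title used more than once. The wrapper element `<entries>` and the function name `duplicate_titles` are assumptions made for this example and say nothing about how the real database file is structured; the three entries are the ones proposed in the post above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def duplicate_titles(fragment):
    """Map each title used more than once to the IMDb IDs sharing it."""
    # Wrap the bare <rename> lines in a throwaway root so they parse as XML.
    root = ET.fromstring("<entries>" + fragment + "</entries>")
    ids_by_title = defaultdict(list)
    for entry in root.iter("rename"):
        ids_by_title[entry.get("title")].append(entry.get("id"))
    return {title: ids for title, ids in ids_by_title.items() if len(ids) > 1}

# The three entries proposed earlier, which all share one title.
proposed = '''
<rename title="Taken" id="tt0361038"/>
<rename title="Taken" id="tt0208495"/>
<rename title="Taken" id="tt0293627"/>
'''

print(duplicate_titles(proposed))
```

An empty result means every title in the batch is unique; here all three IDs are reported under "Taken", which is exactly the conflict the rename database is meant to remove.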

    So let me show you how I would tackle that movie title.

    AKA Search for 'Taken'

    The first "Taken (2008)" is obviously the popular one that we want to keep as-is. So that will be the movie that will get no entry in the rename database, so it will just be "Taken".

    In the same "Popular Titles" segment, you can skip 2, 3 and 4, because they are either TV-series or do not actually share the same title (one of the AKA titles matched, but we skip those, because the add original/foreign language scraper-script option already takes care of those).

    Which brings us to the "Titles (Exact Matches)" section with 13 movies in it (trust me, that is a very small amount compared to some of the movies I've already done with 100+ entries in that area). Then I open all 13 items via CTRL+click in background tabs and close all the ones that are 'short'. That leaves me with "Taken (1999)", "Taken (2002)", "Taken (2003)", "Taken (Video 2001)", "Taken (Video 2009)". If it's a gigantic amount of movies, then I would have only opened the ones that have a cover image, but at 20 or less it usually means I have to open them all (as you will see on these 13).

    So those are all movies, which means 5 entries for the rename database. I would then name them as follows to differentiate them:

    Code:
    	<rename id="tt0208495" title="Taken (Classic)" />
    	<rename id="tt1486820" title="Taken (Docu)" />
    	<rename id="tt0466338" title="Taken (Video)" />
    	<rename id="tt0293627" title="Taken (Cult)" />
    	<rename id="tt0361038" title="Taken (Horror)" />

    My reasoning for the naming is as follows: I first look at the year to find the oldest movie. If it shares a storyline with the one we skipped, "Taken (2008)", then it becomes "(Original)". If it's a really old one, like 1900-1920, then I usually go with "(Rare)", because there is usually a color version remake from 1920-1940 that is more worthy of the "(Original)" moniker. Any 1940-1980 version then usually becomes "(Remake)", unless there are other aspects I can use to differentiate.

    So now that we have found the easy classic one (there is no original for this movie), I look at the others, and I notice there is only one documentary. So I flag that one via "(Docu)". That then leaves one more 'video' release, so that becomes "(Video)". Now it comes down to the hard ones. Fortunately you can see on the "Taken (2003)" page that it has a very low score; scroll down more and you can see it was made on a very cheap budget as well, which makes it a perfect candidate for a cult classic, aka "(Cult)". Then we are down to the final one. It's not made in the UK (you can use 'country' for that), nor a made-for-TV movie. The main identifier that differentiates it from our "Taken (2008)" goal is that it clearly is a horror, so then I decide to just go with "Taken (Horror)", because if it was me, that's how I would remember it if I had it in my collection.

    You have to remember that 99% of the users will only have the one "Taken (2008)" in their collection, another 0.9% might then have the "Taken (1999)" movie as well (it's actually a nice movie), but it is down to the extremely OCD user to have all 5.

    PS: please keep the 'exact' format/syntax in check, so it is an easier copy and paste for me, including the leading TAB character before "<rename", the order of id+title+sortby (optional) and the space before end "/>" part.
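The format described above (leading TAB, attributes in the order id, title, optional sortby, and a space before the closing "/>") can be produced mechanically. The following Python helper is only a sketch: the function name `format_rename` is made up for this example, and the attribute name `sortby` is assumed from the wording of the post rather than confirmed against the real file.

```python
from xml.sax.saxutils import escape

def _attr(name, value):
    # Escape &, <, > and double quotes so the value is safe inside "...".
    quoted = escape(value, {'"': "&quot;"})
    return f'{name}="{quoted}"'

def format_rename(imdb_id, title, sortby=None):
    """Build one rename line: TAB, id, title, optional sortby, then ' />'."""
    attrs = [_attr("id", imdb_id), _attr("title", title)]
    if sortby is not None:
        # 'sortby' as an attribute name is an assumption based on the post.
        attrs.append(_attr("sortby", sortby))
    return "\t<rename " + " ".join(attrs) + " />"

# Reproduces the first line of the code block in the post above.
print(format_rename("tt0208495", "Taken (Classic)"))
```

Generating the lines this way keeps the attribute order and spacing consistent, although a forum editor may still strip the leading TAB when the text is pasted.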

    PPS: You can of course now skip 'Taken', as I already added this one to the file now :cool:
     

    ltfearme

    Community Plugin Dev
  • Premium Supporter
  • June 10, 2007
    6,751
    7,196
    Sydney
    Home Country
    Australia Australia

    I think you may have scared him, RoChess :)
     

    aquilan


    PS: please keep the 'exact' format/syntax in check, so it is an easier copy and paste for me, including the leading TAB character before "<rename", the order of id+title+sortby (optional) and the space before end "/>" part.

    I had made a note of the format layout from your own rename dBase scraper XML file in order to ensure I followed it exactly and saved you having to adjust anything by hand (which would defeat the object of helping out in the first place), so I assumed it would be correct. But after seeing the very different layout you presented, I forced the XML to open in Firefox (it had by default originally opened in Internet Explorer, which I don't use) and noticed that it now looked exactly as per your examples. So, apologies for the formatting error, but it would seem to be an issue with the way in which IE displays the XML layout onscreen, which explains why I got it wrong.


    PPS: You can of course now skip 'Taken', as I already added this one to the file now

    Okay, this is how I tackled the 'Tarzan' entries in IMDB:

    Within the two popular titles, the first entry is likely the most well-known movie (seeing as Tarzan is better known as a TV series than as a specific film), but as it's a cartoon I've still labelled it as such. The second entry is a Tarzan film but has its own specific title, so I'm assuming it doesn't need separating under the Tarzan grouping, but if you want the entry anyway, here it is:

    <rename id="tt0087365" title="Greystoke: The Legend Of Tarzan, Lord Of The Apes" />

    This does beg an important question, actually, as to whether the colon ":" is acceptable, as I know it isn't usually a recognised character in a lot of Windows apps and so tends to be replaced with a dash "-" instead. Is that the case here? In other words, it would then look like this:

    <rename id="tt0087365" title="Greystoke - The Legend Of Tarzan, Lord Of The Apes" />

    Now, moving on to the exact matches, the first, second, fifth and ninth entries (1932, 1981, 1999 & 1985) again have their own specific titles, so they shouldn't need to be identified within the 'Tarzan' grouping, seeing as the scraper should name them as they are and so they will be uniquely identifiable. Again though, if you require these entries, then here they are:

    <rename id="tt0023551" title="Tarzan the Ape Man (Original)" />
    <rename id="tt0083170" title="Tarzan the Ape Man (Remake)" />

    Note: This second entry above is not an actual remake so maybe better titled as "Video" but it does feature some 'A' list stars of the time so did receive a proper cinematic release.

    <rename id="tt0217081" title="Tarzan of the Apes (Cartoon)" />
    <rename id="tt0364049" title="Adventures of Tarzan (Bollywood)" />

    The third, fourth, sixth and eighth entries (2003, 1966, 1991 & 1996) I've ignored as they're TV series.

    The seventh and tenth (1999 & 1984) I've also ignored as they're video games.

    The eleventh and fourteenth (1990 & 2004) I've ignored as they're film shorts of no consequence.

    So that leaves the following as proper entries to be accounted for taken from the first popular title match and thirteen/fourteen from the exact matches:

    <rename id="tt0120855" title="Tarzan (Cartoon)" />
    <rename id="tt1705952" title="Tarzan (German)" />
    <rename id="tt0918940" title="Tarzan" />

    The last two of the three titles above are both still in development and have very few details available. It's unknown if they'll even see the light of day, but I've noticed you listing films that haven't yet been released, so I assume you would still want them entered into the database. One appears to be German produced, hence the subtitle, and the last one seems to be a possible big-budget new version. As there aren't any well-known or popular film versions for Tarzan (other than Greystoke), I'm guessing this may become the default one, so I've left it unsubtitled.

    Okay, how did I fare this time round? ;)

    P.S. Just in case you think I missed it, I didn't forget the tabbing for each entry; it just seems to have been removed by the forum engine when posted.
     

    RoChess


    Okay, how did I fare this time round? ;)

    MovingPictures will show and use : characters fine and they should be used as such.

    Only deal with entries that have an exact duplicate movie title at imdb.com (as in the blue clickable link part on that AKA search page), so you do not even have to bother with the 2nd entry under popular titles of "Greystoke: The Legend of Tarzan, Lord of the Apes", because that movie will be imported as-is and have a different title that will not cause a conflict. Only if there were *another* movie called "Greystoke: The Legend of Tarzan, Lord of the Apes" made in a different year would we have to put an entry in for one of them, because the goal is to never have movies show up with the same title inside MovingPictures.

    That means the only entries that actually end up causing a problem are "Tarzan (1999)", "Tarzan (2012)" and "Tarzan (2013)". However, we are still in 2011, and nothing is really known about the other movies (pre-production according to their page), so you can skip those as well. They have to at least contain runtime info and actor+crew info to be considered, unless they are a sequel to an existing movie. For example, I already had the next 'Die Hard' movie in the database, and I just edited that one a few days ago when the subtitle 'A Good Day to Die Hard' became known (same with the next James Bond, now called 'Skyfall').

    But the only way I can add those future ones easily is if it is the 3rd or higher movie in the series. The reason is that otherwise I also have to add an entry for the original one, and then there is the problem that people, for example, see "Avatar I" and get confused, because we still have to wait a long time for "Avatar II" to even be in theaters. I solved that by still adding it and using the sortby trick to keep the title as-is, but then I still have to edit it later when the sequel is behind us. This is another reason for that option in which one can strip the 'I' from first movies in the series (but retain the sortby).

    So getting back to Tarzan, the only movie left that is actually called "Tarzan", is not a TV-series, not a short, not a video-game, not an empty future one... is actually the "Tarzan (1999)" one, so after all that work, no entry has to be made into the database, because there is no conflict with any other (until next year when more info will become available for the two future ones, but I'm signed up for RSS alerts on that).
     

    RoChess


    My bad, "Tarzan the Ape Man" does come up with titles that have conflicts.

    However "Tarzan the Ape Man" and "Tarzan, the Ape Man" are different. It's only a comma, but that doesn't matter, it is still different.

    Granted, the "Tarzan, the Ape Man (1959)" is in fact a remake of the 1932 one, but in this case we don't have to bother with that, because the titles are different.

    There is however one problem then and that is that there is also the "Tarzan, the Ape Man (1981)" version. It has the exact same title as the 1959 one, but it's a new concept on the story (different POV), so we have to differentiate the two.

    The easiest way to achieve that is to overrule the 1959 one having a different title and name that one "Tarzan the Ape Man (Remake)", so it stays together with the 1932 one and gets the point across.

    Could have also gone with "(Cult)" for the 1981 one with Bo Derek, due to the low score, but that becomes a judgement call then. The rule that I personally use is what movie the majority of users will actually have in their collection (I prefer not to adjust the title on those), and then the 1981 movie trumps the 1959 one.
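The exact-match rule used in this post (a comma is enough to make two titles different) can be sketched as a grouping on the raw title string. This Python snippet is only an illustration, not part of the scraper; the function name `exact_title_conflicts` is made up, and the list holds the three releases discussed here.

```python
from collections import defaultdict

def exact_title_conflicts(movies):
    """Group (title, year) pairs by exact title; keep titles used more than once."""
    years_by_title = defaultdict(list)
    for title, year in movies:
        # No normalisation on purpose: punctuation such as a comma counts.
        years_by_title[title].append(year)
    return {title: years for title, years in years_by_title.items() if len(years) > 1}

movies = [
    ("Tarzan the Ape Man", 1932),
    ("Tarzan, the Ape Man", 1959),
    ("Tarzan, the Ape Man", 1981),
]

print(exact_title_conflicts(movies))
```

Only the 1959 and 1981 releases are reported as a conflict, which is why one of those two needs a rename entry while the comma-less title does not.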
     

    aquilan


    Only deal with entries that have an exact duplicate movie title at imdb.com (as in the blue clickable link part on that AKA search page), so you do not even have to bother with the 2nd entry under popular titles of "Greystoke: The Legend of Tarzan, Lord of the Apes", because that movie will be imported as-is and have a different title that will not cause a conflict.

    I had already noted this as being the case, as per where I said:

    The second entry is a Tarzan film but has its own specific title so I'm assuming doesn't need separating under the Tarzan grouping

    and

    Now, moving on to the exact matches, the first, second, fifth and ninth entries (1932, 1981, 1999 & 1985) again have their own specific titles so shouldn't need to be identified within the 'Tarzan' grouping seeing as the scraper should name them as they are and so they will be uniquely identifiable.

    With regard to the two 'Tarzan The Ape Man' titles, I would guess they'd get their own entry separate from the 'Tarzan' grouping due to their being duplicates. I know the 1981 version has a comma after the word 'Tarzan', but it's such a minor difference that I would still count them as duplicate titles and add a bracketed subtitle ('Cult', as you mentioned, sounds good for the 1981 entry), with the 1932 version either subtitled '(Original)' or simply left as plain 'Tarzan The Ape Man'. I couldn't initially see where you got the 1959 version from but found it after filtering for films entitled 'Tarzan The Ape Man'. So, for me personally, I would group these three in the database as follows (in the order of the 1932, 1959 & 1981 versions):

    <rename id="tt0023551" title="Tarzan the Ape Man" />
    <rename id="tt0053335" title="Tarzan, the Ape Man (Remake)" />
    <rename id="tt0083170" title="Tarzan the Ape Man (Cult)" />

    I wouldn't recommend relying on the comma as a means to distinguish the 1932 and 1981 versions from each other as, if you'll notice, no such comma appears within the text on the cover art for the 1981 film so you could argue that they are in fact both titled 'Tarzan The Ape Man' and not 'Tarzan, The Ape Man' and that IMDB have listed it incorrectly. Plus, I doubt it's something most users would necessarily take much notice of in the Moving Pictures film listing as a means to distinguish them and see them as separate entities.

    I'll get to work on some titles for 'T' and ship them over for your perusal as soon as I've completed a few.
     
