Req: Music database update speeded up

HappyTalk

Portal Pro
July 16, 2006
307
8
UK
I've had to stop using the music database because the initial scan of 40k MP3s takes a long time, which is fair enough with all those ID3 tags to read. However, when you later add a few new MP3s, rather than just scanning the new ones it seems to rescan the whole lot, presumably checking each MP3's ID3 tags, as it takes hours and hours all over again. I guess I could make it rescan at night.

I wondered if it would be practical to store the file's modified time in the database, using that and the file's full pathname as a unique key. A fast folder rescan could then just add each file it finds to the database, which would reject any unchanged duplicates. At the end, the old entries for updated files can be removed en masse by deleting duplicate pathnames (disregarding modified time and using a db dirty_flag to select the previous entry, not the new one). The dirty_flag can then be used again to select the newly added records that need their ID3 data scanned in. Dunno if stored procedures are available/used to make this quicker?
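A minimal sketch of the scheme described above, written in Python with the standard sqlite3 module rather than MediaPortal's own code. The table and column names (tracks, path, mtime, dirty, title) and the read_tags callback are made up for illustration and are not MediaPortal's actual schema. Files deleted from disk are not handled here.

```python
import sqlite3

def open_db(path=":memory:"):
    """Open the database and create the hypothetical tracks table."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS tracks (
               path  TEXT    NOT NULL,
               mtime INTEGER NOT NULL,
               dirty INTEGER NOT NULL DEFAULT 1,
               title TEXT,
               UNIQUE (path, mtime)
           )"""
    )
    return db

def rescan(db, found, read_tags):
    """found: list of (path, mtime) pairs from a folder walk.
    read_tags: callable taking a path and returning its tag data.
    Returns the paths whose tags were actually read."""
    # Unchanged files collide with the UNIQUE key and are skipped.
    db.executemany(
        "INSERT OR IGNORE INTO tracks (path, mtime, dirty) VALUES (?, ?, 1)",
        found,
    )
    # Remove the old row of every path that now has a newer, dirty row.
    db.execute(
        """DELETE FROM tracks
           WHERE dirty = 0
             AND path IN (SELECT path FROM tracks WHERE dirty = 1)"""
    )
    # Only new or changed files need the slow tag read.
    rows = db.execute(
        "SELECT path FROM tracks WHERE dirty = 1 ORDER BY path"
    ).fetchall()
    for (path,) in rows:
        db.execute(
            "UPDATE tracks SET title = ?, dirty = 0 WHERE path = ? AND dirty = 1",
            (read_tags(path), path),
        )
    db.commit()
    return [path for (path,) in rows]
```

On a second call with the same list plus a few new entries, only the new or changed paths reach read_tags, so the cost of a rescan is one folder walk and one bulk insert.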


(Sorry for all the posts, I wanted to get them out of the way and get opinions before I start playing with the code myself (just for fun!) to test the feasibility.)
 

HappyTalk

Examining the database I see it uses a dwFileNameCRC, so I presume it has to re-read each file in its entirety to generate the checksum and compare it to the original one to detect any changes. I guess this is what is taking the time: generating a CRC for 40k unchanged records to detect the 14 new ones!

I can see that it was done for 100% accuracy in detecting changes, but it seems to be overkill. I can see 2 solutions:-

1) Quick n' dirty. Simply add a date field to the Music | Music database dialog in the MediaPortal Setup, so that only files whose modified date is after this date are added. The default value can be the date of the previous scan. Obviously this option can be disabled to restore the current functionality. I have already used this method in a program I wrote for backing up MP3s/any files to DVD-R, to select 4.2 GB chunks of files after a certain date.

2) Replace the filename CRC with the last modified date. Obviously it's a lot, lot quicker to compare a file's current modified date with its previous one to detect a change than to generate CRCs. You could argue this might give false results in a minority of cases, but in real situations I'm sure it would be correct 99.9% of the time and would save hours and hours. Alternatively a 'file_modified_time' field could be added alongside the CRC field, and a quick scan or a long scan (as it is now) could be offered.
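As a rough illustration of the second option, the sketch below compares each file's current modified time with a previously stored one and reports only the files that differ. It is Python, not MediaPortal code. The known_mtimes dictionary stands in for whatever the database would hold, and the function name is made up.

```python
import os

def find_changed(root, known_mtimes, extensions=(".mp3",)):
    """Return sorted (path, mtime_ns) pairs for new or modified files.

    known_mtimes maps a full path to the modified time recorded at the
    previous scan (a stand-in for a 'file_modified_time' db field).
    """
    changed = []
    for folder, _dirs, names in os.walk(root):
        for name in names:
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(folder, name)
            # A stat call reads file metadata only, not the file contents.
            mtime = os.stat(path).st_mtime_ns
            if known_mtimes.get(path) != mtime:
                changed.append((path, mtime))
    return sorted(changed)
```

Only the paths returned here would need their tags re-read. A file whose contents changed while its modified time stayed the same would be missed, which is the minority case mentioned above.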

For me the first method is easiest to code as I've already written this before and am about to give it a go to see how it works out.
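For comparison, the first method needs no per-file state at all, just one remembered date. A minimal Python sketch, assuming the cutoff is kept as seconds since the epoch (the function name and the None convention are invented for this example):

```python
import os

def files_modified_after(root, cutoff, extensions=(".mp3",)):
    """Return sorted paths under root modified later than cutoff.

    cutoff is seconds since the epoch. Passing None disables the
    filter, so every matching file is returned as in a full scan.
    """
    hits = []
    for folder, _dirs, names in os.walk(root):
        for name in names:
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(folder, name)
            if cutoff is None or os.stat(path).st_mtime > cutoff:
                hits.append(path)
    return sorted(hits)
```

One caveat with this approach: a file copied in with an older modified date than the last scan would be skipped, so the option to turn the filter off remains useful.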
 

HappyTalk

Hmmm, well I haven't looked into how efficient the dotnet filewatcher thingy is, though I tend to reduce the number of external services running as far as possible, and am not a big fan of more apps sitting watching folders. Sometimes a simple change in file explorer suddenly kick-starts some other program, CPU goes to 100% and I'm in Task Manager wondering what the hell is going on...

I planned to have a go at rethinking the SQL to see if I could improve on it, but have not had time yet. I've looked through the code, though, and fancy some crazy joins might do it, though I've never used SQLite.
 

hwahrmann

Development Group
  • Team MediaPortal
  • September 15, 2004
    4,633
    2,457
    Vienna, Austria
    Home Country
    Austria
    HappyTalk,

    If your PC is a dedicated machine, with MP running continuously, then Musicsharewatcher is run as an MP plugin.
    No additional service at all.

    However, you always have the choice to start it as an external app before copying/updating your MP3s, and afterwards you can terminate it from the system tray.
    No magic and no overhead at all.


    regards,

    Helmut
     

    HappyTalk

    OK, fair enough, though in my case I block add MP3s maybe once a month, maybe less, but I would presumably have something scanning for changes 24/7. I'm having problems getting MP to run efficiently on a dedicated clean machine with a Sempron 2800 anyway, so that probably skews my thinking. I still can't help but think there is a much simpler way though.
     

    rtv

    Retired Team Member
  • Premium Supporter
  • April 7, 2005
    3,622
    301
    Osnabruck
    Home Country
    Germany
    HappyTalk wrote: "Examining the database I see it uses a dwFileNameCRC so I presume it has to re-read each file in its entirety to generate the checksum ..."

    Nope - the CRC is just created for the filename itself (not the entire file) to have a unique, fast db field which is also suitable for assigning cover art.
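To illustrate the distinction, here is a small Python sketch of a checksum computed over the filename string only. It uses zlib.crc32 as a stand-in, since the exact CRC routine and any path normalisation MediaPortal applies are not stated in this thread. The lower-casing step is an assumption.

```python
import zlib

def filename_crc(path):
    """CRC-32 of the path text itself; the file is never opened."""
    # Lower-casing is an assumption, made so that paths differing only
    # in case map to the same key on a case-insensitive file system.
    return zlib.crc32(path.lower().encode("utf-8"))
```

Because only a short string is hashed, computing this for 40k paths is quick. It also cannot detect changes to a file's contents, so it is not what a rescan would rely on to spot modified tags.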
     
