SQLiteDatabase Plugin for MP2

morpheus_xx

Retired Team Member
  • Team MediaPortal
  • March 24, 2007
    12,073
    7,459
    Home Country
    Germany
    I need to explain the queries above: they are filters for "Album" covers, where not a single MediaItem is queried, but the thumbnail of the first MediaItem that has the requested album name.

    That's why I introduced LIMIT and OFFSET: the album query returns 20 titles/thumbnails, but I only need the first.

    I already planned to do this "paging" inside the SQL layer; there is a comment and a TODO for it. The problem is that SQLCE throws an error when a query combines JOINs and LIMIT. So maybe we create an extended DB interface for "paging" support; if a DB plugin doesn't support it, we fall back to the current way (skipping/taking rows on the IDataReader, which is slower) - see the sketch below.

    The "paging" support is also a main feature for @Valk 's data source virtualization, so the DB can return only the needed range of a query.
     

    MJGraf

    Retired Team Member
  • Premium Supporter
  • January 13, 2006
    2,478
    1,385
    Thanks Morph,
    At least according to the specs, "limit x", "limit x offset y", as well as "limit y,x" should work in SQLite, also in combination with JOINs. Extending the ISQLDatabase interface accordingly, with a fallback for DBs that do not support this, is IMO a good idea.
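    Just to have the three variants spelled out (table and column names invented for illustration):
    Code:
    // All three statements are equivalent in SQLite and legal together with a JOIN.
    string limitOnly   = "SELECT T.TITLE FROM TRACKS T JOIN ALBUMS A ON A.ID = T.ALBUM_ID LIMIT 1";
    string limitOffset = "SELECT T.TITLE FROM TRACKS T JOIN ALBUMS A ON A.ID = T.ALBUM_ID LIMIT 1 OFFSET 0";
    string commaForm   = "SELECT T.TITLE FROM TRACKS T JOIN ALBUMS A ON A.ID = T.ALBUM_ID LIMIT 0, 1"; // offset, count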

    Just as a side note: I rebased to the current dev branch last night and since then there is no more LIMIT 1 at the end of the queries issued to fetch the covers. As a comparison:
    This is how such a query looked based on the older EXP_wip branch I used until yesterday evening:
    Code:
    [2013-09-21 19:26:24,808] [4423623] [39       ] [DEBUG] - MediaLibrary: Search time 00:00:00.0051193 for query MediaItemQuery: NecessaryRequestedMIATypes: [493f2b3b-8025-4db1-80dc-c3cd39683c9f, 1fda5774-9ac5-4873-926c-e84e3c36a966], OptionalRequestedMIATypes: [], Filter: [AudioItem.Album EQ 16 Biggest Hits], SortInformation: [MediaPortal.Common.MediaManagement.MLQueries.SortInformation] LIMIT 1
    And this is how it looks like based on current dev:
    Code:
    [2013-09-22 02:07:57,785] [87993  ] [35       ] [DEBUG] - MediaLibrary: Search time 00:00:00.0401839 for query MediaItemQuery: NecessaryRequestedMIATypes: [493f2b3b-8025-4db1-80dc-c3cd39683c9f, 1fda5774-9ac5-4873-926c-e84e3c36a966], OptionalRequestedMIATypes: [], Filter: [AudioItem.Album EQ 1], SortInformation: []
    As you can see, the "LIMIT 1" at the end is missing now. The code that logs this is unchanged as displayed above (just miq.ToString()).
    I don't know how this query is initiated - probably somewhere in a skin model of which I have no knowledge at all. Maybe you, morpheus_xx, can point me to where to look to make sure this is not a bug.
    Thanks, Michael
     

    MJGraf

    Retired Team Member
  • Premium Supporter
  • January 13, 2006
    2,478
    1,385
    Did I ever say that I don't see much room for improvement :p ?

    The issue I apparently had (and certainly still have to a large extent) is that I never worked with SQLite or System.Data.SQLite before, and to really tune a database engine for performance you have to know a lot about the internals of that database. While SQLite itself (the C library) is fairly well documented, the documentation of System.Data.SQLite is in my opinion very sparse. So the only way to find out how things work is to read the source code, and I have already found out some very interesting things. But let me start from the beginning:

    Very early in this thread I briefly explained why we switched to WAL mode ("write-ahead logging"): threads writing to the database do not block reading threads in WAL mode, which gives us a noticeable speed improvement, in particular when an MP2 Client is in use (i.e. reading the database) while an import is running (i.e. writing to the database).
    Technically (and very simplified) this works as follows: writers do not write to the database file itself, they write to a WAL file. Readers therefore first check whether the data they query is contained in the WAL file (i.e. it has been changed by a writer before). If so, they read from the WAL file (which does not affect writers, because writers simply append every new write transaction to the end of the WAL file). If they don't find the queried data in the WAL file, they read from the database file itself.
    This implies that at some point in time the content of the WAL file has to be transferred to the database file. This is called a "Check-Point". Such a Check-Point may (depending on how it is performed, but this is quite complicated, so let's just assume it is like that) block readers, and it in particular blocks the thread in which the Check-Point is performed.
    Now in SQLite's standard settings, there is a thing called "Auto-Checkpoint". This means that after a certain number of pages (default: 1.000) has been written to the WAL file, the next write commit automatically triggers a Check-Point operation. We use a page size of 4 KB, which means such a Check-Point is executed about every 4 MB written to the database. Now my database is 1.7 GB in size, which means there are more than 400 Check-Points during my initial import. You can watch that if you use the latest SQLiteDatabase plugin version and have a look into the database directory while an import is running. There is a file ending with "wal", which appears, grows to about 4 MB, disappears (= Check-Point) and then reappears, all the time.
    So my idea was that during an import I could switch off the Auto-Checkpoint feature and switch it back on when the import is finished. That would save the time of more than 400 Check-Points in favor of a single one, which may take a little longer, but my hope was that it wouldn't take as long as the 400 short Check-Points combined.
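    In pragma terms, the idea boils down to this - a minimal sketch, assuming an open System.Data.SQLite connection named connection (the pragmas are standard SQLite, the surrounding code is only illustrative):
    Code:
    using (var cmd = connection.CreateCommand())
    {
      // WAL mode itself is enabled like this (we already do that):
      cmd.CommandText = "PRAGMA journal_mode = WAL;";
      cmd.ExecuteNonQuery();

      // before the import: a value of 0 turns auto-checkpointing off
      cmd.CommandText = "PRAGMA wal_autocheckpoint = 0;";
      cmd.ExecuteNonQuery();

      // ... the import would run here ...

      // after the import: transfer the whole WAL content to the database file in one go
      cmd.CommandText = "PRAGMA wal_checkpoint(FULL);";
      cmd.ExecuteNonQuery();
    }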
    But what I found out was something completely different... I switched off the Auto-Checkpoint feature in my code, started a new import, watched the database directory and guess what - the WAL file appeared, grew to about 4 MB, disappeared and showed up again, all the time. What I was expecting was a WAL file that constantly grows to about 1.7 GB, is transferred to the database file in the end and only then disappears - but this was not the case. So apparently there were still regular Check-Points happening - but I had switched them off!!?!?

    After a lot of research on the internet I found out that there is one point in time when a Check-Point happens in any case - independent of the Auto-Checkpoint setting: when the last connection to the database is closed. At first I thought this was impossible, because I switched on the ConnectionPool feature in System.Data.SQLite, and if you do so, the expected behavior is that when you call "Close();" on a database connection it is not actually closed, but stays open and is returned to the ConnectionPool to be reused later. But apparently, this was not (really) the case.
    So I dug quite deep into the System.Data.SQLite code and found that they implemented the ConnectionPool with weak references! That means that during every garbage collection cycle all the connections which are currently in the pool can be collected (and therefore closed). This really shocked me, because it makes all the tests I did with SQLite settings pretty much useless.
    As an example: every connection maintains its own cache. When the connection is closed, the cache is deleted. So my tests on what cache size is optimal did not make sense, since most likely every connection was only used a few times (depending on how often the GC runs) and then closed. In such a scenario the cache size does of course not really matter...
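    For reference, the cache in question is SQLite's per-connection page cache, which is set via a standard pragma (the value here is just an example):
    Code:
    using (var cmd = connection.CreateCommand())
    {
      // effective per connection: number of pages to cache;
      // 16384 pages * 4 KB page size = 64 MB
      cmd.CommandText = "PRAGMA cache_size = 16384;";
      cmd.ExecuteNonQuery();
    }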

    So what we need here in the first place is our own implementation of a connection pool with strong references. Funnily enough, there is already an implementation in our FireBirdDatabase directory, which is, however, not used in the current implementation of the FireBird database. There is also a comment in the FireBirdDatabase code saying something like "we don't need the connection pool in the database connection string, we use our own implementation". @Albert are you still around? Is it possible that you found out the same about the FireBirdDatabase and therefore began to implement our own ConnectionPool?
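    Such a pool could look roughly like this (a minimal sketch with made-up names; this is not MP2's or the FireBirdDatabase's actual implementation):
    Code:
    using System.Collections.Concurrent;
    using System.Data.SQLite;

    public class StrongReferenceConnectionPool
    {
      private readonly string _connectionString;
      // a ConcurrentBag holds strong references, so the GC cannot collect pooled
      // connections - their caches survive between uses
      private readonly ConcurrentBag<SQLiteConnection> _pool = new ConcurrentBag<SQLiteConnection>();

      public StrongReferenceConnectionPool(string connectionString)
      {
        _connectionString = connectionString;
      }

      public SQLiteConnection Acquire()
      {
        SQLiteConnection connection;
        if (_pool.TryTake(out connection))
          return connection;
        connection = new SQLiteConnection(_connectionString);
        connection.Open();
        return connection;
      }

      public void Release(SQLiteConnection connection)
      {
        _pool.Add(connection); // keep the connection open instead of calling Close()
      }
    }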

    Anyway - I will implement our own ConnectionPool, but to make things more comparable I will first write down a final test protocol, because we have to change the one above due to the findings in the meantime. In particular, we have to differentiate between the "Query Time" (i.e. the time it takes until a query finishes as per the SQLDebug.log) and the "Read Time" (i.e. the time until the data from the query is actually read from the database and put into MediaItem objects - this value we can take from my modification to the MediaLibrary on the last page of this thread, which logs this time to our regular server.log).
    Once the test protocol is ready, I will run the whole test for the current v0.07 version of this plugin (to be published shortly after Alpha 3) and write down the results here, compacted in one post.
    After that I will start to implement further improvements (in particular the ConnectionPool) and test them according to this test protocol to make sure the figures are 100% comparable to our clean v0.07.

    So stay tuned.... ;)
     

    MJGraf

    Retired Team Member
  • Premium Supporter
  • January 13, 2006
    2,478
    1,385
    Database Test Protocol
    In the following I will describe the exact steps I will take for the tests:

    Used Code Base
    I will use the code base of MP2 Alpha 3 to exclude influences caused by other developments. There are, however, two changes in the code necessary for the tests compared to a clean MP2 Alpha 3 installation:

    In MediaLibrary.Search(...) I have replaced
    Code:
    IList<MediaItem> items = cmiq.QueryList();
    by
    Code:
    var sw = System.Diagnostics.Stopwatch.StartNew();
    IList<MediaItem> items = cmiq.QueryList();
    sw.Stop();
    ServiceRegistration.Get<ILogger>().Debug("MediaLibrary: Search time {0} for query {1}", sw.Elapsed, query.ToString());

    In MediaLibrary.GetValueGroups(...) I have replaced
    Code:
    return cdavq.Execute();
    by
    Code:
    var sw = System.Diagnostics.Stopwatch.StartNew();
    var result = cdavq.Execute();
    sw.Stop();
    ServiceRegistration.Get<ILogger>().Debug("MediaLibrary: Search time {0} for query {1}", sw.Elapsed, cdavq.ToString());
    return result;

    Both changes are necessary to log the Read Test results to Server.log.
    The Skin used during the tests is "Simple MP2 Default Skin".

    Time Measurement
    For Read Tests, we have to differentiate between the time it takes until the query has been executed (Query Time) and the time it takes until the data from the query result has actually been read from the database (Read Time).
    The Query Time will be taken from the SQLDebug.log. If a particular Read Test results in multiple SQL queries, I will add up the times shown for all queries in the SQLDebug.log and report just one Query Time here to keep the amount of data manageable.
    The Read Time will be logged into the Server.log by the code changes as described above. The Read Time comprises the Query Time and additionally the time it takes to read the data from the database and put it into the form of the objects we use within MP2.
    Every Read Test will be executed twice without quitting the MP2 Client or restarting the MP2 server in between. The first result is expected to be slower than the second result due to various caching mechanisms in place. The result will be shown as
    Read Test X: y ms, z ms (zs ms)
    where y ms is the result of the first test run and z ms is the result of the second test run, and zs ms is the sum of both, each in milliseconds.

    The Write Test consists of a large import of music files. The time will be taken from the server.log as the difference between the start time and the end time of the import job. It will be shown as mm:ss.

    Preparation
    • Quit MP2 Client and stop MP2 server
    • Delete the data directories of MP2 Client and MP2 server
    • Start MP2 Server
    • Start MP2 Client and attach it to the MP2 Server
    • Go to Audio/View Mode and select "small list" (to avoid the non-virtualized wrap panel)
    Write Test
  • Add a global share with the local file system resource provider comprising 17.794 MP3 files and 2.332 FLAC files (20.126 in total), each file comprehensively tagged and each containing embedded cover art in different sizes.
    • Exit MP2 Client and wait for the import to finish
    Read Tests
    There will be four Read Tests:

    Read Test 1
    • Restart the MP2 Server
    • Execute the following synthetic search
      Code:
      var necessaryMias = new List<Guid> { MediaAspect.ASPECT_ID };
      var optionalMias = new List<Guid>(ServiceRegistration.Get<IMediaItemAspectTypeRegistration>().LocallyKnownMediaItemAspectTypes.Keys);
      MediaItemQuery miq = ServiceRegistration.Get<IMediaLibrary>().BuildSimpleTextSearchQuery(searchString, necessaryMias, optionalMias, null, false, true);
      var result = ServiceRegistration.Get<IMediaLibrary>().Search(miq, false);
      with searchString="simon".
      This search leads to 12 SQL queries which are shown here and here and results in 235 MediaItems found and read from the database into RAM including all their metadata and covers.
    • Execute this synthetic search a second time
    Read Test 2
    • Restart the MP2 Server
    • Start the MP2 Client
    • Enter the Audio Section
      This leads to two identical calls to MediaLibrary.GetValueGroups and, hence, to two identical SQL queries as shown here. [Note: This may be a bug. For now I take the first call to GetValueGroups as the first test run and the second call as the second test run. Should this turn out to be a bug and get fixed, we leave the Audio Section via ESC and then reenter it for the second test run.]
      The search returns 2.306 albums.
    Read Test 3
    • Enter an album with 11 tracks (Album 1).
      Each track of Album 1 has an embedded cover art of 48,3 KB, so that, besides the text metadata, cover art totalling 531,3 KB is read from the database.
      This leads to six SQL queries as shown here.
    • Leave Album 1 with ESC
      [Note: This leads to Read Test 2 being executed again, but we ignore these log entries]
    • Reenter the same album for the second test run.
    Read Test 4
    • Leave Album 1 with ESC
      [Note: This leads to Read Test 2 being executed again, but we ignore these log entries]
    • Enter an album with 12 tracks (Album 2).
      Each track of Album 2 has an embedded cover art of 2,80 MB, so that, besides the text metadata, cover art totalling 33,6 MB is read from the database.
      This leads to the same six SQL queries as in Read Test 3, but with a different parameter-string V0.
    • Leave Album 2 with ESC
      [Note: This leads to Read Test 2 being executed again, but we ignore these log entries]
    • Reenter the same album for the second test run.
    These are all the Read Tests I am planning to perform for now. I think this gives a good picture of the overall read performance. Read Test 1 results in heavy queries with a lot of LIKEs in them and returns a huge amount of data. Read Test 2 is not too complex and does not return any binary data, but is very common in day-to-day use. Read Tests 3 and 4 are also very common scenarios, with Read Test 3 returning a "usual" amount of binary data in the form of cover art and Read Test 4 being the "hard-core binary data test". I don't think we will regularly store fan art with more than 2,8 MB in the database...

    And now finally we can start our tests :D
     

    MJGraf

    Retired Team Member
  • Premium Supporter
  • January 13, 2006
    2,478
    1,385
    To have a comparison, I have done the test with the standard SQLCEDatabase first.
    I had to make a single code change to the SQLCEDatabase beforehand. I changed
    Code:
    INITIAL_MAX_DATABASE_SIZE = 1024
    to
    Code:
    INITIAL_MAX_DATABASE_SIZE = 2048
    because my import results in a database > 1 GB, and if I didn't change this, the test would abort in the middle and I would have to restart the MP2 Server and then continue the test.

    SQLCEDatabase (MP2 Alpha 3 version):
    Import Test: 139:34
    Read Test 1
    Query Time: 616 ms, 608 ms (1224 ms)
    Read Time: 78758 ms, 77671 ms (156429 ms)
    Read Test 2
    Query Time: 96 ms, 102 ms (198 ms)
    Read Time: 1264 ms, 1313 ms (2577 ms)
    Read Test 3
    Query Time: 301 ms, 300 ms (601 ms)
    Read Time: 422 ms, 399 ms (821 ms)
    Read Test 4
    Query Time: 300 ms, 336 ms (636 ms)
    Read Time: 1942 ms, 547 ms (2825 ms)

    Now I'll do the same test with SQLiteDatabase v0.07. If it passes the test, I will publish the results, push the source to Git and upload the binaries to our new plugin repository :)
     

    MJGraf

    Retired Team Member
  • Premium Supporter
  • January 13, 2006
    2,478
    1,385
    And now hold your breath... I just did these tests twice, because I couldn't believe it, but the difference between the two tests was < 5%. So here are the results:

    SQLiteDatabase (v0.07):
    Import Test: 31:38
    Read Test 1
    Query Time: 349 ms, 313 ms (662 ms)
    Read Time: 12457 ms, 12159 ms (24616 ms)
    Read Test 2
    Query Time: 198 ms, 219 ms (417 ms)
    Read Time: 245 ms, 239 ms (484 ms)
    Read Test 3
    Query Time: 0 ms, 3 ms (3 ms) [Note: First test run seems to be below the accuracy of our SQLDebug.Log entries]
    Read Time: 20 ms, 19 ms (39 ms)
    Read Test 4
    Query Time: 18 ms, 16 ms (34 ms)
    Read Time: 224 ms, 116 ms (340 ms)

    Comparison SQLCEDatabase (MP2 Alpha 3 version) vs. SQLiteDatabase (v0.07):
    Import Test: 441%
    Read Test 1
    Query Time: 185%
    Read Time: 635%
    Read Test 2
    Query Time: 47%
    Read Time: 532%
    Read Test 3
    Query Time: 20033%
    Read Time: 2105%
    Read Test 4
    Query Time: 1870%
    Read Time: 830%

    SQLiteDatabase is nearly 4,5x as fast as SQLCEDatabase in writing. It is between 5x and 21x as fast in reading and - with one exception - between 1,8x and 200x as fast in executing SQL read queries. Why executing the read queries for Read Test 2 takes about 2x longer is not clear yet. But since the Read Time of the same test is again more than 5x faster, I would suspect that SQLiteDatabase already reads some data into RAM when the query is executed.
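    (For clarity: the percentages are the SQLCEDatabase time divided by the SQLiteDatabase time. Example for the Import Test: 139:34 = 8374 s versus 31:38 = 1898 s, and 8374 / 1898 ≈ 4,41, i.e. 441%.)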

    Not that bad, is it? :D
     

    MJGraf

    Retired Team Member
  • Premium Supporter
  • January 13, 2006
    2,478
    1,385
    @morpheus_xx @chefkoch, I'd like to upload v0.07 to our new plugin repository, but either I don't have upload rights or I'm too dumb to find out how to do that. Could someone please point me in the right direction? Thanks!
     
