SQLiteDatabase Plugin for MP2

MJGraf

Retired Team Member
    Great, thanks!
    I already thought about putting the essence of this thread into the Wiki (maybe one page for end users and one page with more technical stuff). Let's see if I find the time over the next weeks...
    Michael
     

    MJGraf

    Retired Team Member
I implemented a connection pool which gave a good performance improvement - so far only tested with the ImportTest, i.e. for the write speed.
Then I tried a lot of further settings and additional implementations which I thought could improve the write speed further - but they didn't. For documentation purposes, here are first the test results and then some explanation of what I tried. So far the results are only from the ImportTest; I thought the easiest approach is to improve write speed first, then take the fastest version and try to improve read speed on top of it.

    ConnectionPool
    ==> 24:39

[EDIT: From here on, some of the values are not correct. I discovered later that when using "Shared Cache per URI", the page size value in the connection string is disregarded. Although I thought I was using a page size of 4KB, I actually used a page size of 1KB. Therefore the cache (specified as a number of pages in the connection string) was not 400MB but only about 100MB.]

    ConnectionPool / MMAP 256MB / Cache 400MB / Shared Cache per URI / AutoCheckpoint 10000
    ==> 24:39

    ConnectionPool / MMAP 256MB / Cache 400MB / Shared Cache per URI / AutoCheckpoint 0
    > 30 minutes

ConnectionPool / MMAP 256MB / Cache 400MB / Shared Cache per URI / AutoCheckpoint 0 / manual checkpoint(PASSIVE) in new thread every 1000 commits
==> 24:46

ConnectionPool / MMAP 256MB / Cache 400MB / Shared Cache per URI / AutoCheckpoint 0 / manual checkpoint(PASSIVE) in new task every 1000 commits
==> 25:44

ConnectionPool / MMAP 256MB / Cache 400MB / Shared Cache per URI / AutoCheckpoint 0 / manual checkpoint(FULL) in new task every 1000 commits
==> 25:58

ConnectionPool / MMAP 256MB / Cache 400MB / Shared Cache per URI / AutoCheckpoint 0 / manual checkpoint(RESTART) in new task every 1000 commits
==> 26:47

ConnectionPool / MMAP 256MB / Cache 400MB / Shared Cache per URI / AutoCheckpoint 0 / manual checkpoint(RESTART) in new task every 100 commits
==> 25:03

ConnectionPool / MMAP 256MB / Cache 400MB / Shared Cache per URI / AutoCheckpoint 0 / manual checkpoint(RESTART) in new task every 100 commits / every commit in own task
==> 24:44

ConnectionPool / MMAP 256MB / Cache 400MB / Shared Cache per URI / AutoCheckpoint 0 / manual checkpoint(PASSIVE) in new thread every 100 commits / every commit in own thread
> 30 minutes

    ConnectionPool / MMAP 256MB / Cache 400MB / Shared Cache per URI / AutoCheckpoint 0 / checkpoint(PASSIVE) in a separate thread every 60 seconds
    ==> 27:10

    ConnectionPool / MMAP 256MB / Cache 400MB / Shared Cache per URI / AutoCheckpoint 0 / checkpoint(PASSIVE) in a separate thread every 10 seconds
    ==> 23:43

    ConnectionPool / MMAP 256MB / Cache 400MB / Shared Cache per URI / AutoCheckpoint 0 / checkpoint(RESTART) in a separate thread every 10 seconds
    ==> 24:06

ConnectionPool: That's a simple implementation of a connection pool using .NET's BlockingCollection backed by a ConcurrentBag. When you call BeginTransaction on the database, it does not always create a new connection to the database. It first tries to get an already opened connection from the BlockingCollection; if one is available, it takes it and uses it to begin the transaction. If there is no connection available in the BlockingCollection, there are two possibilities: either we have fewer connections in total than specified as MaxConnections in the connection pool - then it creates a new connection and uses that one. Or there are already MaxConnections in use - then the calling thread is blocked until another thread returns its connection, which is then reused; a ConnectionTimeout indicates how long to wait before throwing an exception. When the transaction is disposed, the underlying connection is returned to the pool without being closed. As standard values I used MaxConnections = 10 and a ConnectionTimeout of 30 seconds.
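For illustration, here is a minimal sketch of what such a pool looks like (a rough C# approximation - the names and details are mine, not the actual MP2 code):

```csharp
using System;
using System.Collections.Concurrent;
using System.Data.SQLite;
using System.Threading;

// Sketch: idle connections live in a BlockingCollection backed by a
// ConcurrentBag; at most MaxConnections connections are ever created.
public class ConnectionPool : IDisposable
{
  private readonly string _connectionString;
  private readonly int _maxConnections;          // e.g. 10
  private readonly TimeSpan _connectionTimeout;  // e.g. 30 seconds
  private readonly BlockingCollection<SQLiteConnection> _idle =
      new BlockingCollection<SQLiteConnection>(new ConcurrentBag<SQLiteConnection>());
  private int _created; // total number of connections created so far

  public ConnectionPool(string connectionString, int maxConnections, int timeoutSeconds)
  {
    _connectionString = connectionString;
    _maxConnections = maxConnections;
    _connectionTimeout = TimeSpan.FromSeconds(timeoutSeconds);
  }

  public SQLiteConnection GetConnection()
  {
    // Fast path: reuse an already opened connection if one is idle.
    SQLiteConnection connection;
    if (_idle.TryTake(out connection))
      return connection;

    // No idle connection: create a new one if we are below MaxConnections...
    if (Interlocked.Increment(ref _created) <= _maxConnections)
    {
      connection = new SQLiteConnection(_connectionString);
      connection.Open();
      return connection;
    }
    Interlocked.Decrement(ref _created);

    // ...otherwise block until another thread returns its connection.
    if (!_idle.TryTake(out connection, _connectionTimeout))
      throw new TimeoutException("No free database connection within ConnectionTimeout.");
    return connection;
  }

  // Called when a transaction is disposed: the connection stays open.
  public void ReturnConnection(SQLiteConnection connection)
  {
    _idle.Add(connection);
  }

  public void Dispose()
  {
    foreach (SQLiteConnection connection in _idle)
      connection.Dispose();
  }
}
```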
The interesting thing is that there were never more than 3 connections in use - and that only while I was using an MP2 Client during the import. So I suspect that the import itself does not use more than 1 or 2 connections. For me this means we could improve the import speed if we used more parallel threads / tasks in the ImportRunner itself. But this has nothing to do with the database, so I'll not investigate it further for now.

MMAP 256MB: A "PRAGMA MMAP_SIZE=268435456;" was executed on every connection after it was created. SQLite then uses memory-mapped I/O for the first 256MB of the database file, which is supposed to improve read speed - apparently it has no impact on write speed, but I wanted to try it anyway...
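Conceptually, this is just a per-connection pragma executed right after the connection is opened. A hypothetical helper (not the actual MP2 code) could look like this:

```csharp
using System.Data.SQLite;

// Sketch of a helper that runs a PRAGMA on a freshly opened connection;
// the pool would call this whenever it creates a new connection.
public static class SQLitePragmas
{
  public static void Execute(SQLiteConnection connection, string pragma)
  {
    using (SQLiteCommand cmd = connection.CreateCommand())
    {
      cmd.CommandText = pragma;
      cmd.ExecuteNonQuery();
    }
  }
}

// Usage for this test:
// SQLitePragmas.Execute(connection, "PRAGMA mmap_size=268435456;"); // 256MB
```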

Cache 400MB: I raised the connection cache size in the connection string from "CacheSize=10000" (that's 10,000 pages at a page size of 4KB, i.e. about 40MB) to "CacheSize=100000" (i.e. 400MB). As you can see, no effect on write speed.

Shared Cache per URI: SQLite has a so-called shared cache feature. Usually every connection maintains its own cache. With shared cache, all connections share the same cache - and additionally, the locking mechanism is different with shared cache, as can be seen here. I expect this to improve read speed - but as you can see, no impact on write speed. "Per URI" refers to the fact that, oddly, System.Data.SQLite has no separate connection string parameter for the shared cache. But SQLite lets you switch the shared cache on if you specify your database file not as a simple file path, but as a URI with the parameter "?cache=shared". So we have to circumvent this limitation of System.Data.SQLite by using such a URI as the "file path", which at the same time switches on the shared cache.
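To make this concrete, a connection string along these lines should do it (a sketch only - the file path is a placeholder, and if I recall correctly System.Data.SQLite's FullUri parameter is the way to pass such a URI; treat the parameter names as assumptions to verify):

```csharp
// Sketch: the database file is given as a URI with "?cache=shared" appended,
// which switches on SQLite's shared cache for all connections to this file.
string connectionString =
    "FullUri=file:C:/MP2-Server/Datastore.s3db?cache=shared;" + // placeholder path
    "Page Size=4096;Cache Size=100000;Journal Mode=WAL;";
// Note: as mentioned in the EDIT above, the page size from the connection
// string is disregarded in this shared-cache-per-URI mode.
```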

AutoCheckpoint 10000: The rest of my tests mainly deal with the checkpoint operation as explained some posts above. AutoCheckpoint 10000 means that I raised the standard value of 1000 pages (i.e. an automatic checkpoint runs in the committing thread as soon as a commit leads to a WAL file of more than 4MB = 1000 pages) to 10000 pages (i.e. 40MB).

AutoCheckpoint 0: Do not run automatic checkpoints at all - this alone resulted in a huge WAL file and very bad performance.
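Both settings boil down to another per-connection pragma; with the SQLitePragmas helper sketched above, this would hypothetically be:

```csharp
// Checkpoint automatically once the WAL exceeds 10000 pages (40MB at 4KB pages):
SQLitePragmas.Execute(connection, "PRAGMA wal_autocheckpoint=10000;");
// Or switch automatic checkpoints off entirely:
SQLitePragmas.Execute(connection, "PRAGMA wal_autocheckpoint=0;");
```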

    "Maunal" checkpoint(PASSIVE) in new thread every 1000 commits: There I switched off automatic checkpoints and implemented a checkpoint mechanism myself by executing the checkpoint operation in a separate thread every 1000 commits. For the difference between "checkpoint(PASSIVE)", "checkpoint(FULL)" and "checkpoint(RESTART)" see here.

    "Maunal" checkpoint(PASSIVE) in new task every 1000 commits: Same as above, but I used a task instead of a thread.

    "Maunal" checkpoint(FULL) in new task every 1000 commits: Same as above, but checkpoint(FULL) instead of checkpoint(PASSIVE)

    "Maunal" checkpoint(RESTART) in new task every 1000 commits: Same as above, but checkpoint(RESTART) instead of checkpoint(FULL)

    "Maunal" checkpoint(RESTART) in new task every 100 commits: Same as above, but checkpoint every 100 commits instead of every 1000.

    "Maunal" checkpoint(RESTART) in new task every 100 commits / every commit in own task: Same as above, but additionally, the commit operation itself was conducted in a separate task. This was a hack because here the caller doesn't know, whether his commit has actually been committed. But the result is that commit more or less returns immediately. Interestingly, this did not lead to any material speed improvement. I'm still struggling with the interpretation. It could mean that committing is now not anymore the bottleneck?!?

    "Maunal" checkpoint(PASSIVE) in new thread every 100 commits / every commit in own thread: Same as above, but (1) "checkpoint(PASSIVE)" instead of "checkpoint(RESTART)" and (2) using a thread instead of a task.

checkpoint(PASSIVE) in a separate thread every 60 seconds: In the last three tests there is a constantly running background thread dealing with the checkpoint operation (a sketch follows after the next two variants). In this case the thread sleeps for 60 seconds, issues a passive checkpoint, sleeps again for 60 seconds and so on, independently of the number of commits. The idea is that this thread would eventually be started when an import starts and stopped when the import ends, because it doesn't make sense to run checkpoints when there are no write operations on the database.

    checkpoint(PASSIVE) in a separate thread every 10 seconds: Same as above, but checkpoint every 10 seconds instead of every 60.

    checkpoint(RESTART) in a separate thread every 10 seconds: Same as above, but checkpoint(RESTART) instead of checkpoint(PASSIVE)
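A minimal sketch of such a background checkpoint thread (again, names and details are assumptions - in the real plugin it would be started and stopped together with the import):

```csharp
using System;
using System.Data.SQLite;
using System.Threading;

// Sketch: sleep for the interval, checkpoint, repeat - until Stop() is called.
public class CheckpointThread
{
  private readonly string _connectionString;
  private readonly TimeSpan _interval; // e.g. 10 or 60 seconds
  private readonly ManualResetEvent _stop = new ManualResetEvent(false);
  private Thread _thread;

  public CheckpointThread(string connectionString, int intervalSeconds)
  {
    _connectionString = connectionString;
    _interval = TimeSpan.FromSeconds(intervalSeconds);
  }

  public void Start()
  {
    _thread = new Thread(Run) { IsBackground = true };
    _thread.Start();
  }

  public void Stop()
  {
    _stop.Set();
    _thread.Join();
  }

  private void Run()
  {
    // WaitOne returns false on timeout (interval elapsed, keep going)
    // and true once Stop() signals the event.
    while (!_stop.WaitOne(_interval))
    {
      using (SQLiteConnection connection = new SQLiteConnection(_connectionString))
      {
        connection.Open();
        using (SQLiteCommand cmd = connection.CreateCommand())
        {
          cmd.CommandText = "PRAGMA wal_checkpoint(PASSIVE);";
          cmd.ExecuteNonQuery();
        }
      }
    }
  }
}
```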

So the overall conclusion of all these tests is that only "checkpoint(PASSIVE) in a separate thread every 10 seconds" gives us a slight improvement in write speed: 23:43 instead of 24:39, i.e. about 4.8% faster. Since implementing this is relatively complicated (we have to listen to importworker.start and importworker.finished messages, start and stop the checkpoint thread accordingly, switch autocheckpoint on when the checkpoint thread is stopped and off when it starts - which has to be done for every connection in the pool - etc.) and the performance improvement is relatively small, I will not implement it for now. But I'll keep it in mind for the future.

The result is that I will just use the simple ConnectionPool version for further tests and now try to improve read speed.
     

    MJGraf

    Retired Team Member
And here is a full set of results - write and read - of the SQLiteDatabase v0.07 with an additional own ConnectionPool, a comparison with a clean v0.07 and my attempt to analyze the results:
    SQLiteDatabase (v0.07 with own ConnectionPool (System.Data.SQLite connection pool switched off in connection string)):
ImportTest: 24:39
    Read Test 1
    Query Time: 376 ms, 294 ms (670 ms)
    Read Time: 12930 ms, 11719 ms (24649 ms)
    Read Test 2
    Query Time: 243 ms, 188 ms (431 ms)
    Read Time: 277 ms, 204 ms (481 ms)
    Read Test 3
    Query Time: 2 ms, 0 ms (2 ms) [Note: Second test run seems to be below the accuracy of our SQLDebug.Log entries]
Read Time: 15 ms, 8 ms (23 ms) [corrected from 15 ms, 3 ms (18 ms) - see next post for the reason]
    Read Test 4
    Query Time: 6 ms, 9 ms (15 ms)
    Read Time: 100 ms, 94 ms (194 ms)

    Comparison SQLiteDatabase v0.07 vs. SQLiteDatabase (v0.07 with own ConnectionPool):
ImportTest: 128%
    Read Test 1
    Query Time: 99%
    Read Time: 100%
    Read Test 2
    Query Time: 98%
    Read Time: 101%
    Read Test 3
    Query Time: 150% [Read with care - may be below the accuracy of the log entries...]
    Read Time: 170%
    Read Test 4
    Query Time: 227%
    Read Time: 175%

    Analysis:
The ImportTest, i.e. write speed, is 28% faster than without the connection pool. This is most likely due to the time we save by not opening and closing the connection for every transaction.
ReadTest 1 is more or less the same speed. I suppose the reason is that this search returns too much data to take advantage of the cache (40MB).
ReadTest 2 is also more or less the same speed. My guess is that most of the time this search takes is spent on the query itself - not on reading the data (see QueryTime vs. ReadTime). The cache mostly helps with the read time - not with the query itself. But we should keep this in mind: the QueryTime here is very long compared to the QueryTime of other searches. It is nearly as long as the QueryTime of ReadTest 1, which in the end returns much more data. Maybe we can improve the QueryTime by analyzing and improving the respective SQL statements.
The real beauty of the connection pool can be seen in ReadTest 3 and ReadTest 4. In particular, the ReadTime of the second run of ReadTest 3 improved enormously. I suspect this is the cache feature really working for the first time. Remember, before, we closed the connection after every transaction, i.e. the cache was destroyed and more or less useless. Now we keep the connection open, i.e. the cache can fulfill its duty. That's why the second read of the same data is A LOT faster in ReadTest 3 (8ms vs. 19ms, i.e. more than 2x as fast). It is also faster in ReadTest 4, but not as much as in ReadTest 3. The reason may be that the amount of data in ReadTest 4 (33.6MB only for covers) is nearly as big as the cache itself (40MB). A bigger cache may help here - let's see...
     

    MJGraf

    Retired Team Member
    Two updates:

I can no longer reproduce the 3ms ReadTime of the second run of ReadTest 3. I repeated this test more than 10 times and I always get 8ms instead of 3ms. Still a good improvement, but not as much as I thought before. I have corrected this in the test above.

More importantly: my suspicion regarding ReadTest 4 was correct. With a cache size of 400MB instead of 40MB, the ReadTimes of ReadTest 4 are 122ms (first run) and 47ms (second run). The first run is a little slower, which I would put down to test inaccuracy. The second run is now exactly 2x as fast as before. I.e. the 40MB cache was indeed too small for ReadTest 4.

Now the important question is: how shall we determine the cache size? Shall we use a fixed size? Shall we calculate it based on the available RAM? My server has 8GB RAM, and although I think MP2 Server can only use 3 or 4GB (since we have a 32bit .NET application), there is enough RAM for a 400MB cache. But there may be people with much less RAM. Let's assume that for ReadTest 4 a cache size of 128MB or so would be enough to reach maximum speed. Shall we use that as a fixed standard cache size? I don't expect people to have more than 30MB of covers for one album / series / movie, so if we reach maximum speed for ReadTest 4, it should be enough for every use case. Or shall we take a standard size and make the cache size configurable? In the latter case I don't think this belongs in the GUI, as it is too technical - but we can just store it in our config XML files, and anyone who wants to dig that deep and optimize the speed of their system can do so via the XML files.

    Any opinions on this?
     

    Lehmden

    Retired Team Member
    Hi.
A cache size based on available RAM would be perfect. If I have lots of RAM, I can use a lot of it to speed up my system; if I only have a little, I need to save as much as possible.
     

    Meier

    Portal Pro
MJGraf said: "Now the important question is: How shall we determine the cache size? [...] Any opinions on this?"

In my opinion it should be configurable - but with some default values, e.g.:
64MB - good for...
128MB - good for...
256MB - good for...
and so on. This gives the user maximum flexibility - but with hints as to what's a good choice for which situation.
     

    MJGraf

    Retired Team Member
    Another update:

Even a cache of 64MB instead of 40MB improves ReadTest 4 considerably: I'm getting 102ms (first run) and 41ms (second run). So it really depends on how big the result of a search is - here, with 33.6MB of data to be read, a cache size of 40MB is too small, but 64MB is absolutely fine.

In the end I think I will implement a standard value calculation based on the total RAM in the computer (maybe in steps of 32MB, 64MB and 128MB, with a maximum of 256MB cache, for total RAM sizes of <=512MB, <=1GB, <=2GB and >2GB), to be used if there is no setting in the XML file (see the sketch below). Additionally, I will save this standard value as a setting, which can then be manually tweaked via the XML file to override the standard value.
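In code, the proposed default calculation would be as simple as this (a sketch; totalRamMB would come from the OS, e.g. via WMI - the helper name is hypothetical):

```csharp
// Sketch: map total RAM to a default cache size in MB, as proposed above.
public static int GetDefaultCacheSizeMB(int totalRamMB)
{
  if (totalRamMB <= 512) return 32;
  if (totalRamMB <= 1024) return 64;
  if (totalRamMB <= 2048) return 128;
  return 256; // everything above 2GB of RAM
}
```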
    For now I will just continue with a cache size of 256MB since this would be the standard size for my server.

What we have to keep in mind is that there is another setting I still have to test: memory-mapped I/O, which may need a lot of RAM. With this setting, SQLite "maps" the content of the disk directly into RAM without having to copy it from kernel memory to "normal" memory. The downside currently is that when you set the "mmap_size" to e.g. 256MB, it maps the first 256MB of the database file; if the database file is bigger, the rest is not mapped but read in the normal way. So with my 1.7GB database file, I will hardly ever be able to map the whole file into RAM. And I don't know whether SQLite e.g. stores indices at the beginning of the database file, or whether there is a way to rearrange the database file to make that happen (e.g. by executing "VACUUM", which recreates the whole database file from scratch). So depending on the results with MMAP, we may need another considerable amount of RAM for this setting. Let's see...
     

    MJGraf

    Retired Team Member
    And one more update:
MMAP doesn't help at all. I tried with an mmap_size of 128MB and of 512MB, and the ReadTimes for ReadTests 3 and 4 were about the same as without MMAP (within +/- 5%). I also tried an mmap_size of 2GB to get my whole database file mapped, but that was a nightmare: ReadTest 2 took 20 seconds instead of 200ms - probably there was no RAM left for anything else :D
So for now I'll leave memory-mapped I/O switched off completely.
     

    MJGraf

    Retired Team Member
Now I tested "PRAGMA temp_store=2", which means that temporary tables and indices are created in memory instead of on disk. The result is that it has no influence on ReadTests 2, 3 and 4 - so no need to pursue this one any further...
Maybe our queries simply don't require SQLite to create any temporary tables or indices...
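For completeness, this is the same per-connection pattern as the other pragmas, e.g. reusing the SQLitePragmas helper sketched earlier:

```csharp
// Sketch: keep temporary tables and indices in RAM (2 = MEMORY).
SQLitePragmas.Execute(connection, "PRAGMA temp_store=2;");
```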
     
