Forums › MediaPortal 2 › Plugin Development › SQLiteDatabase Plugin for MP2
MJGraf wrote (post 1030750, member 17886):

I implemented a connection pool, which gave a good performance improvement - so far only tested with the ImportTest, i.e. for write speed. I then tried a number of further settings and additional implementations that I thought could improve write speed further - but they didn't. For documentation purposes, here are first the test results and then some explanation of what I tried. The results so far are only from the ImportTest. I thought the easiest approach is to first improve write speed, then take the fastest version and try to improve read speed.

ConnectionPool
==> 24:39

[EDIT: From here on, some of the values are not correct. I discovered later that when using "Shared Cache per URI", the page size value in the connection string is disregarded. Although I thought I was using a page size of 4KB, I actually used a page size of 1KB. Therefore the cache (measured in number of pages in the connection string) was not 400MB but about 100MB.]

ConnectionPool / MMAP 256MB / Cache 400MB / Shared Cache per URI / AutoCheckpoint 10000
==> 24:39

ConnectionPool / MMAP 256MB / Cache 400MB / Shared Cache per URI / AutoCheckpoint 0
==> > 30 minutes

ConnectionPool / MMAP 256MB / Cache 400MB / Shared Cache per URI / AutoCheckpoint 0 / manual checkpoint(PASSIVE) in new thread every 1000 commits
==> 24:46

ConnectionPool / MMAP 256MB / Cache 400MB / Shared Cache per URI / AutoCheckpoint 0 / manual checkpoint(PASSIVE) in new task every 1000 commits
==> 25:44

ConnectionPool / MMAP 256MB / Cache 400MB / Shared Cache per URI / AutoCheckpoint 0 / manual checkpoint(FULL) in new task every 1000 commits
==> 25:58

ConnectionPool / MMAP 256MB / Cache 400MB / Shared Cache per URI / AutoCheckpoint 0 / manual checkpoint(RESTART) in new task every 1000 commits
==> 26:47

ConnectionPool / MMAP 256MB / Cache 400MB / Shared Cache per URI / AutoCheckpoint 0 / manual checkpoint(RESTART) in new task every 100 commits
==> 25:03

ConnectionPool / MMAP 256MB / Cache 400MB / Shared Cache per URI / AutoCheckpoint 0 / manual checkpoint(RESTART) in new task every 100 commits / every commit in own task
==> 24:44

ConnectionPool / MMAP 256MB / Cache 400MB / Shared Cache per URI / AutoCheckpoint 0 / manual checkpoint(PASSIVE) in new thread every 100 commits / every commit in own thread
==> > 30 minutes

ConnectionPool / MMAP 256MB / Cache 400MB / Shared Cache per URI / AutoCheckpoint 0 / checkpoint(PASSIVE) in a separate thread every 60 seconds
==> 27:10

ConnectionPool / MMAP 256MB / Cache 400MB / Shared Cache per URI / AutoCheckpoint 0 / checkpoint(PASSIVE) in a separate thread every 10 seconds
==> 23:43

ConnectionPool / MMAP 256MB / Cache 400MB / Shared Cache per URI / AutoCheckpoint 0 / checkpoint(RESTART) in a separate thread every 10 seconds
==> 24:06

ConnectionPool: A simple connection pool implementation using .NET's BlockingCollection backed by a ConcurrentBag. When you call BeginTransaction on the database, it does not always create a new connection. It first tries to take an already opened connection from the BlockingCollection; if one is available, it uses it to begin the transaction. If no connection is available, there are two possibilities: either there are in total fewer connections than specified as MaxConnections in the pool - then it creates a new connection and uses that one; or MaxConnections are already in use - then the calling thread blocks until another thread returns its connection, which is then used. In the latter case a ConnectionTimeout indicates how long to wait before throwing an exception. When the transaction is disposed, the underlying connection is returned to the pool without being closed. The standard values I used were MaxConnections = 10 and ConnectionTimeout = 30 seconds.
The interesting thing is that there were never more than 3 connections in use - and that only when I was using an MP2 Client while the import was running. So I suspect that the import itself does not use more than 1 or 2 connections. For me this means we could improve import speed by using more parallel threads/tasks in the ImportRunner itself. But that has nothing to do with the database, so I won't investigate it further for now.

MMAP 256MB: A "PRAGMA MMAP_SIZE=268435456;" was executed on every connection after it was created. SQLite then uses memory-mapped I/O for the first 256MB of the database file, which is supposed to improve read speed - apparently it has no impact on write speed, but I wanted to try it anyway...

Cache 400MB: I raised the connection cache size in the connection string from "CacheSize=10000" (10,000 pages at a page size of 4KB, i.e. about 40MB) to "CacheSize=100000" (i.e. 400MB). As you can see, no effect on write speed.

Shared Cache per URI: SQLite has a so-called SharedCache feature. Usually every connection maintains its own cache; with SharedCache, all connections share the same cache - and additionally the locking mechanism is different, as described at http://www.sqlite.org/sharedcache.html. I expect this to improve read speed - but as you can see, no impact on write speed. "Per URI" means that, oddly, System.Data.SQLite has no separate connection string parameter for the shared cache. But SQLite lets you switch the shared cache on if, instead of a plain file path for the database file, you use a URI with the parameter "?cache=shared". So we circumvent the limitation of System.Data.SQLite by using such a URI as the "file path", which at the same time switches on the shared cache.

AutoCheckpoint 10000: The rest of my tests mainly deal with the checkpoint operation, as explained some posts above. AutoCheckpoint 10000 means I raised the standard value of 1000 (i.e. an automatic checkpoint runs in the committing thread as soon as a commit leads to a WAL file of more than 1000 pages = 4MB) to 10000 pages (i.e. 40MB).

AutoCheckpoint 0: Do not run automatic checkpoints at all - this alone resulted in a huge WAL file and very bad performance.

"Manual" checkpoint(PASSIVE) in new thread every 1000 commits: Here I switched off automatic checkpoints and implemented a checkpoint mechanism myself by executing the checkpoint operation in a separate thread every 1000 commits. For the difference between checkpoint(PASSIVE), checkpoint(FULL) and checkpoint(RESTART) see http://sqlite.org/c3ref/wal_checkpoint_v2.html.

"Manual" checkpoint(PASSIVE) in new task every 1000 commits: Same as above, but using a task instead of a thread.

"Manual" checkpoint(FULL) in new task every 1000 commits: Same as above, but checkpoint(FULL) instead of checkpoint(PASSIVE).

"Manual" checkpoint(RESTART) in new task every 1000 commits: Same as above, but checkpoint(RESTART) instead of checkpoint(FULL).

"Manual" checkpoint(RESTART) in new task every 100 commits: Same as above, but a checkpoint every 100 commits instead of every 1000.

"Manual" checkpoint(RESTART) in new task every 100 commits / every commit in own task: Same as above, but additionally the commit operation itself was executed in a separate task. This was a hack, because the caller doesn't know whether its commit has actually been committed. But the result is that commit returns more or less immediately. Interestingly, this did not lead to any material speed improvement. I'm still struggling with the interpretation - it could mean that committing is no longer the bottleneck.

"Manual" checkpoint(PASSIVE) in new thread every 100 commits / every commit in own thread: Same as above, but (1) checkpoint(PASSIVE) instead of checkpoint(RESTART) and (2) using a thread instead of a task.

checkpoint(PASSIVE) in a separate thread every 60 seconds: In the last three tests there is a constantly running background thread dealing with the checkpoint operation. Here the thread sleeps for 60 seconds, issues a passive checkpoint, sleeps again for 60 seconds, and so on, independently of the number of commits. The idea is that in the end this thread would be started when an import starts and stopped when the import ends, because it makes no sense to run checkpoints when there are no write operations on the database.

checkpoint(PASSIVE) in a separate thread every 10 seconds: Same as above, but a checkpoint every 10 seconds instead of every 60.

checkpoint(RESTART) in a separate thread every 10 seconds: Same as above, but checkpoint(RESTART) instead of checkpoint(PASSIVE).

The overall conclusion of all these tests is that only "checkpoint(PASSIVE) in a separate thread every 10 seconds" gives a slight write-speed improvement: 23:43 instead of 24:39, about 4% faster. Since the implementation is relatively complicated (we have to listen for importworker.start and importworker.finished messages, start and stop the checkpoint thread accordingly, switch autocheckpoint back on when the checkpoint thread is stopped and off when it starts - and this for every connection in the pool - etc.) and the performance improvement is relatively small, I will not implement this for now. But I'll keep it in mind for the future.

The result is that I will just use the simple ConnectionPool version for further tests and now try to improve read speed.
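The pool described in the post is C# (.NET's BlockingCollection over a ConcurrentBag); as a rough illustration of the same acquire-or-create-or-block-with-timeout logic, here is a minimal Python sketch using the standard library (class and method names are hypothetical, not from the plugin):

```python
import queue
import sqlite3
import threading

class ConnectionPool:
    """Sketch of the pool described above: reuse idle connections, open new
    ones up to max_connections, otherwise block with a timeout."""

    def __init__(self, db_path, max_connections=10, timeout=30.0):
        self._db_path = db_path
        self._timeout = timeout
        self._max = max_connections
        self._idle = queue.Queue()      # already-open connections, ready for reuse
        self._created = 0               # total connections opened so far
        self._lock = threading.Lock()

    def acquire(self):
        try:
            return self._idle.get_nowait()   # prefer an idle connection
        except queue.Empty:
            pass
        with self._lock:
            if self._created < self._max:    # still room: open a new connection
                self._created += 1
                return sqlite3.connect(self._db_path, check_same_thread=False)
        try:
            # pool exhausted: wait for another thread to release a connection
            return self._idle.get(timeout=self._timeout)
        except queue.Empty:
            raise TimeoutError("no connection available within timeout")

    def release(self, conn):
        self._idle.put(conn)            # return to the pool without closing
```

In the real plugin, `release` corresponds to disposing the transaction: the underlying connection goes back to the pool and stays open.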
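For reference, the settings from the test matrix map onto SQLite PRAGMAs and the shared-cache URI. The plugin sets them via the System.Data.SQLite connection string and per-connection PRAGMA statements; the sketch below shows the same PRAGMAs with Python's built-in sqlite3 module, against a shared-cache in-memory database purely to keep the example self-contained:

```python
import sqlite3

# "?cache=shared" in the URI switches on SQLite's shared cache - the same
# trick described above, since there is no separate connection-string
# parameter for it in System.Data.SQLite.
conn = sqlite3.connect("file::memory:?cache=shared", uri=True)

conn.execute("PRAGMA mmap_size=268435456;")    # MMAP 256MB = 256 * 1024 * 1024
conn.execute("PRAGMA cache_size=100000;")      # 100,000 pages: ~400MB at 4KB/page,
                                               # but only ~100MB at the 1KB page
                                               # size actually in effect (see the
                                               # EDIT note above)
conn.execute("PRAGMA wal_autocheckpoint=0;")   # AutoCheckpoint 0: no automatic
                                               # checkpoints on this connection
```

Note that `wal_autocheckpoint` is a per-connection setting, which is why the post mentions having to toggle it on every connection in the pool.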
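The "every commit in own task" hack can be illustrated as a fire-and-forget commit: the call returns immediately and the caller never learns whether the commit succeeded. A minimal Python sketch (the helper name is made up; the original used a .NET task):

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# A single worker keeps commits ordered; the connection must allow use from
# the executor's thread (hence check_same_thread=False in the caller).
_commit_executor = ThreadPoolExecutor(max_workers=1)

def commit_async(conn):
    """Fire-and-forget commit: returns a Future immediately, so at this point
    the caller does not know whether the commit actually succeeded - the
    'hack' the post describes."""
    return _commit_executor.submit(conn.commit)
```

The returned Future could in principle be checked later; in the fire-and-forget variant it simply never is.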
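Finally, the variant that came out fastest (passive checkpoint in a separate thread every 10 seconds, 23:43) can be sketched as a small background loop with its own connection to the database file; the function name and stop-event interface are illustrative, not the plugin's API:

```python
import sqlite3
import threading

def start_checkpoint_thread(db_path, interval=10.0):
    """Issue PRAGMA wal_checkpoint(PASSIVE) every `interval` seconds until the
    returned Event is set. In the scheme described above this would be started
    when an import starts and stopped when it finishes."""
    stop = threading.Event()

    def loop():
        conn = sqlite3.connect(db_path)        # dedicated checkpoint connection
        try:
            while not stop.wait(interval):     # wait() returns True once stopped
                conn.execute("PRAGMA wal_checkpoint(PASSIVE);")
        finally:
            conn.close()

    threading.Thread(target=loop, daemon=True).start()
    return stop
```

A PASSIVE checkpoint never blocks writers, which presumably is why it beat the RESTART variant (24:06) in the same setup.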