I am pleased to announce that I am releasing the very first version of KingDB, the fast persisted key-value store. KingDB is a side-project that I have been hacking on intermittently over the last couple of years. It has taken a lot of my personal time, so I am very happy to have finally reached this moment.
Go to http://kingdb.org to find the source code, documentation and benchmarks.
KingDB is interesting for many reasons:
- Fast for heavy write workloads and random reads.
- The architecture, code, and data format are simple.
- Multipart API to read and write large entries in smaller parts.
- Multiple threads can access the same database safely.
- Crash-proof: nothing ever gets overwritten.
- Iterators and read-only consistent snapshots.
- Compaction happens in a background thread, and does not block reads or writes.
- The data format allows hot backups to be made.
- Covered by unit tests.
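Several items in the list above, namely crash-proof writes where nothing gets overwritten and read-only consistent snapshots, can follow from an append-only design. The toy model below is a minimal sketch of that general idea under stated assumptions. It keeps everything in memory and uses a linear scan instead of an index, and the class and method names are hypothetical. It is not KingDB's API or data format.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Toy model (hypothetical names, not KingDB's format or API): an
// append-only log where every Put() adds a new record and old records
// are never modified in place.
class AppendOnlyLog {
 public:
  // Appends a record; an update to an existing key is just a newer record.
  void Put(const std::string& key, const std::string& value) {
    records_.push_back(Record{key, value});
  }

  // A snapshot only remembers how many records existed when it was taken.
  // Because those records are never overwritten, its view stays consistent
  // no matter how many writes happen afterwards.
  class Snapshot {
   public:
    Snapshot(const AppendOnlyLog* log, std::size_t end) : log_(log), end_(end) {}

    // Returns the newest value for `key` among the records visible to this
    // snapshot (a linear scan, fine for a toy; a real store uses an index).
    std::optional<std::string> Get(const std::string& key) const {
      assert(end_ <= log_->records_.size());
      for (std::size_t i = end_; i > 0; --i) {
        const Record& r = log_->records_[i - 1];
        if (r.key == key) return r.value;
      }
      return std::nullopt;
    }

   private:
    const AppendOnlyLog* log_;
    std::size_t end_;  // records at positions >= end_ are invisible
  };

  Snapshot GetSnapshot() const { return Snapshot(this, records_.size()); }

 private:
  struct Record {
    std::string key;
    std::string value;
  };
  std::vector<Record> records_;
};
```

If you take a snapshot, write a newer value for the same key, and then read through the snapshot, you still get the old value. This works because the older record was never overwritten.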
Version 0.9.0 is still alpha code. Even though KingDB has many unit tests that ensure the stability of its core components, make sure you run tests in your own environment before using it in production. New features and optimizations will come along the way.
Over the coming weeks, I will publish the last articles in the IKVS series, which will cover the architecture and data format of KingDB.
Great work!
At first glance, I noticed that you do not use external libraries. Do you have any plans to migrate to something more familiar to others? Maybe Boost for the fundamentals, log4cplus, cryptopp, etc.
Thanks! As you have probably already seen, I have included external implementations of CRC32C, xxHash and LZ4. They are complex and solve very specific problems, so it made perfect sense to reuse foolproof code rather than try to reimplement my own versions. I will add external libraries only when it is justified, that is, when the component is very complex and/or there is a clear gain from adding it.
It always seems like a great idea to reuse libraries that already implement a functionality that you want, but most of the time, you only need one feature out of the 20 that the library offers, and you end up having to include the entire thing just for that. Using external libraries sometimes feels too much like wearing someone else’s underwear. This is why, in KingDB, for every component that is simple enough, I’d rather limit dependencies as much as possible and write my own small implementation of it, finely tuned to solve exactly the problem that I have and nothing more. It’s obviously a tradeoff that needs to be assessed on a case-by-case basis, but that’s the idea. And finally, “familiar” is tricky, because what is familiar to someone will always be foreign to someone else.
Emmanuel, thank you very much for your great blog.
One of the major restrictions of LevelDB is that it cannot be used in a multi-process environment. Is it possible to open KingDB from several processes, at least in read-only mode?
Hello Pavel!
When KingDB v0.9.0 opens a database, it acquires a lock from the operating system, so only one process can access a database at any given time. But you can work around that, and here are two options:
1. The easiest option is to use KingServer and have your processes on the same machine connect to it. That will add some overhead in the kernel, but because everything runs on the same machine, at least you won’t have any network latency issues. The best you can do is try it, run some benchmarks, and see if it’s fast enough for your use case. You can find more in the documentation for KingServer [1].
2. If you want to avoid using KingServer, and you can guarantee that once your data has been written no process will write to it again (that is, all processes are read-only), then you can comment out the locking in the KingDB source code, on lines 106-110 of the file “interface/database.h” [2]. The drawback is that every process that opens the database will create its own index, so the index will be duplicated in RAM; if your dataset is not too large, that may be something you can live with. This only applies to the index, though: the data in the HSTable files will be shared among all processes, so there is no overhead there.
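The single-process restriction comes from an exclusive lock that is taken when the database is opened. The sketch below shows that general mechanism, assuming POSIX `flock()` on a lock file. The helper names and the lock-file path are hypothetical, and KingDB's actual code at lines 106-110 of “interface/database.h” may use a different call, such as `fcntl()`. With this setup, a second opener fails immediately instead of sharing the database.

```cpp
#include <cassert>
#include <string>
#include <fcntl.h>     // open, O_RDWR, O_CREAT
#include <sys/file.h>  // flock, LOCK_EX, LOCK_NB, LOCK_UN
#include <unistd.h>    // close

// Hypothetical helper (not KingDB's code): takes an exclusive, non-blocking
// advisory lock on a lock file. Returns the file descriptor that holds the
// lock, or -1 if the file cannot be opened or the lock is already held.
int AcquireDatabaseLock(const std::string& lock_path) {
  int fd = ::open(lock_path.c_str(), O_RDWR | O_CREAT, 0644);
  if (fd < 0) return -1;
  // LOCK_NB makes flock() fail immediately (EWOULDBLOCK) instead of waiting
  // when another open file description already holds the lock.
  if (::flock(fd, LOCK_EX | LOCK_NB) != 0) {
    ::close(fd);
    return -1;
  }
  return fd;
}

// Drops the lock. The kernel also releases it automatically when the last
// descriptor referring to it is closed, e.g. if the process crashes.
void ReleaseDatabaseLock(int fd) {
  assert(fd >= 0);
  ::flock(fd, LOCK_UN);
  ::close(fd);
}
```

In these terms, option 2 amounts to skipping the acquire step. That is only safe because no process writes to the data anymore, so nothing can change underneath the other readers.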
I hope this helps!
[1] https://github.com/goossaert/kingdb/blob/master/doc/kingserver.md
[2] https://github.com/goossaert/kingdb/blob/58994280e789fc7248d61371f03a6c04c844c197/interface/database.h#L106
Thanks for the reply, Emmanuel.
Unfortunately, neither the first nor the second option is workable on iOS. I cannot use the first one because there is no way to launch daemons on iOS. The second option is not useful because both the application and the extension processes need to be able to update the metadata cache at the same time.
Indeed, that won’t work on iOS. Have you tried LMDB already? It supports multi-process concurrency, so hopefully it can solve your problem: http://symas.com/mdb/
OMG… LMDB even supports read-only snapshots while reading from a cursor within a single transaction. I’ll try it ASAP.