This is Part 2 of the IKVS series, “Implementing a Key-Value Store”. You can also check the Table of Contents for other parts.
In this article, I will start by explaining why I think it is important to use models for this project and not start completely from scratch. I will describe a set of criteria for selecting key-value store models. Finally, I will go over some well-known key-value store projects, and select a few of them as models using the presented criteria. This article will cover:
1. Not reinventing the wheel
2. Model candidates and selection criteria
3. Overview of the selected key-value stores
1. Not reinventing the wheel
Key-value stores have been out there for at least a good 30 years [1]. One of the most memorable projects is DBM, the initial database manager coded by Kenneth Thompson for Unix version 7 and released in 1979 [2]. Engineers have faced issues related to these database systems, and have either adopted or rejected various design and data structure ideas. They have learned through experiments with real-world problems. It would be foolish to not consider their work and start from scratch, only to repeat errors they have done before.
Gall’s law, from John Gall’s Systemantics: [3]
A complex system that works is invariably found to have evolved from a simple system that worked. The inverse proposition also appears to be true: A complex system designed from scratch never works and cannot be made to work. You have to start over, beginning with a working simple system.
This quote brings in two fundamental ideas for the development of my key-value store project:
1. Use models. I need to identify key-value stores that have been out there for a while, and even better, which are successors of previously successful key-value stores. This would be the proof that their design is solid, and has been refined over time on multiple iterations. These selected key-value stores shall then be used as models for the project I am currently working on.
2. Start small. The first version of this project should be small and simple, so its design can be easily tested and approved. Improvements and additional features should only come in later versions, if needed.
2. Model candidates and selection criteria
After doing a bit of research around key-value stores and NoSQL databases, I have decided to consider the following ones for a more thorough selection:
- DBM
- Berkeley DB
- Kyoto Cabinet
- Memcached and MemcacheDB
- LevelDB
- MongoDB
- Redis
- OpenLDAP
- SQLite
My criteria are the following:
- I want to create a key-value store using Object-Oriented Programming, so for the design, I will have to draw inspiration from projects coded in an OOP language.
- For the underlying data structure, I want to have an on-disk hash table, so I will need to select projects that offer a way to read and write information to disk.
- I also want to have a network access to the data store.
- I do not need a query engine or ways to access the data in a structured way.
- I do not need to support the full ACID paradigm.
- Given that I am pursuing this project by myself, I want to take for model projects that have been implemented by small teams, ideally one or two persons.
3. Overview of the selected key-value stores
The three big winners are Berkeley DB, Kyoto Cabinet and LevelDB. Berkeley DB and Kyoto Cabinet share a common history as successors to DBM. In addition, Berkeley DB and Kyoto Cabinet are not “first versions”, they are the N-th versions of their authors. This means that they are more solid compared to projects done by people who would be implementing a key-value store for the first time. LevelDB is more recent and is based on LSM Tree data structure, which is of no use as a model for a hash table. However, the code itself is one of the cleanest I have seen ever. All three projects have been developed by one or two persons. Below are more details about each of them.
Berkeley DB
The development of Berkeley DB started in 1986, which means that at the moment I am writing this article, it has been out there for 26 years. Berkeley DB was developed as a successor to DBM, and implements a hash table. The first version was coded by Margo Seltzer [22] and Ozan Yigit [23], while they were at The University of California, Berkeley. The project was later acquired by Oracle, which continued to develop it.
Berkeley DB was originally implemented in C, and continues to be a C-only nowadays. The development went through an incremental process, which added features with each major version. From a simple key-value store, Berkeley DB went on managing access concurrency, transactions and recovery, and replication [4]. Berkeley DB has been used extensively with hundreds of millions of deployed copies [5], which is the proof that its architecture can be trusted as extremely solid. More information about its design can be found in the Introduction of the “Berkeley DB Programmer’s Reference Guide” [6], and the entry of “The Architecture of Open Source Applications, Volume 1” [5].
Kyoto Cabinet
Kyoto Cabinet was introduced in 2009 by Mikio Hirabayashi [24]. It is still in active development at the moment. Kyoto Cabinet is a successor to other key-value stores by the same author, which are Tokyo Cabinet (released in 2007), and QDBM (released in 2003, started in 2000). QDBM was intended as a successor to DBM with higher performances [7]. Kyoto Cabinet is particularly interesting because there is a clear lineage to DBM, and because its author has been working on key-value stores for over 12 years now. After going through the implementation of three key-value stores during so many years, there is no doubt that the author has acquired a solid understanding of the structural needs, along with a strong sense of the causes of performance bottlenecks.
Kyoto Cabinet was implemented in C++, and implements a hash table, a B+ Tree, and other more esoteric data structures. It also offers outstanding performance [16]. Nevertheless, there seems to be some performance problems due to the initial parameters. Indeed, many people have reported that performance is good as long as the number of items remains below a certain threshold which is proportional to the bucket array size, as defined by the parameters at the creation of a database file. Passed that threshold, performance seems to decrease dramatically [18] [19]. Similar problems are present with Tokyo Cabinet [20] [21]. This means that if the requirements of a project are changing while the database is being used, you could run into serious troubles. And we all know how much change is constant in software.
LevelDB
LevelDB is a key-value store developed by Google fellows Jeffrey Dean [8] and Sanjay Ghemawat [9], who worked on Google’s mythical infrastructure projects, MapReduce and BigTable. Given the experience on large-scale problems Dean and Ghemawat have had while working at Google, there are good chances that they know what they are doing. An interesting difference from most key-value store projects is that LevelDB is not using a hash table or a B-tree as underlying data structure, but is based on a Log-Structured Merge Tree [12]. LSM structures are allegedly optimized for SSD drives [13]. A ton of information about LevelDB can be found on the blog High Scalability blog [17].
Released in 2011, LevelDB was implemented in C++, and was designed to be useful as a building block for higher-level storage systems [10]. The implementation of the IndexedDB HTML5 API in future versions of Chrome will be using LevelDB [10] [11]. The performance is mind blowing under a certain workload, as shows the benchmark provided by the authors [14]. However, another benchmark on commodity SSDs made by Andy Twigg at Acunu has shown that if the number of items is increased over 1e6 (one million), and going towards 1e9 entries (one billion), then performance drops dramatically [15]. So it seems that LevelDB might not be the best choice for heavy load or large databases as often required by serious back-end projects.
But this does not matter really, as to me, the best part of LevelDB does not lay in its performance but in its architecture. Looking at the source code and the way things are organized, it is just pure beauty. Everything is clear, simple, and logical. Having access to LevelDB’s source code and using it as a model is an amazing opportunity to create great code.
What about the non-selected key-value stores?
The fact that I did not select the other key-value stores does not mean that I will just discard them fully. I will keep them in mind and might use some elements of their architectures sporadically. However, the current project will not be influenced by those key-value stores as much as it will be by the ones that have been selected.
Join my email list
Translations
This article was translated to Simplified Chinese by Xiong Duo.
References
[1] http://blog.knuthaugen.no/2010/03/a-brief-history-of-nosql.html
[2] http://en.wikipedia.org/wiki/Dbm
[3] http://en.wikipedia.org/wiki/Systemantics
[4] http://en.wikipedia.org/wiki/Berkeley_DB#Origin
[5] http://www.aosabook.org/en/bdb.html
[6] http://docs.oracle.com/cd/E17076_02/html/programmer_reference/intro.html
[7] http://fallabs.com/qdbm/
[8] http://research.google.com/people/jeff/
[9] http://research.google.com/pubs/SanjayGhemawat.html
[10] http://google-opensource.blogspot.com/2011/07/leveldb-fast-persistent-key-value-store.html
[11] http://www.w3.org/TR/IndexedDB/
[12] http://www.igvita.com/2012/02/06/sstable-and-log-structured-storage-leveldb/
[13] http://www.acunu.com/2/post/2011/04/log-file-systems-and-ssds-made-for-each-other.html
[14] http://leveldb.googlecode.com/svn/trunk/doc/benchmark.html
[15] http://www.acunu.com/2/post/2011/08/benchmarking-leveldb.html
[16] http://blog.creapptives.com/post/8330476086/leveldb-vs-kyoto-cabinet-my-findings
[17] http://highscalability.com/blog/2011/8/10/leveldb-fast-and-lightweight-keyvalue-database-from-the-auth.html
[18] http://stackoverflow.com/questions/13054852/kyoto-cabinet-berkeley-db-hash-table-size-limitations
[19] https://groups.google.com/forum/#!topic/tokyocabinet-users/Bzp4fLbmcDw/discussion
[20] http://stackoverflow.com/questions/1051847/why-does-tokyo-tyrant-slow-down-exponentially-even-after-adjusting-bnum
[21] https://groups.google.com/forum/#!topic/tokyocabinet-users/1E06DFQM8mI/discussion
[22] http://www.eecs.harvard.edu/margo/
[23] http://www.cse.yorku.ca/~oz/
[24] http://fallabs.com/mikio/profile.html
nice write ups. I wanted to hear more about comparison of these KV stores. btw a minor correction needed. SDD -> SSD.
thanks buddy.
Typo fixed, thanks.
Hi Emmanuel,
Nice read.
I am myself planning to start up a key value store in C++, for exactly the same reason you started. Mostly using the Boost C++ infrastructure. I will be referring your articles more often.
Nice article, with some very good (and often overlooked) advice in the beginning – research what’s out there, and leverage the mistakes of the past, when implementing something new (to you).