Skip to content

Code Capsule Posts

Implementing a Key-Value Store – Part 9: Data Format and Memory Management in KingDB

This is Part 9 of the IKVS series, “Implementing a Key-Value Store”. You can also check the Table of Contents for other parts. In this series of articles, I describe the research and process through which I am implementing a key-value database, which I have named “KingDB”. The source code is available at http://kingdb.org. Please note that you do not need to read the previous parts to be able to follow. The previous parts were mostly exploratory, and starting with Part 8 is perfectly fine.

In this article, I explain how the storage engine of KingDB works, including details about the data format. I also cover how memory management is done through the use of a compaction process.

Implementing a Key-Value Store – Part 8: Architecture of KingDB

This is Part 8 of the IKVS series, “Implementing a Key-Value Store”. You can also check the Table of Contents for other parts. In this series of articles, I describe the research and process through which I am implementing a key-value database, which I have named “KingDB”. The source code is available at http://kingdb.org. Please note that you do not need to read the previous parts to be able to follow. The previous parts were mostly exploratory, and starting with Part 8 is perfectly fine.

In the previous articles, I have laid out the research and discussion around what needs to be considered when implementing a new key-value store. In this article, I will present the architecture of KingDB, the key-value store of this article series that I have finally finished implementing.

An afternoon in 1983 with Hector

ad-small

Back in 1998, I was 13 years old and had no money to buy a computer. But that wouldn’t stop me. I didn’t care if I had a brand new computer or a used one, all I wanted was a computer. From the conversations of adults around me, I heard it was common for businesses to renew their hardware and throw their old computers into the trash. So I figured, all I had to do was to be in the right trash at the right time, or even better, make the trash come to me. In these years, there were tons of computer paper magazines, so I decided to send a letter to one of them, of which I have now forgotten the name, to publish an ad in their classified section: “13 year-old student in the Paris area, will come to your home or office to get any computers you are about to throw in the bin”. From this classified, I got a 486DX-80 PC, a printer, and a Hector HR2+ computer.

The 486DX-80 and the printer have served me well, and are long gone. Last summer, while visiting my family in France I decided to take a look into the attic. As I was making my way through rusty nails and spider webs, I noticed a bag covered with dust in a dark corner of the room. I had the feeling that I was about to make a very good discovery, and I was not disappointed. I opened the bag with excitement, and there it was, the almighty Hector 2HR+ computer!

In the bag came all the booklets, cables, and cassettes, so I decided that I would spend the afternoon writing code on that machine, and that I would run this code before nightfall. But before I go any further on that, a bit of history about the Hector 2HR+…

Implementing a Key-Value Store – Part 6: Open-Addressing Hash Tables

This is Part 6 of the IKVS series, “Implementing a Key-Value Store”. You can also check the Table of Contents for other parts.

In this article, I will compare several open-addressing hash tables: Linear Probing, Hopscotch hashing, and Robin Hood hashing. I have already done some work on this topic, and in this article I want to gather data for more metrics in order to decide which hash table I will use for my key-value store.

The result section also contains an interesting observation about the maximum DIB for Robin Hood hashing, which originated from Kristofer Karlsson, a software engineer at Spotify and the author of the key-value store Sparkey.

This article will cover:

1. Open-addressing hash tables
2. Metrics
3. Experimental Protocol
4. Results and Discussion
5. Conclusion
6. References

Coding for SSDs – Part 6: A Summary – What every programmer should know about solid-state drives

This is Part 6 over 6 of “Coding for SSDs”. For other parts and sections, you can refer to the Table to Contents. This is a series of articles that I wrote to share what I learned while documenting myself on SSDs, and on how to make code perform well…

Coding for SSDs – Part 5: Access Patterns and System Optimizations

This is Part 5 over 6 of “Coding for SSDs”, covering Sections 7 and 8. For other parts and sections, you can refer to the Table to Contents. This is a series of articles that I wrote to share what I learned while documenting myself on SSDs, and on how to make code perform well on SSDs. If you’re in a rush, you can also go directly to Part 6, which is summarizing the content from all the other parts.

Now that I have covered most of the inner workings of solid-state drives in the previous sections, I can provide data that will help build an understanding of which access patterns should be used and why they are indeed better than others. In this part, I explain how writes should be done, how reads should be done, and why concurrent read and write operations are interfering. I also cover a few optimizations at the level of the filesystem which can improve performance.

Translations: This article was translated to Simplified Chinese by Xiong Duo and to Korean by Matt Lee (이 성욱).

ssd-presentation-05

Coding for SSDs – Part 4: Advanced Functionalities and Internal Parallelism

This is Part 4 over 6 of “Coding for SSDs”, covering Sections 5 and 6. For other parts and sections, you can refer to the Table to Contents. This is a series of articles that I wrote to share what I learned while documenting myself on SSDs, and on how to make code perform well on SSDs. If you’re in a rush, you can also go directly to Part 6, which is summarizing the content from all the other parts.

In this part, I cover briefly some of the main SSD functionalities such as TRIM and over-provisioning. I am also presenting the different levels of internal parallelism in an SSD, and the concept of clustered block.

Translations: This article was translated to Simplified Chinese by Xiong Duo and to Korean by Matt Lee (이 성욱).

ssd-presentation-04