Skip to content

Code Capsule Posts

Coding for SSDs – Part 3: Pages, Blocks, and the Flash Translation Layer

This is Part 3 over 6 of “Coding for SSDs”, covering Sections 3 and 4. For other parts and sections, you can refer to the Table to Contents. This is a series of articles that I wrote to share what I learned while documenting myself on SSDs, and on how to make code perform well on SSDs. If you’re in a rush, you can also go directly to Part 6, which is summarizing the content from all the other parts.

In this part, I am explaining how writes are handled at the page and block level, and I talk about the fundamental concepts of write amplification and wear leveling. Moreover, I describe what is a Flash Translation Layer (FTL), and I cover its two main purposes, logical block mapping and garbage collection. More particularly, I explain how write operations work in the context of a hybrid log-block mapping.

Translations: This article was translated to Simplified Chinese by Xiong Duo and to Korean by Matt Lee (이 성욱).

ssd-presentation-03

Coding for SSDs – Part 2: Architecture of an SSD and Benchmarking

This is Part 2 over 6 of “Coding for SSDs”, covering Sections 1 and 2. For other parts and sections, you can refer to the Table to Contents. This is a series of articles that I wrote to share what I learned while documenting myself on SSDs, and on how to make code perform well on SSDs. If you’re in a rush, you can also go directly to Part 6, which is summarizing the content from all the other parts.

In this part, I am explaining the basics of NAND-flash memory, cell types, and basic SSD internal architecture. I am also covering SSD benchmarking and how to interpret those benchmarks.

Translations: This article was translated to Simplified Chinese by Xiong Duo and to Korean by Matt Lee (이 성욱).

ssd-presentation-02

Coding for SSDs – Part 1: Introduction and Table of Contents

Translations: This article was translated to Simplified Chinese by Xiong Duo and to Korean by Matt Lee (이 성욱). Introduction I want to make solid-state drives (SSDs) the optimal storage solution for my key-value store project. For that reason, I had to make sure I fully understood how SSDs work,…

Robin Hood hashing: backward shift deletion

In my article about Robin Hood hashing [1], I had reached the conclusion that the Robin Hood hashing algorithm was performing poorly after deletions had occurred in the hash table, and I was quite disappointed with my results. In the research I had done, I had stumbled upon an article by Paul Khuong who had done some work on Robin Hood hashing [2], and I decided to email him to ask what he thought of my results. Paul got back to me, and after looking at my code on Github, he suggested that the performance could be greatly improved with only a little change to the removal method. He even submitted a patch to point out what the change would be, and he ran this modified version himself on the test cases I had included in the source code. The results he got were looking really good, and therefore I had no other choice but to write a follow-up article to present the change to the code and show those results.

In this article, I am presenting the backward shift deletion variant for the Robin Hood hashing algorithm. I start by explaining what backward shift deletion is and how to implement it, then I use it on the same test cases that I had run in my previous article. Finally, I discuss the new results.

robin-hood-hashing-backward-shift-more

Robin Hood hashing

As I am still doing research on open addressing hash table algorithms, I tested an approach called Robin Hood hashing. Just like for other reordering schemes, Robin Hood hashing will move entries that are already stored in the hash table as new items are inserted. In the original Robin Hood paper, it is claimed that Robin Hood hashing is very efficient due to the fact that the probe sequence length has a small and almost constant variance.

In this article, I am presenting how Robin Hood hashing handles insertion, deletion, and search, including a graphical representation for the insertion operation. I then use my own C++ implementation of Robin Hood hashing to see how the algorithm behaves under various load conditions and try to determine how efficient it really is. Finally, I discuss some of the drawbacks of Robin Hood hashing, and its applicability for the implementation of an on-disk key-value store.

robin-hood-hashing-more-web

Hopscotch hashing

I am currently experimenting with various hash table algorithms, and I stumbled upon an approach called hopscotch hashing. Hopscotch hashing is a reordering scheme that can be used with the open addressing method for collision resolution in hash tables. When using open addressing with only a probing sequence and no reordering, entries are inserted in the first empty buckets found in the sequence. With a reordering scheme, entries already in the table can be moved as new entries are inserted. Hopscotch hashing is interesting because it guarantees a small number of look-ups to find entries. In addition, those look-ups are in contiguous memory areas, which is very cache friendly.

This article makes minor contributions to the original publication. First, a clear explanation of the insertion process is being given, which is independent from the representation of neighborhoods. The original paper was using the bitmap representation to present the algorithm, and I believe that things are simpler without it. Second, a new neighborhood representation is introduced, it is called the “shadow” representation. It derives the relationship between neighborhoods and initial buckets from the hashed keys, and does not require any extra memory. Finally, a C++ implementation is provided, and experimentation with this implementation allows to shed light on some of the limitations of hopscotch hashing that were not expressed in the original paper.

hopscotch-hashing_more_web

Cuckoo Hashing

As part of my work on my key-value store project, I am currently researching hashing methods with the goal to find one that would fit the performance constraints of on-disk storage. In this article, I am making a quick review of cuckoo hashing, a method to resolve collisions in hash tables. This article is not part of the IKVS series as it is not specific to key-value stores.

cuckoo_preview

Implementing a Key-Value Store – Part 5: Hash table implementations

This is Part 5 of the IKVS series, “Implementing a Key-Value Store”. You can also check the Table of Contents for other parts.

In this article, I will study the actual implementations of hash tables in C++ to understand where are the bottlenecks. Hash functions are CPU-intensive and should be optimized for that. However, most of the inner mechanisms of hash tables are just about efficient memory and I/O access, which will be the main focus of this article. I will study three different hash table implementations in C++, both in-memory and on-disk, and take a look at how the data are organized and accessed. This article will cover:

1. Hash tables
    1.1 Quick introduction to hash tables
    1.2 Hash functions
2. Implementations
    2.1 unordered_map from TR1
    2.2 dense_hash_map from SparseHash
    2.3 HashDB from Kyoto Cabinet
3. Conclusion
4. References

Implementing a Key-Value Store – Part 4: API Design

This is Part 4 of the IKVS series, “Implementing a Key-Value Store”. You can also check the Table of Contents for other parts.

I finally settled on a name for this whole key-value store project, which from now on will be referred as FelixDB KingDB.

In this article, I will take a look at the APIs of four key-value stores and database systems: LevelDB, Kyoto Cabinet, BerkekeyDB and SQLite3. For each major functionality in their APIs, I will compare the naming conventions and method prototypes, to balance the pros and cons and design the API for the key-value store I am currently developing, KingDB. This article will cover:

1. General principles for API design
2. Defining the functionalities for the public API of KingDB
3. Comparing the APIs of existing databases
    3.1 Opening and closing a database
    3.2 Reads and Writes
    3.3 Iteration
    3.4 Parametrization
    3.5 Error management
4. Conclusion
5. References