Skip to content

Tag: hopscotch

Implementing a Key-Value Store – Part 6: Open-Addressing Hash Tables

This is Part 6 of the IKVS series, “Implementing a Key-Value Store”. You can also check the Table of Contents for other parts.

In this article, I will compare several open-addressing hash tables: Linear Probing, Hopscotch hashing, and Robin Hood hashing. I have already done some work on this topic, and in this article I want to gather data for more metrics in order to decide which hash table I will use for my key-value store.

The result section also contains an interesting observation about the maximum DIB for Robin Hood hashing, which originated from Kristofer Karlsson, a software engineer at Spotify and the author of the key-value store Sparkey.

This article will cover:

1. Open-addressing hash tables
2. Metrics
3. Experimental Protocol
4. Results and Discussion
5. Conclusion
6. References

Hopscotch hashing

I am currently experimenting with various hash table algorithms, and I stumbled upon an approach called hopscotch hashing. Hopscotch hashing is a reordering scheme that can be used with the open addressing method for collision resolution in hash tables. When using open addressing with only a probing sequence and no reordering, entries are inserted in the first empty buckets found in the sequence. With a reordering scheme, entries already in the table can be moved as new entries are inserted. Hopscotch hashing is interesting because it guarantees a small number of look-ups to find entries. In addition, those look-ups are in contiguous memory areas, which is very cache friendly.

This article makes minor contributions to the original publication. First, a clear explanation of the insertion process is being given, which is independent from the representation of neighborhoods. The original paper was using the bitmap representation to present the algorithm, and I believe that things are simpler without it. Second, a new neighborhood representation is introduced, it is called the “shadow” representation. It derives the relationship between neighborhoods and initial buckets from the hashed keys, and does not require any extra memory. Finally, a C++ implementation is provided, and experimentation with this implementation allows to shed light on some of the limitations of hopscotch hashing that were not expressed in the original paper.