In my article about Robin Hood hashing [1], I had reached the conclusion that the Robin Hood hashing algorithm was performing poorly after deletions had occurred in the hash table, and I was quite disappointed with my results. In the research I had done, I had stumbled upon an article by Paul Khuong who had done some work on Robin Hood hashing [2], and I decided to email him to ask what he thought of my results. Paul got back to me, and after looking at my code on Github, he suggested that the performance could be greatly improved with only a little change to the removal method. He even submitted a patch to point out what the change would be, and he ran this modified version himself on the test cases I had included in the source code. The results he got were looking really good, and therefore I had no other choice but to write a follow-up article to present the change to the code and show those results.

In this article, I am presenting the backward shift deletion variant for the Robin Hood hashing algorithm. I start by explaining what backward shift deletion is and how to implement it, then I use it on the same test cases that I had run in my previous article. Finally, I discuss the new results.

## 1. Backward shift deletion

The idea to improve the performance of Robin Hood hashing is simple: backward shift deletion. Instead of replacing entries with “tombstone” entries on deletion like I was doing it in my first implementation [1], the idea is to shift backward all the entries following the entry to delete until either an empty bucket, or a bucket with a DIB of 0 (distance to initial bucket). By doing this, every deletion will shift backwards entries and therefore decrease their respective DIBs by 1 (all their initial DIB values would be >= 1). An intuitive way to understand the backward shift is to think that by shifting backward the entries, the table is left as if the deleted entry had never been inserted. This is why even after a large number of deletions, the mean DIB and the variance of the DIB remain constant and low. The backward shift deletion is illustrated in Figure 1 below:

Figure 1: Backward shift deletion in Robin Hood hashing

## 2. Source code

The source code is available on Github [3]. The code includes a Makefile which will make compilation easy. Calling the main program with the –help parameter will give you a detailed description of the available options, with examples at the end describing how to test a hash table using Robin Hood hashing with backward shift deletion.

## Subscribe to my newsletter

## 3. Experimental protocol

In order to test the effect of backward shift deletion on performance, I am going to use the same test cases that I used in my previous article about Robin Hood hashing [1]. The only difference is that since I observed that there was no difference between the tables of size 10k and 100k, this time I am only plotting the results for tables of size 10k.

For more details regarding the test cases, take a look that the “Experiment protocol” section in the my previous article [1].

## 4. Results

The results are presented in the four figures below, each figure showing the same statistic across the different test cases:

- Figure 2 is the mean DIB
- Figure 3 is the median of DIB
- Figure 4 is the 95th percentile of DIB
- Figure 5 is the variance of DIB

Each of these figures is holding sub-figures, each for a different test case:

- (a) is the “batch” test case, with LFM=0.8 and LFR=0.1
- (b) is the “batch” test case, with LFM=0.8 and LFR=0.8
- (c) is the “ripple” test case, with LFM=0.8 and LFR=0.1
- (d) is the “loading” test case

The mean DIBs are adjusted for graphs of the the Robin Hood with tombstones, in such a way that the minimum DIB is shifted to zero, and this in order to make a fair comparison with the other algorithms. Indeed, because the implementation of Robin Hood hashing with tombstones is considering only probes between the minimum and maximum DIBs, the probing of an item never starts at DIB 0 but at the minimum DIB. The graphs for Robin Hood hashing with backward shift deletion and the graphs for basic linear probing are not shifted down.

Figure 2: Mean DIB averaged over 10 instances with random keys

Figure 3: Median of DIB averaged over 10 instances with random keys

Figure 4: 95th percentile of DIB averaged over 10 instances with random keys

Figure 5: Variance of DIB averaged over 10 instances with random keys

## 5. Discussion

In the results presented above, it is clear that Robin Hood hashing with backward shift deletion outperforms both basic linear probing and Robin Hood hashing with tombstones. In addition, the mean DIB and variance of DIB are constant, and this even after a large number of insertion and deletion operations, which is consistent with the results presented in the thesis of Celis [4].

The most striking results are is Figure 4, where the 95th percentile for Robin Hood hashing with backward shift deletion remains at a value of around 7, which proves that even in the worst cases, the number of probes to find an entry will be very small.

## 6. Conclusion

The algorithm I had implemented in my first version of Robin Hood hashing using tombstones [1] was the one described by the pseudo-code of the original thesis [4]. Yet, I was unable to reproduce the results presented in that same thesis. The reason is that the results were describing the theoretical performance based on the mathematical model. And if the math were right, the given pseudo-code was doing it wrong. Thanks to Paul Khuong and his suggestion of using a backward shift on deletion, the practical results now match the theoretical results.

An interesting follow-up would be to experiment to see how costly the backward shift is for the CPU. With these new results, my feeling is now that Robin Hood hashing with backward shift deletion is definitely an interesting algorithm, and given its very linear memory access pattern, it would be worth investigating further for an on-disk key-value store.

## Subscribe to my newsletter

## 7. References

[1] Robin Hood Hashing by Emmanuel Goossaert

[2] Robin Hood Hashing by Paul Khuong

[3] https://github.com/goossaert/hashmap

[4] Robin Hood Hashing, Pedro Celis, 1986

I am confused.

You present 16 graphs, did numerous measurements.

But you have no idea if the performance is higher or lower, when using the change?

I guess that would depend on delete/insert ratio, but still…

Indeed, I wanted to run benchmarks, but then I got sucked into other projects and never got to that point. The results would obviously depend on how optimized is the code for the various implementations. If you time and motivation to do it, the code is available, and I’d love to know what results you get!

In your prior article, you discussed the growth in average DIB with increasing table size and considered Organ-pipe and smart search. However, because backward shift deletion lowers the DIB, do you now consider these search strategies to be moot? Are you aware of any studies that measure the min, max, or average DIB for various patterns of insertion and deletion? That is, deletion in precisely the reverse order to insertion would be expected to precisely revert the DIB, but how might other patterns affect the DIB?

Further reflection reveals that deletion of an entry that was inserted immediately prior to its deletion does not necessarily precisely revert the state of the Robin Hood hash table to its state prior to the insertion. Nevertheless, backward shift deletion decreases the DIB for all entries that are shifted backward and thus improves the performance of the Robin Hood hash table. I will refer to this observation in the response that I will leave to “Implementing a Key-Value Store – Part 6: Open-Addressing Hash Tables” at this URL:

http://codecapsule.com/2014/05/07/implementing-a-key-value-store-part-6-open-addressing-hash-tables/

In fact, I have left a response at the following URL to describe backward shift deletion for a Hopscotch hash table:

http://codecapsule.com/2013/08/11/hopscotch-hashing/

Note that with backward shift deletion, min DIB will always be 0, so the optimization of keeping track of (and skipping) min DIB probes in lookup is pointless.