In my article about Robin Hood hashing , I had reached the conclusion that the Robin Hood hashing algorithm was performing poorly after deletions had occurred in the hash table, and I was quite disappointed with my results. In the research I had done, I had stumbled upon an article by Paul Khuong who had done some work on Robin Hood hashing , and I decided to email him to ask what he thought of my results. Paul got back to me, and after looking at my code on Github, he suggested that the performance could be greatly improved with only a little change to the removal method. He even submitted a patch to point out what the change would be, and he ran this modified version himself on the test cases I had included in the source code. The results he got were looking really good, and therefore I had no other choice but to write a follow-up article to present the change to the code and show those results.
In this article, I am presenting the backward shift deletion variant for the Robin Hood hashing algorithm. I start by explaining what backward shift deletion is and how to implement it, then I use it on the same test cases that I had run in my previous article. Finally, I discuss the new results.
1. Backward shift deletion
The idea to improve the performance of Robin Hood hashing is simple: backward shift deletion. Instead of replacing entries with “tombstone” entries on deletion like I was doing it in my first implementation , the idea is to shift backward all the entries following the entry to delete until either an empty bucket, or a bucket with a DIB of 0 (distance to initial bucket). By doing this, every deletion will shift backwards entries and therefore decrease their respective DIBs by 1 (all their initial DIB values would be >= 1). An intuitive way to understand the backward shift is to think that by shifting backward the entries, the table is left as if the deleted entry had never been inserted. This is why even after a large number of deletions, the mean DIB and the variance of the DIB remain constant and low. The backward shift deletion is illustrated in Figure 1 below:
Figure 1: Backward shift deletion in Robin Hood hashing
2. Source code
The source code is available on Github . The code includes a Makefile which will make compilation easy. Calling the main program with the –help parameter will give you a detailed description of the available options, with examples at the end describing how to test a hash table using Robin Hood hashing with backward shift deletion.
3. Experimental protocol
In order to test the effect of backward shift deletion on performance, I am going to use the same test cases that I used in my previous article about Robin Hood hashing . The only difference is that since I observed that there was no difference between the tables of size 10k and 100k, this time I am only plotting the results for tables of size 10k.
For more details regarding the test cases, take a look that the “Experiment protocol” section in the my previous article .
The results are presented in the four figures below, each figure showing the same statistic across the different test cases:
- Figure 2 is the mean DIB
- Figure 3 is the median of DIB
- Figure 4 is the 95th percentile of DIB
- Figure 5 is the variance of DIB
Each of these figures is holding sub-figures, each for a different test case:
- (a) is the “batch” test case, with LFM=0.8 and LFR=0.1
- (b) is the “batch” test case, with LFM=0.8 and LFR=0.8
- (c) is the “ripple” test case, with LFM=0.8 and LFR=0.1
- (d) is the “loading” test case
The mean DIBs are adjusted for graphs of the the Robin Hood with tombstones, in such a way that the minimum DIB is shifted to zero, and this in order to make a fair comparison with the other algorithms. Indeed, because the implementation of Robin Hood hashing with tombstones is considering only probes between the minimum and maximum DIBs, the probing of an item never starts at DIB 0 but at the minimum DIB. The graphs for Robin Hood hashing with backward shift deletion and the graphs for basic linear probing are not shifted down.
Figure 2: Mean DIB averaged over 10 instances with random keys
Figure 3: Median of DIB averaged over 10 instances with random keys
Figure 4: 95th percentile of DIB averaged over 10 instances with random keys
Figure 5: Variance of DIB averaged over 10 instances with random keys
In the results presented above, it is clear that Robin Hood hashing with backward shift deletion outperforms both basic linear probing and Robin Hood hashing with tombstones. In addition, the mean DIB and variance of DIB are constant, and this even after a large number of insertion and deletion operations, which is consistent with the results presented in the thesis of Celis .
The most striking results are is Figure 4, where the 95th percentile for Robin Hood hashing with backward shift deletion remains at a value of around 7, which proves that even in the worst cases, the number of probes to find an entry will be very small.
The algorithm I had implemented in my first version of Robin Hood hashing using tombstones  was the one described by the pseudo-code of the original thesis . Yet, I was unable to reproduce the results presented in that same thesis. The reason is that the results were describing the theoretical performance based on the mathematical model. And if the math were right, the given pseudo-code was doing it wrong. Thanks to Paul Khuong and his suggestion of using a backward shift on deletion, the practical results now match the theoretical results.
An interesting follow-up would be to experiment to see how costly the backward shift is for the CPU. With these new results, my feeling is now that Robin Hood hashing with backward shift deletion is definitely an interesting algorithm, and given its very linear memory access pattern, it would be worth investigating further for an on-disk key-value store.