Skip to content

Category: Algorithms and Programming

Service Ownership Checklist

Building and maintaining infrastructure services requires to strive for quality and ownership. But it’s not always easy to know what we are missing, and what assumption we are making that we don’t know of. To help myself and my colleagues reason about whether we are addressing the important topics, I came up with something I call the Service Ownership Checklist. It’s still in a draft format, but I’ve already refined it thanks to the help and feedback of many of my peers around me, and I’m now releasing it on my blog hoping that it can help other infrastructure engineers as well.

The way to use this document is to share it with your colleagues and teams, and have them ask each other some of the questions to see how they’re doing on all these topics and challenge their assumptions. You will hopefully uncover unknown issues, and create enough urgency to go and fix them.

This blog post is organized in two parts. The first part is the Service Ownership Checklist, a set of loose questions that can be used in brainstorming and sharing sessions, and the second part is a condensed version in the form of a questionnaire, the Service Ownership Questionnaire, which shows the different levels of quality for each reliability topic.

The SRE organization at Google is running Launch Reviews when they release new services, and for this they use a Launch Review Checklist. I recommend that you read Chapter 27 of the Google SRE book, which covers the subject.

Finally, if you think of any other topics, or of a better way to group the questions into categories, please post a comment below! And if you enjoyed this article, subscribe to the mailing list at the top of this page, and you will receive an update every time a new article is posted.

How to get started with infrastructure and distributed systems

Most of us developers have had experience with web or native applications that run on a single computer, but things are a lot different when you need to build a distributed system to synchronize dozens, sometimes hundreds of computers to work together. I recently received an email from someone asking…

Implementing a Key-Value Store – Part 9: Data Format and Memory Management in KingDB

This is Part 9 of the IKVS series, “Implementing a Key-Value Store”. You can also check the Table of Contents for other parts. In this series of articles, I describe the research and process through which I am implementing a key-value database, which I have named “KingDB”. The source code is available at http://kingdb.org. Please note that you do not need to read the previous parts to be able to follow. The previous parts were mostly exploratory, and starting with Part 8 is perfectly fine.

In this article, I explain how the storage engine of KingDB works, including details about the data format. I also cover how memory management is done through the use of a compaction process.

Implementing a Key-Value Store – Part 8: Architecture of KingDB

This is Part 8 of the IKVS series, “Implementing a Key-Value Store”. You can also check the Table of Contents for other parts. In this series of articles, I describe the research and process through which I am implementing a key-value database, which I have named “KingDB”. The source code is available at http://kingdb.org. Please note that you do not need to read the previous parts to be able to follow. The previous parts were mostly exploratory, and starting with Part 8 is perfectly fine.

In the previous articles, I have laid out the research and discussion around what needs to be considered when implementing a new key-value store. In this article, I will present the architecture of KingDB, the key-value store of this article series that I have finally finished implementing.

An afternoon in 1983 with Hector

ad-small

Back in 1998, I was 13 years old and had no money to buy a computer. But that wouldn’t stop me. I didn’t care if I had a brand new computer or a used one, all I wanted was a computer. From the conversations of adults around me, I heard it was common for businesses to renew their hardware and throw their old computers into the trash. So I figured, all I had to do was to be in the right trash at the right time, or even better, make the trash come to me. In these years, there were tons of computer paper magazines, so I decided to send a letter to one of them, of which I have now forgotten the name, to publish an ad in their classified section: “13 year-old student in the Paris area, will come to your home or office to get any computers you are about to throw in the bin”. From this classified, I got a 486DX-80 PC, a printer, and a Hector HR2+ computer.

The 486DX-80 and the printer have served me well, and are long gone. Last summer, while visiting my family in France I decided to take a look into the attic. As I was making my way through rusty nails and spider webs, I noticed a bag covered with dust in a dark corner of the room. I had the feeling that I was about to make a very good discovery, and I was not disappointed. I opened the bag with excitement, and there it was, the almighty Hector 2HR+ computer!

In the bag came all the booklets, cables, and cassettes, so I decided that I would spend the afternoon writing code on that machine, and that I would run this code before nightfall. But before I go any further on that, a bit of history about the Hector 2HR+…

Implementing a Key-Value Store – Part 6: Open-Addressing Hash Tables

This is Part 6 of the IKVS series, “Implementing a Key-Value Store”. You can also check the Table of Contents for other parts.

In this article, I will compare several open-addressing hash tables: Linear Probing, Hopscotch hashing, and Robin Hood hashing. I have already done some work on this topic, and in this article I want to gather data for more metrics in order to decide which hash table I will use for my key-value store.

The result section also contains an interesting observation about the maximum DIB for Robin Hood hashing, which originated from Kristofer Karlsson, a software engineer at Spotify and the author of the key-value store Sparkey.

This article will cover:

1. Open-addressing hash tables
2. Metrics
3. Experimental Protocol
4. Results and Discussion
5. Conclusion
6. References

Coding for SSDs – Part 6: A Summary – What every programmer should know about solid-state drives

This is Part 6 over 6 of “Coding for SSDs”. For other parts and sections, you can refer to the Table to Contents. This is a series of articles that I wrote to share what I learned while documenting myself on SSDs, and on how to make code perform well…