Algorithms and Programming

Service Ownership Checklist

Published by Emmanuel Goossaert on November 12, 2017

Building and maintaining infrastructure services requires striving for quality and ownership. But it’s not always easy to know what we are missing, and which assumptions we are making without realizing it. To help myself and my colleagues reason about whether we are addressing the important topics, I came up with something I call the Service Ownership Checklist. It’s still in draft form, but I’ve already refined it thanks to the help and feedback of many of my peers, and I’m now releasing it on my blog hoping that it can help other infrastructure engineers as well.

The way to use this document is to share it with your colleagues and teams, and have them ask each other some of the questions to see how they’re doing on all these topics and challenge their assumptions. You will hopefully uncover unknown issues, and create enough urgency to go and fix them.

This blog post is organized in two parts. The first part is the Service Ownership Checklist, a set of loose questions that can be used in brainstorming and sharing sessions, and the second part is a condensed version in the form of a questionnaire, the Service Ownership Questionnaire, which shows the different levels of quality for each reliability topic.

The SRE organization at Google runs Launch Reviews when it releases new services, and for this it uses a Launch Review Checklist. I recommend that you read Chapter 27 of the Google SRE book, which covers the subject.

Finally, if you think of any other topics, or of a better way to group the questions into categories, please post a comment below! And if you enjoyed this article, subscribe to the mailing list at the top of this page, and you will receive an update every time a new article is posted.

Optimize your monitoring for decision-making

Published by Emmanuel Goossaert on August 11, 2016

Working in infrastructure is about building and maintaining complex systems made of many moving parts. To operate such systems you need to make sure they are running and healthy, that is to say that they are performing to the level of quality you have defined, and for this you need…

How to get started with infrastructure and distributed systems

Published by Emmanuel Goossaert on January 3, 2016

Most of us developers have had experience with web or native applications that run on a single computer, but things are a lot different when you need to build a distributed system to synchronize dozens, sometimes hundreds of computers to work together. I recently received an email from someone asking…

Implementing a Key-Value Store – Part 9: Data Format and Memory Management in KingDB

Published by Emmanuel Goossaert on August 3, 2015

This is Part 9 of the IKVS series, “Implementing a Key-Value Store”. You can also check the Table of Contents for other parts. In this series of articles, I describe the research and process through which I am implementing a key-value database, which I have named “KingDB”. The source code is available at http://kingdb.org. Please note that you do not need to read the previous parts to be able to follow. The previous parts were mostly exploratory, and starting with Part 8 is perfectly fine.

In this article, I explain how the storage engine of KingDB works, including details about the data format. I also cover how memory management is done through the use of a compaction process.

Implementing a Key-Value Store – Part 8: Architecture of KingDB

Published by Emmanuel Goossaert on May 25, 2015

This is Part 8 of the IKVS series, “Implementing a Key-Value Store”. You can also check the Table of Contents for other parts. In this series of articles, I describe the research and process through which I am implementing a key-value database, which I have named “KingDB”. The source code is available at http://kingdb.org. Please note that you do not need to read the previous parts to be able to follow. The previous parts were mostly exploratory, and starting with Part 8 is perfectly fine.

In the previous articles, I have laid out the research and discussion around what needs to be considered when implementing a new key-value store. In this article, I will present the architecture of KingDB, the key-value store of this article series that I have finally finished implementing.

Introducing KingDB v0.9.0

Published by Emmanuel Goossaert on May 14, 2015

I am pleased to announce that I am releasing the very first version of KingDB, the fast persisted key-value store. KingDB is a side-project that I have been hacking on intermittently over the last couple of years. It has taken a lot of my personal time, therefore I am very…

An afternoon in 1983 with Hector

Published by Emmanuel Goossaert on November 22, 2014

Back in 1998, I was 13 years old and had no money to buy a computer. But that wouldn’t stop me. I didn’t care if I had a brand new computer or a used one, all I wanted was a computer. From the conversations of adults around me, I heard it was common for businesses to renew their hardware and throw their old computers into the trash. So I figured, all I had to do was to be in the right trash at the right time, or even better, make the trash come to me. In those years, there were tons of printed computer magazines, so I decided to send a letter to one of them, whose name I have now forgotten, to publish an ad in their classified section: “13-year-old student in the Paris area, will come to your home or office to get any computers you are about to throw in the bin”. From this classified, I got a 486DX-80 PC, a printer, and a Hector 2HR+ computer.

The 486DX-80 and the printer served me well, and are long gone. Last summer, while visiting my family in France, I decided to take a look in the attic. As I was making my way through rusty nails and spider webs, I noticed a bag covered with dust in a dark corner of the room. I had the feeling that I was about to make a very good discovery, and I was not disappointed. I opened the bag with excitement, and there it was, the almighty Hector 2HR+ computer!

In the bag came all the booklets, cables, and cassettes, so I decided that I would spend the afternoon writing code on that machine, and that I would run this code before nightfall. But before I go any further on that, a bit of history about the Hector 2HR+…

Implementing a Key-Value Store – Part 7: Optimizing Data Structures for SSDs

Published by Emmanuel Goossaert on October 18, 2014

This is Part 7 of the IKVS series, “Implementing a Key-Value Store”. You can also check the Table of Contents for other parts. In this series of articles, I describe the research and process through which I am implementing a key-value store database, which I have named “KingDB”.

In the previous articles, I have spent a fair amount of time reviewing existing key-value stores, interfaces, architectures, and I focused greatly on hash tables. In this article, I will talk about hardware considerations for storing data structures on solid-state drives (SSDs), and I will share a few notes regarding file I/O optimizations in Unix-based operating systems.

This article will cover:

1. Fast data structures on SSDs
2. File I/O optimizations
3. Done is better than perfect
4. References

Implementing a Key-Value Store – Part 6: Open-Addressing Hash Tables

Published by Emmanuel Goossaert on May 7, 2014

This is Part 6 of the IKVS series, “Implementing a Key-Value Store”. You can also check the Table of Contents for other parts.

In this article, I will compare several open-addressing hash tables: Linear Probing, Hopscotch hashing, and Robin Hood hashing. I have already done some work on this topic, and in this article I want to gather data for more metrics in order to decide which hash table I will use for my key-value store.

The results section also contains an interesting observation about the maximum DIB for Robin Hood hashing, which originated from Kristofer Karlsson, a software engineer at Spotify and the author of the key-value store Sparkey.

This article will cover:

1. Open-addressing hash tables
2. Metrics
3. Experimental Protocol
4. Results and Discussion
5. Conclusion
6. References
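
For readers unfamiliar with the term, DIB stands for "distance to initial bucket": how far an entry sits from the slot its hash points to. The following is a minimal sketch of Robin Hood insertion and of measuring the maximum DIB. It is an illustration only, not the code used in the article: the function names `dib`, `robin_hood_insert`, and `max_dib` are hypothetical, the table is a plain Python list, and deletion and resizing are left out.

```python
def dib(slot, home, capacity):
    """Distance to initial bucket, accounting for wrap-around."""
    return (slot - home) % capacity


def robin_hood_insert(table, key, value):
    """Insert (key, value) into a fixed-size open-addressing table.

    Hypothetical helper for illustration: `table` is a list whose
    slots hold either None or a (key, value) tuple.
    """
    capacity = len(table)
    entry = (key, value)
    slot = hash(key) % capacity
    distance = 0  # DIB of the entry currently being placed
    for _ in range(capacity):
        resident = table[slot]
        if resident is None:
            table[slot] = entry
            return
        if resident[0] == entry[0]:
            table[slot] = entry  # same key: overwrite the value
            return
        resident_dib = dib(slot, hash(resident[0]) % capacity, capacity)
        if resident_dib < distance:
            # The resident is closer to its home than the entry we carry:
            # take its slot and continue probing with the displaced entry.
            table[slot], entry = entry, resident
            distance = resident_dib
        slot = (slot + 1) % capacity
        distance += 1
    raise RuntimeError("table is full")


def max_dib(table):
    """Largest DIB over all occupied slots (0 for an empty table)."""
    capacity = len(table)
    return max(
        (dib(i, hash(e[0]) % capacity, capacity)
         for i, e in enumerate(table) if e is not None),
        default=0,
    )


# Usage: small integer keys hash to themselves in CPython,
# so keys 0 and 8 share the same initial bucket in a table of 8 slots.
table = [None] * 8
for k in (0, 1, 8):
    robin_hood_insert(table, k, str(k))
```

In this usage example, key 8 collides with key 0, then displaces key 1 from slot 1 because key 1 was sitting in its own initial bucket; key 1 moves one slot further along.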

Coding for SSDs – Part 6: A Summary – What every programmer should know about solid-state drives

Published by Emmanuel Goossaert on February 12, 2014

This is Part 6 of 6 of “Coding for SSDs”. For other parts and sections, you can refer to the Table of Contents. This is a series of articles that I wrote to share what I learned while researching SSDs, and on how to make code perform well…
