I want to make solid-state drives (SSDs) the optimal storage solution for my key-value store project. For that reason, I had to make sure I fully understood how SSDs work, so that I could optimize my hash table implementation to suit their internal characteristics. There is a lot of incomplete and contradictory information out there, and finding trustworthy information about SSDs was not an easy task. I had to do substantial reading to find the right publications and benchmarks to convince myself that, if I ever had to write code for SSDs, I would know what I was doing.
Then I figured that since I had done all the research, it would be useful to share the conclusions I had reached. My intent was to transform all the information already available into practical knowledge. I ended up writing a 30-page article, which is not well suited to the blog format. I have therefore decided to split what I had written into logical parts that can be digested independently. The full Table of Contents is available at the bottom of this article.
The most remarkable contribution is Part 6, a summary of the whole “Coding for SSDs” article series, which I am sure programmers in a rush will appreciate. This summary covers the basics of SSDs along with all of the recommended access patterns for implementing reads and writes so as to get the best performance out of solid-state drives.
Another important detail is that “Coding for SSDs” is independent of my key-value store project (IKVS series), and therefore no prior knowledge of the IKVS articles is needed. I am planning to write an article for the IKVS series on how hash tables can be implemented to take advantage of the internal characteristics of SSDs, though I have no precise date for that yet.
My only regret is not having produced any code of my own to prove that the access patterns I recommend are actually the best. However, even with such code, I would have needed to run benchmarks across a wide range of solid-state drive models to confirm my results, which would have required more time and money than I can afford. I have cited my sources meticulously, and if you think that something in my recommendations is incorrect, please leave a comment to shed light on it. And of course, feel free to drop a comment as well if you have questions or would like to contribute in any way.
Table of Contents
1. Structure of an SSD
1.1 NAND-flash memory cells
1.2 Organization of an SSD
1.3 Manufacturing process
2. Benchmarking and performance metrics
2.1 Basic benchmarks
2.3 Workloads and metrics
3. Basic operations
3.1 Read, write, erase
3.2 Example of a write
3.3 Write amplification
3.4 Wear leveling
4. Flash Translation Layer (FTL)
4.1 On the necessity of having an FTL
4.2 Logical block mapping
4.3 Notes on the state of the industry
4.4 Garbage collection
5. Advanced functionalities
5.3 Secure Erase
5.4 Native Command Queueing (NCQ)
5.5 Power-loss protection
6. Internal Parallelism in SSDs
6.1 Limited I/O bus bandwidth
6.2 Multiple levels of parallelism
6.3 Clustered blocks
7. Access patterns
7.1 Defining sequential and random I/O operations
7.4 Concurrent reads and writes
8. System optimizations
8.1 Partition alignment
8.2 Filesystem parameters
8.3 Operating system I/O scheduler
8.5 Temporary files