Translations: This article was translated to Simplified Chinese by Xiong Duo and to Korean by Matt Lee (이 성욱).
Introduction
I want to make solid-state drives (SSDs) the optimal storage solution for my key-value store project. For that reason, I had to make sure I fully understood how SSDs work, so that I can optimize my hash table implementation to suit their internal characteristics. There is a lot of incomplete and contradictory information out there, and finding trustworthy information about SSDs was not an easy task. I had to do some substantial reading to find the proper publications and benchmarks in order to convince myself that, if I had to be coding for SSDs, I would know what I was doing.
Then I figured that since I had done all the research, it would be useful to share the conclusions I had reached. My intent was to transform all the information already available into practical knowledge. I ended up writing a 30-page article, not very suitable for the format of a blog. I have therefore decided to split what I had written into logical parts that can be digested independently. The full Table of Contents is available at the bottom of this article.
The most remarkable contribution is Part 6, a summary of the whole “Coding for SSDs” article series, which I am sure programmers who are in a rush will appreciate. This summary covers the basics of SSDs along with all of the recommended access patterns for implementing reads and writes to get the best performance out of solid-state drives.
Another important detail is that “Coding for SSDs” is independent from my key-value store project (IKVS series), and therefore, no prior knowledge of the IKVS articles is needed. I am planning on writing an article for the IKVS series on how a hash table can be implemented to take advantage of the internal characteristics of SSDs, though I have no precise date for that yet.
My only regret is not having produced any code of my own to prove that the access patterns I recommend are actually the best. However, even with such code, I would have needed to perform benchmarks over a large array of different models of solid-state drives to confirm my results, which would have required more time and money than I can afford. I have cited my sources meticulously, and if you think that something is not correct in my recommendations, please leave a comment to shed light on that. And of course, feel free to drop a comment as well if you have questions or would like to contribute in any way.
Table of Contents
Part 1: Introduction and Table of Contents
Part 2: Architecture of an SSD and Benchmarking
1. Structure of an SSD
1.1 NAND-flash memory cells
1.2 Organization of an SSD
1.3 Manufacturing process
2. Benchmarking and performance metrics
2.1 Basic benchmarks
2.2 Pre-conditioning
2.3 Workloads and metrics
Part 3: Pages, Blocks, and the Flash Translation Layer
3. Basic operations
3.1 Read, write, erase
3.2 Example of a write
3.3 Write amplification
3.4 Wear leveling
4. Flash Translation Layer (FTL)
4.1 On the necessity of having an FTL
4.2 Logical block mapping
4.3 Notes on the state of the industry
4.4 Garbage collection
Part 4: Advanced Functionalities and Internal Parallelism
5. Advanced functionalities
5.1 TRIM
5.2 Over-provisioning
5.3 Secure Erase
5.4 Native Command Queueing (NCQ)
5.5 Power-loss protection
6. Internal Parallelism in SSDs
6.1 Limited I/O bus bandwidth
6.2 Multiple levels of parallelism
6.3 Clustered blocks
Part 5: Access Patterns and System Optimizations
7. Access patterns
7.1 Defining sequential and random I/O operations
7.2 Writes
7.3 Reads
7.4 Concurrent reads and writes
8. System optimizations
8.1 Partition alignment
8.2 Filesystem parameters
8.3 Operating system I/O scheduler
8.4 Swap
8.5 Temporary files
Part 6: A Summary – What every programmer should know about solid-state drives
What’s next
Part 2 is available here. If you’re in a rush, you can also go directly to Part 6, which summarizes the content from all the other parts.
You mention aligning partitions to page size, but not ensuring that the **cluster size** is a multiple of the page size (which it rarely is any more). Pretty much every modern OS still defaults to formatting with a 4KB cluster size, which somewhat defeats the purpose of alignment when most current SSDs have either 8KB or 16KB pages. It is possible to format with other cluster sizes during Windows installation, but it must be done from a command prompt rather than the GUI. (Shift+F10 opens it during the partition selection part of the install, IIRC.)
Anybody reading this article knows how to open links in a new tab, i.e., please remove target=”_blank” from your links, especially the internal ones.
Thank you for this thoroughly researched article! It’s so detailed and the reference links you have provided have been helpful places to leap off and dive into. Thank you!
@Kurt:
target=”_blank” is not a problem, you can force open new tabs in the same window in Firefox.
Nice research you’ve done. Now that VPS providers offer SSDs at very low cost, it will help in getting the most out of those SSDs (I hope so). I will be reading it thoroughly during the weekend.
About the “_blank” on this page, it’s quite good to have it; while browsing through so many pages, it helps to keep a tab on where we began. Now the command key will last a little longer 🙂
The cool thing is that there are ways to reverse engineer the characteristics of an SSD (size of page, size of block, etc.) by running specific workloads. So it would actually be possible, based on this data, to find out which exact SSD model is used on the node of a VPS! The concern with VPS using SSDs, though, is that there is no way to know what the workload of the other tenants is, and therefore even if you’re using an “SSD-optimized” data structure, other tenants may not care at all and could be completely messing up the mapping table and garbage collection process of the drive. There are also problems of concurrent I/O and readahead buffer. Finally, if you use over-provisioning in a VPS by formatting to a lower drive capacity than the capacity you’ve been allocated, you’ll be getting better performance, but you’ll also be giving away better performance to the other tenants, who don’t have to pay for it… not so cool.
A fantastic article. You should consider creating a leanpub book on the subject and making (a small amount of) money.
A book on leanpub is indeed a nice idea! Though I’m guessing that the book would be interesting only for a tiny niche market, so I’d rather make the content available for free 🙂
Hi, first of all I want to tell you that you did an excellent job.
Then, a little suggestion: collect all 6 parts in epub and give us the chance to pay you back with a beer!
Take a look at leanpub.com site!
Thank you for your outstanding research work on SSDs. I think it is very useful for researchers who are just starting out with SSDs.
Thank you very much. Please keep it up.
Good article bro