This is Part 5 of the IKVS series, “Implementing a Key-Value Store”. You can also check the Table of Contents for other parts.
In this article, I will study actual implementations of hash tables in C++ to understand where the bottlenecks are. Hash functions are CPU-intensive and should be optimized accordingly. However, most of the inner mechanisms of hash tables only require efficient memory and I/O access, which will be the main focus of this article. I will study three different hash table implementations in C++, both in-memory and on-disk, and take a look at how the data are organized and accessed. This article will cover:
1. Hash tables
1.1 Quick introduction to hash tables
1.2 Hash functions
2. Implementations
2.1 unordered_map from TR1
2.2 dense_hash_map from SparseHash
2.3 HashDB from Kyoto Cabinet
Of all the currently available media, the written format is the only one whose exact duration we do not know ahead of time. Indeed, we know exactly how long it will take to watch a film or listen to a podcast, but we have no idea how long we’ll have to sit in front of a scientific paper, a novel, or even a blog post. I think we are missing out on something.
Current solutions and effects of the estimated reading time
The idea of computing an ERT, an estimated reading time, is not new. There are a couple of APIs around the internet, and various WordPress plug-ins, already offering rough estimations. Some reader apps and websites are also implementing their own solutions, as is the case with Readability, Instapaper, Readmill, and Longreads. They all seem to be based on the same assumption, which is that an average person reads 200 words per minute.
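That 200-words-per-minute assumption boils down to a one-line computation. Here is a minimal sketch of what such a plug-in likely does (the function name and rounding choice are my own, not any particular tool's):

```python
import math
import re

WORDS_PER_MINUTE = 200  # the average reading speed assumed by most ERT tools


def estimated_reading_time(text, wpm=WORDS_PER_MINUTE):
    """Return the estimated reading time in whole minutes, rounded up."""
    word_count = len(re.findall(r"\S+", text))
    return max(1, math.ceil(word_count / wpm))
```

For example, a 1,000-word article at 200 words per minute yields a 5-minute estimate, which is about all the information these tools can offer.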
Accurate or not, basic estimations already seem to have some effect on readers. David Michael Ross has reported that adding an ERT to his articles decreased his bounce rate by 13%. Brian Cray, who also added a basic ERT to his articles, reported that the time spent on his site improved by 13.8%, and that people subscribed to his blog, followed him on Twitter, or retweeted his articles 66.7% more often. Even though in both cases the protocol lacks scientific rigor, these were interesting experiments that invite a more in-depth study.
However, not everybody is welcoming the ERT. Some find it offensive to present or be presented with an ERT, because in their opinion it shows no respect for the time writers invested in their work. I disagree, as most of what I read is not poetry but rather technical books, publications, and blog posts. All I want is to absorb the content that was laid out as words right into my brain. I couldn’t care less how fancy the writing style is.
If I wanted to read Proust or Camus, I would do that on a nice Sunday afternoon and take all the time I wanted, but that’s a totally different story. This question is never asked of other media and art forms. I know that watching the film “Pulp Fiction” will take me exactly 154 minutes, and this doesn’t change the fact that it’s an awesome film and that I will have a great time watching it. Knowing in advance how long an article will take me simply helps with my time management, by allowing me to plan better.
But if it’s purely time that is the concern, then maybe instead of knowing how long some text will take to read we should just try to increase the speed at which we can read.
I have tried many “speed reading” techniques, and none of them worked for me. As a matter of fact, I think that speed reading is bullshit. I see reading as a problem with two cases. The first is that you are reading something because you want to understand it, in which case it’s probably complex enough to require focus, so you have to spend enough time reading it. The second is that you are reading something for entertainment, in which case you are not concerned about time. Either way, reading speed is irrelevant.
Some research has found that reading on paper was 10 to 30% faster than reading on screen, although other research found them equivalent. These results are interesting but must be considered with caution. I would argue that they probably don’t apply anymore, as those publications are getting a bit old. Reading speed on screen depends greatly on the quality of the display, and hardware has improved greatly over the last decade. More recent research is also being pursued by Thierry Baccino et al. at the IUL (Integrative Usage Lab) on the profound changes that the digital format will bring to the process of learning how to read.
My main concern with reading speed is not the way it is measured, but simply that it is misleading. People want to read faster because they associate intelligence with reading speed, and most want to feel and appear smart. Who wouldn’t want to read books “Good Will Hunting” style? But we are missing the point: reading is not about being fast, it is about remembering what we have read. I would happily spend twice as long reading any book if its content were guaranteed to be committed permanently to my memory.
One of the reasons why we are slow readers is that most of us are unable to focus for long periods of time, and parasitic thoughts come along and disrupt the current flow of words. Another reason is that we need to spend some time decoding the format. Some information is better represented as a spreadsheet, a table, or even a schematic. A picture is worth a thousand words.
For online content, there is really low-hanging fruit when it comes to improving the reading experience. Apps such as Readability can transform any page into a more readable format, with a better font and layout. And this is of prime importance. In “Thinking, Fast and Slow”, Kahneman states that experiments have shown that the use of clearer fonts increases cognitive ease, and therefore the comprehension of written content. And this is just the font; it does not even account for all the layout issues, or all the advertisement banners our brains need to filter out while browsing pages online.
Humans are supposed to have hundreds of years of experience in layout, which has existed since the first books appeared and was further perfected by newspapers. Layout is in fact supposed to be someone’s job; it’s called “layout artist”. The problem is that most non-professional content publishers are completely unaware of layout standards and of what makes a page more readable. It feels as if the transition from paper to digital made us forget everything we had learned. Making tools that automatically publish content with valid layout standards, without relying on the authors, would improve the overall experience on the internet for everyone.
Ideas for improving ERT
The current solutions for ERT all rely on the assumption of 200 words per minute for an average person. This is why they all fail to provide valuable information. Most users are not adopting ERT because they cannot get anything valuable out of the current implementations.
Films, songs, and other finite media have a duration by themselves, and therefore this duration is the sum of factors that are external to us. Reading, on the contrary, depends on internal factors such as how experienced we are as readers, how much expertise we have with the topic and vocabulary of the article at hand, or even how tired we are at that moment of the day. Thus, accurately predicting how long some text will take to read may require building one model per reader, or maybe one model per group of similar readers.
A prototype of such an improved ERT tool could be built as a browser plug-in. It would require a back-end so users can log in and have their reading times for specific pages stored. There would be technical and privacy issues, as one would want to be careful with personal pages such as emails and Facebook. Getting people to use this browser plug-in would be another problem, although I would argue that a large chunk of the users of productivity apps are early adopters, so getting the first 1,000 users wouldn’t be much of an issue.
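The per-reader idea can be sketched in a few lines. Here is a toy model of what the back-end could compute, assuming it logs reading sessions as (word count, minutes) pairs per user; all names here are hypothetical, and a real system would need something more robust than a plain average:

```python
from collections import defaultdict


class ReadingModel:
    """Toy per-user ERT model: estimate each user's reading speed from logged sessions."""

    def __init__(self, default_wpm=200):
        self.default_wpm = default_wpm     # fallback for users with no history
        self.sessions = defaultdict(list)  # user_id -> list of (word_count, minutes)

    def log_session(self, user_id, word_count, minutes):
        """Record that a user read word_count words in the given number of minutes."""
        self.sessions[user_id].append((word_count, minutes))

    def wpm(self, user_id):
        """Estimate this user's words per minute from history, or use the default."""
        history = self.sessions[user_id]
        if not history:
            return self.default_wpm
        total_words = sum(words for words, _ in history)
        total_minutes = sum(minutes for _, minutes in history)
        return total_words / total_minutes

    def predict_minutes(self, user_id, word_count):
        """Predict how long a text of word_count words will take this user."""
        return word_count / self.wpm(user_id)
```

A new user falls back to the 200-wpm average, and the predictions sharpen as sessions accumulate; grouping similar readers would simply mean pooling their sessions before averaging.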
Then the fun begins. With enough data gathered, it may be possible to prove with high confidence that some layouts or writing styles outperform others in reading time and comprehension, and should therefore be selected as standards. But that would be way down the road.
Anyhow, it’s high time we got some accurate ERTs all over the web! Having the freedom to pick articles based on their ERTs, and to use ERTs to plan the reading of long content, would just be awesome. It’s one of those things we don’t know we need, and once it is implemented we won’t even notice it anymore, as it will feel so natural to have.
Maybe I’ll implement it myself as a hack whenever I have some time, or maybe someone else will do it.
“Reading Online or on Paper: Which is Faster?” by Kurniawan and Zaphiris — http://users.soe.ucsc.edu/~srikur/files/HCII_reading.pdf
“Reading from paper versus reading from screens” by Dillon, McKnight and Richardson — http://comjnl.oxfordjournals.org/content/31/5/457.abstract
“E-Books and the Future of Reading” by Harrison, IEEE Computer Graphics and Applications, Volume 20, Issue 3 (May 2000), pp. 32-39
This is Part 4 of the IKVS series, “Implementing a Key-Value Store”. You can also check the Table of Contents for other parts.
I finally settled on a name for this whole key-value store project, which from now on will be referred to as FelixDB.
In this article, I will take a look at the APIs of four key-value stores and database systems: LevelDB, Kyoto Cabinet, BerkeleyDB, and SQLite3. For each major functionality in their APIs, I will compare the naming conventions and method prototypes, to weigh the pros and cons and design the API for the key-value store I am currently developing, FelixDB. This article will cover:
1. General principles for API design
2. Defining the functionalities for the public API of FelixDB
3. Comparing the APIs of existing databases
3.1 Opening and closing a database
3.2 Reads and Writes
3.5 Error management
Implementing a Key-Value Store – Part 3: Comparative Analysis of the Architectures of Kyoto Cabinet and LevelDB
This is Part 3 of the IKVS series, “Implementing a Key-Value Store”. You can also check the Table of Contents for other parts.
In this article, I will walk through the architectures of Kyoto Cabinet and LevelDB, component by component. The goal, as stated in Part 2 of the IKVS series, is to get insights into how I should create the architecture of my own key-value store by analyzing the architectures of existing key-value stores. This article will cover:
1. Intent and methodology of this architecture analysis
2. Overview of the Components of a Key-Value Store
3. Structural and conceptual analysis of Kyoto Cabinet and LevelDB
3.1 Create a map of the code with Doxygen
3.2 Overall architecture
3.6 Error Management
3.7 Memory Management
3.8 Data Storage
4. Code review
4.1 Organization of declarations and definitions
4.3 Code duplication
This is Part 2 of the IKVS series, “Implementing a Key-Value Store”. You can also check the Table of Contents for other parts.
In this article, I will start by explaining why I think it is important to use models for this project and not start completely from scratch. I will describe a set of criteria for selecting key-value store models. Finally, I will go over some well-known key-value store projects, and select a few of them as models using the presented criteria. This article will cover:
1. Not reinventing the wheel
2. Model candidates and selection criteria
3. Overview of the selected key-value stores
When doing system administration to fix a crash on some Unix-based server, I have several times run into the issue of knowing how to perform a certain task, but not remembering the exact sequence of commands. Whenever that happens, I always end up doing the same thing: resorting to a search on Google to find the commands I need. Those tasks are generally not frequent enough to make it worth memorizing the commands or creating a script, but frequent enough for the searching process to become really annoying. It is also a productivity issue, since it requires me to stop my current workflow, open a web browser, and perform a search. For me, those things include tasks such as “how to find the number of processors on a machine” or “how to dump a PostgreSQL table in CSV format.”
I thought that it would be great to have some piece of code that could query Google directly from the command line. But that would be a mess: for each query I need a simple sequence of commands to type, not the blog article with fluffy text all around it that Google is likely to return. I then thought about using the API of commandlinefu.com to get results directly from there. So I wrote a small Python script that performs text search that way, but the results were never exactly what I was looking for, since the commands presented there have been formatted by people who do not have the exact same needs as I do. This is what brought me to implement Kir, a tiny utility that allows text search directly from the command line and returns the exact list of commands needed.
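A script along those lines can be quite short. Here is a minimal sketch of querying commandlinefu.com from the command line; the URL scheme (query words joined with dashes, followed by the base64-encoded query) follows the format documented on their API page, and the JSON field names are assumptions to verify against actual responses:

```python
import base64
import json
import sys
from urllib.request import urlopen

# URL format as documented on commandlinefu.com's API page (an assumption to verify).
API = "http://www.commandlinefu.com/commands/matching/{slug}/{b64}/json"


def search_url(query):
    """Build the commandlinefu.com search URL for a free-text query."""
    slug = "-".join(query.split())
    b64 = base64.b64encode(query.encode("utf-8")).decode("ascii")
    return API.format(slug=slug, b64=b64)


def search(query):
    """Fetch matching commands and return them as (summary, command) pairs."""
    with urlopen(search_url(query)) as response:
        results = json.load(response)
    # "summary" and "command" are the field names I expect in the JSON output.
    return [(r["summary"], r["command"]) for r in results]


if __name__ == "__main__":
    for summary, command in search(" ".join(sys.argv[1:])):
        print("# " + summary)
        print(command)
```

Run as e.g. `python fu.py number of processors`, it would print each matching one-liner with its summary, which is roughly the workflow described above without leaving the terminal.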
This is Part 1 of the IKVS series, “Implementing a Key-Value Store”. You can also check the Table of Contents for other parts.
In this article, I will start with a short description of what key-value stores are. Then, I will explain the reasons behind this project, and finally I will expose the main goals for the key-value store that I am planning to implement. Here is the list of the things I will cover in this article:
1. A quick overview of key-value stores
2. Key-value stores versus relational databases
3. Why implement a key-value store
4. The plan
This post is the main article for the series “Implementing a Key-Value Store” (IKVS) that I am starting today. It aims at summing up all the articles of the series in a Table of Contents, and might later hold some general notes on the project.
Its content will change over time until the series is completed. In particular, in the Table of Contents, the titles of the parts that have not been written yet and their ordering might change. Some parts might also be removed and some others added as the writing advances.
More information on the project can be found in Section 1.3 of “Part 1: What are key-value stores, and why implement one?”
Enjoy, and if you have any questions, post a comment!
Table of Contents
1.1 – A quick overview of key-value stores
1.2 – Key-value stores versus relational databases
1.3 – Why implement a key-value store
1.4 – The plan
1.5 – References
2.1 – Not reinventing the wheel
2.2 – Model candidates and selection criteria
2.3 – Overview of the selected key-value stores
2.4 – References
3.1 – Intent and methodology of this architecture analysis
3.2 – Overview of the Components of a Key-Value Store
3.3 – Structural and conceptual analysis of Kyoto Cabinet and LevelDB
3.4 – Code review
3.5 – References
4.1 – General principles for API design
4.2 – Defining the functionalities for the public API of FelixDB
4.3 – Comparing the APIs of existing databases
4.4 – Conclusion
4.5 – References
5.1 – Hash tables
5.2 – Implementations
5.3 – Conclusion
5.4 – References
6 – Implementing a memory-efficient hash table stored on the file system
7 – Memory Management
8 – Networking
9 – Interfaces: REST, memcached, etc.
10 – Going further
I heard about drop-shipping for the first time a few months ago, when I stumbled upon an AMAA on Reddit with some guy claiming he was making $100k per month running drop-shipping websites. This guy also apparently verified the information with some mods of the AMAA sub-reddit, and provided a short introductory guide to drop-shipping that he later removed. Lucky me, I also bookmarked the link to the guide when I bookmarked the AMAA; here is the guide he made. The guide includes, at the very end, a list of the companies that he is using for his marketing. Some comments on Hacker News about this AMAA said that it looks like a scam aimed at promoting those companies.
After reading the post on Reddit, I started to look into drop-shipping as a possibility for creating a small business that would generate small but steady revenue. I already explored other options, as documented in a previous blog post about micro-ISVs. Here is what I found and what I think about drop-shipping.
I am a bit versed in filmmaking, and I recently read “Letters to Young Filmmakers” by UCLA Film School professor Howard Suber. The similarities between the startup world and the film industry are striking.
Making a film, just like making a startup, is about finding the right people for the team, getting the money, executing the idea, and selling the product. For startups it’s about business people, engineers, designers, and marketers. For films it’s about screenwriters, actors, producers, and directors. Just like startup founders with investors, screenwriters struggle to pitch their movie ideas to producers. Now I am a lot less surprised to see Hollywood celebrities like Ashton Kutcher investing in startups. But there are major differences as well. For instance, one can bootstrap a startup with two guys in a garage on ramen noodles, but the money necessary to start a movie project and the number of people involved are by far more significant. Also, while it is relatively easy to try a startup idea incrementally with a few hundred thousand dollars and limit losses in case of failure, a film is “make it or break it.” There is always a chance that the movie will be a failure, but the ways of knowing this in advance are limited. And in that case, the millions of dollars spent on its making are instantly lost.
I have selected a few paragraphs from “Letters to Young Filmmakers” that I am presenting below, in which I found clear similarities with the startup world and great wisdom. I urge you to buy this book and read it from cover to cover. It’s not only for filmmakers, it’s for anybody who ever wanted to get a significant project done.
Becoming a filmmaker
Whether you are a writer, director, producer or other creative person, to a very large extent you do not make a film, you have to get the film made.
Whatever your creative role you do not make a film; you make a contribution to a film. Without the contributions of others, the film won’t come into existence and won’t reach an audience [...] I am not just calling for humility, I’m asking you to learn and understand what all those people you have to work with contribute to the process of film.
Strategy and Artistic Freedom
Some people have difficulty with strategizing in advance. But I think all creative people should try to figure out where they’re going. Creativity is like sailing. You can just embark and see where it takes you, which has a certain kind of appeal. Or you can have a destination. You still have the potential for an interesting voyage, but when you’re done you’ll know you’ve arrived someplace.
Active vs. Effective
There is a big difference between being active and engaging in effective actions. Many people in the film industry are constantly active. They’re on the phone incessantly, sometimes eat several partial meals a day while meeting with a variety of people, attend screenings, go to parties, read scripts, newspapers, emails, and magazines, flip from one TV channel to the next, manically search the Internet, and assume that because they’re busy, they’re being productive.
When you gather a bunch of heroes together, you’re guaranteed to have conflict because each one knows he’s a hero, and expects to take the lead. Well, artists are heroes of a different kind (just ask them), and the bigger the artist the bigger the ego and the more difficulty they have genuinely collaborating.
Mentors and Models
The most important thing about mentors is not whether they’re younger or older, but whether you really respect them.
Listen to the potential wisdom mentors have to offer, but then transform what they tell you into something relevant to your own life, your own time, and your own personality.
Getting Started as a Director
Actors have to act, writers have to write, directors have to direct. However you do it, you need a portfolio that gives you credibility.
The distributor is also primarily responsible for the marketing of the film. As hard as it is to get a film made, it’s equally hard, if not harder, to get people to see your film. I’ve understated it: it’s equally hard to get people aware that your film even exists.
What Determines Success
I’ve never known any [filmmaker] to have an easy life, to be filled with anything other than wild hopes and frequent depression and an underlying sense that they have failed to fulfill their potential. The same could be said about any artist or perhaps any human being who aspires to create something significant.