So I have been working at a startup company for about two months now, and I have already learned many great things. I am not talking about new frameworks or programming tricks, but about peer interaction and productivity. Coming from an academic background to a fully pragmatic startup is not easy, and I…
I have seen many blog posts about Twitter bots, but they all seem pretty useless to me. People have this tendency to overuse things like Markov chains or other conversational agent libraries, just to make their bots look cool. But what’s the point of all that, and what problem are…
The rate at which gaming is penetrating the different layers of our societies has been rapidly increasing over the last decade. Sure, part of this is due to the fact that the kids who played in the 80’s are now 30 to 40 years old, but another good part of it is a direct consequence of the popularization of video games in general. With the Wii and now Kinect, there is no need for fancy game pads and killer finger skills to play video games. Another vector of this acceleration is the recent surge of social media. With Facebook, social games have emerged, taking advantage of social and psychological constraints. Games are no longer confined to the virtual world, and are crashing into the real world.
When developing a web site or web application, it is always a hassle to first commit the code to the repository, and then use FTP or SCP to update the code on the web server. Various IDEs can automate this “code synchronization” process with the click of a button, but this is still one more step for the developer to keep in mind. We have all been through those moments when, developing late at night, we were trying to fix a bug and realized that we were working on the wrong version because we had forgotten to update the code on the web server. What a waste of time! For a recent project that involves sharing code with other developers, I decided to put an end to this code synchronization issue by making it fully automatic. In this article, I first present the problem and my design of a configuration-independent solution. Then, I apply this solution to the specific context of my project, which uses Python for the web server, BitBucket/Mercurial as a repository solution, and Apache/PHP to handle the commit notifications.
Manually updating the files on a web server after a commit is a simple and straightforward step. However, it can be the source of various errors, such as:
1. Accidentally uploading the files to the wrong location on the web server
2. Forgetting to update the code on the web server and wasting time debugging an outdated version
3. Being distracted by thinking about updating the code on the web server, and losing focus on the development flow
4. Wasting time, as even if updating manually takes only a few seconds, these seconds will add up dramatically
The actual step of committing the code to the repository can hardly be automated, as only the developer can say when the code is ready to be pushed. In contrast, the manual update of the code on the web server can easily be avoided. A simple approach is to have the web server update its files and restart automatically whenever a commit is made to the repository. Figure 1 below describes the design of this automatic synchronization system.
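To make the idea concrete, here is a minimal sketch of the receiving end, written in Python for illustration only. My actual setup handles the commit notifications with Apache/PHP, so this is not that code. The repository path, the restart command, the port, and all function names are hypothetical placeholders. The only real external command is Mercurial's `hg pull -u`, which fetches new changesets and updates the working copy.

```python
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical values: adapt them to your own server.
REPO_DIR = "/srv/www/myproject"
RESTART_CMD = ["systemctl", "restart", "myproject"]


def sync_commands(repo_dir, restart_cmd):
    """Return the commands to run, in order, when a commit notification arrives."""
    return [
        # Pull new changesets and update the working copy in repo_dir.
        ["hg", "--cwd", repo_dir, "pull", "-u"],
        # Restart the web server so that it serves the new code.
        list(restart_cmd),
    ]


def run_sync(repo_dir=REPO_DIR, restart_cmd=RESTART_CMD,
             runner=subprocess.check_call):
    """Run each synchronization command, stopping at the first failure."""
    for cmd in sync_commands(repo_dir, restart_cmd):
        runner(cmd)


class CommitHookHandler(BaseHTTPRequestHandler):
    """Treat any POST request as a commit notification from the repository."""

    def do_POST(self):
        # Read and discard the notification body; this sketch ignores its content.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        try:
            run_sync()
            self.send_response(200)
        except (subprocess.CalledProcessError, OSError):
            self.send_response(500)
        self.end_headers()


if __name__ == "__main__":
    # Hypothetical port; the repository host must be configured to POST here.
    HTTPServer(("", 8080), CommitHookHandler).serve_forever()
```

The `runner` parameter only exists so that the sequence of commands can be checked without touching a real repository. A real deployment would also need to verify that the request really comes from the repository host, which this sketch does not do.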
During the California Gold Rush of 1849, thousands of men were ready to put at stake all they had in the hope of a better life. The software industry today shows, to a certain extent, very similar patterns. Smartphones and source code have replaced the traditional pick and shovel, but the goals and hopes remain the same. But is there anything for developers to expect from this gold rush of software?
As part of my research on image segmentation, I have explored different methods for selecting areas in an image. Recently, I found a statistical color model based upon Lambertian surface reflectance. I have implemented this model using OpenCV 2.1. This article presents the results of some experiments I have run, along with my personal feelings about the model. At the end of the article, you will find links to the source code and to the research papers I used.
My C++ implementation of the Active Appearance Models (AAMs), called Paamela, is almost done. I am currently fixing a couple of design issues and finishing up the documentation. Even though I am still not sure whether or not I will make the code open source, I thought it would be nice to share what I have developed so far, in order to help other developers working on similar problems.
As part of my C++ implementation of the AAMs (Active Appearance Models), I have been using OpenCV 2.1 full-time for about four weeks now. I thought I would share my view of the pros and cons of OpenCV. After reading this post, you should know whether OpenCV is the solution you need or not.
In this article, I present a small raw socket daemon coded in C called “Knock-knock”, which allows port knocking to secure services under Linux. I have been using Knock-knock daily on an Ubuntu server for two years without any problem, and the code is small and simple enough for anybody to understand.
Counting with MapReduce seems straightforward. All that is needed is to map the pairs to the same intermediate key, and let the reduce step take care of counting all the items. But wait, what if we have millions of items? Then one reducer, that is to say one process on one computer, will be forced to handle millions of pairs at once. Not only is this going to be very slow, defeating the whole point of having a cluster, but there is a more important problem: what if the data is too big to fit in memory? Here I show how to count elements using MapReduce in a way that really splits up the task between multiple workers.
The one-iteration solution
Let us have a look at the solution discussed above. This solution counts the items in a dataset in only one MapReduce iteration. Note that the values are replaced with the value 1. Indeed, as counting does not require keeping track of the values, they are all changed to a common simple value to simplify computations. This solution seems pretty sweet, except that as we can see in Figure 1, reducing all the pairs to the same intermediate key gives one reducer, and one reducer only, a huge workload for counting the items. This can be efficient if the dataset is small. But there are cases in which the dataset is so big that it does not even fit into the memory of a single computer, or so big that the computation on only one reducer is going to be very slow while we need to know the count as soon as possible. As we will see in the next section, there is a way to improve workload balance along with computation time, at the cost of an additional iteration.
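The one-iteration solution can be simulated in a few lines of plain Python. This is only a sketch that runs in a single process and does not use any real MapReduce framework. The function names and the intermediate key name `"count"` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical name for the single intermediate key shared by all pairs.
COUNT_KEY = "count"


def map_item(key, value):
    """Map step: drop the original key and value, emit (COUNT_KEY, 1)."""
    yield COUNT_KEY, 1


def reduce_count(key, values):
    """Reduce step: sum the 1s received for this intermediate key."""
    return key, sum(values)


def run_one_iteration(pairs):
    """Simulate one MapReduce iteration in a single process."""
    # Shuffle phase: group the mapped values by intermediate key.
    groups = defaultdict(list)
    for key, value in pairs:
        for inter_key, inter_value in map_item(key, value):
            groups[inter_key].append(inter_value)
    # Every pair shares the same key, so there is a single group
    # and therefore a single reduce call that receives all the values.
    return [reduce_count(k, vs) for k, vs in groups.items()]
```

Calling `run_one_iteration([("a", "x"), ("b", "y"), ("c", "z")])` returns `[("count", 3)]`. The list held in `groups["count"]` grows with the size of the dataset, and that list is exactly the workload that lands on the single reducer.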