Skip to content

Tag: hadoop

Introduction to MapReduce

A few years back, thinking that you could have a cluster in your garage would have been crazy. Programming your own implementation of a reliable and powerful distributed system is feasible, but be ready to spend some months on it. Luckily, big companies and their need to handle increasing quantities of data led us to accessible solutions for cloud computing. The last groundbreaking solution in date, effective on clusters of cheap computers and developed by Google, is MapReduce. This article is yet another post on MapReduce, except that it is aimed at tech-savvy and non tech-savvy people, as it covers in details the different steps of a MapReduce iteration. It also explains how MapReduce is related to functional programming, why it enables parallel computing, and finally how the work is being distributed between workers during an iteration.