Coca-Cola started in Atlanta, Georgia. Coke products were all made in one plant and distributed locally in Georgia. As the popularity of Coke increased over the years, the Coca-Cola Company had to change the production and distribution of its product in order to meet global demand. Now there are bottling plants all over the world that use raw materials and Coke recipes to produce and distribute billions of servings of Coke products a year. As your data grows like the demand for Coke, one plant will not be enough to satisfy your data production needs. Enter Hadoop.
Hadoop is a distributed data-crunching architecture very much like Coke's bottling and distribution network. So, why is it impractical to just scale one bottling plant as demand grows? Let's see. Physical infrastructure becomes a problem. As production scales up, you need more raw material. This means a larger loading dock, more trucks, and larger roads to accommodate increased traffic. You also need more machinery, more power, and more people to run the plant. Of course, all of this can scale up to a point, but could you imagine the infrastructure that Coca-Cola would need in Atlanta to produce all of the billions of cans of Coke that it sells a year?
You also need a delivery network that delivers product in a timely manner. This is more problematic. Delivering Coke to new markets that are farther away from the plant means more trucks and aged product. For all of these reasons, Coca-Cola made a transformational decision to ship raw material all over the country, and eventually the world, and have regional plants bottle Coke locally. You can do the same thing with data.
Instead of having one large plant that has a finite production capability, Hadoop allows you to have thousands of smaller distributed plants that work together in a network of data production. This solves many big data problems in the same way that Coke solved its production problem. A Hadoop cluster of computers is made up of as many production plants as you need to process your data. Your raw material is your data. Hadoop automatically distributes this raw material to all of your data production plants, or nodes. When you run a Hadoop job, instead of pulling data to the program, it pushes the program to the data, just like a recipe would be distributed to all of the bottling plants that produce Coke. This approach has many benefits over a single large database or one huge bottling plant.
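To make the "push the recipe to the data" idea a little more concrete, here is a toy sketch in plain Python. It is not Hadoop code and does not use any Hadoop API. The function names (distribute, recipe, run_job), the round-robin placement, and the word-count recipe are all assumptions made up for illustration. Each "node" is just a list holding its own share of the records, and the same recipe runs against each node's local share before the partial results are merged.

```python
from collections import Counter


def distribute(records, num_nodes):
    """Deal records round-robin into one local store per node (hypothetical placement)."""
    nodes = [[] for _ in range(num_nodes)]
    for i, record in enumerate(records):
        nodes[i % num_nodes].append(record)
    return nodes


def recipe(local_records):
    """The 'recipe' sent to every plant: count words in that node's local data."""
    counts = Counter()
    for line in local_records:
        counts.update(line.lower().split())
    return counts


def run_job(nodes, recipe_fn):
    """Run the same recipe on each node's local data, then merge the partial results."""
    partials = [recipe_fn(local) for local in nodes]
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


records = ["coke is bottled locally", "data is processed locally", "coke and data"]
nodes = distribute(records, num_nodes=2)
result = run_job(nodes, recipe)
```

The point of the sketch is only the shape of the work: the raw material is split up first, the recipe travels to each share, and the small partial results are what get combined at the end.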
Increase capacity without affecting production
Adding a new plant has no impact on the other plants in the network. When a new plant comes online, the Hadoop system starts sending it raw material and data crunching recipes, so it adds capacity right away.
Upgrade individual plants without halting production
Suppose there is a new conveyor belt that can increase the capacity of a plant. While one plant is shut down for the upgrade, the rest can increase production slightly to absorb the temporarily reduced capacity.
Absorb plant failures
With one large system, if something catastrophic happens, all production stops. With a bottling plant network, if a plant in Wisconsin has to halt production because of local flooding, the plants in Illinois and Ohio can ratchet up capacity temporarily to meet demand while flood waters subside.
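Here is a rough sketch of how work might be rerouted when a plant goes down. It assumes that each chunk of raw material is stored at more than one plant, which this post does not describe (Hadoop's file system does keep multiple copies of each block by default). The function assign_tasks, the block names, and the "least busy plant wins" rule are hypothetical and are not Hadoop's actual scheduler.

```python
def assign_tasks(block_replicas, live_nodes):
    """Pick, for each block, one live node that holds a copy of it.

    block_replicas: dict mapping block id -> list of nodes holding a copy.
    live_nodes: set of nodes that are currently able to work.
    Raises RuntimeError if some block has no live copy left.
    """
    load = {node: 0 for node in live_nodes}
    assignment = {}
    for block, holders in sorted(block_replicas.items()):
        candidates = [node for node in holders if node in live_nodes]
        if not candidates:
            raise RuntimeError(f"no live copy of {block}")
        # Prefer the least busy plant; ties go to the first listed holder.
        chosen = min(candidates, key=lambda node: load[node])
        assignment[block] = chosen
        load[chosen] += 1
    return assignment


block_replicas = {
    "block-1": ["wisconsin", "illinois"],
    "block-2": ["wisconsin", "ohio"],
    "block-3": ["illinois", "ohio"],
}
before = assign_tasks(block_replicas, {"wisconsin", "illinois", "ohio"})
after = assign_tasks(block_replicas, {"illinois", "ohio"})  # Wisconsin is flooded
```

With all three plants up, each one handles a block. With Wisconsin offline, every block still gets processed because Illinois and Ohio hold copies and pick up the extra work.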
And then there's scalability
If I have one big plant that is at capacity, what do I do? Do I build another big plant and double my capacity even though I may initially only need to increase production by 5%? Since the plants are smaller, Hadoop allows you to add what you need when you need it.
There is much more to Hadoop than described here, but I think this gives you a good idea of why you should take a look at it if you have big data that you want to exploit in some way. This distribution and production technique did wonders for Coca-Cola. Just think of what it could do for you.