Tuesday, July 16, 2013

When to use Hbase and when for MapReduce..?

Very often I do get a query on when to use HBase and when MapReduce. HBase provides an SQL like interface with Phoenix and MapReduce provides a similar SQL interface with Hive. Both can be used to get insights from the data.


I would like the analogy of HBase/MapReduce to an plane/train. A train can carry a lot of material at a slow pace, while a plane relatively can carry less material at a faster pace. Depending on the amount of the material to be transferred from one location to another and the urgency, either a plane or a train can me used to move material.

Similarly HBase (or in fact any database) provides relatively low latency (response time) at the cost of low throughput (data transferred/processed), while MapReduce provides high latency at the cost of high throughput. So, depending on the NFR (Non Functional Requirements) of the application either HBase or MapReduce can be picked.

E-commerce or any customer facing application requires a quick response time to the end user and and also only a few records related to the customer have to be picked, so HBase would fit the bill. But, for all the back end/batch processing MapReduce can be used.

No comments:

Post a Comment