Hadoop is designed to run on commodity hardware and to scale up or down without system interruption. It provides three core functions: storage, processing, and resource management.
Processing – MapReduce
Computation in Hadoop is based on the MapReduce paradigm, which distributes tasks across a cluster of coordinated nodes: map tasks process input splits in parallel, and reduce tasks aggregate the sorted map output.
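As an illustration, here is a minimal sketch of the classic word-count job written against the org.apache.hadoop.mapreduce API; the class name and the input/output paths passed on the command line are placeholders for this example.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. an HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not already exist
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The framework handles scheduling the map and reduce tasks on cluster nodes and shuffling intermediate data between them; the application only supplies the map and reduce logic.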
Storage – HDFS
Storage is provided by the Hadoop Distributed File System (HDFS), a reliable, distributed file system that allows large volumes of data to be stored and rapidly accessed across large clusters of commodity servers. Its reliability comes from splitting files into large blocks and replicating each block across multiple nodes, so the loss of a single server does not lose data.
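A minimal sketch of how an application might write to and read from HDFS through the org.apache.hadoop.fs.FileSystem client API; the fs.defaultFS URI and the file path below are assumptions for illustration, as in practice they come from the cluster configuration.

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hypothetical NameNode address; normally read from core-site.xml.
    conf.set("fs.defaultFS", "hdfs://namenode:8020");

    try (FileSystem fs = FileSystem.get(conf)) {
      Path file = new Path("/tmp/hello.txt"); // hypothetical path

      // Write: the client streams data to DataNodes; HDFS replicates each block.
      try (FSDataOutputStream out = fs.create(file, true)) {
        out.write("Hello, HDFS!\n".getBytes(StandardCharsets.UTF_8));
      }

      // Read the file back and copy its contents to stdout.
      try (FSDataInputStream in = fs.open(file)) {
        IOUtils.copyBytes(in, System.out, 4096, false);
      }
    }
  }
}
```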
Resource Management – YARN (New in Hadoop 2.0)
YARN performs the resource-management function in Hadoop 2.0 and extends MapReduce by supporting non-MapReduce workloads written against other programming models. The YARN-based architecture of Hadoop 2 is the most significant change introduced to the Hadoop project.
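As a sketch of how applications interact with YARN, the snippet below uses the org.apache.hadoop.yarn.client.api.YarnClient API to query the ResourceManager for running nodes and applications; it assumes a yarn-site.xml is available on the classpath.

```java
import java.util.List;

import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YarnClusterInfo {
  public static void main(String[] args) throws Exception {
    // YarnConfiguration picks up yarn-site.xml from the classpath.
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();
    try {
      // NodeManagers the ResourceManager currently reports as running.
      List<NodeReport> nodes = yarnClient.getNodeReports(NodeState.RUNNING);
      for (NodeReport node : nodes) {
        System.out.printf("%s: %d containers, %s used%n",
            node.getNodeId(), node.getNumContainers(), node.getUsed());
      }

      // All applications known to the ResourceManager, MapReduce or otherwise.
      for (ApplicationReport app : yarnClient.getApplications()) {
        System.out.printf("%s (%s): %s%n",
            app.getApplicationId(), app.getApplicationType(),
            app.getYarnApplicationState());
      }
    } finally {
      yarnClient.stop();
    }
  }
}
```

Frameworks other than MapReduce negotiate containers from YARN through this same client layer, which is what allows non-MapReduce workloads to share the cluster's resources.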