Introduction to Hadoop 


Hadoop is Open source Apache Framework which is written in java programming language. Hadoop allows distributed processing of large data-sets across many computers which is using programming models.

In 2008, Yahoo released Hadoop as an open-source Project. Now, Apache Software Foundation (ASF) maintained and managed the framework ecosystem of technologies of Hadoop a known global community of software developers.

About the course

Hadoop is designed to scale up from single server to thousands of machines. The Apache Hadoop consists of modules namely Hadoop Distributed File System (HDFS), Hadoop yarn, Hadoop common and Hadoop MapReduce. Hadoop is the collection of software package like Apache Pig, Apache Hive, Apache Hbase, Apache Spark, Apache Zookeeper, Apache Sqoop.


  • Basics of Java

What are the important features of Hadoop?

  • Scalability: Liable to handle more data by adding more number of nodes
  • Low cost: Hadoop is an open source framework and uses commodity hardware to store large amount of data.
  • Fault Tolerance: All the applications and data are protected irrespective of any kind of hardware failure.
  • Computing power: The more are computing nodes the more is computing power as it is able to store more data.

Working steps of Hadoop

  • Stage1: In this stage user/application can submit the job at different file location, such as DFS file, jar files
  • Stage2: In this stage Hadoop client submits the job in job tracker and then followed by   different steps like compilation, monitoring and diagnosis
  • Stage3: In the stage task tracker executes the task as per MapReduce implementation and output the reduce function in the output file of file system

Leave a Comment

Your email address will not be published. Required fields are marked *