JobTracker & TaskTrackers in Hadoop


Working Of JobTracker and TaskTracker

JobTracker and TaskTracker are the two essential processes involved in MapReduce execution in MRv1 (Hadoop version 1). Both were retired in MRv2 (Hadoop version 2), where the ResourceManager, ApplicationMaster and NodeManager daemons take over their roles.

Job Tracker -

1. The JobTracker process runs on a separate node, usually not on a DataNode.

2. Job Tracker is an essential Daemon for MapReduce execution in MRv1. It is replaced by ResourceManager/ApplicationMaster in MRv2.

3. JobTracker receives the requests for MapReduce execution from the client.

4. JobTracker talks to the NameNode to determine the location of the data.

5. JobTracker finds the best TaskTracker nodes to execute tasks based on the data locality (proximity of the data) and the available slots to execute a task on a given node.

6. JobTracker monitors the individual TaskTrackers and reports the overall status of the job back to the client.

7. Job Tracker process is critical to the Hadoop cluster in terms of MapReduce execution.

8. When the JobTracker is down, HDFS remains functional, but new MapReduce jobs cannot be started and running MapReduce jobs are halted.
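
The placement decision described in points 4 and 5 can be illustrated with a small toy model. This is only a sketch, not Hadoop's actual scheduler: the TaskTracker class, the pick_tracker function, the host and rack names, and the three locality tiers (data-local, rack-local, off-rack) are all assumptions made for illustration, and the replica locations stand in for what the NameNode would return.

```python
from dataclasses import dataclass


@dataclass
class TaskTracker:
    """Hypothetical stand-in for a TaskTracker as seen by the JobTracker."""
    host: str
    rack: str
    free_slots: int


def locality_rank(tracker, replica_hosts, replica_racks):
    """Lower is better: 0 = data-local, 1 = rack-local, 2 = off-rack."""
    if tracker.host in replica_hosts:
        return 0
    if tracker.rack in replica_racks:
        return 1
    return 2


def pick_tracker(trackers, replica_hosts, replica_racks):
    """Choose the closest tracker that still has a free slot, or None."""
    candidates = [t for t in trackers if t.free_slots > 0]
    if not candidates:
        return None
    best = min(
        candidates,
        key=lambda t: locality_rank(t, replica_hosts, replica_racks),
    )
    best.free_slots -= 1  # the chosen tracker now has one slot occupied
    return best


trackers = [
    TaskTracker("node1", "rackA", free_slots=0),  # holds the data but is full
    TaskTracker("node2", "rackA", free_slots=2),  # same rack as a replica
    TaskTracker("node3", "rackB", free_slots=1),  # holds a replica, has a slot
]

# Replica locations of one input block (made-up NameNode answer).
replica_hosts = {"node1", "node3"}
replica_racks = {"rackA", "rackB"}

first = pick_tracker(trackers, replica_hosts, replica_racks)
second = pick_tracker(trackers, replica_hosts, replica_racks)
print(first.host, second.host)
```

In this model node1 is skipped despite holding the data because it has no free slot, so the first task goes to node3 (data-local) and the next one falls back to node2 (rack-local).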

TaskTracker -

1. TaskTracker runs on DataNodes, usually on every DataNode in the cluster.

2. TaskTracker is replaced by Node Manager in MRv2.

3. Mapper and Reducer tasks are executed on DataNodes administered by TaskTrackers.

4. JobTracker assigns Mapper and Reducer tasks to TaskTrackers for execution.

5. TaskTracker will be in constant communication with the JobTracker signalling the progress of the task in execution.

6. TaskTracker failure is not considered fatal. When a TaskTracker becomes unresponsive, JobTracker reassigns the tasks that TaskTracker was running to another node.
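
The progress reporting and failure handling in points 5 and 6 can be sketched the same way. The model below is a simplification under stated assumptions: JobTrackerModel, its method names, the tracker and task ids, and the 600-second timeout are all hypothetical values chosen for the example, and time is passed in explicitly instead of being read from a clock.

```python
class JobTrackerModel:
    """Toy model of heartbeat tracking; not the real JobTracker API."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_heartbeat = {}  # tracker name -> time of last heartbeat
        self.assignments = {}     # task id -> tracker name

    def heartbeat(self, tracker, now):
        """Record that a TaskTracker reported in at time `now`."""
        self.last_heartbeat[tracker] = now

    def assign(self, task, tracker):
        self.assignments[task] = tracker

    def reassign_lost_tasks(self, now):
        """Move tasks off trackers whose heartbeat is older than the timeout."""
        live = [
            t for t, seen in self.last_heartbeat.items()
            if now - seen <= self.timeout_s
        ]
        dead = set(self.last_heartbeat) - set(live)
        moved = {}
        for task, tracker in self.assignments.items():
            if tracker in dead and live:
                moved[task] = live[0]  # simplest policy: first live tracker
        self.assignments.update(moved)
        return moved


jt = JobTrackerModel(timeout_s=600)  # illustrative timeout value
jt.heartbeat("tt1", now=0)
jt.heartbeat("tt2", now=0)
jt.assign("map_0", "tt1")
jt.assign("map_1", "tt2")

jt.heartbeat("tt2", now=500)  # tt2 keeps reporting, tt1 goes silent
moved = jt.reassign_lost_tasks(now=700)
print(moved)
```

Here tt1 has been silent for 700 seconds, which exceeds the timeout, so its task map_0 is handed to tt2 while the job itself keeps running.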
