[hadoop] How to kill Hadoop jobs

I want to kill all my Hadoop jobs automatically when my code encounters an unhandled exception. What is the best practice for doing this?

Thanks


The answer is


Depending on the Hadoop version, use one of the following:

version <2.3.0

Kill a hadoop job:

hadoop job -kill $jobId

You can list all job IDs with:

hadoop job -list

version >=2.3.0

Kill a hadoop job:

yarn application -kill $ApplicationId

You can list all application IDs with:

yarn application -list

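Another option is to forcefully kill the client process by its process ID. Be aware that this only reliably stops a job running in local mode; a job that has already been submitted to the cluster normally keeps running and has to be killed with yarn application -kill or mapred job -kill. Use this command: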

kill -9 <process_id> 

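For example, if the process ID is 4040 (make sure it belongs to the job client and not to a daemon such as the NameNode, since killing the NameNode stops HDFS rather than the job):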

username@hostname:~$ kill -9 4040

Use the commands below to kill all jobs on YARN.

For jobs in the ACCEPTED state:

for x in $(yarn application -list -appStates ACCEPTED | awk 'NR > 2 { print $1 }'); do yarn application -kill $x; done

For jobs in the RUNNING state:

for x in $(yarn application -list -appStates RUNNING | awk 'NR > 2 { print $1 }'); do yarn application -kill $x; done
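The two loops above can be folded into one helper. The following is a minimal sketch, not a definitive script: kill_yarn_apps is a hypothetical function name, it assumes yarn is on the PATH, and it relies on -appStates accepting a comma-separated list of states. It selects rows by the application_ ID prefix instead of skipping a fixed number of header lines.

```shell
# Hypothetical helper: kill every YARN application in the given states.
# Usage: kill_yarn_apps [STATES]   (STATES defaults to ACCEPTED,RUNNING)
kill_yarn_apps() {
    states="${1:-ACCEPTED,RUNNING}"
    # Keep only rows whose first column is an application ID; the summary
    # and header lines printed by "yarn application -list" do not match.
    yarn application -list -appStates "$states" 2>/dev/null |
        awk '$1 ~ /^application_/ { print $1 }' |
        while read -r app_id; do
            # </dev/null keeps the kill command from consuming the ID list
            yarn application -kill "$app_id" </dev/null
        done
}
```

Calling kill_yarn_apps with no argument targets ACCEPTED and RUNNING applications; kill_yarn_apps RUNNING restricts it to running ones.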


The following commands are deprecated:

hadoop job -list
hadoop job -kill $jobId

Consider using these instead:

mapred job -list
mapred job -kill $jobId

Run the list command to show all the jobs, then pass the job ID or application ID to the matching kill command.

Kill mapred jobs:

mapred job -list
mapred job -kill <jobId>

Kill yarn jobs:

yarn application -list
yarn application -kill <ApplicationId>
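To tie this back to the question (killing jobs automatically when the driver dies with an unhandled exception), one possible approach is to wrap the driver invocation in a shell function that checks its exit status. This is only a sketch under stated assumptions: run_or_kill is a hypothetical name, the driver is assumed to exit non-zero when an exception escapes it, and every ACCEPTED or RUNNING application visible to the caller is assumed to be one that should be killed (there is no filtering by user or job name).

```shell
# Hypothetical wrapper: run the driver command given as arguments; if it
# exits non-zero, kill all ACCEPTED and RUNNING YARN applications, then
# return the driver's exit status.
run_or_kill() {
    driver_rc=0
    "$@" || driver_rc=$?
    if [ "$driver_rc" -ne 0 ]; then
        echo "driver exited with status $driver_rc; killing YARN applications" >&2
        for app_id in $(yarn application -list -appStates ACCEPTED,RUNNING 2>/dev/null |
                        awk '$1 ~ /^application_/ { print $1 }'); do
            yarn application -kill "$app_id"
        done
    fi
    return "$driver_rc"
}

# Example call (jar, class, and paths are placeholders):
#   run_or_kill hadoop jar my-job.jar com.example.MyDriver /input /output
```

On a shared cluster it would be safer to narrow the list first, for example by matching on a job name of your own, before killing anything.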

An unhandled exception will eventually fail the job anyway, assuming it is repeatable (for example, bad data) rather than transient (for example, read errors from a particular data node).

You can configure the maximum number of times a particular map or reduce task can fail before the entire job fails through the following properties:

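  • mapred.map.max.attempts - The maximum number of attempts per map task. In other words, the framework will try to execute a map task this many times before giving up on it. (Hadoop 2 and later name this property mapreduce.map.maxattempts.)
  • mapred.reduce.max.attempts - Same as above, but for reduce tasks. (Hadoop 2 and later name this property mapreduce.reduce.maxattempts.)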

If you want the job to fail at the first task failure, change both values from their default of 4 to 1.
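One way to apply these settings per job, rather than cluster-wide, is to pass them as -D generic options on the command line. The sketch below is illustrative only: max_attempts_opts is a hypothetical helper, it uses the Hadoop 2 property names (mapreduce.map.maxattempts and mapreduce.reduce.maxattempts), and it assumes the driver parses generic options, for example by running through ToolRunner.

```shell
# Hypothetical helper: print the generic options that cap map and reduce
# task attempts at N (default 1), one word per line.
max_attempts_opts() {
    n="${1:-1}"
    printf '%s\n' \
        -D "mapreduce.map.maxattempts=$n" \
        -D "mapreduce.reduce.maxattempts=$n"
}

# Example call (jar, class, and paths are placeholders); the substitution
# is left unquoted on purpose so it splits into separate arguments:
#   hadoop jar my-job.jar com.example.MyDriver $(max_attempts_opts 1) /input /output
```

On releases that still use the older names, the same pattern applies with mapred.map.max.attempts and mapred.reduce.max.attempts.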