[linux] Get exit code of a background process

I have a command CMD called from my main bourne shell script that takes forever.

I want to modify the script as follows:

  1. Run the command CMD in parallel as a background process (CMD &).
  2. In the main script, have a loop to monitor the spawned command every few seconds. The loop also echoes some messages to stdout indicating progress of the script.
  3. Exit the loop when the spawned command terminates.
  4. Capture and report the exit code of the spawned process.

Can someone give me pointers to accomplish this?

This question is related to linux shell unix process

The answer is


My solution was to use an anonymous pipe to pass the status to a monitoring loop. There are no temporary files used to exchange status so nothing to cleanup. If you were uncertain about the number of background jobs the break condition could be [ -z "$(jobs -p)" ].

#!/bin/bash

exec 3<> <(:)

{ sleep 15 ; echo "sleep/exit $?" >&3 ; } &

while read -u 3 -t 1 -r STAT CODE || STAT="timeout" ; do
    echo "stat: ${STAT}; code: ${CODE}"
    if [ "${STAT}" = "sleep/exit" ] ; then
        break
    fi
done

A simple example, similar to the solutions above. This doesn't require monitoring any process output. The next example uses tail to follow output.

$ echo '#!/bin/bash' > tmp.sh
$ echo 'sleep 30; exit 5' >> tmp.sh
$ chmod +x tmp.sh
$ ./tmp.sh &
[1] 7454
$ pid=$!
$ wait $pid
[1]+  Exit 5                  ./tmp.sh
$ echo $?
5

Use tail to follow process output and quit when the process is complete.

$ echo '#!/bin/bash' > tmp.sh
$ echo 'i=0; while let "$i < 10"; do sleep 5; echo "$i"; let i=$i+1; done; exit 5;' >> tmp.sh
$ chmod +x tmp.sh
$ ./tmp.sh
0
1
2
^C
$ ./tmp.sh > /tmp/tmp.log 2>&1 &
[1] 7673
$ pid=$!
$ tail -f --pid $pid /tmp/tmp.log
0
1
2
3
4
5
6
7
8
9
[1]+  Exit 5                  ./tmp.sh > /tmp/tmp.log 2>&1
$ wait $pid
$ echo $?
5

I would change your approach slightly. Rather than checking every few seconds if the command is still alive and reporting a message, have another process that reports every few seconds that the command is still running and then kill that process when the command finishes. For example:

#!/bin/sh

cmd() { sleep 5; exit 24; }

cmd &   # Run the long running process
pid=$!  # Record the pid

# Spawn a process that coninually reports that the command is still running
while echo "$(date): $pid is still running"; do sleep 1; done &
echoer=$!

# Set a trap to kill the reporter when the process finishes
trap 'kill $echoer' 0

# Wait for the process to finish
if wait $pid; then
    echo "cmd succeeded"
else
    echo "cmd FAILED!! (returned $?)"
fi

With this method, your script doesnt have to wait for the background process, you will only have to monitor a temporary file for the exit status.

FUNCmyCmd() { sleep 3;return 6; };

export retFile=$(mktemp); 
FUNCexecAndWait() { FUNCmyCmd;echo $? >$retFile; }; 
FUNCexecAndWait&

now, your script can do anything else while you just have to keep monitoring the contents of retFile (it can also contain any other information you want like the exit time).

PS.: btw, I coded thinking in bash


1: In bash, $! holds the PID of the last background process that was executed. That will tell you what process to monitor, anyway.

4: wait <n> waits until the process with PID <n> is complete (it will block until the process completes, so you might not want to call this until you are sure the process is done), and then returns the exit code of the completed process.

2, 3: ps or ps | grep " $! " can tell you whether the process is still running. It is up to you how to understand the output and decide how close it is to finishing. (ps | grep isn't idiot-proof. If you have time you can come up with a more robust way to tell whether the process is still running).

Here's a skeleton script:

# simulate a long process that will have an identifiable exit code
(sleep 15 ; /bin/false) &
my_pid=$!

while   ps | grep " $my_pid "     # might also need  | grep -v grep  here
do
    echo $my_pid is still in the ps output. Must still be running.
    sleep 3
done

echo Oh, it looks like the process is done.
wait $my_pid
# The variable $? always holds the exit code of the last command to finish.
# Here it holds the exit code of $my_pid, since wait exits with that code. 
my_status=$?
echo The exit status of the process was $my_status

Another solution is to monitor processes via the proc filesystem (safer than ps/grep combo); when you start a process it has a corresponding folder in /proc/$pid, so the solution could be

#!/bin/bash
....
doSomething &
local pid=$!
while [ -d /proc/$pid ]; do # While directory exists, the process is running
    doSomethingElse
    ....
else # when directory is removed from /proc, process has ended
    wait $pid
    local exit_status=$?
done
....

Now you can use the $exit_status variable however you like.


The pid of a backgrounded child process is stored in $!. You can store all child processes' pids into an array, e.g. PIDS[].

wait [-n] [jobspec or pid …]

Wait until the child process specified by each process ID pid or job specification jobspec exits and return the exit status of the last command waited for. If a job spec is given, all processes in the job are waited for. If no arguments are given, all currently active child processes are waited for, and the return status is zero. If the -n option is supplied, wait waits for any job to terminate and returns its exit status. If neither jobspec nor pid specifies an active child process of the shell, the return status is 127.

Use wait command you can wait for all child processes finish, meanwhile you can get exit status of each child processes via $? and store status into STATUS[]. Then you can do something depending by status.

I have tried the following 2 solutions and they run well. solution01 is more concise, while solution02 is a little complicated.

solution01

#!/bin/bash

# start 3 child processes concurrently, and store each pid into array PIDS[].
process=(a.sh b.sh c.sh)
for app in ${process[@]}; do
  ./${app} &
  PIDS+=($!)
done

# wait for all processes to finish, and store each process's exit code into array STATUS[].
for pid in ${PIDS[@]}; do
  echo "pid=${pid}"
  wait ${pid}
  STATUS+=($?)
done

# after all processed finish, check their exit codes in STATUS[].
i=0
for st in ${STATUS[@]}; do
  if [[ ${st} -ne 0 ]]; then
    echo "$i failed"
  else
    echo "$i finish"
  fi
  ((i+=1))
done

solution02

#!/bin/bash

# start 3 child processes concurrently, and store each pid into array PIDS[].
i=0
process=(a.sh b.sh c.sh)
for app in ${process[@]}; do
  ./${app} &
  pid=$!
  PIDS[$i]=${pid}
  ((i+=1))
done

# wait for all processes to finish, and store each process's exit code into array STATUS[].
i=0
for pid in ${PIDS[@]}; do
  echo "pid=${pid}"
  wait ${pid}
  STATUS[$i]=$?
  ((i+=1))
done

# after all processed finish, check their exit codes in STATUS[].
i=0
for st in ${STATUS[@]}; do
  if [[ ${st} -ne 0 ]]; then
    echo "$i failed"
  else
    echo "$i finish"
  fi
  ((i+=1))
done

Our team had the same need with a remote SSH-executed script which was timing out after 25 minutes of inactivity. Here is a solution with the monitoring loop checking the background process every second, but printing only every 10 minutes to suppress an inactivity timeout.

long_running.sh & 
pid=$!

# Wait on a background job completion. Query status every 10 minutes.
declare -i elapsed=0
# `ps -p ${pid}` works on macOS and CentOS. On both OSes `ps ${pid}` works as well.
while ps -p ${pid} >/dev/null; do
  sleep 1
  if ((++elapsed % 600 == 0)); then
    echo "Waiting for the completion of the main script. $((elapsed / 60))m and counting ..."
  fi
done

# Return the exit code of the terminated background process. This works in Bash 4.4 despite what Bash docs say:
# "If neither jobspec nor pid specifies an active child process of the shell, the return status is 127."
wait ${pid}

This is how I solved it when I had a similar need:

# Some function that takes a long time to process
longprocess() {
        # Sleep up to 14 seconds
        sleep $((RANDOM % 15))
        # Randomly exit with 0 or 1
        exit $((RANDOM % 2))
}

pids=""
# Run five concurrent processes
for i in {1..5}; do
        ( longprocess ) &
        # store PID of process
        pids+=" $!"
done

# Wait for all processes to finish, will take max 14s
# as it waits in order of launch, not order of finishing
for p in $pids; do
        if wait $p; then
                echo "Process $p success"
        else
                echo "Process $p fail"
        fi
done

As I see almost all answers use external utilities (mostly ps) to poll the state of the background process. There is a more unixesh solution, catching the SIGCHLD signal. In the signal handler it has to be checked which child process was stopped. It can be done by kill -0 <PID> built-in (universal) or checking the existence of /proc/<PID> directory (Linux specific) or using the jobs built-in ( specific. jobs -l also reports the pid. In this case the 3rd field of the output can be Stopped|Running|Done|Exit . ).

Here is my example.

The launched process is called loop.sh. It accepts -x or a number as an argument. For -x is exits with exit code 1. For a number it waits num*5 seconds. In every 5 seconds it prints its PID.

The launcher process is called launch.sh:

#!/bin/bash

handle_chld() {
    local tmp=()
    for((i=0;i<${#pids[@]};++i)); do
        if [ ! -d /proc/${pids[i]} ]; then
            wait ${pids[i]}
            echo "Stopped ${pids[i]}; exit code: $?"
        else tmp+=(${pids[i]})
        fi
    done
    pids=(${tmp[@]})
}

set -o monitor
trap "handle_chld" CHLD

# Start background processes
./loop.sh 3 &
pids+=($!)
./loop.sh 2 &
pids+=($!)
./loop.sh -x &
pids+=($!)

# Wait until all background processes are stopped
while [ ${#pids[@]} -gt 0 ]; do echo "WAITING FOR: ${pids[@]}"; sleep 2; done
echo STOPPED

For more explanation see: Starting a process from bash script failed


#/bin/bash

#pgm to monitor
tail -f /var/log/messages >> /tmp/log&
# background cmd pid
pid=$!
# loop to monitor running background cmd
while :
do
    ps ax | grep $pid | grep -v grep
    ret=$?
    if test "$ret" != "0"
    then
        echo "Monitored pid ended"
        break
    fi
    sleep 5

done

wait $pid
echo $?

This may be extending beyond your question, however if you're concerned about the length of time processes are running for, you may be interested in checking the status of running background processes after an interval of time. It's easy enough to check which child PIDs are still running using pgrep -P $$, however I came up with the following solution to check the exit status of those PIDs that have already expired:

cmd1() { sleep 5; exit 24; }
cmd2() { sleep 10; exit 0; }

pids=()
cmd1 & pids+=("$!")
cmd2 & pids+=("$!")

lasttimeout=0
for timeout in 2 7 11; do
  echo -n "interval-$timeout: "
  sleep $((timeout-lasttimeout))

  # you can only wait on a pid once
  remainingpids=()
  for pid in ${pids[*]}; do
     if ! ps -p $pid >/dev/null ; then
        wait $pid
        echo -n "pid-$pid:exited($?); "
     else
        echo -n "pid-$pid:running; "
        remainingpids+=("$pid")
     fi
  done
  pids=( ${remainingpids[*]} )

  lasttimeout=$timeout
  echo
done

which outputs:

interval-2: pid-28083:running; pid-28084:running; 
interval-7: pid-28083:exited(24); pid-28084:running; 
interval-11: pid-28084:exited(0); 

Note: You could change $pids to a string variable rather than array to simplify things if you like.


Examples related to linux

grep's at sign caught as whitespace How to prevent Google Colab from disconnecting? "E: Unable to locate package python-pip" on Ubuntu 18.04 How to upgrade Python version to 3.7? Install Qt on Ubuntu Get first line of a shell command's output Cannot connect to the Docker daemon at unix:/var/run/docker.sock. Is the docker daemon running? Run bash command on jenkins pipeline How to uninstall an older PHP version from centOS7 How to update-alternatives to Python 3 without breaking apt?

Examples related to shell

Comparing a variable with a string python not working when redirecting from bash script Get first line of a shell command's output How to run shell script file using nodejs? Run bash command on jenkins pipeline Way to create multiline comments in Bash? How to do multiline shell script in Ansible How to check if a file exists in a shell script How to check if an environment variable exists and get its value? Curl to return http status code along with the response docker entrypoint running bash script gets "permission denied"

Examples related to unix

Docker CE on RHEL - Requires: container-selinux >= 2.9 What does `set -x` do? How to find files modified in last x minutes (find -mmin does not work as expected) sudo: npm: command not found How to sort a file in-place How to read a .properties file which contains keys that have a period character using Shell script gpg decryption fails with no secret key error Loop through a comma-separated shell variable Best way to find os name and version in Unix/Linux platform Resource u'tokenizers/punkt/english.pickle' not found

Examples related to process

Fork() function in C How to kill a nodejs process in Linux? Xcode process launch failed: Security Understanding PrimeFaces process/update and JSF f:ajax execute/render attributes Linux Script to check if process is running and act on the result CreateProcess error=2, The system cannot find the file specified How to make parent wait for all child processes to finish? How to use [DllImport("")] in C#? Visual Studio "Could not copy" .... during build How to terminate process from Python using pid?