How do I write a bash script to restart a process if it dies?

Question

I have a python script that'll be checking a queue and performing an action on each item:

    # checkqueue.py
    while True:
      check_queue()
      do_something()

How do I write a bash script that will check if it's running, and if not, start it? Roughly the following pseudo code (or maybe it should do something like ps | grep?):

    # keepalivescript.sh
    if processidfile exists:
      if processid is running:
         exit, all ok

    run checkqueue.py
    write processid to processidfile

I'll call that from a crontab:

    # crontab
    */5 * * * * /path/to/keepalivescript.sh

User · Answer

The easiest way to do it is to use flock on a file. In the Python script you'd do:

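import fcntl, os, sys

lf = open('/tmp/script.lock', 'a')  # append mode, so a running instance's PID is not truncated
try:
    fcntl.flock(lf, fcntl.LOCK_EX | fcntl.LOCK_NB)
except OSError:  # flock raises when the lock is held; it does not return non-zero
    sys.exit('other instance already running')
lf.truncate(0)
lf.write('%d\n' % os.getpid())
lf.flush()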

In shell you can actually test if it's running:

if [ "$(flock -xn /tmp/script.lock -c 'echo 1')" ]; then
   echo "it's not running"
   # restart it here
else
   echo -n "it's already running with PID "
   cat /tmp/script.lock
fi

But of course you don't have to test, because if it's already running and you start it again, the new copy will exit with 'other instance already running'.

When a process dies, all its file descriptors are closed and all its locks are automatically released.
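
If the only goal is to stop a second copy from starting, the locking can also be done entirely from the shell, because `flock` can run a command while holding the lock. A minimal sketch, assuming the util-linux `flock` utility is installed; the helper name `run_locked` and the paths are made up for illustration:

```shell
#!/bin/sh
# run_locked LOCKFILE COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND only if nobody else holds LOCKFILE.
run_locked() {
    lockfile=$1
    shift
    # -n: give up immediately (non-zero exit) instead of waiting for the lock.
    # The lock is released when COMMAND exits, however it exits.
    flock -n "$lockfile" "$@"
}

# Possible crontab entry (placeholder paths):
# */5 * * * * flock -n /tmp/script.lock /path/to/checkqueue.py
```

With this approach cron can simply try to start the script every few minutes; while a copy is still running, the new attempt exits straight away.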

User · Answer

    if ! test -f $PIDFILE || ! psgrep `cat $PIDFILE`; then
        restart_process
        # Write PIDFILE
        echo $! >$PIDFILE
    fi
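
The liveness check in this answer can be sketched with `kill -0`, which sends no signal and only reports whether the PID exists and may be signalled. This is an illustration under assumptions: the helper name `pidfile_alive` is hypothetical, and the check says nothing about whether the PID still belongs to the original program, which is the PID reuse problem raised elsewhere in this thread.

```shell
#!/bin/sh
# pidfile_alive FILE
# Hypothetical helper: succeeds if FILE contains the PID of a process that
# currently exists and that the caller is allowed to signal.
pidfile_alive() {
    [ -f "$1" ] || return 1
    pid=$(cat "$1") || return 1
    case $pid in
        ''|*[!0-9]*) return 1 ;;   # empty or not a plain number
    esac
    # Signal 0 performs the existence/permission check without signalling.
    kill -0 "$pid" 2>/dev/null
}
```

A keep-alive script could then start the program only when `pidfile_alive "$PIDFILE"` fails, and write `$!` to the file afterwards.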

User · Answer

You should use monit, a standard unix tool that can monitor different things on the system and react accordingly.

From the docs: http://mmonit.com/monit/documentation/monit.html#pid_testing

    check process checkqueue.py with pidfile /var/run/checkqueue.pid
           if changed pid then exec "checkqueue_restart.sh"

You can also configure monit to email you when it does do a restart.

User · Answer

I've used the following script with great success on numerous servers:

    pid=`jps -v | grep $INSTALLATION | awk '{print $1}'`
    echo $INSTALLATION found at PID $pid
    while [ -e /proc/$pid ]; do sleep 0.1; done

notes:

- It's looking for a java process, so I can use jps; this is much more consistent across distributions than ps.
- $INSTALLATION contains enough of the process path that it's totally unambiguous.
- Use sleep while waiting for the process to die, to avoid hogging resources :)

This script is actually used to shut down a running instance of tomcat, which I want to shut down (and wait for) at the command line, so launching it as a child process simply isn't an option for me.

User · Answer

I'm not sure how portable it is across operating systems, but you might check if your system contains the 'run-one' command, i.e. "man run-one". Specifically, this set of commands includes 'run-one-constantly', which seems to be exactly what is needed.

From man page:

    run-one-constantly COMMAND [ARGS]

Note: obviously this could be called from within your script, but it also removes the need for having a script at all.

User · Answer

Have a look at monit (http://mmonit.com/monit/). It handles start, stop and restart of your script and can do health checks plus restarts if necessary.

Or do a simple script:

    while true
    do
    /your/script
    sleep 1
    done

User · Answer

I use this for my npm Process:

    #!/bin/bash
    for (( ; ; ))
    do
    date +"%T"
    echo Start Process
    cd /toFolder
    sudo process
    date +"%T"
    echo Crash
    sleep 1
    done

User · Answer

Avoid PID-files, crons, or anything else that tries to evaluate processes that aren't their children.

There is a very good reason why in UNIX, you can ONLY wait on your children. Any method (ps parsing, pgrep, storing a PID, ...) that tries to work around that is flawed and has gaping holes in it. Just say no.

Instead you need the process that monitors your process to be the process' parent. What does this mean? It means only the process that starts your process can reliably wait for it to end. In bash, this is absolutely trivial.

    until myserver; do
        echo "Server 'myserver' crashed with exit code $?.  Respawning.." >&2
        sleep 1
    done

The above piece of bash code runs myserver in an until loop. The first line starts myserver and waits for it to end. When it ends, until checks its exit status. If the exit status is 0, it means it ended gracefully (which means you asked it to shut down somehow, and it did so successfully). In that case we don't want to restart it (we just asked it to shut down!). If the exit status is not 0, until will run the loop body, which emits an error message on STDERR and restarts the loop (back to line 1) after 1 second.

Why do we wait a second? Because if something's wrong with the startup sequence of myserver and it crashes immediately, you'll have a very intensive loop of constant restarting and crashing on your hands. The sleep 1 takes away the strain from that.

Now all you need to do is start this bash script (asynchronously, probably), and it will monitor myserver and restart it as necessary. If you want to start the monitor on boot (making the server "survive" reboots), you can schedule it in your user's cron(1) with an @reboot rule. Open your cron rules with crontab:

    crontab -e

Then add a rule to start your monitor script:

    @reboot /usr/local/bin/myservermonitor

Alternatively, look at inittab(5) and /etc/inittab. You can add a line in there to have myserver start at a certain init level and be respawned automatically.

Edit.

Let me add some information on why not to use PID files. While they are very popular, they are also very flawed and there's no reason why you wouldn't just do it the correct way.

Consider this:

1. PID recycling (killing the wrong process):
   - /etc/init.d/foo start: start foo, write foo's PID to /var/run/foo.pid
   - A while later: foo dies somehow.
   - A while later: any random process that starts (call it bar) takes a random PID; imagine it taking foo's old PID.
   - You notice foo's gone: /etc/init.d/foo restart reads /var/run/foo.pid, checks to see if it's still alive, finds bar, thinks it's foo, kills it, starts a new foo.
2. PID files go stale. You need over-complicated (or should I say, non-trivial) logic to check whether the PID file is stale, and any such logic is again vulnerable to 1.
3. What if you don't even have write access or are in a read-only environment?
4. It's pointless overcomplication; see how simple my example above is. No need to complicate that, at all.

See also: Are PID-files still flawed when doing it 'right'?

By the way, even worse than PID files is parsing ps! Don't ever do this.

1. ps is very unportable. While you find it on almost every UNIX system, its arguments vary greatly if you want non-standard output. And standard output is ONLY for human consumption, not for scripted parsing!
2. Parsing ps leads to a LOT of false positives. Take the ps aux | grep PID example, and now imagine someone starting a process with a number somewhere as argument that happens to be the same as the PID you started your daemon with. Imagine two people starting an X session and you grepping for X to kill yours. It's just all kinds of bad.

If you don't want to manage the process yourself, there are some perfectly good systems out there that will act as monitor for your processes. Look into runit, for example.
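
The until loop from this answer can be wrapped in a reusable function. The sketch below keeps the same parent-waits-for-child idea; the function name `respawn` and its restart limit and delay parameters are illustrative additions, not part of the answer.

```shell
#!/bin/sh
# respawn MAX DELAY COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND as a child and restarts it after every
# non-zero exit, waiting DELAY seconds in between, at most MAX times.
respawn() {
    max=$1
    delay=$2
    shift 2
    attempts=0
    until "$@"; do
        status=$?                      # exit status of the command that just ended
        attempts=$((attempts + 1))
        echo "'$1' exited with status $status (failure $attempts)" >&2
        if [ "$attempts" -ge "$max" ]; then
            echo "giving up after $attempts failures" >&2
            return "$status"
        fi
        sleep "$delay"
    done
}
```

It might be started as `respawn 10 1 myserver &`. As in the original loop, an exit status of 0 is treated as a requested shutdown and ends the loop without a restart.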

User · Answer

In-line:

    while true; do <your-bash-snippet> && break; done

e.g.

    while true; do openconnect x.x.x.x:xxxx && break; done
