What killed my process and why?

Question

My application runs as a background process on Linux. It is currently started at the command line in a Terminal window.

Recently a user was executing the application for a while and it died mysteriously. The text:

    Killed

was on the terminal. This happened two times. I asked if someone at a different Terminal used the kill command to kill the process. No.

Under what conditions would Linux decide to kill my process? I believe the shell displayed "Killed" because the process died after receiving the kill -9 (SIGKILL) signal. If Linux sent the kill signal, should there be a message in a system log somewhere that explains why it was killed?

User · Answer

This looks like a good article on the subject: Taming the OOM killer.

The gist is that Linux overcommits memory. When a process asks for more space, Linux will give it that space, even if it is claimed by another process, under the assumption that nobody actually uses all of the memory they ask for. The process will get exclusive use of the memory it has allocated when it actually uses it, not when it asks for it. This makes allocation quick, and might allow you to "cheat" and allocate more memory than you really have. However, once processes start using this memory, Linux might realize that it has been too generous in allocating memory it doesn't have, and will have to kill off a process to free some up. The process to be killed is based on a score taking into account runtime (long-running processes are safer), memory usage (greedy processes are less safe), and a few other factors, including a value you can adjust to make a process less likely to be killed. It's all described in the article in a lot more detail.
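To see the score the kernel currently assigns to a process, and the adjustable value mentioned above, you can read two files under /proc. This is a minimal sketch, not anything taken from the linked article: the function name read_oom_info and the proc_root parameter are made up for illustration (proc_root only exists so the function can be pointed at a fake directory tree).

```python
from pathlib import Path


def read_oom_info(pid, proc_root="/proc"):
    """Return the kernel's OOM score and the user-set adjustment for a PID.

    oom_score is the value the OOM killer compares between processes;
    oom_score_adj is the knob an administrator can turn (-1000 to 1000).
    """
    base = Path(proc_root) / str(pid)
    return {
        # Higher score means more likely to be chosen as the victim.
        "oom_score": int((base / "oom_score").read_text().strip()),
        # Negative values make the process less likely to be killed.
        "oom_score_adj": int((base / "oom_score_adj").read_text().strip()),
    }
```

On a Linux machine, calling read_oom_info(os.getpid()) returns the values for the calling process itself; checking the PID of a long-running application from time to time shows how its score moves as its memory use grows.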

Edit: And here is another article that explains pretty well how a process is chosen (annotated with some kernel code examples). The great thing about this is that it includes some commentary on the reasoning behind the various badness() rules.

User · Answer

The user has the ability to kill their own programs, using kill or Control-C, but I get the impression that's not what happened, and that the user complained to you.

root has the ability to kill programs of course, but if someone has root on your machine and is killing stuff you have bigger problems.

If you are not the sysadmin, the sysadmin may have set up quotas on CPU, RAM, or disk usage and auto-kill processes that exceed them.

Other than those guesses, I'm not sure without more info about the program.

User · Answer

In my case this was happening with a Laravel queue worker. The system logs did not mention any killing, so I looked further and it turned out that the worker was basically killing itself because of a job that exceeded the memory limit (which is set to 128M by default).

Running the queue worker with --timeout=600 and --memory=1024 fixed the problem for me.

User · Answer

As dwc and Adam Jaskiewicz have stated, the culprit is likely the OOM Killer. However, the next question that follows is: How do I prevent this?

There are several ways:

1. Give your system more RAM if you can (easy if it's a VM)
2. Make sure the OOM killer chooses a different process
3. Disable the OOM Killer
4. Choose a Linux distro which ships with the OOM Killer disabled

I found (2) to be especially easy to implement, thanks to this article.
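Making the OOM killer prefer a different process usually comes down to writing a value into /proc/<pid>/oom_score_adj. The sketch below is my own illustration rather than the method from the linked article; set_oom_score_adj and its proc_root parameter are hypothetical names. The valid range is -1000 to 1000, where -1000 exempts the process entirely, and lowering the value normally requires root (CAP_SYS_RESOURCE).

```python
from pathlib import Path

OOM_ADJ_MIN, OOM_ADJ_MAX = -1000, 1000


def set_oom_score_adj(pid, value, proc_root="/proc"):
    """Write a new oom_score_adj for a PID and return the value read back.

    proc_root is only a parameter so the function can be tried against a
    scratch directory; on a real system it stays at /proc.
    """
    if not OOM_ADJ_MIN <= value <= OOM_ADJ_MAX:
        raise ValueError(
            f"oom_score_adj must be between {OOM_ADJ_MIN} and {OOM_ADJ_MAX}"
        )
    path = Path(proc_root) / str(pid) / "oom_score_adj"
    # The kernel parses a plain decimal integer from this file.
    path.write_text(f"{value}\n")
    return int(path.read_text().strip())
```

For example, set_oom_score_adj(pid_of_database, -500) run as root would make that process a much less attractive victim without fully exempting it. Exempting too many processes just moves the problem elsewhere, so a moderate negative value is often the safer choice.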

User · Answer

We have had recurring problems under Linux at a customer site (Red Hat, I think), with OOMKiller (out-of-memory killer) killing both our principal application (i.e. the reason the server exists) and its database processes.

In each case OOMKiller simply decided that the processes were using too many resources... the machine wasn't even about to fail for lack of resources. Neither the application nor its database has problems with memory leaks (or any other resource leak).

I am not a Linux expert, but I rather gathered its algorithm for deciding when to kill something and what to kill is complex. Also, I was told (I can't speak as to the accuracy of this) that OOMKiller is baked into the kernel and you can't simply not run it.

User · Answer

Try:

    dmesg -T | grep -E -i -B100 'killed process'

Where -B100 signifies the number of lines before the kill happened.

Omit -T on Mac OS.
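If you want to pull the victims out of the kernel log programmatically, the same search can be done in a few lines of Python. This is only a sketch: find_oom_kills is a made-up helper, and the pattern assumes the usual kernel wording "Killed process <pid> (<name>) ... anon-rss:<n>kB", which can differ slightly between kernel versions.

```python
import re

# Assumed shape of the kernel message; the anon-rss part is optional here
# because not every kernel version prints it in the same place.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)"
    r"(?:.*?anon-rss:(?P<anon_rss_kb>\d+)kB)?",
    re.IGNORECASE,
)


def find_oom_kills(log_text):
    """Return one dict per 'Killed process' line found in the log text."""
    kills = []
    for line in log_text.splitlines():
        match = KILLED_RE.search(line)
        if match is None:
            continue
        rss = match["anon_rss_kb"]
        kills.append({
            "pid": int(match["pid"]),
            "name": match["name"],
            "anon_rss_kb": int(rss) if rss is not None else None,
        })
    return kills
```

The log text can come from anywhere, for instance subprocess.run(["dmesg"], capture_output=True, text=True).stdout or the contents of a saved log file.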

User · Answer

If the user or sysadmin did not kill the program, the kernel may have. The kernel would only kill a process under exceptional circumstances such as extreme resource starvation (think mem+swap exhaustion).
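Whoever sent it, a kill of this kind arrives as SIGKILL (signal 9), and the parent of the dead process can see that. A shell reports it as exit status 137 (128 + 9); Python's subprocess module reports it as a negative return code. The sketch below is just a demonstration with made-up function names: it starts a throwaway child, kills it the way the kernel would, and decodes the result.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Translate a subprocess return code into a human-readable cause."""
    if returncode < 0:
        # subprocess encodes "terminated by signal N" as -N on POSIX.
        return f"terminated by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


def run_and_kill_demo():
    """Start a sleeping child, send it SIGKILL, and return its return code."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    child.send_signal(signal.SIGKILL)
    child.wait()
    return child.returncode
```

Wrapping your application in a small launcher that logs describe_exit(returncode) at least tells you whether it died from a signal or exited on its own, which narrows down where to look next.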

User · Answer

This is the Linux out of memory manager (OOM). Your process was selected due to 'badness' - a combination of recentness, resident size (memory in use, rather than just allocated) and other factors.

    sudo journalctl -xb

You'll see a message like:

    Jul 20 11:05:00 someapp kernel: Mem-Info:
    Jul 20 11:05:00 someapp kernel: Node 0 DMA per-cpu:
    Jul 20 11:05:00 someapp kernel: CPU    0: hi:    0, btch:   1 usd:   0
    Jul 20 11:05:00 someapp kernel: Node 0 DMA32 per-cpu:
    Jul 20 11:05:00 someapp kernel: CPU    0: hi:  186, btch:  31 usd:  30
    Jul 20 11:05:00 someapp kernel: active_anon:206043 inactive_anon:6347 isolated_anon:0
                                    active_file:722 inactive_file:4126 isolated_file:0
                                    unevictable:0 dirty:5 writeback:0 unstable:0
                                    free:12202 slab_reclaimable:3849 slab_unreclaimable:14574
                                    mapped:792 shmem:12802 pagetables:1651 bounce:0
                                    free_cma:0
    Jul 20 11:05:00 someapp kernel: Node 0 DMA free:4576kB min:708kB low:884kB high:1060kB active_anon:10012kB inactive_anon:488kB active_file:4kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present
    Jul 20 11:05:00 someapp kernel: lowmem_reserve[]: 0 968 968 968
    Jul 20 11:05:00 someapp kernel: Node 0 DMA32 free:44232kB min:44344kB low:55428kB high:66516kB active_anon:814160kB inactive_anon:24900kB active_file:2884kB inactive_file:16500kB unevictable:0kB isolated(anon):0kB isolated
    Jul 20 11:05:00 someapp kernel: lowmem_reserve[]: 0 0 0 0
    Jul 20 11:05:00 someapp kernel: Node 0 DMA: 17*4kB (UEM) 22*8kB (UEM) 15*16kB (UEM) 12*32kB (UEM) 8*64kB (E) 9*128kB (UEM) 2*256kB (UE) 3*512kB (UM) 0*1024kB 0*2048kB 0*4096kB = 4580kB
    Jul 20 11:05:00 someapp kernel: Node 0 DMA32: 216*4kB (UE) 601*8kB (UE) 448*16kB (UE) 311*32kB (UEM) 135*64kB (UEM) 74*128kB (UEM) 5*256kB (EM) 0*512kB 0*1024kB 1*2048kB (R) 0*4096kB = 44232kB
    Jul 20 11:05:00 someapp kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
    Jul 20 11:05:00 someapp kernel: 17656 total pagecache pages
    Jul 20 11:05:00 someapp kernel: 0 pages in swap cache
    Jul 20 11:05:00 someapp kernel: Swap cache stats: add 0, delete 0, find 0/0
    Jul 20 11:05:00 someapp kernel: Free swap  = 0kB
    Jul 20 11:05:00 someapp kernel: Total swap = 0kB
    Jul 20 11:05:00 someapp kernel: 262141 pages RAM
    Jul 20 11:05:00 someapp kernel: 7645 pages reserved
    Jul 20 11:05:00 someapp kernel: 264073 pages shared
    Jul 20 11:05:00 someapp kernel: 240240 pages non-shared
    Jul 20 11:05:00 someapp kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
    Jul 20 11:05:00 someapp kernel: [  241]     0   241    13581     1610      26        0             0 systemd-journal
    Jul 20 11:05:00 someapp kernel: [  246]     0   246    10494      133      22        0         -1000 systemd-udevd
    Jul 20 11:05:00 someapp kernel: [  264]     0   264    29174      121      26        0         -1000 auditd
    Jul 20 11:05:00 someapp kernel: [  342]     0   342    94449      466      67        0             0 NetworkManager
    Jul 20 11:05:00 someapp kernel: [  346]     0   346   137495     3125      88        0             0 tuned
    Jul 20 11:05:00 someapp kernel: [  348]     0   348    79595      726      60        0             0 rsyslogd
    Jul 20 11:05:00 someapp kernel: [  353]    70   353     6986       72      19        0             0 avahi-daemon
    Jul 20 11:05:00 someapp kernel: [  362]    70   362     6986       58      18        0             0 avahi-daemon
    Jul 20 11:05:00 someapp kernel: [  378]     0   378     1621       25       8        0             0 iprinit
    Jul 20 11:05:00 someapp kernel: [  380]     0   380     1621       26       9        0             0 iprupdate
    Jul 20 11:05:00 someapp kernel: [  384]    81   384     6676      142      18        0          -900 dbus-daemon
    Jul 20 11:05:00 someapp kernel: [  385]     0   385     8671       83      21        0             0 systemd-logind
    Jul 20 11:05:00 someapp kernel: [  386]     0   386    31573      153      15        0             0 crond
    Jul 20 11:05:00 someapp kernel: [  391]   999   391   128531     2440      48        0             0 polkitd
    Jul 20 11:05:00 someapp kernel: [  400]     0   400     9781       23       8        0             0 iprdump
    Jul 20 11:05:00 someapp kernel: [  419]     0   419    27501       32      10        0             0 agetty
    Jul 20 11:05:00 someapp kernel: [  855]     0   855    22883      258      43        0             0 master
    Jul 20 11:05:00 someapp kernel: [  862]    89   862    22926      254      44        0             0 qmgr
    Jul 20 11:05:00 someapp kernel: [23631]     0 23631    20698      211      43        0         -1000 sshd
    Jul 20 11:05:00 someapp kernel: [12884]     0 12884    81885     3754      80        0             0 firewalld
    Jul 20 11:05:00 someapp kernel: [18130]     0 18130    33359      291      65        0             0 sshd
    Jul 20 11:05:00 someapp kernel: [18132]  1000 18132    33791      748      64        0             0 sshd
    Jul 20 11:05:00 someapp kernel: [18133]  1000 18133    28867      122      13        0             0 bash
    Jul 20 11:05:00 someapp kernel: [18428]    99 18428   208627    42909     151        0             0 node
    Jul 20 11:05:00 someapp kernel: [18486]    89 18486    22909      250      46        0             0 pickup
    Jul 20 11:05:00 someapp kernel: [18515]  1000 18515   352905   141851     470        0             0 npm
    Jul 20 11:05:00 someapp kernel: [18520]     0 18520    33359      291      66        0             0 sshd
    Jul 20 11:05:00 someapp kernel: [18522]  1000 18522    33359      294      64        0             0 sshd
    Jul 20 11:05:00 someapp kernel: [18523]  1000 18523    28866      115      12        0             0 bash
    Jul 20 11:05:00 someapp kernel: Out of memory: Kill process 18515 (npm) score 559 or sacrifice child
    Jul 20 11:05:00 someapp kernel: Killed process 18515 (npm) total-vm:1411620kB, anon-rss:567404kB, file-rss:0kB
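One thing that trips people up when reading this output: the per-process table lists total_vm and rss in pages, while the final "Killed process" line reports kB. A quick check with the npm row, assuming the common 4 kB page size (an assumption on my part, though the numbers in this log agree with it):

```python
PAGE_SIZE_KB = 4  # assumption: 4 kB pages, the usual x86-64 default


def pages_to_kb(pages, page_size_kb=PAGE_SIZE_KB):
    """Convert a page count from the OOM process table into kB."""
    return pages * page_size_kb


# Figures copied from the npm row (pid 18515) of the table in the log.
npm_total_vm_pages = 352905
npm_rss_pages = 141851

print(pages_to_kb(npm_total_vm_pages))  # 1411620, the total-vm figure
print(pages_to_kb(npm_rss_pages))       # 567404, the anon-rss figure
```

Both results match the total-vm and anon-rss values in the last line of the log. With only 262141 pages of RAM (about 1 GB) and no swap, a single process holding roughly 554 MB resident was by far the largest candidate.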

User · Answer

A tool like systemtap (or a tracer) can monitor kernel signal-transmission logic and report. e.g., https://sourceware.org/systemtap/examples/process/sigmon.stp

    # stap .../sigmon.stp -x 31994 SIGKILL
       SPID     SNAME            RPID  RNAME            SIGNUM SIGNAME
       5609     bash             31994 find             9      SIGKILL

The filtering if block in that script can be adjusted to taste, or eliminated to trace systemwide signal traffic. Causes can be further isolated by collecting backtraces (add a print_backtrace() and/or print_ubacktrace() to the probe, for kernel- and userspace- respectively).

User · Answer

In an LSF environment (interactive or otherwise), if the application exceeds memory utilization beyond some threshold preset by the admins on the queue, or the resource request in the submission to the queue, the processes will be killed so other users don't fall victim to a potential runaway. It doesn't always send an email when it does so, depending on how it's set up.

One solution in this case is to find a queue with larger resources or define larger resource requirements in the submission.

You may also want to review man ulimit. Although I don't remember ulimit resulting in Killed, it's been a while since I needed that.

User · Answer

Let me first explain when and why OOMKiller gets invoked.

Say you have 512 MB RAM + 1 GB swap memory. So in theory, your CPU has access to a total of 1.5 GB of virtual memory.

Now, for some time everything is running fine within 1.5 GB of total memory. But all of a sudden (or gradually) your system has started consuming more and more memory, and it has reached a point where around 95% of total memory is used.

Now say some process has requested a large chunk of memory from the kernel. The kernel checks the available memory and finds that there is no way it can allocate your process more memory. So it will try to free some memory by invoking OOMKiller (http://linux-mm.org/OOM).

OOMKiller has its own algorithm to score and rank every process. Typically the process that uses the most memory becomes the victim to be killed.

Where can I find logs of OOMKiller?

Typically in the /var/log directory. Either /var/log/kern.log or /var/log/dmesg.

Hope this will help you.

Some typical solutions:

1. Increase memory (not swap)
2. Find the memory leaks in your program and fix them
3. Restrict memory any process can consume (for example JVM memory can be restricted using JAVA_OPTS)
4. See the logs and google :)
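To watch how close a machine is to the situation described above, you can compute the combined RAM + swap usage from /proc/meminfo. This is a rough sketch with made-up function names; it treats MemAvailable (present on reasonably recent kernels) plus SwapFree as what is still free, which is a simplification of how the kernel itself decides.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def memory_pressure(fields):
    """Return the percentage of RAM + swap that is currently in use."""
    total = fields["MemTotal"] + fields["SwapTotal"]
    free = fields["MemAvailable"] + fields["SwapFree"]
    return 100.0 * (total - free) / total


# Hypothetical numbers, chosen to resemble the 0.5 GB RAM + 1 GB swap example.
SAMPLE = """\
MemTotal:         500000 kB
MemAvailable:      25000 kB
SwapTotal:       1000000 kB
SwapFree:          50000 kB
"""

print(round(memory_pressure(parse_meminfo(SAMPLE)), 1))  # 95.0
```

On a real system, pass in the contents of /proc/meminfo instead of SAMPLE. Logging this value every minute or so next to your application's own memory use makes it much easier to tell afterwards whether the machine really was running out.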

User · Answer

I encountered this problem lately. Finally, I found my processes were killed just after the openSUSE zypper update was called automatically. Disabling zypper update solved my problem.

User · Answer

Solved this issue by increasing swap size:

https://askubuntu.com/questions/1075505/how-do-i-increase-swapfile-in-ubuntu-18-04

User · Answer

The PAM module to limit resources caused exactly the results you described: My process died mysteriously with the text Killed on the console window. No log output, neither in syslog nor in kern.log. The top program helped me to discover that exactly after one minute of CPU usage my process gets killed.
