[linux] Linux Process States

In Linux, what happens to the state of a process when it needs to read blocks from a disk? Is it blocked? If so, how is another process chosen to execute?


As others have already explained, processes in the "D" state (uninterruptible sleep) are what make ps hang. This has happened to me many times on RedHat 6.x with automounted NFS home directories.

To list processes in D state you can use the following commands:

cd /proc
for i in [0-9]*; do echo -n "$i :"; grep '^State' "$i/status" 2>/dev/null || echo; done | grep 'D (disk sleep)'

To find the current directory of the process, and possibly the mounted NFS file system that has issues, you can use a command like the following (replace 31134 with the PID of the sleeping process):

# ls -l /proc/31134/cwd
lrwxrwxrwx 1 pippo users 0 Aug  2 16:25 /proc/31134/cwd -> /auto/pippo

I found that running umount with the -f (force) switch on the affected NFS mount was enough to wake up the sleeping process:

umount -f /auto/pippo

The file system wasn't unmounted, because it was busy, but the related process did wake up, and I was able to solve the issue without rebooting.
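The two lookups above (state, then working directory) can be combined into one helper. This is only a sketch: the function name list_d_state and its optional PROC_ROOT argument are made up for illustration, and it assumes a readlink command is available.

```shell
# list_d_state [PROC_ROOT]
# Print "PID NAME CWD" for every process whose State is D.
# PROC_ROOT defaults to /proc; the parameter is a hypothetical
# convenience so the function can be tried against a fixture.
list_d_state() {
    root=${1:-/proc}
    for dir in "$root"/[0-9]*; do
        status=$dir/status
        [ -r "$status" ] || continue    # the process may have exited already
        state=$(awk '/^State:/ {print $2; exit}' "$status" 2>/dev/null)
        [ "$state" = "D" ] || continue
        name=$(awk '/^Name:/ {print $2; exit}' "$status" 2>/dev/null)
        # cwd is a symlink; reading it may need root for other users' processes
        cwd=$(readlink "$dir/cwd" 2>/dev/null || echo '?')
        printf '%s %s %s\n' "${dir##*/}" "$name" "$cwd"
    done
}
```

Calling list_d_state with no argument scans the live /proc. An empty result just means nothing was in D state at the moment of sampling.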


Yes, tasks waiting for IO are blocked, and other tasks get executed. Selecting the next task is done by the Linux scheduler.


Assuming your process is a single thread, and that you're using blocking I/O, your process will block waiting for the I/O to complete. The kernel will pick another process to run in the meantime based on niceness, priority, last run time, etc. If there are no other runnable processes, the kernel won't run any; instead, it'll tell the hardware the machine is idle (which will result in lower power consumption).

Processes that are waiting for I/O to complete typically show up in state D in, e.g., ps and top.


Generally the process will block. If the read operation is on a file descriptor marked as non-blocking or if the process is using asynchronous IO it won't block. Also if the process has other threads that aren't blocked they can continue running.
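Because each thread is scheduled on its own, Linux exposes a separate state per thread under /proc/PID/task. A rough sketch for looking at them follows; the function name thread_states and the optional PROC_ROOT argument are illustrative assumptions, not standard tools.

```shell
# thread_states PID [PROC_ROOT]
# Print "TID STATE" for every thread of the given process.
# PROC_ROOT defaults to /proc (hypothetical parameter, handy for fixtures).
thread_states() {
    pid=$1
    root=${2:-/proc}
    for t in "$root/$pid"/task/[0-9]*; do
        [ -r "$t/status" ] || continue    # the thread may have exited
        printf '%s %s\n' "${t##*/}" \
            "$(awk '/^State:/ {print $2; exit}' "$t/status" 2>/dev/null)"
    done
}
```

For a multi-threaded program, one line may show D or S while another shows R, which is the situation described above: one thread sleeps on I/O while the others keep running.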

The decision as to which process runs next is up to the scheduler in the kernel.


Yes, the task gets blocked in the read() system call. Another task which is ready runs, or if no other tasks are ready, the idle task (for that CPU) runs.

A normal, blocking disk read causes the task to enter the "D" state (as others have noted). Such tasks contribute to the load average, even though they're not consuming the CPU.

Some other types of IO, especially ttys and network, do not behave quite the same: the process ends up in the "S" state, can be interrupted, and doesn't count against the load average.
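To see which tasks feed the load average at a given moment, you can count the R and D entries in ps output. A minimal sketch, assuming a procps-style ps; the function name count_load_tasks is made up for this example.

```shell
# count_load_tasks
# Read one STAT value per line on stdin (as printed by `ps -eo stat=`)
# and count tasks whose state letter is R or D.
count_load_tasks() {
    awk '{ s = substr($1, 1, 1) }
         s == "R" { r++ }
         s == "D" { d++ }
         END { printf "running=%d uninterruptible=%d\n", r, d }'
}

# Typical use, compared against the kernel's own figures:
#   ps -eo stat= | count_load_tasks
#   cat /proc/loadavg
```

This is a single snapshot, while the load average is a decaying average over time, so the numbers will only match roughly.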


When a process needs to fetch data from a disk, it effectively stops running on the CPU to let other processes run because the operation might take a long time to complete – at least 5ms seek time for a disk is common, and 5ms is 10 million CPU cycles, an eternity from the point of view of the program!
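The 10 million figure works out if you assume a clock of about 2 GHz, which the answer does not state explicitly. As a quick check of the arithmetic (cycles_during is a name invented for this sketch):

```shell
# cycles_during MILLISECONDS CLOCK_HZ
# Print how many clock cycles elapse in the given time.
cycles_during() {
    # Divide first so the intermediate value stays small.
    echo $(( $2 / 1000 * $1 ))
}

cycles_during 5 2000000000    # 5 ms at an assumed 2 GHz clock: prints 10000000
```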

From the programmer's point of view (that is, in userspace), this is called a blocking system call. If you call read(2) (which is a thin libc wrapper around the system call of the same name), your process does not exactly stop at that boundary; it continues, in the kernel, running the system call code. Most of the time it goes all the way down to a specific disk controller driver (filename → filesystem/VFS → block device → device driver), where a command to fetch a block on disk is submitted to the proper hardware, which is a very fast operation most of the time.

THEN the process is put in sleep state (in kernel space, blocking is called sleeping – nothing is ever 'blocked' from the kernel point of view). It will be awakened once the hardware has finally fetched the proper data, then the process will be marked as runnable and will be scheduled. Eventually, the scheduler will run the process.

Finally, in userspace, the blocking system call returns with proper status and data, and the program flow goes on.

It is possible to invoke most I/O system calls in non-blocking mode (see O_NONBLOCK in open(2) and fcntl(2)). In this case, the system call returns immediately instead of sleeping. The programmer then has to check explicitly at a later time whether the operation can complete (e.g., with select(2)) and fetch its result. This is called asynchronous or event-based programming. Note that on Linux O_NONBLOCK matters mostly for pipes, sockets, and terminals; a read from a regular file on disk can still sleep even with the flag set.

Most answers here mentioning the D state (which is called TASK_UNINTERRUPTIBLE in the Linux state names) are incorrect. The D state is a special sleep mode which is only triggered in a kernel space code path, when that code path can't be interrupted (because it would be too complex to program), with the expectation that it would block only for a very short time. I believe that most "D states" are actually invisible; they are very short lived and can't be observed by sampling tools such as 'top'.

You can encounter unkillable processes in the D state in a few situations. NFS is famous for that, and I've encountered it many times. I think there's a semantic clash between some VFS code paths, which assume they always reach local disks with fast error detection (on SATA, an error timeout is around a few hundred milliseconds), and NFS, which actually fetches data over the network, which is more resilient and slower to recover (a TCP timeout of 300 seconds is common). Linux 2.6.25 introduced a cool solution for this with the TASK_KILLABLE state. Before that, there was a hack where you could actually send signals to NFS client processes by sending a SIGKILL to the kernel thread rpciod, but forget about that ugly trick.…


A process performing I/O will be put in the D state (uninterruptible sleep), which frees the CPU until a hardware interrupt tells the CPU to return to executing the program. See man ps for the other process states.
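For quick reference, the state letters can be wrapped in a small lookup. The descriptions below paraphrase the procps ps manual page as I remember it, so treat the exact wording as an assumption; describe_state is a hypothetical helper name.

```shell
# describe_state LETTER
# Map the first letter of a ps STAT field to a short description.
describe_state() {
    case $1 in
        D) echo "uninterruptible sleep (usually I/O)" ;;
        I) echo "idle kernel thread" ;;
        R) echo "running or runnable (on run queue)" ;;
        S) echo "interruptible sleep (waiting for an event)" ;;
        T) echo "stopped by job control signal" ;;
        t) echo "stopped by debugger during tracing" ;;
        X) echo "dead (should never be seen)" ;;
        Z) echo "defunct (zombie), terminated but not reaped by its parent" ;;
        *) echo "unknown state: $1" ;;
    esac
}

# Example: describe_state "$(ps -o stat= -p "$$" | cut -c1)"
```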

Depending on your kernel, there is a process scheduler, which keeps track of a runqueue of processes ready to execute. It, along with a scheduling algorithm, tells the kernel which process to assign to which CPU. There are kernel processes and user processes to consider. Each process is allocated a time-slice, which is a chunk of CPU time it is allowed to use. Once the process uses all of its time-slice, it is marked as expired and given lower priority in the scheduling algorithm.

Early 2.6 kernels (before CFS replaced it in 2.6.23) used an O(1) scheduler, so no matter how many processes you have up and running, it assigns CPUs in constant time. It is more complicated than that, though, since 2.6 introduced preemption, and CPU load balancing is not an easy algorithm. In any case, it's efficient, and CPUs will not remain idle while you wait for the I/O.

