Maximum number of threads per process in Linux?

What is the maximum number of threads that can be created by a process under Linux?

How (if possible) can this value be modified?

This question is related to linux multithreading

The answer is


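To set it permanently, open

vim /etc/sysctl.conf

and add

kernel.threads-max = value

where value is the desired number of threads, without quotes. Then load the file without rebooting:

sudo sysctl -p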

To retrieve it:

cat /proc/sys/kernel/threads-max

To set it:

echo 123456789 | sudo tee -a /proc/sys/kernel/threads-max

where 123456789 is the desired number of threads.
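If you want to read the limit from a program instead of the shell, a minimal C sketch could look like this. read_sysctl_long is a hypothetical helper, not a library function. It only reads the file, because writing it needs root just like the sudo tee command.

```c
#include <stdio.h>

/* Read one integer from a sysctl file under /proc/sys, e.g.
 * /proc/sys/kernel/threads-max or /proc/sys/kernel/pid_max.
 * Returns -1 if the file cannot be opened or does not start with a number. */
long read_sysctl_long(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    long value = -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

Calling read_sysctl_long("/proc/sys/kernel/threads-max") returns the same number that cat prints.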


It probably shouldn't matter. You are going to get much better performance designing your algorithm to use a fixed number of threads (e.g., 4 or 8 if you have 4 or 8 processors). You can do this with work queues, asynchronous I/O, or something like libevent.
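A minimal C sketch of the fixed-size work-queue approach with POSIX threads: a few workers pull tasks from a mutex-protected FIFO and sleep on a condition variable when it is empty. pool_t, pool_submit and the other names are hypothetical, not a library API. A production pool (or libevent) would handle errors and cancellation more carefully.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

typedef struct task {                /* one queued unit of work */
    void (*fn)(void *);
    void *arg;
    struct task *next;
} task_t;

typedef struct {                     /* workers sleep on cond until work arrives */
    pthread_mutex_t lock;
    pthread_cond_t cond;
    task_t *head, *tail;
    int shutting_down;
    int nworkers;
    pthread_t *workers;
} pool_t;

static void *worker_main(void *p)
{
    pool_t *pool = p;
    for (;;) {
        pthread_mutex_lock(&pool->lock);
        while (pool->head == NULL && !pool->shutting_down)
            pthread_cond_wait(&pool->cond, &pool->lock);
        if (pool->head == NULL) {    /* shutting down and queue drained */
            pthread_mutex_unlock(&pool->lock);
            return NULL;
        }
        task_t *t = pool->head;      /* pop from the front of the FIFO */
        pool->head = t->next;
        if (pool->head == NULL)
            pool->tail = NULL;
        pthread_mutex_unlock(&pool->lock);
        t->fn(t->arg);               /* run the task outside the lock */
        free(t);
    }
}

/* Start nworkers threads. On failure, still call pool_shutdown()
 * to reap the workers that did start. */
int pool_init(pool_t *pool, int nworkers)
{
    pthread_mutex_init(&pool->lock, NULL);
    pthread_cond_init(&pool->cond, NULL);
    pool->head = pool->tail = NULL;
    pool->shutting_down = 0;
    pool->nworkers = 0;
    pool->workers = calloc((size_t)nworkers, sizeof(pthread_t));
    if (pool->workers == NULL)
        return -1;
    for (int i = 0; i < nworkers; i++) {
        if (pthread_create(&pool->workers[i], NULL, worker_main, pool) != 0)
            return -1;
        pool->nworkers = i + 1;
    }
    return 0;
}

/* Append a task and wake one idle worker. */
int pool_submit(pool_t *pool, void (*fn)(void *), void *arg)
{
    task_t *t = malloc(sizeof *t);
    if (t == NULL)
        return -1;
    t->fn = fn;
    t->arg = arg;
    t->next = NULL;
    pthread_mutex_lock(&pool->lock);
    if (pool->tail != NULL)
        pool->tail->next = t;
    else
        pool->head = t;
    pool->tail = t;
    pthread_cond_signal(&pool->cond);
    pthread_mutex_unlock(&pool->lock);
    return 0;
}

/* Let the workers drain the queue, then join them and free resources. */
void pool_shutdown(pool_t *pool)
{
    pthread_mutex_lock(&pool->lock);
    pool->shutting_down = 1;
    pthread_cond_broadcast(&pool->cond);
    pthread_mutex_unlock(&pool->lock);
    for (int i = 0; i < pool->nworkers; i++)
        pthread_join(pool->workers[i], NULL);
    free(pool->workers);
    pthread_mutex_destroy(&pool->lock);
    pthread_cond_destroy(&pool->cond);
}

/* Example task for illustration: atomically bump a shared counter. */
void increment_task(void *arg)
{
    atomic_fetch_add((atomic_int *)arg, 1);
}
```

Thousands of submitted tasks still run on only nworkers threads. To follow the machine instead of hard-coding 4 or 8, the worker count could come from sysconf(_SC_NPROCESSORS_ONLN).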


In practical terms, the limit is usually determined by stack space. If each thread gets a 1MB stack (I can't remember if that is the default on Linux), then a 32-bit system will run out of address space after about 3000 threads (assuming the last GB is reserved for the kernel).
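The arithmetic behind that estimate as a tiny C helper (the function name is made up for illustration). 3 GB of user address space divided by a 1MB stack gives 3072 threads. The helper ignores code, heap, shared libraries and guard pages, so the real ceiling is lower.

```c
/* Address-space ceiling on thread count: usable user address space
 * divided by the per-thread stack size. Everything else that lives in
 * the address space is ignored, so observed limits will be lower. */
unsigned long long max_threads_by_address_space(unsigned long long user_space_bytes,
                                                unsigned long long stack_bytes)
{
    if (stack_bytes == 0)
        return 0;                    /* avoid division by zero */
    return user_space_bytes / stack_bytes;
}
```

For example, max_threads_by_address_space(3ULL << 30, 1ULL << 20) returns 3072.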

However, you'll most likely experience terrible performance if you use more than a few dozen threads. Sooner or later, you get too much context-switching overhead, too much overhead in the scheduler, and so on. (Creating a large number of threads does little more than eat a lot of memory. But a lot of threads with actual work to do is going to slow you down as they're fighting for the available CPU time)

What are you doing where this limit is even relevant?


It depends on your system. Just write a sample program that creates threads in a loop and check with ps axo pid,ppid,rss,vsz,nlwp,cmd. When it can no longer create threads, check the nlwp count (nlwp is the number of threads). Voilà, you have your foolproof answer instead of going through books.
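A minimal C sketch of such a probe, assuming POSIX threads. probe_thread_limit is a hypothetical name. It takes a limit argument so it cannot run away, and every thread parks on a condition variable so all of them are alive at the same time, which is what nlwp counts.

```c
#include <pthread.h>
#include <stddef.h>

/* Threads park on this gate until the probe has finished creating them. */
static pthread_mutex_t gate_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t gate_cond = PTHREAD_COND_INITIALIZER;
static int gate_open;

static void *parked_thread(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&gate_lock);
    while (!gate_open)
        pthread_cond_wait(&gate_cond, &gate_lock);
    pthread_mutex_unlock(&gate_lock);
    return NULL;
}

/* Create threads until pthread_create() fails or `limit` is reached,
 * then release and join them all. `tids` needs room for `limit` entries.
 * Returns how many threads were alive at the peak. */
size_t probe_thread_limit(pthread_t *tids, size_t limit, size_t stack_bytes)
{
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    if (pthread_attr_setstacksize(&attr, stack_bytes) != 0) {
        pthread_attr_destroy(&attr); /* e.g. below PTHREAD_STACK_MIN */
        return 0;
    }

    pthread_mutex_lock(&gate_lock);
    gate_open = 0;
    pthread_mutex_unlock(&gate_lock);

    size_t created = 0;
    while (created < limit &&
           pthread_create(&tids[created], &attr, parked_thread, NULL) == 0)
        created++;
    pthread_attr_destroy(&attr);

    pthread_mutex_lock(&gate_lock);  /* open the gate for every thread */
    gate_open = 1;
    pthread_cond_broadcast(&gate_cond);
    pthread_mutex_unlock(&gate_lock);
    for (size_t i = 0; i < created; i++)
        pthread_join(tids[i], NULL);
    return created;
}
```

To look for the real ceiling, malloc a large tids array (for example 200000 entries), pass a small stack such as 256 * 1024, and print the returned count. Reaching the limit can make thread and process creation fail for other programs too, so try it on a test machine.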


I think we missed another restriction that will also block new thread creation: the kernel.pid_max limit.

root@myhost:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.7 LTS
Release:    16.04
Codename:   xenial
root@myhost:~# uname -a
Linux myhost 4.4.0-190-generic #220-Ubuntu SMP Fri Aug 28 23:02:15 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

I find that, at least on my system, kernel.pid_max is 32768. When I launch any simple JVM process, it reports an error like the one below:

java/jstack/jstat ...
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Cannot create GC thread. Out of system resources.
# An error report file with more information is saved as:
# /root/hs_err_pid1390.log

Checking the memory shows there is enough:

root@lascorehadoop-15a32:~# free -mh
              total        used        free      shared  buff/cache   available
Mem:           125G         11G         41G        1.2G         72G        111G
Swap:            0B          0B          0B

Check the number of threads on the system:

~# ps -eLf|wc -l
31506

But when I check the system limits with ulimit:

root@myhost:~# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 515471
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 98000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 515471
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

The ulimit output shows that the current thread count (31506) is far below the max user processes limit (515471).

In fact, the limit being reached is kernel.pid_max (32768): every thread takes a PID, and 31506 threads already exist.

It is very easy to check and tune: https://www.cyberciti.biz/tips/howto-linux-increase-pid-limits.html
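To reproduce the ps -eLf | wc -l check from C, here is a sketch that walks /proc/<pid>/task the same way. count_system_threads is a hypothetical name. Because every thread takes a PID, a result approaching the value in /proc/sys/kernel/pid_max means new threads are about to fail.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>

/* Count all threads on the system by listing /proc/<pid>/task/<tid>,
 * roughly what `ps -eLf | wc -l` shows (without the header line).
 * Returns -1 if /proc cannot be opened. */
long count_system_threads(void)
{
    DIR *proc = opendir("/proc");
    if (proc == NULL)
        return -1;

    long total = 0;
    struct dirent *p;
    while ((p = readdir(proc)) != NULL) {
        if (!isdigit((unsigned char)p->d_name[0]))
            continue;                /* not a process directory */

        char path[300];
        snprintf(path, sizeof path, "/proc/%s/task", p->d_name);
        DIR *tasks = opendir(path);
        if (tasks == NULL)
            continue;                /* process exited meanwhile */

        struct dirent *t;
        while ((t = readdir(tasks)) != NULL)
            if (isdigit((unsigned char)t->d_name[0]))
                total++;             /* one entry per thread */
        closedir(tasks);
    }
    closedir(proc);
    return total;
}
```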


Thread count limit:

$ cat /proc/sys/kernel/threads-max 

How it is calculated:

max_threads = mempages / (8 * THREAD_SIZE / PAGE_SIZE);

where the x86_64 page size (PAGE_SIZE) is 4K. Like all other architectures, x86_64 has a kernel stack for every active thread, and these thread stacks are THREAD_SIZE (2*PAGE_SIZE) big.

For mempages:

cat /proc/zoneinfo | grep spanned | awk '{totalpages=totalpages+$2} END {print totalpages}';

So the number is actually not related to the per-thread stack size limit (ulimit -s).

P.S. The thread stack size limit is 10M on my RHEL VM. With 1.5G of memory, does that mean this VM can only afford 150 threads?
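The formula as a small C function. THREAD_SIZE and PAGE_SIZE are passed in as parameters because they differ between architectures and kernel versions, and the function name is only illustrative. With 1G of RAM (262144 pages of 4K) and THREAD_SIZE = 8K, it gives 16384.

```c
/* Default threads-max as computed in kernel/fork.c:
 *   max_threads = mempages / (8 * THREAD_SIZE / PAGE_SIZE)
 * i.e. the kernel stacks of all threads may use at most 1/8 of RAM. */
unsigned long default_max_threads(unsigned long mempages,
                                  unsigned long thread_size,
                                  unsigned long page_size)
{
    unsigned long divisor = 8 * thread_size / page_size;
    return divisor ? mempages / divisor : 0;
}
```

For the 1.5G VM in the P.S. (about 393216 pages of 4K), the same formula gives roughly 24576. There is no ulimit -s term anywhere in it.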


Check the stack size per thread with ulimit; in my case, on Red Hat Linux 2.6:

    ulimit -a
    ...
    stack size              (kbytes, -s) 10240

Each of your threads will get this amount of memory (10MB) assigned for its stack. With a 32-bit program and a maximum address space of 4GB, that is a maximum of only 4096MB / 10MB = 409 threads! Subtract the program code and the heap space, and you will probably observe a maximum of about 300 threads.

You should be able to raise this by compiling and running as 64-bit, or by setting ulimit -s 8192 or even ulimit -s 4096. Whether that is advisable is another discussion...


You can see the current value with the following command:

cat /proc/sys/kernel/threads-max

You can also set the value (as root) like this:

echo 100500 > /proc/sys/kernel/threads-max

The value you set is checked against the available RAM pages. If the thread structures would occupy more than 1/8 of the available RAM pages, threads-max is reduced accordingly.


Yes, to increase the number of threads you need to increase the virtual memory or decrease the stack size. On the Raspberry Pi I didn't find a way to increase the virtual memory. If you decrease the stack size from the default 8MB to 1MB, it is possible to get more than 1000 threads per process, but decreasing the stack size with the "ulimit -s" command applies to all threads. So my solution was to use pthreads ("pthread_t") instead of a "thread class", because pthread attributes let me set the stack size for each thread. In the end, I was able to achieve more than 1000 threads per process on the Raspberry Pi, each with 1MB of stack.
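A minimal sketch of the per-thread approach in C. The stack size goes into a pthread_attr_t via pthread_attr_setstacksize before pthread_create, so ulimit -s stays untouched for the rest of the process. run_with_stack_size is a hypothetical helper. pthread_attr_setstacksize rejects sizes below PTHREAD_STACK_MIN with EINVAL.

```c
#include <pthread.h>
#include <stddef.h>

/* Placeholder thread body; a real worker would do its job here. */
static void *worker(void *arg)
{
    return arg;
}

/* Create one thread with its own stack size instead of the process-wide
 * default (which NPTL takes from the `ulimit -s` value at program start,
 * unless it is unlimited), then join it.
 * Returns 0 on success, otherwise the pthread error code. */
int run_with_stack_size(size_t stack_bytes)
{
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    /* Fails with EINVAL if stack_bytes < PTHREAD_STACK_MIN. */
    rc = pthread_attr_setstacksize(&attr, stack_bytes);

    pthread_t tid;
    if (rc == 0)
        rc = pthread_create(&tid, &attr, worker, NULL);
    if (rc == 0)
        rc = pthread_join(tid, NULL);

    pthread_attr_destroy(&attr);
    return rc;
}
```

To get the 1000+ threads described above, the same attr object can be reused for every pthread_create call.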


For anyone looking at this now, on systemd systems (in my case, specifically Ubuntu 16.04) there is another limit enforced by the cgroup pids.max parameter.

This is set to 12,288 by default and can be overridden in /etc/systemd/logind.conf (UserTasksMax).

Other advice still applies, including kernel.pid_max, kernel.threads-max, vm.max_map_count, ulimits, etc.
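To check the cgroup limit from a program, here is a sketch that parses a pids.max file. The file holds either a number or the literal max (no limit). Its location depends on the cgroup version and hierarchy. /sys/fs/cgroup/user.slice/user-1000.slice/pids.max is only an example for a cgroup v2 system, and read_pids_max is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/* Read a cgroup pids.max file, which contains either a number or "max".
 * Returns the limit, -1 for "max" (no limit), or -2 on error. */
long read_pids_max(const char *path)
{
    char buf[64];
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -2;
    char *line = fgets(buf, sizeof buf, f);
    fclose(f);
    if (line == NULL)
        return -2;
    if (strncmp(buf, "max", 3) == 0)
        return -1;

    long value;
    return sscanf(buf, "%ld", &value) == 1 ? value : -2;
}
```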


If you need more threads only because your I/O calls block, use the nbio non-blocking I/O library or something similar instead.


You can see the maximum number of threads in the following file on Linux:

cat /proc/sys/kernel/threads-max

(OR)

sysctl -a | grep threads-max


It is WRONG to say that Linux doesn't have a separate threads-per-process limit.

Linux implements the maximum number of threads per process indirectly!

number of threads = total virtual memory (bytes) / (stack size (MB) * 1024 * 1024)

Thus, the number of threads per process can be increased by increasing the total virtual memory or by decreasing the stack size. But decreasing the stack size too much can make code fail with a stack overflow, while the maximum virtual memory is equal to the swap memory.

Check your machine:

Total Virtual Memory: ulimit -v (default is unlimited, so you need to increase swap memory to increase this)

Total Stack Size: ulimit -s (default is 8MB)

Command to increase these values:

ulimit -s newvalue

ulimit -v newvalue

*Replace newvalue with the value you want to set as the limit.
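This rule of thumb can be evaluated from inside a program with getrlimit. On Linux, ulimit -v and ulimit -s correspond to RLIMIT_AS and RLIMIT_STACK, which getrlimit reports in bytes. Here is a sketch under that assumption. estimate_threads_from_rlimits is a hypothetical name, and it returns -1 when either limit is unlimited, which is the usual default for ulimit -v.

```c
#include <sys/resource.h>

/* The rule of thumb "threads = virtual memory / stack size", using the
 * limits behind `ulimit -v` (RLIMIT_AS) and `ulimit -s` (RLIMIT_STACK).
 * Returns -1 if either limit is unlimited (no bound from this rule),
 * -2 if the limits cannot be read. */
long long estimate_threads_from_rlimits(void)
{
    struct rlimit as, stack;
    if (getrlimit(RLIMIT_AS, &as) != 0 || getrlimit(RLIMIT_STACK, &stack) != 0)
        return -2;
    if (as.rlim_cur == RLIM_INFINITY || stack.rlim_cur == RLIM_INFINITY ||
        stack.rlim_cur == 0)
        return -1;
    return (long long)(as.rlim_cur / stack.rlim_cur);   /* both in bytes */
}
```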

References:

http://dustycodes.wordpress.com/2012/02/09/increasing-number-of-threads-per-process/


@dragosrsupercool

Linux doesn't use the virtual memory to calculate the maximum number of threads, but the physical RAM installed on the system:

 max_threads = totalram_pages / (8 * 8192 / 4096);

http://kavassalis.com/2011/03/linux-and-the-maximum-number-of-processes-threads/

kernel/fork.c

/* The default maximum number of threads is set to a safe
 * value: the thread structures can take up at most half
 * of memory.
 */
max_threads = mempages / (8 * THREAD_SIZE / PAGE_SIZE);

So the thread maximum differs from system to system, because the amount of installed RAM differs. Linux doesn't need you to increase the virtual memory: on 32-bit we get 3 GB for user space and 1 GB for the kernel, and on 64-bit we get 128 TB of virtual memory. Having to add swap space to increase the virtual memory is what happens on Solaris.


Proper 100k threads on Linux:

ulimit -s  256
ulimit -i  120000
echo 120000 > /proc/sys/kernel/threads-max
echo 600000 > /proc/sys/vm/max_map_count
echo 200000 > /proc/sys/kernel/pid_max 

 ./100k-pthread-create-app

2018 update from @Thomas, on systemd systems:

/etc/systemd/logind.conf: UserTasksMax=100000
