[c++] C/C++ maximum stack size of program

I want to do DFS on a 100 X 100 array. (Say elements of array represents graph nodes) So assuming worst case, depth of recursive function calls can go upto 10000 with each call taking upto say 20 bytes. So is it feasible means is there a possibility of stackoverflow?

What is the maximum size of stack in C/C++?

Please specify for gcc for both
1) cygwin on Windows
2) Unix

What are the general limits?

This question is related to c++ c stack

The answer is


Platform-dependent, toolchain-dependent, ulimit-dependent, parameter-dependent.... It is not at all specified, and there are many static and dynamic properties that can influence it.


Yes, there is a possibility of stack overflow. The C and C++ standard do not dictate things like stack depth, those are generally an environmental issue.

Most decent development environments and/or operating systems will let you tailor the stack size of a process, either at link or load time.

You should specify which OS and development environment you're using for more targeted assistance.

For example, under Ubuntu Karmic Koala, the default for gcc is 2M reserved and 4K committed but this can be changed when you link the program. Use the --stack option of ld to do that.


I am not sure what you mean by doing a depth first search on a rectangular array, but I assume you know what you are doing.

If the stack limit is a problem you should be able to convert your recursive solution into an iterative solution that pushes intermediate values onto a stack which is allocated from the heap.


I just ran out of stack at work, it was a database and it was running some threads, basically the previous developer had thrown a big array on the stack, and the stack was low anyway. The software was compiled using Microsoft Visual Studio 2015.

Even though the thread had run out of stack, it silently failed and continued on, it only stack overflowed when it came to access the contents of the data on the stack.

The best advice i can give is to not declare arrays on the stack - especially in complex applications and particularly in threads, instead use heap. That's what it's there for ;)

Also just keep in mind it may not fail immediately when declaring the stack, but only on access. My guess is that the compiler declares stack under windows "optimistically", i.e. it will assume that the stack has been declared and is sufficiently sized until it comes to use it and then finds out that the stack isn't there.

Different operating systems may have different stack declaration policies. Please leave a comment if you know what these policies are.


(Added 26 Sept. 2020)

On 24 Oct. 2009, as @pixelbeat first pointed out here, Bruno Haible empirically discovered the following default thread stack sizes for several systems. He said that in a multithreaded program, "the default thread stack size is:"

- glibc i386, x86_64    7.4 MB
- Tru64 5.1             5.2 MB
- Cygwin                1.8 MB
- Solaris 7..10           1 MB
- MacOS X 10.5          460 KB
- AIX 5                  98 KB
- OpenBSD 4.0            64 KB
- HP-UX 11               16 KB

Note that the above units are all in MB and KB (base 1000 numbers), NOT MiB and KiB (base 1024 numbers). I've proven this to myself by verifying the 7.4 MB case.

He also stated that:

32 KB is more than you can safely allocate on the stack in a multithreaded program

And he said:

And the default stack size for sigaltstack, SIGSTKSZ, is

  • only 16 KB on some platforms: IRIX, OSF/1, Haiku.
  • only 8 KB on some platforms: glibc, NetBSD, OpenBSD, HP-UX, Solaris.
  • only 4 KB on some platforms: AIX.

Bruno

He wrote the following simple Linux C program to empirically determine the above values. You can run it on your system today to quickly see what your maximum thread stack size is, or you can run it online on GDBOnline here: https://onlinegdb.com/rkO9JnaHD.

Explanation: It simply creates a single new thread, so as to check the thread stack size and NOT the program stack size, in case they differ, then it has that thread repeatedly allocate 128 bytes of memory on the stack (NOT the heap), using the Linux alloca() call, after which it writes a 0 to the first byte of this new memory block, and then it prints out how many total bytes it has allocated. It repeats this process, allocating 128 more bytes on the stack each time, until the program crashes with a Segmentation fault (core dumped) error. The last value printed is the estimated maximum thread stack size allowed for your system.

Important note: alloca() allocates on the stack: even though this looks like dynamic memory allocation onto the heap, similar to a malloc() call, alloca() does NOT dynamically allocate onto the heap. Rather, alloca() is a specialized Linux function to "pseudo-dynamically" (I'm not sure what I'd call this, so that's the term I chose) allocate directly onto the stack as though it was statically-allocated memory. Stack memory used and returned by alloca() is scoped at the function-level, and is therefore "automatically freed when the function that called alloca() returns to its caller." That's why its static scope isn't exited and memory allocated by alloca() is NOT freed each time a for loop iteration is completed and the end of the for loop scope is reached. See man 3 alloca for details. Here's the pertinent quote (emphasis added):

DESCRIPTION
The alloca() function allocates size bytes of space in the stack frame of the caller. This temporary space is automatically freed when the function that called alloca() returns to its caller.

RETURN VALUE
The alloca() function returns a pointer to the beginning of the allocated space. If the allocation causes stack overflow, program behavior is undefined.

Here is Bruno Haible's program from 24 Oct. 2009, copied directly from the GNU mailing list here:

Again, you can run it live online here.

// By Bruno Haible
// 24 Oct. 2009
// Source: https://lists.gnu.org/archive/html/bug-coreutils/2009-10/msg00262.html

// =============== Program for determining the default thread stack size =========

#include <alloca.h>
#include <pthread.h>
#include <stdio.h>
void* threadfunc (void*p) {
  int n = 0;
  for (;;) {
    printf("Allocated %d bytes\n", n);
    fflush(stdout);
    n += 128;
    *((volatile char *) alloca(128)) = 0;
  }
}

int main()
{
  pthread_t thread;
  pthread_create(&thread, NULL, threadfunc, NULL);
  for (;;) {}
}

When I run it on GDBOnline using the link above, I get the exact same results each time I run it, as both a C and a C++17 program. It takes about 10 seconds or so to run. Here are the last several lines of the output:

Allocated 7449856 bytes
Allocated 7449984 bytes
Allocated 7450112 bytes
Allocated 7450240 bytes
Allocated 7450368 bytes
Allocated 7450496 bytes
Allocated 7450624 bytes
Allocated 7450752 bytes
Allocated 7450880 bytes
Segmentation fault (core dumped)

So, the thread stack size is ~7.45 MB for this system, as Bruno mentioned above (7.4 MB).

I've made a few changes to the program, mostly just for clarity, but also for efficiency, and a bit for learning.

Summary of my changes:

  1. [learning] I passed in BYTES_TO_ALLOCATE_EACH_LOOP as an argument to the threadfunc() just for practice passing in and using generic void* arguments in C.

  2. [efficiency] I made the main thread sleep instead of wastefully spinning.

  3. [clarity] I added more-verbose variable names, such as BYTES_TO_ALLOCATE_EACH_LOOP and bytes_allocated.

  4. [clarity] I changed this:

     *((volatile char *) alloca(128)) = 0;
    

    to this:

     volatile uint8_t * byte_buff = 
             (volatile uint8_t *)alloca(BYTES_TO_ALLOCATE_EACH_LOOP);
     byte_buff[0] = 0;
    

Here is my modified test program, which does exactly the same thing as Bruno's, and even has the same results:

You can run it online here, or download it from my repo here. If you choose to run it locally from my repo, here's the build and run commands I used for testing:

  1. Build and run it as a C program:

     mkdir -p bin && \
     gcc -Wall -Werror -g3 -O3 -std=c11 -pthread -o bin/tmp \
     onlinegdb--empirically_determine_max_thread_stack_size_GS_version.c && \
     time bin/tmp
    
  2. Build and run it as a C++ program:

     mkdir -p bin && \
     g++ -Wall -Werror -g3 -O3 -std=c++17 -pthread -o bin/tmp \
     onlinegdb--empirically_determine_max_thread_stack_size_GS_version.c && \
     time bin/tmp
    

It takes < 0.5 seconds to run locally on a fast computer with a thread stack size of ~7.4 MB.

Here's the program:

// =============== Program for determining the default thread stack size =========

// Modified by Gabriel Staples, 26 Sept. 2020

// Originally by Bruno Haible
// 24 Oct. 2009
// Source: https://lists.gnu.org/archive/html/bug-coreutils/2009-10/msg00262.html

#include <alloca.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h> // sleep

/// Thread function to repeatedly allocate memory within a thread, printing
/// the total memory allocated each time, until the program crashes. The last
/// value printed before the crash indicates how big a thread's stack size is.
void* threadfunc(void* bytes_to_allocate_each_loop)
{
    const uint32_t BYTES_TO_ALLOCATE_EACH_LOOP =
            *(uint32_t*)bytes_to_allocate_each_loop;

    uint32_t bytes_allocated = 0;
    while (true)
    {
        printf("bytes_allocated = %u\n", bytes_allocated);
        fflush(stdout);
        // NB: it appears that you don't necessarily need `volatile` here,
        // but you DO definitely need to actually use (ex: write to) the
        // memory allocated by `alloca()`, as we do below, or else the
        // `alloca()` call does seem to get optimized out on some systems,
        // making this whole program just run infinitely forever without
        // ever hitting the expected segmentation fault.
        volatile uint8_t * byte_buff =
                (volatile uint8_t *)alloca(BYTES_TO_ALLOCATE_EACH_LOOP);
        byte_buff[0] = 0;
        bytes_allocated += BYTES_TO_ALLOCATE_EACH_LOOP;
    }
}

int main()
{
    const uint32_t BYTES_TO_ALLOCATE_EACH_LOOP = 128;

    pthread_t thread;
    pthread_create(&thread, NULL, threadfunc,
                   (void*)(&BYTES_TO_ALLOCATE_EACH_LOOP));
    while (true)
    {
        const unsigned int SLEEP_SEC = 10000;
        sleep(SLEEP_SEC);
    }

    return 0;
}

Sample output (same results as Bruno Haible's original program):

bytes_allocated = 7450240                                                                                                                                                        
bytes_allocated = 7450368                                                                                                                                                        
bytes_allocated = 7450496                                                                                                                                                        
bytes_allocated = 7450624                                                                                                                                                        
bytes_allocated = 7450752                                                                                                                                                        
bytes_allocated = 7450880                                                                                                                                                        
Segmentation fault (core dumped) 

Stacks for threads are often smaller. You can change the default at link time, or change at run time also. For reference, some defaults are:

  • glibc i386, x86_64: 7.4 MB
  • Tru64 5.1: 5.2 MB
  • Cygwin: 1.8 MB
  • Solaris 7..10: 1 MB
  • MacOS X 10.5: 460 KB
  • AIX 5: 98 KB
  • OpenBSD 4.0: 64 KB
  • HP-UX 11: 16 KB

Examples related to c++

Method Call Chaining; returning a pointer vs a reference? How can I tell if an algorithm is efficient? Difference between opening a file in binary vs text How can compare-and-swap be used for a wait-free mutual exclusion for any shared data structure? Install Qt on Ubuntu #include errors detected in vscode Cannot open include file: 'stdio.h' - Visual Studio Community 2017 - C++ Error How to fix the error "Windows SDK version 8.1" was not found? Visual Studio 2017 errors on standard headers How do I check if a Key is pressed on C++

Examples related to c

conflicting types for 'outchar' Can't compile C program on a Mac after upgrade to Mojave Program to find largest and second largest number in array Prime numbers between 1 to 100 in C Programming Language In c, in bool, true == 1 and false == 0? How I can print to stderr in C? Visual Studio Code includePath "error: assignment to expression with array type error" when I assign a struct field (C) Compiling an application for use in highly radioactive environments How can you print multiple variables inside a string using printf?

Examples related to stack

Java balanced expressions check {[()]} What is the default stack size, can it grow, how does it work with garbage collection? Parenthesis/Brackets Matching using Stack algorithm Stack array using pop() and push() What does "ulimit -s unlimited" do? Java ArrayList how to add elements at the beginning How to clone object in C++ ? Or Is there another solution? what is the basic difference between stack and queue? Object creation on the stack/heap? What are SP (stack) and LR in ARM?