Where in memory are my variables stored in C?

By considering that the memory is divided into four segments: data, heap, stack, and code, where do global variables, static variables, constant data types, local variables (defined and declared in functions), variables (in main function), pointers, and dynamically allocated space (using malloc and calloc) get stored in memory?

I think they would be allocated as follows:

  • Global variables -------> data
  • Static variables -------> data
  • Constant data types -----> code
  • Local variables (declared and defined in functions) --------> stack
  • Variables declared and defined in main function -----> heap
  • Pointers (for example, char *arr, int *arr) -------> heap
  • Dynamically allocated space (using malloc and calloc) --------> stack

I am referring to these variables only from the C perspective.

Please correct me if I am wrong as I am new to C.




One thing one needs to keep in mind about the storage is the as-if rule. The compiler is not required to put a variable in a specific place - instead it can place it wherever it pleases for as long as the compiled program behaves as if it were run in the abstract C machine according to the rules of the abstract C machine. This applies to all storage durations. For example:

  • a variable that is not accessed at all can be eliminated completely; it has no storage... anywhere. Example: see how there is 42 in the generated assembly code but no sign of 404.
  • a variable with automatic storage duration that does not have its address taken need not be stored in memory at all. An example would be a loop variable.
  • a variable that is const or effectively const need not be in memory. Example - the compiler can prove that foo is effectively const and inlines its use into the code. bar has external linkage and the compiler cannot prove that it would not be changed outside the current module, hence it is not inlined.
  • an object allocated with malloc need not reside in memory allocated from heap! Example - notice how the code does not have a call to malloc and neither is the value 42 ever stored in memory, it is kept in a register!
  • thus an object allocated by malloc whose pointer is lost without the object ever being deallocated with free need not leak memory...
  • the object allocated by malloc need not be within the heap below the program break (sbrk(0)) on Unixen...

  • Variables/automatic variables ---> stack section
  • Dynamically allocated variables ---> heap section
  • Initialised global variables -> data section
  • Uninitialised global variables -> data section (bss)
  • Static variables -> data section
  • String constants -> text section/code section
  • Functions -> text section/code section
  • Text code -> text section/code section
  • register variables -> CPU registers, if the compiler honors the hint
  • Command line inputs -> environment/command line section
  • Environment variables -> environment/command line section

pointers(ex:char *arr,int *arr) -------> heap

Nope, they can be on the stack or in the data segment. They can point anywhere.


I am referring to these variables only from the C perspective.

From the perspective of the C language, all that matters is extent, scope, linkage, and access; exactly how items are mapped to different memory segments is up to the individual implementation, and that will vary. The language standard doesn't talk about memory segments at all. Most modern architectures act mostly the same way; block-scope variables and function arguments will be allocated from the stack, file-scope and static variables will be allocated from a data or code segment, dynamic memory will be allocated from a heap, some constant data will be stored in read-only segments, etc.


Linux minimal runnable examples with disassembly analysis

Since this is an implementation detail not specified by standards, let's just have a look at what the compiler is doing on a particular implementation.

In this answer, I will either link to specific answers that do the analysis, or provide the analysis directly here, and summarize all results here.

All of these were checked on various Ubuntu / GCC versions, and the outcomes are likely pretty stable across versions, but if we find any variations let's specify more precise versions.

Local variable inside a function

Be it main or any other function:

void f(void) {
    int my_local_var;
}

As shown at: What does <value optimized out> mean in gdb?

  • -O0: stack
  • -O3: registers if they don't spill, stack otherwise

For motivation on why the stack exists see: What is the function of the push / pop instructions used on registers in x86 assembly?

Global variables and static function variables

/* BSS */
int my_global_implicit;
int my_global_implicit_explicit_0 = 0;

/* DATA */
int my_global_implicit_explicit_1 = 1;

void f(void) {
    /* BSS */
    static int my_static_local_var_implicit;
    static int my_static_local_var_explicit_0 = 0;

    /* DATA */
    static int my_static_local_var_explicit_1 = 1;
}
  • if initialized to 0 or not initialized (and therefore implicitly initialized to 0): .bss section, see also: Why is the .bss segment required?
  • otherwise: .data section

char * and char c[]

As shown at: Where are static variables stored in C and C++?

void f(void) {
    /* RODATA / TEXT */
    char *a = "abc";

    /* Stack. */
    char b[] = "abc";
    char c[] = {'a', 'b', 'c', '\0'};
}

TODO will very large string literals also be put on the stack? Or .data? Or does compilation fail?

Function arguments

void f(int i, int j);

Arguments are passed according to the relevant calling convention, e.g.: https://en.wikipedia.org/wiki/X86_calling_conventions for x86, which assigns each argument either to a specific register or to a stack location.

Then as shown at What does <value optimized out> mean in gdb?, -O0 then slurps everything into the stack, while -O3 tries to use registers as much as possible.

If the function gets inlined however, they are treated just like regular locals.

const

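For automatic locals it usually makes no difference in practice. Note, however, that modifying an object that was defined const (for example after casting the const away) is undefined behavior, so the compiler is free to put file-scope const objects such as const int x = 1; into .rodata, which is what GCC typically does.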

Conversely, if the compiler is able to determine that some data is never written to, it could in theory place it in .rodata even if not const.

TODO analysis.

Pointers

They are variables (that contain addresses, which are numbers), so same as all the rest :-)

malloc

The question does not make much sense for malloc, since malloc is a function, and in:

int *i = malloc(sizeof(int));

i is a variable that contains an address, so it falls into the above case. *i, the int that i points to, is the object allocated by malloc.

As for how malloc works internally: when it needs more memory, it asks the Linux kernel through a system call, and the kernel marks certain address ranges as valid in its internal data structures. When the program first touches those addresses, a page fault happens and the kernel sets up the corresponding page table entries, which lets the access happen without a segfault: How does x86 paging work?

Note however that this is basically what the exec syscall does under the hood when you try to run an executable: it marks the pages it wants to load to and writes the program there, see also: How does kernel get an executable binary file running under linux? The difference is that exec has some extra limitations on where to load to (e.g. if the code is not relocatable).

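The exact syscall used by malloc depends on the implementation. glibc uses both: brk to grow the main heap, and mmap for large allocations (by default those of at least 128 KiB, see M_MMAP_THRESHOLD) and for the arenas of additional threads. See: Does malloc() use brk() or mmap()?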

Dynamic libraries

They basically get mmapped into memory: https://unix.stackexchange.com/questions/226524/what-system-call-is-used-to-load-libraries-in-linux/462710#462710

Environment variables and main's argv

Above initial stack: https://unix.stackexchange.com/questions/75939/where-is-the-environment-string-actual-stored TODO why not in .data?


A popular desktop architecture divides a process's virtual memory into several segments:

  • Text segment: contains the executable code. The instruction pointer takes values in this range.

  • Data segment: contains global variables (i.e. objects with static storage duration). Subdivided into read-only data (such as string constants) and uninitialized data ("BSS").

  • Stack segment: contains the dynamic memory for the program, i.e. the free store ("heap") and the local stack frames for all the threads. Traditionally the C stack and C heap used to grow into the stack segment from opposite ends, but I believe that practice has been abandoned because it is too unsafe.

A C program typically puts objects with static storage duration into the data segment, dynamically allocated objects on the free store, and automatic objects on the call stack of the thread in which it lives.

On other platforms, such as old x86 real mode or on embedded devices, things can obviously be radically different.


Corrected your wrong sentences

constant data types ----->  code //wrong

local constant variables -----> stack

initialized global constant variable -----> data segment

uninitialized global constant variable -----> bss

variables declared and defined in main function  ----->  heap //wrong

variables declared and defined in main function -----> stack

pointers(ex:char *arr,int *arr) ------->  heap //wrong

dynamically allocated space(using malloc,calloc) --------> stack //wrong

pointers(ex:char *arr,int *arr) -------> the pointer variable itself (sizeof(char *) bytes) will be on the stack.

Consider that you are allocating n bytes of memory dynamically (using malloc or calloc) and then making a pointer variable point to it. Those n bytes of memory are on the heap, while the pointer variable, which stores the starting address of the n-byte chunk, is on the stack and typically takes 4 bytes on a 32-bit machine or 8 bytes on a 64-bit machine.

Note: pointer variables can point to memory in any segment.

#include <stdlib.h>

int x = 10;

void func(void)
{
    int a = 0;
    int *p = &a;                 // points to memory on the stack
    int *p2 = &x;                // points to memory in the data segment
    const char *name = "ashok";  // points to the constant string literal,
                                 // which is in read-only memory (text segment)
    char *name2 = malloc(10);    // points to memory on the heap
    /* ... */
    free(name2);                 // heap memory must be released explicitly
}

dynamically allocated space(using malloc,calloc) --------> heap


For those future visitors who may be interested in knowing about those memory segments, I am writing important points about 5 memory segments in C:

Some heads up:

  1. Whenever a C program is executed, some memory is allocated in RAM for the program's execution. This memory is used for storing the executable code (binary data), program variables, etc. The memory segments below describe it:
  2. Typically there are three types of variables:
    • Local variables (also called automatic variables in C)
    • Global variables
    • Static variables
    • You can have global static or local static variables, but the above three are the parent types.

5 Memory Segments in C:

1. Code Segment

  • The code segment, also referred to as the text segment, is the area of memory which contains the executable code.
  • The code segment is often read-only to avoid the risk of being overwritten by programming bugs like buffer overflows.
  • The code segment does not contain program variables like local variables (also called automatic variables in C), global variables, etc.
  • Depending on the C implementation, the code segment can also contain read-only string literals. For example, when you call printf("Hello, world"), the string "Hello, world" is stored in the code/text segment. You can verify this using the size command on Linux.

Data Segment

The data segment is divided into the two parts below and typically lies below the heap area, or in some implementations above the stack, but the data segment never lies between the heap and stack area.

2. Uninitialized data segment

  • This segment is also known as bss.
  • This is the portion of memory which contains:
    1. Uninitialized global variables (including pointer variables)
    2. Uninitialized constant global variables.
    3. Uninitialized local static variables.
  • Any global or static local variable which is not initialized will be stored in the uninitialized data segment
  • For example: global variable int globalVar; or static local variable static int localStatic; will be stored in the uninitialized data segment.
  • If you declare a global variable and initialize it as 0 or NULL then still it would go to uninitialized data segment or bss.

3. Initialized data segment

  • This segment stores:
    1. Initialized global variables (including pointer variables)
    2. Initialized constant global variables.
    3. Initialized local static variables.
  • For example: global variable int globalVar = 1; or static local variable static int localStatic = 1; will be stored in initialized data segment.
  • This segment can be further classified into initialized read-only area and initialized read-write area. Initialized constant global variables will go in the initialized read-only area while variables whose values can be modified at runtime will go in the initialized read-write area.
  • The size of this segment is determined by the size of the values in the program's source code, and does not change at run time.

4. Stack Segment

  • The stack segment is used to store variables created inside functions (main or any user-defined function), such as:
    1. Local variables of the function (including pointer variables)
    2. Arguments passed to function
    3. Return address
  • Variables stored in the stack will be removed as soon as the function execution finishes.

5. Heap Segment

  • This segment supports dynamic memory allocation. In C, memory is allocated dynamically with the malloc, calloc, or realloc functions.
  • For example, with int *ptr = malloc(sizeof(int) * 2), space for two ints (eight bytes where int is four bytes) is allocated on the heap, and the address of that block is returned and stored in the ptr variable. The ptr variable itself will be either on the stack or in the data segment, depending on how it is declared.
