[c++] stdcall and cdecl

There are (among others) two types of calling conventions - stdcall and cdecl. I have a few questions about them:

  1. When a cdecl function is called, how does the caller know if it should free up the stack? At the call site, does the caller know whether the function being called is a cdecl or a stdcall function? How does it work? How does the caller know if it should free up the stack or not? Or is it the linker's responsibility?
  2. If a function which is declared as stdcall calls a function (which has cdecl as its calling convention), or the other way round, would this be inappropriate?
  3. In general, can we say which call will be faster - cdecl or stdcall?

This question is related to c++ stdcall cdecl

The answer is


Raymond Chen gives a nice overview of what __stdcall and __cdecl do.

(1) The caller "knows" to clean up the stack after calling a function because the compiler knows the calling convention of that function and generates the necessary code.

void __stdcall StdcallFunc() {}

void __cdecl CdeclFunc()
{
    // The compiler knows that StdcallFunc() uses the __stdcall
    // convention at this point, so it generates the proper binary
    // for stack cleanup.
    StdcallFunc();
}

It is possible to mismatch the calling convention, like this:

LRESULT MyWndProc(HWND hwnd, UINT msg,
    WPARAM wParam, LPARAM lParam);
// ...
// Compiler usually complains but there's this cast here...
windowClass.lpfnWndProc = reinterpret_cast<WNDPROC>(&MyWndProc);

So many code samples get this wrong it's not even funny. It's supposed to be like this:

// CALLBACK is #define'd as __stdcall
LRESULT CALLBACK MyWndProc(HWND hwnd, UINT msg,
    WPARAM wParam, LPARAM lParam);
// ...
windowClass.lpfnWndProc = &MyWndProc;

However, assuming the programmer doesn't ignore compiler errors, the compiler will generate the code needed to clean up the stack properly since it'll know the calling conventions of the functions involved.

(2) Both ways should work. In fact, this happens quite frequently, at least in code that interacts with the Windows API, because __cdecl is the default for C and C++ programs in the Visual C++ compiler, while the WinAPI functions use the __stdcall convention.
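As a minimal sketch of point (2), the snippet below has a __cdecl function call a __stdcall one and the other way round. The macros DEMO_STDCALL and DEMO_CDECL and all function names are made up for this illustration. It assumes that compilers targeting Windows accept the __stdcall and __cdecl keywords; elsewhere the macros expand to nothing, so the code still builds but both functions simply use the platform's one convention.

```cpp
// Hypothetical helper macros: only Windows toolchains get the real keywords.
#if defined(_WIN32)
#define DEMO_STDCALL __stdcall
#define DEMO_CDECL   __cdecl
#else
#define DEMO_STDCALL
#define DEMO_CDECL
#endif

// Callee-cleans convention (where the keyword has an effect).
int DEMO_STDCALL AddStd(int a, int b) { return a + b; }

// Caller-cleans function that calls a stdcall function. The compiler has seen
// AddStd's declaration, so it generates the call sequence AddStd expects.
int DEMO_CDECL TwiceCdecl(int x) { return AddStd(x, x); }

// The reverse direction: a stdcall function calling a cdecl one.
int DEMO_STDCALL QuadStd(int x) { return TwiceCdecl(TwiceCdecl(x)); }
```

Each call site is compiled for the convention of the function being called, not for the convention of the function that contains the call, which is why mixing them like this is harmless.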

(3) There should be no real performance difference between the two.


Calling conventions have nothing to do with the C/C++ programming languages and are rather specifics on how a compiler implements the given language. If you consistently use the same compiler, you never need to worry about calling conventions.

However, sometimes we want binary code compiled by different compilers to inter-operate correctly. When we do so, we need to define something called the Application Binary Interface (ABI). The ABI defines how the compiler converts the C/C++ source into machine code. This includes calling conventions, name mangling, and v-table layout. cdecl and stdcall are two different calling conventions commonly used on x86 platforms.

By placing the information on the calling convention into the source header, the compiler will know what code needs to be generated to inter-operate correctly with the given executable.


In CDECL, arguments are pushed onto the stack in reverse order, the caller clears the stack, and the result is returned via a processor register (later I will call it "register A"). In STDCALL there is one difference: the caller doesn't clear the stack, the callee does.

You are asking which one is faster. Neither. You should use the native calling convention as long as you can. Change the convention only if there is no way around it, e.g. when using external libraries that require a certain convention.

Besides, there are other conventions that a compiler may offer or be configured to use as its default, e.g. the Visual C++ compiler supports FASTCALL, which is theoretically faster because it makes more extensive use of processor registers.

Usually you must give the proper calling convention to callback functions passed to an external library, e.g. a callback passed to qsort from the C library must be CDECL (if the compiler uses another convention by default, we must mark the callback as CDECL), and the various WinAPI callbacks must be STDCALL (nearly the whole WinAPI is STDCALL).
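A minimal sketch of the qsort case, assuming a hypothetical DEMO_CDECL macro that expands to __cdecl on Windows toolchains and to nothing elsewhere. CompareInts and SortInts are names invented for this example; only std::qsort comes from the standard library.

```cpp
#include <cstdlib>

// Hypothetical helper macro: only Windows toolchains get the real keyword.
#if defined(_WIN32)
#define DEMO_CDECL __cdecl
#else
#define DEMO_CDECL
#endif

// Comparator handed to the C library. Marking it explicitly keeps it correct
// even if the project's default convention is switched to something else.
int DEMO_CDECL CompareInts(const void* lhs, const void* rhs) {
    const int a = *static_cast<const int*>(lhs);
    const int b = *static_cast<const int*>(rhs);
    return (a > b) - (a < b);  // negative, zero or positive, without overflow
}

// Sorts `count` ints in ascending order using the C library's qsort.
void SortInts(int* values, std::size_t count) {
    std::qsort(values, count, sizeof(int), CompareInts);
}
```

Calling SortInts on an array such as {3, 1, 2} leaves it in ascending order. The explicit marker only matters when the default has been changed; with the usual default it is redundant but harmless.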

Another common case is storing pointers to external functions, e.g. to create a pointer to a WinAPI function, its type definition must be marked with STDCALL.
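The syntax for such a pointer type can be sketched as follows. DEMO_STDCALL, BinaryOp, Multiply and Apply are hypothetical names for this illustration, and Multiply merely stands in for a real external function; the macro is assumed to expand to __stdcall on Windows toolchains and to nothing elsewhere.

```cpp
// Hypothetical helper macro: only Windows toolchains get the real keyword.
#if defined(_WIN32)
#define DEMO_STDCALL __stdcall
#else
#define DEMO_STDCALL
#endif

// Pointer-to-function type. The convention goes inside the parentheses,
// in front of the '*', and becomes part of the pointer's type.
typedef int (DEMO_STDCALL *BinaryOp)(int, int);

// Stand-in for an external stdcall function.
int DEMO_STDCALL Multiply(int a, int b) { return a * b; }

// Calls through the pointer using the convention recorded in BinaryOp.
int Apply(BinaryOp op, int a, int b) { return op(a, b); }
```

For example, `Apply(&Multiply, 6, 7)` calls Multiply through the typed pointer. Where the keyword has an effect, assigning a function with a different convention to BinaryOp is rejected by the compiler unless a cast forces it.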

And below is an example showing how the compiler does it:

/* 1. calling function in C++ */
i = Function(x, y, z);

/* 2. function body in C++ */
int Function(int a, int b, int c) { return a + b + c; }

CDECL:

/* 1. calling CDECL 'Function' in pseudo-assembler (similar to what the compiler outputs) */
push on the stack a copy of 'z', then a copy of 'y', then a copy of 'x'
call (jump to function body; after the function is finished it will jump back here, the return address is pushed onto the stack by the call)
move contents of register A to 'i' variable
pop all from the stack that we have pushed (copy of x, y and z)

/* 2. CDECL 'Function' body in pseudo-assembler */
/* Now copies of 'a', 'b' and 'c' variables are pushed onto the stack */
copy 'a' (from stack) to register A
copy 'b' (from stack) to register B
add A and B, store result in A
copy 'c' (from stack) to register B
add A and B, store result in A
jump back to caller code (a, b and c still on the stack, the result is in register A)

STDCALL:

/* 1. calling STDCALL in pseudo-assembler (similar to what the compiler outputs) */
push on the stack a copy of 'z', then a copy of 'y', then a copy of 'x'
call
move contents of register A to 'i' variable

/* 2. STDCALL 'Function' body in pseudo-assembler */
pop 'a' from stack to register A
pop 'b' from stack to register B
add A and B, store result in A
pop 'c' from stack to register B
add A and B, store result in A
jump back to caller code (a, b and c are no longer on the stack, result in register A)

I want to improve on @adf88's answer. I feel that the pseudocode for STDCALL does not reflect what happens in reality. 'a', 'b', and 'c' aren't popped from the stack in the function body. Instead, they are popped by the ret instruction (ret 12 would be used in this case), which in one swoop jumps back to the caller and pops 'a', 'b', and 'c' from the stack.

Here is my version corrected according to my understanding:

STDCALL:

/* 1. calling STDCALL in pseudo-assembler (similar to what the compiler outputs) */
push on the stack a copy of 'z', then copy of 'y', then copy of 'x'
call
move contents of register A to 'i' variable

/* 2. STDCALL 'Function' body in pseudo-assembler */
copy 'a' (from stack) to register A
copy 'b' (from stack) to register B
add A and B, store result in A
copy 'c' (from stack) to register B
add A and B, store result in A
jump back to caller code and at the same time pop 'a', 'b' and 'c' off the stack (a, b and c are removed from the stack in this step, result in register A)


a) When a cdecl function is called by the caller, how does a caller know if it should free up the stack?

The cdecl modifier is part of the function prototype (or function pointer type etc.), so the caller gets the info from there and acts accordingly.

b) If a function which is declared as stdcall calls a function(which has a calling convention as cdecl), or the other way round, would this be inappropriate?

No, it's fine.

c) In general, can we say which call will be faster - cdecl or stdcall?

In general, I would refrain from any such statements. The distinction matters e.g. when you want to use variadic (va_arg) functions. In theory, it could be that stdcall is faster and generates smaller code because it allows popping the arguments to be combined with popping the locals, but OTOH with cdecl you can do the same thing, too, if you're clever.
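To illustrate why variadic functions matter here, consider the sketch below. SumInts is a hypothetical function written for this example. Its compiled body cannot know how many arguments a particular call pushed, so it cannot hard-code the amount of stack to remove on return; only the call site knows that, which is why such functions use caller cleanup (cdecl) on 32-bit x86.

```cpp
#include <cstdarg>

// Sums `count` int arguments passed after `count`.
// The number of arguments differs from call to call, so the cleanup size is
// known only at each call site, not inside this function.
int SumInts(int count, ...) {
    va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += va_arg(args, int);  // read the next int argument
    }
    va_end(args);
    return total;
}
```

`SumInts(3, 1, 2, 3)` and `SumInts(5, 1, 1, 1, 1, 1)` go through the same function body even though they place different amounts of data on the stack.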

The calling conventions that aim to be faster usually do some register-passing.


Those things are compiler- and platform-specific. Neither the C nor the C++ standard says anything about calling conventions, except for extern "C" in C++.

how does a caller know if it should free up the stack ?

The caller knows the calling convention of the function and handles the call accordingly.

At the call site, does the caller know if the function being called is a cdecl or a stdcall function ?

Yes.

How does it work ?

It is part of the function declaration.

How does the caller know if it should free up the stack or not ?

The caller knows the calling conventions and can act accordingly.

Or is it the linkers responsibility ?

No, the calling convention is part of a function's declaration so the compiler knows everything it needs to know.

If a function which is declared as stdcall calls a function(which has a calling convention as cdecl), or the other way round, would this be inappropriate ?

No. Why should it?

In general, can we say which call will be faster - cdecl or stdcall ?

I don't know. Test it.


I noticed a posting that says it does not matter if you call a __stdcall function from a __cdecl one or vice versa. It does.

The reason: with __cdecl the arguments passed to the called function are removed from the stack by the calling function; with __stdcall, they are removed by the called function. If you call a __cdecl function as if it were __stdcall, the stack is not cleaned up at all, so when the caller later uses a stack-based reference for arguments or a return address, it will use stale data at the current stack pointer. If you call a __stdcall function as if it were __cdecl, the __stdcall function cleans up the arguments on the stack and then the caller does it again, possibly removing the calling function's return information.
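The bookkeeping behind this can be modelled with a toy calculation. This is not real calling-convention code, just arithmetic on a pretend stack pointer; Convention and StackImbalance are hypothetical names, and every argument is assumed to occupy 4 bytes, as an int does on 32-bit x86.

```cpp
enum class Convention { Cdecl, Stdcall };

// Net change of the pretend stack pointer, in bytes, after one call that
// passes `arg_count` 4-byte arguments. 0 means the stack is balanced again.
//   callee:         convention the function body was compiled with
//   caller_assumes: convention the call site was compiled for
int StackImbalance(Convention callee, Convention caller_assumes, int arg_count) {
    const int arg_bytes = arg_count * 4;
    int sp_delta = 0;

    sp_delta -= arg_bytes;  // caller pushes the arguments (stack grows downward)

    if (callee == Convention::Stdcall) {
        sp_delta += arg_bytes;  // callee removes them when it returns
    }
    if (caller_assumes == Convention::Cdecl) {
        sp_delta += arg_bytes;  // caller removes them after the call
    }
    return sp_delta;
}
```

With three arguments, the two matching combinations give 0. A cdecl body called from a call site compiled for stdcall gives -12 (the arguments are never removed), and a stdcall body called from a call site compiled for cdecl gives +12 (the arguments are removed twice).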

The Microsoft convention for C tries to circumvent this by mangling the names. A __cdecl function is prefixed with an underscore. A __stdcall function is prefixed with an underscore and suffixed with an at sign “@” and the number of bytes to be removed. E.g. __cdecl f(int x) is linked as _f, and __stdcall f(int x) is linked as _f@4 (where sizeof(int) is 4 bytes).
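The naming scheme just described can be sketched as a pair of string helpers. StackSlotBytes, DecorateCdecl and DecorateStdcall are hypothetical names for this illustration, not part of any toolchain. The sketch assumes 32-bit x86, where each argument occupies a multiple of 4 bytes on the stack, and covers only C functions (C++ name mangling is far more involved).

```cpp
#include <cstddef>
#include <string>

// Bytes an argument of the given size occupies on the 32-bit x86 stack:
// its size rounded up to the next multiple of 4.
std::size_t StackSlotBytes(std::size_t type_size) {
    return (type_size + 3) / 4 * 4;
}

// __cdecl: a leading underscore only.
std::string DecorateCdecl(const std::string& name) {
    return "_" + name;
}

// __stdcall: leading underscore, then '@' and the total argument bytes.
std::string DecorateStdcall(const std::string& name, std::size_t arg_bytes) {
    return "_" + name + "@" + std::to_string(arg_bytes);
}
```

For a function taking three 4-byte ints, `DecorateStdcall("Function", 3 * StackSlotBytes(4))` yields `_Function@12`. Because the two conventions produce different symbol names, a declaration that disagrees with the definition normally fails at link time instead of corrupting the stack at run time.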

If you manage to get past the linker, enjoy the debugging mess.


It's specified in the function type. When you have a function pointer, it's assumed to be cdecl if not explicitly stdcall. This means that if you get a stdcall pointer and a cdecl pointer, you can't exchange them. The two function types can call each other without issues; the problem only arises when you get one type where the other is expected. As for speed, they both perform the same work, just in a very slightly different place, so it's really irrelevant.


The caller and the callee need to use the same convention at the point of invocation; that's the only way it can work reliably. Both the caller and the callee follow a predefined protocol, for example, who needs to clean up the stack. If the conventions mismatch, your program runs into undefined behavior and will likely just crash spectacularly.

This is only required per invocation site; the calling code itself can be a function with any calling convention.

You shouldn't notice any real difference in performance between those conventions. If that becomes a problem, you usually need to make fewer calls, for example by changing the algorithm.