The history of calling conventions, part 1

The great thing about calling conventions on the x86 platform is that there are so many to choose from!

In the 16-bit world, part of the calling convention was fixed by the instruction set: The BP register defaults to the SS selector, whereas the other registers default to the DS selector. So the BP register was necessarily the register used for accessing stack-based parameters.

The registers for return values were also chosen automatically by the instruction set. The AX register acted as the accumulator and therefore was the obvious choice for passing the return value. The 8086 instruction set also has special instructions which treat the DX:AX pair as a single 32-bit value, so that was the obvious choice to be the register pair used to return 32-bit values.
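
To make the DX:AX convention concrete, here is a small C sketch of how a 32-bit return value splits across the pair. DX carries the high word and AX the low word, the same split the 8086 uses for the result of a 16-bit MUL and the dividend of a 16-bit DIV. The names `DxAx`, `return_in_dx_ax` and `value_from_dx_ax` are made up for illustration; real code does this in registers, not in a struct.

```c
#include <assert.h>
#include <stdint.h>

/* The register pair as the caller sees it after the call returns. */
typedef struct {
    uint16_t dx;  /* high word */
    uint16_t ax;  /* low word  */
} DxAx;

/* What a callee returning a 32-bit value would load into DX:AX. */
static DxAx return_in_dx_ax(uint32_t value) {
    DxAx r;
    r.dx = (uint16_t)(value >> 16);
    r.ax = (uint16_t)(value & 0xFFFFu);
    return r;
}

/* How the caller reassembles the 32-bit result from the pair. */
static uint32_t value_from_dx_ax(DxAx r) {
    return ((uint32_t)r.dx << 16) | r.ax;
}
```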

That left SI, DI, BX and CX.

(Terminology note: Registers that do not need to be preserved across a function call are often called "scratch".)

When deciding which registers should be preserved by a calling convention, you need to balance the needs of the caller against the needs of the callee. The caller would prefer that all registers be preserved, since that removes the need for the caller to worry about saving/restoring the value across a call. The callee would prefer that no registers be preserved, since that removes the need to save the value on entry and restore it on exit.

If you require too few registers to be preserved, then callers become filled with register save/restore code. But if you require too many registers to be preserved, then callees become obligated to save and restore registers that the caller might not have really cared about. This is particularly important for leaf functions (functions that do not call any other functions).

The non-uniformity of the x86 instruction set was also a contributing factor. The CX register could not be used to access memory, so you wanted some register other than CX to be scratch, so that a leaf function could at least access memory without having to preserve any registers. So BX was chosen to be scratch, leaving SI and DI as preserved.
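
The reasoning above boils down to a small table of properties. This C sketch encodes it, assuming only the facts stated here: on the 8086 only BX, BP, SI and DI can appear in a memory operand, and the 16-bit conventions preserve BP, SI and DI. The helper names (`can_address_memory`, `is_preserved_16bit`, `leaf_has_free_pointer_register`) are hypothetical. The last function checks the property the designers wanted: at least one register that is both scratch and usable as a pointer, which turns out to be BX.

```c
#include <assert.h>
#include <stdbool.h>

/* The registers left over once AX and DX were claimed for return values,
   plus BP for comparison. */
typedef enum { REG_BX, REG_CX, REG_SI, REG_DI, REG_BP } Reg16;

/* On the 8086, only BX, BP, SI and DI may appear inside a memory operand
   such as [BX+SI+disp]; CX cannot. */
static bool can_address_memory(Reg16 r) {
    return r == REG_BX || r == REG_BP || r == REG_SI || r == REG_DI;
}

/* Preservation rule of the 16-bit conventions: BP, SI, DI preserved;
   BX and CX scratch. */
static bool is_preserved_16bit(Reg16 r) {
    return r == REG_BP || r == REG_SI || r == REG_DI;
}

/* A leaf function can touch memory without saving anything only if some
   register is both scratch and usable for addressing. */
static bool leaf_has_free_pointer_register(void) {
    const Reg16 all[] = { REG_BX, REG_CX, REG_SI, REG_DI, REG_BP };
    for (unsigned i = 0; i < sizeof all / sizeof all[0]; i++)
        if (can_address_memory(all[i]) && !is_preserved_16bit(all[i]))
            return true;
    return false;
}
```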

So here's the rundown of 16-bit calling conventions:

All
All calling conventions in the 16-bit world preserve registers BP, SI, DI (others scratch) and put the return value in DX:AX or AX, as appropriate for size.

C (__cdecl)
Functions with a variable number of parameters constrain the C calling convention considerably. It pretty much requires that the stack be caller-cleaned and that the parameters be pushed right to left, so that the first parameter is at a fixed position relative to the top of the stack. The classic (pre-prototype) C language allowed you to call functions without telling the compiler what parameters the function expected, and it was common practice to pass the wrong number of parameters to a function if you "knew" that the called function wouldn't mind. (See "open" for a classic example of this. The third parameter is optional if the second parameter does not specify that a file should be created.)

In summary: Caller cleans the stack, parameters pushed right to left.
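
Here is a toy C model of why right-to-left pushing plus caller cleanup suits variadic functions. It is a simulation, not real 8086 code: `ToyStack` and its helper functions are hypothetical, each slot stands for one 16-bit word, and the return address is omitted. Whether the caller pushes two arguments or three (as with "open"), the first parameter always lands on top, and only the caller knows how many slots to discard afterward.

```c
#include <assert.h>
#include <stddef.h>

/* Toy downward-growing stack of 16-bit slots (return address omitted). */
#define STACK_SLOTS 64

typedef struct {
    unsigned short slot[STACK_SLOTS];
    size_t sp;                 /* index of the top-of-stack slot */
} ToyStack;

static void toy_init(ToyStack *s) { s->sp = STACK_SLOTS; }

static void toy_push(ToyStack *s, unsigned short v) {
    assert(s->sp > 0);         /* toy stack overflow */
    s->slot[--s->sp] = v;
}

/* Caller side: push right to left, so args[0] is pushed last and ends up on top. */
static size_t cdecl_push_args(ToyStack *s, const unsigned short *args, size_t n) {
    for (size_t i = n; i > 0; i--)
        toy_push(s, args[i - 1]);
    return n;                  /* the caller remembers how many it pushed */
}

/* Callee side: parameter k sits at sp + k no matter how many were pushed. */
static unsigned short callee_param(const ToyStack *s, size_t k) {
    return s->slot[s->sp + k];
}

/* Caller-clean: only the caller knows n, so the caller discards the slots. */
static void cdecl_caller_cleanup(ToyStack *s, size_t n) { s->sp += n; }
```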

Function name decoration consists of a leading underscore. My guess is that the leading underscore prevented a function name from accidentally colliding with an assembler reserved word. (Imagine, for example, if you had a function called "call".)

Pascal (__pascal)
Pascal does not support functions with a variable number of parameters, so it can use the callee-clean convention. Parameters are pushed from left to right, because, well, it seemed the natural thing to do. Function name decoration consists of conversion to uppercase. This is necessary because Pascal is not a case-sensitive language.
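
A minimal sketch of the two 16-bit decoration rules described so far: a leading underscore for __cdecl and conversion to uppercase for __pascal. The enum and `decorate16` are hypothetical names, and the real toolchains may have applied further rules not covered here, so treat this as an illustration of the idea only.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

typedef enum { CONV16_CDECL, CONV16_PASCAL } Conv16;

/* Writes the decorated name into out (size outsz).
   Returns 0 on success, -1 if the buffer is too small. */
static int decorate16(Conv16 conv, const char *name, char *out, size_t outsz) {
    if (conv == CONV16_CDECL) {
        /* __cdecl: prepend a single underscore. */
        int len = snprintf(out, outsz, "_%s", name);
        return (len < 0 || (size_t)len >= outsz) ? -1 : 0;
    }
    /* __pascal: the same name, converted to uppercase. */
    size_t n = strlen(name);
    if (n + 1 > outsz)
        return -1;
    for (size_t i = 0; i <= n; i++)   /* <= copies the terminator too */
        out[i] = (char)toupper((unsigned char)name[i]);
    return 0;
}
```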

Nearly all Win16 functions are exported with the Pascal calling convention. The callee-clean convention saves three bytes at each call point, with a fixed overhead of two bytes per function. So if a function is called ten times, you save 3*10 = 30 bytes for the call points, and pay 2 bytes in the function itself, for a net savings of 28 bytes. It was also fractionally faster. On Win16, saving a few hundred bytes and a few cycles was a big deal.
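
The arithmetic generalizes to any number of call sites. In this sketch the constants 3 and 2 come straight from the text; my guess is that they correspond to the three-byte ADD SP,n a caller-clean call site needs after each call, and the two extra bytes a RET n costs over a plain RET. The function name `pascal_net_savings` is hypothetical. The formula shows callee-clean already pays off with a single call site.

```c
#include <assert.h>

/* Byte counts from the text: callee-clean saves 3 bytes at every call site
   and costs 2 extra bytes once, in the function itself. */
enum { SAVED_PER_CALL_SITE = 3, COST_PER_FUNCTION = 2 };

/* Net bytes saved by making one function callee-clean; negative means a loss. */
static long pascal_net_savings(long call_sites) {
    return SAVED_PER_CALL_SITE * call_sites - COST_PER_FUNCTION;
}
```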

Fortran (__fortran)
The Fortran calling convention is the same as the Pascal calling convention. It got a separate name probably because Fortran has strange pass-by-reference behavior.

Fastcall (__fastcall)
The Fastcall calling convention passes the first parameter in the DX register and the second in the CX register (I think). Whether this was actually faster depended on your call usage. It was generally faster since parameters passed in registers do not need to be spilled to the stack, then reloaded by the callee. On the other hand, if significant computation occurs between the computation of the first and second parameters, the caller has to spill the first parameter anyway. To add insult to injury, the called function often spilled the register into memory because it needed the register for something else, which in the "significant computation between the first two parameters" case means that you get a double-spill. Ouch!

Consequently, __fastcall was typically faster only for short leaf functions, and even then it might not be.

Okay, those are the 16-bit calling conventions I remember. Part 2 will discuss 32-bit calling conventions, if I ever get around to writing it.

The history of calling conventions, part 2

Foreshadowing: This information will actually be useful in a future discussion. Well, not the fine details, but you may notice something that explains... um... it's hard to describe. Just wait for it.

Curiously, it is only the 8086 and x86 platforms that have multiple calling conventions. All the others have only one!

Now we're going deep into trivia that absolutely nobody remembers or even cares about: The 32-bit calling conventions you don't see any more.

All

All of the processors listed here are RISC-style, which means there are lots of registers, none of which have any particular meaning. Well, aside from the zero register which is hard-wired to zero. (It turns out zero is a very handy number to have readily available.) Any meanings attached to the registers are those imposed by the calling convention.

As a throwback to the processors of old, the "call" instruction stores the return address in a register instead of pushing it onto the stack. A good thing, too, since the processor doesn't officially know about a "stack", it being a construction of the calling convention.

As always, registers or stack space used to pass parameters may be used as scratch by the called function, as can the return value register.

You may notice that all of the RISC calling conventions are basically the same. Once again, evidence that the 8086/x86 is the weirdo. A wildly popular weirdo, mind you.

The Alpha AXP

The Alpha AXP ("AXP" being yet another of those faux-acronyms that officially doesn't stand for anything) has 32 integer registers, one of which is hard-wired to zero. By convention, one of the registers is the "stack pointer", one is the "return address" register, and two others have special meanings unrelated to parameter passing.

The first six parameters are passed in registers, with the remaining parameters on the stack. If the function is variadic, the parameters can be spilled onto the stack so they can be accessed as an array.

Seven other registers are preserved across calls, one is the return value, and the remaining thirteen are scratch. 1 zero register + 1 stack pointer + 1 return address + 2 special + 6 parameters + 7 preserved + 1 return value + 13 scratch = 32 total integer registers.
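
The register count above can be checked mechanically. This sketch just restates the text's categories as C constants (the names are invented) and has the compiler confirm that they add up to 32; it says nothing about which physical register plays which role.

```c
#include <assert.h>

/* Alpha AXP integer register budget, category by category, as listed above. */
enum {
    ALPHA_ZERO = 1, ALPHA_SP = 1, ALPHA_RA = 1, ALPHA_SPECIAL = 2,
    ALPHA_PARAMS = 6, ALPHA_PRESERVED = 7, ALPHA_RETVAL = 1, ALPHA_SCRATCH = 13,
    ALPHA_TOTAL = ALPHA_ZERO + ALPHA_SP + ALPHA_RA + ALPHA_SPECIAL +
                  ALPHA_PARAMS + ALPHA_PRESERVED + ALPHA_RETVAL + ALPHA_SCRATCH
};

/* Compile-time check that the categories account for every register. */
_Static_assert(ALPHA_TOTAL == 32, "Alpha integer register budget must total 32");
```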

Function names on the Alpha AXP are completely undecorated.

The MIPS R4000

The first four parameters are passed in a0, a1, a2 and a3; the remainder are spilled onto the stack. What's more, there are four "dead spaces" on the stack where the four register parameters "would have been" if they had been passed on the stack. These are for use by the callee to spill the register parameters back onto the stack if desired. (Particularly handy for variadic functions.)
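
A sketch of where each parameter lives at function entry under the scheme described above, assuming every parameter is one 32-bit word and that the parameter area begins at the entry stack pointer with the four dead-space words first. Those offsets are my assumption rather than something stated in the text, and `MipsParamLoc` and `mips_param_location` are hypothetical names. The point of the dead space shows up in the arithmetic: every parameter's stack home is at 4*k, so a variadic callee that spills a0 through a3 into the dead space gets one contiguous array.

```c
#include <assert.h>

/* Location of parameter k (0-based) at function entry, assuming one
   32-bit word per parameter. */
typedef struct {
    int in_register;   /* 1 if passed in a0..a3 */
    int reg_index;     /* 0..3 for a0..a3, otherwise -1 */
    int sp_offset;     /* stack home, measured from sp at entry */
} MipsParamLoc;

static MipsParamLoc mips_param_location(int k) {
    assert(k >= 0);
    MipsParamLoc loc;
    loc.in_register = k < 4;
    loc.reg_index = k < 4 ? k : -1;
    /* Register parameters still get a home (the dead space), so every
       parameter's stack slot is simply 4*k. */
    loc.sp_offset = 4 * k;
    return loc;
}
```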

Function names on the MIPS are completely undecorated.

The PowerPC

The first eight parameters are passed in registers (r3 through r10), and the return address is managed manually.

I forget what happens to parameters nine and up...
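
Putting the two stated PowerPC facts into code: parameter k (counting from zero) goes in register r(3 + k) for the first eight, and decorated names get two leading periods. Both functions are hypothetical sketches. `ppc_param_register` deliberately returns -1 for parameters nine and up, since the text does not say where they go, and given the postclaimer below, both rules deserve double-checking.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Register number carrying parameter k, or -1 for parameters nine and up
   (left unspecified here). */
static int ppc_param_register(int k) {
    assert(k >= 0);
    return k < 8 ? 3 + k : -1;
}

/* Decoration as described: prepend two periods.
   Returns 0 on success, -1 if out is too small. */
static int ppc_decorate(const char *name, char *out, size_t outsz) {
    int len = snprintf(out, outsz, "..%s", name);
    return (len < 0 || (size_t)len >= outsz) ? -1 : 0;
}
```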

Function names on the PowerPC are decorated by prepending two periods.

Postclaimer: I haven't had personal experience with the MIPS or PPC processors, so my discussion of those processors may be a tad off, but the basic idea I think is sound.

The history of calling conventions, part 3

Okay, here we go: The 32-bit x86 calling conventions.

(By the way, in case people didn't get it: I'm only talking about calling conventions you're likely to encounter when doing Windows programming or which are used by Microsoft compilers. I do not intend to cover calling conventions for other operating systems, or conventions specific to a particular language or compiler vendor.)

Remember: If a calling convention is used for a C++ member function, then there is a hidden "this" parameter that is the implicit first parameter to the function.

All

The 32-bit x86 calling conventions all preserve the EDI, ESI, EBP, and EBX registers, using the EDX:EAX pair for return values.

C (__cdecl)

The same constraints apply to the 32-bit world as in the 16-bit world. The parameters are pushed from right to left (so that the first parameter is nearest to top-of-stack), and the caller cleans the parameters. Function names are decorated by a leading underscore.

__stdcall

This is the calling convention used for Win32, with exceptions for variadic functions (which necessarily use __cdecl) and a very few functions that use __fastcall. Parameters are pushed from right to left and the callee cleans the stack. Function names are decorated by a leading underscore and a trailing @-sign followed by the number of bytes of parameters taken by the function.

__fastcall

The first two parameters are passed in ECX and EDX, with the remainder passed on the stack as in __stdcall. Again, the callee cleans the stack. Function names are decorated by a leading @-sign and a trailing @-sign followed by the number of bytes of parameters taken by the function (including the register parameters).
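
A sketch of the three 32-bit decoration rules described so far. The byte count is the total size of the parameters as they sit on the stack; my understanding is that each parameter occupies a whole number of 4-byte slots, so a char counts as 4 and a double as 8, which is what the hypothetical `param_bytes` helper assumes. `decorate32` is likewise an illustrative name, not a real API. For example, MessageBoxA takes four 4-byte parameters, giving _MessageBoxA@16.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { CONV_CDECL, CONV_STDCALL, CONV_FASTCALL } Conv32;

/* Assumed rule: each parameter takes a whole number of 4-byte slots. */
static unsigned param_bytes(const unsigned *sizes, size_t n) {
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += (sizes[i] + 3u) & ~3u;   /* round up to a multiple of 4 */
    return total;
}

/* Returns 0 on success, -1 if out is too small. */
static int decorate32(Conv32 conv, const char *name, unsigned bytes,
                      char *out, size_t outsz) {
    int len;
    switch (conv) {
    case CONV_CDECL:    len = snprintf(out, outsz, "_%s", name); break;
    case CONV_STDCALL:  len = snprintf(out, outsz, "_%s@%u", name, bytes); break;
    case CONV_FASTCALL: len = snprintf(out, outsz, "@%s@%u", name, bytes); break;
    default:            return -1;
    }
    return (len < 0 || (size_t)len >= outsz) ? -1 : 0;
}
```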

thiscall

The first parameter (which is the "this" parameter) is passed in ECX, with the remainder passed on the stack as in __stdcall. Once again, the callee cleans the stack. Function names are decorated by the C++ compiler using an extraordinarily complicated mechanism that encodes the types of each of the parameters, among other things. This is necessary because C++ permits function overloading, and the complex decoration scheme ensures that the various overloads get different decorated names.
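
For the two register-based conventions, here is a sketch of where each parameter ends up, assuming every parameter fits in a 32-bit register (larger parameters are a complication this sketch ignores). `fastcall_param`, `thiscall_param` and `callee_pops_bytes` are hypothetical helpers. Note the difference from the decoration rule: the __fastcall decorated name counts the register parameters, but the callee pops only the bytes that were actually pushed onto the stack.

```c
#include <assert.h>

typedef enum { LOC_ECX, LOC_EDX, LOC_STACK } Loc;

typedef struct {
    Loc where;
    int stack_slot;   /* 0 = stack parameter nearest the top; -1 if in a register */
} ParamLoc;

static ParamLoc fastcall_param(int k) {
    assert(k >= 0);
    if (k == 0) return (ParamLoc){ LOC_ECX, -1 };
    if (k == 1) return (ParamLoc){ LOC_EDX, -1 };
    /* The rest are pushed right to left as in __stdcall, so parameter 2
       is the stack parameter nearest the top. */
    return (ParamLoc){ LOC_STACK, k - 2 };
}

static ParamLoc thiscall_param(int k) {
    assert(k >= 0);
    if (k == 0) return (ParamLoc){ LOC_ECX, -1 };   /* the hidden "this" */
    return (ParamLoc){ LOC_STACK, k - 1 };
}

/* Bytes the callee pops on return: only the 4-byte stack slots count. */
static int callee_pops_bytes(ParamLoc (*place)(int), int nparams) {
    int bytes = 0;
    for (int k = 0; k < nparams; k++)
        if (place(k).where == LOC_STACK)
            bytes += 4;
    return bytes;
}
```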

There are some nice diagrams on MSDN illustrating some of these calling conventions.

Remember that a calling convention is a contract between the caller and the callee. For those of you crazy enough to write in assembly language, this means that your callback functions need to preserve the registers mandated by the calling convention because the caller (the operating system) is relying on it. If you corrupt, say, the EBX register across a call, don't be surprised when things fall apart on you. More on this in a future entry.
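
A toy way to state the contract: given the registers a callback writes and the ones it saves and restores around that use, report any preserved register it would corrupt. The bit flags and `contract_violations` are made up for this sketch, and ESP is left out because keeping the stack balanced is a separate obligation.

```c
#include <assert.h>

/* Bit flags for the 32-bit x86 integer registers. */
enum {
    R_EAX = 1 << 0, R_ECX = 1 << 1, R_EDX = 1 << 2, R_EBX = 1 << 3,
    R_ESP = 1 << 4, R_EBP = 1 << 5, R_ESI = 1 << 6, R_EDI = 1 << 7
};

/* Registers every 32-bit convention above requires the callee to preserve. */
#define CALLEE_SAVED (R_EBX | R_ESI | R_EDI | R_EBP)

/* Preserved registers that are modified but not saved and restored. */
static unsigned contract_violations(unsigned modified, unsigned saved) {
    return modified & CALLEE_SAVED & ~saved;
}
```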