"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." -- Brian W. Kernighan
struct NODE {
int value;
struct NODE next; /* illegal */
};
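Since the compiler hasn't fully "seen" the NODE struct (it is an incomplete type until the closing brace), it can't be used as the type of a member, even inside itself. (A struct that contained a copy of itself would also be infinitely large.) However, this works: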
struct NODE {
int value;
struct NODE *next; /* OK */
};
The compiler accepts this because all pointers to structures have the same size. It doesn't yet fully know what's in a NODE struct, but it knows the size of a pointer to one.
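To see the self-referential pointer at work, here is a minimal sketch that links a few NODE structs into a list and walks it. SumList is a hypothetical helper, and the nodes are local variables only to keep the example short.

```c
#include <stddef.h> /* NULL */

struct NODE {
  int value;
  struct NODE *next; /* pointer to a not-yet-complete type is fine */
};

/* Hypothetical helper: walk the list and add up the values. */
int SumList(const struct NODE *head)
{
  int sum = 0;
  while (head != NULL)
  {
    sum += head->value;
    head = head->next;
  }
  return sum;
}
```

For example, nodes holding 1, 2, and 3 linked as n1, n2, n3 give SumList(&n1) == 6, and an empty list (NULL) gives 0.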
When two structures depend on each other, it might seem impossible to declare them, since physically in the source file one of the declarations must come after the other. With C compilers, this is generally not a problem, because the struct keyword is required and tells the compiler that B is a struct even before B is defined:
struct A {
int value;
struct B *b; /* B is a struct */
};
struct B {
int value;
struct A *a; /* A is a struct */
};
However, C++ code usually omits the struct keyword, and without it the compiler has no idea what B is:
struct A {
int value;
B *b; /* What is B? */
};
struct B {
int value;
A *a; /* A is a struct */
};
The solution is to use a forward (or incomplete) declaration:
struct B; /* incomplete or forward declaration */
struct A {
int value;
B *b; /* B is a struct */
};
struct B {
int value;
A *a; /* A is a struct */
};
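C has a similar trick: a typedef that names a struct which hasn't been defined yet acts as a forward declaration, and it also lets the code drop the struct keyword, much like the C++ version. A minimal sketch; the helper Link is hypothetical:

```c
/* Typedefs of incomplete struct types act as forward declarations in C. */
typedef struct A A;
typedef struct B B;

struct A {
  int value;
  B *b; /* B is known here, although still incomplete */
};

struct B {
  int value;
  A *a;
};

/* Hypothetical helper: make a and b point at each other. */
void Link(A *a, B *b)
{
  a->b = b;
  b->a = a;
}
```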
Initializing:
#define MAX_PATH 12
struct DATE { int month; int day; int year; };
struct TIME { int hours; int minutes; int seconds; };
struct DATETIME { struct DATE date; struct TIME time; };
struct FILEINFO { int length; char name[MAX_PATH]; struct DATETIME dt; };
struct TIME t = {9, 30, 0};
struct DATE d = {12, 31, 2002};
struct DATETIME dt1 = { {12, 31, 2002}, {9, 30, 0} };
struct FILEINFO fi1 = {1024, "foo.txt", { {12, 31, 2002}, {9, 30, 0} } };
struct DATE d2 = {12, 0, 0}; /* month = 12, day = 0, year = 0 */
struct DATE d3 = {12}; /* month = 12, day = 0, year = 0 */
struct FILEINFO fi2 = {1024, "foo.txt", {{0, 0, 0}, {0, 0, 0}}}; /* date/time all 0 */
struct FILEINFO fi3 = {1024, "foo.txt"}; /* date/time all 0 */
struct FILEINFO fi4 = {0}; /* all fields are 0 */
struct FILEINFO fi5;
memset(&fi5, 0, sizeof(fi5)); /* all fields are 0 (memset is declared in <string.h>) */
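A quick way to confirm the zero-filling rules is to build fi3, fi4, and fi5 as above and inspect them. This sketch assumes struct TIME holds three ints (hours, minutes, seconds), matching the initializers above; DateTimeIsZero and MakeExamples are hypothetical helpers.

```c
#include <string.h> /* memset, strcmp */

#define MAX_PATH 12

struct DATE { int month; int day; int year; };
struct TIME { int hours; int minutes; int seconds; }; /* assumed layout */
struct DATETIME { struct DATE date; struct TIME time; };
struct FILEINFO { int length; char name[MAX_PATH]; struct DATETIME dt; };

/* Hypothetical helper: returns 1 if every date/time field is 0. */
int DateTimeIsZero(const struct DATETIME *dt)
{
  return dt->date.month == 0 && dt->date.day == 0 && dt->date.year == 0 &&
         dt->time.hours == 0 && dt->time.minutes == 0 && dt->time.seconds == 0;
}

/* Hypothetical helper: fill three structs the same way fi3, fi4, fi5 are filled. */
void MakeExamples(struct FILEINFO *fi3, struct FILEINFO *fi4, struct FILEINFO *fi5)
{
  struct FILEINFO a = {1024, "foo.txt"}; /* members not listed become 0 */
  struct FILEINFO b = {0};               /* every member becomes 0 */
  *fi3 = a;
  *fi4 = b;
  memset(fi5, 0, sizeof(*fi5));          /* every byte becomes 0 */
}
```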
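We can visualize the struct in memory like this:
struct FILEINFO fi;
[Figure: memory layout of fi]
Now highlighting the two fields of the DATETIME struct:
[Figure: fi with the two DATETIME fields highlighted]
Now highlighting the fields of the DATE and TIME structs:
[Figure: fi with the fields of the DATE and TIME structs highlighted]
Using the diagrams as an aid, what does the following code print out?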
struct FILEINFO fi = {123, "foo.txt", { {12, 31, 2000}, {9, 33, 40} } };
int i = *((int *)&fi + 7);
char c1 = *((char *)&fi + 9);
char c2 = *((char *)&fi + 25);
char c3 = *((char *)&fi + 32);
int x = *( (int *) ((struct DATE *)&fi + 2) );
printf("%i, %c, %i, %c, %i\n", i, c1, c2, c3, x);
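To check the diagrams (and your answer), you can ask the compiler for each member's offset with the standard offsetof macro from <stddef.h>. This sketch assumes struct TIME holds three ints and a platform with a 4-byte int and no padding between these members, which is typical of x86 and x64 compilers; PrintFileInfoOffsets is a hypothetical helper.

```c
#include <stddef.h> /* offsetof */
#include <stdio.h>

#define MAX_PATH 12

struct DATE { int month; int day; int year; };
struct TIME { int hours; int minutes; int seconds; }; /* assumed layout */
struct DATETIME { struct DATE date; struct TIME time; };
struct FILEINFO { int length; char name[MAX_PATH]; struct DATETIME dt; };

/* Hypothetical helper: print the byte offset of some members of FILEINFO. */
void PrintFileInfoOffsets(void)
{
  printf("length       %2u\n", (unsigned)offsetof(struct FILEINFO, length));
  printf("name         %2u\n", (unsigned)offsetof(struct FILEINFO, name));
  printf("dt.date      %2u\n", (unsigned)offsetof(struct FILEINFO, dt.date));
  printf("dt.date.year %2u\n", (unsigned)offsetof(struct FILEINFO, dt.date.year));
  printf("dt.time      %2u\n", (unsigned)offsetof(struct FILEINFO, dt.time));
  printf("sizeof       %2u\n", (unsigned)sizeof(struct FILEINFO));
}
```

On such a platform, name starts at byte 4, dt.date.year at byte 24, dt.time at byte 28, and the whole structure is 40 bytes, which is what the pointer arithmetic in the quiz relies on.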
enum Kind {OPERATOR, INTEGER, FLOAT, IDENTIFIER};
struct Symbol
{
enum Kind kind;
char op;
int ival;
float fval;
char id;
};
A Symbol struct in memory would look something like this:
[Figure: memory layout of a Symbol struct]
If we wanted to store the information about this expression:
A + 23 * 3.14
we could do this:
int main(void)
{
struct Symbol sym1, sym2, sym3, sym4, sym5;
sym1.kind = IDENTIFIER;
sym1.id = 'A';
sym2.kind = OPERATOR;
sym2.op = '+';
sym3.kind = INTEGER;
sym3.ival = 23;
sym4.kind = OPERATOR;
sym4.op = '*';
sym5.kind = FLOAT;
sym5.fval = 3.14F;
}
Memory usage would look something like this:
When dealing with mutually exclusive data members, a better solution is to create a union and use that instead. The same rules for naming structs apply to unions as well: a union can be given a tag, it can be typedef'd, or, as here, it can be defined directly inside the struct that uses it:
struct NewSymbol
{
enum Kind kind;
union
{
char op;
int ival;
float fval;
char id;
} data;
};
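Since only one member of the union is meaningful at a time, code that reads a NewSymbol typically checks kind first and then touches only the matching member (often called a tagged union). A minimal sketch; FormatSymbol and its output format are hypothetical:

```c
#include <stdio.h> /* snprintf, size_t */

enum Kind {OPERATOR, INTEGER, FLOAT, IDENTIFIER};

struct NewSymbol
{
  enum Kind kind;
  union
  {
    char op;
    int ival;
    float fval;
    char id;
  } data;
};

/* Hypothetical helper: read only the union member that kind says is valid. */
void FormatSymbol(const struct NewSymbol *sym, char *buf, size_t size)
{
  switch (sym->kind)
  {
    case OPERATOR:   snprintf(buf, size, "op %c", sym->data.op); break;
    case INTEGER:    snprintf(buf, size, "int %i", sym->data.ival); break;
    case FLOAT:      snprintf(buf, size, "float %.2f", sym->data.fval); break;
    case IDENTIFIER: snprintf(buf, size, "id %c", sym->data.id); break;
    default:         snprintf(buf, size, "?"); break;
  }
}
```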
Our NewSymbol struct would look like this in memory:
[Figure: memory layout of a NewSymbol struct]
Our code needs to be modified slightly:
int main(void)
{
struct NewSymbol sym1, sym2, sym3, sym4, sym5;
sym1.kind = IDENTIFIER;
sym1.data.id = 'A';
sym2.kind = OPERATOR;
sym2.data.op = '+';
sym3.kind = INTEGER;
sym3.data.ival = 23;
sym4.kind = OPERATOR;
sym4.data.op = '*';
sym5.kind = FLOAT;
sym5.data.fval = 3.14F;
}
And the memory usage would look something like this:
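The saving can also be measured with sizeof. This sketch assumes a 4-byte enum, int, and float, each aligned on a 4-byte boundary (typical of 32-bit and 64-bit x86 compilers); under those assumptions struct Symbol is 20 bytes and struct NewSymbol is 8, so the five symbols above drop from 100 bytes to 40. PrintSymbolSizes is a hypothetical helper.

```c
#include <stdio.h>

enum Kind {OPERATOR, INTEGER, FLOAT, IDENTIFIER};

struct Symbol /* every member gets its own storage */
{
  enum Kind kind;
  char op;
  int ival;
  float fval;
  char id;
};

struct NewSymbol /* the union members share one piece of storage */
{
  enum Kind kind;
  union
  {
    char op;
    int ival;
    float fval;
    char id;
  } data;
};

/* Hypothetical helper: print the size of both layouts. */
void PrintSymbolSizes(void)
{
  printf("sizeof(struct Symbol)    = %u\n", (unsigned)sizeof(struct Symbol));
  printf("sizeof(struct NewSymbol) = %u\n", (unsigned)sizeof(struct NewSymbol));
}
```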
Using a union to get at individual bytes of data:
void TestUnion(void)
{
union
{
int i;
unsigned char bytes[4];
}val;
val.i = 257;
printf("%3i %3i %3i %3i\n",
val.bytes[0], val.bytes[1], val.bytes[2], val.bytes[3]);
val.i = 32767;
printf("%3i %3i %3i %3i\n",
val.bytes[0], val.bytes[1], val.bytes[2], val.bytes[3]);
val.i = 32768;
printf("%3i %3i %3i %3i\n",
val.bytes[0], val.bytes[1], val.bytes[2], val.bytes[3]);
}
This prints out:
  1   1   0   0
255 127   0   0
  0 128   0   0
The values in binary:
  257: 00000000 00000000 00000001 00000001
32767: 00000000 00000000 01111111 11111111
32768: 00000000 00000000 10000000 00000000
As little-endian:
  257: 00000001 00000001 00000000 00000000
32767: 11111111 01111111 00000000 00000000
32768: 00000000 10000000 00000000 00000000
Changing the union to this:
union
{
int i;
signed char bytes[4];
}val;
Gives this output (the bit patterns are the same):
  1   1   0   0
 -1 127   0   0
  0 -128   0   0
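The union reveals how this particular machine orders the bytes of an int; the output above comes from a little-endian machine such as x86. When you need the bytes in a fixed order regardless of the machine, shifting and masking is the usual alternative. A sketch with two hypothetical helpers, assuming a 32-bit value:

```c
/* Hypothetical helper: split v into 4 bytes, least significant first.
   It uses shifts, so the result does not depend on the machine's byte order. */
void ToBytesLE(unsigned long v, unsigned char out[4])
{
  int k;
  for (k = 0; k < 4; k++)
    out[k] = (unsigned char)((v >> (8 * k)) & 0xFF);
}

/* Hypothetical helper: returns 1 if this machine stores the low byte first. */
int IsLittleEndian(void)
{
  union
  {
    unsigned int i;
    unsigned char bytes[sizeof(unsigned int)];
  } val;
  val.i = 1;
  return val.bytes[0] == 1;
}
```

On a little-endian machine, ToBytesLE produces the same bytes as the union printout; on a big-endian machine the union would show them in the opposite order, but ToBytesLE would not change.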
struct NewSymbol sym1 = {OPERATOR, {'+'} }; /* fine, op is first member */
struct NewSymbol sym2 = {FLOAT, {3.14} }; /* this won't work as expected */
Given the code above, what is printed below?
printf("%c, %i, %f, %c\n", sym1.data.op, sym1.data.ival,
sym1.data.fval, sym1.data.id);
printf("%c, %i, %f, %c\n", sym2.data.op, sym2.data.ival,
sym2.data.fval, sym2.data.id);
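C99 added designated initializers, which let you name the union member to initialize instead of always getting the first one. A minimal sketch, assuming a C99 or later compiler; MakeFloatSymbol is a hypothetical helper:

```c
enum Kind {OPERATOR, INTEGER, FLOAT, IDENTIFIER};

struct NewSymbol
{
  enum Kind kind;
  union
  {
    char op;
    int ival;
    float fval;
    char id;
  } data;
};

/* Hypothetical helper: .data.fval names the member, so f is stored as a
   float instead of being converted to char and stored in op. */
struct NewSymbol MakeFloatSymbol(float f)
{
  struct NewSymbol sym = { .kind = FLOAT, .data.fval = f };
  return sym;
}
```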
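Structures can contain other structures. Given these definitions:
struct A
{
int a;
};
struct B
{
int b;
struct A foo;
};
we can access the fields like this:
struct B st;
st.b = 5;
st.foo.a = 10;
We can also define A within B like this: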
struct B
{
int b;
struct A
{
int a;
}sa;
};
and access the fields in the same way:
struct B st;
st.b = 5;
st.sa.a = 10;
There's an interesting "feature" to the C language that gives the inner (or nested) structure the same scope as the surrounding structure. So, given the above definition of B, we can do this:
struct B s1;
struct A s2; /* A is in scope (same scope as B) */
s1.b = 5;
s1.sa.a = 10;
s2.a = 20;
If you create another structure named A, you'll get a compiler error for having a duplicate definition.
Note that you can't do this:
struct B.A s2; /* syntax error */
or this:
struct B::A s2; /* syntax error, no scope resolution operator in C */
So, to use the nested structure in B:
struct A s2; /* Valid C, invalid C++ */
B::A s2;     /* Valid C++, invalid C */
Note: The scoping rules for nested structs are one of the few areas where C and C++ disagree.
One solution to prevent the pollution of the global namespace is to leave the tag off of the struct within B:
struct B
{
int b;
struct /* no tag */
{
int a;
}sa;
};
Finally, this example shows the scoping rules in more detail. Given structure B and function Foo:
we get errors with X, Y, and Z if used within another function:
int main(void)
{
struct B s1; // Ok
struct A s2; // A has same scope as B (global)
struct X s3; // Error, X is not in scope
struct Y s4; // Error, Y is not in scope
struct Z s5; // Error, Z is not in scope
return 0;
}
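The listing for B and Foo is not reproduced in these notes. As a stand-in, here is a minimal sketch consistent with the errors above, compiled as C: B and its nested A are declared at file scope, while X, Y, and Z are declared inside a hypothetical function Foo (the members x, y, z and the return value are made up for illustration), so they are only in scope inside Foo.

```c
struct B
{
  int b;
  struct A /* nested, but in C it has the same (file) scope as B */
  {
    int a;
  } sa;
};

/* Hypothetical Foo: X, Y, and Z are visible only inside this function. */
int Foo(void)
{
  struct X
  {
    int x;
    struct Y /* same scope as X: the body of Foo */
    {
      int y;
    } sy;
  } sx;
  struct Z { int z; } sz;
  struct Y y2; /* fine: Y is in scope here */

  sx.x = 1;
  sx.sy.y = 2;
  sz.z = 3;
  y2.y = 4;
  return sx.x + sx.sy.y + sz.z + y2.y;
}
```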
Recall the Symbol structure and diagram. What does this code print?
struct Symbol sym1;
printf("sizeof(sym1.kind) = %u\n", (unsigned)sizeof(sym1.kind));
printf("sizeof(sym1.op) = %u\n", (unsigned)sizeof(sym1.op));
printf("sizeof(sym1.ival) = %u\n", (unsigned)sizeof(sym1.ival));
printf("sizeof(sym1.fval) = %u\n", (unsigned)sizeof(sym1.fval));
printf("sizeof(sym1.id) = %u\n", (unsigned)sizeof(sym1.id));
printf("sizeof(sym1) = %u\n", (unsigned)sizeof(sym1));
Output:
sizeof(sym1.kind) = 4
sizeof(sym1.op) = 1
sizeof(sym1.ival) = 4
sizeof(sym1.fval) = 4
sizeof(sym1.id) = 1
sizeof(sym1) = 20
But:
4 + 1 + 4 + 4 + 1 != 20
What's going on? The actual layout of a Symbol is 20 bytes in size because all data is aligned on 4-byte boundaries. This means that each char member is actually padded with 3 extra bytes so the data that follows it will be aligned properly. (Note the term "follows".)
To change the structure alignment, use this compiler directive:
#pragma pack(n)
where n specifies the value, in bytes, to be used for packing. In Microsoft Visual Studio, the default value for n is 8. Valid values are 1, 2, 4, 8, and 16.
The alignment of a member will be on a boundary that is either a multiple of n or a multiple of the size of the member, whichever is smaller.
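The rule can be modeled in a few lines. This is a hypothetical sketch of the rule as stated, not how any compiler implements it: it assumes every member is a scalar whose natural alignment equals its size (true of the 1- and 4-byte members used here on typical 32-bit targets), and that the total size is rounded up to the largest alignment actually used so that arrays of the struct stay aligned. LayoutStruct and RoundUp are hypothetical names.

```c
#include <stddef.h> /* size_t */

/* Round x up to the next multiple of a (a > 0). */
static size_t RoundUp(size_t x, size_t a)
{
  return (x + a - 1) / a * a;
}

/* Hypothetical model: each member goes on a multiple of min(pack, size);
   the total is padded to a multiple of the largest alignment used.
   Stores each member's offset in offsets[] and returns the struct size. */
size_t LayoutStruct(const size_t *sizes, size_t count, size_t pack,
                    size_t *offsets)
{
  size_t offset = 0, max_align = 1, i;
  for (i = 0; i < count; i++)
  {
    size_t align = sizes[i] < pack ? sizes[i] : pack;
    if (align > max_align)
      max_align = align;
    offset = RoundUp(offset, align);
    offsets[i] = offset;
    offset += sizes[i];
  }
  return RoundUp(offset, max_align);
}
```

For the Symbol members (sizes 4, 1, 4, 4, 1) it gives offsets 0, 4, 8, 12, 16 and a size of 20 with n = 4; 0, 4, 6, 10, 14 and 16 with n = 2; and 0, 4, 5, 9, 13 and 14 with n = 1, matching the Visual Studio printout in these notes.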
For example, to align the fields of the Symbol structure on 2-byte boundaries:
#pragma pack(2) /* align on 2-byte boundaries */
struct Symbol
{
enum Kind kind;
char op;
int ival;
float fval;
char id;
};
#pragma pack() /* restore compiler's default alignment setting */
Now, it would look like this in memory:
To align the fields on 1-byte boundaries:
#pragma pack(1) /* align on 1-byte boundaries */
struct Symbol
{
enum Kind kind;
char op;
int ival;
float fval;
char id;
};
#pragma pack() /* restore compiler's alignment setting */
Now, it would look like this in memory:
An actual printout from MS VS 6.0:
#pragma pack(4)           #pragma pack(2)           #pragma pack(1)
--------------------------------------------------------------------------------
&sym1      = 0012FEDC     &sym1      = 0012FEE0     &sym1      = 0012FEE0
&sym1.kind = 0012FEDC     &sym1.kind = 0012FEE0     &sym1.kind = 0012FEE0
&sym1.op   = 0012FEE0     &sym1.op   = 0012FEE4     &sym1.op   = 0012FEE4
&sym1.ival = 0012FEE4     &sym1.ival = 0012FEE6     &sym1.ival = 0012FEE5
&sym1.fval = 0012FEE8     &sym1.fval = 0012FEEA     &sym1.fval = 0012FEE9
&sym1.id   = 0012FEEC     &sym1.id   = 0012FEEE     &sym1.id   = 0012FEED
sizeof(sym1.kind) = 4     sizeof(sym1.kind) = 4     sizeof(sym1.kind) = 4
sizeof(sym1.op)   = 1     sizeof(sym1.op)   = 1     sizeof(sym1.op)   = 1
sizeof(sym1.ival) = 4     sizeof(sym1.ival) = 4     sizeof(sym1.ival) = 4
sizeof(sym1.fval) = 4     sizeof(sym1.fval) = 4     sizeof(sym1.fval) = 4
sizeof(sym1.id)   = 1     sizeof(sym1.id)   = 1     sizeof(sym1.id)   = 1
sizeof(sym1)      = 20    sizeof(sym1)      = 16    sizeof(sym1)      = 14
The code to print the addresses:
struct Symbol sym1;
printf("&sym1 = %p\n", &sym1);
printf("&sym1.kind = %p\n", &sym1.kind);
printf("&sym1.op = %p\n", &sym1.op);
printf("&sym1.ival = %p\n", &sym1.ival);
printf("&sym1.fval = %p\n", &sym1.fval);
printf("&sym1.id = %p\n", &sym1.id);
Note that if we used any of these alignments:
#pragma pack(4)  /* align on 4-byte boundaries */
#pragma pack(8)  /* align on 8-byte boundaries */
#pragma pack(16) /* align on 16-byte boundaries */
the layout would be the same as the 4-byte layout. This is because none of the members of the structure are larger than 4 bytes, so they will never need to be aligned on 8-byte or 16-byte boundaries.
Notes
The pack pragma is not part of the ANSI C language; the effect of a pragma is compiler-dependent. Although most compilers support the pack pragma, you should be aware that different compilers may have different default alignments. Also, MS says that the default alignment for Win32 is 8 bytes, not 4. You should consult the documentation for your compiler to determine the behavior.
This is from the top of stdlib.h from MS VC++ 6.0:
#ifdef _MSC_VER
/*
 * Currently, all MS C compilers for Win32 platforms default to 8 byte
 * alignment.
 */
#pragma pack(push,8)
Given these two logically equivalent structures, what are the ramifications of laying out the members in these ways?
struct BEAVIS { char a; double b; char c; double d; char e; double f; char g; double h; };
struct BUTTHEAD { char a; char c; char e; char g; double b; double d; double f; double h; };
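One way to see the ramifications on your own compiler is to print the sizes. Assuming 8-byte doubles aligned on 8-byte boundaries (the Win32 default described above, and typical of x64 compilers), each char in BEAVIS is followed by 7 bytes of padding, for a total of 64 bytes, while BUTTHEAD groups the chars together and needs only 4 bytes of padding, for 40 bytes. PrintBeavisButthead is a hypothetical helper:

```c
#include <stddef.h> /* offsetof */
#include <stdio.h>

struct BEAVIS { char a; double b; char c; double d; char e; double f; char g; double h; };
struct BUTTHEAD { char a; char c; char e; char g; double b; double d; double f; double h; };

/* Hypothetical helper: compare the two orderings of the same members. */
void PrintBeavisButthead(void)
{
  printf("sizeof(struct BEAVIS)     = %u\n", (unsigned)sizeof(struct BEAVIS));
  printf("sizeof(struct BUTTHEAD)   = %u\n", (unsigned)sizeof(struct BUTTHEAD));
  printf("offsetof(BUTTHEAD, b)     = %u\n", (unsigned)offsetof(struct BUTTHEAD, b));
}
```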
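Accessing a member of a structure conceptually works like this:
structvar.member ==> *([address of structvar] + [offset of member])
So using the Symbol structure example above with the address of sym1 being 100 (so ival is at address 108 and id is at 116):
struct Symbol sym1;
sym1.ival ==> *(&sym1 + 8)
sym1.id ==> *(&sym1 + 16)
Or more accurately (adding 8 to a pointer to struct Symbol would skip 8 whole structures, so the address must first be cast to a char pointer):
sym1.ival ==> *( (int *)( (char *)&sym1 + 8) )
sym1.id ==> *( (char *)&sym1 + 16 )
Note that the code above assumes structures are aligned on 4-byte boundaries.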
Code to print the values of a Symbol structure variable using pointer/offset with 4-byte alignment:
void TestStructOffset4(void)
{
struct Symbol sym1 = {IDENTIFIER, '+', 123, 3.14F, 'A'};
char *psym = (char *)&sym1;
int kind = *((int *)(psym + 0)); /* 3 */
char op = *(psym + 4); /* '+' */
int ival = *((int *)(psym + 8)); /* 123 */
float fval = *((float *)(psym + 12)); /* 3.14 */
char id = *(psym + 16); /* 'A' */
/* 3, +, 123, 3.140000, A */
printf("%i, %c, %i, %f, %c\n", kind, op, ival, fval, id);
}
Code to print the values of a Symbol structure variable using pointer/offset with 1-byte alignment (that is, with Symbol declared under #pragma pack(1)):
void TestStructOffset1(void)
{
struct Symbol sym1 = {IDENTIFIER, '+', 123, 3.14F, 'A'};
char *psym = (char *)&sym1;
int kind = *((int *)(psym + 0)); /* 3 */
char op = *(psym + 4); /* '+' */
int ival = *((int *)(psym + 5)); /* 123 */
float fval = *((float *)(psym + 9)); /* 3.14 */
char id = *(psym + 13); /* 'A' */
/* 3, +, 123, 3.140000, A */
printf("%i, %c, %i, %f, %c\n", kind, op, ival, fval, id);
}
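Hard-coding 8, 12, and 16 ties the code to one packing setting, and reading an int through an address like psym + 5 is an unaligned access that x86 tolerates but some processors do not (it is also undefined behavior in standard C). A sketch of a more robust version, using the standard offsetof macro and memcpy; ReadIval, ReadFval, and ReadId are hypothetical helpers:

```c
#include <stddef.h> /* offsetof */
#include <string.h> /* memcpy */

enum Kind {OPERATOR, INTEGER, FLOAT, IDENTIFIER};

struct Symbol
{
  enum Kind kind;
  char op;
  int ival;
  float fval;
  char id;
};

/* offsetof asks the compiler for each offset, so nothing is hard-coded,
   and memcpy copies the bytes out, so the read is safe even if unaligned. */
int ReadIval(const struct Symbol *sym)
{
  int ival;
  memcpy(&ival, (const char *)sym + offsetof(struct Symbol, ival), sizeof ival);
  return ival;
}

float ReadFval(const struct Symbol *sym)
{
  float fval;
  memcpy(&fval, (const char *)sym + offsetof(struct Symbol, fval), sizeof fval);
  return fval;
}

char ReadId(const struct Symbol *sym)
{
  return *((const char *)sym + offsetof(struct Symbol, id));
}
```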
Given this structure:
typedef unsigned short WORD;
typedef unsigned long DWORD; /* 4 bytes on Win32 (8 bytes on most 64-bit Unix-like systems) */
typedef struct tagBITMAPFILEHEADER {
WORD bfType; /* 2 bytes */
DWORD bfSize; /* 4 bytes */
WORD bfReserved1; /* 2 bytes */
WORD bfReserved2; /* 2 bytes */
DWORD bfOffBits; /* 4 bytes */
} BITMAPFILEHEADER, *PBITMAPFILEHEADER;
And this function:
void PrintBitmapHeader(BITMAPFILEHEADER *header)
{
printf("Type: %c%c (%04X)\n", header->bfType & 0xFF,
header->bfType >> 8,
header->bfType);
printf("Size: %lu (%08lX)\n", header->bfSize, header->bfSize);
printf("Res1: %u (%04X)\n", header->bfReserved1, header->bfReserved1);
printf("Res2: %u (%04X)\n", header->bfReserved2, header->bfReserved2);
printf("Offs: %lu (%08lX)\n", header->bfOffBits, header->bfOffBits);
}
What should this program display? (Hint: the size of the file is 207,158 bytes, the offset
to the bitmap itself is 1078 bytes, and the two reserved fields are 0.)
int main(void)
{
BITMAPFILEHEADER header;
FILE *fp = fopen("foo.bmp", "rb");
assert(fp);
fread(&header, sizeof(BITMAPFILEHEADER), 1, fp);
PrintBitmapHeader(&header);
fclose(fp);
}
Given the "hint" above, the expected output should be:
Type: BM (4D42)
Size: 207158 (00032936)
Res1: 0 (0000)
Res2: 0 (0000)
Offs: 1078 (00000436)
However, the actual output is:
Type: BM (4D42)
Size: 3 (00000003)
Res1: 0 (0000)
Res2: 1078 (0436)
Offs: 2621440 (00280000)
Why is this incorrect?
The actual bytes in the bitmap file look like this:
42 4D 36 29 03 00 00 00 00 00 36 04 00 00 28 00 . . . .
Separated by fields it looks like this:
Type  | Size        | Res1  | Res2  | Offset      | Other stuff
42 4D | 36 29 03 00 | 00 00 | 00 00 | 36 04 00 00 | 28 00 . . . .
And the BITMAPFILEHEADER structure in memory looks like this:
[Figure: BITMAPFILEHEADER in memory with the default alignment]
Why is the structure aligned like this? This means that:
sizeof(BITMAPFILEHEADER) == 16
Reading the header with the code:
fread(&header, sizeof(BITMAPFILEHEADER), 1, fp);
causes the first 16 bytes (sizeof(BITMAPFILEHEADER)) of the file to be read into the buffer (memory pointed to by &header), which yields:
[Figure: the first 16 bytes of the file copied into header]
Which gives the values we saw (adjusting for little-endian):
Member        Hex       Decimal
---------------------------------------
bfType        4D42      19778
bfSize        00000003  3
bfReserved1   0000      0
bfReserved2   0436      1078
bfOffBits     00280000  2621440
Again, the correct output should be:
Type: BM (4D42)
Size: 207158
Res1: 0
Res2: 0
Offs: 1078
To achieve the correct results, we need to pack the structure:
#pragma pack(2)
typedef struct tagBITMAPFILEHEADER {
WORD bfType; /* 2 bytes */
DWORD bfSize; /* 4 bytes */
WORD bfReserved1; /* 2 bytes */
WORD bfReserved2; /* 2 bytes */
DWORD bfOffBits; /* 4 bytes */
} BITMAPFILEHEADER, *PBITMAPFILEHEADER;
#pragma pack()
Now, the structure in memory looks like this:
[Figure: packed BITMAPFILEHEADER in memory]
and:
sizeof(BITMAPFILEHEADER) == 14
so now when we read in 14 bytes, the structure is filled like this:
[Figure: the first 14 bytes of the file copied into header]
which gives the correct values:
Member        Hex       Decimal
---------------------------------------
bfType        4D42      19778
bfSize        00032936  207158
bfReserved1   0000      0
bfReserved2   0000      0
bfOffBits     00000436  1078
The actual structure in wingdi.h looks like this:
#include <pshpack2.h>
typedef struct tagBITMAPFILEHEADER {
WORD bfType;
DWORD bfSize;
WORD bfReserved1;
WORD bfReserved2;
DWORD bfOffBits;
} BITMAPFILEHEADER, FAR *LPBITMAPFILEHEADER, *PBITMAPFILEHEADER;
#include <poppack.h>
and pshpack2.h looks like this:
#if ! (defined(lint) || defined(_lint) || defined(RC_INVOKED))
#if ( _MSC_VER >= 800 ) || defined(_PUSHPOP_SUPPORTED)
#pragma warning(disable:4103)
#if !(defined( MIDL_PASS )) || defined( __midl )
#pragma pack(push)
#endif
#pragma pack(2)
#else
#pragma pack(2)
#endif
#endif // ! (defined(lint) || defined(_lint) || defined(RC_INVOKED))
Complete listings
Addresses and values at different pack values:
#pragma pack(1)               #pragma pack(2)               #pragma pack(4)
-----------------------------------------------------------------------------------
bfType = 0012FEE0             bfType = 0012FEE0             bfType = 0012FEE0
bfSize = 0012FEE2             bfSize = 0012FEE2             bfSize = 0012FEE4
bfRes1 = 0012FEE6             bfRes1 = 0012FEE6             bfRes1 = 0012FEE8
bfRes2 = 0012FEE8             bfRes2 = 0012FEE8             bfRes2 = 0012FEEA
bfOffs = 0012FEEA             bfOffs = 0012FEEA             bfOffs = 0012FEEC
Type: BM (4D42)               Type: BM (4D42)               Type: BM (4D42)
Size: 207158 (00032936)       Size: 207158 (00032936)       Size: 3 (00000003)
Res1: 0 (0000)                Res1: 0 (0000)                Res1: 0 (0000)
Res2: 0 (0000)                Res2: 0 (0000)                Res2: 1078 (0436)
Offs: 1078 (00000436)         Offs: 1078 (00000436)         Offs: 2621440 (00280000)
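Packing fixes the problem, but it relies on a compiler-specific pragma and on the host being little-endian like the .bmp format. Another approach is to read the 14 header bytes into a plain byte buffer (for example with fread(bytes, 1, 14, fp)) and assemble each field from its bytes. A sketch under those assumptions; BMP_HEADER_INFO, Read16, Read32, and ParseBitmapFileHeader are hypothetical names:

```c
/* Hypothetical decoded form of the header; not the Windows BITMAPFILEHEADER. */
typedef struct
{
  unsigned int  type;      /* bfType      */
  unsigned long size;      /* bfSize      */
  unsigned int  reserved1; /* bfReserved1 */
  unsigned int  reserved2; /* bfReserved2 */
  unsigned long offBits;   /* bfOffBits   */
} BMP_HEADER_INFO;

/* Assemble little-endian 16- and 32-bit values from individual bytes. */
static unsigned int Read16(const unsigned char *p)
{
  return (unsigned int)p[0] | ((unsigned int)p[1] << 8);
}

static unsigned long Read32(const unsigned char *p)
{
  return (unsigned long)p[0] | ((unsigned long)p[1] << 8) |
         ((unsigned long)p[2] << 16) | ((unsigned long)p[3] << 24);
}

/* The byte offsets are those of the file format: 0, 2, 6, 8, 10. */
void ParseBitmapFileHeader(const unsigned char bytes[14], BMP_HEADER_INFO *info)
{
  info->type      = Read16(bytes + 0);
  info->size      = Read32(bytes + 2);
  info->reserved1 = Read16(bytes + 6);
  info->reserved2 = Read16(bytes + 8);
  info->offBits   = Read32(bytes + 10);
}
```

Because each field is built with shifts, the result is the same whatever the host's byte order or packing setting.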
// Variables for each attribute of some object
// Comments represent the range of values for the attribute
unsigned char level;   // 0 - 3
unsigned char power;   // 0 - 63
unsigned short range;  // 0 - 1023
unsigned char armor;   // 0 - 15
unsigned short health; // 0 - 511
unsigned char grade;   // 0 - 1
Given the sizes of each data type, we could say that the minimum amount of memory required to hold these attributes is 8 bytes. However, on a 32-bit computer, the amount of memory required could actually be 24 bytes, depending on where these variables exist. Why?
Declared local to a function (on the stack):
Variable   Microsoft   GNU        Borland
level      0012FF28    0x22F047   0012FF83
power      0012FF24    0x22F046   0012FF82
range      0012FF20    0x22F044   0012FF80
armor      0012FF1C    0x22F043   0012FF7F
health     0012FF18    0x22F040   0012FF7C
grade      0012FF14    0x22F03F   0012FF7B
Declared globally:
Variable   Microsoft   GNU        Borland
level      004310BE    0x405030   004122E0
power      004310BC    0x406060   004122E1
range      004312FC    0x406050   004122E2
armor      004310BD    0x406080   004122E4
health     00431142    0x407110   004122E6
grade      004310BF    0x407160   004122E8
Our first attempt to save memory is to put them in a structure:
// Put into a struct
typedef struct
{
unsigned char level; // 0 - 3
unsigned char power; // 0 - 63
unsigned short range; // 0 - 1023
unsigned char armor; // 0 - 15
unsigned short health; // 0 - 511
unsigned char grade; // 0 - 1
}ENTITY_ATTRS;
What are the memory requirements for this struct? Of course, it depends on how the compiler is packing structures. Given the default pack value in MSVC, the layout looks like this:
What about this structure:
// Put into a struct and pack
#pragma pack(1) /* align on 1-byte boundaries */
typedef struct
{
unsigned char level; // 0 - 3
unsigned char power; // 0 - 63
unsigned short range; // 0 - 1023
unsigned char armor; // 0 - 15
unsigned short health; // 0 - 511
unsigned char grade; // 0 - 1
}ENTITY_ATTRS;
#pragma pack()
This code yields a layout like this:
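To check both layouts on your own compiler, here is a sketch declaring the two versions side by side. ENTITY_ATTRS_PACKED and PrintAttrSizes are hypothetical names (the text reuses ENTITY_ATTRS for both versions). On compilers that align a 2-byte short on a 2-byte boundary, such as MSVC and GCC on x86, the default version should be 10 bytes (one padding byte after armor and one after grade) and the packed version 8.

```c
#include <stdio.h>

// Default packing
typedef struct
{
  unsigned char level;   // 0 - 3
  unsigned char power;   // 0 - 63
  unsigned short range;  // 0 - 1023
  unsigned char armor;   // 0 - 15
  unsigned short health; // 0 - 511
  unsigned char grade;   // 0 - 1
} ENTITY_ATTRS;

#pragma pack(1) // align on 1-byte boundaries
typedef struct
{
  unsigned char level;
  unsigned char power;
  unsigned short range;
  unsigned char armor;
  unsigned short health;
  unsigned char grade;
} ENTITY_ATTRS_PACKED;
#pragma pack() // restore the default

void PrintAttrSizes(void)
{
  printf("default: %u, packed: %u\n", (unsigned)sizeof(ENTITY_ATTRS),
         (unsigned)sizeof(ENTITY_ATTRS_PACKED));
}
```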
Of course, looking closer, we realize that we only need 32 bits for all 6 variables (2 + 6 + 10 + 4 + 9 + 1 = 32), so we'll just use an unsigned integer to store the values.
To set the fields to these values:
level = 3;    // 2 bits wide
power = 32;   // 6 bits wide
range = 1000; // 10 bits wide
armor = 7;    // 4 bits wide
health = 300; // 9 bits wide
grade = 1;    // 1 bit wide
We can use "simple" bit manipulation (the u suffix keeps 3 << 30 from overflowing a signed int):
unsigned int attrs;
attrs = 3u << 30;              // set level to 3
attrs = attrs | (32u << 24);   // set power to 32
attrs = attrs | (1000u << 14); // set range to 1000
attrs = attrs | (7u << 10);    // set armor to 7
attrs = attrs | (300u << 1);   // set health to 300
attrs = attrs | 1u;            // set grade to 1
After shifting, we OR all of the values together:
Left shifts   Binary
----------------------------------------------
3 << 30       11000000000000000000000000000000
32 << 24        100000000000000000000000000000
1000 << 14            111110100000000000000000
7 << 10                         01110000000000
300 << 1                            1001011000
1                                            1
----------------------------------------------
              11100000111110100001111001011001
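Reading a field back out, or changing one without disturbing the others, takes a shift and a mask. A sketch of hypothetical helpers, assuming a 32-bit unsigned int, field widths of 1 to 31 bits, and the bit positions used above (level in bits 31 and 30, down to grade in bit 0):

```c
/* Hypothetical helpers for the hand-packed layout above. A field is
   'width' bits wide (1..31) and its lowest bit is 'shift' bits from the right. */
unsigned int FieldMask(unsigned int width)
{
  return (1u << width) - 1u;
}

unsigned int GetField(unsigned int word, unsigned int shift, unsigned int width)
{
  return (word >> shift) & FieldMask(width);
}

unsigned int SetField(unsigned int word, unsigned int shift, unsigned int width,
                      unsigned int value)
{
  unsigned int mask = FieldMask(width) << shift;
  return (word & ~mask) | ((value << shift) & mask); /* clear, then insert */
}
```

Setting all six fields with SetField gives 0xE0FA1E59, which is the ORed bit pattern above written in hex.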
Of course, there's got to be a better way...
// Use bitfields for the attributes
typedef struct
{
unsigned int level : 2; // 0 - 3
unsigned int power : 6; // 0 - 63
unsigned int range : 10; // 0 - 1023
unsigned int armor : 4; // 0 - 15
unsigned int health : 9; // 0 - 511
unsigned int grade : 1; // 0 - 1
}ENTITY_ATTRS_B;
The size of the structure above (sizeof(ENTITY_ATTRS_B)) is 4 bytes, which is the same size as the unsigned integer used before. However, this structure allows for a much cleaner syntax:
ENTITY_ATTRS_B attrs;
// Easier to read, understand, and self-documenting
attrs.level = 3;
attrs.power = 32;
attrs.range = 1000;
attrs.armor = 7;
attrs.health = 300;
attrs.grade = 1;
Much like a lot of syntax in C, the compiler is doing the work for you behind the scenes.
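A sketch that checks the bit-field version. The order in which a compiler allocates bit-fields inside the unsigned int is implementation-defined, so, unlike the hand-packed version, you shouldn't assume level ends up in the top two bits. MakeAttrs is a hypothetical helper, and the 4-byte size is what typical 32-bit and 64-bit compilers produce.

```c
typedef struct
{
  unsigned int level : 2;  // 0 - 3
  unsigned int power : 6;  // 0 - 63
  unsigned int range : 10; // 0 - 1023
  unsigned int armor : 4;  // 0 - 15
  unsigned int health : 9; // 0 - 511
  unsigned int grade : 1;  // 0 - 1
} ENTITY_ATTRS_B;

// Hypothetical helper: fill in the example values from the text.
ENTITY_ATTRS_B MakeAttrs(void)
{
  ENTITY_ATTRS_B attrs;
  attrs.level = 3;
  attrs.power = 32;
  attrs.range = 1000;
  attrs.armor = 7;
  attrs.health = 300;
  attrs.grade = 1;
  return attrs;
}
```

Assigning a value that doesn't fit, such as 512 to the 9-bit health field, silently keeps only the low 9 bits (giving 0), so range checks are still the programmer's job.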
Notes