Structures and Unions

"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." -- Brian W. Kernighan

Refresher

struct tag { members } variable-list;
Create a structure type named TIME (no space is allocated):
struct TIME {
  int hours;
  int minutes;
  int seconds;
};
Create two variables of type struct TIME (space is allocated):
struct TIME t1, t2; /* In C, you need the struct keyword */

We can do both in one step:

struct TIME {
  int hours;
  int minutes;
  int seconds;
}t1, t2;            /* This allocates space            */

struct TIME t3, t4; /* Create more TIME variables here */

Leaving off the tag creates an anonymous structure:

struct {         /* No name given to this struct */
  int hours;
  int minutes;
  int seconds;
}t1, t2;         /* We won't be able to create others later */

Create a new type with the typedef keyword:

typedef struct {  /* use typedef keyword to create a new type */
  int hours;
  int minutes;
  int seconds;
}TIME;            /* TIME is a type, not a variable    */

TIME t1, t2;      /* Don't need the struct keyword now */

Using the tag and typedef:

typedef struct TIME {/* tag name     */
  int hours;
  int minutes;
  int seconds;
}S_TIME;             /* typedef name */

S_TIME t1, t2;       /* use typedef name                   */
struct TIME t3, t4;  /* use tag name, needs struct keyword */

The tag name and typedef name can be the same:

typedef struct TIME{ /* tag name            */
  int hours;
  int minutes;
  int seconds;
}TIME;               /* typedef same as tag */

TIME t1, t2;         /* use typedef name                   */
struct TIME t3, t4;  /* use tag name, needs struct keyword */

TIME times[10];    /* an array of 10 TIME structs          */
TIME *pt;          /* a pointer to a TIME struct           */
TIME foo(TIME *p); /* a function taking a TIME pointer and */ 
                   /*   returning a TIME struct            */

Structures can contain almost any data type, including other structures:

#define MAX_PATH 12
struct DATE {
  int month;
  int day;
  int year;
};
struct DATETIME {
  struct DATE date;
  struct TIME time;
};
struct FILEINFO {
  int length;
  char name[MAX_PATH];
  struct DATETIME dt;
};
Nothing new here:
int main(void)
{
  struct FILEINFO fi;   /* Create FILEINFO struct on stack */

    /* Set date to 7/4/2005 */
  fi.dt.date.day = 4;
  fi.dt.date.month = 7;
  fi.dt.date.year = 2005;

    /* Set time to 9:30 am */
  fi.dt.time.hours = 9;
  fi.dt.time.minutes = 30;
  fi.dt.time.seconds = 0;

  fi.length = 1024;           /* Set length */
  strcpy(fi.name, "foo.txt"); /* Set name */
}
Same operations using a pointer to the structure:
int main(void)
{
  struct FILEINFO fi;         /* Create FILEINFO struct on stack */
  struct FILEINFO *pfi = &fi; /* Pointer to a FILEINFO struct    */

    /* Set date to 7/4/2005 */
  (*pfi).dt.date.day = 4;
  (*pfi).dt.date.month = 7;
  (*pfi).dt.date.year = 2005;

    /* Set time to 9:30 am */
  (*pfi).dt.time.hours = 9;
  (*pfi).dt.time.minutes = 30;
  (*pfi).dt.time.seconds = 0;

  (*pfi).length = 1024;           /* Set length */
  strcpy((*pfi).name, "foo.txt"); /* Set name */
}
Due to the order of precedence, we need the parentheses above. Otherwise:

Accessing a member of a pointer:

  /* error C2231: '.length' : left operand points to 'struct', use '->' */
pfi.length = 1024; 
Accessing a member of a pointer and then attempting to dereference an integer:
  /* error C2231: '.length' : left operand points to 'struct', use '->' */
  /* error C2100: illegal indirection                                   */
*pfi.length = 1024;
Closer look:
Expression     Description
------------------------------------------------------------------
pfi            A pointer to a FILEINFO struct
*pfi           A FILEINFO struct
(*pfi).        Accessing a member of a FILEINFO struct
pfi->          Accessing a member of a FILEINFO struct (shorthand)
The arrow operator is another programmer convenience in the same vein as the subscript operator. It performs the indirection "behind the scenes", so:
(*pfi).  ==  pfi->
That's why using the arrow operator on a structure is illegal; we're trying to dereference something (a structure) that isn't a pointer. And that's a no-no. Same example using the arrow operator:
  /* Set date to 7/4/2005 */
pfi->dt.date.month = 7;
pfi->dt.date.day = 4;
pfi->dt.date.year = 2005;

  /* Set time to 9:30 am */
pfi->dt.time.hours = 9;
pfi->dt.time.minutes = 30;
pfi->dt.time.seconds = 0;

pfi->length = 1024;           /* Set length */
strcpy(pfi->name, "foo.txt"); /* Set name */

Dynamically Allocating Structures

Function to print a single FILEINFO structure:
void PrintFileInfo(struct FILEINFO *fi)
{
  printf("Name: %s\n", fi->name);
  printf("Size: %i\n", fi->length);
  printf("Time: %2i:%02i:%02i\n", fi->dt.time.hours, fi->dt.time.minutes, fi->dt.time.seconds);
  printf("Date: %i/%i/%i\n", fi->dt.date.month, fi->dt.date.day, fi->dt.date.year);
}
Function to print an array of FILEINFO structures:
void PrintFileInfos(struct FILEINFO *records, int count)
{
  int i;
  for (i = 0; i < count; i++)
    PrintFileInfo(records++);
}

Dynamically allocate a FILEINFO structure and print it:

int main(void)
{
    /* Pointer to a FILEINFO struct    */
  struct FILEINFO *pfi = malloc(sizeof(struct FILEINFO));

    /* Make sure it worked */
  assert(pfi != NULL);

    /* Set the fields of the struct .... */

  PrintFileInfos(pfi, 1); /* Print the struct */
  free(pfi);              /* Free the memory  */
}

Dynamically allocate an array of FILEINFO structs and print them:

#define SIZE 10

void TestHeapArray(void)
{
  int i;
  struct FILEINFO *pfi = calloc(SIZE, sizeof(struct FILEINFO));
  struct FILEINFO *saved = pfi;
  assert(pfi != NULL);

  for (i = 0; i < SIZE; i++)
  {
    char name[12];

    pfi->dt.date.month = 12;
    pfi->dt.date.day = i + 1;
    pfi->dt.date.year = 2005;
    sprintf(name, "foo-%i.txt", i + 1);
    strcpy(pfi->name, name); 

      /* Point to next FILEINFO struct in the array */
    pfi++;
  }
   
  pfi = saved;  /* Reset pointer to beginning */
  PrintFileInfos(pfi, SIZE);
  free(pfi);
}

Self-referencing structures

Before any data type can be used to create a variable, the size of the type must be known to the compiler:
struct NODE {
  int value;
  struct NODE next;  /* illegal */
};
Since the compiler hasn't fully "seen" the NODE struct, it can't be used anywhere, even inside itself. However, this works:
struct NODE {
  int value;
  struct NODE *next; /* OK */
};
Since all pointers to structures have the same size, the compiler will accept this. The compiler doesn't yet know everything that's in a NODE struct, but it knows the size of a pointer to one.
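As a minimal sketch of why the pointer member is useful, the following links three NODE structs together on the stack and walks the chain. The names SumList and TestList are made up for this illustration.

```c
#include <stdio.h>

struct NODE {
  int value;
  struct NODE *next; /* pointer to the same type is OK */
};

/* Walk the list and add up the values (SumList is a hypothetical helper) */
int SumList(const struct NODE *node)
{
  int sum = 0;
  while (node != NULL)
  {
    sum += node->value;
    node = node->next; /* follow the link to the next node */
  }
  return sum;
}

void TestList(void)
{
  struct NODE a, b, c;

  a.value = 10; a.next = &b;   /* a -> b          */
  b.value = 20; b.next = &c;   /* b -> c          */
  c.value = 30; c.next = NULL; /* c is the last   */

  printf("Sum: %i\n", SumList(&a));
}
```

A NULL next pointer marks the end of the list, which is why the loop in SumList stops there.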

If two structures refer to each other, declaring them may look impossible, since one of the declarations must physically come after the other in the source file. In C, this is generally not a problem, because the struct keyword tells the compiler that the unknown name is a structure tag:

struct A {
  int value;
  struct B *b; /* B is a struct */
};

struct B {
  int value;
  struct A *a; /* A is a struct */
};
However, C++ code typically won't compile when written this way, because the struct keyword is usually omitted:
struct A {
  int value;
  B *b; /* What is B? */
};

struct B {
  int value;
  A *a;  /* A is a struct */
};
The solution is to use a forward (or incomplete) declaration:
struct B; /* incomplete or forward declaration */

struct A {
  int value;
  B *b; /* B is a struct */
};

struct B {
  int value;
  A *a;  /* A is a struct */
};

Initializing Structures

The structs:
#define MAX_PATH 12
struct DATE {
  int month;
  int day;
  int year;
};
struct DATETIME {
  struct DATE date;
  struct TIME time;
};
struct FILEINFO {
  int length;
  char name[MAX_PATH];
  struct DATETIME dt;
};
Initializing:
struct TIME t = {9, 30, 0};
struct DATE d = {12, 31, 2002};
struct DATETIME dt1 = { {12, 31, 2002}, {9, 30, 0} };
struct FILEINFO fi1 = {1024, "foo.txt", { {12, 31, 2002}, {9, 30, 0} } };

struct DATE d2 = {12, 0, 0}; /* month = 12, day = 0, year = 0 */
struct DATE d3 = {12};       /* month = 12, day = 0, year = 0 */
struct FILEINFO fi2 = {1024, "foo.txt", {{0, 0, 0}, {0, 0, 0}}}; /* date/time all 0 */ 
struct FILEINFO fi3 = {1024, "foo.txt"};                         /* date/time all 0 */
struct FILEINFO fi4 = {0};   /* all fields are 0 */

struct FILEINFO fi5;
memset(&fi5, 0, sizeof(fi5)); /* all fields are 0 */



Arrays vs. Structures

Unlike arrays, which prohibit most aggregate operations, it is possible in some cases to manipulate structures as a whole.

Operation                     Arrays                Structures
-----------------------------------------------------------------------
Arithmetic                    No                    No
Assignment                    No                    Yes
Comparison                    No                    No
Input/Output (e.g. printf)    No (except strings)   No
Parameter passing             By address only       By address or value
Return from function          No                    Yes
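
A short sketch of the Assignment, Comparison, and Parameter passing rows of the table above. TimesEqual and TestAssign are hypothetical names used only for this illustration.

```c
#include <stdio.h>

struct TIME {
  int hours;
  int minutes;
  int seconds;
};

/* Structures can't be compared with ==, so compare member by member. */
/* Both parameters are passed by value (the whole struct is copied).  */
int TimesEqual(struct TIME a, struct TIME b)
{
  return a.hours == b.hours &&
         a.minutes == b.minutes &&
         a.seconds == b.seconds;
}

void TestAssign(void)
{
  struct TIME t1 = {9, 30, 0};
  struct TIME t2;
  int a1[3] = {9, 30, 0};
  int a2[3];
  int i;

  t2 = t1;       /* OK: copies all members           */
  /* a2 = a1; */ /* error: arrays are not assignable */

  for (i = 0; i < 3; i++) /* arrays are copied element by element */
    a2[i] = a1[i];

  /* if (t1 == t2) would not compile, so call the helper instead */
  printf("%i\n", TimesEqual(t1, t2));
  printf("%i:%02i:%02i\n", a2[0], a2[1], a2[2]);
}
```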

Question: Although arrays are a "simpler" aggregate data type, we are more limited in what we can do with them than with the more complex aggregate type, the struct. Why do you think C allows you to assign one structure to another but not arrays? What about passing them by value to a function and returning them from a function?

Given this code:
struct FILEINFO fi;
We can visualize the struct in memory like this:

Now highlighting the two fields of the DATETIME struct:

Now highlighting the fields of the DATE and TIME structs:

Using the diagrams as an aid, what does the following code print out?
struct FILEINFO fi = {123, "foo.txt", { {12, 31, 2000}, {9, 33, 40} } };
  
int i = *((int *)&fi + 7);
char c1 = *((char *)&fi + 9);
char c2 = *((char *)&fi + 25);
char c3 = *((char *)&fi + 32);
int x = *( (int *) ((struct DATE *)&fi + 2) );

printf("%i, %c, %i, %c, %i\n", i, c1, c2, c3, x);
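
If the diagrams are not at hand, the layout can be checked directly. The following is a sketch using the standard offsetof macro from <stddef.h>; the values it prints depend on the compiler and its packing settings. PrintLayout is a hypothetical name.

```c
#include <stddef.h> /* offsetof */
#include <stdio.h>

#define MAX_PATH 12
struct TIME { int hours; int minutes; int seconds; };
struct DATE { int month; int day; int year; };
struct DATETIME { struct DATE date; struct TIME time; };
struct FILEINFO {
  int length;
  char name[MAX_PATH];
  struct DATETIME dt;
};

/* Print the byte offset of each top-level and nested member */
void PrintLayout(void)
{
  unsigned long dt = (unsigned long)offsetof(struct FILEINFO, dt);

  printf("length  at offset %lu\n", (unsigned long)offsetof(struct FILEINFO, length));
  printf("name    at offset %lu\n", (unsigned long)offsetof(struct FILEINFO, name));
  printf("dt      at offset %lu\n", dt);

  /* Offsets of nested members add up: outer offset + inner offset */
  printf("dt.date at offset %lu\n", dt + (unsigned long)offsetof(struct DATETIME, date));
  printf("dt.time at offset %lu\n", dt + (unsigned long)offsetof(struct DATETIME, time));
  printf("sizeof(struct FILEINFO) = %lu\n", (unsigned long)sizeof(struct FILEINFO));
}
```

Comparing these offsets with the pointer arithmetic in the puzzle above shows which member each expression lands on.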

Unions

Suppose we want to parse simple expressions and we need to store information about each symbol in the expression. We could use a structure like this:
enum Kind {OPERATOR, INTEGER, FLOAT, IDENTIFIER};

struct Symbol
{
  enum Kind kind;
  char op;
  int ival;
  float fval;
  char id;
};
A Symbol struct in memory would look something like this:

If we wanted to store the information about this expression:
A + 23 * 3.14
We could do this:
int main(void)
{
  struct Symbol sym1, sym2, sym3, sym4, sym5;

  sym1.kind = IDENTIFIER;
  sym1.id = 'A';

  sym2.kind = OPERATOR;
  sym2.op = '+';

  sym3.kind = INTEGER;
  sym3.ival = 23;

  sym4.kind = OPERATOR;
  sym4.op = '*';

  sym5.kind = FLOAT;
  sym5.fval = 3.14F;
}
Memory usage would look something like this:


When dealing with mutually exclusive data members, a better solution is to create a union and use that instead:

The union:
union SYMBOL_DATA
{
  char op;
  int ival;
  float fval;
  char id;
};
The new struct:
struct NewSymbol
{
  enum Kind kind;
  union SYMBOL_DATA data;
};
Note that sizeof(union SYMBOL_DATA) is 4 (given a 4-byte int and float), since that's the size of the largest member.

The same rules for naming structs apply to unions as well, so we could even typedef the union:

The union:
typedef union
{
  char op;
  int ival;
  float fval;
  char id;
}SYMBOL_DATA;
The new struct:
struct NewSymbol
{
  enum Kind kind;
  SYMBOL_DATA data;
};
Often, however, if the union is not intended to be used outside of a structure, we define it within the structure definition itself without the tag:

struct NewSymbol
{
  enum Kind kind;
  union 
  {
    char op;
    int ival;
    float fval;
    char id;
  } data;
};
Our NewSymbol struct would look like this in memory:

Our code needs to be modified slightly:
int main(void)
{
  struct NewSymbol sym1, sym2, sym3, sym4, sym5;

  sym1.kind = IDENTIFIER;
  sym1.data.id = 'A';

  sym2.kind = OPERATOR;
  sym2.data.op = '+';

  sym3.kind = INTEGER;
  sym3.data.ival = 23;

  sym4.kind = OPERATOR;
  sym4.data.op = '*';

  sym5.kind = FLOAT;
  sym5.data.fval = 3.14F;
}
And the memory usage would look something like this:

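Since all of the union members share the same memory, the kind field is the only record of which member is currently valid. A sketch of how code might read a NewSymbol safely by switching on kind; FormatSymbol is a hypothetical helper and the 64-character buffer size is an assumption.

```c
#include <stdio.h>

enum Kind {OPERATOR, INTEGER, FLOAT, IDENTIFIER};

struct NewSymbol
{
  enum Kind kind;
  union
  {
    char op;
    int ival;
    float fval;
    char id;
  } data;
};

/* Format a symbol into buf (assumed to hold at least 64 chars). */
/* Only the union member selected by kind is ever read.          */
void FormatSymbol(const struct NewSymbol *sym, char *buf)
{
  switch (sym->kind)
  {
    case OPERATOR:   sprintf(buf, "%c", sym->data.op);     break;
    case INTEGER:    sprintf(buf, "%i", sym->data.ival);   break;
    case FLOAT:      sprintf(buf, "%.2f", sym->data.fval); break;
    case IDENTIFIER: sprintf(buf, "%c", sym->data.id);     break;
    default:         buf[0] = '\0';                        break;
  }
}
```

Reading a member other than the one last written (for example, ival after storing fval) just reinterprets the same bytes, which is rarely what is wanted.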

Using a union to get at individual bytes of data:

void TestUnion(void)
{
  union 
  {
    int i;
    unsigned char bytes[4];
  }val;

  val.i = 257;
  printf("%3i  %3i  %3i  %3i\n", 
    val.bytes[0], val.bytes[1], val.bytes[2], val.bytes[3]);

  val.i = 32767;
  printf("%3i  %3i  %3i  %3i\n", 
    val.bytes[0], val.bytes[1], val.bytes[2], val.bytes[3]);

  val.i = 32768;
  printf("%3i  %3i  %3i  %3i\n", 
    val.bytes[0], val.bytes[1], val.bytes[2], val.bytes[3]);
}
This prints out:
  1    1    0    0
255  127    0    0
  0  128    0    0
The values in binary:
  257: 00000000 00000000 00000001 00000001
32767: 00000000 00000000 01111111 11111111
32768: 00000000 00000000 10000000 00000000
As little-endian:
  257: 00000001 00000001 00000000 00000000
32767: 11111111 01111111 00000000 00000000
32768: 00000000 10000000 00000000 00000000
Changing the union to this:
union 
{
  int i;
  signed char bytes[4];
}val;
Gives this output (the bit patterns are the same):
   1     1     0     0
  -1   127     0     0
   0  -128     0     0
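
The same trick can be used to find out at run time whether the machine is little-endian. A minimal sketch; IsLittleEndian is a hypothetical name.

```c
#include <stdio.h>

/* Returns 1 on a little-endian machine, 0 otherwise */
int IsLittleEndian(void)
{
  union
  {
    unsigned int i;
    unsigned char bytes[sizeof(unsigned int)];
  } val;

  val.i = 1;                /* only the low-order byte is non-zero */
  return val.bytes[0] == 1; /* is the low-order byte stored first? */
}

void TestEndian(void)
{
  printf("%s-endian\n", IsLittleEndian() ? "little" : "big");
}
```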

Initializing Unions

In C89, a brace-enclosed initializer for a union always initializes the first member of the union, so the value is converted to that member's type:

struct NewSymbol sym1 = {OPERATOR, {'+'} }; /* fine, op is first member    */
struct NewSymbol sym2 = {FLOAT, {3.14} };   /* this won't work as expected */
Given the code above, what is printed below?
printf("%c, %i, %f, %c\n", sym1.data.op, sym1.data.ival, 
                           sym1.data.fval, sym1.data.id);
printf("%c, %i, %f, %c\n", sym2.data.op, sym2.data.ival, 
                           sym2.data.fval, sym2.data.id);
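
If a C99 compiler is available, a designated initializer can name the union member to initialize. This is a sketch only: it is not valid C89, and sym3 and sym4 are names invented for this example.

```c
#include <stdio.h>

enum Kind {OPERATOR, INTEGER, FLOAT, IDENTIFIER};

struct NewSymbol
{
  enum Kind kind;
  union
  {
    char op;
    int ival;
    float fval;
    char id;
  } data;
};

/* C99 only: .member = value selects which union member is initialized */
struct NewSymbol sym3 = {FLOAT, {.fval = 3.14F}};
struct NewSymbol sym4 = {INTEGER, {.ival = 23}};

void TestDesignated(void)
{
  printf("%f, %i\n", sym3.data.fval, sym4.data.ival);
}
```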

Nested Structures

Given these structures:

struct A
{
  int a;
};
struct B
{
  int b;
  struct A foo;
};

we can access the fields like this:

struct B st;

st.b = 5;
st.foo.a = 10;
We can also define A within B like this:

struct B
{
  int b;
  struct A
  {
    int a;
  }sa;
};
and access the fields in the same way:

struct B st;

st.b = 5;
st.sa.a = 10;
There's an interesting "feature" to the C language that gives the inner (or nested) structure the same scope as the surrounding structure. So, given the above definition of B, we can do this:

struct B s1;  
struct A s2; /* A is in scope (same scope as B) */

s1.b = 5;
s1.sa.a = 10;
s2.a = 20;
If you create another structure named A, you'll get a compiler error for having a duplicate definition:

struct B
{
  int b;
  struct A 
  {
    int a;
  }sa;
};
struct A
{
  int foo;
  double bar;
};

Note that you can't do this:

struct B.A s2; /* syntax error */
or this:
struct B::A s2; /* syntax error, no scope resolution operator in C */
So, to use the nested structure in B:
struct A s2; /* Valid C, invalid C++ */
B::A s2;     /* Valid C++, invalid C */

Note: The scoping rules for nested structs are one of the few areas where C and C++ disagree.

One solution to prevent the pollution of the global namespace is to leave the tag off of the struct within B:
struct B
{
  int b;
  struct  /* no tag */
  {
    int a;
  }sa;
};
Finally, this example shows the scoping rules in more detail. Given structure B and function Foo:

struct B
{
  int b;
  struct A
  {
    int a;
  }sa;
};
void Foo(void)
{
  struct X
  {
    int x;
    struct Y
    {
      int y;
      struct Z
      {
        int z;
      }sz;
    }sx;
  };

  struct B b;  // Ok, B is in scope
  struct A a;  // Ok, A has same scope as B (global scope)
  struct X s1; // Ok, X is in scope
  struct Y s2; // Y has same scope as X (function scope)
  struct Z s3; // Z has same scope as Y (which has same scope as X)
}

we get errors with X, Y, and Z if used within another function:

int main(void)
{
  struct B s1; // Ok
  struct A s2; // A has same scope as B (global)
  struct X s3; // Error, X is not in scope
  struct Y s4; // Error, Y is not in scope
  struct Z s5; // Error, Z is not in scope
  return 0;
}

Structure Alignment

What will be printed out by the following code?
struct Symbol sym1;

printf("sizeof(sym1.kind) =  %i\n", (int)sizeof(sym1.kind));
printf("sizeof(sym1.op)   =  %i\n", (int)sizeof(sym1.op));
printf("sizeof(sym1.ival) =  %i\n", (int)sizeof(sym1.ival));
printf("sizeof(sym1.fval) =  %i\n", (int)sizeof(sym1.fval));
printf("sizeof(sym1.id)   =  %i\n", (int)sizeof(sym1.id));
printf("sizeof(sym1)      = %i\n", (int)sizeof(sym1));
Recall the Symbol structure and diagram:

struct Symbol
{
  enum Kind kind;
  char op;
  int ival;
  float fval;
  char id;
};
The actual output on MS Visual C++ 6.0 is this:

sizeof(sym1.kind) =  4
sizeof(sym1.op)   =  1
sizeof(sym1.ival) =  4
sizeof(sym1.fval) =  4
sizeof(sym1.id)   =  1
sizeof(sym1)      = 20  
But
4 + 1 + 4 + 4 + 1 != 20
What's going on?

So, a more accurate diagram of the Symbol structure would look like this:

which is 20 bytes in size because all data is aligned on 4-byte boundaries. This means that each char member is actually followed by 3 bytes of padding so that the data that follows will be aligned properly. (Note the term "follows".)
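
The amount of padding can also be computed instead of read off a diagram. A sketch, with PaddingBytes as a hypothetical helper; with the sizes shown above it would report 20 - 14 = 6 bytes, but other compilers and settings may differ.

```c
#include <stdio.h>

enum Kind {OPERATOR, INTEGER, FLOAT, IDENTIFIER};

struct Symbol
{
  enum Kind kind;
  char op;
  int ival;
  float fval;
  char id;
};

/* Bytes in the struct that don't belong to any member */
int PaddingBytes(void)
{
  struct Symbol s;
  int members = (int)(sizeof(s.kind) + sizeof(s.op) + sizeof(s.ival) +
                      sizeof(s.fval) + sizeof(s.id));

  return (int)sizeof(s) - members;
}

void TestPadding(void)
{
  printf("sizeof = %i, padding = %i\n",
         (int)sizeof(struct Symbol), PaddingBytes());
}
```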

To change the structure alignment, use this compiler directive:

#pragma pack(n)
where n is the alignment. The n specifies the value, in bytes, to be used for packing. In Microsoft Visual Studio, the default value for n is 8. Valid values are 1, 2, 4, 8, and 16.

The alignment of a member will be on a boundary that is either a multiple of n or a multiple of the size of the member, whichever is smaller.

For example, to align the fields of the Symbol structure on 2-byte boundaries:
#pragma pack(2)    /* align on 2-byte boundaries */
struct Symbol
{
  enum Kind kind;
  char op;
  int ival;
  float fval;
  char id;
};
#pragma pack()     /* restore compiler's default alignment setting */
Now, it would look like this in memory:

To align the fields on 1-byte boundaries:

#pragma pack(1)    /* align on 1-byte boundaries */
struct Symbol
{
  enum Kind kind;
  char op;
  int ival;
  float fval;
  char id;
};
#pragma pack()     /* restore compiler's alignment setting */
Now, it would look like this in memory:

An actual printout from MS VS 6.0:

   #pragma pack(4)               #pragma pack(2)              #pragma pack(1)
--------------------------------------------------------------------------------
     &sym1 = 0012FEDC             &sym1 = 0012FEE0            &sym1 = 0012FEE0
&sym1.kind = 0012FEDC        &sym1.kind = 0012FEE0       &sym1.kind = 0012FEE0
  &sym1.op = 0012FEE0          &sym1.op = 0012FEE4         &sym1.op = 0012FEE4
&sym1.ival = 0012FEE4        &sym1.ival = 0012FEE6       &sym1.ival = 0012FEE5
&sym1.fval = 0012FEE8        &sym1.fval = 0012FEEA       &sym1.fval = 0012FEE9
  &sym1.id = 0012FEEC          &sym1.id = 0012FEEE         &sym1.id = 0012FEED

sizeof(sym1.kind) =  4       sizeof(sym1.kind) =  4      sizeof(sym1.kind) =  4
sizeof(sym1.op)   =  1       sizeof(sym1.op)   =  1      sizeof(sym1.op)   =  1
sizeof(sym1.ival) =  4       sizeof(sym1.ival) =  4      sizeof(sym1.ival) =  4
sizeof(sym1.fval) =  4       sizeof(sym1.fval) =  4      sizeof(sym1.fval) =  4
sizeof(sym1.id)   =  1       sizeof(sym1.id)   =  1      sizeof(sym1.id)   =  1
sizeof(sym1)      = 20       sizeof(sym1)      = 16      sizeof(sym1)      = 14
The code to print the addresses:
struct Symbol sym1;

printf("&sym1 = %p\n", (void *)&sym1);
printf("&sym1.kind = %p\n", (void *)&sym1.kind);
printf("&sym1.op = %p\n", (void *)&sym1.op);
printf("&sym1.ival = %p\n", (void *)&sym1.ival);
printf("&sym1.fval = %p\n", (void *)&sym1.fval);
printf("&sym1.id = %p\n", (void *)&sym1.id);
Note that if we used any of these alignments:

#pragma pack(4)    /* align on 4-byte boundaries */
#pragma pack(8)    /* align on 8-byte boundaries */
#pragma pack(16)   /* align on 16-byte boundaries */
the layout would still look like this:

This is because none of the members of the structure are larger than 4 bytes (so they will never need to be aligned on 8-byte or 16-byte boundaries.)

Notes

The #pragma directive itself is part of ANSI C, but the individual pragmas, including pack, are compiler-dependent. Although most compilers support the pack pragma, different compilers may have different default alignments. Also, MS says that the default alignment for Win32 is 8 bytes, not 4. Consult the documentation for your compiler to determine its behavior.

This is from the top of stdlib.h from MS VC++ 6.0:

#ifdef  _MSC_VER
/*
 * Currently, all MS C compilers for Win32 platforms default to 8 byte
 * alignment.
 */
#pragma pack(push,8)
Given these two logically equivalent structures, what are the ramifications of laying out the members in these ways?

struct BEAVIS
{
  char a;
  double b;
  char c;
  double d;
  char e;
  double f;
  char g;
  double h;
};
struct BUTTHEAD
{
  char a;
  char c;
  char e;
  char g;
  double b;
  double d;
  double f;
  double h;
};
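
One way to explore the question is to let the compiler report the sizes. A sketch only (TestOrdering is a hypothetical name); the numbers depend on the compiler, the platform, and the packing in effect.

```c
#include <stdio.h>

struct BEAVIS
{
  char a;
  double b;
  char c;
  double d;
  char e;
  double f;
  char g;
  double h;
};

struct BUTTHEAD
{
  char a;
  char c;
  char e;
  char g;
  double b;
  double d;
  double f;
  double h;
};

/* Same members, different order: compare the total sizes */
void TestOrdering(void)
{
  printf("sizeof(struct BEAVIS)   = %i\n", (int)sizeof(struct BEAVIS));
  printf("sizeof(struct BUTTHEAD) = %i\n", (int)sizeof(struct BUTTHEAD));
}
```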


Accessing Structures Using Pointer/Offset

Much like arrays, the compiler converts structure.member notation into pointer + offset notation:
structvar.member ==> *([address of structvar] + [offset of member])
So using the Symbol structure example above (if the address of sym1 were 100, ival would be at address 108 and id at address 116):
struct Symbol sym1;

sym1.ival ==> *(&sym1 + 8)
sym1.id   ==> *(&sym1 + 16)
Or more accurately:
sym1.ival ==> *( (int *)( (char *)&sym1 + 8) )
sym1.id   ==> *( (char *)&sym1 + 16 )
Note that the code above assumes structures are aligned on 4-byte boundaries:


Code to print the values of a Symbol structure variable using pointer/offset with 4-byte alignment:

void TestStructOffset4(void)
{
  struct Symbol sym1 = {IDENTIFIER, '+', 123, 3.14F, 'A'};
  char *psym = (char *)&sym1;

  int kind   = *((int *)(psym + 0));    /* 3    */
  char op    = *(psym + 4);             /* '+'  */
  int ival   = *((int *)(psym + 8));    /* 123  */
  float fval = *((float *)(psym + 12)); /* 3.14 */
  char id    = *(psym + 16);            /* 'A'  */

    /* 3, +, 123, 3.140000, A */
  printf("%i, %c, %i, %f, %c\n", kind, op, ival, fval, id);
}

Code to print the values of a Symbol structure variable using pointer/offset with 1-byte alignment:

void TestStructOffset1(void)
{
  struct Symbol sym1 = {IDENTIFIER, '+', 123, 3.14F, 'A'};
  char *psym = (char *)&sym1;

  int kind   = *((int *)(psym + 0));    /* 3    */
  char op    = *(psym + 4);             /* '+'  */
  int ival   = *((int *)(psym + 5));    /* 123  */
  float fval = *((float *)(psym + 9));  /* 3.14 */
  char id    = *(psym + 13);            /* 'A'  */

    /* 3, +, 123, 3.140000, A */
  printf("%i, %c, %i, %f, %c\n", kind, op, ival, fval, id);
}
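
Both functions above hard-code the offsets, so each one only works with one packing setting, and the 1-byte version reads an int and a float from unaligned addresses, which some processors do not allow. A sketch of a variation that lets the compiler supply the offsets with offsetof and copies the bytes out with memcpy; ReadIval and ReadFval are hypothetical helpers.

```c
#include <stddef.h> /* offsetof */
#include <string.h> /* memcpy   */
#include <stdio.h>

enum Kind {OPERATOR, INTEGER, FLOAT, IDENTIFIER};

struct Symbol
{
  enum Kind kind;
  char op;
  int ival;
  float fval;
  char id;
};

/* Read ival through a char pointer without assuming a layout */
int ReadIval(const struct Symbol *sym)
{
  const char *psym = (const char *)sym;
  int ival;

  memcpy(&ival, psym + offsetof(struct Symbol, ival), sizeof(ival));
  return ival;
}

/* Same idea for fval */
float ReadFval(const struct Symbol *sym)
{
  const char *psym = (const char *)sym;
  float fval;

  memcpy(&fval, psym + offsetof(struct Symbol, fval), sizeof(fval));
  return fval;
}

void TestStructOffsetAny(void)
{
  struct Symbol sym1 = {IDENTIFIER, '+', 123, 3.14F, 'A'};

  printf("%i, %f\n", ReadIval(&sym1), ReadFval(&sym1));
}
```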


Example Using BITMAPFILEHEADER

Given these definitions:
typedef unsigned short WORD;
typedef unsigned long DWORD;

typedef struct tagBITMAPFILEHEADER { 
  WORD    bfType;       /* 2 bytes */
  DWORD   bfSize;       /* 4 bytes */
  WORD    bfReserved1;  /* 2 bytes */
  WORD    bfReserved2;  /* 2 bytes */ 
  DWORD   bfOffBits;    /* 4 bytes */
} BITMAPFILEHEADER, *PBITMAPFILEHEADER; 
And this function:
void PrintBitmapHeader(BITMAPFILEHEADER *header)
{
  printf("Type: %c%c (%04X)\n", header->bfType & 0xFF, 
                                header->bfType >> 8, 
                                header->bfType);
  printf("Size: %lu (%08lX)\n", header->bfSize, header->bfSize);
  printf("Res1: %u (%04X)\n", header->bfReserved1, header->bfReserved1);
  printf("Res2: %u (%04X)\n", header->bfReserved2, header->bfReserved2);
  printf("Offs: %lu (%08lX)\n", header->bfOffBits, header->bfOffBits);
}
What should this program display? (Hint: the size of the file is 207,158 bytes, the offset to the bitmap itself is 1078 bytes, and the two reserved fields are 0.)
int main(void)
{
  BITMAPFILEHEADER header;
  FILE *fp = fopen("foo.bmp", "rb");
  assert(fp);

  fread(&header, sizeof(BITMAPFILEHEADER), 1, fp);
  PrintBitmapHeader(&header);
  fclose(fp);
}
Given the "hint" above, the expected output should be:
Type: BM (4D42)
Size: 207158 (00032936)
Res1: 0 (0000)
Res2: 0 (0000)
Offs: 1078 (00000436)
However, the actual output is:
Type: BM (4D42)
Size: 3 (00000003)
Res1: 0 (0000)
Res2: 1078 (0436)
Offs: 2621440 (00280000)
Why is this incorrect?


The actual bytes in the bitmap file look like this:

42 4D 36 29 03 00 00 00 00 00 36 04 00 00 28 00 . . . . 
Separated by fields it looks like this:
 Type      Size       Res1    Res2      Offset       Other stuff
 
42 4D | 36 29 03 00 | 00 00 | 00 00 | 36 04 00 00 | 28 00 . . . . 
And the BITMAPFILEHEADER structure in memory looks like this:

Why is the structure aligned like this? This means that:
  sizeof(BITMAPFILEHEADER) == 16

Reading the header with the code:

fread(&header, sizeof(BITMAPFILEHEADER), 1, fp);
causes the first 16 bytes (sizeof(BITMAPFILEHEADER)) of the file to be read into the buffer (memory pointed to by &header), which yields:

Which gives the values we saw (adjusting for little-endian):
Member             Hex       Decimal
---------------------------------------
bfType              4D42       19778   
bfSize          00000003           3
bfReserved1         0000           0
bfReserved2         0436        1078
bfOffBits       00280000     2621440
Again, the correct output should be:
Type: BM (4D42)
Size: 207158
Res1: 0
Res2: 0
Offs: 1078


To achieve the correct results, we need to pack the structure:

#pragma pack(2)
typedef struct tagBITMAPFILEHEADER { 
  WORD    bfType;       /* 2 bytes */
  DWORD   bfSize;       /* 4 bytes */
  WORD    bfReserved1;  /* 2 bytes */
  WORD    bfReserved2;  /* 2 bytes */ 
  DWORD   bfOffBits;    /* 4 bytes */
} BITMAPFILEHEADER, *PBITMAPFILEHEADER; 
#pragma pack()
Now, the structure in memory looks like this:

and:
  sizeof(BITMAPFILEHEADER) == 14
so now when we read in 14 bytes, the structure is filled like this:

which gives the correct values:
Member             Hex       Decimal
---------------------------------------
bfType              4D42       19778   
bfSize          00032936      207158
bfReserved1         0000           0
bfReserved2         0000           0
bfOffBits       00000436        1078
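
Packing the structure is not the only option. A sketch of an alternative that does not depend on any pragma: read the 14 header bytes into an unsigned char array and assemble each little-endian field by hand. The names BmpFileHeader, Get16, Get32, and ParseBmpHeader are invented for this example and are not part of any Windows header.

```c
#include <stdio.h>

/* Hypothetical struct for this sketch; its layout in memory doesn't matter */
struct BmpFileHeader
{
  unsigned int type;
  unsigned long size;
  unsigned int reserved1;
  unsigned int reserved2;
  unsigned long offBits;
};

/* Combine 2 little-endian bytes into one value */
static unsigned int Get16(const unsigned char *p)
{
  return (unsigned int)p[0] | ((unsigned int)p[1] << 8);
}

/* Combine 4 little-endian bytes into one value */
static unsigned long Get32(const unsigned char *p)
{
  return (unsigned long)p[0]         | ((unsigned long)p[1] << 8) |
         ((unsigned long)p[2] << 16) | ((unsigned long)p[3] << 24);
}

/* raw must point to the first 14 bytes of the file */
void ParseBmpHeader(const unsigned char *raw, struct BmpFileHeader *h)
{
  h->type      = Get16(raw + 0);  /* bfType      */
  h->size      = Get32(raw + 2);  /* bfSize      */
  h->reserved1 = Get16(raw + 6);  /* bfReserved1 */
  h->reserved2 = Get16(raw + 8);  /* bfReserved2 */
  h->offBits   = Get32(raw + 10); /* bfOffBits   */
}
```

In use, the caller would fread 14 bytes into an unsigned char buffer and pass it to ParseBmpHeader. The result is the same regardless of the compiler's alignment settings or the byte order of the machine.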
The actual structure in wingdi.h looks like this:
#include <pshpack2.h>
typedef struct tagBITMAPFILEHEADER {
        WORD    bfType;
        DWORD   bfSize;
        WORD    bfReserved1;
        WORD    bfReserved2;
        DWORD   bfOffBits;
} BITMAPFILEHEADER, FAR *LPBITMAPFILEHEADER, *PBITMAPFILEHEADER;
#include <poppack.h>
and pshpack2.h looks like this:
#if ! (defined(lint) || defined(_lint) || defined(RC_INVOKED))
#if ( _MSC_VER >= 800 ) || defined(_PUSHPOP_SUPPORTED)
#pragma warning(disable:4103)
#if !(defined( MIDL_PASS )) || defined( __midl )
#pragma pack(push)
#endif
#pragma pack(2)
#else
#pragma pack(2)
#endif
#endif // ! (defined(lint) || defined(_lint) || defined(RC_INVOKED))

Addresses and values at different pack values:

#pragma pack(1)              #pragma pack(2)              #pragma pack(4)
-----------------------------------------------------------------------------------
bfType = 0012FEE0            bfType = 0012FEE0            bfType = 0012FEE0
bfSize = 0012FEE2            bfSize = 0012FEE2            bfSize = 0012FEE4
bfRes1 = 0012FEE6            bfRes1 = 0012FEE6            bfRes1 = 0012FEE8
bfRes2 = 0012FEE8            bfRes2 = 0012FEE8            bfRes2 = 0012FEEA
bfOffs = 0012FEEA            bfOffs = 0012FEEA            bfOffs = 0012FEEC
Type: BM (4D42)              Type: BM (4D42)              Type: BM (4D42)
Size: 207158 (00032936)      Size: 207158 (00032936)      Size: 3 (00000003)
Res1: 0 (0000)               Res1: 0 (0000)               Res1: 0 (0000)
Res2: 0 (0000)               Res2: 0 (0000)               Res2: 1078 (0436)
Offs: 1078 (00000436)        Offs: 1078 (00000436)        Offs: 2621440 (00280000)

Bit Fields in Structures

Suppose we wanted to track these attributes of some object:
// Variables for each attribute of some object
// Comments represent the range of values for the attribute
unsigned char level;   // 0 - 3
unsigned char power;   // 0 - 63
unsigned short range;  // 0 - 1023
unsigned char armor;   // 0 - 15
unsigned short health; // 0 - 511
unsigned char grade;   // 0 - 1
Given the sizes of each data type, we could say that the minimum amount of memory required to hold these attributes is 8 bytes. However, on a 32-bit computer, it's possible that the amount of memory required could actually be 24 bytes, depending on where these variables exist. Why?

Declared local to a function: (on the stack)

Microsoft:
Address of level = 0012FF28
Address of power = 0012FF24
Address of range = 0012FF20
Address of armor = 0012FF1C
Address of health = 0012FF18
Address of grade = 0012FF14

GNU:
Address of level = 0x22F047
Address of power = 0x22F046
Address of range = 0x22F044
Address of armor = 0x22F043
Address of health = 0x22F040
Address of grade = 0x22F03F

Borland:
Address of level = 0012FF83
Address of power = 0012FF82
Address of range = 0012FF80
Address of armor = 0012FF7F
Address of health = 0012FF7C
Address of grade = 0012FF7B

Declared globally:

Microsoft:
Address of level = 004310BE
Address of power = 004310BC
Address of range = 004312FC
Address of armor = 004310BD
Address of health = 00431142
Address of grade = 004310BF

GNU:
Address of level = 0x405030
Address of power = 0x406060
Address of range = 0x406050
Address of armor = 0x406080
Address of health = 0x407110
Address of grade = 0x407160

Borland:
Address of level = 004122E0
Address of power = 004122E1
Address of range = 004122E2
Address of armor = 004122E4
Address of health = 004122E6
Address of grade = 004122E8

Our first attempt to save memory is to put them in a structure:

// Put into a struct
typedef struct
{
  unsigned char level;   // 0 - 3
  unsigned char power;   // 0 - 63
  unsigned short range;  // 0 - 1023
  unsigned char armor;   // 0 - 15
  unsigned short health; // 0 - 511
  unsigned char grade;   // 0 - 1
}ENTITY_ATTRS;

What are the memory requirements for this struct? Of course, it depends on how the compiler is packing structures. Given the default pack value in MSVC, the layout looks like this:

What about this structure:

// Put into a struct and pack
#pragma pack(1)    /* align on 1-byte boundaries */
typedef struct
{
  unsigned char level;   // 0 - 3
  unsigned char power;   // 0 - 63
  unsigned short range;  // 0 - 1023
  unsigned char armor;   // 0 - 15
  unsigned short health; // 0 - 511
  unsigned char grade;   // 0 - 1
}ENTITY_ATTRS;
#pragma pack()
This code yields a layout like this:

Of course, looking closer, we realize that we only need 32 bits for all 6 variables, so we'll just use an unsigned integer to store the values:

To set the fields to these values:

level = 3;     //  2 bits wide
power = 32;    //  6 bits wide
range = 1000;  // 10 bits wide
armor = 7;     //  4 bits wide
health = 300;  //  9 bits wide
grade = 1;     //  1 bit wide
We can use "simple" bit manipulation:
unsigned int attrs;

attrs = 3u << 30;             // set level to 3 (unsigned: 3 << 30 overflows a signed int)
attrs = attrs | (32 << 24);   // set power to 32
attrs = attrs | (1000 << 14); // set range to 1000
attrs = attrs | (7 << 10);    // set armor to 7
attrs = attrs | (300 << 1);   // set health to 300
attrs = attrs | 1;            // set grade to 1
After shifting, we OR all of the values together:
Left shifts            Binary 
-----------------------------------------------
   3 << 30     11000000000000000000000000000000
  32 << 24       100000000000000000000000000000
1000 << 14             111110100000000000000000  
   7 << 10                       01110000000000
 300 <<  1                           1001011000
         1                                    1
-----------------------------------------------         
               11100000111110100001111001011001                                                     
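
Getting the values back out takes the reverse steps: shift the field down to bit 0, then mask off the bits above it. A sketch assuming a 32-bit unsigned int, as the text does; PackAttrs and the Get functions are hypothetical names.

```c
#include <stdio.h>

/* Pack the six attributes into one 32-bit value (same shifts as above) */
unsigned int PackAttrs(unsigned int level, unsigned int power,
                       unsigned int range, unsigned int armor,
                       unsigned int health, unsigned int grade)
{
  return (level << 30) | (power << 24) | (range << 14) |
         (armor << 10) | (health << 1) | grade;
}

/* Shift the field down to bit 0, then mask to its width */
unsigned int GetLevel(unsigned int attrs)  { return (attrs >> 30) & 0x3;   } /*  2 bits */
unsigned int GetPower(unsigned int attrs)  { return (attrs >> 24) & 0x3F;  } /*  6 bits */
unsigned int GetRange(unsigned int attrs)  { return (attrs >> 14) & 0x3FF; } /* 10 bits */
unsigned int GetArmor(unsigned int attrs)  { return (attrs >> 10) & 0xF;   } /*  4 bits */
unsigned int GetHealth(unsigned int attrs) { return (attrs >> 1) & 0x1FF;  } /*  9 bits */
unsigned int GetGrade(unsigned int attrs)  { return attrs & 0x1;           } /*  1 bit  */

void TestAttrs(void)
{
  unsigned int attrs = PackAttrs(3, 32, 1000, 7, 300, 1);

  printf("%08X\n", attrs);
  printf("%u %u %u %u %u %u\n", GetLevel(attrs), GetPower(attrs),
         GetRange(attrs), GetArmor(attrs), GetHealth(attrs), GetGrade(attrs));
}
```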
Of course, there's got to be a better way...
// Use bitfields for the attributes
typedef struct
{
  unsigned int level  :  2; // 0 - 3
  unsigned int power  :  6; // 0 - 63
  unsigned int range  : 10; // 0 - 1023
  unsigned int armor  :  4; // 0 - 15
  unsigned int health :  9; // 0 - 511
  unsigned int grade  :  1; // 0 - 1
}ENTITY_ATTRS_B;
The size of the structure above is 4 bytes (on the compilers used here), the same size as the unsigned integer used before. However, this structure allows for a much cleaner syntax:
ENTITY_ATTRS_B attrs;

  // Easier to read, understand, and self-documenting
attrs.level = 3;
attrs.power = 32;
attrs.range = 1000;
attrs.armor = 7;
attrs.health = 300;
attrs.grade = 1;
Much like a lot of syntax in C, the compiler is doing the work for you behind-the-scenes.
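
One thing the compiler does not do is range checking. A sketch of what happens when a value is too large for an unsigned bit field: only the low-order bits that fit are kept. StoreLevel is a hypothetical helper.

```c
#include <stdio.h>

typedef struct
{
  unsigned int level  :  2; /* 0 - 3    */
  unsigned int power  :  6; /* 0 - 63   */
  unsigned int range  : 10; /* 0 - 1023 */
  unsigned int armor  :  4; /* 0 - 15   */
  unsigned int health :  9; /* 0 - 511  */
  unsigned int grade  :  1; /* 0 - 1    */
}ENTITY_ATTRS_B;

/* Store value in the 2-bit level field and return what was kept */
unsigned int StoreLevel(unsigned int value)
{
  ENTITY_ATTRS_B attrs = {0};

  attrs.level = value; /* only the low 2 bits are stored */
  return attrs.level;
}

void TestBitfieldRange(void)
{
  printf("%u %u %u\n", StoreLevel(3), StoreLevel(4), StoreLevel(5));
}
```

Also note that a bit field has no address of its own, so &attrs.level will not compile.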

Notes


Advanced Structures using Function Pointers

Suppose we want to create a structure to represent employees and write client code such as this:
#include "Employee_c.h"

int main(void)
{
  Employee *emp1 = NewEmployee("Mickey", "Mouse", 60000, 15);
  Employee *emp2 = NewEmployee("Donald", "Duck", 50000, 10);
  emp1->Display(emp1);
  emp2->Display(emp2);
  emp1->SetSalary(emp1, 85000);
  emp2->SetYears(emp2, 15);
  emp1->Display(emp1);
  emp2->Display(emp2);
}
We can assume the output might look something like this:
  Name: Mouse, Mickey
Salary: $60000
 Years: 15
  Name: Duck, Donald
Salary: $50000
 Years: 10
  Name: Mouse, Mickey
Salary: $85000
 Years: 15
  Name: Duck, Donald
Salary: $50000
 Years: 15
What do you notice about the program above?

We start with a header file named Employee_c.h that looks like this:

typedef struct Employee Employee;

struct Employee {
  void *data;
  void (*SetFirstName)(Employee *, const char *firstname);
  void (*SetLastName)(Employee *, const char *lastname);
  void (*SetName)(Employee *, const char *first, const char *last);
  void (*SetSalary)(Employee *, float salary);
  void (*SetYears)(Employee *, int years);
  const char *(*GetFirstName)(Employee *);
  const char *(*GetLastName)(Employee *);
  float (*GetSalary)(Employee *);
  int (*GetYears)(Employee *);
  void (*Display)(Employee *);
};

Employee *NewEmployee(const char *first, const char *last, float salary, int years);
How could we code the implementation to support the program above?

After studying the program, implement the functions in a file named Employee_c.c.
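
As a starting point, here is a sketch of the same pattern on a much smaller, made-up type. Counter, CounterData, NewCounter, and DeleteCounter are all hypothetical and are not part of the Employee exercise; the point is only to show how the function pointers and the hidden data pointer fit together.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct Counter Counter;

struct Counter {
  void *data;                   /* hidden state, like Employee's data */
  void (*Increment)(Counter *);
  int (*GetCount)(Counter *);
};

/* Private to the .c file: the real data behind the void pointer */
struct CounterData {
  int count;
};

/* static keeps these functions invisible outside this file */
static void Increment(Counter *self)
{
  struct CounterData *d = self->data;
  d->count++;
}

static int GetCount(Counter *self)
{
  struct CounterData *d = self->data;
  return d->count;
}

/* Allocate the object and its data, then hook up the function pointers */
Counter *NewCounter(int start)
{
  Counter *c = malloc(sizeof(Counter));
  struct CounterData *d = malloc(sizeof(struct CounterData));

  if (c == NULL || d == NULL)
  {
    free(c); /* free(NULL) is harmless */
    free(d);
    return NULL;
  }

  d->count = start;
  c->data = d;
  c->Increment = Increment;
  c->GetCount = GetCount;
  return c;
}

/* Release both allocations made by NewCounter */
void DeleteCounter(Counter *c)
{
  if (c != NULL)
  {
    free(c->data);
    free(c);
  }
}
```

Client code would then call c->Increment(c) and c->GetCount(c), passing the object explicitly as the first argument, just as the Employee program does with emp1->Display(emp1).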