Arrays

"If the code and the comments disagree, then both are probably wrong." -- Norm Schryer

One-Dimensional Arrays

Given the code below, what are the types of a, b, c, d, e, f and g?

short a;         // a scalar variable
short *b;        // a pointer (scalar)
short c[10];     // an array variable
short d[10];     // an array variable
short (*e)[5];   // a pointer (scalar) to an array
short (*f)[10];  // a pointer (scalar) to an array
short * const g = c; // a constant pointer (scalar)

For each statement below, indicate which will not compile (error), compile with a warning, or compile cleanly. For each error or warning, describe the compiler's complaint.

a = c[0]; b = c; b = &c[0]; b = &c;

c = b; c = d; e = c; e = &c; e = f;

g = c; *g = 5;

The value of an array name is the address of the first element of the array. In most expressions the name is converted to a pointer to that element.
The type of c is an array of short, but in most expressions we can think of it as a constant pointer to a short.
You can't assign one array to another. (An array name is not a modifiable l-value.) You must use a loop and assign each element individually.
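
As a minimal sketch of the loop mentioned in the last point, the helper below copies one array of short into another. The name copy_shorts and its signature are hypothetical, chosen only for illustration. The standard memcpy function is another common way to do the same job.

```c
#include <stddef.h>

/* Copies count elements from src to dest, one element at a time.
   (copy_shorts is a hypothetical helper used only for illustration.) */
void copy_shorts(short dest[], const short src[], size_t count)
{
  size_t i;
  for (i = 0; i < count; i++)
    dest[i] = src[i];
}
```

With the declarations above, copy_shorts(c, d, 10) does what c = d cannot.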

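The sizeof operator gives the size of its operand in bytes. For an array, that is the size of the entire array. (The values below assume a 2-byte short and 4-byte pointers.)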

sizeof(a) == 2      // a scalar
sizeof(b) == 4      // a scalar
sizeof(c) == 20     // an array
sizeof(d) == 20     // an array
sizeof(e) == 4      // a scalar
sizeof(f) == 4      // a scalar
sizeof(g) == 4      // a scalar
sizeof(c[0]) == 2   // a scalar
sizeof(&c[0]) == 4  // a scalar
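
Because these numbers depend on the platform, code usually computes them instead of hard-coding them. A minimal sketch, assuming a hypothetical ARRAY_COUNT macro (not part of the original notes) that divides the size of the whole array by the size of one element:

```c
#include <stddef.h>

/* Number of elements in a true array. This does NOT work on a pointer,
   because sizeof(pointer) is the size of the pointer itself.
   (ARRAY_COUNT is a hypothetical name used for illustration.) */
#define ARRAY_COUNT(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Returns the number of elements in a local array of 10 shorts. */
size_t count_of_c(void)
{
  short c[10] = {0};
  return ARRAY_COUNT(c);  /* 10, whatever the size of a short is */
}
```

The macro gives a meaningful answer only where the array's declaration is visible.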

The Basic Rule:

array[i] == *(array + i)

where:

array is an array of any type
i is any integer expression

This means:

An array reference is just a pointer and an offset.
The pointer is usually the (base) address of the array.
The offset is scaled by the size of the element type. (Pointer arithmetic)

The compiler converts all array references to a pointer/offset, so:

void f(void)
{
  int array[] = {5, 10, 15, 20, 25};
  int index = 3;

  printf("%i\n", array[index]); // 20
  printf("%i\n", index[array]); // ???
  printf("%i\n", 3[array]);     // ???
}

What's going on in the second and third printf statements? (Besides job security)

According to the Basic Rule:

array[index] ==> *(array + index) ==> *(array + index)
index[array] ==>                      *(index + array) 
3[array]     ==>                      *(  3   + array)

Don't forget that due to C's built-in pointer arithmetic, the addition (e.g. array + 3) is scaled by the size of the element. In terms of byte addresses, assuming a 4-byte int:

&array[3] ==> (address of array) + 3 * sizeof(int) ==> (address of array) + 12 bytes
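
The Basic Rule and the scaling can both be checked in code. The sketch below uses two hypothetical helpers, subscripts_agree and byte_offset, which are illustrative names and not part of the original notes:

```c
#include <stddef.h>

/* Returns 1 if every spelling of the subscript names the same element. */
int subscripts_agree(const int *array, int index)
{
  return array[index] == *(array + index) &&
         index[array] == *(index + array) &&
         &array[index] == &index[array];
}

/* Distance in bytes between element i and the start of the array. */
size_t byte_offset(const int *array, int i)
{
  return (size_t)((const char *)(array + i) - (const char *)array);
}
```

For any valid index i, byte_offset(array, i) equals i * sizeof(int), which is the scaling that pointer arithmetic applies automatically.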

Relationship Between Subscripts and Pointers

Using the rule to convert a subscript to a pointer/offset, we get:

 a[i] ==> *(a + i)
&a[i] ==> &(*(a + i))
&a[i] ==> &*(a + i)
&a[i] ==> a + i

This shows that the address of any element is just the base address of the array plus the index (scaled).

char a[] = "abcdef";
char *p = a;

printf("%p, %p, %p, %p, %p\n", a, a + 2, &*(a + 2), p + 2, &*(p + 2));

Output:

0012FED4, 0012FED6, 0012FED6, 0012FED6, 0012FED6
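
Because the address of an element is the base address plus the index, subtracting the base address from a pointer to an element recovers the index. A small sketch, assuming a hypothetical index_of helper (not part of the original notes):

```c
#include <stddef.h>

/* Returns the index of the first occurrence of ch in the n-char array a,
   or -1 if ch does not occur. */
ptrdiff_t index_of(const char *a, size_t n, char ch)
{
  const char *p;
  for (p = a; p < a + n; p++)  /* a + n is one past the last element */
    if (*p == ch)
      return p - a;            /* pointer difference is counted in elements */
  return -1;
}
```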

Other equivalences:

 a[i] ==> *(a + i)
 a[0] ==> *(a + 0)
 a[0] ==> *a
&a[0] ==> &*a
&a[0] ==> a

These calls are equivalent:

f(*a);    // pass first element of array
f(a[0]);  // pass first element of array

and so are these:

f(&a[0]); // pass address of first element
f(a);     // pass address of first element (the array name converts to a pointer)

Given this declaration:

int array[] = {2, 4, 6, 8};

The following expressions are all equivalent:

      array = 0012FF1C
    &*array = 0012FF1C
  &*&*array = 0012FF1C
&*&*&*array = 0012FF1C

      array[0] = 2
    *&array[0] = 2
  *&*&array[0] = 2
*&*&*&array[0] = 2

Pointer Expressions and Arrays

Given this code:

int a[10] = {5, 8, 3, 2, 1, 9, 0, 4, 7, 6};
int *p = a + 2;

(Diagrams omitted: an abstract view of a and p, and the same layout with concrete, arbitrary addresses.)

Self-check: Give the l-value and r-value of each expression, as well as an equivalent expression using a. (Some expressions are illegal l-values.)

p
p[0]
*p
p + 3
*p + 5
*(p + 6)
p[6]
&p
p[-1]
p[9]

Given these definitions:

#define SIZE 10000000
int x[SIZE];
int y[SIZE];
int i;
int *p1, *p2;

Which loop is more efficient? Why?

for (i = 0; i < SIZE; i++)
  x[i] = y[i];

for (p1 = x, p2 = y; p1 - x < SIZE; )
  *p1++ = *p2++;

for (p1 = x, p2 = y; p1 < &x[SIZE]; )
  *p1++ = *p2++;

register int *p1, *p2;
for (p1 = x, p2 = y; p1 < &x[SIZE]; )
  *p1++ = *p2++;

Passing Arrays to Functions

The following two prototypes are equivalent:

int string_length(const char *string);
int string_length(const char string[]);

The body of the function can be implemented as:

{
  int len = 0;
  while (*string++)
    len++;
  return len;
}

Points:

This is a "feature" of the C language. Dennis Ritchie, one of the developers of the language, uses the terms "historical accident" and "mistake" when describing this feature in the Critique section of his paper "The Development of the C Language".
A pointer to the first element, not the array, is actually passed to the function.
sizeof(string) is the size of the pointer (4 bytes on a 32-bit system), not the size of the array.
The array notation may help document the code.
You can even put an integer in the brackets, but it will be ignored.
No space is allocated for the array, only the pointer.
Because only a pointer is passed, it is impossible to determine the size of the array that was passed (or even if an array is passed).
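
Since the function cannot recover the size of the array, a common approach is to pass the element count as a second argument. A minimal sketch, assuming a hypothetical sum_shorts function (the name and signature are illustrative only):

```c
#include <stddef.h>

/* Inside the function, values is a pointer, so the caller must supply
   the number of elements. */
long sum_shorts(const short values[], size_t count)
{
  long total = 0;
  size_t i;
  for (i = 0; i < count; i++)
    total += values[i];
  return total;
}
```

The caller, who can still see the array declaration, computes the count, for example with sizeof(c) / sizeof(c[0]).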

Self-check:

Why doesn't the following compile?
Remove the offending code so it compiles.
What does the remaining code display?

void TestSizeof(char *a, char b[], char c[10], char *d[10])
{
  char *aa, bb[], cc[10], *dd[10];

  printf("sizeof(a) = %zu\n", sizeof(a));
  printf("sizeof(b) = %zu\n", sizeof(b));
  printf("sizeof(c) = %zu\n", sizeof(c));
  printf("sizeof(d) = %zu\n", sizeof(d));
  printf("sizeof(aa) = %zu\n", sizeof(aa));
  printf("sizeof(bb) = %zu\n", sizeof(bb));
  printf("sizeof(cc) = %zu\n", sizeof(cc));
  printf("sizeof(dd) = %zu\n", sizeof(dd));
}

Array Initialization

Static vs. Automatic:

void some_function(void)
{
  int array1[5] = {1, 2, 3, 4, 5};        // on the stack
  static int array2[5] = {1, 2, 3, 4, 5}; // not on the stack
  ...
}

Partial:

int array3[5] = {1, 2, 3};          // 1, 2, 3, 0, 0
int array4[5] = {1, 2, 3, 4, 5, 6}; // error: too many initializers

Automatic sizing:

int array5[] = {1, 2, 3};          // size of array5 is 3 elements
int array6[] = {1, 2, 3, 4, 5, 6}; // size of array6 is 6 elements

Character Array Initialization:

char s1[] = {'H', 'e', 'l', 'l', 'o'};    // array of 5 chars
char s2[] = {'H', 'e', 'l', 'l', 'o', 0}; // array of 6 chars

char s3[] = "Hello"; // array of 6 chars; 5 + NUL terminator
char *s4 = "Hello";  // pointer to a string literal of 6 chars; 5 + NUL terminator
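
The difference between the brace form and the string-literal form shows up in sizeof and strlen. A small sketch; the three function names are hypothetical and exist only to make the values easy to check:

```c
#include <stddef.h>
#include <string.h>

/* Size of an array initialized from 5 character constants: no terminator. */
size_t size_of_s1(void)
{
  char s1[] = {'H', 'e', 'l', 'l', 'o'};
  return sizeof(s1);   /* 5 */
}

/* Size of an array initialized from a string literal: terminator included. */
size_t size_of_s3(void)
{
  char s3[] = "Hello";
  return sizeof(s3);   /* 6 */
}

/* strlen counts the characters before the terminator. */
size_t length_of_s3(void)
{
  char s3[] = "Hello";
  return strlen(s3);   /* 5 */
}
```

Passing s1 to strlen would be an error, because s1 has no terminator for strlen to find.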

Arrays of Pointers

Given the declarations:

char *strings[] = {"First", "Second", "Third", "Fourth", NULL};
char **ppstr = strings;

Draw a diagram that describes the declarations above. Assume:

Symbol       Address
--------------------
strings       1000
ppstr         2000
"First"        100
"Second"       108
"Third"        116
"Fourth"       124

What does the following code display?

printf("%c\n", **ppstr);  
printf("%s\n", *ppstr);   
printf("%c\n", *ppstr[0]);
printf("%s\n", &**ppstr); 

++*ppstr;                
printf("%c\n", **ppstr); 
printf("%s\n", *ppstr);  
  
(*++ppstr)++; 
(*ppstr)++;   
printf("%c\n", **ppstr); 
printf("%s\n", *ppstr);  

ppstr += 2;  
*ppstr += 4; 
printf("%c\n", **ppstr); 
printf("%s\n", *ppstr);

After the code above executes, the code below is executed. What is the output? (The code modifies string literals, which is undefined behavior, so on some systems this will fail. Assume it will work as expected here.)

*++*--ppstr = 'r';            /* ??? */
*(*ppstr += 2) = 'a';         /* ??? */
printf("%s\n", (*ppstr - 3)); /* ??? */

Multidimensional Arrays

An array with more than one dimension is called a multidimensional array.

int matrix[5][10];  // array of 5 arrays of 10 int; a 5x10 array of int

Building up multidimensional arrays:

int a;            // int
int b[10];        // array of 10 int
int c[5][10];     // array of 5 arrays of 10 int
int d[3][5][10];  // array of 3 arrays of 5 arrays of 10 int

int e[10][5][3];  // array of 10 arrays of 5 arrays of 3 int

Storage order
Arrays in C are stored in row major order. This means that the rightmost subscript varies the most rapidly.
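
One way to see the row-major order is to copy the bytes of a 2D array into a one-dimensional array and look at the order in which the values arrive. A minimal sketch, assuming a hypothetical flatten_3x4 helper (not part of the original notes):

```c
#include <string.h>

/* Copies a 3x4 array into a flat buffer of 12 doubles, preserving the
   order in which the elements are stored in memory. */
void flatten_3x4(double points[3][4], double flat[12])
{
  memcpy(flat, points, sizeof(double) * 3 * 4);
}
```

After the call, flat[row * 4 + column] holds the same value as points[row][column], so the elements of row 0 come first, then row 1, then row 2.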

Given the declaration of points:

double points[3][4];

An array of 3 arrays of 4 doubles
A 3x4 array of doubles

(Diagrams omitted: points drawn as 3 rows of 4 doubles, the same drawing with details, and the contiguous layout as it really is in memory, drawn both vertically and horizontally.)

Giving concrete values to the 2D array of doubles will help visualize the arrays. Note how the initialization syntax helps us visualize the "array of arrays" notion:

double points[3][4] = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}};

or even formatted as a 3x4 matrix:

double points[3][4] = { 
                        {1.0,  2.0,  3.0,  4.0}, 
                        {5.0,  6.0,  7.0,  8.0}, 
                        {9.0, 10.0, 11.0, 12.0}
                      };

Omitting the inner braces is also legal C, but some compilers warn about the missing braces:

  double points[3][4] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0};

Some expressions involving points:

Addresses                             Type
-------------------------------------------------------------
points        = 0012F5A4    An array of 3 arrays of 4 doubles
&points       = 0012F5A4    A pointer to an array of 3 arrays of 4 doubles
points[0]     = 0012F5A4    An array of 4 doubles
&points[0]    = 0012F5A4    A pointer to an array of 4 doubles
*points       = 0012F5A4    An array of 4 doubles
&points[0][0] = 0012F5A4    A pointer to a double

Contents
------------------------
**points      = 1.000000
*points[0]    = 1.000000
points[0][0]  = 1.000000

Sizes
-------------------------
sizeof(points)       = 96
sizeof(*points)      = 32
sizeof(**points)     =  8
sizeof(points[0])    = 32
sizeof(points[0][0]) =  8

Self-check:

Given this declaration:

double points[3][4];

Which of the following statements are valid? Which produce warnings or errors? (You need to look at the types on both sides of the assignment.)


 double *pd0 = points[0][1];  
 double *pd1 = &points[0][1]; 
 double *pd2 = &points[5][7]; 
 double *pd3 = &**points;     
 double (*p4d0)[4] = points[0]; 
 double (*p4d1)[4] = &points[0];
 double (*p4d2)[4] = (double (*)[4])points[0];    
 double (*p4d3)[4] = (double (*)[4])&points[0][0];
 double (*p4d4)[] = &points[0]; 
 double (*p4d5)[3][4] = points; 
 double (*p4d6)[3][4] = &points;
 double (*p4d7)[][4] = &points; 
 double (*p4d8)[][] = &points;

Self-check:

Given this code:

double points[3][4] = { {1.0,  2.0,  3.0,  4.0}, 
                        {5.0,  6.0,  7.0,  8.0}, 
                        {9.0, 10.0, 11.0, 12.0}};
                        
double *pd = &points[0][1];
double (*p4d)[4] = &points[0];

What does this code display?

pd = &points[0][1];
printf("First value is %g\n", *pd);
printf("Second value is %g\n", *++pd);
printf("Third value is %g\n\n", *++pd);

What does this code display?

p4d = &points[0];
printf("First value is %g\n", **p4d);
printf("Second value is %g\n", **++p4d);
printf("Third value is %g\n", **++p4d);

And this?

pd = &points[0][1];
printf("First value is %g\n", *pd);
printf("Second value is %g\n", ++*pd);
printf("Third value is %g\n", ++*pd);
printf("points[0][1] is %g\n", points[0][1]);

Using a simple pointer to walk across the row boundaries of a multidimensional static array is known as flattening the array. Technically, the Standard does not define the behavior once the pointer moves past the end of a row. It usually works, but you should be aware that you are relying on non-standard behavior.

Accessing Elements in a 2-D Array

short matrix[3][8]; /* 24 shorts, 3x8 array */

matrix

  matrix[0]
*(matrix + 0)
   *matrix

  matrix[1]
*(matrix + 1)

  matrix[2]
*(matrix + 2)

     matrix[1][2]
*(*(matrix + 1) + 2)

Remember the rule:

array[i] == *(array + i)

where:

array is an array of any type
i is any integer expression

With multidimensional arrays, the rule becomes:

array[i][j] == *(*(array + i) + j)
array[i][j][k] == *(*(*(array + i) + j) + k)
etc...

Pointer arithmetic is used to locate each element. (Base address + Offset)
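
The two-dimensional form of the rule can be checked directly. The sketch below reads the same element three ways; forms_agree is a hypothetical name used only for illustration:

```c
/* Reads matrix[i][j] in three equivalent ways.
   Returns 1 if all three reads produce the same value. */
int forms_agree(short matrix[][8], int i, int j)
{
  short by_subscript = matrix[i][j];
  short by_pointer   = *(*(matrix + i) + j);
  short mixed        = (*(matrix + i))[j];
  return by_subscript == by_pointer && by_pointer == mixed;
}
```

Here matrix + i advances by whole rows (arrays of 8 shorts), and the inner + j advances by single shorts within that row.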

Given this declaration:

short matrix[3][8];

The value of sizeof varies with the argument:

Sizes
-------------------------
sizeof(matrix)       = 48   ; entire matrix
sizeof(matrix[0])    = 16   ; first row
sizeof(matrix[1])    = 16   ; second row
sizeof(matrix[0][0]) = 2    ; first short element

Passing 2D Arrays to Functions

Putting values in the matrix and printing it:

Fill3x8Matrix(matrix);  /* Put values in the matrix */
Print3x8Matrix(matrix); /* Print the matrix         */

Implementations:

void Fill3x8Matrix(short matrix[][8])
{
  int i, j;
  for (i = 0; i < 3; i++)
    for (j = 0; j < 8; j++)
      matrix[i][j] = i * 8 + j + 1; 
}

void Print3x8Matrix(short matrix[][8])
{
  int i, j;
  for (i = 0; i < 3; i++)
  {
    for (j = 0; j < 8; j++)
      printf("%i ", matrix[i][j]);
    printf("\n"); /* end each row with a newline */
  }
}

These functions could have specified the parameters this way:

void Fill3x8Matrix(short (*matrix)[8]);
void Print3x8Matrix(short (*matrix)[8]);

void Fill3x8Matrix(short matrix[3][8]);
void Print3x8Matrix(short matrix[3][8]);

Why are they not declared like this?

void Fill3x8Matrix(short matrix[][]);
void Print3x8Matrix(short matrix[][]);

The compiler needs to know the size of each element in each dimension. It doesn't need to (and can't) know the number of elements in the first dimension. The size of each element in the first dimension is determined by the other dimensions and the type of the elements.

void Test(int a[], int b[][6], int c[][3][5])
{
  printf("a = %p, b = %p, c = %p\n", (void *)a, (void *)b, (void *)c);
  a++;
  b++;
  c++;
  printf("a = %p, b = %p, c = %p\n", (void *)a, (void *)b, (void *)c);
}

Output:
a = 0012FEE8, b = 0012FF38, c = 0012FEFC  
a = 0012FEEC, b = 0012FF50, c = 0012FF38

In decimal:

Output:
a = 1244904, b = 1244984, c = 1244924
a = 1244908, b = 1245008, c = 1244984

The function Test is equivalent to this:

void Test(int *a, int (*b)[6], int (*c)[3][5])
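
The differences between the two lines of output (4, 24 and 60 bytes) are the sizes of what each pointer points to: one int, an array of 6 ints, and a 3x5 array of ints. A small sketch that measures those steps directly; step_a, step_b and step_c are hypothetical helper names, and the results are expressed in terms of sizeof(int) so that no particular int size is assumed:

```c
#include <stddef.h>

/* Bytes that each kind of parameter advances when incremented once. */
size_t step_a(int a[])
{
  return (size_t)((char *)(a + 1) - (char *)a);  /* one int */
}

size_t step_b(int b[][6])
{
  return (size_t)((char *)(b + 1) - (char *)b);  /* one array of 6 ints */
}

size_t step_c(int c[][3][5])
{
  return (size_t)((char *)(c + 1) - (char *)c);  /* one 3x5 array of ints */
}
```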

Other methods for filling the matrix use explicit pointer arithmetic:

void Fill3x8Matrix(short matrix[][8])
{
  int i, j;
  for (i = 0; i < 3; i++)
    for (j = 0; j < 8; j++)
      *(*(matrix + i) + j) = i * 8 + j + 1; 
}

void Fill3x8Matrix(short matrix[][8])
{
  int i, j;
  for (i = 0; i < 3; i++)
  {
    short *pmat = *(matrix + i);
    for (j = 0; j < 8; j++)
      *pmat++ = i * 8 + j + 1;
  }
}

How does the compiler calculate the address (offset) for the element below?

matrix[1][2];

Using address offsets we get:

&matrix[1][2] ==> &*(*(matrix + 1) + 2) ==> *(matrix + 1) + 2

First dimension - Each element of matrix is an array of 8 shorts, so each element is 16 bytes.
Second dimension - Each element of each element of matrix is a short, so it's 2 bytes.
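
Putting the two dimensions together, the byte offset of matrix[1][2] is 1 * 16 + 2 * 2 = 20 with 2-byte shorts. The sketch below compares that hand calculation with what the compiler does; manual_offset and actual_offset are hypothetical names used only for illustration:

```c
#include <stddef.h>

/* Byte offset computed by hand: whole rows first, then shorts within the row. */
size_t manual_offset(size_t row, size_t col)
{
  return row * sizeof(short[8]) + col * sizeof(short);
}

/* Byte offset of matrix[row][col] as the compiler computes it. */
size_t actual_offset(short matrix[][8], size_t row, size_t col)
{
  return (size_t)((char *)&matrix[row][col] - (char *)matrix);
}
```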

Given these declarations:

short matrix[3][8];
short array[10];

We can calculate the size of any portion:

Expression             Meaning                 Size (bytes)
-----------------------------------------------------------
array                Entire array                  20
array[]              Element in 1st dimension       2
matrix               Entire array                  48
matrix[]             Element in 1st dimension      16
matrix[][]           Element in 2nd dimension       2

Recap:

The compiler needs to know the size of each of the elements, in each dimension.
The size of each dimension depends on the fundamental type (int, double, etc.) of the array, so the element type implicitly supplies part of the size information.
In a two-dimensional array, knowing the size of the second dimension (number of columns) and the data type of the array is sufficient to perform pointer arithmetic on the first dimension.
This seemingly convoluted way of locating array elements is required since memory is physically laid out in one dimension. The multiple dimension syntax (e.g. [][]) is just a convenience for the programmer.

Dynamically Allocated 2D Arrays

Recall the 2D points static array and how a dynamically allocated array would look:

double points[3][4];

double *pd = malloc(3 * 4 * sizeof(double));

Given a row and column:

int row = 1, column = 2;
double value;

The static 2D array can be accessed using subscripts, but the dynamic "2D array" can only be indexed with a single subscript.
```
value = points[row][column]; /* OK      */
value = pd[row][column];     /* ILLEGAL */
```
We (the programmers) have to do all of the arithmetic to locate an element using two subscripts:
```
value = pd[row * 4 + column];
```

The compiler is still doing some of the work for us:

value = *(address-stored-in-pd + (row * 4 + column) * sizeof(double));

What does the number 4 in the above calculations represent?
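
The arithmetic can be wrapped in a small helper so that it is written only once. A minimal sketch, assuming a hypothetical element_at function whose cols parameter is the number of doubles in each row:

```c
#include <stddef.h>

/* Returns a pointer to element (row, col) of a flat block that stores
   rows of cols doubles each, one row after another.
   (element_at is a hypothetical helper used only for illustration.) */
double *element_at(double *pd, size_t cols, size_t row, size_t col)
{
  return pd + row * cols + col;
}
```

With this helper, *element_at(pd, 4, row, column) reads the same element as pd[row * 4 + column].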

If we want to use two subscripts on a dynamic 2D array, we have to set things up a little differently.

Using these definitions (the same allocation as above, with named constants):

#define ROWS 3
#define COLS 4
double *pd = malloc(ROWS * COLS * sizeof(double));

Create a variable that is a pointer to a pointer to a double:

double **ppd;

Allocate an array of 3 (ROWS) pointers to doubles and point ppd at it:

ppd = malloc(ROWS * sizeof(double *));

Point each element of ppd at an array of 4 doubles:

ppd[0] = pd;
ppd[1] = pd + 4;
ppd[2] = pd + 8;

Of course, for a large array, or an array whose size is not known at compile time, you would want to set these in a loop:

int row;
for (row = 0; row < ROWS; row++)
  ppd[row] = pd + (COLS * row);

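(Diagram omitted: ppd points to an array of 3 pointers, each pointing to the start of one row of 4 doubles within the block that pd points to.)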

Given a row and column, we can access elements through the single pointer or double pointer variable:

int row = 1, column = 3;
double value;

  /* Access via double pointer using subscripting */
value = ppd[row][column];            

  /* Access via single pointer using pointer arithmetic        */
  /* and/or subscripting. These statements are all equivalent. */
value = pd[row * COLS + column];
value = *(pd + row * COLS + column);
value = (pd + row * COLS)[column];
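
The setup steps above can be collected into a pair of functions. This is a minimal sketch under stated assumptions: alloc_matrix and free_matrix are hypothetical names, rows is assumed to be greater than 0, and the block of doubles is reached again through the first row pointer when it is time to free it.

```c
#include <stdlib.h>

/* Allocates rows * cols doubles in one block, plus an array of row
   pointers into that block. Returns NULL if either allocation fails. */
double **alloc_matrix(size_t rows, size_t cols)
{
  double *data = malloc(rows * cols * sizeof *data);
  double **table = malloc(rows * sizeof *table);
  size_t row;

  if (data == NULL || table == NULL)
  {
    free(data);   /* free(NULL) is harmless */
    free(table);
    return NULL;
  }

  for (row = 0; row < rows; row++)
    table[row] = data + cols * row;  /* start of each row */

  return table;
}

/* Releases both allocations made by alloc_matrix. */
void free_matrix(double **table)
{
  if (table != NULL)
  {
    free(table[0]);  /* the block of doubles */
    free(table);     /* the array of row pointers */
  }
}
```

The result can be indexed with two subscripts, for example m[row][column], and must be released with free_matrix rather than a single call to free.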

If you wanted to let the compiler do the work, you could do this:

#define ROWS 3
#define COLS 4
double *pd = malloc(ROWS * COLS * sizeof(double));

  /* ppd is a pointer to an array of 4 doubles (the cast on the right is needed) */
double (*ppd)[COLS] = (double (*)[COLS]) pd;
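
A variation on the same idea, sketched with a hypothetical Row typedef and alloc_rows function: because malloc returns void *, no cast is needed when its result is used directly as a pointer to an array of COLS doubles.

```c
#include <stdlib.h>

#define COLS 4

/* One row of the matrix: an array of COLS doubles.
   (Row and alloc_rows are hypothetical names used only for illustration.) */
typedef double Row[COLS];

/* Allocates rows rows of COLS doubles as a single block.
   Returns NULL on failure; release the block with free(). */
Row *alloc_rows(size_t rows)
{
  return malloc(rows * sizeof(Row));
}
```

The returned pointer accepts two subscripts, as in m[row][column], and the compiler scales the first subscript by sizeof(Row). The number of columns must be a compile-time constant for this to work in this form.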