Arrays

"If the code and the comments disagree, then both are probably wrong." -- Norm Schryer

One-Dimensional Arrays

Given the code below, what are the types of a, b, c, d, e, f and g?
short a;         // a scalar variable
short *b;        // a pointer (scalar) variable
short c[10];     // an array variable
short d[10];     // an array variable
short (*e)[5];   // a pointer (scalar) to an array
short (*f)[10];  // a pointer (scalar) to an array
short * const g = c; // a constant pointer (scalar)
For each statement below, indicate which will not compile (error), compile with a warning, or compile cleanly. For each error or warning, describe the compiler's complaint.

a = c[0];  

b = c;     
b = &c[0]; 
b = &c;    
c = b;     
c = d;     

e = c;     
e = &c;    
e = f;     
g = c;     
*g = 5;    


The Basic Rule:
array[i] == *(array + i)
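where array is an array (or a pointer) and i is an integer expression. Because addition is commutative, the rule also gives a meaning to odd-looking expressions such as index[array] and 3[array]. What's going on there? (Besides job security)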

According to the Basic Rule:

array[index] ==> *(array + index) ==> *(index + array)
index[array] ==>                      *(index + array) 
3[array]     ==>                      *(  3   + array)
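Don't forget that due to C's built-in pointer arithmetic, the addition (e.g. array + 3) is scaled by the size of the element type. For an array of int, with 4-byte ints:
array[3] ==> *(array + 3) ==> the int that starts 3 * sizeof(int) = 12 bytes past the start of array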

Relationship Between Subscripts and Pointers

Using the rule to convert a subscript to a pointer/offset, we get:
 a[i] ==> *(a + i)
&a[i] ==> &(*(a + i))
&a[i] ==> &*(a + i)
&a[i] ==> a + i
This shows that the address of any element is just the base address of the array plus the index (scaled).
char a[] = "abcdef";
char *p = a;

printf("%p, %p, %p, %p, %p\n", a, a + 2, &*(a + 2), p + 2, &*(p + 2));
Output:
0012FED4, 0012FED6, 0012FED6, 0012FED6, 0012FED6
Other equivalences:
 a[i] ==> *(a + i)
 a[0] ==> *(a + 0)
 a[0] ==> *a
&a[0] ==> &*a
&a[0] ==> a
These calls are equivalent:
f(*a);    // pass first element of array
f(a[0]);  // pass first element of array
and so are these:
f(&a[0]); // pass address of first element
f(a);     // pass address of first element (a decays to &a[0])
Given this declaration:
int array[] = {2, 4, 6, 8};
The following expressions are all equivalent (shown with the values they print; the address will vary):
      array = 0012FF1C
    &*array = 0012FF1C
  &*&*array = 0012FF1C
&*&*&*array = 0012FF1C

      array[0] = 2
    *&array[0] = 2
  *&*&array[0] = 2
*&*&*&array[0] = 2

Pointer Expressions and Arrays

Given this code:
int a[10] = {5, 8, 3, 2, 1, 9, 0, 4, 7, 6};
int *p = a + 2;
Diagram (p points at a[2]):

      a[0] a[1] a[2] a[3] a[4] a[5] a[6] a[7] a[8] a[9]
a:   |  5 |  8 |  3 |  2 |  1 |  9 |  0 |  4 |  7 |  6 |
                  ^
                  p   (p == a + 2)

Self-check: Give the l-value and r-value of each expression below, as well as an equivalent expression using a. (Some of the expressions are not valid l-values.)

  1. p
  2. p[0]
  3. *p
  4. p + 3
  5. *p + 5
  6. *(p + 6)
  7. p[6]
  8. &p
  9. p[-1]
  10. p[9]
Given these definitions:
#define SIZE 10000000
int x[SIZE];
int y[SIZE];
int i;
int *p1, *p2;
Which loop is more efficient? Why?
  1. for (i = 0; i < SIZE; i++) x[i] = y[i];
  2. for (p1 = x, p2 = y; p1 - x < SIZE; ) *p1++ = *p2++;
  3. for (p1 = x, p2 = y; p1 < &x[SIZE]; ) *p1++ = *p2++;
  4. register int *p1, *p2; for (p1 = x, p2 = y; p1 < &x[SIZE]; ) *p1++ = *p2++;

Passing Arrays to Functions

The following two prototypes are equivalent:
int string_length(const char *string);
int string_length(const char string[]);
The body of the function can be implemented as:
{
  int len = 0;
  while (*string++)
    len++;
  return len;
}
Self-check: What happens when TestSizeof is compiled and called? What does each printf display?
void TestSizeof(char *a, char b[], char c[10], char *d[10])
{
  char *aa, bb[], cc[10], *dd[10];

  printf("sizeof(a) = %zu\n", sizeof(a));
  printf("sizeof(b) = %zu\n", sizeof(b));
  printf("sizeof(c) = %zu\n", sizeof(c));
  printf("sizeof(d) = %zu\n", sizeof(d));
  printf("sizeof(aa) = %zu\n", sizeof(aa));
  printf("sizeof(bb) = %zu\n", sizeof(bb));
  printf("sizeof(cc) = %zu\n", sizeof(cc));
  printf("sizeof(dd) = %zu\n", sizeof(dd));
}

Array Initialization

Static vs. Automatic:
void some_function(void)
{
  int array1[5] = {1, 2, 3, 4, 5};        // automatic: on the stack, initialized on every call
  static int array2[5] = {1, 2, 3, 4, 5}; // static: not on the stack, initialized once before the program starts
  /* ... */
}


Partial:

int array3[5] = {1, 2, 3};          // 1, 2, 3, 0, 0
int array4[5] = {1, 2, 3, 4, 5, 6}; // too many initializers: an error in some compilers, a warning in others (e.g. gcc)
Automatic sizing:
int array5[] = {1, 2, 3};          // size of array5 is 3 elements
int array6[] = {1, 2, 3, 4, 5, 6}; // size of array6 is 6 elements
Character Array Initialization:
char s1[] = {'H', 'e', 'l', 'l', 'o'};    // array of 5 chars
char s2[] = {'H', 'e', 'l', 'l', 'o', 0}; // array of 6 chars

char s3[] = "Hello"; // array of 6 chars; 5 + null terminator ('\0')
char *s4 = "Hello";  // pointer to a string literal of 6 chars; 5 + null terminator (do not modify it)



Arrays of Pointers

Given the declarations:
char *strings[] = {"First", "Second", "Third", "Fourth", NULL};
char **ppstr = strings;
Draw a diagram that describes the declarations above. Assume:
Symbol       Address
--------------------
strings       1000
ppstr         2000
"First"        100
"Second"       108
"Third"        116
"Fourth"       124

What does the following code display?

printf("%c\n", **ppstr);  
printf("%s\n", *ppstr);   
printf("%c\n", *ppstr[0]);
printf("%s\n", &**ppstr); 

++*ppstr;                
printf("%c\n", **ppstr); 
printf("%s\n", *ppstr);  
  
(*++ppstr)++; 
(*ppstr)++;   
printf("%c\n", **ppstr); 
printf("%s\n", *ppstr);  

ppstr += 2;  
*ppstr += 4; 
printf("%c\n", **ppstr); 
printf("%s\n", *ppstr);  



After the code above executes, the code below is executed. What is the output? (This code writes into string literals, which is undefined behavior in C; on many systems it will crash. Assume it works as expected here.)

*++*--ppstr = 'r';            /* ??? */
*(*ppstr += 2) = 'a';         /* ??? */
printf("%s\n", (*ppstr - 3)); /* ??? */

Multidimensional Arrays

An array with more than one dimension is called a multidimensional array.
int matrix[5][10];  // array of 5 arrays of 10 int; a 5x10 array of int
Building up multidimensional arrays:
int a;            // int
int b[10];        // array of 10 int
int c[5][10];     // array of 5 arrays of 10 int
int d[3][5][10];  // array of 3 arrays of 5 arrays of 10 int

int e[10][5][3];  // array of 10 arrays of 5 arrays of 3 int
Storage order
Arrays in C are stored in row major order. This means that the rightmost subscript varies the most rapidly.

Given the declaration of points:

double points[3][4];
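We could diagram the arrays as 3 rows, each an array of 4 doubles:

points[0]:  [0][0]  [0][1]  [0][2]  [0][3]
points[1]:  [1][0]  [1][1]  [1][2]  [1][3]
points[2]:  [2][0]  [2][1]  [2][2]  [2][3]

In memory, though, the 12 doubles are contiguous, one row right after another:

[0][0] [0][1] [0][2] [0][3] [1][0] [1][1] [1][2] [1][3] [2][0] [2][1] [2][2] [2][3]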

Giving concrete values to the 2D array of doubles will help visualize the arrays. Note how the initialization syntax helps us visualize the "array of arrays" notion:
double points[3][4] = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}};
or even formatted as a 3x4 matrix:
double points[3][4] = { 
                        {1.0,  2.0,  3.0,  4.0}, 
                        {5.0,  6.0,  7.0,  8.0}, 
                        {9.0, 10.0, 11.0, 12.0}
                      };
This is also legal C (the inner braces may be omitted and the values fill the rows in order), but some compilers warn about missing braces:
  double points[3][4] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0};
Some expressions involving points:
Expression      Address     Type
-------------------------------------------------------------
points        = 0012F5A4    An array of 3 arrays of 4 doubles
&points       = 0012F5A4    A pointer to an array of 3 arrays of 4 doubles
points[0]     = 0012F5A4    An array of 4 doubles
&points[0]    = 0012F5A4    A pointer to an array of 4 doubles
*points       = 0012F5A4    An array of 4 doubles
&points[0][0] = 0012F5A4    A pointer to a double

Contents
------------------------
**points      = 1.000000
*points[0]    = 1.000000
points[0][0]  = 1.000000

Sizes
-------------------------
sizeof(points)       = 96
sizeof(*points)      = 32
sizeof(**points)     =  8
sizeof(points[0])    = 32
sizeof(points[0][0]) =  8


Self-check:

Given this declaration:
double points[3][4];
which of these statements are valid, and which produce warnings or errors? (Look at the types on both sides of each assignment.)
  1. double *pd0 = points[0][1];
  2. double *pd1 = &points[0][1];
  3. double *pd2 = &points[5][7];
  4. double *pd3 = &**points;
  5. double (*p4d0)[4] = points[0];
  6. double (*p4d1)[4] = &points[0];
  7. double (*p4d2)[4] = (double (*)[4])points[0];
  8. double (*p4d3)[4] = (double (*)[4])&points[0][0];
  9. double (*p4d4)[] = &points[0];
  10. double (*p4d5)[3][4] = points;
  11. double (*p4d6)[3][4] = &points;
  12. double (*p4d7)[][4] = &points;
  13. double (*p4d8)[][] = &points;


Self-check:

Given this code:
double points[3][4] = { {1.0,  2.0,  3.0,  4.0}, 
                        {5.0,  6.0,  7.0,  8.0}, 
                        {9.0, 10.0, 11.0, 12.0}};
                        
double *pd = &points[0][1];
double (*p4d)[4] = &points[0];
What does this code display?
pd = &points[0][1];
printf("First value is %g\n", *pd);
printf("Second value is %g\n", *++pd);
printf("Third value is %g\n\n", *++pd);
What does this code display?
p4d = &points[0];
printf("First value is %g\n", **p4d);
printf("Second value is %g\n", **++p4d);
printf("Third value is %g\n", **++p4d);
And this?
pd = &points[0][1];
printf("First value is %g\n", *pd);
printf("Second value is %g\n", ++*pd);
printf("Third value is %g\n", ++*pd);
printf("points[0][1] is %g\n", points[0][1]);

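Using a simple pointer (such as a double *) to walk across the row boundaries of a multidimensional static array is known as flattening the array. Strictly speaking, the Standard does not allow it: a pointer into one row may only be moved within that row (or to one past its end). It usually works in practice, but be aware that you are relying on behavior the Standard does not guarantee.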


Accessing Elements in a 2-D Array

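short matrix[3][8]; /* 24 shorts, 3x8 array */

matrix                                           the entire 3x8 array
matrix[0]     ==  *(matrix + 0)  ==  *matrix     first row (an array of 8 shorts)
matrix[1]     ==  *(matrix + 1)                  second row
matrix[2]     ==  *(matrix + 2)                  third row
matrix[1][2]  ==  *(*(matrix + 1) + 2)           one short: row 1, column 2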
Remember the rule:
array[i] == *(array + i)
where array is an array (or a pointer) and i is an integer expression. With multidimensional arrays, the rule is applied once per subscript:
array[i][j] == *(*(array + i) + j)
array[i][j][k] == *(*(*(array + i) + j) + k)
etc...
Pointer arithmetic is used to locate each element. (Base address + Offset)

Given this declaration:

short matrix[3][8];
The value of sizeof varies with the argument:
Sizes
-------------------------
sizeof(matrix)       = 48   ; entire matrix
sizeof(matrix[0])    = 16   ; first row
sizeof(matrix[1])    = 16   ; second row
sizeof(matrix[0][0]) = 2    ; first short element

Passing 2D Arrays to Functions

Putting values in the matrix and printing it:
Fill3x8Matrix(matrix);  /* Put values in the matrix */
Print3x8Matrix(matrix); /* Print the matrix         */
Implementations:
void Fill3x8Matrix(short matrix[][8])
{
  int i, j;
  for (i = 0; i < 3; i++)
    for (j = 0; j < 8; j++)
      matrix[i][j] = i * 8 + j + 1; 
}

void Print3x8Matrix(short matrix[][8])
{
  int i, j;
  for (i = 0; i < 3; i++)
    for (j = 0; j < 8; j++)
      printf("%i ", matrix[i][j]);
  printf("\n");
}
These functions could have specified the parameters this way:
void Fill3x8Matrix(short (*matrix)[8]);
void Print3x8Matrix(short (*matrix)[8]);
or
void Fill3x8Matrix(short matrix[3][8]);
void Print3x8Matrix(short matrix[3][8]);
Why are they not declared like this?:
void Fill3x8Matrix(short matrix[][]);
void Print3x8Matrix(short matrix[][]);

The compiler needs to know the size of each element in each dimension. It doesn't need to know (and, since the parameter is really a pointer, can't know) the number of elements in the first dimension. The size of each element in the first dimension is determined by the other dimensions and the type of the elements.

void Test(int a[], int b[][6], int c[][3][5])
{
  printf("a = %p, b = %p, c = %p\n", (void *)a, (void *)b, (void *)c);
  a++;
  b++;
  c++;
  printf("a = %p, b = %p, c = %p\n", (void *)a, (void *)b, (void *)c);
}

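Output:
a = 0012FEE8, b = 0012FF38, c = 0012FEFC
a = 0012FEEC, b = 0012FF50, c = 0012FF38
In decimal, which makes the increments easier to see (4, 24, and 60 bytes, i.e. sizeof(int), 6 * sizeof(int), and 3 * 5 * sizeof(int) with 4-byte ints):
a = 1244904, b = 1244984, c = 1244924
a = 1244908, b = 1245008, c = 1244984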
The function Test is equivalent to this:
void Test(int *a, int (*b)[6], int (*c)[3][5])
Other methods for filling the matrix use explicit pointer arithmetic:
void Fill3x8Matrix(short matrix[][8])
{
  int i, j;
  for (i = 0; i < 3; i++)
    for (j = 0; j < 8; j++)
      *(*(matrix + i) + j) = i * 8 + j + 1; 
}

void Fill3x8Matrix(short matrix[][8])
{
  int i, j;
  for (i = 0; i < 3; i++)
  {
    short *pmat = *(matrix + i);
    for (j = 0; j < 8; j++)
      *pmat++ = i * 8 + j + 1;
  }
}
How does the compiler calculate the address (offset) for the element below?
matrix[1][2];

Using address offsets we get:
&matrix[1][2] ==> &*(*(matrix + 1) + 2) ==> *(matrix + 1) + 2
  1. First dimension - Each element of matrix is an array of 8 shorts, so each element is 16 bytes.
  2. Second dimension - Each element of each element of matrix is a short, so it's 2 bytes.
So &matrix[1][2] is 1 * 16 + 2 * 2 = 20 bytes past the start of matrix.
Given these declarations:
short matrix[3][8];
short array[10];
We can calculate the size of any portion:
Expression             Meaning                 Size (bytes)
-----------------------------------------------------------
array                Entire array                  20
array[]              Element in 1st dimension       2
matrix               Entire array                  48
matrix[]             Element in 1st dimension      16
matrix[][]           Element in 2nd dimension       2

Dynamically Allocated 2D Arrays

Recall the 2D points static array and how a dynamically allocated array would look:

double points[3][4];


double *pd = malloc(3 * 4 * sizeof(double));

Given a row and column:

int row = 1, column = 2;
double value;
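With the static array we can simply write points[row][column]. With pd we have to compute the offset ourselves, as in pd[row * 4 + column]. If we want to use two subscripts on a dynamic 2D array, we have to set things up a little differently.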


Using these definitions from above:

#define ROWS 3
#define COLS 4
double *pd = malloc(ROWS * COLS * sizeof(double));
Create a variable that is a pointer to a pointer to a double:
double **ppd;
Allocate an array of 3 (ROWS) pointers to doubles and point ppd at it:
ppd = malloc(ROWS * sizeof(double *));

Point each element of ppd at an array of 4 doubles:
ppd[0] = pd;
ppd[1] = pd + 4;
ppd[2] = pd + 8;
Of course, for a large array, or an array whose size is not known at compile time, you would want to set these in a loop:
int row;
for (row = 0; row < ROWS; row++)
  ppd[row] = pd + (COLS * row);
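This yields the diagram:

ppd --> ppd[0] --> pd[0]  pd[1]  pd[2]  pd[3]
        ppd[1] --> pd[4]  pd[5]  pd[6]  pd[7]
        ppd[2] --> pd[8]  pd[9]  pd[10] pd[11]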

Given a row and column, we can access elements through the single pointer or double pointer variable:
int row = 1, column = 3;
double value;

  /* Access via double pointer using subscripting */
value = ppd[row][column];            

  /* Access via single pointer using pointer arithmetic        */
  /* and/or subscripting. These statements are all equivalent. */
value = pd[row * COLS + column];
value = *(pd + row * COLS + column);
value = (pd + row * COLS)[column];
If you wanted to let the compiler do the work, you could do this:
#define ROWS 3
#define COLS 4
double *pd = malloc(ROWS * COLS * sizeof(double));

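  /* ppd is a pointer to an array of COLS doubles. The cast on the right */
  /* is needed because pd is a double *, not a pointer to an array.      */
double (*ppd)[COLS] = (double (*)[COLS]) pd;

value = ppd[row][column];  /* same element as pd[row * COLS + column] */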