Arrays

Code	Output
int i; / uninitialized / float f; / uninitialized / double d; / uninitialized / printf("sizeof(i) is %u\n", sizeof(i)); printf("sizeof(int) is %u\n", sizeof(int)); printf("sizeof(42) is %u\n", sizeof(42)); printf("\n"); printf("sizeof(f) is %u\n", sizeof(f)); printf("sizeof(float) is %u\n", sizeof(float)); printf("sizeof(42.0F) is %u\n", sizeof(42.0F)); printf("\n"); printf("sizeof(d) is %u\n", sizeof(d)); printf("sizeof(double) is %u\n", sizeof(double)); printf("sizeof(42.0) is %u\n", sizeof(42.0)); printf("\n");	sizeof(i) is 4 sizeof(int) is 4 sizeof(42) is 4 sizeof(f) is 4 sizeof(float) is 4 sizeof(42.0F) is 4 sizeof(d) is 8 sizeof(double) is 8 sizeof(42.0) is 8

Note that the sizeof operator doesn't use the values of the variables. It's just looking at the types. (Which is why using uninitialized variables is not a problem.)

Because all of the values in an array have the same type, an array is called a homogeneous data structure. Incidentally, there is also an aggregate data type that consists of multiple values that are not the same type (heterogeneous) called a struct (structure).

Code	Output
int touchdowns[10]; float distances[20]; double temperatures[2]; printf("sizeof(touchdowns) is %u\n", sizeof(touchdowns)); printf("sizeof(distances) is %u\n", sizeof(distances)); printf("sizeof(temperatures) is %u\n", sizeof(temperatures));	sizeof(touchdowns) is 40 sizeof(distances) is 80 sizeof(temperatures) is 16

Assigning values to each element	Printing out the values
int a[10]; int i; for (i = 0; i < 10; i++) a[i] = i * 2;	for (i = 0; i < 10; i++) printf("%i ", a[i]); Output: 0 2 4 6 8 10 12 14 16 18

Code	Possible memory layout
int i; int a[10]; / initialize elements WRONG / for (i = 0; i <= 10; i++) a[i] = 0;

`for`	`while`	`while` (compact)
int i; int a[10]; for (i = 0; i < 10; i++) a[i] = 0;	int i = 0; int a[10]; while (i < 10) { a[i] = 0; i++; }	int i = 0; int a[10]; while (i < 10) a[i++] = 0;

Example: Given a date in the form of month/day, print out the day of the year. For example:

This example works as expected (It finds the largest element in an array of integers). A pointer to the first element is passed to the function find_largest, along with the size (number of elements) in the array. Since the function has a pointer to the first element, it can access every element of the array using the subscript operator:

Now, the function is modifying the original data (not a copy). It seems that passing a pointer to the data, rather than a copy, could be dangerous if the function does something we don't expect.


int main(void) { int a[] = {4, 5, 3, 9, 5, 2, 7, 6}; int largest; printf("Array before:\n"); print_array(a, 8); largest = find_largest(a, 8); printf("Largest value: %i\n", largest); printf("Array after:\n"); print_array(a, 8); return 0; } Output: Array before: 4 5 3 9 5 2 7 6 Largest value: 9 Array after: 4 5 3 9 5 2 7 6	/ Assumes at least one element in the array / int find_largest(int a[], int size) { int i; int max = a[0]; / assume first is largest / for (i = 1; i < size; i++) if (a[i] > max) max = a[i]; / found a larger one / return max; }

Passing a pointer to a function allows the function to modify our data even if we don't want it to:


/ Modifies the array the was passed in!! / int find_largest_BAD(int a[], int size) { int i; int max = a[0]; / assume 1st is largest / a[0] = 0; / change first element! / for (i = 1; i < size; i++) { if (a[i] > max) max = a[i]; / found a larger one / a[i] = 0; / set element to 0!! / } return max; }	Output: Array before: 4 5 3 9 5 2 7 6 Largest value: 9 Array after: 0 0 0 0 0 0 0 0

Multidimensional Arrays

An array with more than one dimension is called a multidimensional array.

int matrix[5][10];  /* array of 5 arrays of 10 int; a 5x10 array of int */

Building up multidimensional arrays:

int a;            /* int                                     */
int b[10];        /* array of 10 int                         */
int c[5][10];     /* array of 5 arrays of 10 int             */
int d[3][5][10];  /* array of 3 arrays of 5 arrays of 10 int */

int e[10][5][3];  /* array of 10 arrays of 5 arrays of 3 int */

Storage order
Arrays in C are stored in row major order. This means that the rightmost subscript varies the most rapidly.

Given the declaration of points:

double points[3][4];

An array of 3 arrays of 4 doubles
A 3x4 array of doubles

We could diagram the arrays like this:

With details:

Or draw it contiguously (as it really is in memory):

 

Or horizontally:

Giving concrete values to the 2D array of doubles will help visualize the arrays. Note how the initialization syntax helps us visualize the "array of arrays" notion:

double points[3][4] = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}};

or even formatted as a 3x4 matrix:

double points[3][4] = { 
                        {1.0,  2.0,  3.0,  4.0}, 
                        {5.0,  6.0,  7.0,  8.0}, 
                        {9.0, 10.0, 11.0, 12.0}
                      };

Diagram:

Some expressions involving points:

Addresses                             Type
-------------------------------------------------------------
points        = 0012F5A4    An array of 3 arrays of 4 doubles
&points       = 0012F5A4    A pointer to an array of 3 arrays of 4 doubles
points[0]     = 0012F5A4    An array of 4 doubles
&points[0]    = 0012F5A4    A pointer to an array of 4 doubles
*points       = 0012F5A4    An array of 4 doubles
&points[0][0] = 0012F5A4    A pointer to a double

Contents
------------------------
**points      = 1.000000
*points[0]    = 1.000000
points[0][0]  = 1.000000

Sizes
-------------------------
sizeof(points)       = 96
sizeof(*points)      = 32
sizeof(**points)     =  8
sizeof(points[0])    = 32
sizeof(points[0][0]) =  8

Code to display above tables.

Accessing Elements in a 2-D Array

short matrix[3][8]; /* 24 shorts, 3x8 array */

matrix

  matrix[0]
*(matrix + 0)
   *matrix

  matrix[1]
*(matrix + 1)

  matrix[2]
*(matrix + 2)

     matrix[1][2]
*(*(matrix + 1) + 2)

Remember the rule:

array[i] == *(array + i)

where:

array is an array of any type
i is any integer expression

With multidimensional arrays, the rule becomes:

array[i][j] == *(*(array + i) + j)
array[i][j][k] == *(*(*(array + i) + j) + k)
etc...

Pointer arithmetic is used to locate each element. (Base address + Offset)

Given this declaration:

short matrix[3][8];

The value of sizeof varies with the argument:

Sizes
-------------------------
sizeof(matrix)       = 48   ; entire matrix
sizeof(matrix[0])    = 16   ; first row
sizeof(matrix[1])    = 16   ; second row
sizeof(matrix[0][0]) = 2    ; first short element

Passing 2D Arrays to Functions

Putting values in the matrix and printing it:

Fill3x8Matrix(matrix);  /* Put values in the matrix */
Print3x8Matrix(matrix); /* Print the matrix         */

Implementations:

void Fill3x8Matrix(short matrix[][8])
{
  int i, j;
  for (i = 0; i < 3; i++)
    for (j = 0; j < 8; j++)
      matrix[i][j] = i * 8 + j + 1; 
}

void Print3x8Matrix(short matrix[][8])
{
  int i, j;
  for (i = 0; i < 3; i++)
    for (j = 0; j < 8; j++)
      printf("%i ", matrix[i][j]);
  printf("\n");
}

These functions could have specified the parameters this way: (precedence chart)

void Fill3x8Matrix(short (*matrix)[8])
void Print3x8Matrix(short (*matrix)[8])

Why are they not declared like this?:

void Fill3x8Matrix(short matrix[][]);
void Print3x8Matrix(short matrix[][]);

The compiler needs to know the size of each element in each dimension. It doesn't need to (and can't) know the number of elements in the first dimension. The size of each element in the first dimension is determined by the other dimensions and the type of the elements.

void Test(int a[], int b[][6], int c[][3][5])
{
  printf("a = %p, b = %p, c = %p\n", (void *)a, (void *)b, (void *)c);
  a++;
  b++;
  c++;
  printf("a = %p, b = %p, c = %p\n", (void *)a, (void *)b, (void *)c);
}

Output:
a = 0012FEE8, b = 0012FF38, c = 0012FEFC  
a = 0012FEEC, b = 0012FF50, c = 0012FF38

In decimal:

Output:
a = 1244904, b = 1244984, c = 1244924
a = 1244908, b = 1245008, c = 1244984

The function Test is equivalent to this:

void Test(int *a, int (*b)[6], int (*c)[3][5])

Other methods for filling the matrix use explicit pointer arithmetic:

void Fill3x8Matrix(short matrix[][8])
{
  int i, j;
  for (i = 0; i < 3; i++)
    for (j = 0; j < 8; j++)
      *(*(matrix + i) + j) = i * 8 + j + 1; 
}

void Fill3x8Matrix(short matrix[][8])
{
  int i, j;
  for (i = 0; i < 3; i++)
  {
    short *pmat = *(matrix + i);
    for (j = 0; j < 8; j++)
      *pmat++ = i * 8 + j + 1;
  }
}

Assuming this definition:

short matrix[3][8]; /* 24 shorts, 3x8 array */

and then filled with values:

How does the compiler calculate the address (offset) for the element below?

matrix[1][2];

Using address offsets we get:

&matrix[1][2] ==> &*(*(matrix + 1) + 2) ==> *(matrix + 1) + 2

matrix is an array of 3 arrays of 8 shorts.

matrix + 1 is a pointer to the second row.
*(matrix + 1) is the second row (an array of 8 shorts).
*(matrix + 1) + 2 is a pointer to the 3rd element in the second row.
*(*(matrix + 1) + 2) is the 3rd element in the second row (same as matrix[1][2]).

First dimension - Each element of matrix is an array of 8 shorts, so each element is 16 bytes.
Second dimension - Each element of each element of matrix is a short, so it's 2 bytes.

Given these declarations:

short matrix[3][8]        
short array[10]

We can calculate the size of any portion:

Expression             Meaning                 Size (bytes)
-----------------------------------------------------------
array                Entire array                  20
array[]              Element in 1st dimension       2
matrix               Entire array                  48
matrix[]             Element in 1st dimension      16
matrix[][]           Element in 2nd dimension       2

Recap:

The compiler needs to know the size of each of the elements, in each dimension.
Since the size of each dimension relies on the fundamental type (int, double, etc.) of the array(s), there is an implicit size specified.
In a two-dimensional array, knowing the size of the second dimension (number of columns) and the data type of the array is sufficient to perform pointer arithmetic on the first dimension.
This seemingly convoluted way of locating array elements is required since memory is laid out in one dimension by the compiler. The multiple dimension syntax (e.g. [][]) is just a convenience for the programmer.

Dynamically Allocated 2D Arrays

Recall the 2D points static array and how a dynamically allocated array would look:

double points[3][4];

double *pd = malloc(3 * 4 * sizeof(double));

Given a row and column:

int row = 1, column = 2;
double value;

The static 2D array can be accessed using subscripts, but the dynamic "2D array" can only be indexed with a single subscript.
```
value = points[row][column]; /* OK      */
value = pd[row][column];     /* ILLEGAL */
```
We (the programmers) have to do all of the arithmetic to locate an element using two subscripts:
```
value = pd[row * 4 + column];
```

The compiler is still doing some of the work for us:

value = *(address-of-pd + (row * 4 + column) * sizeof(double));

What does the number 4 in the above calculations represent?

If we want to use two subscripts on a dynamic 2D array, we have to set things up a little differently.

Using these definitions from above:

#define ROWS 3
#define COLS 4
double *pd = malloc(ROWS * COLS * sizeof(double));

Create a variable that is a pointer to a pointer to a double

double **ppd;

Allocate an array of 3 (ROWS) pointers to doubles and point ppd at it:

ppd = malloc(ROWS * sizeof(double *));

Point each element of ppd at an array of 4 doubles:

ppd[0] = pd;
ppd[1] = pd + 4;
ppd[2] = pd + 8;

Of course, for a large array, or an array whose size is not known at compile time, you would want to set these in a loop:

int row;
for (row = 0; row < ROWS; row++)
  ppd[row] = pd + (COLS * row);

This yields the diagram:

Given a row and column, we can access elements through the single pointer or double pointer variable:

int row = 1, column = 3;
double value;

  /* Access via double pointer using subscripting */
value = ppd[row][column];            

  /* Access via single pointer using pointer arithmetic        */
  /* and/or subscripting. These statements are all equivalent. */
value = pd[row * COLS + column];
value = *(pd + row * COLS + column);
value = (pd + row * COLS)[column];