Arrays

"... the results are undefined, and we all know what 'undefined' means: it means it works during development, it works during testing, and it blows up in your most important customers' faces." -- Scott Meyers

The sizeof Operator

C has an operator that returns the size (in bytes) of a value (expression) or type. It's a unary operator and it is listed in the precedence chart.

General form is:

sizeof(expression or type)

Examples:

CodeOutput
int i;    /* uninitialized */
float f;  /* uninitialized */
double d; /* uninitialized */

printf("sizeof(i) is %u\n", sizeof(i));
printf("sizeof(int) is %u\n", sizeof(int));
printf("sizeof(42) is %u\n", sizeof(42));
printf("\n");

printf("sizeof(f) is %u\n", sizeof(f));
printf("sizeof(float) is %u\n", sizeof(float));
printf("sizeof(42.0F) is %u\n", sizeof(42.0F));
printf("\n");

printf("sizeof(d) is %u\n", sizeof(d));
printf("sizeof(double) is %u\n", sizeof(double));
printf("sizeof(42.0) is %u\n", sizeof(42.0));
printf("\n");




sizeof(i) is 4
sizeof(int) is 4
sizeof(42) is 4


sizeof(f) is 4
sizeof(float) is 4
sizeof(42.0F) is 4


sizeof(d) is 8
sizeof(double) is 8
sizeof(42.0) is 8

Notice the format specifier is %u. That's used to print an unsigned integer. This is for 32-bit systems. On 64-bit systems (almost all systems today), the sizeof operator returns an unsigned long integer, so you must use %lu to print it. That's a lowercase 'L' before the 'u'.

You can think of these sizeof expressions:

sizeof(i)
sizeof(int)
as this (for variables):
the_number_of_bytes_this_requires(i)
or (for types)
the_number_of_bytes_this_requires(int)
The sizeof operator is also unique in that it can determine the value at compile time. The program does not need to be executed to obtain the results. Also, the operand to the sizeof operator can be an expression. For example, given these declarations:

int i;    /* uninitialized */
int j;    /* uninitialized */
double d; /* uninitialized */
This is the result of applying the sizeof operator to these expressions:
sizeof(i + j) is 4            (int)
sizeof(i * j - 20) is 4       (int)
sizeof(i * j * d) is 8        (double)
sizeof(i + 2.0) is 8          (double)
sizeof(10 + 3.2F - 5.4) is 8  (double)

Note that the sizeof operator doesn't use the values of the variables. It's just looking at the types. (Which is why using uninitialized variables is not a problem.)

One-Dimensional Arrays

An array is an aggregate data type. This means that it consists of multiple values, all of which are the same type. Contrast this to scalar data types like float and int, which are single values. Each value in an array is called an element.

Because all of the values in an array have the same type, an array is called a homogeneous data structure. Incidentally, there is also an aggregate data type that consists of multiple values that are not the same type (heterogeneous) called a struct (structure).

Arrays are used when you have lots of related data. For example, if there are 100 students in a class, it is much easier to deal with one array with 100 elements (one element per student), rather than dealing with 100 separate variables.

To declare an array, you must specify an additional piece of information: the size.

The general form is:

type identifier[positive_compile_time_integer_constant];
This declares an array of 10 integers. The array is called a and has room for 10 elements:
int a[10]; /* a is an array of 10 integers */
Visually, we can think of the array in memory like this:

Also, like local scalar variables, the values of the array are undefined:

Examples:
int touchdowns[10];     /* array of 10 integers, 40 bytes */
float distances[20];    /* array of 20 floats, 80 bytes   */
double temperatures[2]; /* array of 2 doubles, 16 bytes   */
Visually:


The sizeof operator will also determine number of bytes required for an array. Examples:

CodeOutput
int touchdowns[10];
float distances[20];
double temperatures[2];

printf("sizeof(touchdowns) is %u\n", sizeof(touchdowns));
printf("sizeof(distances) is %u\n", sizeof(distances));
printf("sizeof(temperatures) is %u\n", sizeof(temperatures));




sizeof(touchdowns) is 40
sizeof(distances) is 80
sizeof(temperatures) is 16

Technically speaking, these are called static arrays because the size is determined at compile-time and their size can never change (i.e. static). Later, we will learn about dynamic arrays. Dynamic arrays can change their sizes (grow and shrink) at runtime.

Accessing Array Elements



Most work with arrays is done with some kind of looping construct:
Assigning values to each elementPrinting out the values
int a[10];
int i;
for (i = 0; i < 10; i++)
  a[i] = i * 2;
for (i = 0; i < 10; i++)
  printf("%i  ", a[i]);

  
Output: 0  2  4  6  8  10  12  14  16  18

It is crucial that you understand that there is absolutely, positively no boundary checking when reading/writing an array. Your program is completely undefined in the event you read/write out of bounds (even if it appears to work correctly).

Writing past the end of the array:
int a[10];
int i;
for (i = 0; i < 15; i++)
  a[i] = i * 2;
The output at runtime:
     21 [main] a 3672 _cygtls::handle_exceptions: Exception: STATUS_ACCESS_VIOLATION
    546 [main] a 3672 open_stackdumpfile: Dumping stack trace to a.exe.stackdump
     21 [main] a 3672 _cygtls::handle_exceptions: Exception: STATUS_ACCESS_VIOLATION
    546 [main] a 3672 open_stackdumpfile: Dumping stack trace to a.exe.stackdump
  63187 [main] a 3672 _cygtls::handle_exceptions: Exception: STATUS_ACCESS_VIOLATION
  68303 [main] a 3672 _cygtls::handle_exceptions: Error while dumping state (probably corrupted stack)
Note that undefined means just that. This similar code may or may not result in an infinite loop:

Video review

CodePossible memory layout
int i;
int a[10];

  /* initialize elements WRONG */
for (i = 0; i <= 10; i++)
  a[i] = 0;

Arrays and looping go hand-in-hand.
forwhilewhile (compact)
int i;
int a[10];

for (i = 0; i < 10; i++)
  a[i] = 0;
int i = 0;
int a[10];

while (i < 10)
{
  a[i] = 0;
  i++;
}
int i = 0;
int a[10];

while (i < 10)
  a[i++] = 0;
Unlike scalar types, you can't assign one array to another. This confuses many new C programmers. Instead, you must "manually" copy each element from one array to the other:
void assign1(void)
{
  #define SIZE 10

  int a[SIZE]; /* 10 integers */
  int b[SIZE]; /* 10 integers */
  int i;

    /* set elements to i squared: 0, 1, 4, 9, 16, etc... */
  for (i = 0; i < SIZE; i++)
    b[i] = i * i; 

    /* Assign elements of b to a (This is not legal) */
  a = b;

    /* Assign elements of b to a (This is how to assign arrays) */
  for (i = 0; i < SIZE; i++)
    a[i] = b[i];
}
Errors:
error: assignment to expression with array type
   a = b;
     ^
or
error: array type 'int [10]' is not assignable
  a = b;
  ~ ^
1 error generated.

Initializing Arrays

When we declare an array and provide its size, the array is called a static array. The size of the array is set in stone and will never change. Ever.

We can perform static initialization of an array:

void some_function(void)
{
  int array1[5] = {1, 2, 3, 4, 5};  /* All elements are initialized */
  ...
}
In partial initialization, if we "run out" of initializers, the remaining elements are set to 0 (they are NOT undefined):
int array3[5] = {1, 2, 3};          /* 1, 2, 3, 0, 0 */
However, it is an error to provide too many initializers:
int array4[5] = {1, 2, 3, 4, 5, 6}; /* error: too many initializers */
warning: excess elements in array initializer
   int a[2] = {1, 2, 3};
                     ^
C has a convenient feature that allows us to leave the size of the array empty. The compiler automatically fills in the size based on the number of initializers provided:
int array5[] = {1, 2, 3};          /* array5 has 3 elements */
int array6[] = {1, 2, 3, 4, 5, 6}; /* array6 has 6 elements */
A very convenient (and efficient) way to initialize all elements to 0:
int array3[50] = {0};             /* All 50 elements (0 through 49) are set to 0 */


Example: Given a date in the form of month/day, print out the day of the year. For example:

Day of Year for 1/1 is 1
Day of Year for 2/1 is 32
Day of Year for 5/13 is 133
Day of Year for 12/31 is 365
Not using arrays:
void DayOfYear(void)
{
    /* The number of days in each month, assume non-leap year */
  int jan = 31;  int feb = 28;  int mar = 31;
  int apr = 30;  int may = 31;  int jun = 30;
  int jul = 31;  int aug = 31;  int sep = 30;
  int oct = 31;  int nov = 30;  int dec = 31;
  
  int month, day; /* Current month and day */
  int count = 0;  /* Total count of days   */

    /* Prompt the user for month/day */
  printf("Enter a date (m/d): ");
  scanf("%d/%d", &month, &day);

    /*
     * Add up the number of days in the previous months
     * The break statements are intentionally missing!
     */
  switch (month)
  {
    case 12: 
      count += nov;
    case 11: 
      count += oct;
    case 10: 
      count += sep;
    case  9: 
      count += aug;
    case  8: 
      count += jul;
    case  7: 
      count += jun;
    case  6: 
      count += may;
    case  5: 
      count += apr;
    case  4: 
      count += mar;
    case  3: 
      count += feb;
    case  2: 
      count += jan;

    default: 
      count += day; /* Add in this month's days */
  }

    /* Format and print out the results */
  printf("The date %i/%i is day number %i\n", month, day, count);
}
The same problem using an array:
void DayOfYear(void)
{
    /* The number of days in each month, assume non-leap year */
  int months[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};

  int month;     /* Current month       */
  int day;       /* Current day         */
  int count = 0; /* Total count of days */
  int i;         /* Loop counter        */

    /* Prompt the user for month/day */
  printf("Enter a date (mm/dd): ");
  scanf("%d/%d", &month, &day);

    /* Add up the days in previous months */
  for (i = 0; i < month - 1; i++)
    count += months[i];

    /* Add in this month's days */
  count += day;

    /* Format and print out the results */
  printf("The date %i/%i is day number %i\n", month, day, count);
}
One final note about arrays: If you initialize array elements, the initializers must be constant expressions, i.e. their values must be known at compile-time.
#include <stdio.h>

int foo(int a, int  b)
{
  return a + b;
}

int main(void)
{
    /* Not allowed in C89/90 */
  int array[3] = {1, 2, foo(3, 4)};

  return array[0];
}
Warnings from gcc:
warning: initializer element is not computable at load time [-Wpedantic]
Warnings from clang:
warning: initializer for aggregate is not a compile-time constant [-Wc99-extensions]
Warnings from Microsoft's compiler:
warning C4204: nonstandard extension used: non-constant aggregate initializer

Passing Arrays to Functions

Arrays in C are treated differently than other types:

Let's see how this might be a problem.

This example works as expected (It finds the largest element in an array of integers). A pointer to the first element is passed to the function find_largest, along with the size (number of elements) in the array. Since the function has a pointer to the first element, it can access every element of the array using the subscript operator:

int main(void)
{
  int a[] = {4, 5, 3, 9, 5, 2, 7, 6};
  int largest;

  printf("Array before:\n");
  print_array(a, 8);

  largest = find_largest(a, 8);
  printf("Largest value: %i\n", largest);

  printf("Array after:\n");
  print_array(a, 8);

  return 0;
}

Output:
Array before:
    4    5    3    9    5    2    7    6
Largest value: 9
Array after:
    4    5    3    9    5    2    7    6
/* Assumes at least one element in the array */
int find_largest(int a[], int size)
{
  int i;
  int max = a[0]; /* assume first is largest */

  for (i = 1; i < size; i++)
    if (a[i] > max) 
      max = a[i];  /* found a larger one */

  return max;
}
Let's modify the function to do something unexpected.

Now, the function is modifying the original data (not a copy). It seems that passing a pointer to the data, rather than a copy, could be dangerous if the function does something we don't expect.

Passing a pointer to a function allows the function to modify our data even if we don't want it to:

/* Modifies the array the was passed in!! */
int find_largest_BAD(int a[], int size)
{
  int i;
  int max = a[0]; /* assume 1st is largest */
  a[0] = 0;       /* change first element! */
  for (i = 1; i < size; i++)
  {
    if (a[i] > max) 
      max = a[i];  /* found a larger one */
    a[i] = 0;      /* set element to 0!! */
  }
  return max;
}
Output:
Array before:
    4    5    3    9    5    2    7    6
Largest value: 9
Array after:
    0    0    0    0    0    0    0    0
There is a solution to this problem. Going back to the original non-const version of this function:
int find_largest(int a[], int size);
If we really don't want our array to change, we declare it with the const keyword:
int main(void)
{
  const int a[] = {4, 5, 3, 9, 5, 2, 7, 6}; /* Elements are constant (can't be changed)  */
  int largest = find_largest(a, 8);         /* ILLEGAL: Function expects non-const array */
  
  return 0;
}
Compiler error:
warning: passing arg 1 of 'find_largest' discards qualifiers from pointer target type

Important: When you create functions that will accept arrays as parameters, be sure to mark them as const if you do not intend to modify them. If you don't make them const, a lot of code will not be able to use your functions.

Multidimensional Arrays

An array with more than one dimension is called a multidimensional array.
int matrix[5][10];  /* array of 5 arrays of 10 int; a 5x10 array of int */
Building up multidimensional arrays:
int a;            /* int                                     */
int b[10];        /* array of 10 int                         */
int c[5][10];     /* array of 5 arrays of 10 int             */
int d[3][5][10];  /* array of 3 arrays of 5 arrays of 10 int */

int e[10][5][3];  /* array of 10 arrays of 5 arrays of 3 int */
Storage order
Arrays in C are stored in row major order. This means that the rightmost subscript varies the most rapidly.

Given the declaration of points:

double points[3][4];
We could diagram the arrays like this:

With details:

Or draw it contiguously (as it really is in memory):
 
Or horizontally:


Giving concrete values to the 2D array of doubles will help visualize the arrays. Note how the initialization syntax helps us visualize the "array of arrays" notion:

double points[3][4] = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}};
or even formatted as a 3x4 matrix:
double points[3][4] = { 
                        {1.0,  2.0,  3.0,  4.0}, 
                        {5.0,  6.0,  7.0,  8.0}, 
                        {9.0, 10.0, 11.0, 12.0}
                      };
Diagram:

Some expressions involving points:
Addresses                             Type
-------------------------------------------------------------
points        = 0012F5A4    An array of 3 arrays of 4 doubles
&points       = 0012F5A4    A pointer to an array of 3 arrays of 4 doubles
points[0]     = 0012F5A4    An array of 4 doubles
&points[0]    = 0012F5A4    A pointer to an array of 4 doubles
*points       = 0012F5A4    An array of 4 doubles
&points[0][0] = 0012F5A4    A pointer to a double

Contents
------------------------
**points      = 1.000000
*points[0]    = 1.000000
points[0][0]  = 1.000000

Sizes
-------------------------
sizeof(points)       = 96
sizeof(*points)      = 32
sizeof(**points)     =  8
sizeof(points[0])    = 32
sizeof(points[0][0]) =  8
Code to display above tables.

Accessing Elements in a 2-D Array

short matrix[3][8]; /* 24 shorts, 3x8 array */
matrix
  matrix[0]
*(matrix + 0)
   *matrix
  matrix[1]
*(matrix + 1)
  matrix[2]
*(matrix + 2)
     matrix[1][2]
*(*(matrix + 1) + 2)
Remember the rule:
array[i] == *(array + i)
where: With multidimensional arrays, the rule becomes:
array[i][j] == *(*(array + i) + j)
array[i][j][k] == *(*(*(array + i) + j) + k)
etc...
Pointer arithmetic is used to locate each element. (Base address + Offset)

Given this declaration:

short matrix[3][8];
The value of sizeof varies with the argument:
Sizes
-------------------------
sizeof(matrix)       = 48   ; entire matrix
sizeof(matrix[0])    = 16   ; first row
sizeof(matrix[1])    = 16   ; second row
sizeof(matrix[0][0]) = 2    ; first short element

Passing 2D Arrays to Functions

Putting values in the matrix and printing it:
Fill3x8Matrix(matrix);  /* Put values in the matrix */
Print3x8Matrix(matrix); /* Print the matrix         */
Implementations:
void Fill3x8Matrix(short matrix[][8])
{
  int i, j;
  for (i = 0; i < 3; i++)
    for (j = 0; j < 8; j++)
      matrix[i][j] = i * 8 + j + 1; 
}

void Print3x8Matrix(short matrix[][8])
{
  int i, j;
  for (i = 0; i < 3; i++)
    for (j = 0; j < 8; j++)
      printf("%i ", matrix[i][j]);
  printf("\n");
}
These functions could have specified the parameters this way: (precedence chart)
void Fill3x8Matrix(short (*matrix)[8])
void Print3x8Matrix(short (*matrix)[8])
Why are they not declared like this?:
void Fill3x8Matrix(short matrix[][]);
void Print3x8Matrix(short matrix[][]);

The compiler needs to know the size of each element in each dimension. It doesn't need to (and can't) know the number of elements in the first dimension. The size of each element in the first dimension is determined by the other dimensions and the type of the elements.

void Test(int a[], int b[][6], int c[][3][5])
{
  printf("a = %p, b = %p, c = %p\n", (void *)a, (void *)b, (void *)c);
  a++;
  b++;
  c++;
  printf("a = %p, b = %p, c = %p\n", (void *)a, (void *)b, (void *)c);
}

Output:
a = 0012FEE8, b = 0012FF38, c = 0012FEFC  
a = 0012FEEC, b = 0012FF50, c = 0012FF38  
In decimal:
Output:
a = 1244904, b = 1244984, c = 1244924
a = 1244908, b = 1245008, c = 1244984
The function Test is equivalent to this:
void Test(int *a, int (*b)[6], int (*c)[3][5])
Other methods for filling the matrix use explicit pointer arithmetic:
void Fill3x8Matrix(short matrix[][8])
{
  int i, j;
  for (i = 0; i < 3; i++)
    for (j = 0; j < 8; j++)
      *(*(matrix + i) + j) = i * 8 + j + 1; 
}

void Fill3x8Matrix(short matrix[][8])
{
  int i, j;
  for (i = 0; i < 3; i++)
  {
    short *pmat = *(matrix + i);
    for (j = 0; j < 8; j++)
      *pmat++ = i * 8 + j + 1;
  }
}
Assuming this definition:
short matrix[3][8]; /* 24 shorts, 3x8 array */
and then filled with values:

How does the compiler calculate the address (offset) for the element below?
matrix[1][2];
Using address offsets we get:
&matrix[1][2] ==> &*(*(matrix + 1) + 2) ==> *(matrix + 1) + 2
  1. matrix is an array of 3 arrays of 8 shorts.
  2. First dimension - Each element of matrix is an array of 8 shorts, so each element is 16 bytes.
  3. Second dimension - Each element of each element of matrix is a short, so it's 2 bytes.
Given these declarations:
short matrix[3][8]        
short array[10]
We can calculate the size of any portion:
Expression             Meaning                 Size (bytes)
-----------------------------------------------------------
array                Entire array                  20
array[]              Element in 1st dimension       2
matrix               Entire array                  48
matrix[]             Element in 1st dimension      16
matrix[][]           Element in 2nd dimension       2
Recap:

Dynamically Allocated 2D Arrays

Recall the 2D points static array and how a dynamically allocated array would look:

double points[3][4];


double *pd = malloc(3 * 4 * sizeof(double));

Given a row and column:

int row = 1, column = 2;
double value;
If we want to use two subscripts on a dynamic 2D array, we have to set things up a little differently.


Using these definitions from above:

#define ROWS 3
#define COLS 4
double *pd = malloc(ROWS * COLS * sizeof(double));
Create a variable that is a pointer to a pointer to a double
double **ppd;
Allocate an array of 3 (ROWS) pointers to doubles and point ppd at it:
ppd = malloc(ROWS * sizeof(double *));

Point each element of ppd at an array of 4 doubles:
ppd[0] = pd;
ppd[1] = pd + 4;
ppd[2] = pd + 8;
Of course, for a large array, or an array whose size is not known at compile time, you would want to set these in a loop:
int row;
for (row = 0; row < ROWS; row++)
  ppd[row] = pd + (COLS * row);
This yields the diagram:

Given a row and column, we can access elements through the single pointer or double pointer variable:
int row = 1, column = 3;
double value;

  /* Access via double pointer using subscripting */
value = ppd[row][column];            

  /* Access via single pointer using pointer arithmetic        */
  /* and/or subscripting. These statements are all equivalent. */
value = pd[row * COLS + column];
value = *(pd + row * COLS + column);
value = (pd + row * COLS)[column];