Visibility and Lifetime

Introduction


When we declare/define a symbol (e.g. variable, constant, function, etc.), where and how we declare it determines its scope. Let's look at a simple example that everyone should understand:

/* file1.c */
#include <stdio.h>

/* This function is visible/accessible to other functions in this file */
/* as well as functions in other files. It is a global function.       */
int add(int a, int b) /* a and b are visible only in this function */
{
  int local_var1 = a + b; /* visible only in this function */

  return local_var1;
}
/* add's local_var1, a and b are not accessible here. */

/* main is global and MUST be global because it is called from elsewhere */
int main(void)
{
  int local_var1;     /* visible anywhere within main, uninitialized    */ 
  int local_var2 = 5; /* visible anywhere within main, initialized to 5 */

  for (local_var1 = 0; local_var1 < 5; local_var1++)
  {
    int local_var3; /* visible only in this loop, uninitialized */

    local_var3 = add(local_var1, local_var2);

    printf("%i + %i is %i\n", local_var1, local_var2, local_var3);
  }

  /* local_var3 is not accessible here */

  return 0;
}

/* main's local_var1 and local_var2 are not accessible here. */

/* The global function add is still accessible here. */
Compiling and executing:
gcc -Wall -Wextra -ansi -pedantic file1.c -o file1 && ./file1

0 + 5 is 5
1 + 5 is 6
2 + 5 is 7
3 + 5 is 8
4 + 5 is 9
Putting the add function and main into separate files:
/* main.c */
#include <stdio.h>

/* prototype, add is defined in another file (functions.c) */
int add(int a, int b);

/* main is global and MUST be global because it is called from elsewhere */
int main(void)
{
  int local_var1;     /* visible anywhere within main, uninitialized    */ 
  int local_var2 = 5; /* visible anywhere within main, initialized to 5 */

  for (local_var1 = 0; local_var1 < 5; local_var1++)
  {
    int local_var3; /* visible only in this loop, uninitialized */

    local_var3 = add(local_var1, local_var2);

    printf("%i + %i is %i\n", local_var1, local_var2, local_var3);
  }

  /* local_var3 is not accessible here */

  return 0;
}
/* functions.c */

/* This function is visible/accessible to other functions in this file */
/* as well as functions in other files. It is a global function.       */
int add(int a, int b)
{
  int local_var1 = a + b; /* visible only in this function */

  return local_var1;
}
Now, we have to specify both files when building the program:
gcc -Wall -Wextra -ansi -pedantic main.c functions.c -o prog
None of this information is new. We've been doing things like this for a while. Only the most trivial programs will have all of their code in a single file. Most of the time, we will have multiple files and we need to be able to access the functions across the files.

Linkage: External vs. Internal vs. None

The technical term for the accessibility of a symbol (e.g. function, variable, constant, etc.) is its linkage, which is one of three kinds: external (accessible from any file in the program), internal (accessible only within the file where it is defined), or none (accessible only within its own block, as with local variables and parameters). We've already seen how to make a function global (i.e. give it external linkage): define it at file scope, outside of any other function. The add function is defined that way, so by default it has external linkage.

It's interesting to note that C does not allow local functions, i.e. functions defined inside other functions. Some languages, like Pascal, Ada, and Python, do allow this. (Newer versions of C# also allow this.) However, the GNU C compiler (not C++) has an extension that supports nested functions. For example:

/* nested.c  Compile without -pedantic */
#include <stdio.h>

int main(void)
{
  int factor = 10; /* Visible everywhere in main */

    /* Local or nested function, not allowed in standard C.  */
    /* The gcc compiler does support nested functions.       */
    /* Neither Clang nor Microsoft's compiler supports them. */
  int calculate(int a, int b)
  {
    int c = a + b; /* Local to this nested function */

    return c * factor;
  }

    /* Call local function and print result */
  printf("Calculated: %i\n", calculate(3, 5));

  return 0;
}
Here's an example of where a nested function could be useful: a helper, like this comparison function for qsort, that is never going to be used by any other code.
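/* Compile with gcc, without -pedantic: uses the GNU nested-function extension */
#include <stdio.h>  /* printf */
#include <stdlib.h> /* qsort  */

void PrintInts(int array[], int size)
{
  int i;
  for (i = 0; i < size; i++)
    printf("%i ", array[i]);
  printf("\n");
}

void TestInts(void)
{
  int array[] = {5, 12, 8, 4, 23, 13, 15, 2, 13, 20};

    /* This comparison function is "private" to TestInts */
  int compare_int1(const void *arg1, const void *arg2)
  {
    int left = *(const int *)arg1;
    int right = *(const int *)arg2;

      /* -1, 0, or 1; avoids the overflow that left - right can cause */
    return (left > right) - (left < right);
  }

  PrintInts(array, 10);                         /* print the array        */
  qsort(array, 10, sizeof(int), compare_int1);  /* sort the array         */
  PrintInts(array, 10);                         /* print the sorted array */
}

int main(void)
{
  TestInts();
  return 0;
}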

Output:
5 12 8 4 23 13 15 2 13 20
2 4 5 8 12 13 13 15 20 23
This document won't spend any more time on local variables, i.e. variables with no linkage, because they are straightforward and easily understood. What will be covered in more detail are the other two types: external and internal.

Changing the Linkage with the extern and static Keywords

Going back to our add function:
/* This function has external linkage and is accessible to all  */
/* files/functions in the program. It is a global function.     */
int add(int a, int b)
{
  return a + b;
}
This is the default behavior for functions. Unless otherwise specified, functions have external linkage. There is a keyword, extern, which you can use to make this explicit:
/* The extern keyword explicitly marks this function */
/* as having external linkage, which is the default. */
extern int add(int a, int b)
{
  return a + b;
}
C programmers rarely, if ever, use this keyword with functions because it is redundant: functions have external linkage by default, so the keyword is generally omitted. The question is, how do you specify that a function should have internal linkage? With an intern keyword? Sadly, as obvious as that sounds, no such keyword exists. It's the static keyword:
/* The static keyword marks this function as having internal     */
/* linkage. Only functions in this file can access the function. */
static int add(int a, int b)
{
  return a + b;
}
Building the program:
gcc -Wall -Wextra -ansi -pedantic main.c functions.c -o prog
leads to this linker error:
/tmp/cck78WC2.o: In function 'main':
main.c:(.text+0x23): undefined reference to 'add'
collect2: error: ld returned 1 exit status
The exact error message will vary depending on the linker and platform, but the one thing that will be the same is the "undefined reference" to the add function. This is because, with the static keyword, the function has internal linkage, making it only visible within the file (functions.c) where it is defined.

So, what's the main purpose of marking a function static? It's used when you don't intend for other files to access the function. Think helper functions.

Helper functions are functions that are not meant to be called from outside of the file they are defined in; they exist only to serve other functions within the same file. Sure, they don't have to be made static, but if they have external linkage (global), there's a higher chance that a helper's name will conflict with another global function somewhere in the program, and two global functions with the same name cause a multiple-definition linker error.

With small programs (the kind beginning programmers write), this is not usually a big deal. But when you start having thousands or tens of thousands of functions accessible from your program (not that unlikely), you will get yourself into trouble. So, there is a simple rule of thumb:

If you ONLY need to access the function from within the file it is defined, mark it with the static keyword to keep it hidden/private to the file. If you intend to access the function from within the entire program (i.e. other files), don't use the static keyword.

Linkage and Non-Local Variables

As stated earlier, local variables have no linkage and are only accessible within the scope where they are defined. However, it is possible to have variables with external linkage (global) and internal linkage (file-scope). Let's talk about file-scope variables first, as they are a little easier to understand.

Here's the simple header file:

/* geometry.h */
#ifndef GEOMETRY_H
#define GEOMETRY_H

typedef struct GeometryResults
{
  double circle_area;
  double circle_circumference;
  double sphere_volume;
} GeometryResults;

#endif /* GEOMETRY_H */
Here's a file that calculates a few geometrical values:
/* file2.c */
#include "geometry.h" /* struct GeometryResults */

/* external linkage (global) */
const double PI = 3.1415926;

/* internal linkage (file-scope) */
static double area_of_circle(double radius)
{
  return PI * radius * radius;
}

/* internal linkage (file-scope) */
static double circumference_of_circle(double radius)
{
  return 2 * PI * radius;
}

/* internal linkage (file-scope) */
static double volume_of_sphere(double radius)
{
  return 4.0 / 3.0 * PI * radius * radius * radius;
}

/* external linkage (global) */
struct GeometryResults calculate_values(double radius)
{
  struct GeometryResults results;
  
  results.circle_area = area_of_circle(radius);
  results.circle_circumference = circumference_of_circle(radius);
  results.sphere_volume = volume_of_sphere(radius);

  return results;
}
And this is how we might want to use it:
/* file1.c */
#include <stdio.h>    /* printf          */
#include "geometry.h" /* GeometryResults */

/* external linkage (global) */
const double PI = 3.14;

/* prototype, defined in file2.c */
GeometryResults calculate_values(double radius);

/* helper function, not for use outside of this file */
static void print_results(const GeometryResults *results, double radius)
{
  printf("With a radius of %.2f:\n", radius);
  printf("----------------------\n");
  printf("Area of a circle is %.2f\n", results->circle_area);
  printf("Circumference of a circle is %.2f\n", results->circle_circumference);
  printf("Volume of a sphere is %.2f\n", results->sphere_volume);
}

int main(void)
{
  double radius;           /* radius used in all calculations */
  double height;           /* used for volume of a cone       */
  double cone_volume;      /* volume of a cone                */
  GeometryResults results; /* other geometric calculations    */  

  radius = 5.5;
  height = 10.0;
  cone_volume = PI * radius * radius * height / 3;

  printf("A cone with radius %.2f and height of %.2f has volume %.2f\n\n", 
          radius, height, cone_volume);

  results = calculate_values(radius);

  print_results(&results, radius);

  return 0;
}
Attempting to build the program:
gcc -Wall -Wextra -ansi -pedantic -g file1.c file2.c -o prog
results in this cryptic linker error message:
/tmp/ccsSdszg.o:(.rodata+0x0): multiple definition of 'PI'
/tmp/cc42WWRP.o:(.rodata+0x0): first defined here
collect2: error: ld returned 1 exit status
It's telling us that PI is defined twice, which was done on purpose to demonstrate how to avoid this message. These are the duplicated definitions:

In file1.c

/* external linkage (global) */
const double PI = 3.14;
In file2.c
/* external linkage (global) */
const double PI = 3.1415926;
This often happens when one programmer creates a global symbol in one file and another programmer (or maybe the same one) creates a duplicate definition of the same symbol in another file. It would still be an error even if the values assigned to PI were identical. By now, we know what the solution is: static.

In both files, add the static keyword to the definition to give PI internal linkage:

In file1.c

/* internal linkage (file-scope) */
static const double PI = 3.14;
In file2.c
/* internal linkage (file-scope) */
static const double PI = 3.1415926;
Now, building and running works as expected:
gcc -Wall -Wextra -ansi -pedantic -g file1.c file2.c -o prog && ./prog 

A cone with radius 5.50 and height of 10.00 has volume 316.62

With a radius of 5.50:
----------------------
Area of a circle is 95.03
Circumference of a circle is 34.56
Volume of a sphere is 696.91
But something still doesn't seem right. We have two definitions of PI. Even though each definition is internal to its own file, this is error-prone, especially since one definition has more precision than the other: the cone's volume above was computed with 3.14, while the circle and sphere values used 3.1415926. That kind of mismatch will likely lead to odd results at some point.

What we really want is to have just one definition and be able to use that one definition for all files. Let's go back to global functions to see what the solution is.

All functions (as well as all other symbols with external linkage) must obey the One Definition Rule (ODR). This is something we learned from the beginning and everyone should understand it pretty well. There can be exactly one definition of a function. That definition will exist in exactly one file.

All other files that wish to call the function must have a prototype (i.e. declaration) for that function. Remember, a function prototype does not include the body (i.e. curly braces and code). There are no limits to how many times you can prototype/declare a function, as long as they are all identical.

Going back to our original example with the add function, the definition was in a file called functions.c:

/* This function is visible/accessible to other functions in this file */
/* as well as functions in other files. It is a global function.       */
int add(int a, int b)
{
  return a + b;
}
and in main.c we have a prototype:
/* add is defined in another file */
int add(int a, int b);
This is pretty straightforward and we've been doing this for a while now. So, the question becomes, "How do I create a single definition of a variable in one file and then prototype that variable in other files so I can use it?"

The answer is: with the extern keyword.

Here's a sample to demonstrate:

/* fileA.c */
#include <stdio.h>

/* definition, external linkage (global) */
int a = 5;

/* prototype, it's defined in fileB.c */
void foo(void);

int main(void)
{
  printf("The value of a in main is %i\n", a);
  foo();

  return 0;
}
/* fileB.c */
#include <stdio.h>

/* definition, external linkage (global) */
void foo(void)
{
    /* Try to access a from fileA.c */
  printf("The value of a in foo is %i\n", a);
}
Attempting to build the program:
gcc -Wall -Wextra -ansi -pedantic -g fileA.c fileB.c -o prog
leads to this compiler error:
fileB.c: In function 'foo':
fileB.c:7:43: error: 'a' undeclared (first use in this function)
   printf("The value of a in foo is %i\n", a);
                                           ^
fileB.c:7:43: note: each undeclared identifier is reported only once for each function it appears in
This error makes complete sense. The compiler sees only one file at a time (fileB.c) and is unaware of the global variable a in fileA.c. The linker doesn't even get a chance to do its magic because the compilation fails.

Of course, if we try to "declare" the variable a inside the foo function, we actually define a brand-new local variable that has nothing to do with the global a in fileA.c:

void foo(void)
{
  int a; /* Try to "prototype" a from fileA.c */
    
  printf("The value of a in foo is %i\n", a);
}
Not only is this a different a from the one we want, it's also an uninitialized local variable, which the compiler warns about:
fileB.c: In function 'foo':
fileB.c:9:3: warning: 'a' is used uninitialized in this function [-Wuninitialized]
   printf("The value of a in foo is %i\n", a);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is the proper way to "declare" (NOT define) a variable in another file:
void foo(void)
{
    /* This is NOT a definition, it's a declaration. No space is      */
    /* allocated. The linker will figure out where the definition is. */
    /* This tells the compiler that the variable a exists elsewhere   */
    /* and to not emit any errors.                                    */
  extern int a;
  
  printf("The value of a in foo is %i\n", a);
}
The reason we need the extern keyword is so that the compiler can distinguish between a declaration and a definition. With functions, it's easy: if the function has a body, it is a definition; if there is no body, it's a declaration. There is no ambiguity. Using extern with a function has no bearing on the declaration/definition difference.

If a variable declaration has the extern keyword (and no initializer), then it is a declaration. It's just telling the compiler that the variable is defined elsewhere (so don't give an error message) and that the linker will figure out where it is. If there is no extern keyword, then it is a definition.

Like functions, you must have exactly one definition of the variable, but you can have as many declarations (using the extern keyword) as you want.

Unfortunately, there is a slight caveat in C:

/* fileA.c */
#include <stdio.h>

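/* 
 * External linkage (global). With no initializer, this is a
 * "tentative definition". Some toolchains (e.g. GCC before
 * version 10, or GCC 10+ with -fcommon) merge such definitions
 * from different files into one object, but ISO C does not
 * guarantee it: GCC 10 and later report a multiple-definition
 * link error by default, and C++ treats each one as a full
 * definition. If more than one is initialized, it's an error in
 * C as well. The portable solution is to use extern on all but
 * one, so that the program has exactly one definition of a.
 */
int a;

void foo(void); /* prototype, it's defined in fileB.c */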

int main(void)
{
  printf("The value of a in main is %i and address is %p\n", 
         a, (void *)&a);
  foo();

  return 0;
}
/* fileB.c */
#include <stdio.h>

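/* 
 * External linkage (global). With no initializer, this is a
 * "tentative definition". Some toolchains (e.g. GCC before
 * version 10, or GCC 10+ with -fcommon) merge such definitions
 * from different files into one object, but ISO C does not
 * guarantee it: GCC 10 and later report a multiple-definition
 * link error by default, and C++ treats each one as a full
 * definition. If more than one is initialized, it's an error in
 * C as well. The portable solution is to use extern on all but
 * one, so that the program has exactly one definition of a.
 */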
int a;

/* definition, external linkage (global) */
void foo(void)
{
  printf("The value of a in foo is %i and address is %p\n",
         a, (void *)&a);
}
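When the toolchain merges these definitions (GCC before version 10, or GCC 10 and later with the -fcommon option), the output shows that there is only one a in the program (the exact address will vary):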
The value of a in main is 0 and address is 0x601044
The value of a in foo is 0 and address is 0x601044
By the way, what is the value of a? Why didn't the compiler complain about using an uninitialized variable?

Going back to our original problem:

In file1.c

/* internal linkage (file-scope) */
static const double PI = 3.14;
In file2.c
/* internal linkage (file-scope) */
static const double PI = 3.1415926;
we want to remove static from both, keep exactly one definition, and turn the other into an extern declaration. We'll assume that the definition is in file2.c and that file1.c will have the extern declaration:

In file1.c

/* Declaration. Keeps the compiler happy. PI is defined in another file. */
/* Do not initialize it: an initializer would make this a definition.    */
extern const double PI;
In file2.c
/* Definition. External linkage (global), visible to entire program */
const double PI = 3.1415926;
Using the extern keyword with global variables is an advanced concept that is not necessary for beginners. However, as you write larger and more complex programs, you will need to become aware of it at some point.

With global variables, only one occurrence (the variable without the extern keyword) is allowed to have an initializer. All of the others (with the extern keyword) cannot have any initializer.

file1.c                file2.c                file3.c                file4.c
int somevar = 10;      extern int somevar;    extern int somevar;    extern int somevar;

Be careful not to do this:
/* fileA.c */
#include <stdio.h>

extern int a;  /* Not a definition */

void foo(void); /* Not a definition */

int main(void)
{
  printf("The value of a in main is %i and "
         "address is %p\n", a, (void *)&a);
  foo();

  return 0;
}
/* fileB.c */
#include <stdio.h>

extern int a; /* Not a definition */

/* definition, external linkage (global) */
void foo(void)
{
  printf("The value of a in foo is %i and "
         "address is %p\n", a, (void *)&a);
}
This will lead to this helpful (linker) error message:
/tmp/ccIKnPSd.o: In function 'main':
fileA1.c:(.text+0x6): undefined reference to 'a'
fileA1.c:(.text+0xb): undefined reference to 'a'
/tmp/ccp26QUD.o: In function 'foo':
fileB1.c:(.text+0x6): undefined reference to 'a'
fileB1.c:(.text+0xb): undefined reference to 'a'
collect2: error: ld returned 1 exit status
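Tip: If many files need to access a global variable, put the extern declaration in a header file and include that header in every file that needs access, including the one file that defines the variable, so the compiler can check that the declaration and the definition agree. This works because the header contains only a declaration. Don't put (non-static) variable definitions in header files: every file that includes the header would then define the variable, and the linker would report multiple definitions.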

Storage Classes

The discussions above focused on visibility and scope. Now, we're going to talk about storage classes. There are two parts to a storage class:
  1. where the object is located in memory
  2. the lifetime of the object
There are 4 keywords (auto, extern, static, register) that are related to an object's storage class and lifetime. Some of these only apply to local objects (i.e. variables defined within a function or smaller scope). As such, the explanations below are only relevant when the keyword is applied to a local variable, not a function.
  1. auto - This is the default when defining a (local) variable inside of a function or other local scope (i.e. between curly braces). This tells the compiler to put the object in automatic storage (typically on the stack). Since it is the default, programmers rarely write it, as it is redundant. In fact, you should avoid it altogether now that C++11 has repurposed this keyword for type deduction. The name of the keyword comes from the fact that local variables are "automatic": the compiler deals with their creation and removal at run-time. Automatic variables that are not initialized have indeterminate (garbage) values.

    The lifetime of an auto object is until the end of the scope.

  2. extern - This tells the compiler that the object is defined elsewhere. No space is allocated for the object, since it does not define anything. You can think of this as a "declaration/prototype" for an object so the compiler doesn't complain. The linker will be responsible for figuring out exactly where (i.e. which file) the object is defined.

    The lifetime of an extern object is until the program ends.

  3. static - When applied to a local variable, this tells the compiler that the object should NOT be stored on the stack, but in some other memory location that still exists after the function or scope ends. This means that the value stored in the object is retained between calls to the function. The object is initialized only once, before the program starts running (in C, the initializer must be a constant expression), not each time the function is called. Subsequent calls modify the object's value and that value is retained. If you do not initialize the object when you define it, it will be initialized to 0. If you define an object with the static keyword inside an inner scope (e.g. a for loop), then its value is retained through each iteration of the scope.

    The lifetime of a static object is until the program ends.

  4. register - This asks the compiler to put the object in a CPU register instead of in memory (on the stack). It can only be used for local variables within a block {}. The motivation is performance, since registers are significantly faster than memory. However, most compilers already do this whenever they determine that it is possible. There are only a handful of registers available, so they are a very scarce resource and should only be requested if you know exactly what you're doing.

    Rule of Thumb: Beginning C programmers DO NOT know what they are doing and should let the compiler decide when to use registers. The register keyword is only a suggestion, and the compiler knows far more about your code at the lowest level than any programmer does. In fact, using register may interfere with the compiler's ability to optimize your code, in which case your program could actually run slower. Also note that in C you cannot take the address of a register variable. C++17 removed this use of the keyword entirely (it is still valid C), which is a good indication that you really shouldn't use it at all.

    The lifetime of a register object is until the end of the scope.

Here's some code to demonstrate:
/* storage.c */
#include <stdio.h>

void foo(void)
{
  static int count = 1; /* not stored on the stack, value persists between function calls */

  printf("The function has been called %i times.\n", count++);
}

int main(void)
{
  auto int a;     /* a is uninitialized on the stack, auto is redundant         */
  int b;          /* b is uninitialized and is also on the stack (same as auto) */
  register int c; /* c will be put in a register, if possible (no guarantee)    */
  extern int d;   /* d is defined elsewhere, this is just a declaration         */

  for (a = 0; a < 5; a++)
  {
    static int e = 5; /* not stored on the stack, value persists in loops    */
    static int f;     /* not stored on the stack, f will be initialized to 0 */
    int g = 10;       /* on the stack, g will be initialized to 10 each time */

    printf("e is %i, f is %i, g is %i\n", e++, f++, g++);
    foo();
  }

  return 0;
}
Output:
e is 5, f is 0, g is 10
The function has been called 1 times.
e is 6, f is 1, g is 10
The function has been called 2 times.
e is 7, f is 2, g is 10
The function has been called 3 times.
e is 8, f is 3, g is 10
The function has been called 4 times.
e is 9, f is 4, g is 10
The function has been called 5 times.
Notes: