Linked Lists

"C programmers think memory management is too important to be left to the computer. Lisp programmers think memory management is too important to be left to the user." -- from Ellis and Stroustrup's The Annotated C++ Reference Manual

Arrays

Arrays are simple, popular, and built into the C language, and they have certain characteristics, both good and bad.

The Good:

Built into the language.
Easy to create at compile time or runtime.
Accessing any element in the array is trivial (just use an index).
Easy to "clean up". (Static arrays: just forget about them. Dynamic arrays: call free.)

The Bad:

You need to know size ahead of time (both static and dynamic arrays).
You must allocate a fixed amount of space (can't resize an array).
Inserting and deleting anywhere but at the end requires a lot of work.

The Ugly:

No bounds checking, allowing you to overwrite memory.
The unusual relationship between pointers and arrays. (Look in the Critique section or search for "historical accidents or mistakes".)

Example of the limitation of arrays:

#include <stdio.h>

// Prints each value in the integer array
void print_array(int array[], int size)
{
  int i;
  for (i = 0; i < size; i++)
    printf("%i   ", array[i]);
  printf("\n");
}

int main(void)
{
  int numbers[30]; // Holds at most 30 integers
  int count = 0;

    // Open text file of numbers for reading
  FILE *fp = fopen("numbers.txt", "r");
  if (!fp)
    return -1;

    // Process the entire file
  while (!feof(fp))
  {
    int number;

      // Read next integer from the file; stop at end of
      // file (fscanf returns EOF) or on input that isn't an integer
    if (fscanf(fp, "%i", &number) != 1)
      break;

      // Add the number to the end of the array
    numbers[count++] = number;
  }

  fclose(fp);

    // Print the array
  print_array(numbers, count);

  return 0;
}

Of course, if there are more than 30 numbers, we are going to overwrite the end of the array.

Possible "fixes":

Don't read in more than 30 numbers.
Set the size of the array to more than 30 (and hope it's big enough, or waste a lot of space).
Allocate the array at runtime. (Still need a size.) Choices:
1. Put the size in the file. (Maybe the first line contains the number of integers.) Allocate the array when the size is known.
2. When the array is full, allocate another, bigger array, and copy old values into it. This may need to be done many times.
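A minimal sketch of choice 2, assuming a hypothetical struct IntArray and append_int helper (neither comes from these notes). Instead of copying by hand, it lets realloc allocate the bigger block and copy the old values, and it doubles the capacity each time so the copying happens only occasionally:

```c
#include <stdlib.h>

// Hypothetical growable array of ints (illustration only)
struct IntArray
{
  int *data;     // heap-allocated storage
  int count;     // number of elements in use
  int capacity;  // number of elements allocated
};

// Appends value, doubling the storage when the array is full.
// Returns 1 on success, 0 if memory could not be allocated.
int append_int(struct IntArray *arr, int value)
{
  if (arr->count == arr->capacity)
  {
    int new_capacity = arr->capacity ? arr->capacity * 2 : 4;
    int *bigger = realloc(arr->data, new_capacity * sizeof(int));
    if (bigger == NULL)
      return 0;  // the old block is still valid and still owned by arr

    arr->data = bigger;  // realloc copied the old values into the new block
    arr->capacity = new_capacity;
  }

  arr->data[arr->count++] = value;
  return 1;
}
```

To use it, start with struct IntArray numbers = {NULL, 0, 0}; (realloc with a NULL pointer behaves like malloc), call append_int for each number read from the file, and call free(numbers.data) when finished.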

Linked Lists

We'd like to overcome the limitations of arrays. One way is to use a linked list. So, what is a Linked List?

A dynamic collection of elements called nodes.
A node is a struct (or class in C++).
A node has two main portions
- data portion -- same type (size) of information in all nodes
- pointer portion -- a pointer to the next node in the list
The data portion can be as simple as an int or very complex. (It's user-defined.)
The linked list is accessed through an external pointer called the head which points to the first node in the list.
The next pointer of the last node is NULL, marking the end of the list. (It's sort of a "NULL-terminated" list.)
If the list is empty, head points to NULL.
If each node in the list has only one pointer (a next pointer), the list is called a singly linked list
Some lists have nodes with two pointers (a next and a previous). This type of linked list is known as a doubly linked list.

An example of a node structure (for a singly linked list) that contains an integer as its data:

struct Node
{
  int number;        // data portion
  struct Node *next; // pointer portion
};

Notice that the structure above is sort of recursive. In other words, we're defining the structure by including a reference to itself in the definition. (Actually, there's only a pointer to itself.)

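When the compiler encounters a structure member, it must know the size of the member. A struct Node could not contain another struct Node (it would have to contain itself), but it can contain a pointer to one: the size of a pointer to a structure is known at compile time, so the code above is completely sane and legal. (The compiler has already seen the tag struct Node, so it knows the name refers to a structure type, even though that type isn't complete yet.)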

This example code:

  // #1 Declare 3 structs
struct Node A, B, C;

  // #2 Initialize 'data' portions of the nodes
A.number = 10;
B.number = 20;
C.number = 30;

  // #3 Connect (link) the nodes together
A.next = &B;   // A's next points to B
B.next = &C;   // B's next points to C
C.next = NULL; // Nothing follows C

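could be visualized in three stages:

After #1: three nodes A, B, and C, whose number and next members are uninitialized.

After #2: A, B, and C hold 10, 20, and 30; the next members are still uninitialized.

After #3: A → B → C → NULL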

The "problem" with this approach is that we are declaring (and naming) all of the nodes at compile time. If we wanted to read a list of 30 integers from a file, we'd need to declare 30 Node structs. We're worse off than with arrays.

Notice that naming structs B and C is redundant: once A is linked to B and B is linked to C, every node can be reached from A. Also remember that we don't "name" the individual elements of an array. We refer to them by supplying a subscript on the array name:

int numbers[30]; // 30 "anonymous" elements
numbers[5] = 0;  // We don't have a "name" for the 6th element

This principle of "anonymous" elements will apply to linked lists as well:

To access an element of an array, we simply use the name of the array (essentially a pointer to the first element) and an index.
To access an element of a linked list, we use a pointer to the first node and then walk the list to find a particular node.

For example, with named nodes (as in the example above) we can print out the data of each node very simply:

printf("%i\n", A.number); // 10
printf("%i\n", B.number); // 20
printf("%i\n", C.number); // 30

With unnamed nodes (i.e. access to the first node only):

struct Node *pNode = &A; // Point to first node
while (pNode) 
{
  printf("%i\n", pNode->number); // Print data
  pNode = pNode->next;           // "Follow" the pointer
}

Visually, pNode starts at A, moves to B, then to C, and finally becomes NULL, which ends the loop.

Let's revisit the original problem of reading an unknown number of integers from a file:

int main(void)
{
  struct Node *pList = NULL; // empty list
  
  FILE *fp = fopen("numbers.txt", "r");
  if (!fp)
    return -1;

  while (!feof(fp))
  {
    struct Node *pNode;
    int number;

      // Read next integer from the file (stop at EOF or bad input)
    if (fscanf(fp, "%i", &number) != 1)
      break;

      // Allocate a new node struct (same for all nodes)
    pNode = (struct Node *) malloc(sizeof(struct Node));

    pNode->number = number; // Initialize number
    pNode->next = NULL;     // Initialize next (no next yet)

      // If the list is NULL (empty), this is the first
      // node we are adding to the list. 
    if (pList == NULL)
      pList = pNode;
    else
    {
        // Find the end of the list
      struct Node *temp = pList;
      while (temp->next)
        temp = temp->next;

      temp->next = pNode; // Put new node at the end
    }
  }
  fclose(fp);

  print_list(pList);  // Display the list
  return 0;
}

Make sure you can follow what each line of code is doing. You should definitely draw diagrams until you are very comfortable with linked lists. Note these two sections especially:

Creating a new node for each element of data (number in the file):

  // Allocate a new node struct (same for all nodes)
pNode = (struct Node *) malloc(sizeof(struct Node));
pNode->number = number; // Initialize number
pNode->next = NULL;     // Initialize next (no next yet)

Adding the new node to the end of the list:

  // If the list is NULL (empty), this is the first
  // node we are adding to the list. 
if (pList == NULL)
  pList = pNode;
else
{
    // Find the end of the list
  struct Node *temp = pList;
  while (temp->next)
    temp = temp->next;

  temp->next = pNode; // Put new node at the end
}

Also note the print_list function used above:

void print_list(struct Node *list)
{
  while (list)
  {
    printf("%i   ", list->number);
    list = list->next;
  }
  printf("\n");
}
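print_list is one example of walking a list; the same loop shape works for any computation over the nodes. As a sketch, here is a hypothetical count_nodes function (not part of the original notes) that counts the nodes by following next pointers until it reaches NULL:

```c
#include <stddef.h>

struct Node
{
  int number;        // data portion
  struct Node *next; // pointer portion
};

// Hypothetical helper: returns the number of nodes reachable from list
int count_nodes(const struct Node *list)
{
  int count = 0;
  while (list)           // stop at the NULL that ends the list
  {
    count++;
    list = list->next;   // "follow" the pointer
  }
  return count;
}
```

Unlike an array, whose size you track separately, a list has to be walked to find its length, so this takes time proportional to the number of nodes.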

A few points to make so far:

The code is certainly more complex than arrays.
The number of nodes in a linked list depends only on the amount of available memory. (This code can handle small lists or large lists.)
We are only allocating what we need. (Arrays can waste space.)
Each node carries the overhead of one pointer: 4 bytes on typical 32-bit systems, 8 bytes on typical 64-bit systems (plus any padding and malloc bookkeeping).
The time it takes to add a node to the end of the linked list takes longer as the list grows.
We must also remember to deallocate (free) each node in the list when we are finished. (We haven't done that in the example yet.)

Also note that this very simple example does not do any error handling, especially the condition where malloc returns NULL. In Real World™ code, you would need to check the return from malloc and deal with it accordingly.
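A minimal sketch of that check, assuming a hypothetical make_node helper (not part of these notes) that wraps the allocation and initialization used in the loop above. It returns NULL when malloc fails so the caller can decide what to do; the cast on malloc's return value is optional in C, so it is omitted here:

```c
#include <stdio.h>
#include <stdlib.h>

struct Node
{
  int number;        // data portion
  struct Node *next; // pointer portion
};

// Hypothetical helper: allocates and initializes one node.
// Prints a message and returns NULL if malloc fails.
struct Node *make_node(int number)
{
  struct Node *pNode = malloc(sizeof(struct Node));
  if (pNode == NULL)
  {
    fprintf(stderr, "make_node: out of memory\n");
    return NULL;
  }

  pNode->number = number; // Initialize number
  pNode->next = NULL;     // Initialize next (no next yet)
  return pNode;
}
```

In the reading loop, pNode = make_node(number); followed by if (pNode == NULL) break; would stop reading cleanly instead of dereferencing a NULL pointer.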

Adding nodes

Let's address the last two points now. First, this one: "The time it takes to add a node to the end of the linked list takes longer as the list grows."

This is simply because we are adding to the end and we don't have any immediate (random) access to the end. We only have immediate access to the first node; all of the other nodes must be accessed from the first one. If the list is long, this can take a while.

Solution #1: Maintain a pointer to the last node (tail).

We add a variable to track the tail:

struct Node *pList = NULL; // empty list
struct Node *pTail = NULL; // no tail yet

And then (in the while loop) change this code:

  // If the list is NULL (empty), this is the first
  // node we are adding to the list. 
if (pList == NULL)
  pList = pNode;
else
{
    // Find the end of the list
  struct Node *temp = pList;
  while (temp->next)
    temp = temp->next;

  temp->next = pNode; // Put new node at the end
}

to this code:

  // If the list is NULL (empty), this is the first
  // node we are adding to the list. 
if (pList == NULL)
{
  pList = pNode; // If there's only one node, it is
  pTail = pNode; //   both the head and tail 
}
else
{
  pTail->next = pNode; // Add after the end
  pTail = pNode;       // New node is new end
}
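Keeping pList and pTail as two separate variables works, but they must always be updated together. A common alternative, sketched here with a hypothetical struct List and append function (these names are assumptions, not from the notes), bundles both pointers into one structure:

```c
#include <stdlib.h>

struct Node
{
  int number;        // data portion
  struct Node *next; // pointer portion
};

// Hypothetical wrapper that keeps track of both ends of the list
struct List
{
  struct Node *head; // first node, or NULL if empty
  struct Node *tail; // last node, or NULL if empty
};

// Hypothetical: appends number using the tail pointer.
// Returns 0 if malloc fails, 1 otherwise.
int append(struct List *list, int number)
{
  struct Node *pNode = malloc(sizeof(struct Node));
  if (pNode == NULL)
    return 0;

  pNode->number = number;
  pNode->next = NULL;

  if (list->head == NULL)     // empty: new node is both head and tail
    list->head = pNode;
  else
    list->tail->next = pNode; // link after the current last node

  list->tail = pNode;         // new node is the new end
  return 1;
}
```

Each append does a fixed amount of work no matter how long the list is, because it never walks the list.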

Solution #2: Insert at the head of the list instead of the tail. This is simpler yet:

Again we modify the code in the while loop to this:

  // If the list is NULL (empty), this is the first
  // node we are adding to the list. 
if (pList == NULL)
  pList = pNode;
else
{
  pNode->next = pList; // Insert new node before first
  pList = pNode;       // Now, list points to new node
}

Because pList is NULL when the list is empty, setting the new node's next pointer to pList works in both cases, so we don't need the special case at all:

  // Allocate a new node struct
pNode = (struct Node *) malloc(sizeof(struct Node));
pNode->number = number; // Initialize number

pNode->next = pList; // Initialize to list instead of NULL
pList = pNode;       // Update list pointer

This has the "feature" that the items in the list will be reversed. (Think of a stack.)
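To make the stack connection concrete, here is a minimal sketch with hypothetical stack_push and stack_pop functions (not from the original notes). Both take the address of the list pointer so they can change the caller's head pointer; pushing inserts at the head and popping removes from the head, so values come back out in reverse order:

```c
#include <stdlib.h>

struct Node
{
  int number;        // data portion
  struct Node *next; // pointer portion
};

// Hypothetical: pushes number onto the front of *ppList.
// Returns 0 if malloc fails, 1 otherwise.
int stack_push(struct Node **ppList, int number)
{
  struct Node *pNode = malloc(sizeof(struct Node));
  if (pNode == NULL)
    return 0;

  pNode->number = number;
  pNode->next = *ppList; // old list goes "behind" the new node
  *ppList = pNode;       // new node is the new head
  return 1;
}

// Hypothetical: removes the front node and stores its value in *number.
// Returns 0 (and changes nothing) if the list is empty.
int stack_pop(struct Node **ppList, int *number)
{
  struct Node *first = *ppList;
  if (first == NULL)
    return 0;

  *number = first->number;
  *ppList = first->next; // second node (or NULL) becomes the head
  free(first);
  return 1;
}
```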

Freeing nodes

Up until now, we haven't freed any of the nodes. Since we called malloc for these nodes, we have to call free when we're through. This is straightforward using another while loop:

while (pList)
{
  struct Node *temp = pList->next;
  free(pList);
  pList = temp;
}

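Draw diagrams of this loop. Note that temp saves the pointer to the next node before the current node is freed: after free(pList), reading pList->next would be undefined behavior.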

Notes thus far:

When we traverse (walk) a linked list, we always start from the beginning (head).
Note that we cannot walk backwards through a singly linked list because we have no way to get from a node to the previous node.
A doubly linked list allows traversals in both directions.
If you only need to move in one direction, a singly linked list might be enough.
If you require bi-directional traversals, you will want to use a doubly linked list.

An Ordered List

The previous examples have added the data (integers) to the list in the order they arrived from the file. (Inserting at the front of the list caused the data to be reversed.) This is no different from the way you would add elements to an array.

Suppose we want to keep the linked list sorted, from smallest to largest. This is the data that is in the file:

12 34 21 56 38 94 23 22 67 56 88 19 59 10 17

Again, we just need to modify our while loop that adds a node to the list:

  // Allocate a new node struct (same for all nodes)
pNode = (struct Node *) malloc(sizeof(struct Node));
pNode->number = number; // Initialize number
pNode->next = NULL;     // Initialize next

  // If the list is empty, this is the first one
if (!pList)
  pList = pNode;
else
{
  struct Node *temp = pList, *prev = NULL;

    // Find the correct position in the list for the new node. 
  while (temp && (temp->number < pNode->number))
  {
    prev = temp;       // save previous (singly linked)     
    temp = temp->next; 
  }

    // If this number comes before the first one
  if (temp == pList)
    pList = pNode;
  else  
    prev->next = pNode; // Insert between current and prev node

  pNode->next = temp;   // Next node is larger than new node
}

This prints out:

10   12   17   19   21   22   23   34   38   56   56   59   67   88   94

If we insert a call to print_list after every insertion into the list, we can see the list evolve as each number from the file is inserted:

pList → 12
pList → 12 → 34
pList → 12 → 21 → 34
pList → 12 → 21 → 34 → 56
pList → 12 → 21 → 34 → 38 → 56
pList → 12 → 21 → 34 → 38 → 56 → 94
pList → 12 → 21 → 23 → 34 → 38 → 56 → 94
pList → 12 → 21 → 22 → 23 → 34 → 38 → 56 → 94
pList → 12 → 21 → 22 → 23 → 34 → 38 → 56 → 67 → 94
pList → 12 → 21 → 22 → 23 → 34 → 38 → 56 → 56 → 67 → 94
pList → 12 → 21 → 22 → 23 → 34 → 38 → 56 → 56 → 67 → 88 → 94
pList → 12 → 19 → 21 → 22 → 23 → 34 → 38 → 56 → 56 → 67 → 88 → 94
pList → 12 → 19 → 21 → 22 → 23 → 34 → 38 → 56 → 56 → 59 → 67 → 88 → 94
pList → 10 → 12 → 19 → 21 → 22 → 23 → 34 → 38 → 56 → 56 → 59 → 67 → 88 → 94
pList → 10 → 12 → 17 → 19 → 21 → 22 → 23 → 34 → 38 → 56 → 56 → 59 → 67 → 88 → 94

Try doing that with an array! (By the way, how would you do this with an array?)
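One possible answer, sketched with a hypothetical insert_sorted function: keep the array sorted, and for each new value shift every larger element one slot to the right, then drop the value into the gap. Like the array example at the top of these notes, it assumes the array has room for one more element and does no bounds checking:

```c
// Hypothetical: inserts value into the first count elements of array
// (kept in ascending order) and returns the new count. The caller must
// make sure array has room for count + 1 elements.
int insert_sorted(int array[], int count, int value)
{
  int i = count;

    // Shift larger elements one slot right to open a gap
  while (i > 0 && array[i - 1] > value)
  {
    array[i] = array[i - 1];
    i--;
  }

  array[i] = value; // drop the value into the gap
  return count + 1;
}
```

Each insertion may have to move every element already in the array; the linked list avoids that by changing just two pointers.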

Linked List Functions

We should try to factor out the common linked list functionality for code reuse. We already have a function to print the lists. Another good one to start with is a function to free the nodes in the lists (so we don't have any memory leaks).

This is straightforward:

void free_list(struct Node *pList)
{
    // Until we reach the end of the list
  while (pList)
  {
    struct Node *temp = pList->next; // save next node
    free(pList);                     // free current node
    pList = temp;                    // move to next node
  }
}

Another popular task is searching a linked list for a particular element. With arrays, we simply loop through with an index; with a linked list, we follow the next pointers instead.

Search an array for an item:

// Searches array for value and returns its index.
// Returns -1 if the value is not in the array
int search_array(int array[], int size, int value)
{
  int i;
  for (i = 0; i < size; i++)
    if (array[i] == value)
      return i;

  return -1;
}

Search a linked list for an item:

// Searches a linked list for a value and returns the node
// Returns NULL if the value isn't found
struct Node *find_item(struct Node *list, int value)
{
  while (list)
  {
    if (list->number == value)
      return list;
    list = list->next;
  }
  return NULL;
}
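Accessing an element by position is also a walk through the list. Here is a sketch of a hypothetical nth_node function (not part of the original notes), the linked-list counterpart of array[index]:

```c
#include <stddef.h>

struct Node
{
  int number;        // data portion
  struct Node *next; // pointer portion
};

// Hypothetical helper: returns the node at position index (0-based),
// or NULL if index is negative or the list has fewer than index + 1 nodes.
struct Node *nth_node(struct Node *list, int index)
{
  if (index < 0)
    return NULL;

  while (list && index > 0)
  {
    list = list->next; // one step per position
    index--;
  }
  return list;
}
```

Reaching position index takes index steps, whereas array[index] takes the same time for any index.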

Deleting a node from the list:

Find the node containing the item.
"Unlink" (remove) the node from the list.
Free the node that was removed.
Return a pointer to the first node. (Depends on implementation)

This sample delete_item keeps track of the previous node while searching for the item to delete. We need the previous so we can "reconnect" the remaining nodes after removing the node we found.

struct Node* delete_item(struct Node *list, int value)
{
  struct Node *curr = list;
  struct Node *prev = NULL;

    // Find node containing item to delete
  while (curr && curr->number != value)
  {
    prev = curr;
    curr = curr->next;
  }

    // Not found
  if (curr == NULL)
    return list;

    // Delete head of the list
  if (prev == NULL)
    list = list->next;
  else 
    prev->next = curr->next;

  free(curr);
  return list;
}

Suppose we call the function like this:

pList = delete_item(pList, 21);

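Step by step:

1. Start of algorithm: initialize the current and previous pointers (curr points to the first node, prev is NULL).
2. After the "search" loop, curr points at the node to be removed (the one containing 21), and prev points at the node just before it.
3. Unlink the node and "reconnect" the remaining nodes with prev->next = curr->next. (This is why we needed to keep track of the previous node.)
4. Free the node that curr points to.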

Can you now see why we returned a pointer to the first node?

Sample data and three calls to delete_item:

Original data:

12   34   21   56   38   94   23   22   67   56   88   19   59   10   17

Client code:

list1 = delete_item(list1, 22); 
print_list(list1);
list1 = delete_item(list1, 17);
print_list(list1);
list1 = delete_item(list1, 12);
print_list(list1);

Output:

12   34   21   56   38   94   23   67   56   88   19   59   10   17
12   34   21   56   38   94   23   67   56   88   19   59   10
34   21   56   38   94   23   67   56   88   19   59   10

Two other useful functions would add an item to the end of the list and to the front of the list. We'll call these functions push_back and push_front, respectively.

These functions modify the list but, unlike delete_item above, do not return a pointer to it. Instead, they take the address of the caller's list pointer (a pointer to a pointer), so they can change the head pointer directly. The technique is shown below.

Add a node to the end:

void push_back(struct Node **ppList, struct Node *pNode)
{
    // If the list is NULL (empty), this is the first
    // node we are adding to the list. 
  if (*ppList == NULL)
    *ppList = pNode;
  else
  {
      // Find the end of the list
    struct Node *temp = *ppList;
    while (temp->next)
      temp = temp->next;

    temp->next = pNode; // Put new node at the end
  }

  pNode->next = NULL;   // New node is at the end
}

Add a node to the front:

void push_front(struct Node **ppList, struct Node *pNode)
{
  pNode->next = *ppList;   // List is "behind" new node
  *ppList = pNode;         // New node is at the front
}

This makes our client code much simpler:

int main(void)
{
  struct Node *pList = NULL; // empty list
  
  FILE *fp = fopen("numbers.txt", "r");
  if (!fp)
    return -1;

  while (!feof(fp))
  {
    struct Node *pNode;
    int number;

      // Read next integer from the file (stop at EOF or bad input)
    if (fscanf(fp, "%i", &number) != 1)
      break;

      // Allocate a new node struct (same for all nodes)
    pNode = (struct Node *) malloc(sizeof(struct Node));
    pNode->number = number; // Initialize number
    pNode->next = NULL;     // Initialize next (no next yet)

    push_back(&pList, pNode);
  }
  fclose(fp);

  print_list(pList);
  free_list(pList);

  return 0;
}

We can go a step further and have the node allocation done in the functions, too:

Add to end:

void push_back(struct Node **ppList, int number)
{
    // Allocate a new node struct (same for all nodes)
  struct Node *pNode;
  pNode = (struct Node *) malloc(sizeof(struct Node));
  pNode->number = number; // Initialize number
  pNode->next = NULL;     // Initialize next (no next yet)

    // If the list is NULL (empty), this is the first
    // node we are adding to the list. 
  if (*ppList == NULL)
    *ppList = pNode;
  else
  {
      // Find the end of the list
    struct Node *temp = *ppList;
    while (temp->next)
      temp = temp->next;

    temp->next = pNode; // Put new node at the end
  }
}

Add to front:

void push_front(struct Node **ppList, int number)
{
    // Allocate a new node struct (same for all nodes)
  struct Node *pNode;
  pNode = (struct Node *) malloc(sizeof(struct Node));
  pNode->number = number; // Initialize number
  pNode->next = *ppList;   // Initialize next 
  *ppList = pNode;         // Update list
}

Now, our client code is very simple. Also, because we modularized the linked-list operations, we can now have multiple lists:

int main(void)
{
  struct Node *list1 = NULL, *list2 = NULL; 

  FILE *fp = fopen("numbers.txt", "r");
  if (!fp)
    return -1;

  while (!feof(fp))
  {
    int number;
    if (fscanf(fp, "%i", &number) != 1)
      break;

    push_back(&list1, number);
    push_front(&list2, number);
  }
  fclose(fp);

  print_list(list1);
  print_list(list2);
  free_list(list1);
  free_list(list2);

  return 0;
}

Input:

12   34   21   56   38   94   23   22   67   56   88   19   59   10   17

Output:

12   34   21   56   38   94   23   22   67   56   88   19   59   10   17
17   10   59   19   88   56   67   22   23   94   38   56   21   34   12

It would also be simple to make a function called insert_node which would insert the node in the proper position, keeping the list ordered.

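void insert_node(struct Node **ppList, int number)
{
  struct Node *prev = NULL, *temp = *ppList;
  struct Node *pNode = (struct Node *) malloc(sizeof(struct Node));
  pNode->number = number; // Initialize number

    // Find the first node that is not smaller than number
  while (temp && temp->number < number)
  {
    prev = temp;          // save previous (singly linked)
    temp = temp->next;
  }
  pNode->next = temp;     // Larger (or equal) nodes follow the new node
  if (prev == NULL)
    *ppList = pNode;      // New smallest value becomes the head
  else
    prev->next = pNode;   // Link after the last smaller node
}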

This would lead to the sorted list:

10   12   17   19   21   22   23   34   38   56   56   59   67   88   94
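The pointer-to-pointer parameter used by push_back and push_front can also remove the special case for the head in delete_item. The sketch below is an alternative to the version in these notes, under a hypothetical name, delete_item_pp: instead of tracking prev, it keeps a pointer to the link (first the head pointer, then some node's next member) that refers to the current node:

```c
#include <stdlib.h>

struct Node
{
  int number;        // data portion
  struct Node *next; // pointer portion
};

// Hypothetical variant of delete_item: removes the first node containing
// value. ppLink always points at the pointer that refers to the current
// node, so deleting the head needs no special case.
void delete_item_pp(struct Node **ppList, int value)
{
  struct Node **ppLink = ppList;

  while (*ppLink && (*ppLink)->number != value)
    ppLink = &(*ppLink)->next; // advance to the next link

  if (*ppLink)                  // found it
  {
    struct Node *doomed = *ppLink;
    *ppLink = doomed->next;     // unlink: bypass the node
    free(doomed);
  }
}
```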

Doubly Linked Lists

A doubly linked list is a list that has two pointers. In addition to the next pointer, it has a previous pointer. This allows you to traverse the list in both directions.

An example node structure for a doubly linked list:

struct Node
{
  int number;        // data portion
  struct Node *next; // node after this one
  struct Node *prev; // node before this one
};

Compared with singly linked lists, doubly linked lists:

require an extra pointer for each node.
require more work at runtime to "hook up" the extra pointer.
are more flexible, allowing traversals in both directions.
can make some operations simpler, because each node can reach both its next and its previous node. (For example, deleting a node no longer requires searching for the node before it.)

The implementations for functions to manipulate doubly linked lists would be similar to the singly linked lists functions, with the additional code for dealing with the previous pointer.
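As a minimal sketch (the function names dl_push_front and dl_remove are hypothetical, not from these notes), here is what the extra prev bookkeeping looks like for inserting at the front and for removing a node. dl_remove needs no search for the previous node, because pNode->prev already points to it:

```c
#include <stdlib.h>

struct Node
{
  int number;        // data portion
  struct Node *next; // node after this one
  struct Node *prev; // node before this one
};

// Hypothetical: adds number at the front. Returns 0 if malloc fails.
int dl_push_front(struct Node **ppList, int number)
{
  struct Node *pNode = malloc(sizeof(struct Node));
  if (pNode == NULL)
    return 0;

  pNode->number = number;
  pNode->prev = NULL;        // nothing before the new head
  pNode->next = *ppList;     // old list follows the new node
  if (*ppList)
    (*ppList)->prev = pNode; // old head now has a predecessor
  *ppList = pNode;
  return 1;
}

// Hypothetical: unlinks and frees pNode, which must be in *ppList.
void dl_remove(struct Node **ppList, struct Node *pNode)
{
  if (pNode->prev)
    pNode->prev->next = pNode->next;
  else
    *ppList = pNode->next;   // removing the head

  if (pNode->next)
    pNode->next->prev = pNode->prev;

  free(pNode);
}
```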

To summarize, linked lists:

are an alternative data structure for storing related data.
are more complex in structure and more complex to access than arrays.
require the programmer to deal with the complexity. (They are not built into the language.)
require more overhead than arrays (at least one pointer per node).
can grow and shrink in size at runtime.
use memory efficiently in the sense that you only allocate what you need (and free it when you're done).
are very efficient for inserting and deleting items once you have a pointer to the right position; finding that position still means walking the list.

Why not do away with arrays and use linked lists for all lists?