Linked Lists

"C programmers think memory management is too important to be left to the computer. Lisp programmers think memory management is too important to be left to the user." -- from Ellis and Stroustrup's The Annotated C++ Reference Manual

Arrays

Arrays are simple, popular, and built into the C language. They have certain characteristics, both good and bad.

The Good:

The Bad:

The Ugly:

Example of the limitation of arrays:

#include <stdio.h>

// Prints each value in the integer array
void print_array(int array[], int size)
{
  int i;
  for (i = 0; i < size; i++)
    printf("%i   ", array[i]);
  printf("\n");
}

int main(void)
{
  int numbers[30]; // Holds at most 30 integers
  int count = 0;

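    // Open text file of numbers for reading
  FILE *fp = fopen("numbers.txt", "r");
  if (!fp)
    return -1;

    // Process the entire file
  while (!feof(fp))
  {
    int number;

      // Read next integer from the file. fscanf returns 1 on success,
      // 0 on a non-numeric token, and EOF at the end of the file.
    if (fscanf(fp, "%i", &number) != 1)
      break;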

      // Add the number to the end of the array
    numbers[count++] = number;
  }

  fclose(fp);

    // Print the array
  print_array(numbers, count);

  return 0;
}
Of course, if there are more than 30 numbers, we are going to write past the end of the array, which is undefined behavior in C.

Possible "fixes":
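One possible fix is to allocate the array dynamically and grow it on demand. The following is a minimal sketch, assuming a capacity-doubling strategy built on realloc. The names IntArray, array_push, and array_free are hypothetical and are not part of the original example.

```c
#include <stdlib.h>

// Hypothetical growable array of ints (illustration only)
struct IntArray
{
  int *data;       // heap-allocated elements
  size_t count;    // number of elements in use
  size_t capacity; // number of elements allocated
};

// Appends value, doubling the capacity when the array is full.
// Returns 1 on success, 0 if memory could not be allocated.
int array_push(struct IntArray *a, int value)
{
  if (a->count == a->capacity)
  {
    size_t new_capacity = a->capacity ? a->capacity * 2 : 4;
    int *new_data = realloc(a->data, new_capacity * sizeof(int));
    if (!new_data)
      return 0; // the old block is still valid and unchanged

    a->data = new_data;
    a->capacity = new_capacity;
  }

  a->data[a->count++] = value;
  return 1;
}

// Releases the storage and resets the array to empty
void array_free(struct IntArray *a)
{
  free(a->data);
  a->data = NULL;
  a->count = a->capacity = 0;
}
```

Starting from struct IntArray a = { NULL, 0, 0 };, the reading loop would call array_push(&a, number) instead of writing to numbers[count++]. Growing the array may copy every element to a new block, which is part of the motivation for looking at a different data structure.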


Linked Lists

We'd like to overcome the limitations of arrays. One way is to use a linked list. So, what is a linked list?

An example of a node structure (for a singly linked list) that contains an integer as its data:

struct Node
{
  int number;        // data portion
  struct Node *next; // pointer portion
};
Notice that the structure above is sort of recursive. In other words, we're defining the structure by including a reference to itself in the definition. (Actually, there's only a pointer to itself.)

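When the compiler encounters a structure member, it must know the size of the member. Since the size of a pointer is known at compile time, the code above is completely sane and legal. (Also, by the time it reaches the next member, the compiler has already seen the tag struct Node, so it knows the name refers to a structure type even though the definition is not yet complete.)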
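To see why the pointer is what makes the definition legal, it can help to look at the sizes involved. This is a small sketch; the helper name node_size_is_sane is hypothetical, and the exact sizes depend on the platform.

```c
struct Node
{
  int number;        // data portion
  struct Node *next; // pointer portion: a fixed-size pointer, not a whole Node
};

// struct Bad { int number; struct Bad next; };
//   would NOT compile: a Bad would have to contain a complete Bad,
//   so its size could never be determined.

// Returns 1 if a Node is at least as large as its two members combined.
// (Padding may make it larger; the exact numbers are platform-dependent.)
int node_size_is_sane(void)
{
  return sizeof(struct Node) >= sizeof(int) + sizeof(struct Node *);
}
```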

This example code:

  // #1 Declare 3 structs
struct Node A, B, C;

  // #2 Initialize 'data' portions of the nodes
A.number = 10;
B.number = 20;
C.number = 30;

  // #3 Connect (link) the nodes together
A.next = &B;   // A's next points to B
B.next = &C;   // B's next points to C
C.next = NULL; // Nothing follows C
could be visualized as this:

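After #1: A, B, and C exist, but neither their number members nor their next pointers have been set.

After #2: A holds 10, B holds 20, and C holds 30. The next pointers are still unset.

After #3: A (10) → B (20) → C (30) → NULL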

The "problem" with this approach is that we are declaring (and naming) all of the nodes at compile time. If we wanted to read a list of 30 integers from a file, we'd need to declare 30 Node structs. We're worse off than with arrays.

Notice from the diagram that naming struct B and C is redundant. Also remember that we don't "name" our individual elements of an array. We refer to them by supplying a subscript on the array name:

int numbers[30]; // 30 "anonymous" elements
numbers[5] = 0;  // We don't have a "name" for the 6th element
This principle of "anonymous" elements will apply to linked lists as well: For example, with named nodes (as in the example above) we can print out the data of each node very simply:
printf("%i\n", A.number); // 10
printf("%i\n", B.number); // 20
printf("%i\n", C.number); // 30
With unnamed nodes (i.e. access to the first node only):
struct Node *pNode = &A; // Point to first node
while (pNode) 
{
  printf("%i\n", pNode->number); // Print data
  pNode = pNode->next;           // "Follow" the pointer
}
Visually, pNode points at A, then B, then C, and finally becomes NULL, which ends the loop.


Let's revisit the original problem of reading an unknown number of integers from a file:

int main(void)
{
  struct Node *pList = NULL; // empty list
  
  FILE *fp = fopen("numbers.txt", "r");
  if (!fp)
    return -1;

  while (!feof(fp))
  {
    struct Node *pNode;
    int number;

      // Read next integer from the file (1 means one value was read)
    if (fscanf(fp, "%i", &number) != 1)
      break;

      // Allocate a new node struct (same for all nodes)
    pNode = (struct Node *) malloc(sizeof(struct Node));

    pNode->number = number; // Initialize number
    pNode->next = NULL;     // Initialize next (no next yet)

      // If the list is NULL (empty), this is the first
      // node we are adding to the list. 
    if (pList == NULL)
      pList = pNode;
    else
    {
        // Find the end of the list
      struct Node *temp = pList;
      while (temp->next)
        temp = temp->next;

      temp->next = pNode; // Put new node at the end
    }
  }
  fclose(fp);

  print_list(pList);  // Display the list

  return 0;
}
Make sure you can follow what each line of code is doing. You should definitely draw diagrams until you are very comfortable with linked lists. Note these two sections especially:

Creating a new node for each element of data (number in the file):

  // Allocate a new node struct (same for all nodes)
pNode = (struct Node *) malloc(sizeof(struct Node));
pNode->number = number; // Initialize number
pNode->next = NULL;     // Initialize next (no next yet)
Adding the new node to the end of the list:

  // If the list is NULL (empty), this is the first
  // node we are adding to the list. 
if (pList == NULL)
  pList = pNode;
else
{
    // Find the end of the list
  struct Node *temp = pList;
  while (temp->next)
    temp = temp->next;

  temp->next = pNode; // Put new node at the end
}

Also note the print_list function used above:

void print_list(struct Node *list)
{
  while (list)
  {
    printf("%i   ", list->number);
    list = list->next;
  }
  printf("\n");
}
A few points to make so far:
  1. The code is certainly more complex than the array version.
  2. The number of nodes in a linked list is limited only by the amount of available memory. (This code can handle small lists or large lists.)
  3. We are only allocating what we need. (Arrays can waste space.)
  4. There is an overhead of one pointer for each node (4 bytes on typical 32-bit computers, 8 bytes on typical 64-bit computers).
  5. The time it takes to add a node to the end of the linked list takes longer as the list grows.
  6. We must also remember to deallocate (free) each node in the list when we are finished. (We haven't done that in the example yet.)

Also note that this very simple example does not do any error handling, especially the condition where malloc returns NULL. In Real World™ code, you would need to check the return from malloc and deal with it accordingly.
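A minimal sketch of such a check, assuming the allocation is wrapped in a helper function. The name make_node is hypothetical, and returning NULL to the caller is only one of several reasonable ways to deal with the failure.

```c
#include <stdlib.h>

struct Node
{
  int number;        // data portion
  struct Node *next; // pointer portion
};

// Allocates and initializes a node.
// Returns NULL if malloc fails, so the caller can decide what to do.
struct Node *make_node(int number)
{
  struct Node *pNode = (struct Node *) malloc(sizeof(struct Node));
  if (pNode == NULL)
    return NULL;

  pNode->number = number; // Initialize number
  pNode->next = NULL;     // Initialize next (no next yet)
  return pNode;
}
```

In the reading loop, the caller would test the result before using it, for example by reporting the problem, freeing the nodes already in the list, and stopping.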


Adding nodes

Let's address the last two points now. First, this one: "The time it takes to add a node to the end of the linked list takes longer as the list grows."

This is simply because we are adding to the end and we don't have any immediate (random) access to the end. We only have immediate access to the first node; all of the other nodes must be accessed from the first one. If the list is long, this can take a while.
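To make the cost concrete, the sketch below counts how many pointers are followed while appending. The function append_counting_hops is hypothetical and exists only for this illustration. It uses the same find-the-end loop as the code above, takes the address of the list pointer so it can update an empty list, and assumes malloc succeeds.

```c
#include <stdlib.h>

struct Node
{
  int number;        // data portion
  struct Node *next; // pointer portion
};

// Appends number to the end of the list by walking from the first node.
// Returns how many next pointers were followed to reach the last node.
// (Illustration only: assumes malloc succeeds.)
unsigned long append_counting_hops(struct Node **ppList, int number)
{
  unsigned long hops = 0;
  struct Node *pNode = (struct Node *) malloc(sizeof(struct Node));
  pNode->number = number;
  pNode->next = NULL;

  if (*ppList == NULL)
    *ppList = pNode;
  else
  {
      // Find the end of the list, counting each step
    struct Node *temp = *ppList;
    while (temp->next)
    {
      temp = temp->next;
      hops++;
    }
    temp->next = pNode; // Put new node at the end
  }
  return hops;
}
```

Appending 10 numbers this way follows 0 + 0 + 1 + 2 + ... + 8 = 36 pointers in total. The work for each append grows in step with the length of the list.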

Solution #1: Maintain a pointer to the last node (tail).

We add a variable to track the tail:

struct Node *pList = NULL; // empty list
struct Node *pTail = NULL; // no tail yet
And then (in the while loop) change this code:

  // If the list is NULL (empty), this is the first
  // node we are adding to the list. 
if (pList == NULL)
  pList = pNode;
else
{
    // Find the end of the list
  struct Node *temp = pList;
  while (temp->next)
    temp = temp->next;

  temp->next = pNode; // Put new node at the end
}
to this code:

  // If the list is NULL (empty), this is the first
  // node we are adding to the list. 
if (pList == NULL)
{
  pList = pNode; // If there's only one node, it is
  pTail = pNode; //   both the head and tail 
}
else
{
  pTail->next = pNode; // Add after the end
  pTail = pNode;       // New node is new end
}


Solution #2: Insert at the head of the list instead of the tail. This is simpler yet:

Again we modify the code in the while loop to this:

  // If the list is NULL (empty), this is the first
  // node we are adding to the list. 
if (pList == NULL)
  pList = pNode;
else
{
  pNode->next = pList; // Insert new node before first
  pList = pNode;       // Now, list points to new node
}
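Note that the check for an empty list is not actually needed here. If pList is NULL, setting the new node's next pointer to pList sets it to NULL, which is exactly what we want. So we can always set the next pointer to the list, which simplifies things even more: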
  // Allocate a new node struct
pNode = (struct Node *) malloc(sizeof(struct Node));
pNode->number = number; // Initialize number

pNode->next = pList; // Initialize to list instead of NULL
pList = pNode;       // Update list pointer
This has the "feature" that the items in the list will be reversed. (Think of a stack.)


Freeing nodes

Up until now, we haven't freed any of the nodes. Since we called malloc for these nodes, we have to call free when we're through. This is straightforward using another while loop:

while (pList)
{
  struct Node *temp = pList->next;
  free(pList);
  pList = temp;
}

Notes thus far:


An Ordered List

The previous examples have added the data (integers) to the list in the order they arrived from the file. (Inserting at the front of the list caused the data to be reversed.) This is no different than the way you would add elements to an array.

Suppose we want to keep the linked list sorted, from smallest to largest. This is the data that is in the file:

12 34 21 56 38 94 23 22 67 56 88 19 59 10 17 
Again, we just need to modify our while loop that adds a node to the list:
  // Allocate a new node struct (same for all nodes)
pNode = (struct Node *) malloc(sizeof(struct Node));
pNode->number = number; // Initialize number
pNode->next = NULL;     // Initialize next

  // If the list is empty, this is the first one
if (!pList)
  pList = pNode;
else
{
  struct Node *temp = pList, *prev = NULL;

    // Find the correct position in the list for the new node. 
  while (temp && (temp->number < pNode->number))
  {
    prev = temp;       // save previous (singly linked)     
    temp = temp->next; 
  }

    // If this number comes before the first one
  if (temp == pList)
    pList = pNode;
  else  
    prev->next = pNode; // Insert between current and prev node

  pNode->next = temp;   // Next node is larger than new node
}
This prints out:

10   12   17   19   21   22   23   34   38   56   56   59   67   88   94
If we insert a call to print_list after every insertion into the list, we can see the list evolve. Each line below shows the list after the next number from the file has been inserted:
pList → 12
pList → 12 → 34
pList → 12 → 21 → 34
pList → 12 → 21 → 34 → 56
pList → 12 → 21 → 34 → 38 → 56
pList → 12 → 21 → 34 → 38 → 56 → 94
pList → 12 → 21 → 23 → 34 → 38 → 56 → 94
pList → 12 → 21 → 22 → 23 → 34 → 38 → 56 → 94
pList → 12 → 21 → 22 → 23 → 34 → 38 → 56 → 67 → 94
pList → 12 → 21 → 22 → 23 → 34 → 38 → 56 → 56 → 67 → 94
pList → 12 → 21 → 22 → 23 → 34 → 38 → 56 → 56 → 67 → 88 → 94
pList → 12 → 19 → 21 → 22 → 23 → 34 → 38 → 56 → 56 → 67 → 88 → 94
pList → 12 → 19 → 21 → 22 → 23 → 34 → 38 → 56 → 56 → 59 → 67 → 88 → 94
pList → 10 → 12 → 19 → 21 → 22 → 23 → 34 → 38 → 56 → 56 → 59 → 67 → 88 → 94
pList → 10 → 12 → 17 → 19 → 21 → 22 → 23 → 34 → 38 → 56 → 56 → 59 → 67 → 88 → 94
Try doing that with an array! (By the way, how would you do this with an array?)
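For comparison, here is one way the array version might look. This is a sketch, not the only possible answer: the function name insert_sorted is hypothetical, and the caller is assumed to guarantee that the array has room for one more element.

```c
// Inserts value into the first count elements of array, which must already
// be sorted from smallest to largest. Larger elements are shifted right.
// Assumes the array has room for at least count + 1 elements.
// Returns the new element count.
int insert_sorted(int array[], int count, int value)
{
  int i = count;

    // Shift every element larger than value one slot to the right
  while (i > 0 && array[i - 1] > value)
  {
    array[i] = array[i - 1];
    i--;
  }

  array[i] = value; // Drop the new value into the gap
  return count + 1;
}
```

Once the position is found, the linked list only has to update two pointers. The array version may have to move up to count elements for every insertion.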


Linked List Functions

We should try to factor out the common linked list functionality for code reuse. We already have a function to print the lists. Another good one to start with is a function to free the nodes in the lists (so we don't have any memory leaks).

This is straightforward:

void free_list(struct Node *pList)
{
    // Until we reach the end of the list
  while (pList)
  {
    struct Node *temp = pList->next; // save next node
    free(pList);                     // free current node
    pList = temp;                    // move to next node
  }
}


Another common task is searching a linked list for a particular element. With arrays, we simply loop through with an index. With a linked list, the loop is slightly different: instead of incrementing an index, we follow the next pointers.

Search an array for an item:

// Searches array for value and returns its index.
// Returns -1 if the value is not in the array
int search_array(int array[], int size, int value)
{
  int i;
  for (i = 0; i < size; i++)
    if (array[i] == value)
      return i;

  return -1;
}
Search a linked list for an item:
// Searches a linked list for a value and returns the node
// Returns NULL if the value isn't found
struct Node *find_item(struct Node *list, int value)
{
  while (list)
  {
    if (list->number == value)
      return list;
    list = list->next;
  }
  return NULL;
}


Deleting a node from the list:

  1. Find the node containing the item.
  2. "Unlink" (remove) the node from the list.
  3. Free the node that was removed.
  4. Return a pointer to the first node. (Depends on implementation)
This sample delete_item keeps track of the previous node while searching for the item to delete. We need the previous node so we can "reconnect" the remaining nodes after removing the node we found.
struct Node* delete_item(struct Node *list, int value)
{
  struct Node *curr = list;
  struct Node *prev = NULL;

    // Find node containing item to delete
  while (curr && curr->number != value)
  {
    prev = curr;
    curr = curr->next;
  }

    // Not found
  if (curr == NULL)
    return list;

    // Delete head of the list
  if (prev == NULL)
    list = list->next;
  else 
    prev->next = curr->next;

  free(curr);
  return list;
}
Suppose we call the function like this:
pList = delete_item(pList, 21);

Start of algorithm. Initialize the current and previous pointers.


After the "search" loop, curr points at the node to be removed.


Unlink the node and "reconnect" the remaining nodes. (This is why we needed to keep track of the previous node.)


Free the node that curr points to.

Can you now see why we returned a pointer to the first node?

Sample data and three calls to delete_item:

Original data:

12   34   21   56   38   94   23   22   67   56   88   19   59   10   17
Client code:

list1 = delete_item(list1, 22); 
print_list(list1);
list1 = delete_item(list1, 17);
print_list(list1);
list1 = delete_item(list1, 12);
print_list(list1);
Output:

12   34   21   56   38   94   23   67   56   88   19   59   10   17
12   34   21   56   38   94   23   67   56   88   19   59   10
34   21   56   38   94   23   67   56   88   19   59   10


Two more useful functions add an item to the end of the list and to the front of the list. We'll call these functions push_back and push_front, respectively.

These functions modify the list, but do not return a pointer to it (as the delete_item function above did). Instead, they take a pointer to the list pointer (struct Node **), so they can update the caller's pointer directly. The technique is shown below.

Add a node to the end:

void push_back(struct Node **ppList, struct Node *pNode)
{
    // If the list is NULL (empty), this is the first
    // node we are adding to the list. 
  if (*ppList == NULL)
    *ppList = pNode;
  else
  {
      // Find the end of the list
    struct Node *temp = *ppList;
    while (temp->next)
      temp = temp->next;

    temp->next = pNode; // Put new node at the end
  }

  pNode->next = NULL;   // New node is at the end
}

Add a node to the front:

void push_front(struct Node **ppList, struct Node *pNode)
{
  pNode->next = *ppList;   // List is "behind" new node
  *ppList = pNode;         // New node is at the front
}
This makes our client code much simpler:
int main(void)
{
  struct Node *pList = NULL; // empty list
  
  FILE *fp = fopen("numbers.txt", "r");
  if (!fp)
    return -1;

  while (!feof(fp))
  {
    struct Node *pNode;
    int number;

      // Read next integer from the file
    if (fscanf(fp, "%i", &number) != 1)
      break;

      // Allocate a new node struct (same for all nodes)
    pNode = (struct Node *) malloc(sizeof(struct Node));
    pNode->number = number; // Initialize number
    pNode->next = NULL;     // Initialize next (no next yet)

    push_back(&pList, pNode);
  }
  fclose(fp);

  print_list(pList);
  free_list(pList);

  return 0;
}
We can go a step further and have the node allocation done in the functions, too:

Add to end:

void push_back(struct Node **ppList, int number)
{
    // Allocate a new node struct (same for all nodes)
  struct Node *pNode;
  pNode = (struct Node *) malloc(sizeof(struct Node));
  pNode->number = number; // Initialize number
  pNode->next = NULL;     // Initialize next (no next yet)

    // If the list is NULL (empty), this is the first
    // node we are adding to the list. 
  if (*ppList == NULL)
    *ppList = pNode;
  else
  {
      // Find the end of the list
    struct Node *temp = *ppList;
    while (temp->next)
      temp = temp->next;

    temp->next = pNode; // Put new node at the end
  }
}
Add to front:

void push_front(struct Node **ppList, int number)
{
    // Allocate a new node struct (same for all nodes)
  struct Node *pNode;
  pNode = (struct Node *) malloc(sizeof(struct Node));
  pNode->number = number; // Initialize number
  pNode->next = *ppList;   // Initialize next 
  *ppList = pNode;         // Update list
}
Now, our client code is very simple. Also, because we modularized the linked-list operations, we can now have multiple lists:
int main(void)
{
  struct Node *list1 = NULL, *list2 = NULL; 

  FILE *fp = fopen("numbers.txt", "r");
  if (!fp)
    return -1;

  while (!feof(fp))
  {
    int number;
    if (fscanf(fp, "%i", &number) != 1)
      break;

    push_back(&list1, number);
    push_front(&list2, number);
  }
  fclose(fp);

  print_list(list1);
  print_list(list2);
  free_list(list1);
  free_list(list2);

  return 0;
}
Input:
12   34   21   56   38   94   23   22   67   56   88   19   59   10   17 
Output:
12   34   21   56   38   94   23   22   67   56   88   19   59   10   17
17   10   59   19   88   56   67   22   23   94   38   56   21   34   12
It would also be simple to make a function called insert_node which would insert the node in the proper position, keeping the list ordered.
void insert_node(struct Node **ppList, int number)
{
    // Allocate a new node struct (same for all nodes)
  struct Node *pNode;
  pNode = (struct Node *) malloc(sizeof(struct Node));
  pNode->number = number; // Initialize number

    // Code to insert into ordered list
}
This would lead to the sorted list:
10   12   17   19   21   22   23   34   38   56   56   59   67   88   94
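One way the elided insertion step might be filled in is sketched below, reusing the search loop from the ordered-list example. This is an assumption about how the function could be completed, not a definitive implementation, and it omits the malloc check for brevity.

```c
#include <stdlib.h>

struct Node
{
  int number;        // data portion
  struct Node *next; // pointer portion
};

// Inserts number into a list sorted from smallest to largest,
// keeping the list ordered. (Sketch: assumes malloc succeeds.)
void insert_node(struct Node **ppList, int number)
{
  struct Node *temp = *ppList; // node being examined
  struct Node *prev = NULL;    // node before temp (none yet)

    // Allocate a new node struct (same for all nodes)
  struct Node *pNode = (struct Node *) malloc(sizeof(struct Node));
  pNode->number = number; // Initialize number

    // Find the first node that is not smaller than the new number
  while (temp && temp->number < number)
  {
    prev = temp;
    temp = temp->next;
  }

  if (prev == NULL)
    *ppList = pNode;    // New node becomes the first node
  else
    prev->next = pNode; // New node goes after prev

  pNode->next = temp;   // Rest of the list (or NULL) follows
}
```

Because prev starts as NULL, the empty-list case and the insert-before-first case are handled by the same branch.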


Doubly Linked Lists

A doubly linked list is a list in which each node has two pointers. In addition to the next pointer, each node has a previous pointer. This allows you to traverse the list in both directions.

An example node structure for a doubly linked list:

struct Node
{
  int number;        // data portion
  struct Node *next; // node after this one
  struct Node *prev; // node before this one
};


Compared with singly linked lists, doubly linked lists:

The implementations for functions to manipulate doubly linked lists would be similar to the singly linked lists functions, with the additional code for dealing with the previous pointer.
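As an illustration of that additional code, here is a sketch of inserting at the front of a doubly linked list and unlinking a node from it. The names DNode, dpush_front, and dremove are hypothetical (DNode is used instead of Node only to keep it distinct from the singly linked examples), and the malloc check is omitted for brevity.

```c
#include <stdlib.h>

struct DNode
{
  int number;         // data portion
  struct DNode *next; // node after this one
  struct DNode *prev; // node before this one
};

// Adds a new node holding number to the front of the list.
// (Sketch: assumes malloc succeeds.)
void dpush_front(struct DNode **ppList, int number)
{
  struct DNode *pNode = (struct DNode *) malloc(sizeof(struct DNode));
  pNode->number = number;
  pNode->prev = NULL;    // Nothing comes before the first node
  pNode->next = *ppList; // Old first node follows the new one

  if (*ppList)
    (*ppList)->prev = pNode; // Old first node points back to new node

  *ppList = pNode;
}

// Unlinks pNode from the list and frees it. No search for the previous
// node is needed because each node already stores it.
void dremove(struct DNode **ppList, struct DNode *pNode)
{
  if (pNode->prev)
    pNode->prev->next = pNode->next;
  else
    *ppList = pNode->next; // Removing the first node

  if (pNode->next)
    pNode->next->prev = pNode->prev;

  free(pNode);
}
```

Compare dremove with delete_item for the singly linked list: the search loop that tracked the previous node is gone, at the cost of one more pointer to store and maintain in every node.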

To summarize, linked lists:

Why not do away with arrays and use linked lists for all lists?