Trees (Part 2)

Balanced Binary Search Trees

Rotating nodes (review)

Rotate right around the root, S. (Same as promoting M)

Rotate left twice around the root. First around 1, then around 3. (Same as promoting 3 and then 6)

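Using the definitions from before. Note the parameter to each function: it is a reference to a Tree (itself a pointer to a node), so the rotation can change which node the caller's pointer refers to.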

Rotating a tree right:

void RotateRight(Tree &tree)
{
  Tree temp = tree;
  tree = tree->left;
  temp->left = tree->right;
  tree->right = temp;
}

Rotating a tree left:

void RotateLeft(Tree &tree)
{
  Tree temp = tree;
  tree = tree->right;
  temp->right = tree->left;
  tree->left = temp;
}

Follow the four lines of RotateRight in this example:
1. temp = tree;              // temp ===> S
2. tree = tree->left;        // tree ===> M
3. temp->left = tree->right; // temp->left ===> P
4. tree->right = temp;       // tree->right ===> S
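
The four steps can be checked on a tiny tree. This is a minimal sketch: the Node type here is a hypothetical stand-in for the definitions the notes use, and the letters G and X are made-up neighbours added only so that S, M, and P have a complete tree around them.

```cpp
#include <cassert>

// Hypothetical minimal node type (an assumption, not the notes' definition).
struct Node
{
  char data;
  Node *left;
  Node *right;
};
typedef Node* Tree;

void RotateRight(Tree &tree)
{
  assert(tree && tree->left);  // a right rotation needs a left child to promote
  Tree temp = tree;            // 1. temp ===> S
  tree = tree->left;           // 2. tree ===> M
  temp->left = tree->right;    // 3. S adopts M's old right subtree (P)
  tree->right = temp;          // 4. S becomes M's right child
}

// Builds S with left child M (children G and P) and right child X,
// rotates right around S, and reports whether M was promoted as expected.
bool RotateRightDemo()
{
  Node g = {'G', nullptr, nullptr};
  Node p = {'P', nullptr, nullptr};
  Node x = {'X', nullptr, nullptr};
  Node m = {'M', &g, &p};
  Node s = {'S', &m, &x};
  Tree root = &s;

  RotateRight(root);

  return root == &m && m.left == &g && m.right == &s &&
         s.left == &p && s.right == &x;
}
```

Because the parameter is a reference to the caller's pointer, the caller's root changes from S to M without any return value.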




Adjusting the diagram:

A Simple Balanced Tree

One type of balanced tree is the AVL tree, named after its inventors, the two Russian mathematicians Adel'son-Vel'skii and Landis. This next example shows how you can traverse back up the tree without storing a parent pointer in each node and without using recursion: the nodes visited on the way down are saved on a stack. (You could use recursion to achieve the same effect, of course.)

Pseudocode for Insertion:

  1. Insert the item into the tree using the same algorithm as for BSTs, pushing a pointer to each node visited on the way down onto a stack. Call the new node x.
  2. Check if there are more nodes on the stack.
    1. If the stack is empty, the algorithm is complete and the tree is balanced.
    2. If any nodes remain on the stack, go to step 3.
  3. Remove the top node pointer from the stack and call it y.
  4. Check the height of the left and right subtrees of y.
    1. If they are equal or differ by no more than 1 (hence, balanced), go to step 2.
    2. If they differ by more than 1, perform a rotation on one or two nodes as described below. After the rotation(s), the algorithm is complete and the tree is balanced.
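
The walk back up the tree can be sketched without recursion or parent pointers. This is a minimal sketch under stated assumptions: Node is a hypothetical node type, Height treats an empty tree as height -1, and the made-up function InsertAndFindUnbalanced stops at the first unbalanced node and returns it instead of rotating.

```cpp
#include <algorithm>
#include <cassert>
#include <stack>

// Hypothetical node type (an assumption for this sketch).
struct Node
{
  int data;
  Node *left;
  Node *right;
};
typedef Node* Tree;

// Height of a tree; an empty tree has height -1 (an assumed convention).
int Height(Tree tree)
{
  if (tree == nullptr)
    return -1;
  return 1 + std::max(Height(tree->left), Height(tree->right));
}

// Inserts value as in a plain BST while saving each visited link on a stack,
// then pops the stack to look for the first node y whose subtrees differ in
// height by more than 1. Returns the link to y, or nullptr if none is found
// (or if value is a duplicate and nothing was inserted).
Tree *InsertAndFindUnbalanced(Tree &root, int value)
{
  std::stack<Tree *> visited;
  Tree *link = &root;

  while (*link != nullptr)               // step 1: descend, saving the path
  {
    if (value == (*link)->data)
      return nullptr;                    // duplicate: nothing to do
    visited.push(link);
    link = (value < (*link)->data) ? &(*link)->left : &(*link)->right;
  }
  *link = new Node{value, nullptr, nullptr};  // the new node x

  while (!visited.empty())               // step 2: any nodes left?
  {
    Tree *y = visited.top();             // step 3: pop the top node
    visited.pop();
    int diff = Height((*y)->left) - Height((*y)->right);  // step 4
    if (diff > 1 || diff < -1)
      return y;                          // this node needs a rotation
  }
  return nullptr;                        // stack empty: tree is balanced
}
```

The stack holds pointers to the links (Tree *) rather than to the nodes, so a later rotation can update the parent's child pointer directly.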

Pseudocode for Balancing:

  1. Check the height of the left and right subtrees of y to find which is greater. The steps below assume the right subtree is taller; if the left subtree is taller, apply the mirror image (swap left and right throughout).
  2. For clarity, we refer to the right child of y as u. We'll refer to the left child of u as v, and the right child of u as w.
  3. Determine the height of the subtree rooted at v and the height of the subtree rooted at w.
    1. If h(v) is greater, perform a right rotation about u and a left rotation about y.
    2. Else perform a left rotation about y. (This also handles the case when h(v) and h(w) are equal: a single rotation is more efficient than a double rotation, so use it whenever possible.)
  4. The tree is now balanced after an insertion.
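
The balancing steps might look like this in code. This is a sketch, not the notes' implementation: Node is a hypothetical node type, Height assumes an empty tree has height -1, and BalanceNode is a made-up name. It also includes the mirror-image case for a taller left subtree.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical node type (an assumption for this sketch).
struct Node
{
  int data;
  Node *left;
  Node *right;
};
typedef Node* Tree;

int Height(Tree tree)  // empty tree has height -1 (assumed convention)
{
  if (tree == nullptr)
    return -1;
  return 1 + std::max(Height(tree->left), Height(tree->right));
}

void RotateRight(Tree &tree)
{
  assert(tree && tree->left);
  Tree temp = tree;
  tree = tree->left;
  temp->left = tree->right;
  tree->right = temp;
}

void RotateLeft(Tree &tree)
{
  assert(tree && tree->right);
  Tree temp = tree;
  tree = tree->right;
  temp->right = tree->left;
  tree->left = temp;
}

// y is the link (parent's child pointer or the root) to the unbalanced node.
void BalanceNode(Tree &y)
{
  int hl = Height(y->left);
  int hr = Height(y->right);

  if (hr > hl + 1)                            // right subtree is taller
  {
    Tree &u = y->right;
    if (Height(u->left) > Height(u->right))   // h(v) > h(w)
      RotateRight(u);                         // promote v over u
    RotateLeft(y);                            // promote y's right child
  }
  else if (hl > hr + 1)                       // mirror image
  {
    Tree &u = y->left;
    if (Height(u->right) > Height(u->left))
      RotateLeft(u);
    RotateRight(y);
  }
}
```

Taking u as a reference to y's child pointer means the first rotation of a double rotation rewires y's child in place before y itself is rotated.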

Removal:

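Removing an element is very similar to inserting one. While searching for the node to delete, we push the visited nodes onto a stack. The only difference is at step 4b of the insertion pseudocode above (the second sub-step of step 4): after the rotation(s), go back to step 2 instead of stopping, because a removal can leave more than one node on the stack unbalanced.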

Example:

[Figures: Given this tree | After inserting 7 | On the stack (top to bottom): 8, 9, 5]

[Figure captions: Balanced subtrees | Balanced subtrees | Unbalanced subtrees]

Since h(v) is greater than h(w), perform a right rotation about u (promote v) and a left rotation about y (promote v again):


Example of deleting from an AVL tree that requires multiple rotations.

Self Check: Draw the AVL tree at each step as you insert these values: 5 2 9 8 12 15 17 19 25 (as letters: EBIHLOQSY)

Partial implementation examples:


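// Forward declarations for the helpers defined below
void InsertAVLItem2(Tree &tree, int Data, stack<Tree *>& nodes);
void BalanceAVLTree(stack<Tree *>& nodes);

// Client calls this instead of InsertItem
void InsertAVLItem(Tree &tree, int Data)
{
  stack<Tree *> nodes;
  InsertAVLItem2(tree, Data, nodes);
}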

// Auxiliary function with the stack of visited nodes
void InsertAVLItem2(Tree &tree, int Data, stack<Tree*>& nodes)
{
  if (tree == 0)
  {
    tree = MakeNode(Data);
    BalanceAVLTree(nodes); // Balance it now
  }
  else if (Data < tree->data)
  {
    nodes.push(&tree); // save visited node
    InsertAVLItem2(tree->left, Data, nodes);
  }
  else if (Data > tree->data)
  {
    nodes.push(&tree); // save visited node
    InsertAVLItem2(tree->right, Data, nodes);
  }
  else
    cout << "Error, duplicate item" << endl;
}

void BalanceAVLTree(stack<Tree *>& nodes)
{
  while (!nodes.empty())
  {
    Tree *node = nodes.top();
    nodes.pop();

    // implement algorithm using functions that
    // are already defined (Height, RotateLeft, RotateRight)
  }
}

BST/AVL program showing balance factors and node counts.

2-3 Search Trees

A possible node structure for a 2-3 search tree:

struct Node23
{
  Node23 *left, *middle, *right;
  Key key1, key2;
};
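
Searching such a node structure could look like the sketch below. It rests on several assumptions not stated in the notes: Key is an int, a hypothetical isThreeNode field records whether key2 is in use, and a 2-node uses only key1 with the left and right pointers (middle is null). The function name Contains is also made up.

```cpp
#include <cassert>

typedef int Key;  // assumption: the notes leave Key unspecified

struct Node23
{
  Node23 *left, *middle, *right;
  Key key1, key2;
  bool isThreeNode;  // hypothetical extra field: true when key2 is in use
};

// Returns true if key is stored somewhere in the tree rooted at node.
bool Contains(const Node23 *node, Key key)
{
  while (node != nullptr)
  {
    if (key == node->key1)
      return true;
    if (node->isThreeNode && key == node->key2)
      return true;

    if (key < node->key1)
      node = node->left;        // smaller than every key in this node
    else if (node->isThreeNode && key < node->key2)
      node = node->middle;      // between key1 and key2 (3-node only)
    else
      node = node->right;       // larger than every key in this node
  }
  return false;
}
```

Each step compares against at most two keys and follows one of at most three links, so a search visits one node per level.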

[Figures: 2-node (not showing empty) | 2-node (showing empty) | 3-node]

An example using bottom-up balancing (splitting):




Inserting D into the tree causes the leftmost node to grow from a 2-node to a 3-node:



[Figures: 4-node, not valid in 2-3 tree | Three 2-nodes]







What will the tree look like after inserting A? After inserting I? Diagrams

In the worst case, we have to traverse the full height of the tree twice (down to insert, up to balance).

2-3-4 Trees

Modified algorithms can produce better efficiency (e.g. splitting nodes on the way down instead of splitting from the bottom up.)

A 2-node attached to a 4-node becomes a 3-node attached to two 2-nodes:



A 3-node attached to a 4-node becomes a 4-node attached to two 2-nodes:
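
Both transformations are the same operation: the middle key of the 4-node moves up into the parent, and the remaining two keys become separate 2-nodes. The sketch below illustrates this under assumptions of its own: Node234 is a hypothetical array-based node layout with int keys, and SplitChild is a made-up name.

```cpp
#include <cassert>

// Hypothetical 2-3-4 node layout (an assumption for this sketch).
struct Node234
{
  int count;             // number of keys in use: 1, 2, or 3
  int keys[3];
  Node234 *children[4];  // all null in a leaf
};

// Splits the 4-node at parent->children[i]. The parent must have room
// (fewer than 3 keys), which splitting on the way down guarantees.
void SplitChild(Node234 *parent, int i)
{
  Node234 *child = parent->children[i];
  assert(parent->count < 3 && child != nullptr && child->count == 3);

  // The largest key and its two subtrees move to a new right sibling.
  Node234 *right = new Node234();  // value-initialized: all fields zero
  right->count = 1;
  right->keys[0] = child->keys[2];
  right->children[0] = child->children[2];
  right->children[1] = child->children[3];

  // Shift the parent's keys and links right to open slot i.
  for (int j = parent->count; j > i; --j)
  {
    parent->keys[j] = parent->keys[j - 1];
    parent->children[j + 1] = parent->children[j];
  }

  // The middle key moves up; the new sibling goes to its right.
  parent->keys[i] = child->keys[1];
  parent->children[i + 1] = right;
  parent->count++;

  // The original child keeps only its smallest key and first two subtrees.
  child->count = 1;
  child->children[2] = nullptr;
  child->children[3] = nullptr;
}
```

Calling this on a 2-node parent leaves a 3-node with two 2-node children where the 4-node was; calling it on a 3-node parent leaves a 4-node, which is why that parent would itself be split first on the next descent through it.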



Deletion

Deleting an element in a 2-3-4 tree (assumes we will grow nodes on the way down.)

The idea is intuitive, but writing the algorithm down in English seems to make it look/sound harder than it is.

Again, when dealing with trees, there are different cases. Here, there are 3 different cases:

  1. If the element, k, is in the node and the node is a leaf containing at least 2 keys, simply remove k from the node.

  2. If the element, k, is in the node and the node is an internal node, perform one of the following:
    1. If the element's left child has at least 2 keys, replace the element with its predecessor, p, and then recursively delete p.
    2. If the element's right child has at least 2 keys, replace the element with its successor, s, and then recursively delete s.
    3. If both children have only 1 key (the minimum), merge the right child into the left child and include the element, k, in the left child. Free the right child and recursively delete k from the left child.

  3. If the element, k, is not in the internal node, follow the proper link to find k. To ensure that every node we travel through has at least 2 keys, you may need to perform one of the following before descending into a node. Then descend into the corresponding node. Eventually, case 1 or 2 will be reached (if k is in the tree).
    1. If the child node (the one being descended into) has only 1 key and has an immediate sibling with at least 2 keys, move an element down from the parent into the child and move an element from the sibling into the parent.
    2. If both the child node and its immediate siblings have only 1 key each, merge the child node with one of the siblings and move an element down from the parent into the merged node. This element will be the middle element in the node. Free the node whose elements were merged into the other node.

Deletion example

Red-Black Trees

Invented by Guibas and Sedgewick in 1978. It is the data structure commonly used to implement maps and sets in implementations of C++'s Standard Template Library (STL), and it is also used to implement the Completely Fair Scheduler in Linux.

Advantages of Red-Black trees

Properties of Red-Black trees

enum COLOR { rbRED, rbBLACK };
struct RBNode
{
  RBNode *left;
  RBNode *right;
  RBNode *parent;
  COLOR color;
  Data data; // Could be any data type
};
Note that the terms "RED" and "BLACK" are completely arbitrary. The inventors could have simply used "GREEN/YELLOW", "A/B", "TRUE/FALSE", "TOM/JERRY". These terms are simply tags to distinguish between the two types of nodes.

A Red-Black tree is a Binary Search Tree with the additional properties of color:

Another way to state this is to focus on these two conditions:

The RED condition:

Each RED node has a BLACK parent.

The BLACK condition:

Each path from the root to every external node contains exactly the same number of BLACK nodes.

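
The two conditions can be checked mechanically. The sketch below is only an illustration: it assumes Data is an int, treats null pointers as the external nodes, and uses a made-up function name, CheckRedBlack. It does not verify the binary search tree ordering.

```cpp
#include <cassert>

enum COLOR { rbRED, rbBLACK };
typedef int Data;  // assumption: the notes allow any data type

struct RBNode
{
  RBNode *left;
  RBNode *right;
  RBNode *parent;
  COLOR color;
  Data data;
};

// Returns the number of BLACK nodes on every path from node down to an
// external (null) node, or -1 if the RED or BLACK condition is violated
// somewhere in this subtree.
int CheckRedBlack(const RBNode *node)
{
  if (node == nullptr)
    return 0;  // external node: no BLACK nodes counted below this point

  // RED condition: a RED node must have a BLACK parent. A RED node with
  // no parent (a RED root) or with a RED parent fails.
  if (node->color == rbRED &&
      (node->parent == nullptr || node->parent->color == rbRED))
    return -1;

  int leftCount = CheckRedBlack(node->left);
  int rightCount = CheckRedBlack(node->right);

  // BLACK condition: both sides must agree on the BLACK count.
  if (leftCount < 0 || rightCount < 0 || leftCount != rightCount)
    return -1;

  return leftCount + (node->color == rbBLACK ? 1 : 0);
}
```

Checking the BLACK count per subtree is equivalent to checking it per root-to-external path, since every such path passes through exactly one child of each node on it.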
Mapping 2-3-4 Trees into Red-Black Trees

More Red-Black Tree Details

The implementation complexity of Red-Black trees arises when an insertion violates one of the Red-Black properties. After such an insertion, we must restore the properties described above.

First some terminology about our family tree:

We will use these letters in our diagram:

Situation #1

Orientation #1 (zig-zig)


Questions you should be able to reason about:

Transformation:

Orientation #2 (zig-zag)

Transformation:

Situation #2

Orientation #1 (zig-zig)

Transformation:

Orientation #2 (zig-zag)

Transformation:

Pseudocode for insertion into a Red-Black tree.

Diagrams and pseudocode in a PDF document.

Given the Red-Black tree below, insert the values 4, 6, 9, and 10 into it, re-balancing and re-coloring as necessary.


Resulting trees from inserting the values above. (Letters: KBNAGOEH DFIJ)

Self-check: Draw the resulting Red-Black tree from inserting the letters: E A S Y Q U T I O N

Resulting tree from inserting the letters above.

Red-Black Demo (Empty)
Red-Black Demo (Populated)

Examples of Inserting Sorted/Unsorted Data

2-3-4 Tree: 1 2 3 4 5 6 7 8 | Red-Black Tree: 1 2 3 4 5 6 7 8 | BST: 1 2 3 4 5 6 7 8
2-3-4 Tree: 8 7 6 5 4 3 2 1 | Red-Black Tree: 8 7 6 5 4 3 2 1 | BST: 8 7 6 5 4 3 2 1
2-3-4 Tree: 2 7 5 6 1 4 8 3 | Red-Black Tree: 2 7 5 6 1 4 8 3 | BST: 2 7 5 6 1 4 8 3
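
For the plain BST column, the effect of insertion order can be measured directly. The sketch below is a stand-in, not the notes' program: Node, this InsertItem, and HeightAfterInserting are assumptions made for the example, and Height counts edges, so an empty tree has height -1.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical node type (an assumption for this sketch).
struct Node
{
  int data;
  Node *left;
  Node *right;
};
typedef Node* Tree;

// Plain (unbalanced) BST insertion; duplicates are ignored.
void InsertItem(Tree &tree, int value)
{
  if (tree == nullptr)
    tree = new Node{value, nullptr, nullptr};
  else if (value < tree->data)
    InsertItem(tree->left, value);
  else if (value > tree->data)
    InsertItem(tree->right, value);
}

int Height(Tree tree)  // counts edges; an empty tree has height -1
{
  if (tree == nullptr)
    return -1;
  return 1 + std::max(Height(tree->left), Height(tree->right));
}

void FreeTree(Tree tree)
{
  if (tree == nullptr)
    return;
  FreeTree(tree->left);
  FreeTree(tree->right);
  delete tree;
}

// Builds a plain BST from the values in the given order and returns its height.
int HeightAfterInserting(const std::vector<int> &values)
{
  Tree root = nullptr;
  for (int value : values)
    InsertItem(root, value);
  int height = Height(root);
  FreeTree(root);
  return height;
}
```

With the three orders listed above, the sorted and reverse-sorted inputs both produce a height of 7 (a linked list of 8 nodes), while 2 7 5 6 1 4 8 3 produces a height of 4. The smallest possible height for 8 nodes is 3.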

Highlights of Red-Black Trees:

Red-Black Tree Program
BST/AVL program

Self-check: Insert the letters ABCDEFGHIJKLMNOPQRSTUVWXYZ into a 2-3-4 tree and compare it to a red-black tree with the same data. What do you see?

Red-Black vs. AVL

According to Ben Pfaff (creator of GNU libavl), in an excerpt from Google Groups on the topic "Red Black Trees Vs Skip Lists":

"In my own tests, the performance of AVL trees versus red-black trees depends on the input data. When the input data is in random order, red-black trees perform better because they expend less effort trying to balance a tree that is already well balanced. When the input data is pathological (e.g. in increasing order), AVL trees perform better because they produce trees with smaller average path length. The choice between AVL and red-black trees should therefore be made based on expectations of typical input data."
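
Another possible way to look at it: if the data is read-only (e.g. a dictionary, with no inserts or deletions), an AVL tree will tend to give faster lookups because it is more strictly balanced than a Red-Black tree (the heights of every node's two subtrees differ by at most 1), although it is not guaranteed to be perfectly balanced.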