Balanced Binary Search Trees

Rotating nodes (review)

- The primary operation used in balancing BSTs is the rotate operation.
- The idea is to rotate the tree around a particular node's position (the pivot point).
- There are two types of rotations: left and right.
- Promoting a node is the same as rotating around the node's parent. (There is no direction; promotion is unambiguous.)
- Note an important property of rotation: after the rotation, the sort order is preserved. This is important, because the resulting tree must still be a BST.

Rotate right around the root, **S**. (Same as promoting **M**)

Rotate left twice around the root. First around **1**, then around **3**.
(Same as promoting **3** and then **6**)

Using the definitions from before. **Note the parameter to each function.**

Follow the four lines of code in this example.

Rotating a tree right and rotating a tree left:

```
void RotateRight(Tree &tree)
{
  Tree temp = tree;
  tree = tree->left;
  temp->left = tree->right;
  tree->right = temp;
}

void RotateLeft(Tree &tree)
{
  Tree temp = tree;
  tree = tree->right;
  temp->right = tree->left;
  tree->left = temp;
}
```

1. temp = tree;              // temp ===> S
2. tree = tree->left;        // tree ===> M
3. temp->left = tree->right; // temp->left ===> P
4. tree->right = temp;       // tree->right ===> S

Adjusting the diagram:

A Simple Balanced Tree

One type of balanced tree is the AVL tree, named for the two Russian mathematicians who invented it, Adel'son-Vel'skii and Landis.

- An AVL tree is essentially a balanced binary search tree (BST).
- The insert and delete operations are more complicated (they need to maintain the balance property).
- Still, fairly simple to understand and implement.
- Worst case for searching is now *O(lg N)*, which is very good.

Inserting into an AVL tree:

1. Insert the item into the tree using the same algorithm as for BSTs. Call this new node *x*. While traversing the tree looking for the appropriate insertion point for *x*, push the visited nodes onto a stack. (Actually, you are pushing pointers to the nodes.) It is not necessary to push the new node onto the stack.
2. Check if there are more nodes on the stack.
   - If the stack is empty, the algorithm is complete and the tree is balanced.
   - If any nodes remain on the stack, go to step 3.
3. Remove the top node pointer from the stack and call it *y*.
4. Check the heights of the **left** and **right** subtrees of *y*.
   - a. If they are equal or differ by no more than 1 (hence, balanced), go to step 2.
   - b. If they differ by more than 1, perform a rotation on one or two nodes as described below. After the rotation(s), the algorithm is complete and the tree is balanced.

Performing the rotation(s):

- Check the heights of the **left** and **right** subtrees of *y* to find which is greater. This pseudocode assumes that the **right** subtree is greater. (The algorithm is symmetric, so you should be able to modify it slightly for the case when the height of the **left** subtree is greater.)
- For clarity, we refer to the **right** child of *y* as *u*, the **left** child of *u* as *v*, and the **right** child of *u* as *w*.
- Determine the height of the subtree rooted at *v* and the height of the subtree rooted at *w*.
  - If *h(v)* is greater, perform a **right** rotation about *u* and then a **left** rotation about *y*.
  - Else, perform a **left** rotation about *y*. (This also handles the case when *h(v)* and *h(w)* are equal; it is more efficient to do a single rotation, rather than a double rotation, when possible.)
- The tree is now balanced after the insertion.

Removing an element is very similar to the insertion algorithm. While searching for the node to delete, we push the visited nodes onto a stack. The only difference is that at step 4b above, we modify it to say this:

- If they differ by more than 1, perform a rotation on one or two nodes as described below. After the rotation(s), **go to step 2** (in the insertion pseudocode).

When deleting a node, you must walk all the way back up to the root to ensure that all subtrees are balanced.

Diagrams: the tree before the insertion, the tree after inserting 7, and the contents of the stack (8, 9, 5).

Diagrams: balanced subtrees, balanced subtrees, unbalanced subtrees.

Since *h(v)* is greater than *h(w)*, perform a right rotation about *u* (promote *v*)
and a left rotation about *y* (promote *v* again):

Example of deleting from an AVL tree that requires multiple rotations.

**Self Check:** Draw the AVL tree at each step as you insert these values: **5 2 9 8 12 15 17 19 25** (as letters: **EBIHLOQSY**)

```
// Client calls this instead of InsertItem
void InsertAVLItem(Tree &tree, int Data)
{
  stack<Tree *> nodes;
  InsertAVLItem2(tree, Data, nodes);
}

// Auxiliary function with the stack of visited nodes
void InsertAVLItem2(Tree &tree, int Data, stack<Tree *> &nodes)
{
  if (tree == 0)
  {
    tree = MakeNode(Data);
    BalanceAVLTree(nodes); // Balance it now
  }
  else if (Data < tree->data)
  {
    nodes.push(&tree); // save visited node
    InsertAVLItem2(tree->left, Data, nodes);
  }
  else if (Data > tree->data)
  {
    nodes.push(&tree); // save visited node
    InsertAVLItem2(tree->right, Data, nodes);
  }
  else
    cout << "Error, duplicate item" << endl;
}

void BalanceAVLTree(stack<Tree *> &nodes)
{
  while (!nodes.empty())
  {
    Tree *node = nodes.top();
    nodes.pop();
    // implement the algorithm using functions that
    // are already defined (Height, RotateLeft, RotateRight)
  }
}
```

2-3 Search Trees

- Each node can contain 1 or 2 keys.
- Each node has 2 or 3 children, hence 2-3 tree.
- The keys in the nodes are ordered from small to large.
- All leaves are at the same (bottom most) level, meaning we always add at the bottom.
- Balance is maintained after every insertion using a very simple balancing algorithm. (Splitting nodes)
- Searches are O(lg N) in worst case.
- Number of splits in worst case is O(lg N), but about O(1) on average.

```
struct Node23
{
  Node23 *left, *middle, *right;
  Key key1, key2;
};
```

Diagrams: a 2-node (not showing the empty slot), a 2-node (showing the empty slot), and a 3-node.

An example using bottom-up balancing (splitting):

Inserting

- Adding **O** to the tree is slightly more complicated.
- It causes the 3-node **M,N** to overflow to **M,N,O**.
- Split the node into two 2-nodes, **M** and **O**.
- Pass up the middle, **N**.

Diagram: the 4-node (not valid in a 2-3 tree) becomes three 2-nodes.

- Passing up **N** causes the 3-node **L,P** to overflow to **L,N,P**.
- Split the node into two 2-nodes, **L** and **P**.
- Pass up the middle, **N**.

- Passing up **N** causes the 2-node **J** to become a 3-node, **J,N**.
- No more overflow/splits.

**Self-check:** What will the tree look like after inserting **A**? After inserting **I**?
Diagrams

2-3-4 Trees

- Similar to 2-3 trees.
- Each node can contain 1, 2, or 3 keys.
- Each node has 2, 3, or 4 children, hence 2-3-4 tree.
- Restated: every node has *at most* 4 children.
- All external nodes (NULL nodes) are at the same depth.
- 2-3-4 trees form the basis for B-trees (typically around 1,000 children per node), which are used extensively for large data sets (e.g. databases, filesystems). A *large* data set is typically data that won't fit into memory.
- Number of splits in the worst case is O(lg N).
- Number of splits on average is very few.

A 2-node attached to a 4-node becomes a 3-node attached to two 2-nodes:

A 3-node attached to a 4-node becomes a 4-node attached to two 2-nodes:

- Splitting a 4-node into two 2-nodes preserves the number of child links (4 links).
- Changes do not have to be propagated; the change remains local to the split.
- By **splitting nodes on the way down**, we can guarantee that each node we pass through is not a 4-node.
- When we reach the bottom, we will *not* be on a 4-node.
- This means we only traverse the tree once when inserting/balancing.
- When the root becomes a 4-node, we split it into a 2-node pointing to two 2-nodes.
- Splitting the root is what causes the tree to grow one level. (This is the only way to increase the height of the tree.)

**Self-check:** Insert the letters: **E A S Y Q U T I O N Z** into a 2-3-4 tree
splitting nodes on the way down.

**Self-check:** Insert the letters: **A S E R C H I N G X** into a 2-3-4 tree
splitting nodes on the way down.

**Self-check:** In a 2-3-4 tree, what is the maximum number of children a node can have?
What is the minimum number? What is the (worst-case) complexity to find an item in a 2-3-4 tree?

**Deletion**

Deleting an element in a 2-3-4 tree is analogous to inserting. The difference is that
we will *grow* nodes on the way down instead of *splitting* them. (We need to ensure that
there will be no *underfull* nodes when we try to delete an item.)

The idea is intuitive, but writing the algorithm down in English seems to make it look/sound harder than it is.

Again, when dealing with trees, there are different cases to handle. Here, there are 3:

1. If the element, *k*, is in a node that is a *leaf* containing at least 2 keys, simply remove *k* from the node.
2. If the element, *k*, is in an *internal node*, perform *one* of the following:
   - If the element's *left* child has at least 2 keys, replace the element with its predecessor, *p*, and then recursively delete *p*.
   - If the element's *right* child has at least 2 keys, replace the element with its successor, *s*, and then recursively delete *s*.
   - If both children have only 1 key (the minimum), merge the right child into the left child and include the element, *k*, in the left child. Free the right child and recursively delete *k* from the left child.
3. If the element, *k*, is not in the internal node, follow the proper link to find *k*. To ensure that all nodes we travel through have at least 2 keys, you may need to perform one of the following before descending into a node. Then, you descend into the corresponding node. Eventually, case 1 or 2 will be reached (if *k* is in the tree).
   - If the child node (the one being descended into) has only 1 key and has an immediate sibling with at least 2 keys, move an element down from the parent into the child and move an element from the sibling into the parent.
   - If both the child node and its immediate siblings have only 1 key each, merge the child node with one of the siblings and move an element down from the parent into the merged node. This element will be the middle element in the node. Free the node whose elements were merged into the other node.

Red-Black Trees

Invented by Guibas and Sedgewick in 1978. It is the data structure used for implementing maps and sets in C++'s Standard Template Library (rb_tree_.hpp) and is also used to implement the Completely Fair Scheduler in Linux.

- A BST is a form of 2-3-4 tree in which all nodes are always 2-nodes.
- Red-Black trees are BSTs (all nodes are 2-nodes).
- Implementation-wise, a Red-Black tree will *inherit* from a BST, since most of the functionality (insert, find, rotate, etc.) is in the base class.
- Red-Black trees are used to represent 2-3-4 trees.
- The "-3-4" nodes of the 2-3-4 tree are "encoded" in the nodes.
- This encoding is represented by each node being either **RED** or **BLACK**.

- Red-Black trees are BSTs, so the standard search methods (and other algorithms) for BSTs work as-is.
- They correspond directly to 2-3-4 trees, so they are always balanced.
- This means that searching, inserting, and re-balancing are all O(lg N). Sweet!
- The insertion/rebalancing algorithm is fairly simple. However, coming up with the algorithm is not.

- A Red-Black tree is a binary search tree.
- Each node has an additional attribute, a color, which is either **RED** or **BLACK**.
- A node in a Red-Black tree also contains a link (back pointer) to the parent, which helps to keep the algorithms simpler. (We need access to the grandparent.)
- A possible data structure for a node of a Red-Black tree:

```
enum COLOR { rbRED, rbBLACK };

struct RBNode
{
  RBNode *left;
  RBNode *right;
  RBNode *parent;
  COLOR color;
  Data data; // Could be any data type
};
```

Note that the terms "RED" and "BLACK" are completely arbitrary. The inventors could have simply used "GREEN/YELLOW", "A/B", "TRUE/FALSE", or "TOM/JERRY". These terms are simply tags used to distinguish between the two types of nodes.

A Red-Black tree is a Binary Search Tree with the additional properties of color:

- Each node is marked as **RED** or **BLACK**.
- Newly inserted nodes are marked as **RED**.
- NULL nodes (empty children) are marked as **BLACK**.
- If a node is **RED**, then its children must be **BLACK**.
  - This means that two **RED** nodes are never adjacent on a path.
  - There is no limit on the number of **BLACK** nodes that may appear in sequence.
- Every path from a node to any of its leaves contains the same number of **BLACK** nodes.
- The root of the tree is **BLACK**. (Technically, the root may be **RED**, but to keep the algorithm simple and ensure that everyone's trees look identical, we'll require the root to be **BLACK**.)

The **RED** condition: Each **RED** node has a **BLACK** parent.

The **BLACK** condition: Each path from an internal node to every external (leaf) node contains exactly the same number of **BLACK** nodes.

Mapping 2-3-4 Trees into Red-Black Trees

More Red-Black Tree Details

The implementation complexity with Red-Black trees arises when an insertion destroys the Red-Black properties that must hold for Red-Black trees. After such an insertion, we must restore the Red-Black properties as above.

- The specific condition that is violated is when an insertion causes two **RED** nodes to be adjacent.
- This is because newly inserted nodes are always marked as **RED**, and if the parent is **RED**, we have a "situation."

We will use these letters in our diagram:

- C = Child
- P = Parent
- G = Grandparent
- U = Uncle

- Child and Parent are **RED** and Uncle is **BLACK**.
- Grandparent *must* be **BLACK** because the tree was a valid Red-Black tree before the insertion.
- There are two possible orientations with the grandparent:

Questions you should be able to reason about:

- What color is the root of subtrees 1, 2, and 3?
- What is the color of G's parent?
- What color is the root of subtrees 4 and 5?
- How does the number of black nodes along any path in G's left subtree compare with the number of black nodes along any path in G's right subtree?

- Rotate the Grandparent (promote the Parent); G becomes a child of P.
- Set the Grandparent to **RED** and the Parent to **BLACK**.
- Changes were local, so we are done. (Doesn't affect nodes above.)

Transformation:

- Rotate the Parent left, then rotate the Grandparent right (promote node, promote node).
- Set the Grandparent to **RED** and the Child to **BLACK**.
- Changes were local, so we are done. (Doesn't affect nodes above.)

- Child and Parent are **RED** and Uncle is **RED**.
- Grandparent *must* be **BLACK** because the tree was a valid Red-Black tree before the insertion.
- There are two possible orientations with the grandparent:

Transformation:

- Set the Grandparent to **RED**; set the Parent and Uncle to **BLACK** (flip the colors of the 4-node).
- Changing G to **RED** may affect G's parent, so we need to continue up the tree.

Transformation:

- Set the Grandparent to **RED**; set the Parent and Uncle to **BLACK**.
- Changing G to **RED** may affect G's parent, so we need to continue up the tree.

Diagrams and pseudocode in a PDF document.

Given the Red-Black tree below, insert the values 4, 6, 9, and 10 into it, re-balancing and re-coloring as necessary.

Resulting trees from inserting the values above. (Letters: KBNAGOEH DFIJ)

**Self-check** Draw the resulting Red-Black tree from inserting the letters: **E A S Y Q U T I O N**

Resulting tree from inserting the letters above.

**Self-check:**

What is the (worst-case) complexity to *insert* an item into a red-black tree?
To *delete* an item? To *find* an item?

What is the *"Red property"*?

What is the *"Black Property"*?

Why is it important to know the *orientation* a node has to its grandparent (i.e. either
*zig-zag* or *zig-zig*)?

Examples of Inserting Sorted/Unsorted Data

Diagrams of the resulting trees (2-3-4 tree, Red-Black tree, and BST) after inserting each sequence:

- Sorted: 1 2 3 4 5 6 7 8
- Reverse sorted: 8 7 6 5 4 3 2 1
- Unsorted: 2 7 5 6 1 4 8 3

- Red-Black trees are BSTs, so standard BST search algorithms work as-is.
- They correspond directly to 2-3-4 trees, so they remain (approximately) balanced after inserting.
- There is less work during searches because no balancing is done (no splitting on the way down); we only balance when we add a node.
- The insertion/rebalancing algorithm is fairly simple.
- Searching, inserting, and re-balancing are all O(lg N).
- Red-Black trees ensure the underlying 2-3-4 tree is balanced. This means:
  - The corresponding 2-3-4 tree is *exactly* balanced and requires at most *lg N* comparisons to reach a leaf. The worst-case complexity, then, is *O(lg N)*.
  - The Red-Black tree is *approximately* balanced and requires at most *2 lg N* comparisons to reach a leaf. The worst-case complexity, then, is *O(lg N)*. On average, the number of comparisons is about *1.002 lg N*.

BTree (2-3-4 tree) visualization (Set Max. Degree to 4 and check Preemptive Split/Merge)

**Self-check:**
Insert the letters `ABCDEFGHIJKLMNOPQRSTUVWXYZ` into a 2-3-4 tree and compare it to a red-black tree
with the same data. What do you see?

"In my own tests, the performance of AVL trees versus red-black trees depends on the input data. When the input data is in random order, red-black trees perform better because they expend less effort trying to balance a tree that is already well balanced. When the input data is pathological (e.g. in increasing order), AVL trees perform better because they produce trees with smaller average path length. The choice between AVL and red-black trees should therefore be made based on expectations of typical input data."

Another way to look at it: if the data is read-only (e.g. a dictionary, with no inserts/deletions), using an AVL tree will result in faster lookups because it is guaranteed to be exactly balanced.