Heaps
- A heap is a complete binary tree in which each child's value (key) is less than or equal to
its parent's value (key).
- Recall that a complete tree is a balanced tree and that all nodes on the bottom level are stored
as far left as possible:
(Diagrams: a complete binary tree alongside a heap)
- Any node's value in the tree is larger than or equal to all of its children's values.
- A tree is said to be heap-ordered if it has these properties.
- We'll see that a heap can be implemented in structures other than binary trees.
- Technically, you could heap-order any tree, but requiring a complete tree leads to
significantly more efficient implementations.
The heap property
- The benefit of a heap is that the largest value is always at the root.
- Locating this value can be done in O(1) (constant) time.
- Like stacks and queues, heaps support push and pop operations.
- We typically want to remove the largest value (the root).
- We must replace the root with another node, but which one?
- The choice of the replacement node dictates the complexity of the implementation.
- Choose a node that has little or no effect on the structure of the tree.
- This will be the last node in the tree, which will preserve the completeness of the tree.
- Replacing the root will likely cause the heap property to be lost, so we must heapify the
tree. (Much like balancing other trees after an insertion or deletion.)
Note that a heap is not simply a "sorted" tree or linked list. The largest value is first (root), but
the rest of the tree may (and usually will) not be completely ordered and may contain duplicates.
(Children are less than or equal to their parent.)
This is a desirable property when we're only interested in the largest value. Much easier to simply
locate the largest item rather than to sort the entire structure. This is the idea behind priority queues.
Priority Queue example using arrays and linked lists.
The complexity of the priority queue is directly related to the data structure used to represent it.
Some algorithms that make use of priority queues are Dijkstra's algorithm, Kruskal's algorithm, and the
A* (A-Star) algorithm.
Example:
This tree is a heap because it has the properties described above:
Removing the largest element
- Easy to do because it's at the root.
- This leaves a vacancy at the root that must be filled.
- Fill this vacancy with the last node in the tree.
- It is safe to remove the last node because it will never have any children. (It's a leaf, after all.)
- It also preserves the completeness of the tree.
- If the tree no longer has the heap property, we need to restore that property.
To heapify the tree after removing the root and replacing it, simply swap the root
with the larger of its children:
Continue this process until the swapping no longer results in a child node being larger than
its parent:
- The tree now has the heap property again.
- What is the cost of heapifying the tree? (i.e. How many swaps in the worst case?)
- By definition, heaps (complete trees) are guaranteed to be balanced.
- Note that removing the last element (right-most node on the bottom level) doesn't
destroy the completeness of the tree.
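The swap-down process described above can be sketched with explicit tree nodes. This is a minimal illustration (the Node struct and function name are my own; a real implementation would use the array representation covered later in these notes):

```cpp
#include <algorithm> // std::swap
#include <cassert>

struct Node
{
    int value;
    Node *left = nullptr;
    Node *right = nullptr;
};

// Repeatedly swap a node's value with its larger child until the
// node is at least as large as both children (the heap property).
void HeapifyDown(Node *node)
{
    while (node)
    {
        Node *largest = node;
        if (node->left && node->left->value > largest->value)
            largest = node->left;
        if (node->right && node->right->value > largest->value)
            largest = node->right;
        if (largest == node)
            break;                      // heap property holds here
        std::swap(node->value, largest->value);
        node = largest;                 // continue down that subtree
    }
}
```

After replacing the root's value with the last node's value, calling HeapifyDown on the root restores the heap property; since the tree is balanced, this is at most O(lg N) swaps.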
Removing the largest element again
After removing the root node R from the previous tree, we have to heapify the tree.
(Diagrams: Steps 1-4 of heapifying after removing the root)
Inserting into a heap
Inserting F into the tree causes no problems. The tree maintains the heap property. Inserting
into the right-most position on the bottom level preserves the completeness as well.
Inserting W into the tree causes the heap property to be lost: (W > B)
Heapifying the tree is similar to how it was done before except that we will work from
the bottom up (instead of the top down).
W is larger than B, so we swap the nodes:
W is larger than L, so we swap the nodes:
W is larger than R, so we swap the nodes.
- We are now at the root so we are done and the heap property has been restored.
- Since we traverse the tree in both directions (top-bottom, bottom-top), we might
want to use a parent pointer (back pointer) for efficiency (depending on the implementation).
- The worst-case time for heapifying after insertion is the same as removal.
Self-check: Draw the tree after removing W from the last tree above.
Then, draw the tree after inserting A, N, and P into the tree.
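The bottom-up direction (and the parent pointer mentioned above) can be sketched like this; the Node struct and function name are hypothetical:

```cpp
#include <algorithm> // std::swap
#include <cassert>

struct Node
{
    int value;
    Node *parent = nullptr; // back pointer for the bottom-up walk
};

// Bubble a node's value up toward the root, swapping with the
// parent whenever the child is larger. Stops at the root or when
// the parent is already at least as large (at most O(lg N) swaps).
void HeapifyUp(Node *node)
{
    while (node->parent && node->value > node->parent->value)
    {
        std::swap(node->value, node->parent->value);
        node = node->parent;
    }
}
```

This is exactly the W example above: the newly inserted node keeps swapping with its parent until it is no longer larger (or reaches the root).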
- Elephant in the room: How do we find the last element in the tree?
- Numbering the nodes in the tree using a level-order traversal:
- What do you notice about the children and their numbers?
- Growth-rate comparison
- Complete binary trees have a very nice property that they can be easily represented by compact arrays.
- These arrays are time- and space-efficient.
- This means that finding the last node in the tree/array takes O(1) (constant) time.
- This is even better than the O(lg N) time it would take to navigate to it in a pointer-based tree.
- Also, there is no wasted space in the array. (The arrays are packed, not sparse).
Example:
Again, if we number the nodes by position in the complete binary tree using level-order (breadth-first) traversal we arrive at these
values:
- Think of these positions as being an "index" into the tree.
- The root is at index 1, the left child of the root is at index 2, the right child of the root is at index 3, etc.
- The key as to why this is efficient is in recognizing the relationship between a parent and its
children.
- This leads to the following discoveries:
Given the "index", i, of a node,
- the node's left child is at index 2i
- the node's right child is at index 2i + 1
- the node's parent is at index i / 2 (using integer division)
- the node is a leaf if 2i > N, where N is the number of nodes in the tree.
- This allows us to find all of this information in constant time.
- For example, node J at index 5:
- The left child is at index 2i (10).
- The right child is at index 2i + 1 (11).
- The parent is at index i / 2 (2, integer division).
- The node is not a leaf because 2i (10) is less than 13.
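These index formulas can be written directly as small helper functions (a sketch, using the 1-based indexing from the notes; the function names are my own):

```cpp
#include <cassert>

// 1-based "tree index" arithmetic for a complete binary tree.
int LeftChild(int i)  { return 2 * i;     }
int RightChild(int i) { return 2 * i + 1; }
int Parent(int i)     { return i / 2;     } // integer division
bool IsLeaf(int i, int N) { return 2 * i > N; } // N = node count
```

For node J at index 5 in the 13-node tree: LeftChild(5) is 10, RightChild(5) is 11, Parent(5) is 2, and IsLeaf(5, 13) is false.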
Given this parent-child relationship, we can simply represent the complete binary tree above as an array.
We will leave the first slot empty to make the arithmetic easier.
The root of the tree is V and its left and right children are M and R, respectively:
Node J has left and right children A and D, respectively:
Node L is a leaf and has no children: 2i > N (i = 7 and N = 13)
Removing the root
Original heap:
Remove largest element:
Move last node to root:
R is larger than M, so swap B and R:
L is larger than H, so swap B and L:
B is a leaf (2 * 7 > 12), so we're done.
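These steps map directly onto the array representation. Here is a sketch of removing the largest element, using slot 0 as the unused slot and integer keys for brevity (the function name is my own; it assumes a non-empty heap):

```cpp
#include <algorithm> // std::swap
#include <cassert>
#include <vector>

// heap[1..N] holds a max-heap; heap[0] is unused.
// Remove and return the largest value, then restore the heap property.
int PopMax(std::vector<int> &heap)
{
    int N = static_cast<int>(heap.size()) - 1;
    int max = heap[1];
    heap[1] = heap[N];  // move the last node to the root
    heap.pop_back();    // remove the last node (preserves completeness)
    --N;

    int i = 1;
    while (2 * i <= N)  // while node i has at least a left child
    {
        int child = 2 * i;
        if (child + 1 <= N && heap[child + 1] > heap[child])
            ++child;    // pick the larger of the two children
        if (heap[i] >= heap[child])
            break;      // heap property restored
        std::swap(heap[i], heap[child]);
        i = child;
    }
    return max;
}
```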
Inserting into the heap
Insert F, heap property is preserved (F < H)
Insert W, heap property is violated (W > B)
Swap W and B, heap property is still violated:
Swap W and L, heap property is still violated:
Swap W and R. W is at the root, so we are done.
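Insertion works the same way in the array: append in the next free slot (which keeps the tree complete), then bubble the new value up past any smaller parents. A sketch, with my own function name and the same 1-based layout:

```cpp
#include <algorithm> // std::swap
#include <cassert>
#include <vector>

// heap[1..N] holds a max-heap; heap[0] is unused.
// Append the value in the next free slot, then sift it up.
void Push(std::vector<int> &heap, int value)
{
    heap.push_back(value);
    int i = static_cast<int>(heap.size()) - 1; // index of the new node
    while (i > 1 && heap[i] > heap[i / 2])
    {
        std::swap(heap[i], heap[i / 2]);       // swap with the parent
        i /= 2;
    }
}
```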
Notes:
- Using a sorted array requires O(N) copies (shifts) to maintain the sort.
- Using a heap requires O(lg N) copies (swaps) to maintain the heap.
- This is a huge difference.
- When N is 1,000, that's 1,000 vs. 10.
- When N is 1,000,000 that's 1,000,000 vs. 20.
- When N is 1,000,000,000 that's 1,000,000,000 vs. 30.
Self check: Implementing a heap using an array is left as an exercise for the student.
Using the STL to create a heap from an array:
Using std::make_heap:
void f1()
{
  srand(10);
  std::vector<int> v;
  for (int i = 0; i < 20; i++)
    v.push_back(RandomInt(10, 99)); // RandomInt/PrintArray are helper functions defined elsewhere

  std::cout << "vector: ";
  PrintArray(&v[0], v.size());
  std::cout << " heap: ";
  std::make_heap(v.begin(), v.end());
  PrintArray(&v[0], v.size());

  std::cout << "\npop, print, re-heapify...\n";
  while (!v.empty())
  {
    // print "top", pop, re-heapify
    std::cout << v[0] << " ";
    v.erase(v.begin()); // Expensive operation: O(N) due to shifting
    std::make_heap(v.begin(), v.end());
  }
}
Using std::priority_queue:
void f2()
{
  srand(10);
  std::vector<int> v;
  for (int i = 0; i < 20; i++)
    v.push_back(RandomInt(10, 99));

  std::cout << "Using a priority queue:\n";
  std::priority_queue<int> pq(v.begin(), v.end());
  while (!pq.empty())
  {
    std::cout << pq.top() << " ";
    pq.pop(); // Cheap operation: O(lg N), no shifting
  }
}
Output:
vector: 81 79 42 24 27 36 72 52 36 19 64 21 13 24 43 92 51 79 74 40
heap: 92 81 72 79 64 36 43 52 79 40 27 21 13 24 42 24 51 36 74 19
pop, print, re-heapify...
92 81 79 79 74 72 64 52 51 43 42 40 36 36 27 24 24 21 19 13
Using a priority queue:
92 81 79 79 74 72 64 52 51 43 42 40 36 36 27 24 24 21 19 13
With 40,000 elements:
f1() runs in about 5.2 seconds
f2() runs in about 0.007 seconds
with 50,000,000 items, f2() runs in about 5.2 seconds.
An example showing how to use operator> with a priority queue:
Using std::priority_queue with operator>:
void f2b()
{
  srand(10);
  std::vector<int> v;
  for (int i = 0; i < 20; i++)
    v.push_back(RandomInt(10, 99));

  std::cout << "Using PQ with operator> instead of operator<:";
  // Use operator> instead of operator< (orders the elements from small to large)
  std::greater<int> comparator;
  // Construct the PQ with the desired sort order
  std::priority_queue<int, std::vector<int>, std::greater<int>> pq(v.begin(), v.end(), comparator);
  while (!pq.empty())
  {
    std::cout << pq.top() << " ";
    pq.pop();
  }
}
Output:
Using PQ with operator> instead of operator<:
11 15 27 27 28 41 45 48 58 58 68 70 73 81 84 87 93 93 95 98
Final thoughts:
- Implementing heaps using arrays may be more cache-friendly than trees.
- Although, copying data could be expensive.
- Less memory is required because no pointers are needed (the indexes are implicit).
- Don't need to dynamically allocate each node.
- Heaps form the basis of Heapsort (go figure), which has a worst-case running time of O(N lg N).
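A quick sketch of the idea using the STL's heap algorithms (std::sort_heap repeatedly moves the largest element to the back of a shrinking heap, yielding ascending order):

```cpp
#include <algorithm> // std::make_heap, std::sort_heap
#include <cassert>
#include <vector>

// Heapsort via the STL: build a max-heap in O(N), then perform
// N pops for O(N lg N) total work.
void HeapSort(std::vector<int> &v)
{
    std::make_heap(v.begin(), v.end());
    std::sort_heap(v.begin(), v.end());
}
```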
Self check: Fill in the array from the tree below such that it will allow
efficient (constant-time) access to a node's children and parent. Yes, there will
be "holes" in the array for the missing nodes.
Note: This isn't a heap, just a mapping of tree nodes into array slots.
Tree:
Array: