The Heap - Operations and Usage

In this tutorial, we are going to discuss one special Data structure called Heap. We will see what are the operations supported by it and how we can use them in solving our problems.

Algorithms and Data structures are used in combination to solve any programming problem which in turn solve some real life problem. In our previous tutorials, we solved the sorting problem with the use of some trivial solutions and some subtle solutions. You may not have realized but, we were using Data structure along with our algorithms in these previous solutions also.Yes, you may have guessed it right, an Array! So, why didn't we discuss the array before and why do we now want to discuss this data structure, Heap? It's because in the previous solutions data structure used was a trivial one. Why is it so? Because an array supports very basic operations and does not hold any specific property which could help us solve the sorting problem. So, what are the operations an array supports? It is Insertion and Deletion at any index. But it doesn't hold any ordering property in itself to assist us in solving any problem. Hence, we used it as a means to store the data, nothing more.

Now we move towards the Heap data structure, it's definition and the operations supported by it. But before that let us define some real world problem which can be solved by the use of Heap data structure.

Heap

Problem Statement:

Let us suppose, we are given a bunch of events (let us say \(N\)) with the timestamp or priority assigned to them. We have to schedule these events according to their timestamp or priority. These events can be anything in real life, for example - a queue of patients in a clinic with different problems thus having different priorities, or as a Programmer, you are asked to implement a Discrete Event simulation of some complex system. For simplicity let us say that \(N\) events are represented as \(A[]\)

Solutions:

The first thought that comes to mind for solving this problem is, that we choose the event with Minimum timestamp or highest priority and execute it. After its execution is completed repeat the procedure until all events have been executed. This is very naive approach and we know that there must be better ones too. Let us, for a second, analyze how efficiently it solves our problem. So there are \(N\) events and to find the minimum among them we will have to traverse all elements at least once. So in first iteration we have \(N\) operations, in next iteration we have \(N-1\) operations and so on. So the number of operations in this solution are \(N+(N-1)+ (N-2)+ \dots +1 = \frac{N*(N+1)}{2} = O(N^2)\).

Now it has a time complexity of the order \(N^2\) which is not good. So, what is the second thought that comes in mind to improve this? Sort \(N\) events with timestamp or priority as a Key, and execute the events in order after sorting. We have already seen that sorting can be done with the time complexity of \(O(NlogN)\). As of now, it seems like a pretty good solution and in fact, to find the order among N items we can't do better than \(O(NlogN)\) so we are good to go and have solved our problem efficiently!

But wait for a moment! What if we added one more condition in Problem statement? What if a new event with any timestamp or priority may be added to the dataset at any moment? If you think about this requirement, it is not superficial as new patients may come into the hospital at any time or any new event may be added to you Discrete Event Simulation program at any time. Now with this new scenario, let us see how well we perform with our current solution. So, to sort \(N\) items we will have \(O(NlogN)\) operations but each time a new item is added to the list we will have to sort the whole list again! Let just say that after executing the first item, a new item is added to the list making its size \(N\) again, and to sort this list again, \(O(NlogN)\) operations will have to be performed. At first, this penalty may not seem much but as a programmer, we should try to reduce \(O(NlogN)\) operations during each insert.

Before proceeding further we want you to think about the requirements of our problem:

  • We need some data structure which can provide us the event with minimum Key out of all existing events without much penalty.
  • We need to add events dynamically in the data structure with time penalty less than \(O(NlogN)\) which also don't affect our first requirement.

The Heap data structure fulfills the above two requirements and according to Wikipedia definition of Heap is :

In computer science, a heap is a specialized tree-based data structure that satisfies the heap property: If A is a parent node of B then the key of node A is ordered with respect to the key of node B with the same ordering applying across the heap. A heap can be classified further as either a "max heap" or a "min heap".

So for this problem we would use the min heap which support following operations:

  • Insertion - insert an element with cost \(O(logN)\)
  • Deletion - delete an element at given index with cost \(O(logN)\)
  • extractMin - Delete minimum key element with cost \(O(logN)\)
  • getMin - get the minimum key element with cost \(O(1)\)
  • InsertBulk - insert \(N\) elements in bulk with cost \(O(N)\)

Correspondingly, here is our solution which uses min heap as data structure instead on an array, in the following order.

  • Bulk insert of N elements using InsertBulk.
  • get event by extractMin and execute it.
  • follow step 2 until all events have been executed.
  • If new events occur add it to heap using Insert.

Following is the code snippet for your reference:

// Create a min heap with capacity 2N
MinHeap h(2*N);
// bulk insert the N elements
h.bulkInsert(A, N);
int event = h.extractMin();
while(event != NULL)
{
// execute the events   
event = h.extractMin();
}

Also, there would be some worker thread which will call insert function of heap whenever a new event occurs.

With this, we wrap up our solution for now, but we do owe you some explanation about the inner working of Heap and how it achieves the above-stated operations with the given cost. We'll keep that for another tutorial. For now, by just changing the underlying data-structure, we made our solution robust which can take care of dynamic event insertion with little cost as much as \(O(logN)\). Apart from this application, Heap can be used whenever we want to have the Minimum or Maximum element and Insert or Delete operation. The only downside of Heap is that it does not support the searching operations. Heap is also used in Sorting, Selection Algorithms and Graph Algorithms such as Prim's minimal-spanning-tree algorithm and Dijkstra's shortest-path algorithm and we will try to talk about them in future tutorials.