Divide and Conquer Algorithms - Merge Sort

In the first tutorial of our Divide and Conquer Algorithms series, we will analyse the Merge sort algorithm. Before moving forward let us formalise the input/output of our algorithm.

INPUT: An array \(A[]\) containing \(N\) elements.

OUTPUT: A sorted array \(A[]\) containing \(N\) elements.

As it follows the Divide and Conquer approach, it will have three stages of sorting i.e. Divide, solve and combine. So let us formalize the stages involved in Merge Sort.

The first step would be to divide the problem into two or more than two sub-problems. As already stated, the input is an array containing N elements, this algorithm divides the array into two sub-arrays containing half of element in each of them, i.e. \(N/2\). So division step is straight forward; just find the mid element in the array and divide it.

The second step would be to solve these subproblems recursively until we reach to the base case where the solution is direct. Now consider the following statement:

An array containing a single element is sorted in itself.

So according to this statement our Base case would be the array containing the single element. Now apply the division until we reach to the base case. Following is the image showing partition involved in the merge sort showing first and second phase combined.

Image showing partition

void sort(a,lo, hi)
{
    if(hi <= lo)
        return;
    mid = lo + (hi-lo)/2;
    sort(a,lo,mid);
    sort(a,mid+1,hi);
    // add code for merging
}

The third step would be to combine the solution of individual sub-problems to formulate the overall solution to our problem. This is the most important step in the Merge sort algorithm and this is where the magic happens. So after we reach to the base case, we will have to combine two arrays having the single element in each of them, in such a way that resulted array is sorted in itself. To do that we need extra space for two elements to store. After the base case two arrays containing 2 elements would be merged and so on. In the same way, when the last merging will happen we would need extra space of \(N\) to merge two array having \(N/2\) elements in each, thus leading to \(O(N)\) space complexity. Code for merging of two arrays is shown bellow.

So for Merging, we need an extra array of size N. Change the code accordingly.

void merge(a,aux, lo, mid, hi)
{
    // copy required element to aux
    
    for(k=lo;k<=hi;k++) ------------------------(1)
        aux[k] = a[k];

    // merge
    i= lo;     ---------------------------------(2)
    j = mid+1;
    for(k=lo;k<=hi;k++) ------------------------(3)
    {
        if(i>mid)         a[k] = aux[j++]; // i has been exausted just copy remaining element of j
        else if(j>hi)     a[k] = aux[i++]; // j has been exausted just copy remaining element of i
        else if(less(aux[j],aux[i]))     a[k] = aux[j++];
        else              a[k] = aux[i++];
    }
}

Now it's bit early but let's find out the number of operations it takes to complete the merging procedure if the Input is two arrays having \(N/2\) elements.

In this case lo = 0 and hi = N then,

\(Operations = Step1 + step2 + step3 \)

\(Operations = N + 2 + 4*N\)

\(Operations <= 6*N\)

\(Operations = O(N)\)

So, in a nutshell, we have following plan for sorting the array

a) Divide the array into two halves.
b) Recursively sort each half (repeat step 'a' if elements are more than 1 ).
c) Merge the two halves.

Image showing merge sort

void sort(a,aux,lo, hi)
{
    if(hi <= lo)
        return;
    mid = lo + (hi-lo)/2;
    sort(a,aux,lo,mid);
    sort(a,aux,mid+1,hi);
    // add code for merging
    merge(a,aux,lo,mid,hi);
}

void sort(a,N)
{
    aux = new char(N)
    sort(a,aux,0,N-1);
}

If we will have to define the merge sort in a mathematical formula, what would that be?
Let us say that \(D(N)\) is total cost of divide-->sort-->combine operation on \(N\) elements then:

\(D(N) = 2D(N/2) + N for all N > 1, with D(1)= 0\)

D(N/2) --> cost of applying merge sort in half of array.
N --> Merge two array having N/2 elements

Proposition: If D(N) satisfies, \(D(N) = 2D(N/2) + N for all N > 1, with D(1)= 0\), then \(D(N) = NlgN\).

Graphical proof for the performance (assuming N is power of 2)

Merge sort analysis

We already know that at any particular level for one subproblem there are two recursive calls and there is one merge operation. We have already established that for INPUT size N merge takes at most 6N operations so if we want to examine how may operations, are there at any level say j then it can be found as follows:

Number of operation at level J = No of subproblems * operations at one subproblem

\(Operations = 2^j * 6*N/ 2^j\)

\(Operations = 6*N\)

So it seems that at one level we are doing \(6N\)operations, now there are levels ranging, from \(0\) to \(log_2 N\) so total number of operation are:

\(Operations = 6*N(log_2N + 1)\)

\(Operations = 6Nlog_2N + 6N\)

\(Operation = O(Nlog_2N)\)

Proof by induction:

Base case: N=1.
Inductive hypothesis: D(N) = NlgN.
Goal: D(2N) = 2Nlg(2N).

D(N) = 2D(N/2) + N
D(2N) = 2D(N) + 2N
D(2N) = 2NlgN + 2N
D(2N) = 2N(lg(2N) - 1) + 2N ----- (log 2 of base 2 = 1 thus lg(2N) = lg2 + lgN = 1+ lgN --> lgN = lg(2N) - 1 )
D(2N) = 2Nlg(2N)

It uses at most \(Nlog_2N\) compares and \(6Nlog_2N\) array accesses to sort any array of size N.

Improvements:
Merge sort has too much overhead for tiny subarrays
1. Cutoff to insertion sort for elements less than 7. (almost 20% faster)

void sort(a,aux,lo, hi)
{
    if(hi <= lo + CUTOFF -1)
    {
        InsertionSort(a,lo,hi);
        return;
    }
        
    mid = lo + (hi-lo)/2;
    sort(a,aux,lo,mid);
    sort(a,aux,mid+1,hi);
    // add code for merging
    merge(a,aux,lo,mid,hi);
}

2. Before Merge procedure check if subarrays are already sorted, i.e. Biggest item in first half <= smallest item in second half

void sort(a,aux,lo, hi)
{
    if(hi <= lo + CUTOFF -1)
    {
        InsertionSort(a,lo,hi);
        return;
    }
        
    mid = lo + (hi-lo)/2;
    sort(a,aux,lo,mid);
    sort(a,aux,mid+1,hi);
    if(!less(a[mid+1],a[mid]))    return; // first element of second array is not less than last element of first array.
    merge(a,aux,lo,mid,hi);
}

3. Switch the role of \(a[]\) and \(aux[]\) in sorting and merging. (don't have to copy the to aux)

void merge(a,aux, lo, mid, hi)
{
    // NO copy
    
    // merge
    i= lo;
    j = mid+1;
    for(k=lo;k<=hi;k++)
    {
        if(i>mid)                        aux[k] = a[j++]; // i has been exausted just copy remaining element of j
        else if(j>hi)                     aux[k] = a[i++]; // j has been exausted just copy remaining element of i
        else if(less(aux[j],aux[i]))     aux[k] = a[j++];
        else                            aux[k] = a[i++];

    }
}
void sort(a,aux,lo, hi)
{
    if(hi <= lo)
    {
        return;
    }
    mid = lo + (hi-lo)/2;
    sort(aux,a,lo,mid);
    sort(aux,a,mid+1,hi);
    merge(a,aux,lo,mid,hi);
}

By switching the array in sort, we are assigning the aux array for each element in the merge function.

Practical usage of Merge sort are.
1. Java uses it for sorting objects.
2. Perl, C++ stable sort, Python stable sort, Firefox javascript.

In the next tutorial, we will take a look on how Quick Sort uses Divide and Conquer to solve the sorting problem.