Quicksort

quicksort.svg

Figure 1: Using quicksort to sort the letters of the word “QUICKSORT”. Pivot items are underlined.

Description

Quicksort, like mergesort, is another divide and conquer sorting technique. The general strategy of quicksort is as follows:

  1. Given an array, choose an element to act as the “pivot” which will be used to partition the array.
  2. Separate the array in two, forming a group of elements smaller than the pivot and a group larger than the pivot.
  3. Recursively partition these two groups in the same manner as the prior steps.
  4. Combine all the partitions (no work is required as they are all sorted).

The graph above shows how quicksort would sort the letters of the word “QUICKSORT”. Note that when partitioning has completed (only 1 letter arrays remain), the highlighted leaf nodes end up in sorted order when read left-to-right. This illustrates how the partitioning routine of quicksort is responsible for sorting, whereas the final merge step is trivial.

Below, a numerical array is sorted with quicksort. Highlighted elements are undergoing sorting, with the pivot element highlighted a different color. A horizontal line represents the value (height) of the pivot. Note how on each iteration, values to the left of the pivot sit below the line, whereas values to the right sit above. The elements to the left of the pivot are not themselves sorted, but they are all less than the pivot. The reverse is true for those elements on the right.

quicksorter.gif

Figure 2: Using quicksort to sort a numerical array. Highlighted elements are being sorted, with the height of the pivot represented as a horizontal line.

Implementation

quick_sort facilitates the recursive partitioning, while swap is a helper to cleanly move elements around.

def quick_sort(array, low_idx, high_idx):
    """
    Sort an array using the quick sort algorithm.

    >>> quick_sort([6, 7, 13, 1, 12, 8, 14, 2], 0, 7)
    [1, 2, 6, 7, 8, 12, 13, 14]
    """
    if low_idx < high_idx:
        _, pivot_idx = partition(array, low_idx, high_idx)
        quick_sort(array, low_idx, pivot_idx - 1)
        quick_sort(array, pivot_idx + 1, high_idx)
    return array


def swap(array, first_idx, second_idx):
    """
    Swap two elements in an array.

    >>> swap([13, 31], 0, 1)
    [31, 13]
    """
    array[first_idx], array[second_idx] = array[second_idx], array[first_idx]
    return array

As mentioned above, the partitioning of the array does the actual sorting work, thus all the interesting details are in partition. This function selects the last element of the given (sub-)array as the pivot and traverses the rest of the (sub-)array, comparing the pivot to each value. If an encountered value is less than the pivot, it is swapped to the end of the left side (i.e. values less than the pivot) of the values being partitioned. Two indices are involved, one to iterate over the array, and the other to keep track of the last value encountered that is less than the pivot, used in swapping.

def partition(array, low_idx, high_idx):
    """
    Partition array into arrays of items smaller and larger than pivot.

    >>> partition([2, 8, 7, 1, 3, 5, 6, 4], 0, 7)
    ([2, 1, 3, 4, 7, 5, 6, 8], 3)
    """
    i = low_idx - 1
    pivot = array[high_idx]
    for j in range(low_idx, high_idx):
        if array[j] <= pivot:
            i += 1
            swap(array, i, j)
    swap(array, i + 1, high_idx)
    pivot_idx = i + 1
    return array, pivot_idx

Complexity

Choosing a partition has been glossed over thus far, but plays a central role in the performance of quicksort. As presented here, the pivot is always assumed to be the last element of the array under investigation. There exist many variations on this, including assuming the first, middle, or taking a random element as a pivot.

Worst case performance occurs when the chosen pivot results in one empty and one size \((n - 1)\) array after partitioning. This would result in \(O(n^2)\) behavior. Conversely, the best case is when partitioning results in two balanced arrays yielding \(O(n \ln{n})\) performance. In reality, the average expected performance tends more toward the best case.

Tracking Sorting Progression

The following class is used to track the progression of sorting which can be used to visualize how sorting evolves.

import pandas as pd

class QuickSorter(object):
    STATUS_NONE = "none"
    STATUS_SORTING = "sorting"
    STATUS_PIVOT = "pivot"

    def __init__(self, data):
        self._data = data
        self._snapshots = []
        self._sorted = False
        for idx, value in enumerate(self._data):
            self._snapshots.append(
                {
                    "position": idx,
                    "value": value,
                    "iteration": 0,
                    "status": QuickSorter.STATUS_NONE,
                    "pivot": 0
                }
            )
        self._iteration_count = 1

    @property
    def data(self):
        return self._data

    @data.setter
    def data(self, new_data):
        """
        Ensures a "sorted" state is invalidated on data change.
        """
        self._data = new_data
        self.sorted = False

    @property
    def snapshots(self):
        return pd.DataFrame.from_records(self._snapshots)

    def quick_sort(self, low_idx, high_idx):
        if low_idx < high_idx:
            _, pivot_idx = self.partition(low_idx, high_idx)
            self._snapshot(low_idx, high_idx, pivot_idx)
            self.quick_sort(low_idx, pivot_idx - 1)
            self.quick_sort(pivot_idx + 1, high_idx)

    def partition(self, low_idx, high_idx):
        i = low_idx - 1
        pivot = self.data[high_idx]
        for j in range(low_idx, high_idx):
            if self.data[j] <= pivot:
                i += 1
                swap(self.data, i, j)
        swap(self.data, i + 1, high_idx)
        pivot_idx = i + 1
        return self.data, pivot_idx

    def _snapshot(self, low_idx, high_idx, pivot_idx):
        """
        Copy self.data, indicating if each element is currently being sorted.
        """
        for idx, value in enumerate(self._data):
            self._snapshots.append(
                {
                    "position": idx,
                    "value": value,
                    "iteration": self._iteration_count,
                    "status": QuickSorter.STATUS_PIVOT
                    if idx == pivot_idx
                    else QuickSorter.STATUS_SORTING
                    if idx in range(low_idx, high_idx + 1)
                    else QuickSorter.STATUS_NONE,
                    "pivot": self.data[pivot_idx]
                }
            )
        self._iteration_count += 1