Community Article

Sorting Algorithms Compared

The complete comparison table, why your standard library is almost always faster than what you would write, and the two questions that decide which sort you actually want.

Sorting Algorithms Compared

The complete comparison table, why your standard library is almost always faster than what you would write, and the two questions that decide which sort you actually want.

sorting
quick-sort
merge-sort
algorithms
interview-prep
khalidcooper

By @khalidcooper

March 15, 2026

·

Updated May 20, 2026

638 views

16

Rate

Nobody who ships software writes their own sort anymore, and you should not either. Every modern standard library ships a sort that is faster, more battle-tested, and more numerically careful than anything you would write in an afternoon. Python's sorted uses Timsort. JavaScript's Array.prototype.sort uses Timsort in V8 and a hybrid in JavaScriptCore. Java's Arrays.sort uses a dual-pivot quicksort for primitives and Timsort for objects. C++'s std::sort is introsort. They are all good. None of them is the algorithm you learned in CS101.

That is the boring part. The interesting part, and the reason this article is worth reading, is that you still have to make choices about sorting. Which key to sort by. Whether stability matters. Whether your data is adversarial. Whether to sort in place or allocate a fresh array. Whether your data is small enough that the algorithm choice does not matter at all. The classical sort comparison table is a teaching tool; the real engineering questions live above it.

The classical comparison table

This is the table every algorithms class teaches. I am going to put it here once, then never reference it again, because the production engineering questions are different.

AlgorithmTime bestTime avgTime worstSpaceStableIn-place
Bubble sortO(n)O(n^2)O(n^2)O(1)YesYes
Selection sortO(n^2)O(n^2)O(n^2)O(1)NoYes
Insertion sortO(n)O(n^2)O(n^2)O(1)YesYes
Merge sortO(n log n)O(n log n)O(n log n)O(n)YesNo
Quick sortO(n log n)O(n log n)O(n^2)O(log n)NoYes
Heap sortO(n log n)O(n log n)O(n log n)O(1)NoYes
TimsortO(n)O(n log n)O(n log n)O(n)YesNo
Counting sortO(n + k)O(n + k)O(n + k)O(n + k)YesNo
Radix sortO(d(n + k))O(d(n + k))O(d(n + k))O(n + k)YesNo

Six things to notice that the table does not yell at you.

First, all the comparison-based sorts (bubble through heap) are bounded below by O(n log n). That is not a property of the algorithms; it is a theorem about comparison-based sorting. You cannot beat O(n log n) if your only operation is compare(a, b).

Second, counting sort and radix sort beat O(n log n) because they are not comparison-based; they exploit structure in the data (small key range, or fixed number of digits).

Third, quick sort's worst case is quadratic, which means an adversarial input or a degenerate pivot choice tanks it. The standard library's quicksort uses random or median-of-medians pivots to avoid this in practice.

Fourth, heap sort and merge sort guarantee O(n log n) worst case, which is why they show up in real-time and adversarial-input contexts.

Fifth, stability matters more than people remember. A stable sort preserves the relative order of equal keys; an unstable one does not. This decides whether you can sort by primary key then secondary key, or whether you have to sort by a tuple in one shot.

Sixth, in-place sorting is a memory question, not a time question. If your data is bigger than RAM, the O(n) extra space of merge sort might mean swapping to disk, which is two orders of magnitude slower. In-place sorts are not faster on small inputs; they just do not blow the memory budget on large ones.

Quick sort, in code, with the pivots that actually matter

Quick sort is the algorithm I see implemented from scratch most often, mostly because it is short. The textbook version looks like this:

def quicksort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[0]
    left = [x for x in arr[1:] if x < pivot]
    right = [x for x in arr[1:] if x >= pivot]
    return quicksort(left) + [pivot] + quicksort(right)

This is wrong for production. It is O(n^2) on already-sorted input (the pivot is always the smallest element), and it allocates a fresh list at every level of recursion. The in-place version is the one to know:

import random

def quicksort(arr, lo=0, hi=None):
    if hi is None: hi = len(arr) - 1
    if lo < hi:
        # Random pivot to avoid adversarial worst case
        pivot_idx = random.randint(lo, hi)
        arr[pivot_idx], arr[hi] = arr[hi], arr[pivot_idx]
        pivot = arr[hi]

        i = lo - 1
        for j in range(lo, hi):
            if arr[j] <= pivot:
                i += 1
                arr[i], arr[j] = arr[j], arr[i]
        arr[i + 1], arr[hi] = arr[hi], arr[i + 1]
        p = i + 1

        quicksort(arr, lo, p - 1)
        quicksort(arr, p + 1, hi)

The random pivot is the line that protects against adversarial input. Without it, an attacker (or an unlucky data shape) can force the O(n^2) worst case. Real implementations go further: Java's DualPivotQuicksort uses two pivots and a median-of-five pre-check; C++'s std::sort falls back to heap sort if recursion gets too deep (this is what "introsort" is). The cleverness is in the safety net, not in the recursion.

Timsort: the algorithm in your standard library

Timsort is the sort you actually use, even if you do not know it by name. Tim Peters wrote it for Python in 2002, and it has since been adopted by Java (object arrays), JavaScript (V8), Rust (slice::sort), and a long list of other runtimes. The reason it dominated is not theoretical; it is practical.

Timsort is a hybrid of merge sort and insertion sort, with one key insight: real-world data is almost never random. It contains runs (consecutive elements that are already in order, or in reverse order). Timsort scans for these runs, reverses any descending ones to make them ascending, then merges them in a galloping pattern that exploits remaining order. On data that is already sorted, it runs in O(n). On reverse-sorted data, also O(n) after the initial reverse. On random data, it falls back to standard merge-sort-shaped O(n log n).

In practice, this means the standard library sort beats your hand-rolled quicksort on most realistic data, sometimes by a wide margin. The first time I benchmarked Python's sorted against my own quicksort, I was 4x slower on random input and 50x slower on partially-sorted input. I never wrote my own sort again unless the standard library was unavailable.

The two questions that actually decide your sort choice

When I am picking how to sort something in real code, I ask exactly two questions, in this order.

Question 1: Do I need stability? If my data has secondary information attached to the sort key (any object with multiple fields, basically), and I might want to sort by one field then re-sort by another, I need a stable sort. Python sorted is stable. Java's Arrays.sort for objects is stable, but for primitives is not (it is dual-pivot quicksort). JavaScript's Array.prototype.sort was unstable in some engines until 2019 and is now guaranteed stable by the ECMAScript spec. C++ std::sort is unstable; if you need stability, you call std::stable_sort. Mixing this up is the source of a small but real class of bugs.

Question 2: Is my data adversarial, real-time, or huge? If yes, the standard library might not be enough.

  • Adversarial input (an attacker controls the data): random pivots help, but the safest answer is heap sort or merge sort because their worst case is O(n log n). C++'s std::sort falls back to heap sort, which is one reason it is preferred over hand-rolled quicksort for security-sensitive code.
  • Real-time / latency-sensitive: again, you want the worst-case bound, not the average. Heap sort or merge sort.
  • Bigger than memory: external sort (a merge-sort variant that streams chunks through disk). This is what the database does for ORDER BY on a billion-row table. You almost never write this from scratch, but you should recognize it when you see it.
  • Tiny (under, say, 32 elements): insertion sort is faster than anything else because the cache and constant-factor wins dominate the O(n^2) penalty. This is why every O(n log n) implementation switches to insertion sort below a small threshold.
  • Keys are bounded integers: counting sort or radix sort can do O(n + k), beating the comparison lower bound. Useful for sorting age values, color indices, byte arrays.

If the answer to question 2 is "none of the above", use the standard library. That is most of the time.

When the standard library is the wrong answer

Three honest cases I have actually shipped where the language sort was not the right tool.

First, when sorting was not the actual operation needed. I have seen arr.sort()[0] more than once in a code review, used to find the minimum. That is O(n log n) for an O(n) problem. Just walk the array and track the min. The same goes for the top-K problem: do not sort and slice; use a heap of size K for O(n log K).

Second, when partial sorting was enough. C++ has std::partial_sort and std::nth_element. Python has heapq.nsmallest and heapq.nlargest. If you only need the top 10 of a million, do not sort the million.

Third, when the data was already partly sorted and the natural "insert in order" was cheaper. bisect.insort in Python keeps a sorted list as you add elements, in O(n) per insert (because of the shift) but O(log n) per find. For workloads where you read the sorted view far more than you write, this beats re-sorting after each insert.

Knowing what your data shape is, and what queries you actually run on it, often reveals that you do not need to sort at all.

Where sorting is heading next

The asymptotic story for comparison-based sorting is settled; the lower bound is O(n log n) and Timsort and friends are within a constant factor of optimal on the data shapes that show up in real workloads. The interesting work is in three other directions, and each one is going to matter more in product code over the next few years.

First, adaptive parallel sorts. Modern CPUs are eight to thirty-two cores; a sort that uses one core is leaving most of the machine on the table. Java's Arrays.parallelSort and C++'s std::execution::par policies are early movers. The default sort being parallel-by-default is plausibly the next API change in big standard libraries, and it changes the cost model: O(n log n / p) for p cores is a different equation than the single-threaded version, and it has implications for chunk sizes and memory allocation that the textbook sort does not cover.

Second, GPU-accelerated sorting for very large arrays. The CUDA Thrust library and similar tools can sort hundreds of millions of integers in milliseconds on consumer GPUs. This is already standard in machine learning pipelines and is leaking into general-purpose data pipelines via tools like RAPIDS. If your job is to sort a billion log lines, the answer might not be a better CPU algorithm.

Third, learned sorting, where a small ML model predicts the rough position of each element and sorting becomes "refine the prediction" rather than "compare from scratch". This is research-grade as of 2026, but the gains on specific data shapes (numeric, predictable distribution) are real. It will not replace Timsort for general use, but it might replace it inside specific database engines for specific column types.

The practical version of "where this is heading" is roughly: keep using your standard library, but when you find yourself sorting hundreds of millions of items in a hot path, the answer is no longer a hand-rolled algorithm. It is a different machine, a different concurrency model, or a domain-specific data layout. The single-threaded O(n log n) sort has been a solved problem for two decades. The interesting unsolved parts are everywhere else.

Back to Articles