Community Article

Heaps and Priority Queues: When the Order Matters

Why a binary heap is the right shape for top-K, Dijkstra, k-way merge, and scheduling, plus the one trap I have hit twice with mutable heap entries.

Heaps and Priority Queues: When the Order Matters

Why a binary heap is the right shape for top-K, Dijkstra, k-way merge, and scheduling, plus the one trap I have hit twice with mutable heap entries.

heap

priority-queue

data-structures

min-heap

interview-prep

By @leoeriksson

March 20, 2026

Updated May 20, 2026

346 views

4.3 (9)

There is a moment in every backend project where the FIFO queue stops being enough. The first time it happened to me, I was building a webhook delivery service. Most events were ordinary. A small subset (payment confirmations, fraud alerts) had to jump the line. A plain Queue object could not express that, and adding a "priority" field plus sort-on-every-pop turned the consumer loop into O(n log n) per dequeue. I rewrote the consumer around a binary heap that afternoon, and the throughput numbers stopped lying.

Heaps are not a structure I think about every week, but every quarter or so a problem lands on my desk that has a heap-shaped hole in it. The interesting thing is that I almost never write a heap from scratch in production. The library version is good enough. What I do every time is decide whether the problem actually wants a heap, or whether I am about to over-engineer my way past a sorted array that would have worked.

What a heap actually is

A binary heap is a complete binary tree (filled level by level, left to right) that satisfies the heap property: the value at every node is less than or equal to the values of its children, in a min-heap (or greater than or equal to, in a max-heap). The root is the minimum (or maximum) of the whole tree. That is the entire structural rule.

The interesting bit is the implementation. A complete binary tree maps cleanly onto a flat array, with no pointers, where the children of index i live at 2i + 1 and 2i + 2, and the parent of i lives at (i - 1) / 2. Insertion is "add to the end of the array, then bubble up". Deletion of the root is "swap root with the last element, shrink the array, then sift down". Both are O(log n) because the tree height is O(log n).

class MinHeap:
    def __init__(self):
        self.data = []

    def push(self, x):
        self.data.append(x)
        self._sift_up(len(self.data) - 1)

    def pop(self):
        last = self.data.pop()
        if not self.data:
            return last
        result = self.data[0]
        self.data[0] = last
        self._sift_down(0)
        return result

    def _sift_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self.data[i] < self.data[parent]:
                self.data[i], self.data[parent] = self.data[parent], self.data[i]
                i = parent
            else:
                break

    def _sift_down(self, i):
        n = len(self.data)
        while True:
            l, r = 2 * i + 1, 2 * i + 2
            smallest = i
            if l < n and self.data[l] < self.data[smallest]:
                smallest = l
            if r < n and self.data[r] < self.data[smallest]:
                smallest = r
            if smallest == i:
                break
            self.data[i], self.data[smallest] = self.data[smallest], self.data[i]
            i = smallest

That is roughly 30 lines of working min-heap. I keep it in my head not because I write it, but because reading the standard library version of heapq (Python) or PriorityQueue (Java) is much easier when you already know the moves.

A priority queue is the abstract type, a heap is one implementation

The terms get muddled in interviews. A priority queue is an abstract data type that supports push(value, priority), peek (read the highest-priority element), and pop (remove the highest-priority element). A heap is the most common concrete data structure used to implement a priority queue, but it is not the only one. A balanced BST works. So does a sorted array if you only push occasionally. So does a Fibonacci heap if you need a faster decrease-key.

In Python, heapq operates on a list. In Java, java.util.PriorityQueue is a binary heap under the hood. In C++, std::priority_queue is built on top of a heap that lives inside a container of your choice. JavaScript has nothing in the standard library, which is one of the few places I have written my own heap in a real project (or, more often, pulled in mnemonist/heap).

The four real uses I keep coming back to

Listing the cases where a heap has actually paid for itself in my work, in roughly the order I encountered them.

Top-K selection. "Give me the top 100 records by score, out of a stream of millions." A bounded min-heap of size K solves this in O(n log K) with O(K) memory. Push every element. If the heap exceeds K, pop the minimum. What remains is the top-K. This pattern shows up in leaderboards, log analysis ("top 50 slowest queries today"), search ranking, and ad ranking. I have shipped variants of this loop more times than I can count.

import heapq

def top_k(stream, k):
    heap = []
    for x in stream:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            heapq.heapreplace(heap, x)
    return heap   # the K largest, in heap order

Dijkstra's shortest path. The textbook "shortest path in a weighted graph with non-negative edges" algorithm needs a priority queue keyed by tentative distance. Without one, the algorithm devolves to O(V^2). With a binary heap, it is O((V + E) log V), which on sparse graphs is essentially linear. Every routing engine, every game pathfinder, every navigation app has a Dijkstra (or A*, which is Dijkstra with a heuristic) loop somewhere.

Merging K sorted streams. "I have 10 sorted log files. Give me the merged output in sorted order." A heap of size K (one element per stream) lets you walk all of them in O(N log K) time, where N is the total element count. This is also how external merge sort works: you read chunks into memory, sort each chunk, write them back, then merge them with a heap. Database engines do this all day.

Scheduling and timer wheels. "Run task X at time T1, task Y at time T2." A heap keyed by run-time gives you O(log n) insertion and O(log n) "pop the next task to run". Linux's old CFS scheduler used a red-black tree for the same purpose; userspace task schedulers (celery, sidekiq, BullMQ) often use a Redis sorted set, which is also a heap-shaped structure under the hood. Whenever the question is "which task fires next", a priority queue is the data structure.

Heap operations and their costs

Operation	Cost	Notes
Peek (read root)	O(1)	Just read index 0
Push	O(log n)	Append + sift up
Pop	O(log n)	Swap with last + sift down
Build from N items	O(n)	Floyd's heapify, not N inserts
Decrease-key	O(log n)	If you have a handle to the node
Delete arbitrary node	O(log n)	If you have a handle; otherwise O(n) to find it
Find minimum (max)	O(1)	Same as peek
Find arbitrary value	O(n)	Heaps are not searchable structures

That last row is the one that catches people. A heap is great at "give me the smallest" and terrible at "is value X in here". If you need both, you want a heap plus a hash set side structure, with careful synchronization between them.

The "build from N items in O(n)" line is also worth noting. The naive way is N pushes, which is O(n log n). The correct way (Floyd's algorithm) is to dump everything into the array and sift down from the bottom up. The math is non-obvious but the cost is genuinely linear, and heapq.heapify in Python uses this trick.

Where I would NOT reach for a heap

Three honest cases where the heap is the wrong tool.

First, when N is small. For under a few hundred items, a sorted array with binary-search insertion or even an unsorted array with linear scan is faster in practice. The constant factor on heap operations is higher than people expect, and the cache behaviour of a small flat array is excellent.

Second, when you need ordered iteration of all elements. A heap can give you the minimum, but iterating in sorted order requires popping every element, which is O(n log n) and destroys the heap. If you need a sorted view, use a sorted array or a balanced BST.

Third, when you need range queries or rank queries. "How many elements are between 5 and 10?" or "What is the 50th smallest?" are not heap operations. Reach for an order-statistics tree, a skip list, or a segment tree.

A trap I have hit twice

Heaps with mutable elements are a sharp edge. If you push an object into a heap and then modify the field that determines its priority, the heap invariant breaks silently. Sift operations after that point will return wrong answers, and the corruption is hard to debug because the heap structure looks fine.

In a job scheduler I wrote in 2022, we let users update a job's priority after submission. The naive "find the job in the heap, change its priority field" path corrupted the structure and made some jobs run far too late. The fix is one of: (1) treat heap entries as immutable, push a new entry on update and skip stale ones on pop, or (2) use a heap that supports decrease-key with handles (more complex but exact).

The "lazy invalidation" version (option 1) is what I usually ship. Push the new entry. On pop, check whether the entry is the current one for that job. If not, drop it and pop again. Memory cost: up to one stale entry per update. Time cost: amortized O(log n) per real pop. Correctness cost: zero, as long as the staleness check is right.

def pop_valid(heap, current_priority):
    while heap:
        priority, job_id = heapq.heappop(heap)
        if priority == current_priority.get(job_id):
            return job_id
    return None

The heap is the answer when the order changes

The line I want to leave you with is this: a heap is the right structure when you need fast access to the extreme element of a collection that is changing. Not "sometimes". Not "for certain inputs". Specifically when the collection is mutable and the question is always "what is the smallest right now". Top-K, Dijkstra, k-way merge, scheduling. Each one fits that pattern. If your collection is static, sort it and binary-search; if your collection is mutable but you never need the extreme, use a hash table; if your collection is mutable and the extreme matters at every step, use a heap. That decision tree has not let me down once in eight years of writing backend code, and the standard-library implementations are good enough that I have not had to write the moving parts from scratch since the webhook service. Trust the library, but understand the shape, and the next time someone asks "should this be a queue or a priority queue" you will have the right answer in five seconds instead of thirty minutes.

Back to Articles