Community Article

Segment Trees: When Arrays Aren't Fast Enough

Range queries with mutable data: when a segment tree is the right tool, when a Fenwick tree is better, when lazy propagation pays for itself, and the production leaderboard problem that finally made it click for me.

Segment Trees: When Arrays Aren't Fast Enough

segment-tree

fenwick-tree

lazy-propagation

range-queries

data-structures

By @hannahdelgado

November 19, 2025

Updated May 20, 2026

467 views

Rate

There is a class of problems where arrays stop being good enough. Range-sum queries with point updates. Range-min queries on a stream. Range updates with subsequent range queries. The naive solution is some combination of "loop over the range" (which is O(n) per query) and "rebuild a prefix-sum array on every update" (which is O(n) per update). For a few thousand elements with a thousand operations, naive is fine. For a million elements with a million mixed operations, naive is half a trillion comparisons and hours of wall time. That is the gap a segment tree closes.

I want to be honest up front: I have written a segment tree from scratch maybe six times in twelve years. Most of those were competitive programming. The two production cases (an analytics dashboard with range aggregations and a gaming leaderboard with windowed ranking) were the times the structure earned its keep. The rest of the time I reach for something simpler: a Fenwick tree (BIT), a sorted structure, a precomputed sparse table, or just a smarter algorithm. The interesting question is not "how do you write a segment tree" (the code is mechanical once you have seen it twice) but "when do you reach for one, and when is something else better".

What a segment tree is

A segment tree is a binary tree built over a range [0, n) of an array, where each internal node represents a contiguous subrange and stores some aggregate value (sum, min, max, GCD, XOR, whatever monoid you want) over that subrange. The leaves represent single-element ranges. The root represents the whole array. Each internal node's value is computable from its two children's values.

The structure is usually stored as a flat array of size 4n (an upper bound for safety). Children of index i live at 2i + 1 and 2i + 2, and the parent of i lives at (i - 1) / 2. There are no pointers, just an array and the implicit tree.

For an array of size 8, the tree has 15 nodes:

Segment tree over indices [0..7]
                       [0..7]
                      /      \
                [0..3]        [4..7]
                /    \         /    \
            [0..1] [2..3]  [4..5] [6..7]
            /  \    /  \    /  \    /  \
          [0][1][2][3][4][5][6][7]

Building this from a source array is a single O(n) pass. Querying a range sum is O(log n): you walk down from the root, splitting into left and right halves until each node either fully contains the query range (return its stored value) or is fully outside the query range (return identity). Updating a single element is also O(log n): walk down to the leaf, update it, then walk back up recomputing each ancestor's aggregate.

The minimal working implementation

This is the version I keep, in Python, for sums. Min/max are a one-line change to the combination function and the identity value.

class SegmentTree:
    def __init__(self, data):
        self.n = len(data)
        self.tree = [0] * (4 * self.n)
        if self.n:
            self._build(0, 0, self.n - 1, data)

    def _build(self, node, l, r, data):
        if l == r:
            self.tree[node] = data[l]
            return
        mid = (l + r) // 2
        self._build(2 * node + 1, l, mid, data)
        self._build(2 * node + 2, mid + 1, r, data)
        self.tree[node] = self.tree[2 * node + 1] + self.tree[2 * node + 2]

    def update(self, i, value):
        self._update(0, 0, self.n - 1, i, value)

    def _update(self, node, l, r, i, value):
        if l == r:
            self.tree[node] = value
            return
        mid = (l + r) // 2
        if i <= mid:
            self._update(2 * node + 1, l, mid, i, value)
        else:
            self._update(2 * node + 2, mid + 1, r, i, value)
        self.tree[node] = self.tree[2 * node + 1] + self.tree[2 * node + 2]

    def query(self, ql, qr):
        # inclusive [ql, qr]
        return self._query(0, 0, self.n - 1, ql, qr)

    def _query(self, node, l, r, ql, qr):
        if qr < l or r < ql:
            return 0  # identity for sum
        if ql <= l and r <= qr:
            return self.tree[node]
        mid = (l + r) // 2
        left_sum = self._query(2 * node + 1, l, mid, ql, qr)
        right_sum = self._query(2 * node + 2, mid + 1, r, ql, qr)
        return left_sum + right_sum

About 40 lines. Build is O(n), point update is O(log n), range query is O(log n). Memory is O(n). The tree array is sized 4n because the recursive build can hit indices in that range; you can prove a tighter 2 * 2^ceil(log2(n)) bound but 4n is the safe rule-of-thumb that everyone uses.

Point update vs range update: where lazy propagation enters

The version above handles point updates ("change index i to value v") and range queries. The harder problem is range updates ("add v to every element in [l, r]") combined with range queries. A naive range update would touch every leaf in the range, which is O(n) in the worst case and defeats the structure. The fix is lazy propagation.

The idea: when a range update fully covers a node, do not push the change down to the children. Instead, store a pending update on the node ("everyone below me has +v queued") and skip the recursion. The children are not actually updated; their stored values are stale. When a future query or update needs to descend into a child, it first applies the parent's pending update to both children (pushing the lazy value down) and clears the parent's pending value.

class LazySegmentTree:
    def __init__(self, data):
        self.n = len(data)
        self.tree = [0] * (4 * self.n)
        self.lazy = [0] * (4 * self.n)
        if self.n:
            self._build(0, 0, self.n - 1, data)

    def _build(self, node, l, r, data):
        if l == r:
            self.tree[node] = data[l]
            return
        mid = (l + r) // 2
        self._build(2 * node + 1, l, mid, data)
        self._build(2 * node + 2, mid + 1, r, data)
        self.tree[node] = self.tree[2 * node + 1] + self.tree[2 * node + 2]

    def _push(self, node, l, r):
        if self.lazy[node]:
            mid = (l + r) // 2
            left, right = 2 * node + 1, 2 * node + 2
            # propagate the lazy value to children
            self.tree[left] += self.lazy[node] * (mid - l + 1)
            self.tree[right] += self.lazy[node] * (r - mid)
            self.lazy[left] += self.lazy[node]
            self.lazy[right] += self.lazy[node]
            self.lazy[node] = 0

    def update_range(self, ql, qr, value):
        self._update_range(0, 0, self.n - 1, ql, qr, value)

    def _update_range(self, node, l, r, ql, qr, value):
        if qr < l or r < ql:
            return
        if ql <= l and r <= qr:
            self.tree[node] += value * (r - l + 1)
            self.lazy[node] += value
            return
        self._push(node, l, r)
        mid = (l + r) // 2
        self._update_range(2 * node + 1, l, mid, ql, qr, value)
        self._update_range(2 * node + 2, mid + 1, r, ql, qr, value)
        self.tree[node] = self.tree[2 * node + 1] + self.tree[2 * node + 2]

    def query_range(self, ql, qr):
        return self._query_range(0, 0, self.n - 1, ql, qr)

    def _query_range(self, node, l, r, ql, qr):
        if qr < l or r < ql:
            return 0
        if ql <= l and r <= qr:
            return self.tree[node]
        self._push(node, l, r)
        mid = (l + r) // 2
        return (self._query_range(2 * node + 1, l, mid, ql, qr) +
                self._query_range(2 * node + 2, mid + 1, r, ql, qr))

The key invariant: every node's stored value is correct for its subrange, assuming all of its ancestors' lazy values have been pushed down to it. The _push helper is what enforces that invariant whenever we are about to descend.

The math self.lazy[node] * (mid - l + 1) is the part that catches people. When you propagate a "+v" lazy update to a child, the child's stored sum increases by v times the size of the child's range. For min/max, the propagation is just +v regardless of range size (since min/max are not size-weighted). For "set" updates rather than "add", the lazy value's interaction is different again. Each combination of update operator and aggregate operator has its own propagation rules, and the bug surface is high. This is why competitive programmers keep templates for "sum + add", "sum + set", "min + add", "min + set" and so on.

When to reach for a Fenwick tree (BIT) instead

A Fenwick tree (also called a Binary Indexed Tree, or BIT) handles prefix-sum queries with point updates in O(log n) per operation, with code that fits in twenty lines. For the specific case of "range sum with point updates", a Fenwick tree is strictly better than a segment tree: same asymptotic cost, much smaller code, much smaller constant factor, half the memory.

class Fenwick:
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)

    def update(self, i, delta):    # 0-indexed
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def prefix_sum(self, i):       # sum of [0..i], 0-indexed
        i += 1
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & -i
        return s

    def range_sum(self, l, r):     # inclusive [l, r]
        if l == 0:
            return self.prefix_sum(r)
        return self.prefix_sum(r) - self.prefix_sum(l - 1)

Twenty lines, including imports if there were any. The tradeoff: Fenwick trees are much harder to extend. They handle sums (and other invertible operations like XOR) cleanly, but min/max do not work because there is no inverse. They handle point updates with range queries in their basic form. Range updates with range queries on a Fenwick require a clever two-Fenwick trick that I never remember and have to look up every time. For anything beyond "range sum or range XOR with point updates", I reach for the segment tree.

The decision matrix I keep in my head:

Problem shape	Reach for
Point update, range sum	Fenwick tree
Point update, range XOR / GCD	Fenwick tree (XOR) / segment tree (GCD)
Point update, range min/max	Segment tree
Range update, range sum	Segment tree (lazy)
Range update, range min/max	Segment tree (lazy)
Static array, range min/max queries only	Sparse table (O(1) per query)
Range "kth smallest" queries	Merge sort tree or persistent segment tree
Dynamic insert/delete with range queries	Order-statistics tree, treap, or skip list

The last two rows are where the structure family gets exotic. Persistent segment trees (Krzysztof Diks et al., used heavily in competitive programming) let you query historical versions of the array. Merge sort trees handle "how many values in [l, r] are less than X". For most production code, you will never need these, but knowing they exist saves you from rolling your own.

A real production use: a leaderboard with windowed ranking

The clearest case I have shipped a segment tree for was a real-time gaming leaderboard. The product requirement was "show the player their rank among everyone who scored within the last 24 hours". Naive implementations of this are surprisingly hard:

Sorted set keyed by score: easy to find rank, but cannot filter by time without scanning.
Sorted set keyed by time: easy to filter by time, but cannot find rank without sorting.
A relational database with ORDER BY score WHERE time > now() - 24h LIMIT N: works for small N, dies under load when there are millions of players.

The shape that actually worked: a segment tree indexed by score buckets (we discretized scores into 100K buckets), where each leaf stored the count of players currently in that bucket. Score insertion was a point update. "What is the rank of player P" became "sum of buckets above P's score", which is a range sum query. The 24-hour window was handled by a circular buffer of these segment trees, one per minute, with insertion adding to the current minute's tree and removal coming from a 24-hours-ago expiration job.

Per query, the cost was dominated by the O(log B) range-sum walk where B was 100K, which is about 17 comparisons. Across 1440 minute-buckets per day, the aggregate query cost was effectively constant. We served the leaderboard at a few thousand QPS on one box. A naive sorted-set-per-window approach would have cost orders of magnitude more memory and would not have supported the windowed ranking at all.

This is the use case I would not call obvious from a textbook. The textbook segment tree is "array of integers, range sum". The production version is "array of bucket counts, range sum used to compute rank". Once you spot that rank is just "count of items above me", the segment tree is the natural fit.

Other production uses I have seen or read

Three more cases worth naming, all of which I have either shipped or read in someone else's code.

Range aggregations in an analytics dashboard. "Total sales between any two arbitrary dates" with frequent point updates as new sales come in. A segment tree over date-indexed buckets handles this with O(log n) per insert and O(log n) per query. For static historical data, a prefix-sum array is faster (O(1) per query); for live data with continuous updates, the segment tree wins.

Building-skyline-style geometric problems. The line-sweep version of "given N rectangles, find the union of their projections onto the x-axis" uses a segment tree of x-coordinates, with range updates for "rectangle covers [xl, xr]" and queries for "total covered length". This is the canonical sweep-line plus segment tree pattern, and variants of it show up in collision detection, GIS overlay analysis, and circuit-design DRC checks.

Scheduling and reservation systems. "Is the room booked at any point between time A and time B" can be answered by a segment tree of time slots with range updates. Cleaner than the relational query. A friend who works on hotel inventory describes this as the structure their pricing engine uses for availability lookups under load.

The off-by-one, the missing push, the wrong identity

Off-by-one in range bounds. Inclusive vs exclusive ranges, 0-indexed vs 1-indexed, half-open vs closed. The two implementations above use inclusive [l, r]. Mixing conventions across functions is the bug I write most often. The fix is to pick one (I prefer inclusive) and stick to it through every helper.

Forgetting to push lazy values before a query. The _push call inside _query_range is mandatory. Skip it and the query reads stale aggregates from children. The bug looks like "queries that touch ranges I have updated return wrong answers, queries on untouched ranges return correct answers". Hard to debug, easy to fix, easy to forget on the next implementation.

Using the wrong identity value. For sum, identity is 0. For min, identity is +infinity. For max, identity is -infinity. For GCD, identity is 0. For multiplication, identity is 1. The query function returns the identity when the query range is disjoint from the node range; pick the wrong identity and the answer is wrong, but only on edge-case inputs that you might not test.

When the segment tree is overkill

The contrarian view I want to commit to: most "I need range queries" problems do not need a segment tree, and reaching for one too early is a code-complexity trap. Specifically, here are the cases I have seen where someone wrote a segment tree and should not have:

Static array, no updates. A prefix-sum array gives O(1) per query in three lines of code. Build once, query forever. If you never mutate the array after construction, the segment tree is doing strictly more work than a prefix array.

Tiny inputs. For arrays under a few thousand elements with a few thousand operations, the naive O(n) per query is faster in wall time than the O(log n) segment tree, because the constant factor on segment tree operations is real. The crossover is somewhere between 1K and 10K elements depending on the hardware. Profile before assuming.

The query is "give me the top K" or "give me the kth-largest", not a range query. A heap or a balanced BST is the right tool. A segment tree can technically answer rank queries (as in my leaderboard), but only after you have transformed the problem into a "sum of buckets" question. If the natural shape of the problem is "ordered access to elements", reach for a heap or a sorted structure.

Append-only data with append-time queries. A streaming algorithm (running sum, running min, count-min sketch for approximate counts) is often a better fit than a segment tree, because it does not require storing the full history.

The honest summary is that segment trees are the right tool when (a) you have an array, (b) you have both updates and queries on ranges, and (c) at least one of the two is range-shaped (range update or range query, not just point operations on both). Knock any of those three out and a simpler structure is usually faster, simpler, and easier to maintain. The segment tree's elegance has always been "one structure, lots of monoidal operations", but elegance is a luxury that production code can usually afford to skip when a Fenwick tree, a prefix-sum array, or a heap will do the job. Reach for the segment tree on purpose, because the problem actually has the shape, not because it sounded like the fanciest answer. Knowing when to reach for it (and when not to) is the part of the lesson that took me five years longer than the implementation did.

Back to Articles