Data Structures

Persistent Data Structures

Difficulty: Advanced

Cloning a 100k-element segment tree before every update so the previous version stays queryable costs O(n) time and space per update, which is unworkable. Persistent Data Structures achieve the same 'every past version is still queryable'...

ADVANCED

Complexity varies

Persistent Data Structures

Cloning a 100k-element segment tree before every update so the previous version stays queryable costs O(n) time and space per update, which is unworkable. Persistent Data Structures achieve the same 'every past version is still queryable' guarantee in O(log n) extra time and space per update by sharing the unchanged subtrees between versions and only allocating new nodes along the path that actually changed.

This lesson covers the path-copying technique on a persistent segment tree (each update returns a brand-new root that shares everything except a thin spine of new nodes with the previous version), the O(log n) point update and range query on any historical version, persistent arrays built on top of persistent segment trees, and persistent stacks and queues using the cons-list approach from functional programming. You will analyze the total space (O(n + q log n) for q updates) and apply the structure to k-th smallest in subarrays, time-travel queries, and offline-query reductions.

In Trees: Binary Tree Fundamentals, a recursive subtree was a self-contained piece of data; structural sharing exploits exactly that property to reuse old subtrees in new versions. Segment Tree built the recursive aggregate-over-intervals shape that the persistent variant adapts directly, swapping in-place updates for path copying.

Next, Treap (Randomized BST) returns to the BST family with a different twist: random priorities for expected balance and O(log n) split and merge primitives.

Advanced

75 min

6 Objectives

5 Sections

Prerequisites:

Trees: Binary Tree Fundamentals Segment Tree

Topics:

Persistent Data Structures

Segment Tree

Trees

Data Structures

Advanced

Premium

Recursion

Range Queries

What You'll Learn

By the end of this lesson, you will be able to:

Explain what persistence means for a data structure and why path copying is more efficient than full cloning.

Implement a persistent segment tree that creates a new version on each point update while sharing unchanged nodes.

Query any historical version of a persistent segment tree in O(log n) time.

Analyze the time and space complexity of persistent segment tree operations.

Implement a persistent array using a persistent segment tree as the backing structure.

Describe how persistent stacks and queues work using the functional (cons-list) approach.

Why This Matters

Persistent data structures let you query or compare any two historical versions without storing full copies. A naive approach of cloning the entire structure before each update costs O(n) space per version, but path copying achieves O(log n) space per version by sharing unchanged subtrees.

The persistent segment tree is a workhorse in competitive programming, solving problems like 'find the kth smallest element in any subarray' or 'answer range queries at different points in time.' These problems are impossible to solve efficiently with ephemeral structures alone.

Understanding persistence deepens your grasp of immutability and structural sharing -- the same principles that power functional programming languages (Clojure's persistent vectors, Haskell's immutable trees) and modern state management libraries (React/Redux time-travel debugging).

Persistent structures appear in production systems for version-controlled databases, Git-like object stores, and any application that needs to efficiently maintain and query a history of changes.

Key Terms

7 terms you'll encounter in this lesson

Persistent Data Structure

A data structure that preserves all previous versions of itself. After every modification, both the old and new versions remain accessible. Partial persistence allows queries on old versions; full persistence allows updates on old versions too.

Path Copying

The technique of creating new copies of only the nodes along the path from root to the modified leaf, while sharing all other nodes with the previous version. This achieves O(log n) space per update in a balanced tree.

Structural Sharing

The property of persistent data structures where unchanged subtrees are shared (referenced) between versions rather than duplicated. This is what makes persistence space-efficient.

Version / Root Array

An array of root pointers where roots[v] points to the root node of version v. Querying version v starts from roots[v] and traverses the tree normally.

Ephemeral Data Structure

A standard (non-persistent) data structure that is modified in place. After an update, the previous state is lost unless explicitly saved.

Partial vs Full Persistence

Partial persistence allows queries on any version but updates only on the latest version. Full persistence allows both queries and updates on any version, creating a branching version tree.

Persistent Segment Tree

A segment tree where each update creates a new root and O(log n) new nodes along the modified path, sharing all unchanged nodes with the previous version. Supports O(log n) queries on any version.

Heads Up

Common misconceptions to watch out for

"Persistent means the data is saved to disk"

In data structure terminology, 'persistent' means the structure retains all previous versions in memory. It has nothing to do with disk storage or database persistence. The term comes from the idea that old versions 'persist' after modifications.

"Making a data structure persistent requires copying the entire structure on each update"

Path copying only creates new nodes along the path from root to the modified leaf -- O(log n) nodes for a balanced tree. All other nodes are shared between versions. This is far more efficient than a full O(n) clone.

"Persistent data structures are always slower than their ephemeral counterparts"

The asymptotic time complexity is the same: O(log n) per operation for both persistent and ephemeral segment trees. The constant factor is slightly higher due to dynamic node allocation (pointers instead of array indexing), but the Big-O is identical.

"You need a special garbage collector to manage old versions"

In competitive programming and most implementations, all versions remain in memory for the duration of the program. In production systems with reference counting or tracing GC (like in Clojure or Haskell), unreferenced versions are automatically reclaimed. You do not need to manually manage version lifecycle.