Data Structures

Union-Find (Disjoint Set Union)

Difficulty: Intermediate

Given a million pairwise 'these two accounts belong to the same person' relations, decide whether two arbitrary accounts are now equivalent. Brute-force flood fills are O(n) per query; sorting is the wrong shape entirely. Union-Find...

INTERMEDIATE

Complexity varies

Union-Find (Disjoint Set Union)

Given a million pairwise 'these two accounts belong to the same person' relations, decide whether two arbitrary accounts are now equivalent. Brute-force flood fills are O(n) per query; sorting is the wrong shape entirely. Union-Find answers each query in nearly O(1) amortized time after the relations are merged, with two primitives and a parent array.

This lesson covers the disjoint-set abstraction, the find operation that walks parent pointers up to a representative, the union operation that links two representatives, and the two optimizations (path compression during find and union by rank during union) that together push the amortized cost to O(alpha(n)), where alpha is the inverse Ackermann function (effectively constant for any input you can fit in memory). You will trace the parent array as merges happen, and apply the structure to cycle detection in undirected graphs, connected components, equivalence classes, and Kruskal's minimum spanning tree.

In Arrays & Strings, you saw that an array can encode a mapping from index to value in O(1); a parent array is exactly that mapping, applied to representative pointers. Graphs: Representation Basics gave you the vertex-and-edge vocabulary that Union-Find uses to define connectivity, even though the structure itself stores no edges directly.

With Union-Find in your toolkit, the curriculum continues with composite designs that combine multiple primitives, starting with the LRU cache pattern.

Intermediate

55 min

6 Objectives

5 Sections

Prerequisites:

Arrays & Strings Graphs: Representation Basics

Topics:

Union-Find / DSU

Path Compression

Union by Rank

Data Structures

Graphs

Connected Components

Cycle Detection

Intermediate

Premium

What You'll Learn

By the end of this lesson, you will be able to:

Explain the disjoint set concept and how Union-Find partitions elements into non-overlapping groups.

Implement find and union operations with a parent array in both JavaScript and Python.

Apply path compression to flatten trees during find and explain why it speeds up future queries.

Apply union by rank to keep trees shallow and analyze the combined amortized complexity.

Trace Union-Find operations step by step, predicting parent array changes and tree shapes.

Identify applications of Union-Find including cycle detection, connected components, and Kruskal's MST.

Why This Matters

Union-Find is essential for graph connectivity problems -- determining whether two nodes are in the same connected component, detecting cycles in undirected graphs, and building minimum spanning trees with Kruskal's algorithm all rely on it.

The path compression and union by rank optimizations showcase amortized analysis in practice: individually an operation may do extra work, but the total cost across all operations is nearly linear, governed by the inverse Ackermann function.

Interview problems like 'Number of Connected Components,' 'Redundant Connection,' and 'Accounts Merge' all expect a Union-Find solution, and interviewers specifically look for the optimized version with both compression and rank.

Understanding Union-Find prepares you for advanced topics like offline dynamic connectivity, Kruskal's algorithm for minimum spanning trees, and equivalence-class partitioning in compilers.

Key Terms

7 terms you'll encounter in this lesson

Disjoint Set

A collection of non-overlapping (disjoint) sets where every element belongs to exactly one set. Union-Find maintains this partition and supports efficient merging and membership queries.

Find

The operation that determines which set an element belongs to by returning the representative (root) of that set. With path compression, find runs in nearly O(1) amortized time.

Union

The operation that merges two sets into one by linking the root of one set to the root of the other. With union by rank, this keeps the tree shallow.

Path Compression

An optimization applied during find that makes every node on the path point directly to the root, flattening the tree structure and speeding up all future find operations on those nodes.

Union by Rank

An optimization for union that always attaches the shorter tree under the taller tree's root, keeping the overall tree height logarithmic. Rank is an upper bound on tree height.

Representative (Root)

The canonical element of a set -- the node whose parent is itself. Two elements are in the same set if and only if they have the same representative.

Inverse Ackermann Function

Denoted alpha(n), this function grows so slowly that for all practical input sizes it is at most 4. It describes the amortized per-operation cost of Union-Find with both path compression and union by rank.

Heads Up

Common misconceptions to watch out for

"Path compression makes find O(1) in the worst case"

Path compression gives amortized nearly-O(1) cost, not worst-case O(1). A single find can still traverse several nodes, but the total cost across all operations is nearly linear. The precise bound is O(alpha(n)) amortized per operation.

"Union by rank always produces a perfectly balanced tree"

Union by rank keeps the tree height at most O(log n), but the tree is not necessarily balanced in the binary-tree sense. Rank is an upper bound on height, not the actual height -- path compression can make the actual tree much flatter than the rank suggests.

"You need both optimizations for correctness"

Union-Find is correct without any optimizations -- the naive version just runs slower (up to O(n) per operation). Path compression and union by rank are performance optimizations, not correctness requirements. Even using just one optimization gives O(log n) amortized.

"Union-Find can efficiently split sets apart"

Standard Union-Find only supports merging sets, not splitting them. Once two elements are in the same set, there is no efficient way to separate them. If you need split operations, you need a different data structure like a link-cut tree.