Union-Find / DSU
union-find
Data Structures
Union-Find (Disjoint Set Union)
Given a million pairwise 'these two accounts belong to the same person' relations, decide whether two arbitrary accounts are now equivalent. Brute-force flood fills are `O(n)` per query; sorting is the wrong shape entirely. **Union-Find** answers each query in nearly `O(1)` amortized time after the relations are merged, with two primitives and a parent array. This lesson covers the disjoint-set abstraction, the `find` operation that walks parent pointers up to a representative, the `union` operation that links two representatives, and the two optimizations (path compression during `find` and union by rank during `union`) that together push the amortized cost to `O(alpha(n))`, where `alpha` is the inverse Ackermann function (effectively constant for any input you can fit in memory). You will trace the parent array as merges happen, and apply the structure to cycle detection in undirected graphs, connected components, equivalence classes, and Kruskal's minimum spanning tree. In **Arrays & Strings**, you saw that an array can encode a mapping from index to value in `O(1)`; a parent array is exactly that mapping, applied to representative pointers. **Graphs: Representation Basics** gave you the vertex-and-edge vocabulary that Union-Find uses to define connectivity, even though the structure itself stores no edges directly. With Union-Find in your toolkit, the curriculum continues with composite designs that combine multiple primitives, starting with the LRU cache pattern.
Not Started
0%
Practice Problems
Accounts Merge
Given a list of accounts where each account has a name and emails, merge accounts that share a common email address.
Number of Connected Components
Given n nodes and a list of undirected edges, find the number of connected components in the graph.
Graph Valid Tree
Given n nodes and a list of undirected edges, determine if these edges form a valid tree (connected and acyclic).
Min Cost to Connect All Points
Given an array of points on a 2D plane, find the minimum cost to connect all points such that there is a path between every pair, using Manhattan distance as the edge cost.
Number of Provinces
Given an adjacency matrix representing connections between cities, find the total number of provinces (connected components).
Redundant Connection
Given a graph that started as a tree with one extra edge added, find and return the edge that can be removed to make it a tree again.
Code Snippets
Disjoint Set (Union-Find) Template
Disjoint Set Union (DSU) tracks a partition of N elements into disjoint groups, supporting near-constant-time `find` (which group?) and `union` (merge two groups). It is the building block for Kruskal's MST, connected components on dynamic graphs, and the redundant-connection problem. This snippet covers the parent-array skeleton, the path-compression optimisation that flattens trees on every find, and the union-by-rank merge that keeps trees shallow.
Union-Find in Python
Union-Find (Disjoint Set Union, DSU) tracks 'which group does each element belong to' and answers `find` / `union` in nearly O(1) amortized when you implement both path compression and union by rank. It is the right data structure for connected-components, Kruskal's MST, and 'is X equivalent to Y' under a stream of merges. This entry covers a basic class, the rank + path compression upgrades, and an MST application.
Question Banks
Community
The Union-Find With Rank I Copy Into Every Contest
After timing out on a Codeforces 'connected components' problem twice, I memorized this 25-line union-find with path compression and union-by-rank. The find() is iterative (no recursion to blow the stack) and component_count is O(1).
Union-Find Explained with Three Real Problems
Disjoint-set union seen through Kruskal MST, dynamic connectivity in a message bus, and grid percolation, plus when the path-compression-plus-rank optimization stops being optional.
Union-Find with Path Compression, Step by Step
Four implementations of union-find, in order: naive, union by rank, path compression, both combined. Watch the per-operation cost drop from O(n) to inverse-Ackermann at each stage.
