Tracking Connected Components
This module explores the core principles of Union-Find (Disjoint Set Union), deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.
1. Problem Definition
Problem: Given a set of N elements, implement a data structure that can efficiently perform the following two operations:
- Union(A, B): Merge the set containing element
Aand the set containing elementB. - Find(A): Find the representative element (or root) of the set that contains
A.
Constraints:
1 <= N <= 10^5- Elements are represented as integers from
0toN-1.
Examples:
- Initialization:
N = 5. Sets:{0}, {1}, {2}, {3}, {4} Union(0, 1)-> Sets:{0, 1}, {2}, {3}, {4}Union(1, 2)-> Sets:{0, 1, 2}, {3}, {4}Find(0)==Find(2)->trueFind(0)==Find(3)->false
Edge Cases & Clarifying Questions
- What if the elements are already connected?
Unionshould returnfalseor do nothing, which is often used for cycle detection. - Are node IDs 0-indexed or 1-indexed? This affects the initialization of the parent array. We assume 0-indexed for simplicity.
- What if we union an element with itself? It should have no effect, as both
Findcalls return the same root.
Applications: Kruskal’s MST Algorithm, Cycle Detection in Undirected Graphs, Grid Connectivity (Number of Islands).
Optimization:
- Path Compression: Point nodes directly to the root during
find. - Union by Rank/Size: Attach the smaller tree to the larger tree to keep height low.
- Result: Time complexity becomes O(alpha(N)) (Inverse Ackermann function), which is effectively O(1) for all practical N.
2. Intuition (The “Genesis”)
At its core, a Disjoint Set is a forest (a collection of trees). When we start, every element is its own tree with itself as the root.
The Corporate Merger Analogy
Imagine a world where every employee is the CEO of their own one-person company.
- Union(A, B) represents a corporate merger. The CEO of company B agrees to report to the CEO of company A.
- Find(A) is the process of climbing the corporate ladder to find the ultimate CEO of A’s current conglomerate.
Brute Force Approach
- Array approach: We could use an array where
arr[i]stores the component ID of nodei.Findis O(1), butUnion(A, B)requires changing the component ID for all nodes in A’s component to B’s component, taking O(N) time. - Tree approach (Naive): We use an array
parentwhereparent[i]is the parent of nodei.Union(A, B)just setsparent[root(A)] = root(B). But trees can become linear chains, makingFindO(N).
Optimized Approach (The Pattern)
To achieve near O(1) performance, we apply two simple but powerful optimizations:
- Path Compression (during Find): Every time we traverse from a node to the root, we update the parent pointer of all nodes along the path to point directly to the root. This flattens the tree dynamically. In our analogy, if you ask HR who the ultimate CEO is, HR updates your file so you report directly to the CEO from now on, completely bypassing middle management.
- Union by Rank / Size (during Union): When merging two trees, always attach the shorter/smaller tree under the root of the taller/larger tree. This minimizes the maximum depth. In our analogy, the smaller company always gets absorbed into the larger company’s hierarchy to keep the overall corporate ladder as short as possible.
Common Pitfalls
- Forgetting Path Compression: If you just do
parent[i] = find(parent[i])but don’t return it, or forget it entirely,Finddegrades to O(N). - Unioning elements instead of roots: You must connect
Find(A)toFind(B). Doingparent[A] = parent[B]breaks the tree structure! - Path Compression without recursion: While possible iteratively, the recursive one-liner
parent[i] = Find(parent[i])is much cleaner and less error-prone.
3. Interactive Analysis
Union nodes and watch them merge. Find highlights the path to the representative (Root).
4. Implementation
class DSU {
int[] parent;
int[] rank;
public DSU(int n) {
parent = new int[n];
rank = new int[n];
for (int i = 0; i < n; i++) parent[i] = i;
}
public int find(int i) {
if (parent[i] != i) {
parent[i] = find(parent[i]); // Path Compression
}
return parent[i];
}
public boolean union(int i, int j) {
int rootI = find(i);
int rootJ = find(j);
if (rootI != rootJ) {
// Union by Rank
if (rank[rootI] > rank[rootJ]) {
parent[rootJ] = rootI;
} else if (rank[rootI] < rank[rootJ]) {
parent[rootI] = rootJ;
} else {
parent[rootJ] = rootI;
rank[rootI]++;
}
return true;
}
return false;
}
}
type DSU struct {
parent []int
rank []int
}
func NewDSU(n int) *DSU {
p := make([]int, n)
for i := range p { p[i] = i }
return &DSU{parent: p, rank: make([]int, n)}
}
func (d *DSU) Find(i int) int {
if d.parent[i] != i {
d.parent[i] = d.Find(d.parent[i]) // Path Compression
}
return d.parent[i]
}
func (d *DSU) Union(i, j int) bool {
rootI := d.Find(i)
rootJ := d.Find(j)
if rootI != rootJ {
// Union by Rank
if d.rank[rootI] > d.rank[rootJ] {
d.parent[rootJ] = rootI
} else if d.rank[rootI] < d.rank[rootJ] {
d.parent[rootI] = rootJ
} else {
d.parent[rootJ] = rootI
d.rank[rootI]++
}
return true
}
return false
}
5. Complexity Analysis
| Approach | Time (Union/Find) | Space | Notes |
|---|---|---|---|
| Brute Force (Array) | O(N) / O(1) | O(N) | Union is very slow as it requires scanning the entire array. |
| Naive Trees | O(N) / O(N) | O(N) | Trees can degrade into linear linked lists without optimization. |
| Optimized (Path Compression + Rank) | O(α(N)) / O(α(N)) | O(N) | Effectively O(1). α(N) is the Inverse Ackermann function. |
α(N) is the Inverse Ackermann function. For all practical values of N (even universe size), α(N) ≤ 4.