Tracking Connected Components

Note

This module explores the core principles of Union-Find (Disjoint Set Union), deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.

1. Problem Definition

Problem: Given a set of N elements, implement a data structure that can efficiently perform the following two operations:

  1. Union(A, B): Merge the set containing element A and the set containing element B.
  2. Find(A): Find the representative element (or root) of the set that contains A.

Constraints:

  • 1 <= N <= 10^5
  • Elements are represented as integers from 0 to N-1.

Examples:

  • Initialization: N = 5. Sets: {0}, {1}, {2}, {3}, {4}
  • Union(0, 1) -> Sets: {0, 1}, {2}, {3}, {4}
  • Union(1, 2) -> Sets: {0, 1, 2}, {3}, {4}
  • Find(0) == Find(2) -> true
  • Find(0) == Find(3) -> false

Edge Cases & Clarifying Questions

  • What if the elements are already connected? Union should return false or do nothing, which is often used for cycle detection.
  • Are node IDs 0-indexed or 1-indexed? This affects the initialization of the parent array. We assume 0-indexed for simplicity.
  • What if we union an element with itself? It should have no effect, as both Find calls return the same root.

Applications: Kruskal’s MST Algorithm, Cycle Detection in Undirected Graphs, Grid Connectivity (Number of Islands).

Optimization:

  • Path Compression: Point nodes directly to the root during find.
  • Union by Rank/Size: Attach the smaller tree to the larger tree to keep height low.
  • Result: Time complexity becomes O(alpha(N)) (Inverse Ackermann function), which is effectively O(1) for all practical N.

2. Intuition (The “Genesis”)

At its core, a Disjoint Set is a forest (a collection of trees). When we start, every element is its own tree with itself as the root.

The Corporate Merger Analogy

Imagine a world where every employee is the CEO of their own one-person company.

  • Union(A, B) represents a corporate merger. The CEO of company B agrees to report to the CEO of company A.
  • Find(A) is the process of climbing the corporate ladder to find the ultimate CEO of A’s current conglomerate.

Brute Force Approach

  • Array approach: We could use an array where arr[i] stores the component ID of node i. Find is O(1), but Union(A, B) requires changing the component ID for all nodes in A’s component to B’s component, taking O(N) time.
  • Tree approach (Naive): We use an array parent where parent[i] is the parent of node i. Union(A, B) just sets parent[root(A)] = root(B). But trees can become linear chains, making Find O(N).

Optimized Approach (The Pattern)

To achieve near O(1) performance, we apply two simple but powerful optimizations:

  1. Path Compression (during Find): Every time we traverse from a node to the root, we update the parent pointer of all nodes along the path to point directly to the root. This flattens the tree dynamically. In our analogy, if you ask HR who the ultimate CEO is, HR updates your file so you report directly to the CEO from now on, completely bypassing middle management.
  2. Union by Rank / Size (during Union): When merging two trees, always attach the shorter/smaller tree under the root of the taller/larger tree. This minimizes the maximum depth. In our analogy, the smaller company always gets absorbed into the larger company’s hierarchy to keep the overall corporate ladder as short as possible.

Common Pitfalls

  • Forgetting Path Compression: If you just do parent[i] = find(parent[i]) but don’t return it, or forget it entirely, Find degrades to O(N).
  • Unioning elements instead of roots: You must connect Find(A) to Find(B). Doing parent[A] = parent[B] breaks the tree structure!
  • Path Compression without recursion: While possible iteratively, the recursive one-liner parent[i] = Find(parent[i]) is much cleaner and less error-prone.

3. Interactive Analysis

Union nodes and watch them merge. Find highlights the path to the representative (Root).

Parents: [0, 1, 2, 3, 4, 5]

4. Implementation

Java
Go
class DSU {
  int[] parent;
  int[] rank;

  public DSU(int n) {
    parent = new int[n];
    rank = new int[n];
    for (int i = 0; i < n; i++) parent[i] = i;
  }

  public int find(int i) {
    if (parent[i] != i) {
      parent[i] = find(parent[i]); // Path Compression
    }
    return parent[i];
  }

  public boolean union(int i, int j) {
    int rootI = find(i);
    int rootJ = find(j);
    if (rootI != rootJ) {
      // Union by Rank
      if (rank[rootI] > rank[rootJ]) {
        parent[rootJ] = rootI;
      } else if (rank[rootI] < rank[rootJ]) {
        parent[rootI] = rootJ;
      } else {
        parent[rootJ] = rootI;
        rank[rootI]++;
      }
      return true;
    }
    return false;
  }
}
type DSU struct {
  parent []int
  rank   []int
}

func NewDSU(n int) *DSU {
  p := make([]int, n)
  for i := range p { p[i] = i }
  return &DSU{parent: p, rank: make([]int, n)}
}

func (d *DSU) Find(i int) int {
  if d.parent[i] != i {
    d.parent[i] = d.Find(d.parent[i]) // Path Compression
  }
  return d.parent[i]
}

func (d *DSU) Union(i, j int) bool {
  rootI := d.Find(i)
  rootJ := d.Find(j)
  if rootI != rootJ {
    // Union by Rank
    if d.rank[rootI] > d.rank[rootJ] {
      d.parent[rootJ] = rootI
    } else if d.rank[rootI] < d.rank[rootJ] {
      d.parent[rootI] = rootJ
    } else {
      d.parent[rootJ] = rootI
      d.rank[rootI]++
    }
    return true
  }
  return false
}

5. Complexity Analysis

Approach Time (Union/Find) Space Notes
Brute Force (Array) O(N) / O(1) O(N) Union is very slow as it requires scanning the entire array.
Naive Trees O(N) / O(N) O(N) Trees can degrade into linear linked lists without optimization.
Optimized (Path Compression + Rank) O(α(N)) / O(α(N)) O(N) Effectively O(1). α(N) is the Inverse Ackermann function.

α(N) is the Inverse Ackermann function. For all practical values of N (even universe size), α(N) ≤ 4.