Union-Find (DSU) Interview Patterns (2025)

Union-Find (Disjoint Set Union) Interview Patterns

Union-Find (DSU) solves the dynamic connectivity problem: given a graph where edges are added one at a time, efficiently answer “are nodes X and Y in the same connected component?” With path compression and union by rank, both operations run in nearly O(1) amortized time (technically O(α(n)) where α is the inverse Ackermann function — effectively constant for all practical inputs).

Implementation

class UnionFind:
    def __init__(self, n: int):
        self.parent = list(range(n))   # parent[i] = i initially (each node is its own root)
        self.rank = [0] * n            # rank[i] = upper bound on tree height
        self.components = n            # number of connected components

    def find(self, x: int) -> int:
        """Find root of x's component with path compression."""
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])   # path compression: point to root
        return self.parent[x]

    def union(self, x: int, y: int) -> bool:
        """
        Merge components of x and y.
        Returns True if they were in different components (new connection).
        """
        root_x, root_y = self.find(x), self.find(y)
        if root_x == root_y:
            return False   # already connected

        # Union by rank: attach smaller tree under larger
        if self.rank[root_x]  bool:
        return self.find(x) == self.find(y)

Time: O(α(n)) per operation. Space: O(n). The two optimizations work together: path compression flattens trees during find; union by rank keeps trees shallow during union.

Pattern 1: Number of Connected Components (LC 323, 547)

def count_components(n: int, edges: list[list[int]]) -> int:
    uf = UnionFind(n)
    for u, v in edges:
        uf.union(u, v)
    return uf.components   # start at n, decremented on each successful union

Also solves LC 547 (Number of Provinces) and LC 684 (Redundant Connection). For graphs given as adjacency matrix: union connected pairs.

Pattern 2: Cycle Detection in Undirected Graph (LC 684)

def find_redundant_connection(edges: list[list[int]]) -> list[int]:
    """Return the last edge that creates a cycle."""
    n = len(edges)
    uf = UnionFind(n + 1)   # 1-indexed
    for u, v in edges:
        if not uf.union(u, v):   # union returns False = already connected = cycle
            return [u, v]
    return []

Cycle detected when union() returns False — both endpoints are already in the same component, so this edge creates a cycle.

Pattern 3: Kruskal’s Minimum Spanning Tree

MST connects all N nodes with N-1 edges of minimum total weight. Kruskal’s algorithm: sort all edges by weight, add each edge if it doesn’t create a cycle (union-find check).

def minimum_spanning_tree(n: int, edges: list[tuple]) -> int:
    """edges: list of (weight, u, v). Returns total MST weight."""
    uf = UnionFind(n)
    edges.sort()   # sort by weight
    total_weight = 0
    edges_used = 0
    for weight, u, v in edges:
        if uf.union(u, v):           # add edge if it connects two components
            total_weight += weight
            edges_used += 1
            if edges_used == n - 1:  # MST complete
                break
    return total_weight if edges_used == n - 1 else -1  # -1 if graph disconnected

LeetCode: 1584 (Min Cost to Connect All Points), 1135 (Connecting Cities With Minimum Cost).

Pattern 4: Accounts Merge (LC 721)

Group email accounts belonging to the same person. Two accounts are the same person if they share any email.

from collections import defaultdict

def accounts_merge(accounts: list[list[str]]) -> list[list[str]]:
    email_to_id = {}   # email -> first account index that owns it
    uf = UnionFind(len(accounts))

    for i, account in enumerate(accounts):
        for email in account[1:]:    # skip name at index 0
            if email in email_to_id:
                uf.union(i, email_to_id[email])   # same person
            else:
                email_to_id[email] = i

    # Group emails by root account
    root_to_emails = defaultdict(set)
    for email, account_id in email_to_id.items():
        root_to_emails[uf.find(account_id)].add(email)

    # Build result
    result = []
    for root, emails in root_to_emails.items():
        result.append([accounts[root][0]] + sorted(emails))
    return result

Pattern 5: Number of Islands II (LC 305)

Dynamic connectivity: land cells are added one at a time; after each addition, report the count of islands.

def num_islands2(m: int, n: int, positions: list[list[int]]) -> list[int]:
    uf = UnionFind(m * n)
    grid = [[False] * n for _ in range(m)]
    result = []
    count = 0

    for r, c in positions:
        if grid[r][c]:          # duplicate position
            result.append(count)
            continue
        grid[r][c] = True
        count += 1              # new island
        for dr, dc in [(0,1),(0,-1),(1,0),(-1,0)]:
            nr, nc = r + dr, c + dc
            if 0 <= nr < m and 0 <= nc < n and grid[nr][nc]:
                if uf.union(r * n + c, nr * n + nc):
                    count -= 1  # merged two islands
        result.append(count)
    return result

When to Use Union-Find vs BFS/DFS

Scenario Use
Static graph, single connectivity query BFS/DFS — simpler code
Dynamic graph (edges added online) Union-Find — O(1) per update
Minimum spanning tree Union-Find (Kruskal’s)
Cycle detection in undirected graph Union-Find — cleaner than DFS
Number of components after N edge additions Union-Find — track component count

{“@context”:”https://schema.org”,”@type”:”FAQPage”,”mainEntity”:[{“@type”:”Question”,”name”:”What are path compression and union by rank in Union-Find?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Two optimizations that together make Union-Find nearly O(1) amortized per operation: (1) Path compression: during find(x), after traversing up to the root, set parent[node] = root for every node on the path. Future finds on these nodes jump directly to the root in O(1). In code: if parent[x] != x: parent[x] = find(parent[x]) — this one recursive line implements path compression. (2) Union by rank: when merging two trees, attach the shorter tree under the root of the taller tree. Prevents degenerate cases where the tree becomes a linked list (O(n) find). Track rank[root] = upper bound on tree height. When two equal-rank trees merge, increment the new root’s rank. Together, path compression + union by rank achieve O(α(n)) amortized time — inverse Ackermann function, effectively O(1) for all practical input sizes (α(n) ≤ 4 for n ≤ 10^600).”}},{“@type”:”Question”,”name”:”How do you use Union-Find to detect cycles in an undirected graph?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Process edges one at a time using union(). Before merging: find the roots of both endpoints. If the roots are the same, both nodes are already in the same component — adding this edge creates a cycle. If the roots differ, merge the components (no cycle yet). In code: for each edge (u, v): if find(u) == find(v): return “cycle found” (or return this edge for “Redundant Connection” problems); else: union(u, v). This works because Union-Find tracks connected components. A cycle exists if and only if we try to add an edge between two nodes that are already connected. Time: O(E * α(V)) where E = edges, V = vertices — essentially O(E). This is simpler than DFS-based cycle detection (which needs to track visited status and parent nodes) and works naturally with edge-list input format.”}},{“@type”:”Question”,”name”:”How does Kruskal’s algorithm use Union-Find to build a Minimum Spanning Tree?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Kruskal’s algorithm builds the MST greedily: sort all edges by weight (ascending), then iterate through edges adding each to the MST if it doesn’t create a cycle. Union-Find provides the cycle check in O(α(n)). Algorithm: sort edges by weight. Initialize Union-Find with n nodes. For each edge (u, v, weight): if not uf.connected(u, v): uf.union(u, v); mst_weight += weight; mst_edges.append((u,v,weight)); if len(mst_edges) == n-1: break (MST is complete). The MST has exactly n-1 edges for a connected graph of n nodes. Correctness: Kruskal’s is a special case of the greedy matroid intersection algorithm — the cut property guarantees that the minimum-weight edge crossing any cut is in the MST. Time: O(E log E) dominated by sorting. Kruskal’s is preferred over Prim’s when the graph is sparse (E << V^2); Prim's with a Fibonacci heap is preferred for dense graphs."}}]}

🏢 Asked at: Meta Interview Guide 2026: Facebook, Instagram, WhatsApp Engineering

🏢 Asked at: Databricks Interview Guide 2026: Spark Internals, Delta Lake, and Lakehouse Architecture

🏢 Asked at: Apple Interview Guide 2026: iOS Systems, Hardware-Software Integration, and iCloud Architecture

🏢 Asked at: Coinbase Interview Guide

🏢 Asked at: Atlassian Interview Guide

Asked at: LinkedIn Interview Guide

Scroll to Top