Topological Sort: Kahn’s Algorithm and DFS Approach

Topological sort is the graph algorithm you need for scheduling problems: course prerequisites, build systems, task dependencies, and package managers. It only works on Directed Acyclic Graphs (DAGs) and there are two main approaches — each with different strengths.

What Is Topological Sort?

Given a DAG, produce a linear ordering of vertices such that for every directed edge (u → v), vertex u comes before v in the ordering.

Example: if course A is a prerequisite for course B, A must come before B in the schedule.

Approach 1: DFS Post-Order (Depth-First)

def topological_sort_dfs(graph: dict) -> list | None:
    """
    DFS-based topological sort.
    A node goes into the result AFTER all its dependencies are processed (post-order).
    Reverse the result to get the topological order.
    Returns None if graph has a cycle (no valid topological order exists).
    Time: O(V + E), Space: O(V)
    """
    visited = set()
    in_stack = set()  # Detect cycles (nodes currently in DFS path)
    result = []

    def dfs(node):
        if node in in_stack:
            return False  # Back edge = cycle
        if node in visited:
            return True   # Already processed

        in_stack.add(node)
        for neighbor in graph.get(node, []):
            if not dfs(neighbor):
                return False
        in_stack.remove(node)
        visited.add(node)
        result.append(node)  # Post-order: append after processing all neighbors
        return True

    for node in graph:
        if node not in visited:
            if not dfs(node):
                return None  # Cycle detected

    return result[::-1]  # Reverse post-order = topological order

# Example: Course prerequisites
graph = {
    'algorithms': ['data structures'],
    'data structures': ['programming basics'],
    'programming basics': [],
    'operating systems': ['programming basics'],
    'compilers': ['algorithms', 'data structures']
}
order = topological_sort_dfs(graph)
print(order)  # ['programming basics', 'data structures', 'algorithms', 'operating systems', 'compilers']
# (exact order among siblings may vary)

Approach 2: Kahn’s Algorithm (BFS / Indegree)

from collections import deque

def topological_sort_kahn(n: int, edges: list[tuple]) -> list | None:
    """
    Kahn's algorithm: process nodes with indegree=0 first (no dependencies).
    Returns None if a cycle exists (not all nodes get processed).
    Time: O(V + E), Space: O(V)
    """
    graph = [[] for _ in range(n)]
    indegree = [0] * n

    for u, v in edges:
        graph[u].append(v)
        indegree[v] += 1

    # Start with all nodes that have no prerequisites
    queue = deque([i for i in range(n) if indegree[i] == 0])
    result = []

    while queue:
        node = queue.popleft()
        result.append(node)

        for neighbor in graph[node]:
            indegree[neighbor] -= 1
            if indegree[neighbor] == 0:
                queue.append(neighbor)

    # If result doesn't contain all nodes, there's a cycle
    if len(result) != n:
        return None

    return result

# LeetCode 210: Course Schedule II
def find_order(num_courses: int, prerequisites: list[list[int]]) -> list[int]:
    """Return a valid course ordering, or [] if impossible (cycle)."""
    edges = [(pre, course) for course, pre in prerequisites]
    result = topological_sort_kahn(num_courses, edges)
    return result if result is not None else []

Which Approach to Use?

Criterion DFS (Post-order) Kahn’s (BFS)
Cycle detection Natural (in_stack set) Natural (check len(result) == n)
Parallelism hint No Yes — queue contains all nodes safe to process in parallel
Lexicographic order Hard Easy — use min-heap instead of deque
Multiple valid orderings Returns one Both return one; min-heap gives lexicographic smallest
Recursion limit concern Yes (use iterative DFS) No (always iterative)

Minimum Time to Complete All Tasks (Parallel Execution)

Kahn’s algorithm reveals which tasks can be done in parallel — the nodes in the queue at any moment are all executable simultaneously:

def min_time_parallel(n: int, edges: list[tuple], time: list[int]) -> int:
    """
    Find minimum time to complete all tasks when tasks can run in parallel.
    time[i] = duration of task i.
    Constraint: task v cannot start until task u completes (for each edge u->v).
    """
    graph = [[] for _ in range(n)]
    indegree = [0] * n

    for u, v in edges:
        graph[u].append(v)
        indegree[v] += 1

    # finish_time[i] = earliest time task i can complete
    finish_time = list(time)
    queue = deque([i for i in range(n) if indegree[i] == 0])

    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            # Neighbor can't start until node finishes
            finish_time[neighbor] = max(
                finish_time[neighbor],
                finish_time[node] + time[neighbor]
            )
            indegree[neighbor] -= 1
            if indegree[neighbor] == 0:
                queue.append(neighbor)

    return max(finish_time)

Practice Problems

  • LeetCode 207: Course Schedule (just detect cycle)
  • LeetCode 210: Course Schedule II (return ordering)
  • LeetCode 310: Minimum Height Trees (find topological roots)
  • LeetCode 1136: Parallel Courses (minimum semesters)
  • LeetCode 329: Longest Increasing Path in a Matrix (topo sort + DP)

Related Graph Topics

  • Detect a Cycle in a Graph — topological sort requires a DAG; use cycle detection (DFS three-color or Kahn’s node count check) to validate before sorting
  • BFS vs DFS: When to Use Each — Kahn’s algorithm is BFS; DFS post-order is the other approach; same graph traversal primitives, different application
Scroll to Top