Dynamic Programming on Graphs: Shortest Path DP, DAG DP, and Tree DP (2025)

When to Use DP on Graphs

DP on graphs works when the graph has structure that prevents cycles in the DP recurrence: DAGs (directed acyclic graphs), trees, or layered graphs. For general graphs with cycles, DP requires encoding the visited set into the state (bitmask DP, as in TSP). Key insight: topological order on a DAG gives the computation order for DP — process nodes in topological order, referencing already-computed values for predecessors.

Bellman-Ford: DP on General Graphs

Shortest path with negative edges. dp[k][v] = minimum distance to reach v using at most k edges. Recurrence: dp[k][v] = min(dp[k-1][v], min over all edges (u,v): dp[k-1][u] + weight(u,v)). Run for k = 1..n-1 iterations. If dp[n][v] < dp[n-1][v] for any v: negative cycle detected. Time: O(V*E). Space optimization: use a single array and relax in place (but lose the "at most k edges" property). Bellman-Ford is a DP on the graph structure — each iteration relaxes one more "hop".

def bellman_ford(n, edges, src):
    # edges: list of (u, v, w) tuples; returns None if a negative cycle is detected
    dist = [float('inf')] * n
    dist[src] = 0
    for _ in range(n - 1):  # after pass k, dist is correct for paths of at most k edges
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # n-th pass: any further relaxation means a negative cycle
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            return None  # negative cycle
    return dist

DAG DP: Longest Path

Longest path in a DAG (NP-hard for general graphs, polynomial for DAGs). Topological sort first, then: dp[v] = max(dp[u] + weight(u,v)) for all predecessors u of v. The topological order ensures all predecessors are computed before v. Applications: critical path in project scheduling (longest path = minimum project duration), number of ways to reach a node (sum instead of max), longest increasing subsequence (reduce to DAG DP on sorted elements). LIS as DAG DP: edge from i to j if i < j and nums[i] < nums[j]; longest path = LIS length. But the O(n log n) patience sort is faster than explicit DAG DP.
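A minimal sketch of the topological-sort-then-DP pattern, using Kahn's algorithm for the ordering (the function name `dag_longest_path` and the `(u, v, w)` edge format are illustrative, not from the source):

```python
from collections import deque

def dag_longest_path(n, edges):
    # edges: list of (u, v, w) in a DAG with nodes 0..n-1
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v, w in edges:
        adj[u].append((v, w))
        indeg[v] += 1
    # Kahn's algorithm: a node enters the queue only after all predecessors are done,
    # so dp[u] is final when u is dequeued
    q = deque(i for i in range(n) if indeg[i] == 0)
    dp = [0] * n  # dp[v] = longest path ending at v
    while q:
        u = q.popleft()
        for v, w in adj[u]:
            dp[v] = max(dp[v], dp[u] + w)
            indeg[v] -= 1
            if indeg[v] == 0:
                q.append(v)
    return max(dp) if n else 0
```

Replacing `max` with `sum`-style accumulation in the relaxation step gives the "number of paths to a node" variant mentioned above.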

Tree DP

Many tree problems require computing values that depend on subtree structure. Standard pattern: define dp[v] = some property of the subtree rooted at v. Compute by post-order DFS (leaves first). Example: maximum path sum (LC 124). For each node v, let gain(c) = max(0, one-sided downward path sum from child c) — negative subtrees contribute 0. The best path through v is val[v] + gain(left) + gain(right); update the global max with this. Return val[v] + max(gain(left), gain(right)) — the one-sided path usable by the parent (the parent clamps a negative return to 0).

def max_path_sum(root):
    res = [float('-inf')]
    def dp(node):
        # returns the best one-sided downward path sum starting at node
        if not node:
            return 0
        left = max(0, dp(node.left))    # drop negative contributions
        right = max(0, dp(node.right))
        res[0] = max(res[0], node.val + left + right)  # best path through node
        return node.val + max(left, right)
    dp(root)
    return res[0]

Tree Diameter

Diameter = longest path between any two nodes. Two approaches: (1) Two-pass BFS: BFS from any node to find the farthest node u; BFS from u to find the farthest node v; distance(u, v) = diameter. O(V). (2) Tree DP: for each node, the longest path through it = sum of the two longest downward paths into distinct children. Post-order DFS, maintain a global max. Both are O(V). The tree DP generalizes to weighted trees and is the basis for solving harder subtree problems.
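The tree-DP approach can be sketched for a general (not just binary) weighted tree given as an edge list — the function name `tree_diameter` and the adjacency-list representation are illustrative assumptions:

```python
def tree_diameter(n, edges):
    # edges: list of (u, v, w) forming a tree on nodes 0..n-1
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    best = 0
    def dfs(u, parent):
        nonlocal best
        top1 = top2 = 0  # two longest downward paths from u into distinct children
        for v, w in adj[u]:
            if v == parent:
                continue
            d = dfs(v, u) + w
            if d > top1:
                top1, top2 = d, top1
            elif d > top2:
                top2 = d
        best = max(best, top1 + top2)  # longest path through u
        return top1  # longest one-sided path, usable by the parent
    if n:
        dfs(0, -1)
    return best
```

Tracking the two best child paths (rather than just the best) is what captures paths that bend through a node.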

Bitmask DP: Traveling Salesman Problem

Visit all n cities exactly once, minimize total distance. dp[mask][v] = minimum cost to visit exactly the cities in mask, ending at v. Base case: dp[1 << 0][0] = 0 (start at city 0). Transition: for each v in mask and each u not in mask, dp[mask | (1 << u)][u] = min(dp[mask | (1 << u)][u], dp[mask][v] + dist[v][u]). Final answer: min over all v: dp[(1 << n) - 1][v] + dist[v][0]. Time: O(2^n * n^2). Space: O(2^n * n). Practical for n <= 20. Reconstruct the path by storing parent pointers alongside DP values. This pattern applies to any problem asking for minimum cost to cover all elements of a small set.
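The recurrence above can be sketched as follows (the function name `tsp` is illustrative; path reconstruction via parent pointers is omitted for brevity):

```python
def tsp(dist):
    # dist: n x n distance matrix; tour starts and ends at city 0
    n = len(dist)
    INF = float('inf')
    full = (1 << n) - 1
    dp = [[INF] * n for _ in range(1 << n)]
    dp[1][0] = 0  # base: only city 0 visited, standing at city 0
    for mask in range(1 << n):
        for v in range(n):
            if dp[mask][v] == INF or not (mask >> v) & 1:
                continue
            for u in range(n):
                if (mask >> u) & 1:
                    continue  # u already visited
                cand = dp[mask][v] + dist[v][u]
                if cand < dp[mask | (1 << u)][u]:
                    dp[mask | (1 << u)][u] = cand
    if n == 1:
        return 0
    return min(dp[full][v] + dist[v][0] for v in range(1, n))
```

Iterating masks in increasing order is valid because `mask | (1 << u)` is always numerically larger than `mask`, so every state is finalized before it is read.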

Floyd-Warshall: All-Pairs Shortest Path

dp[k][i][j] = shortest path from i to j using only nodes 0..k as intermediates. Recurrence: dp[k][i][j] = min(dp[k-1][i][j], dp[k-1][i][k] + dp[k-1][k][j]). Space optimization: update in-place (dp[i][j] = min(dp[i][j], dp[i][k] + dp[k][j])). O(V^3) time, O(V^2) space. Detects negative cycles: if dp[i][i] < 0 after running, there is a negative cycle through node i. Use when you need all-pairs shortest paths or when the graph is dense.
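The in-place space optimization described above looks like this as a sketch (function name `floyd_warshall` and the `(u, v, w)` edge-list input are illustrative):

```python
def floyd_warshall(n, edges):
    # edges: list of (u, v, w) directed edges on nodes 0..n-1
    INF = float('inf')
    dp = [[INF] * n for _ in range(n)]
    for i in range(n):
        dp[i][i] = 0
    for u, v, w in edges:
        dp[u][v] = min(dp[u][v], w)  # keep the cheapest parallel edge
    for k in range(n):          # allow node k as an intermediate
        for i in range(n):
            for j in range(n):
                if dp[i][k] + dp[k][j] < dp[i][j]:
                    dp[i][j] = dp[i][k] + dp[k][j]
    return dp  # dp[i][i] < 0 afterwards signals a negative cycle through i
```

Note the loop order: k must be the outermost loop, since each pass extends the set of allowed intermediate nodes.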

{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "What is the key insight for applying DP to graphs?",
"acceptedAnswer": {
"@type": "Answer",
"text": "DP on graphs requires a topological order so that each subproblem depends only on already-solved subproblems. DAGs have a natural topological order. For trees, post-order DFS (leaves first) is the topological order. For general graphs with cycles, bitmask DP encodes the visited set into the state, preventing cycles in the DP recurrence. The three main variants: DAG DP (topological sort, then DP), Tree DP (post-order DFS), Bitmask DP (TSP, coverage problems). Bellman-Ford is also DP: dp[k][v] = shortest path using at most k edges; the iteration index k imposes order."
}
},
{
"@type": "Question",
"name": "How does Tree DP work for the binary tree maximum path sum problem?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Define dp(node) = maximum one-sided path sum starting at this node going downward (usable by the parent). Base case: dp(null) = 0. For each node: left_gain = max(0, dp(left)); right_gain = max(0, dp(right)). The path through this node = node.val + left_gain + right_gain — update the global maximum with this. Return node.val + max(left_gain, right_gain) to the parent (one-sided path). The key is the two-value pattern: a function returns the value usable by the parent (one-sided), while also updating a global maximum with the full path (two-sided) at each node. O(n) time, O(h) space for the recursion stack (h = tree height)."
}
},
{
"@type": "Question",
"name": "How does Bellman-Ford detect negative cycles?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Bellman-Ford runs n-1 relaxation passes (where n = number of vertices). After n-1 passes, if the graph has no negative cycles, all shortest paths are finalized (a shortest path visits at most n-1 edges in a graph with n nodes). To detect negative cycles: run a final n-th pass. If any distance can still be reduced (dist[u] + w < dist[v] for some edge (u,v,w)), there is a negative cycle. A negative cycle means there is no finite shortest path — you can keep traversing the cycle to reduce the distance indefinitely. To find which nodes are affected by negative cycles: after detecting, BFS/DFS from the nodes where the n-th pass still relaxes to mark all reachable nodes as having distance -infinity."
}
},
},
{
"@type": "Question",
"name": "What is bitmask DP and how does it solve TSP?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Bitmask DP uses an integer to represent a subset: bit i = 1 means city i has been visited. State: dp[mask][v] = minimum cost to visit exactly the cities in mask, currently at city v. Transition: dp[mask | (1<<u)][u] = min(dp[mask][v] + dist[v][u]) for all unvisited u (bit u not in mask). Base: dp[1<<src][src] = 0. Answer: min over all v of dp[(1<<n)-1][v] + dist[v][0] (return to start). Time: O(2^n * n^2). Space: O(2^n * n). Practical for n <= 20. Reconstruct the path by storing which city was chosen at each transition. The bitmask state is the key: it encodes which cities have been visited, preventing revisiting — the equivalent of a visited set, but usable in DP."
}
},
{
"@type": "Question",
"name": "How is Floyd-Warshall different from running Dijkstra from every node?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Floyd-Warshall computes all-pairs shortest paths in O(V^3) time and O(V^2) space. It handles negative edge weights (but not negative cycles). Running Dijkstra from every node: O(V * (E + V) log V) with a binary heap. For dense graphs (E = V^2): Dijkstra from all nodes is O(V^3 log V), slower than Floyd-Warshall O(V^3). For sparse graphs: Dijkstra from all nodes is O(V*E log V), faster than Floyd-Warshall. Floyd-Warshall is simpler to implement (3 nested loops) and handles negative weights. Dijkstra cannot handle negative weights (use Bellman-Ford or Johnson's algorithm instead for sparse graphs with negative weights). Floyd-Warshall also detects negative cycles: if dp[i][i] < 0 after running, node i is on a negative cycle."
}
}
]
}

