Coding Interview: DP on Strings — Edit Distance, LCS, Regex Matching, Word Break, Interleaving Strings

Dynamic programming on strings is one of the most important and frequently tested DP categories. The pattern is consistent: define dp[i][j] as the answer for the first i characters of string s1 and first j characters of string s2. Transitions depend on whether the current characters match. This guide covers every major string DP problem with the transition logic explained clearly.

Longest Common Subsequence (LCS)

Given two strings, find the length of their longest common subsequence (characters in the same order, not necessarily contiguous). State: dp[i][j] = LCS length for s1[0..i-1] and s2[0..j-1]. Transition: if s1[i-1] == s2[j-1]: dp[i][j] = dp[i-1][j-1] + 1 (both characters match, extend LCS). Else: dp[i][j] = max(dp[i-1][j], dp[i][j-1]) (skip one character from either string, take the better option). Base: dp[0][j] = 0, dp[i][0] = 0 (empty string has LCS 0). Time: O(N*M). Space: O(N*M), optimizable to O(min(N,M)) with rolling arrays. Reconstruction: trace back from dp[N][M]. If characters match, include and move diagonally. Else move toward the larger value. Variants: (1) Longest Common Substring (contiguous) — reset dp to 0 on mismatch: dp[i][j] = dp[i-1][j-1] + 1 if match, else 0. Track maximum. (2) Shortest Common Supersequence — length = N + M – LCS. Build by interleaving both strings, sharing LCS characters. (3) Minimum Insertions to Make Palindrome — LCS of s and reverse(s). Answer: N – LCS length.

Edit Distance (Levenshtein)

Minimum operations (insert, delete, replace) to transform s1 into s2. State: dp[i][j] = edit distance between s1[0..i-1] and s2[0..j-1]. Transition: if s1[i-1] == s2[j-1]: dp[i][j] = dp[i-1][j-1] (match, no operation). Else: dp[i][j] = 1 + min(dp[i-1][j] (delete from s1), dp[i][j-1] (insert into s1), dp[i-1][j-1] (replace)). Base: dp[0][j] = j (insert j characters), dp[i][0] = i (delete i characters). Time: O(N*M). Applications: spell correction (find dictionary words within edit distance 1-2), DNA sequence alignment (with weighted operations), and diff algorithms (computing the minimum changes between two files). Variants: (1) One Edit Distance — are s1 and s2 exactly 1 edit apart? O(N) without DP: if lengths differ by > 1, false. Find the first mismatch, check if remaining characters match after applying the edit. (2) Delete Operations Only — minimum deletions to make two strings equal. Same as N + M – 2*LCS (delete non-LCS characters from both). (3) Minimum ASCII Delete Sum — weighted version where cost = ASCII value of deleted characters.

Regular Expression and Wildcard Matching

Wildcard Matching: pattern with ? (any single char) and * (any sequence including empty). State: dp[i][j] = does s[0..i-1] match p[0..j-1]? Transition: if p[j-1] == ? or p[j-1] == s[i-1]: dp[i][j] = dp[i-1][j-1]. If p[j-1] == *: dp[i][j] = dp[i][j-1] (star matches empty) OR dp[i-1][j] (star matches one more character). Else: dp[i][j] = false. Base: dp[0][0] = true. dp[0][j] = true only if p[0..j-1] is all stars. Time: O(N*M). Regular Expression Matching: pattern with . (any single char) and * (zero or more of the preceding element). More complex: * refers to the PRECEDING character. State: dp[i][j] = does s[0..i-1] match p[0..j-1]? Transition: if p[j-1] == . or p[j-1] == s[i-1]: dp[i][j] = dp[i-1][j-1]. If p[j-1] == *: the * applies to p[j-2]. Two cases: dp[i][j] = dp[i][j-2] (zero occurrences of p[j-2]) OR (if p[j-2] == . or p[j-2] == s[i-1]): dp[i][j] |= dp[i-1][j] (one more occurrence). Base: dp[0][0] = true. dp[0][j] depends on * patterns matching empty. Time: O(N*M).

Word Break and Interleaving

Word Break: can string s be segmented into dictionary words? State: dp[i] = can s[0..i-1] be segmented? Transition: dp[i] = true if there exists j < i where dp[j] is true and s[j..i-1] is in the dictionary. Base: dp[0] = true (empty string). Time: O(N^2 * M) with hash set dictionary (M = average word length for substring creation). Optimization: only check j values where i-j dp[i][j]. Transitions based on character match at positions i-1 and j-1.

Distinct Subsequences and Palindrome DP

Distinct Subsequences: count the number of distinct subsequences of s that equal t. State: dp[i][j] = count for s[0..i-1] and t[0..j-1]. Transition: if s[i-1] == t[j-1]: dp[i][j] = dp[i-1][j-1] + dp[i-1][j]. The matching character can be used (dp[i-1][j-1]) or skipped (dp[i-1][j]). Else: dp[i][j] = dp[i-1][j] (skip s[i-1]). Base: dp[i][0] = 1 (empty t is a subsequence of any string). dp[0][j>0] = 0. Time: O(N*M). Longest Palindromic Subsequence: dp[i][j] = LPS length for s[i..j]. Transition: if s[i] == s[j]: dp[i][j] = dp[i+1][j-1] + 2. Else: dp[i][j] = max(dp[i+1][j], dp[i][j-1]). Iterate by length (j-i) from 0 to N-1. Time: O(N^2). Alternative: LPS = LCS(s, reverse(s)). Palindrome Partitioning (minimum cuts): dp[i] = minimum cuts for s[0..i]. Pre-compute isPalin[i][j]. Transition: dp[i] = min(dp[j] + 1) for all j < i where s[j+1..i] is a palindrome. Time: O(N^2). These problems all follow the same template: define what dp[i][j] represents, identify the character-based transition, and handle base cases carefully.

{“@context”:”https://schema.org”,”@type”:”FAQPage”,”mainEntity”:[{“@type”:”Question”,”name”:”What is the universal template for DP on strings?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Two strings -> dp[i][j] where i indexes string s1 and j indexes s2. dp[i][j] represents the answer for s1[0..i-1] and s2[0..j-1]. Transitions depend on whether s1[i-1] == s2[j-1]: LCS: match -> dp[i-1][j-1]+1. No match -> max(dp[i-1][j], dp[i][j-1]). Edit Distance: match -> dp[i-1][j-1]. No match -> 1+min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1]). Wildcard (*): dp[i][j-1] (empty) OR dp[i-1][j] (extend). Regex (x*): dp[i][j-2] (zero) OR dp[i-1][j] if chars match. Word Break: single string dp[i], check all j<i where dp[j] AND s[j..i-1] in dict. Base cases: dp[0][0] usually true or 0. dp[0][j] and dp[i][0] depend on the problem. Time: O(N*M). Space: O(N*M), optimizable to O(M) with rolling arrays."}},{"@type":"Question","name":"How does Regular Expression Matching differ from Wildcard Matching in DP?","acceptedAnswer":{"@type":"Answer","text":"Wildcard (* matches any sequence): if p[j-1]==*: dp[i][j] = dp[i][j-1] (empty match) OR dp[i-1][j] (extend by one char). The * is independent — it matches anything on its own. Regex (* means zero or more of PRECEDING element): if p[j-1]==*: it refers to p[j-2]. dp[i][j] = dp[i][j-2] (zero occurrences of p[j-2]) OR, if p[j-2]==. or p[j-2]==s[i-1]: dp[i][j] |= dp[i-1][j] (one more occurrence). The regex * is dependent on the previous character, making transitions more complex. Both are O(N*M). The key difference: wildcard * is standalone, regex * modifies the preceding character. This makes regex matching harder — you must always consider the preceding character when processing *."}}]}
Scroll to Top