Question 1

How does the KMP algorithm achieve O(N+M) string matching?

Accepted Answer

KMP (Knuth-Morris-Pratt) avoids redundant comparisons by precomputing a failure function over the pattern. On a mismatch at position j in the pattern, it answers: what is the longest proper prefix of pattern[0..j-1] that is also a suffix of it? That prefix has already been matched against the text, so matching resumes from there instead of restarting. Key property: the text pointer never moves backward -- it only advances. A single text character can be compared more than once across successive fallbacks, but every fallback shortens the current match, and the match grows by at most one per text character. The total is therefore at most 2N comparisons, which gives O(N) for matching. The failure function is built in O(M) by applying the same idea to the pattern itself.

Example: pattern ABCABD has failure = [0,0,0,1,2,0]. On a mismatch at position 5 (D), failure[4]=2 says AB is already matched -- resume at position 2 of the pattern instead of restarting at 0. The AB prefix is not re-compared. Brute force would instead move the text pointer back and could repeat up to O(M) comparisons per mismatch.
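The failure function and the forward-only scan described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names build_failure and kmp_search are hypothetical, and returning no matches for an empty pattern is an assumption made for this sketch.

```python
from typing import List


def build_failure(pattern: str) -> List[int]:
    """failure[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    failure = [0] * len(pattern)
    k = 0  # length of the prefix currently matched against the suffix
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]  # fall back to the next shorter candidate
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    return failure


def kmp_search(text: str, pattern: str) -> List[int]:
    """Return the start index of every occurrence of pattern in text."""
    if not pattern:
        return []  # assumption for this sketch: empty pattern yields no matches
    failure = build_failure(pattern)
    matches = []
    j = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):  # the text index i only moves forward
        while j > 0 and ch != pattern[j]:
            j = failure[j - 1]  # reuse the already-matched prefix
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            matches.append(i - j + 1)
            j = failure[j - 1]  # continue, allowing overlapping matches
    return matches
```

For the example pattern, build_failure("ABCABD") yields the table given above, and kmp_search("ABCABCABD", "ABCABD") resumes at pattern position 2 after the mismatch on D before finding the match.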

Question 2

How does Rabin-Karp use rolling hash for pattern matching?

Accepted Answer

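Rabin-Karp computes hash(pattern) and slides a window of length M over the text, computing hash(window) at each position. If the hashes match, verify character-by-character, because hash collisions are possible. The key optimization: a rolling hash updates in O(1) by removing the leftmost character's contribution and adding the new rightmost character. Polynomial rolling hash: H = (c0 * p^(M-1) + c1 * p^(M-2) + ... + c(M-1)) mod q. Sliding: H_new = ((H - c_old * p^(M-1)) * p + c_new) mod q, where c_old is the character leaving the window and c_new the one entering it. In languages where the remainder can be negative, add q before taking the final mod. Time: O(N+M) expected, O(N*M) worst case if many windows collide with the pattern hash and each must be verified. Advantages over KMP: it extends easily to multiple patterns of the same length (check each window hash against a set of K pattern hashes) and to 2D pattern matching. Rolling hashes can also serve as a building block for matching with a bounded number of mismatches, although plain Rabin-Karp finds exact matches only. Choose Rabin-Karp when searching for multiple patterns simultaneously or when the comparison operation is expensive.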
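A minimal Python sketch of the rolling-hash search described above. The function name rabin_karp and the default values for base (p) and mod (q) are illustrative assumptions, not prescribed constants. Every hash hit is verified by direct comparison, so collisions cost time but never produce a false match.

```python
from typing import List


def rabin_karp(text: str, pattern: str,
               base: int = 256, mod: int = 1_000_000_007) -> List[int]:
    """Return the start index of every occurrence of pattern in text.

    base and mod play the roles of p and q in the polynomial hash;
    the defaults are arbitrary choices for this sketch.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # p^(M-1) mod q: weight of the leftmost char

    p_hash = w_hash = 0
    for k in range(m):  # hash the pattern and the first window
        p_hash = (p_hash * base + ord(pattern[k])) % mod
        w_hash = (w_hash * base + ord(text[k])) % mod

    matches = []
    for start in range(n - m + 1):
        # Verify on a hash hit, since different strings can share a hash.
        if w_hash == p_hash and text[start:start + m] == pattern:
            matches.append(start)
        if start + m < n:
            # Drop text[start], shift by one position, append text[start + m].
            # Python's % always returns a non-negative result here.
            w_hash = ((w_hash - ord(text[start]) * high) * base
                      + ord(text[start + m])) % mod
    return matches
```

Passing a very small modulus, for example mod=3, forces frequent collisions. The results stay correct while the running time drifts toward the O(N*M) worst case.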

Question 3

What is the Z-algorithm and how does it compare to KMP?

Accepted Answer

The Z-algorithm computes Z[i] = length of the longest substring starting at i that matches a prefix of the string. For pattern matching: concatenate pattern + $ + text, where $ stands for any separator character that occurs in neither string, and compute the Z-array. Any position where Z[i] == len(pattern) is a match. Time: O(N+M). Compared to KMP: both are O(N+M), and many people find the Z-array computation more intuitive to implement than the failure function. KMP is more traditional in interviews; the Z-algorithm is popular in competitive programming. The Z-array also helps with: string compression (for a string of length N, shortest period = smallest k >= 1 where Z[k] + k == N and N % k == 0), longest palindromic prefix (Z on original + $ + reversed, then take the longest suffix of the reversed part whose Z-value reaches the end of the string), and counting distinct substrings (in O(N^2), by adding characters one at a time). If you find the KMP failure function confusing, the Z-algorithm is an equally powerful alternative that many engineers find easier to reason about.
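The Z-array computation and the pattern + separator + text trick can be sketched in Python like this. The names z_array and z_search are hypothetical. The sketch assumes the separator (here "\x00" by default) appears in neither string, and it sets Z[0] to the full string length, which is one common convention.

```python
from typing import List


def z_array(s: str) -> List[int]:
    """z[i] = length of the longest substring starting at i that matches
    a prefix of s. z[0] is set to len(s) by convention in this sketch."""
    n = len(s)
    z = [0] * n
    if n == 0:
        return z
    z[0] = n
    left = right = 0  # s[left:right] is the rightmost segment matching a prefix
    for i in range(1, n):
        if i < right:
            # Reuse the value computed for the mirrored position i - left.
            z[i] = min(right - i, z[i - left])
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1  # extend the match by direct comparison
        if i + z[i] > right:
            left, right = i, i + z[i]
    return z


def z_search(text: str, pattern: str, sep: str = "\x00") -> List[int]:
    """Return the start index of every occurrence of pattern in text.

    Assumes sep occurs in neither pattern nor text.
    """
    if not pattern:
        return []
    m = len(pattern)
    combined = pattern + sep + text
    z = z_array(combined)
    # Position i in combined corresponds to text index i - m - 1.
    return [i - m - 1 for i in range(m + 1, len(combined)) if z[i] == m]
```

Because the separator never matches a pattern character, no Z-value in the text part can exceed len(pattern), so the equality test is sufficient.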

Question 4

When should you use KMP versus Rabin-Karp versus brute force?

Accepted Answer

Single exact pattern match: KMP or Z-algorithm (both O(N+M), guaranteed). Use for implement strStr, repeated substring pattern, shortest palindrome.

Multiple patterns simultaneously: Rabin-Karp or Aho-Corasick. Rabin-Karp hashes each pattern; it runs in O(N + total pattern length) expected time when the K patterns share a length and their hashes sit in a set, and up to O(N*K) on average when each pattern needs its own pass. Aho-Corasick is O(N+M+Z) guaranteed, where M is the total pattern length and Z is the number of matches. Use for multi-pattern search, DNA motif finding.

Approximate matching (allow K errors): dynamic programming edit distance approach. Plain KMP and Rabin-Karp do not handle approximate matching.

Simple interview problems (implement strStr with small inputs): brute force O(N*M) is acceptable and easier to code. Only optimize to KMP if the interviewer asks for O(N+M).

Repeated queries on the same text: suffix array + LCP array. O(N log N) preprocessing, O(M log N) per query.

For most interview problems, know KMP well -- it covers the majority of string matching questions.
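For the small-input case, the brute-force baseline is short enough to write from memory. The sketch below uses the hypothetical name brute_force_search and returns the first match index, or -1 if there is none. Returning 0 for an empty pattern is an assumption that follows the convention of Python's str.find.

```python
def brute_force_search(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1.

    O(N*M) in the worst case: every start position may compare up to
    M characters before failing.
    """
    n, m = len(text), len(pattern)
    if m == 0:
        return 0  # assumption: same convention as str.find
    for start in range(n - m + 1):
        j = 0
        while j < m and text[start + j] == pattern[j]:
            j += 1
        if j == m:
            return start
    return -1
```

Inputs such as text "aaaa...a" with pattern "aa...ab" hit the worst case, since almost the whole pattern is compared at every start position. That is the situation in which switching to KMP or the Z-algorithm pays off.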

Coding Interview: String Pattern Matching — KMP, Rabin-Karp, Z-Algorithm, Rolling Hash, Substring Search

Brute Force and Its Limitations

KMP (Knuth-Morris-Pratt) Algorithm

Rabin-Karp Algorithm

Z-Algorithm

When to Use Each Algorithm