String pattern matching — finding a pattern within a larger text — is a fundamental algorithm that powers text editors, search engines, DNA sequence analysis, and plagiarism detection. While brute force is O(N*M), several algorithms achieve O(N+M). This guide covers KMP, Rabin-Karp, and the Z-algorithm with their interview applications and implementation details.
Brute Force and Its Limitations
Naive approach: for each position i in the text, check if the pattern matches starting at i. Compare text[i..i+M-1] with pattern[0..M-1]. If all characters match, found. If any mismatch, move to i+1 and restart. Time: O(N*M) worst case (e.g., text = “AAAAAAB”, pattern = “AAAB” — at each position, you compare M characters before finding the mismatch). For most practical inputs (random text), the average case is closer to O(N) because mismatches happen early. But for structured or adversarial inputs, the worst case is real. The key inefficiency: when a mismatch occurs at position j in the pattern, brute force discards all information from the partial match and restarts. KMP and Z-algorithm exploit the structure of the pattern to avoid redundant comparisons.
KMP (Knuth-Morris-Pratt) Algorithm
KMP preprocesses the pattern to build a failure function (also called the prefix function or partial match table). The failure function tells you: if a mismatch occurs at position j in the pattern, what is the longest proper prefix of pattern[0..j-1] that is also a suffix? This prefix has already been matched — no need to re-check it. Failure function construction: for pattern “ABCABD”: failure[0] = 0 (no proper prefix), failure[1] = 0 (“AB” has no matching prefix-suffix), failure[2] = 0, failure[3] = 1 (“ABCA” — “A” is both prefix and suffix), failure[4] = 2 (“ABCAB” — “AB” is both prefix and suffix), failure[5] = 0. Matching: compare text[i] with pattern[j]. If match, advance both. If mismatch: if j > 0, set j = failure[j-1] (jump back to the longest matching prefix). If j == 0, advance i. The text pointer i never moves backward — it only advances. Time: O(N+M) — N for matching, M for building the failure function. Space: O(M) for the failure function. KMP is the standard algorithm for single-pattern exact matching.
Rabin-Karp Algorithm
Rabin-Karp uses hashing to compare the pattern with substrings of the text. Compute hash(pattern). Slide a window of length M over the text. At each position, compute hash(text[i..i+M-1]). If the hash matches hash(pattern), verify with a character-by-character comparison (to handle hash collisions). Rolling hash: computing the hash from scratch at each position is O(M), making the total O(N*M). A rolling hash updates the hash in O(1) by removing the contribution of the leftmost character and adding the rightmost. Polynomial rolling hash: H = (c0 * p^(M-1) + c1 * p^(M-2) + … + c(M-1)) mod q. When sliding: H_new = (H – c_old * p^(M-1)) * p + c_new. Time: O(N+M) expected. Worst case O(N*M) if many hash collisions occur (adversarial input). Advantages over KMP: extends easily to multi-pattern matching (check if the hash matches any of K patterns), 2D pattern matching (search for a matrix pattern in a larger matrix), and approximate matching (find substrings with similar hashes). Rabin-Karp is preferred when you need to search for multiple patterns simultaneously or when the pattern comparison is expensive.
Z-Algorithm
The Z-algorithm computes the Z-array: Z[i] = the length of the longest substring starting at position i that matches a prefix of the string. For string “aabxaab”: Z = [-, 1, 0, 0, 3, 1, 0] (Z[0] is undefined or equals the string length). Pattern matching with Z: concatenate pattern + “$” + text (the “$” is a sentinel not in either string). Compute the Z-array. Any position i where Z[i] == len(pattern) is a match starting at position i – len(pattern) – 1 in the text. Time: O(N+M). The Z-algorithm is simpler to implement than KMP and solves the same problems. It also solves additional problems: find the longest prefix of the string that appears as a substring elsewhere, count the number of distinct substrings, and string compression (find the shortest period). Interview applications: (1) Pattern matching (as above). (2) String compression: the shortest period of “abcabcabc” is “abc” (length 3). If N % period == 0, the string is periodic. (3) Longest palindromic prefix: reverse the string, concatenate original + “$” + reversed, compute Z-array on the concatenated string. The largest Z[i] in the reversed part that equals N-i gives the palindromic prefix.
When to Use Each Algorithm
Decision guide: (1) Single pattern, exact match — KMP or Z-algorithm. Both O(N+M). KMP is more commonly asked in interviews. Z-algorithm is simpler to implement. (2) Multiple patterns simultaneously — Rabin-Karp (hash each pattern, check rolling hash against all) or Aho-Corasick (automaton-based, O(N + M + Z) where Z is matches). (3) Approximate matching (allow K mismatches or edits) — dynamic programming (edit distance), not KMP. (4) Repeated substring queries on the same text — suffix array + LCP array. Preprocessing O(N log N), each query O(M log N). (5) Simple interview problems (“implement strStr()”) — KMP or even brute force with explanation of the optimization. In interviews, the most commonly asked pattern matching problems are: implement strStr (KMP), repeated substring pattern (Z-array or KMP failure function), shortest palindrome (KMP or Z on reversed + original), and longest happy prefix (KMP failure function on the string itself). Know KMP well — it appears in ~5% of string problems at top companies.
{“@context”:”https://schema.org”,”@type”:”FAQPage”,”mainEntity”:[{“@type”:”Question”,”name”:”How does the KMP algorithm achieve O(N+M) string matching?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”KMP (Knuth-Morris-Pratt) avoids redundant comparisons by using a failure function that tells you: on a mismatch at position j in the pattern, what is the longest proper prefix of pattern[0..j-1] that is also a suffix? That prefix has already been matched, so jump there instead of restarting. Key property: the text pointer never moves backward — it only advances. Each text character is compared at most twice (once as a new character, once after a failure function jump), giving O(N) for matching. The failure function is built in O(M) by applying the same idea to the pattern itself. Example: pattern ABCABD has failure = [0,0,0,1,2,0]. If mismatch at position 5 (D), failure[4]=2 tells us AB is already matched — resume at position 2 of the pattern instead of restarting at 0. This skips re-comparing the AB prefix, saving O(M) comparisons per mismatch in the worst case.”}},{“@type”:”Question”,”name”:”How does Rabin-Karp use rolling hash for pattern matching?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Rabin-Karp computes hash(pattern) and slides a window of length M over the text, computing hash(window) at each position. If hashes match, verify character-by-character (hash collisions are possible). The key optimization: a rolling hash updates in O(1) by removing the leftmost character contribution and adding the rightmost. Polynomial rolling hash: H = (c0 * p^(M-1) + c1 * p^(M-2) + … + c(M-1)) mod q. Sliding: H_new = (H – c_old * p^(M-1)) * p + c_new. Time: O(N+M) expected, O(N*M) worst case if many collisions. Advantages over KMP: easily extends to multiple patterns (check hash against a set of K pattern hashes), 2D pattern matching, and approximate matching. Choose Rabin-Karp when searching for multiple patterns simultaneously or when the comparison operation is expensive.”}},{“@type”:”Question”,”name”:”What is the Z-algorithm and how does it compare to KMP?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”The Z-algorithm computes Z[i] = length of the longest substring starting at i that matches a prefix of the string. For pattern matching: concatenate pattern + $ + text, compute Z-array. Any position where Z[i] == len(pattern) is a match. Time: O(N+M). Compared to KMP: the Z-algorithm is simpler to implement (the Z-array computation is more intuitive than the failure function). Both are O(N+M). KMP is more traditional in interviews; Z-algorithm is popular in competitive programming. Z-algorithm also directly solves: string compression (shortest period = smallest k where Z[k] + k == N and N % k == 0), longest palindromic prefix (Z on original + $ + reversed), and counting distinct substrings. If you find KMP failure function confusing, the Z-algorithm is an equally powerful alternative that many engineers find easier to reason about.”}},{“@type”:”Question”,”name”:”When should you use KMP versus Rabin-Karp versus brute force?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Single exact pattern match: KMP or Z-algorithm (both O(N+M), guaranteed). Use for implement strStr, repeated substring pattern, shortest palindrome. Multiple patterns simultaneously: Rabin-Karp (hash each pattern, O(N*K) average) or Aho-Corasick (O(N+M+Z) guaranteed, where Z is matches). Use for multi-pattern search, DNA motif finding. Approximate matching (allow K errors): dynamic programming edit distance approach. KMP and Rabin-Karp do not handle approximate matching. Simple interview problems (implement strStr with small inputs): brute force O(N*M) is acceptable and easier to code. Only optimize to KMP if the interviewer asks for O(N+M). Repeated queries on the same text: suffix array + LCP array. O(N log N) preprocessing, O(M log N) per query. For most interview problems, know KMP well — it covers the majority of string matching questions.”}}]}