String Search Algorithm Interview Patterns: KMP, Rabin-Karp, Z-Algorithm, and Rolling Hash (2025)

Naive String Search and Why to Avoid It

Naive string search: for each position i in the text (0..n-m), compare pattern[0..m-1] with text[i..i+m-1]. The worst case is O(n*m), which is catastrophic for inputs like text = "aaaa…a" and pattern = "aaaa…ab". All of the optimized algorithms below improve on this by never re-scanning characters that are already known to match. The key insight: when a mismatch occurs after k matched characters, we already know those k characters of the text, and that knowledge lets us skip ahead instead of restarting from scratch.
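As a baseline, the naive scan can be written in a few lines (a minimal sketch; `naive_search` is an illustrative name, not a library function):

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Try every alignment of pattern against text: O(n*m) worst case."""
    n, m = len(text), len(pattern)
    matches = []
    for i in range(n - m + 1):
        # Slice comparison restarts from scratch at every alignment.
        if text[i:i + m] == pattern:
            matches.append(i)
    return matches

print(naive_search("aaaaa", "aa"))   # → [0, 1, 2, 3]
print(naive_search("aaaab", "aab"))  # → [2]
```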

KMP (Knuth-Morris-Pratt) – O(n + m)

def kmp_search(text: str, pattern: str) -> list[int]:
    def build_lps(pat: str) -> list[int]:
        # Longest Proper Prefix which is also Suffix
        lps = [0] * len(pat)
        length = 0
        i = 1
        while i < len(pat):
            if pat[i] == pat[length]:
                length += 1
                lps[i] = length
                i += 1
            elif length != 0:
                length = lps[length - 1]  # fall back, don't increment i
            else:
                lps[i] = 0
                i += 1
        return lps

    if not pattern:
        return []  # guard: empty pattern would index pattern[0] / lps[-1] below
    lps = build_lps(pattern)
    matches = []
    i = j = 0  # i = text index, j = pattern index

    while i < len(text):
        if text[i] == pattern[j]:
            i += 1; j += 1
        if j == len(pattern):
            matches.append(i - j)
            j = lps[j - 1]  # continue searching
        elif i < len(text) and text[i] != pattern[j]:
            if j != 0:
                j = lps[j - 1]  # skip using LPS
            else:
                i += 1
    return matches
# O(n + m): building LPS is O(m), search is O(n)
# Used for: LC 28 (Find Index of First Occurrence), repeated substring pattern
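To make the LPS table concrete, here is the same construction as inside `kmp_search`, pulled out so the table can be inspected directly:

```python
def build_lps(pat: str) -> list[int]:
    """LPS table: lps[i] = length of the longest proper prefix of
    pat[0..i] that is also a suffix of pat[0..i]."""
    lps = [0] * len(pat)
    length, i = 0, 1
    while i < len(pat):
        if pat[i] == pat[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length:
            length = lps[length - 1]  # fall back, don't increment i
        else:
            lps[i] = 0
            i += 1
    return lps

print(build_lps("abcabc"))   # → [0, 0, 0, 1, 2, 3]
print(build_lps("aabaaab"))  # → [0, 1, 0, 1, 2, 2, 3]
```

The second example shows the fall-back step: at the final "b", length falls from 2 back through lps before extending to 3.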

Rabin-Karp – Rolling Hash for Multiple Pattern Search

def rabin_karp(text: str, pattern: str) -> list[int]:
    MOD = 10**9 + 7
    BASE = 31  # prime base for hashing
    n, m = len(text), len(pattern)

    def char_val(c): return ord(c) - ord('a') + 1  # assumes lowercase a-z input

    # Precompute pattern hash and first window hash
    pat_hash = 0
    win_hash = 0
    power = 1  # BASE^(m-1) mod MOD

    for i in range(m):
        pat_hash = (pat_hash * BASE + char_val(pattern[i])) % MOD
        win_hash = (win_hash * BASE + char_val(text[i])) % MOD
        if i < m - 1:
            power = power * BASE % MOD

    matches = []
    for i in range(n - m + 1):
        if win_hash == pat_hash and text[i:i+m] == pattern:  # verify to avoid hash collision
            matches.append(i)
        if i < n - m:
            # Roll the hash: remove leftmost char, add new right char
            win_hash = (win_hash - char_val(text[i]) * power) % MOD
            win_hash = (win_hash * BASE + char_val(text[i + m])) % MOD
            win_hash = (win_hash + MOD) % MOD  # redundant in Python (% is never negative here), but needed in C/Java

    return matches
# O(n + m) average. Advantage: find K patterns simultaneously in O(n + sum(m_i))
# LC 187: Repeated DNA Sequences - rolling hash on k-length windows
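The multi-pattern advantage can be sketched by hashing each pattern once and comparing every window hash against that set. This is an illustrative sketch assuming equal-length, lowercase-only patterns; `rabin_karp_multi` is not a standard function:

```python
def rabin_karp_multi(text: str, patterns: list[str]) -> dict[str, list[int]]:
    """Find all occurrences of several equal-length patterns in one pass."""
    MOD, BASE = 10**9 + 7, 31
    m = len(patterns[0])  # all patterns assumed to share this length
    n = len(text)

    def val(c: str) -> int:
        return ord(c) - ord('a') + 1  # lowercase a-z assumed

    def poly_hash(s: str) -> int:
        h = 0
        for c in s:
            h = (h * BASE + val(c)) % MOD
        return h

    wanted = {}  # hash -> patterns with that hash
    for p in patterns:
        wanted.setdefault(poly_hash(p), []).append(p)

    result = {p: [] for p in patterns}
    if n < m:
        return result
    power = pow(BASE, m - 1, MOD)  # weight of the outgoing character
    h = poly_hash(text[:m])
    for i in range(n - m + 1):
        for p in wanted.get(h, ()):
            if text[i:i + m] == p:  # verify: hash collisions are possible
                result[p].append(i)
        if i < n - m:
            h = (h - val(text[i]) * power) % MOD
            h = (h * BASE + val(text[i + m])) % MOD
    return result
```

For patterns of mixed lengths, one rolling window per distinct length is needed.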

Z-Algorithm – Pattern Matching via Z-Array

def z_algorithm(s: str) -> list[int]:
    n = len(s)
    z = [0] * n
    z[0] = n
    l = r = 0
    for i in range(1, n):
        if i < r:
            z[i] = min(r - i, z[i - l])  # reuse the [l, r) match window
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1  # extend the match character by character
        if i + z[i] > r:
            l, r = i, i + z[i]  # new rightmost match window
    return z

def z_search(text: str, pattern: str) -> list[int]:
    combined = pattern + '$' + text  # sentinel separates pattern from text
    z = z_algorithm(combined)
    m = len(pattern)
    return [i - m - 1 for i in range(m + 1, len(combined)) if z[i] == m]
# O(n + m). Z[i] = length of longest substring starting at i that matches prefix of s
# Advantage: simpler to implement than KMP for pattern matching
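To see the sentinel construction concretely, the following uses a deliberately brute-force O(n^2) Z computation (illustration only; use the linear version above in practice):

```python
def z_brute(s: str) -> list[int]:
    """Reference Z-array: z[i] = longest common prefix of s and s[i:]."""
    n = len(s)
    z = [0] * n
    if n:
        z[0] = n
    for i in range(1, n):
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
    return z

# Searching for "aab" in "baabaa": combined = "aab$baabaa"
z = z_brute("aab$baabaa")
print(z)  # → [10, 1, 0, 0, 0, 3, 1, 0, 2, 1]
# z[5] == 3 == len("aab"), so there is a match at 5 - 3 - 1 = 1 in the text.
```

The '$' sentinel guarantees no Z value exceeds len(pattern), so `z[i] == m` identifies matches unambiguously.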

When to Use Each Algorithm

Naive O(n*m): only for very short strings or inputs with no repeated prefixes.

KMP O(n+m): single-pattern search, the standard choice for most interview problems. Strong when the pattern has repeated prefixes (e.g., "abcabc"). The LPS table is the key insight: practice building it.

Rabin-Karp O(n+m) average: multiple patterns simultaneously, or when the "rolling window hash" insight is the point (LC 187 Repeated DNA Sequences).

Z-algorithm O(n+m): a cleaner implementation than KMP for some, with the same complexity. Also useful for string periodicity problems.

All three are O(n+m), but KMP is the most commonly tested in interviews: practice the LPS construction until you can do it from memory.


