String Search Algorithm Interview Patterns: KMP, Rabin-Karp, Z-Algorithm, and Rolling Hash (2025)

Naive String Search and Why to Avoid It

Naive string search: for each position i in text (0..n-m), compare pattern[0..m-1] with text[i..i+m-1]. Worst case O(n*m) – catastrophic for inputs like text = “aaaa…a” and pattern = “aaaa…ab”. All optimized algorithms improve on this by avoiding re-scanning characters we already know match. The key insight: when a mismatch occurs after k matched characters, we know the last k characters of text. We can use this knowledge to skip ahead instead of restarting from scratch.

KMP (Knuth-Morris-Pratt) – O(n + m)

def kmp_search(text: str, pattern: str) -> list[int]:
    def build_lps(pat: str) -> list[int]:
        # Longest Proper Prefix which is also Suffix
        lps = [0] * len(pat)
        length = 0
        i = 1
        while i < len(pat):
            if pat[i] == pat[length]:
                length += 1
                lps[i] = length
                i += 1
            elif length != 0:
                length = lps[length - 1]  # fall back, don't increment i
            else:
                lps[i] = 0
                i += 1
        return lps

    lps = build_lps(pattern)
    matches = []
    i = j = 0  # i = text index, j = pattern index

    while i < len(text):
        if text[i] == pattern[j]:
            i += 1; j += 1
        if j == len(pattern):
            matches.append(i - j)
            j = lps[j - 1]  # continue searching
        elif i < len(text) and text[i] != pattern[j]:
            if j != 0:
                j = lps[j - 1]  # skip using LPS
            else:
                i += 1
    return matches
# O(n + m): building LPS is O(m), search is O(n)
# Used for: LC 28 (Find Index of First Occurrence), repeated substring pattern
def rabin_karp(text: str, pattern: str) -> list[int]:
    MOD = 10**9 + 7
    BASE = 31  # prime base for hashing
    n, m = len(text), len(pattern)

    def char_val(c): return ord(c) - ord('a') + 1

    # Precompute pattern hash and first window hash
    pat_hash = 0
    win_hash = 0
    power = 1  # BASE^(m-1) mod MOD

    for i in range(m):
        pat_hash = (pat_hash * BASE + char_val(pattern[i])) % MOD
        win_hash = (win_hash * BASE + char_val(text[i])) % MOD
        if i < m - 1:
            power = power * BASE % MOD

    matches = []
    for i in range(n - m + 1):
        if win_hash == pat_hash and text[i:i+m] == pattern:  # verify to avoid hash collision
            matches.append(i)
        if i < n - m:
            # Roll the hash: remove leftmost char, add new right char
            win_hash = (win_hash - char_val(text[i]) * power) % MOD
            win_hash = (win_hash * BASE + char_val(text[i + m])) % MOD
            win_hash = (win_hash + MOD) % MOD  # ensure non-negative

    return matches
# O(n + m) average. Advantage: find K patterns simultaneously in O(n + sum(m_i))
# LC 187: Repeated DNA Sequences - rolling hash on k-length windows

Z-Algorithm – Pattern Matching via Z-Array

def z_algorithm(s: str) -> list[int]:
    n = len(s)
    z = [0] * n
    z[0] = n
    l = r = 0
    for i in range(1, n):
        if i < r:
            z[i] = min(r - i, z[i - l])
        while i + z[i]  r:
            l, r = i, i + z[i]
    return z

def z_search(text: str, pattern: str) -> list[int]:
    combined = pattern + '$' + text  # sentinel separates pattern from text
    z = z_algorithm(combined)
    m = len(pattern)
    return [i - m - 1 for i in range(m + 1, len(combined)) if z[i] == m]
# O(n + m). Z[i] = length of longest substring starting at i that matches prefix of s
# Advantage: simpler to implement than KMP for pattern matching

When to Use Each Algorithm

Naive O(n*m): only for very short strings or guaranteed no repeated prefixes. KMP O(n+m): single pattern search, standard choice for most interview problems. Strong when pattern has repeated prefixes (e.g., “abcabc”). LPS table is the key insight – practice building it. Rabin-Karp O(n+m) average: multiple patterns simultaneously, or when the “rolling window hash” insight is the point (LC 187 Repeated DNA Sequences). Z-algorithm O(n+m): cleaner implementation than KMP for some, same complexity. Also useful for string periodicity problems. All three are O(n+m) but KMP is most commonly tested in interviews – practice the LPS construction until you can do it from memory.

Meta coding interviews include string search problems like KMP and rolling hash. Review patterns for Meta interview: string search and pattern matching problems.

LinkedIn coding rounds test string algorithms including KMP. See common patterns for LinkedIn interview: string algorithm problems.

Apple coding interviews test string search algorithms. Review KMP and rolling hash patterns for Apple interview: string manipulation and search problems.

Scroll to Top