Trie and String Matching Interview Patterns
Tries (prefix trees) and string matching algorithms are a distinct category in coding interviews. They appear in autocomplete, spell checker, word search, and IP routing problems. This guide covers the Trie data structure, its key operations, and the classic string matching algorithms you need to know.
Trie Data Structure
A Trie stores strings by their prefixes. Each node represents one character; the path from root to a node spells out a prefix.
```python
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_end = False  # marks end of a word

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.is_end = True

    def search(self, word: str) -> bool:
        node = self.root
        for ch in word:
            if ch not in node.children:
                return False
            node = node.children[ch]
        return node.is_end

    def starts_with(self, prefix: str) -> bool:
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return False
            node = node.children[ch]
        return True
```
Time: O(m) for insert/search/prefix, where m = word length. Space: O(total characters across all words).
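A quick sanity check of these operations, using a compact re-declaration of the same Trie (with `dict.setdefault` and a shared `_walk` helper) so the snippet runs on its own:

```python
class Node:
    def __init__(self):
        self.children, self.is_end = {}, False

class Trie:
    def __init__(self):
        self.root = Node()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.is_end = True

    def _walk(self, s):
        # follow s from the root; None if the path doesn't exist
        node = self.root
        for ch in s:
            if ch not in node.children:
                return None
            node = node.children[ch]
        return node

    def search(self, word):
        node = self._walk(word)
        return node is not None and node.is_end

    def starts_with(self, prefix):
        return self._walk(prefix) is not None

t = Trie()
t.insert("apple")
print(t.search("apple"))      # True
print(t.search("app"))        # False: "app" is only a prefix, not a word
print(t.starts_with("app"))   # True
```

Factoring the shared traversal into `_walk` is a small interview win: `search` and `starts_with` differ only in whether they require `is_end`.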
Autocomplete (Design Search Autocomplete System — LeetCode 642)
Each Trie node caches the top-3 matching sentences for its prefix. The plain TrieNode above has no such field, so this design uses its own node class plus a sentence-count map, so repeated inputs of the same sentence accumulate counts:

```python
class AutoNode:
    def __init__(self):
        self.children = {}  # char -> AutoNode
        self.top3 = []      # up to 3 (count, sentence) pairs, hottest first

class AutocompleteSystem:
    def __init__(self, sentences, times):
        self.root = AutoNode()
        self.counts = {}        # sentence -> cumulative count
        self.curr_input = []
        for s, t in zip(sentences, times):
            self._insert(s, t)

    def _insert(self, s, delta):
        self.counts[s] = self.counts.get(s, 0) + delta
        count = self.counts[s]
        node = self.root
        for ch in s:
            if ch not in node.children:
                node.children[ch] = AutoNode()
            node = node.children[ch]
            # keep top 3 by count (desc), ties broken by ASCII order;
            # drop any stale entry for s before re-inserting its new count
            entries = [e for e in node.top3 if e[1] != s] + [(count, s)]
            node.top3 = sorted(entries, key=lambda x: (-x[0], x[1]))[:3]

    def input(self, c):
        if c == '#':
            self._insert(''.join(self.curr_input), 1)
            self.curr_input = []
            return []
        self.curr_input.append(c)
        node = self.root
        for ch in self.curr_input:
            if ch not in node.children:
                return []
            node = node.children[ch]
        return [s for _, s in node.top3]
```
Word Search II (LeetCode 212) — Trie + Backtracking
Find all words from a dictionary that exist in a 2D grid. Build a Trie from the dictionary, then DFS from each cell:
```python
def find_words(board, words):
    trie = Trie()
    for w in words:
        trie.insert(w)
    result = set()
    rows, cols = len(board), len(board[0])

    def dfs(node, r, c, path):
        if not (0 <= r < rows and 0 <= c < cols):
            return
        ch = board[r][c]
        if ch not in node.children:  # also rejects '#' (visited) cells
            return
        next_node = node.children[ch]
        if next_node.is_end:
            result.add(path + ch)
        board[r][c] = '#'  # mark visited
        for dr, dc in [(0, 1), (0, -1), (1, 0), (-1, 0)]:
            dfs(next_node, r + dr, c + dc, path + ch)
        board[r][c] = ch  # restore

    for r in range(rows):
        for c in range(cols):
            dfs(trie.root, r, c, '')
    return list(result)

# Trie pruning avoids re-exploring dead-end prefixes
```
KMP (Knuth-Morris-Pratt) Pattern Matching
Find all occurrences of pattern P in text T in O(n+m) time (vs O(n*m) naive).
```python
def kmp_search(text, pattern):
    def build_lps(pattern):
        # lps[i] = length of the longest proper prefix of pattern[:i+1]
        # that is also a suffix of it
        lps = [0] * len(pattern)
        length, i = 0, 1
        while i < len(pattern):
            if pattern[i] == pattern[length]:
                length += 1
                lps[i] = length
                i += 1
            elif length:
                length = lps[length - 1]  # fall back; don't increment i
            else:
                lps[i] = 0
                i += 1
        return lps

    lps = build_lps(pattern)
    matches = []
    i = j = 0  # i = text index, j = pattern index
    while i < len(text):
        if text[i] == pattern[j]:
            i += 1
            j += 1
        if j == len(pattern):
            matches.append(i - j)
            j = lps[j - 1]  # keep scanning for overlapping matches
        elif i < len(text) and text[i] != pattern[j]:
            if j:
                j = lps[j - 1]
            else:
                i += 1
    return matches
```
The LPS (Longest Proper Prefix which is also Suffix) array lets KMP skip redundant comparisons by reusing already-matched prefix information.
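To make the table concrete, here is the same `build_lps` helper extracted so it runs standalone, with a couple of worked patterns:

```python
def build_lps(pattern):
    # lps[i] = length of the longest proper prefix of pattern[:i+1]
    # that is also a suffix of it
    lps = [0] * len(pattern)
    length, i = 0, 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length:
            length = lps[length - 1]  # fall back without advancing i
        else:
            i += 1
    return lps

print(build_lps("ABCABD"))  # [0, 0, 0, 1, 2, 0]: "AB" repeats at position 3
print(build_lps("AAAA"))    # [0, 1, 2, 3]: every prefix overlaps itself
```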
Rabin-Karp Rolling Hash
Use hashing to find pattern occurrences in O(n+m) average time (worst case O(n*m) if hash collisions repeatedly force verification). Useful for multi-pattern search.
```python
def rabin_karp(text, pattern):
    # assumes lowercase a-z input (char_val maps 'a'..'z' to 0..25)
    BASE, MOD = 26, 10**9 + 7
    n, m = len(text), len(pattern)
    if m > n:
        return []

    def char_val(c):
        return ord(c) - ord('a')

    # hash of the pattern and of the first text window
    p_hash = t_hash = 0
    high = pow(BASE, m - 1, MOD)  # weight of the leftmost character
    for i in range(m):
        p_hash = (p_hash * BASE + char_val(pattern[i])) % MOD
        t_hash = (t_hash * BASE + char_val(text[i])) % MOD

    matches = []
    for i in range(n - m + 1):
        # verify on a hash hit to rule out collisions
        if t_hash == p_hash and text[i:i+m] == pattern:
            matches.append(i)
        if i < n - m:
            # roll the window: drop text[i], append text[i+m]
            t_hash = (t_hash - char_val(text[i]) * high) % MOD
            t_hash = (t_hash * BASE + char_val(text[i + m])) % MOD
    return matches
```
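One way to convince yourself the O(1) slide is correct: roll a hash across a small text and compare each window's rolled value against a hash recomputed from scratch (same BASE/MOD scheme as above):

```python
BASE, MOD = 26, 10**9 + 7

def direct_hash(s):
    # polynomial hash computed from scratch, O(len(s))
    v = 0
    for c in s:
        v = (v * BASE + ord(c) - ord('a')) % MOD
    return v

text, m = "abcde", 3
high = pow(BASE, m - 1, MOD)
h = direct_hash(text[:m])  # hash of the first window, "abc"
for i in range(1, len(text) - m + 1):
    # O(1) slide: drop text[i-1] from the left, append text[i+m-1] on the right
    h = ((h - (ord(text[i - 1]) - ord('a')) * high) * BASE
         + (ord(text[i + m - 1]) - ord('a'))) % MOD
    assert h == direct_hash(text[i:i+m])  # matches the from-scratch hash
print("rolling hash agrees with direct hash for every window")
```

Python's `%` always returns a non-negative result, so the subtraction step needs no extra correction; in C or Java you would add MOD before taking the remainder.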
IP Address Routing (Longest Prefix Match)
Routers use a bitwise Trie to find the longest matching prefix for an IP address. Each bit of the IP (0 or 1) is one level in the Trie. This is how CIDR routing tables work — lookup is O(32) for IPv4, O(128) for IPv6.
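A minimal sketch of that idea, assuming routes are given as '0'/'1' prefix strings (the class and method names here are illustrative, not a real router API):

```python
class BitNode:
    def __init__(self):
        self.children = {}   # '0'/'1' -> BitNode
        self.next_hop = None # set if a route terminates at this node

class RoutingTrie:
    def __init__(self):
        self.root = BitNode()

    def add_route(self, prefix_bits, next_hop):
        node = self.root
        for b in prefix_bits:
            node = node.children.setdefault(b, BitNode())
        node.next_hop = next_hop

    def lookup(self, addr_bits):
        # walk the address bit by bit, remembering the deepest route seen
        node, best = self.root, None
        for b in addr_bits:
            if b not in node.children:
                break
            node = node.children[b]
            if node.next_hop is not None:
                best = node.next_hop  # longer (more specific) match wins
        return best

rt = RoutingTrie()
rt.add_route("10", "A")       # a short, general prefix
rt.add_route("1011", "B")     # a longer, more specific prefix
print(rt.lookup("10110000"))  # "B": the /4 route shadows the /2 route
print(rt.lookup("10000000"))  # "A": only the /2 route matches
```

Real routers compress chains of single-child nodes (Patricia/radix tries) so lookups touch far fewer than 32 or 128 nodes, but the longest-match logic is the same.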
Common Trie Problems
- LeetCode 208: Implement Trie
- LeetCode 211: Design Add and Search Words (wildcard '.' matching)
- LeetCode 212: Word Search II (Trie + backtracking)
- LeetCode 642: Design Search Autocomplete System
- LeetCode 648: Replace Words (prefix replacement)
- LeetCode 720: Longest Word in Dictionary
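For LeetCode 211 in that list, the wildcard '.' case is the interesting part: a '.' forces a DFS over every child. A self-contained sketch (re-declaring a minimal Trie node so it runs alone):

```python
class Node:
    def __init__(self):
        self.children, self.is_end = {}, False

def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, Node())
    node.is_end = True

def search(node, word, i=0):
    if i == len(word):
        return node.is_end
    ch = word[i]
    if ch == '.':  # wildcard: try every child branch
        return any(search(child, word, i + 1)
                   for child in node.children.values())
    return ch in node.children and search(node.children[ch], word, i + 1)

root = Node()
insert(root, "bad")
insert(root, "dad")
print(search(root, ".ad"))   # True: '.' matches 'b' or 'd'
print(search(root, "b.."))   # True
print(search(root, "b.d."))  # False: no 4-letter word on that path
```

Worst case is O(26^d) for a query of d consecutive dots, which is worth mentioning in the complexity discussion.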
Interview Tips
- Prefer a dict of children over a fixed 26-slot array: a dict handles arbitrary alphabets and is faster to write (the array is only a micro-optimization when input is guaranteed lowercase a-z)
- For Word Search II, the key insight is using Trie instead of a set to enable prefix pruning
- KMP is often asked conceptually — explain the LPS table intuition even if you can't code it perfectly under pressure
- Rabin-Karp is the answer when you need to find multiple patterns simultaneously (K patterns, O(n+Km) total)
- For autocomplete system design, mention storing top-K results at each Trie node to avoid DFS on every query
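The multi-pattern tip above can be sketched as follows, assuming for simplicity that all K patterns share one length m, so a single rolling window suffices (the function name is illustrative):

```python
def multi_rabin_karp(text, patterns):
    # all patterns are assumed to have the same length m
    BASE, MOD = 256, 10**9 + 7
    pats = set(patterns)
    m, n = len(patterns[0]), len(text)
    if n < m:
        return []

    def h(s):
        v = 0
        for c in s:
            v = (v * BASE + ord(c)) % MOD
        return v

    hashes = {h(p) for p in patterns}  # O(K*m) preprocessing
    high = pow(BASE, m - 1, MOD)
    t_hash = h(text[:m])

    hits = []
    for i in range(n - m + 1):
        # one hash-set probe per window; verify to rule out collisions
        if t_hash in hashes and text[i:i+m] in pats:
            hits.append((i, text[i:i+m]))
        if i < n - m:
            t_hash = ((t_hash - ord(text[i]) * high) * BASE
                      + ord(text[i + m])) % MOD
    return hits

print(multi_rabin_karp("abcxabz", ["abc", "abz"]))  # [(0, 'abc'), (4, 'abz')]
```

For patterns of mixed lengths, Aho-Corasick (a Trie with failure links) is the standard O(n + total pattern length) answer; this hash-set variant is the quick interview version.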
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "When should you use a Trie instead of a HashSet for string problems?",
      "acceptedAnswer": { "@type": "Answer", "text": "Use a Trie when: (1) you need prefix queries - \"find all words with prefix 'app'\" is O(m) with a Trie vs. O(n*m) scanning a HashSet; (2) autocomplete - a Trie naturally groups words by prefix; (3) word search on a 2D grid with a dictionary - a Trie enables pruning dead-end prefixes during DFS, dramatically reducing work; (4) finding the longest common prefix of a set of strings - traverse the Trie until a branching node. Use a HashSet when you only need exact-match lookups. The Trie advantage is prefix operations; its cost is higher memory (each character is a node) and a more complex implementation." }
    },
    {
      "@type": "Question",
      "name": "How does the KMP algorithm avoid redundant comparisons?",
      "acceptedAnswer": { "@type": "Answer", "text": "Naive pattern matching is O(n*m) because on a mismatch it backtracks the text pointer. KMP pre-computes an LPS (Longest Proper Prefix which is also Suffix) array for the pattern: LPS[i] is the length of the longest proper prefix of pattern[0..i] that is also a suffix of it. On a mismatch at pattern position j, instead of restarting from pattern[0], jump to pattern[LPS[j-1]], reusing the already-matched prefix. The text pointer never backtracks. Each text character is compared at most twice, so there are at most 2n comparisons, giving O(n+m) total. Example: pattern \"ABCABD\" has LPS = [0, 0, 0, 1, 2, 0]; a mismatch at position 5 (D) jumps back to position 2 (C), reusing the \"AB\" prefix that already matched." }
    },
    {
      "@type": "Question",
      "name": "What is rolling hash in Rabin-Karp and why is it efficient?",
      "acceptedAnswer": { "@type": "Answer", "text": "Rabin-Karp uses a polynomial rolling hash to compare a pattern against all substrings of the text in O(1) per window. The hash of a string is h = (c0*B^(m-1) + c1*B^(m-2) + ... + c(m-1)) mod P. When sliding the window right by one: new_hash = ((old_hash - c0*B^(m-1)) * B + new_char) mod P. This removes the leftmost character and appends the new rightmost character in O(1). Total: O(n) hash updates plus O(m) verification on each hash match, for O(n+m) on average. Rabin-Karp shines for multi-pattern search: hash all K patterns, then slide one window over the text and check each window hash against a HashSet of pattern hashes - O(n + K*m) total instead of O(n*K*m) naive." }
    }
  ]
}