Trie and String Matching Interview Patterns
Tries (prefix trees) and string matching algorithms are a distinct category in coding interviews. They appear in autocomplete, spell checker, word search, and IP routing problems. This guide covers the Trie data structure, its key operations, and the classic string matching algorithms you need to know.
Trie Data Structure
A Trie stores strings by their prefixes. Each node represents one character; the path from root to a node spells out a prefix.
```python
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_end = False  # marks end of a word

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.is_end = True

    def search(self, word: str) -> bool:
        node = self.root
        for ch in word:
            if ch not in node.children:
                return False
            node = node.children[ch]
        return node.is_end

    def starts_with(self, prefix: str) -> bool:
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return False
            node = node.children[ch]
        return True
```
Time: O(m) for insert/search/prefix, where m = word length. Space: O(total characters across all words).
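A quick sanity check of these operations, using a compact re-declaration of the same Trie (with `dict.setdefault` and a shared `_walk` helper) so the snippet runs on its own:

```python
class Node:
    def __init__(self):
        self.children, self.is_end = {}, False

class Trie:
    def __init__(self):
        self.root = Node()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.is_end = True

    def _walk(self, s):
        # follow s from the root; None if the path doesn't exist
        node = self.root
        for ch in s:
            if ch not in node.children:
                return None
            node = node.children[ch]
        return node

    def search(self, word):
        node = self._walk(word)
        return node is not None and node.is_end

    def starts_with(self, prefix):
        return self._walk(prefix) is not None

t = Trie()
t.insert("apple")
print(t.search("apple"))      # True
print(t.search("app"))        # False: "app" is only a prefix, not a word
print(t.starts_with("app"))   # True
```

Factoring the shared traversal into `_walk` is a small interview win: `search` and `starts_with` differ only in whether they require `is_end`.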
Autocomplete (Design Search Autocomplete System — LeetCode 642)
Each Trie node caches the top-3 matching sentences for its prefix. The plain TrieNode above has no such field, so this design uses its own node class plus a sentence-count map, so repeated inputs of the same sentence accumulate counts:

```python
class AutoNode:
    def __init__(self):
        self.children = {}  # char -> AutoNode
        self.top3 = []      # up to 3 (count, sentence) pairs, hottest first

class AutocompleteSystem:
    def __init__(self, sentences, times):
        self.root = AutoNode()
        self.counts = {}        # sentence -> cumulative count
        self.curr_input = []
        for s, t in zip(sentences, times):
            self._insert(s, t)

    def _insert(self, s, delta):
        self.counts[s] = self.counts.get(s, 0) + delta
        count = self.counts[s]
        node = self.root
        for ch in s:
            if ch not in node.children:
                node.children[ch] = AutoNode()
            node = node.children[ch]
            # keep top 3 by count (desc), ties broken by ASCII order;
            # drop any stale entry for s before re-inserting its new count
            entries = [e for e in node.top3 if e[1] != s] + [(count, s)]
            node.top3 = sorted(entries, key=lambda x: (-x[0], x[1]))[:3]

    def input(self, c):
        if c == '#':
            self._insert(''.join(self.curr_input), 1)
            self.curr_input = []
            return []
        self.curr_input.append(c)
        node = self.root
        for ch in self.curr_input:
            if ch not in node.children:
                return []
            node = node.children[ch]
        return [s for _, s in node.top3]
```
Word Search II (LeetCode 212) — Trie + Backtracking
Find all words from a dictionary that exist in a 2D grid. Build a Trie from the dictionary, then DFS from each cell:
```python
def find_words(board, words):
    trie = Trie()
    for w in words:
        trie.insert(w)
    result = set()
    rows, cols = len(board), len(board[0])

    def dfs(node, r, c, path):
        if not (0 <= r < rows and 0 <= c < cols):
            return
        ch = board[r][c]
        if ch not in node.children:  # also rejects '#' (visited) cells
            return
        next_node = node.children[ch]
        if next_node.is_end:
            result.add(path + ch)
        board[r][c] = '#'  # mark visited
        for dr, dc in [(0, 1), (0, -1), (1, 0), (-1, 0)]:
            dfs(next_node, r + dr, c + dc, path + ch)
        board[r][c] = ch  # restore

    for r in range(rows):
        for c in range(cols):
            dfs(trie.root, r, c, '')
    return list(result)

# Trie pruning avoids re-exploring dead-end prefixes
```
KMP (Knuth-Morris-Pratt) Pattern Matching
Find all occurrences of pattern P in text T in O(n+m) time (vs O(n*m) naive).
```python
def kmp_search(text, pattern):
    def build_lps(pattern):
        # lps[i] = length of the longest proper prefix of pattern[:i+1]
        # that is also a suffix of it
        lps = [0] * len(pattern)
        length, i = 0, 1
        while i < len(pattern):
            if pattern[i] == pattern[length]:
                length += 1
                lps[i] = length
                i += 1
            elif length:
                length = lps[length - 1]  # fall back; don't increment i
            else:
                lps[i] = 0
                i += 1
        return lps

    lps = build_lps(pattern)
    matches = []
    i = j = 0  # i = text index, j = pattern index
    while i < len(text):
        if text[i] == pattern[j]:
            i += 1
            j += 1
        if j == len(pattern):
            matches.append(i - j)
            j = lps[j - 1]  # keep scanning for overlapping matches
        elif i < len(text) and text[i] != pattern[j]:
            if j:
                j = lps[j - 1]
            else:
                i += 1
    return matches
```
The LPS (Longest Proper Prefix which is also Suffix) array lets KMP skip redundant comparisons by reusing already-matched prefix information.
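To make the table concrete, here is the same `build_lps` helper extracted so it runs standalone, with a couple of worked patterns:

```python
def build_lps(pattern):
    # lps[i] = length of the longest proper prefix of pattern[:i+1]
    # that is also a suffix of it
    lps = [0] * len(pattern)
    length, i = 0, 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length:
            length = lps[length - 1]  # fall back without advancing i
        else:
            i += 1
    return lps

print(build_lps("ABCABD"))  # [0, 0, 0, 1, 2, 0]: "AB" repeats at position 3
print(build_lps("AAAA"))    # [0, 1, 2, 3]: every prefix overlaps itself
```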
Rabin-Karp Rolling Hash
Use hashing to find pattern occurrences in O(n+m) average time (worst case O(n*m) if hash collisions repeatedly force verification). Useful for multi-pattern search.
```python
def rabin_karp(text, pattern):
    # assumes lowercase a-z input (char_val maps 'a'..'z' to 0..25)
    BASE, MOD = 26, 10**9 + 7
    n, m = len(text), len(pattern)
    if m > n:
        return []

    def char_val(c):
        return ord(c) - ord('a')

    # hash of the pattern and of the first text window
    p_hash = t_hash = 0
    high = pow(BASE, m - 1, MOD)  # weight of the leftmost character
    for i in range(m):
        p_hash = (p_hash * BASE + char_val(pattern[i])) % MOD
        t_hash = (t_hash * BASE + char_val(text[i])) % MOD

    matches = []
    for i in range(n - m + 1):
        # verify on a hash hit to rule out collisions
        if t_hash == p_hash and text[i:i+m] == pattern:
            matches.append(i)
        if i < n - m:
            # roll the window: drop text[i], append text[i+m]
            t_hash = (t_hash - char_val(text[i]) * high) % MOD
            t_hash = (t_hash * BASE + char_val(text[i + m])) % MOD
    return matches
```
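One way to convince yourself the O(1) slide is correct: roll a hash across a small text and compare each window's rolled value against a hash recomputed from scratch (same BASE/MOD scheme as above):

```python
BASE, MOD = 26, 10**9 + 7

def direct_hash(s):
    # polynomial hash computed from scratch, O(len(s))
    v = 0
    for c in s:
        v = (v * BASE + ord(c) - ord('a')) % MOD
    return v

text, m = "abcde", 3
high = pow(BASE, m - 1, MOD)
h = direct_hash(text[:m])  # hash of the first window, "abc"
for i in range(1, len(text) - m + 1):
    # O(1) slide: drop text[i-1] from the left, append text[i+m-1] on the right
    h = ((h - (ord(text[i - 1]) - ord('a')) * high) * BASE
         + (ord(text[i + m - 1]) - ord('a'))) % MOD
    assert h == direct_hash(text[i:i+m])  # matches the from-scratch hash
print("rolling hash agrees with direct hash for every window")
```

Python's `%` always returns a non-negative result, so the subtraction step needs no extra correction; in C or Java you would add MOD before taking the remainder.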
IP Address Routing (Longest Prefix Match)
Routers use a bitwise Trie to find the longest matching prefix for an IP address. Each bit of the IP (0 or 1) is one level in the Trie. This is how CIDR routing tables work — lookup is O(32) for IPv4, O(128) for IPv6.
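A minimal sketch of that idea, assuming routes are given as '0'/'1' prefix strings (the class and method names here are illustrative, not a real router API):

```python
class BitNode:
    def __init__(self):
        self.children = {}   # '0'/'1' -> BitNode
        self.next_hop = None # set if a route terminates at this node

class RoutingTrie:
    def __init__(self):
        self.root = BitNode()

    def add_route(self, prefix_bits, next_hop):
        node = self.root
        for b in prefix_bits:
            node = node.children.setdefault(b, BitNode())
        node.next_hop = next_hop

    def lookup(self, addr_bits):
        # walk the address bit by bit, remembering the deepest route seen
        node, best = self.root, None
        for b in addr_bits:
            if b not in node.children:
                break
            node = node.children[b]
            if node.next_hop is not None:
                best = node.next_hop  # longer (more specific) match wins
        return best

rt = RoutingTrie()
rt.add_route("10", "A")       # a short, general prefix
rt.add_route("1011", "B")     # a longer, more specific prefix
print(rt.lookup("10110000"))  # "B": the /4 route shadows the /2 route
print(rt.lookup("10000000"))  # "A": only the /2 route matches
```

Real routers compress chains of single-child nodes (Patricia/radix tries) so lookups touch far fewer than 32 or 128 nodes, but the longest-match logic is the same.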
Common Trie Problems
- LeetCode 208: Implement Trie
- LeetCode 211: Design Add and Search Words (wildcard '.' matching)
- LeetCode 212: Word Search II (Trie + backtracking)
- LeetCode 642: Design Search Autocomplete System
- LeetCode 648: Replace Words (prefix replacement)
- LeetCode 720: Longest Word in Dictionary
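For LeetCode 211 in that list, the wildcard '.' case is the interesting part: a '.' forces a DFS over every child. A self-contained sketch (re-declaring a minimal Trie node so it runs alone):

```python
class Node:
    def __init__(self):
        self.children, self.is_end = {}, False

def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, Node())
    node.is_end = True

def search(node, word, i=0):
    if i == len(word):
        return node.is_end
    ch = word[i]
    if ch == '.':  # wildcard: try every child branch
        return any(search(child, word, i + 1)
                   for child in node.children.values())
    return ch in node.children and search(node.children[ch], word, i + 1)

root = Node()
insert(root, "bad")
insert(root, "dad")
print(search(root, ".ad"))   # True: '.' matches 'b' or 'd'
print(search(root, "b.."))   # True
print(search(root, "b.d."))  # False: no 4-letter word on that path
```

Worst case is O(26^d) for a query of d consecutive dots, which is worth mentioning in the complexity discussion.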
Interview Tips
- Prefer a dict of children over a fixed 26-slot array: a dict handles arbitrary alphabets and is faster to write (the array is only a micro-optimization when input is guaranteed lowercase a-z)
- For Word Search II, the key insight is using Trie instead of a set to enable prefix pruning
- KMP is often asked conceptually — explain the LPS table intuition even if you can't code it perfectly under pressure
- Rabin-Karp is the answer when you need to find multiple patterns simultaneously (K patterns, O(n+Km) total)
- For autocomplete system design, mention storing top-K results at each Trie node to avoid DFS on every query
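The multi-pattern tip above can be sketched as follows, assuming for simplicity that all K patterns share one length m, so a single rolling window suffices (the function name is illustrative):

```python
def multi_rabin_karp(text, patterns):
    # all patterns are assumed to have the same length m
    BASE, MOD = 256, 10**9 + 7
    pats = set(patterns)
    m, n = len(patterns[0]), len(text)
    if n < m:
        return []

    def h(s):
        v = 0
        for c in s:
            v = (v * BASE + ord(c)) % MOD
        return v

    hashes = {h(p) for p in patterns}  # O(K*m) preprocessing
    high = pow(BASE, m - 1, MOD)
    t_hash = h(text[:m])

    hits = []
    for i in range(n - m + 1):
        # one hash-set probe per window; verify to rule out collisions
        if t_hash in hashes and text[i:i+m] in pats:
            hits.append((i, text[i:i+m]))
        if i < n - m:
            t_hash = ((t_hash - ord(text[i]) * high) * BASE
                      + ord(text[i + m])) % MOD
    return hits

print(multi_rabin_karp("abcxabz", ["abc", "abz"]))  # [(0, 'abc'), (4, 'abz')]
```

For patterns of mixed lengths, Aho-Corasick (a Trie with failure links) is the standard O(n + total pattern length) answer; this hash-set variant is the quick interview version.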
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "When should you use a Trie instead of a HashSet for string problems?",
      "acceptedAnswer": { "@type": "Answer", "text": "Use a Trie when: (1) you need prefix queries - \"find all words with prefix 'app'\" is O(m) with a Trie vs. O(n*m) scanning a HashSet; (2) autocomplete - a Trie naturally groups words by prefix; (3) word search on a 2D grid with a dictionary - a Trie enables pruning dead-end prefixes during DFS, dramatically reducing work; (4) finding the longest common prefix of a set of strings - traverse the Trie until a branching node. Use a HashSet when you only need exact-match lookups. The Trie advantage is prefix operations; its cost is higher memory (each character is a node) and a more complex implementation." }
    },
    {
      "@type": "Question",
      "name": "How does the KMP algorithm avoid redundant comparisons?",
      "acceptedAnswer": { "@type": "Answer", "text": "Naive pattern matching is O(n*m) because on a mismatch it backtracks the text pointer. KMP pre-computes an LPS (Longest Proper Prefix which is also Suffix) array for the pattern: LPS[i] is the length of the longest proper prefix of pattern[0..i] that is also a suffix of it. On a mismatch at pattern position j, instead of restarting from pattern[0], jump to pattern[LPS[j-1]], reusing the already-matched prefix. The text pointer never backtracks. Each text character is compared at most twice, so there are at most 2n comparisons, giving O(n+m) total. Example: pattern \"ABCABD\" has LPS = [0, 0, 0, 1, 2, 0]; a mismatch at position 5 (D) jumps back to position 2 (C), reusing the \"AB\" prefix that already matched." }
    },
    {
      "@type": "Question",
      "name": "What is rolling hash in Rabin-Karp and why is it efficient?",
      "acceptedAnswer": { "@type": "Answer", "text": "Rabin-Karp uses a polynomial rolling hash to compare a pattern against all substrings of the text in O(1) per window. The hash of a string is h = (c0*B^(m-1) + c1*B^(m-2) + ... + c(m-1)) mod P. When sliding the window right by one: new_hash = ((old_hash - c0*B^(m-1)) * B + new_char) mod P. This removes the leftmost character and appends the new rightmost character in O(1). Total: O(n) hash updates plus O(m) verification on each hash match, for O(n+m) on average. Rabin-Karp shines for multi-pattern search: hash all K patterns, then slide one window over the text and check each window hash against a HashSet of pattern hashes - O(n + K*m) total instead of O(n*K*m) naive." }
    }
  ]
}