Given a 2D grid of letters and a list of words, find all words from the list that appear in the grid. A word appears if its letters can be traced through a path of horizontally or vertically adjacent cells, with no cell used twice in a single word.
This is LeetCode 212, “Word Search II”, the canonical “trie pattern recognition” test in the modern interview canon. The naive approach of searching for each word independently is too slow when there are many words. The optimal approach builds a trie from all the words at once and runs a single DFS that prunes any path whose prefix is not in the trie. The key insight is that searching all words simultaneously is far cheaper than searching each word in isolation.
Examples
board = [
    ['o','a','a','n'],
    ['e','t','a','e'],
    ['i','h','k','r'],
    ['i','f','l','v']
]
words = ["oath","pea","eat","rain"]
Output: ["eat","oath"]
Naive: search each word independently, O(W × M × N × 4^L)
def find_words_naive(board, words):
    def can_find(word, i, j, k):
        if k == len(word):
            return True
        if not (0 <= i < len(board) and 0 <= j < len(board[0])):
            return False
        if board[i][j] != word[k]:
            return False
        tmp = board[i][j]
        board[i][j] = '#'  # mark visited so the path cannot reuse this cell
        found = (can_find(word, i + 1, j, k + 1) or
                 can_find(word, i - 1, j, k + 1) or
                 can_find(word, i, j + 1, k + 1) or
                 can_find(word, i, j - 1, k + 1))
        board[i][j] = tmp  # restore before returning
        return found

    result = []
    for word in words:
        for i in range(len(board)):
            for j in range(len(board[0])):
                if can_find(word, i, j, 0):
                    result.append(word)
                    break
            else:
                continue  # inner loop did not break: keep scanning rows
            break  # word found: stop scanning the grid for this word
    return result
For each word, run a DFS from each cell. O(W × M × N × 4^L), where W is the number of words, M and N are the grid dimensions, and L is the maximum word length. For long word lists this is too slow.
Optimal: trie + DFS with prefix pruning, O(M × N × 4^L)
Build a trie containing all the words. Run a single DFS over the grid; at each cell, descend into the trie based on the letter. If the trie has no node for the current letter, prune. If the trie node marks an end of word, add that word to the result.
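For concreteness, here is the trie that the example word list produces under the dict-of-dicts representation, with '$' marking the end of a complete word (the same convention the solution uses):

```python
words = ["oath", "pea", "eat", "rain"]

trie = {}
for word in words:
    node = trie
    for c in word:
        node = node.setdefault(c, {})  # descend, creating child nodes as needed
    node['$'] = word  # '$' marks a complete word; storing the word avoids rebuilding it

# Walking o -> a -> t -> h reaches the end-of-word marker for "oath":
assert trie['o']['a']['t']['h']['$'] == 'oath'
# "oa" is a valid prefix but not a word, so no '$' at that node:
assert '$' not in trie['o']['a']
```

Storing the full word at the terminal node (rather than a boolean flag) means the DFS never has to reconstruct the matched word from the path.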
def find_words(board, words):
    # Build trie: nested dicts, '$' marks the end of a word
    trie = {}
    for word in words:
        node = trie
        for c in word:
            node = node.setdefault(c, {})
        node['$'] = word  # mark end with the full word

    rows, cols = len(board), len(board[0])
    result = set()

    def dfs(i, j, parent):
        c = board[i][j]
        node = parent.get(c)
        if node is None:  # no trie child for this letter: prune
            return
        if '$' in node:
            result.add(node['$'])
        board[i][j] = '#'  # mark visited
        for di, dj in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            ni, nj = i + di, j + dj
            if 0 <= ni < rows and 0 <= nj < cols and board[ni][nj] != '#':
                dfs(ni, nj, node)
        board[i][j] = c  # restore

    for i in range(rows):
        for j in range(cols):
            dfs(i, j, trie)
    return list(result)
O(M × N × 4^L) in the worst case. The crucial optimization: the trie is shared across all words, so the DFS prunes immediately when the current path’s prefix is not a valid word prefix. For typical inputs this dramatically outperforms the per-word approach.
Why the trie optimization matters
Suppose your word list contains 10,000 words, all starting with “z”. On a grid where ‘z’ appears in only one cell, the naive approach runs the per-word DFS 10,000 times, almost all of which fail at the first cell. The trie approach starts the DFS from the ‘z’ cell once, descends into the trie’s ‘z’ branch, and explores all 10,000 prefixes simultaneously while traversing the grid. The factor of 10,000 in the work is replaced by a single shared traversal.
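A toy version of this scenario makes the sharing visible: however many words start with the same letter, the trie's top level has a single branch for that letter, and any cell holding a different letter is rejected with one dictionary lookup.

```python
# Illustrative: words sharing a first letter collapse into one trie branch.
words = ["zone", "zoom", "zebra", "zero"]

trie = {}
for word in words:
    node = trie
    for c in word:
        node = node.setdefault(c, {})
    node['$'] = word

# One top-level branch covers all four words:
assert list(trie) == ['z']
# A cell holding 'a' fails the prefix check in a single O(1) lookup:
assert trie.get('a') is None
```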
This is the broader pattern: tries let you search many strings simultaneously by sharing prefixes. Anywhere a problem involves “find all of these strings in this thing”, a trie is likely the right tool.
Common bugs
- Marking cells visited but not unmarking. The DFS must restore the board after recursing, or subsequent searches will see the modified state.
- Adding duplicates to the result. If the same word can be found via multiple paths, the result will contain it multiple times. Use a set or track via the trie node.
- Not pruning found words from the trie. Strictly a missed optimization rather than a bug: once a word is found, removing it from the trie avoids re-finding it via other paths, and empty branches can be deleted to speed up later searches. Often asked as a follow-up.
- Not handling the empty grid or empty word list. Edge cases that cause crashes.
The trie pattern
The trie pattern shows up in many problems where multi-string matching is involved:
- Word Search II (LC 212) — the canonical example.
- Implement Trie (LC 208) — the prerequisite.
- Replace Words (LC 648) — replace each word with its shortest root from a dictionary.
- Add and Search Word (LC 211) — wildcard search across a dictionary of words.
- Word Squares (LC 425) — build all valid word squares from a list.
- Maximum XOR of Two Numbers (LC 421) — binary trie for XOR.
- Longest Word in Dictionary (LC 720) — find the longest word that can be built one letter at a time from other words in the dictionary.
Word Search II is the canonical first example because the trie payoff is dramatic — the optimization changes the asymptotic factor by the number of words. Once a candidate has internalized the trie pattern on this problem, the related ones become accessible.
What interviewers grade
This is a senior-level FAANG question. Signal layers:
- Recognize the trie opportunity. The candidate articulates that searching for many words simultaneously is faster than per-word searches.
- Build the trie cleanly. The dict-of-dicts representation is standard; some candidates over-engineer with custom classes.
- Implement the DFS with backtracking. The visited-marking and unmarking has to be correct.
- Avoid duplicates. Use a set or trie pruning.
- Discuss optimizations. Removing found words from the trie; pruning empty trie branches.
Is it asked in 2026?
Yes, frequently at senior+ FAANG and AI labs. Word Search II is in the LeetCode top 50 hard problems and remains a staple of senior-level coding rounds. The question has not aged out because the trie pattern is genuinely useful for many real-world problems, and the connection between this canonical example and broader applications keeps the question generative.
Frequently Asked Questions
What is the time complexity?
O(M × N × 4^L) where L is the maximum word length. Each cell can be the start of a DFS that branches up to 4 ways at each step (3 after the first step, since the path cannot immediately return to the previous cell) and goes up to L deep.
What is the space complexity?
O(K) where K is the total number of characters across all words (the trie size), plus O(L) for the DFS recursion stack.
Why is a trie better than a hash set of prefixes?
A hash set of prefixes works, but it stores every prefix explicitly (one entry per character across all words), and each membership test hashes a freshly built prefix string, an O(L) cost per step. The trie advances one node per character without ever materializing prefix strings, and its terminal nodes give you a natural place to store the matched word.
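To make the contrast concrete, here is a small sketch of the prefix-set alternative (illustrative only):

```python
words = ["oath", "pea", "eat", "rain"]

# Every prefix of every word goes into the set: one entry per character
# across all words, each stored as a separate string.
prefixes = {w[:i] for w in words for i in range(1, len(w) + 1)}

path = ""
for c in "oat":
    path += c  # rebuilds and rehashes the growing string at every step
    assert path in prefixes  # O(len(path)) hash per check

assert "oq" not in prefixes  # dead prefixes are still detected, just more slowly
```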
Should I prune the trie as I find words?
Optionally yes; it is a follow-up optimization that interviewers sometimes ask about. Removing found words means you do not re-find them via different paths, and you can clean up empty trie branches to speed up future DFS calls.
Is this problem still asked in 2026?
Yes, frequently at senior+ FAANG and AI labs. It is the canonical trie-pattern recognition test.