What Is a Typeahead / Search Suggestion System?
A typeahead (also called autocomplete or search suggestion) system predicts what a user is typing and suggests completions in real time. Google Search, Amazon product search, and Twitter search all use it. The system must respond within 100ms per keystroke, handle millions of concurrent users, and return the most relevant suggestions (not just alphabetically next).
System Requirements
Functional
- Return top 5–10 search suggestions for a given prefix in <100ms
- Suggestions ranked by search frequency (most popular first)
- Support billion-scale query corpus (Google handles 8.5B searches/day)
- Personalized suggestions (factor in user’s search history)
- Trending queries: newly popular searches surface quickly
Non-Functional
- Latency: <100ms p99 (users stop typing if suggestions lag)
- Availability: high, but not critical — search without suggestions is degraded yet still usable
- Freshness: trending queries appear in suggestions within 10–15 minutes
Core Data Structure: Trie
A Trie (prefix tree) is the canonical data structure for autocomplete. Each node represents one character. Traversing from root to a node spells a prefix. At each node, store the top-K (e.g., 10) most popular queries in its subtree — precomputed and cached.
class TrieNode:
    def __init__(self):
        self.children = {}       # char -> TrieNode
        self.top_queries = []    # [(frequency, query), ...] max 10, precomputed

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def search(self, prefix):
        node = self.root
        for c in prefix:
            if c not in node.children:
                return []
            node = node.children[c]
        return node.top_queries  # O(len(prefix)) lookup, O(1) return
By pre-caching top queries at every node, lookups are O(prefix_length) — no subtree traversal needed at query time. The trie is built offline and served read-only.
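The offline build step is where the top-K caching happens: as each query is inserted, every node along its path caches it if it ranks among that node's K most frequent completions. A minimal sketch (the `build_trie` helper and the sort-and-trim approach are illustrative; a production build would use heaps and run as a batch job):

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.top_queries = []  # [(frequency, query), ...], max k

def build_trie(freq_table, k=10):
    """Build a read-only trie from {query: frequency}, caching top-k at every node."""
    root = TrieNode()
    for query, freq in freq_table.items():
        node = root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            # every node on this path has `query` in its subtree,
            # so update its cached top-k as we descend
            node.top_queries.append((freq, query))
            node.top_queries.sort(reverse=True)
            node.top_queries = node.top_queries[:k]
    return root

def search(root, prefix):
    node = root
    for c in prefix:
        if c not in node.children:
            return []
        node = node.children[c]
    return [q for _, q in node.top_queries]

root = build_trie({"search": 500, "seattle": 300, "send": 100, "set": 50}, k=3)
```

Sorting on every insert is O(k log k) per node per query — acceptable for an offline job, since serving never pays this cost.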
Trie at Scale: Why You Can’t Fit It in Memory
A Trie for all English queries would have billions of nodes — too large for one server’s RAM. Solutions:
- Shard by prefix: queries starting with ‘a’ go to shard 1, ‘b’ to shard 2, etc. More granular: shard by first two characters (26² = 676 shards). The router maps prefix → shard server.
- Serialize to Redis/disk: serialize the Trie as a hash map: prefix → [top queries]. Redis GET “se” → [“search”, “seattle”, “send”, …]. Simple, fast, easily sharded.
- Approximate with top-N prefix table: for each prefix observed in search logs, precompute top-10 queries. Store as key-value: “se” → […]. Much simpler than a real Trie, equally effective in practice.
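The top-N prefix table is worth sketching because it shows how little of the Trie machinery is actually needed at serve time. Below, a plain dict stands in for Redis (in production the final table would be loaded with `SET prefix -> JSON list` and sharded by key); `build_prefix_table` and its parameters are illustrative names:

```python
from collections import defaultdict

def build_prefix_table(freq_table, k=10, max_prefix_len=20):
    """Precompute the top-k completions for every prefix in the query corpus."""
    table = defaultdict(list)
    for query, freq in freq_table.items():
        # capping prefix length bounds table size; few users type 20+ chars
        for i in range(1, min(len(query), max_prefix_len) + 1):
            table[query[:i]].append((freq, query))
    # keep only the top-k per prefix, highest frequency first
    return {p: [q for _, q in sorted(v, reverse=True)[:k]]
            for p, v in table.items()}

def suggest(table, prefix):
    # in production: one Redis GET; here, one dict lookup
    return table.get(prefix, [])
```

The trade-off versus a real Trie is storage: each query is duplicated under every one of its prefixes, but each lookup is a single key-value read with no traversal at all.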
Ranking Suggestions
Pure frequency isn’t enough. Ranking factors:
- Query frequency: #1 signal — “weather” is searched billions of times
- Recency: trending queries decay over time. Exponential decay: score = frequency * e^(-λ * age_in_hours)
- Personalization: user’s own recent searches weighted higher. “py” → “python tutorial” for a developer, “pyrotechnics” for a fireworks retailer
- Geography: localize suggestions to the user’s region (“starbucks” typed in Seattle should surface Seattle-area results)
- Spell correction: “gogle” → suggest “google”
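The recency factor above can be made concrete. A minimal sketch of the exponential-decay score, with λ = 0.05 as an illustrative tuning choice (it gives a half-life of ln(2)/0.05 ≈ 14 hours, not a standard value):

```python
import math

def suggestion_score(frequency, age_hours, decay_lambda=0.05):
    """score = frequency * e^(-lambda * age_in_hours).

    Older observations count for less; lambda controls how fast
    yesterday's popularity fades relative to what is trending now.
    """
    return frequency * math.exp(-decay_lambda * age_hours)

# a day-old query with 1000 hits loses to a brand-new one with 400:
stale = suggestion_score(1000, age_hours=24)   # ~301
fresh = suggestion_score(400, age_hours=0)     # 400
```

In practice the decayed frequency would be one term in a weighted sum with the personalization and geography signals, not the whole score.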
Trie Update Pipeline
The Trie can’t be updated in real time (would invalidate cached top-K at many ancestor nodes). Instead:
- Raw query logs → Kafka → aggregation service counts query frequencies (sliding window, last 7 days)
- Weekly (or daily) batch job rebuilds the Trie from the frequency table
- New Trie deployed as a snapshot to Trie servers (blue/green swap)
- For trending queries (<15 minutes freshness): maintain a separate “trending” Redis sorted set updated every 5 minutes. Merge trending results with Trie results at query time.
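The query-time merge of the batch-built Trie results with the trending overlay can be sketched as follows. The 1.5x trending boost and the (query, score) list shapes are illustrative assumptions; real scores would come from the Trie servers and the Redis sorted set:

```python
def merge_suggestions(trie_results, trending, limit=10, trending_boost=1.5):
    """Merge long-term (trie) and real-time (trending) suggestion lists.

    Both inputs are lists of (query, score). Trending entries get a
    multiplicative boost; if a query appears in both sources, the
    higher adjusted score wins.
    """
    scores = {}
    for query, score in trie_results:
        scores[query] = max(scores.get(query, 0.0), score)
    for query, score in trending:
        scores[query] = max(scores.get(query, 0.0), score * trending_boost)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [query for query, _ in ranked[:limit]]
```

Because the trending set is small (top-100), this merge adds negligible latency on top of the Trie lookup.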
Frontend Optimization
Reduce backend requests:
- Debounce: only send a request after 200ms of typing inactivity — avoids sending one request per keystroke
- Cancel previous: abort in-flight requests when user types more (fetch AbortController)
- Browser cache: cache responses keyed by prefix. “se”, “sea”, “sear” each cached independently. Common prefixes are re-used across sessions.
- Prefetch: when user focuses the search bar, pre-fetch top 10 global suggestions (empty prefix) — these show as soon as user starts typing.
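In the browser, debouncing is a few lines of clearTimeout/setTimeout in JavaScript; the same cancel-and-reschedule pattern is sketched here in Python (for consistency with the other examples) using threading.Timer:

```python
import threading
import time

class Debouncer:
    """Invoke `callback` only after `delay` seconds of inactivity.

    Each new call cancels the pending timer and schedules a fresh one,
    so a burst of keystrokes produces a single trailing callback.
    """
    def __init__(self, delay, callback):
        self.delay = delay
        self.callback = callback
        self._timer = None

    def call(self, *args):
        if self._timer is not None:
            self._timer.cancel()  # drop the previously scheduled request
        self._timer = threading.Timer(self.delay, self.callback, args)
        self._timer.start()

# simulate a user typing "search" one character at a time
sent = []
d = Debouncer(0.25, sent.append)
for prefix in ["s", "se", "sea", "sear", "search"]:
    d.call(prefix)      # each keystroke resets the timer
    time.sleep(0.05)    # typing faster than the debounce delay
time.sleep(0.5)         # user stops; only the final prefix fires
```

Only one request (for "search") survives the burst — the other five are cancelled before they ever leave the client.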
API Design
GET /suggest?q=sea&limit=10&locale=en_US&session_id=abc123
Response:
{
  "suggestions": [
    { "query": "seattle weather", "score": 0.98 },
    { "query": "search engine", "score": 0.87 },
    ...
  ],
  "latency_ms": 42
}
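Server-side, the endpoint reduces to a prefix lookup plus response shaping. A hypothetical handler sketch — `handle_suggest` and its parameters are illustrative, and a plain dict stands in for the Redis-backed prefix table:

```python
import time

def handle_suggest(store, q, limit=10):
    """Hypothetical backend for GET /suggest?q=...&limit=...

    `store` is any prefix -> [query, ...] mapping, already ranked.
    The handler trims to `limit` and reports its own serving latency.
    """
    start = time.perf_counter()
    suggestions = store.get(q, [])[:limit]
    return {
        "suggestions": [{"query": s} for s in suggestions],
        "latency_ms": round((time.perf_counter() - start) * 1000, 1),
    }
```

Note what the handler does not do: no subtree traversal, no ranking at request time — all of that happened in the offline build, which is what makes the <100ms budget achievable.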
Interview Tips
- The Trie with cached top-K at each node is the key optimization — distinguish it from a naive Trie that requires subtree traversal on every lookup.
- Explain the update pipeline clearly: offline rebuild + separate trending overlay for freshness.
- Frontend optimizations (debouncing, browser caching) are often overlooked but show product thinking.
- If asked about personalization: the simplest approach is to fetch both global Trie suggestions and a user’s recent query history matching the prefix, then rank and merge.
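That last tip — merge global suggestions with the user's own matching history — can be sketched directly. The 2.0x personal boost and the score scales are illustrative assumptions, not calibrated values:

```python
def personalized_suggest(prefix, global_suggestions, user_history,
                         limit=10, personal_boost=2.0):
    """Blend global (query, score) suggestions with this user's history.

    user_history maps the user's recent queries to their counts; matches
    on the prefix get an additive boost so personal habits outrank
    globally popular alternatives.
    """
    scores = {}
    for query, score in global_suggestions:
        scores[query] = score
    for query, freq in user_history.items():
        if query.startswith(prefix):
            scores[query] = scores.get(query, 0.0) + personal_boost * freq
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [query for query, _ in ranked[:limit]]
```

For the developer from the ranking section, "py" now surfaces "python tutorial" above the globally popular completions.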
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How does a Trie with cached top-K queries enable O(1) autocomplete lookup?",
      "acceptedAnswer": { "@type": "Answer", "text": "A naive Trie requires traversing the entire subtree rooted at the prefix node to find all completions — O(N) where N is the subtree size. The optimization: precompute and cache the top-K (e.g., 10) most popular queries at every Trie node. When a user types \"sea\", traverse three nodes (s → e → a) and directly return the cached top-10 queries for that node. Lookup becomes O(prefix_length) instead of O(subtree_size). The trade-off: every time query frequencies change, the cache at every ancestor node must be updated. For a static Trie (rebuilt weekly), this is done offline during the build phase — propagate frequencies up the tree and compute top-K at each node. For dynamic updates, use approximate methods (decay the cache slowly rather than recomputing exactly on every query frequency change)." }
    },
    {
      "@type": "Question",
      "name": "How do you keep search suggestions fresh for trending queries?",
      "acceptedAnswer": { "@type": "Answer", "text": "The main Trie is rebuilt weekly or daily from aggregated query logs — it captures long-term popularity but misses sudden trends. To handle trending queries (appearing within minutes): maintain a separate \"trending\" data structure updated every 5 minutes. A Flink streaming job processes real-time search events and computes trending query scores using a sliding window (e.g., queries with 10x their 7-day average frequency in the last 15 minutes). Store the top-100 trending queries in Redis. At query time: fetch both Trie suggestions and trending queries matching the prefix, then merge and re-rank. Trending queries get a temporal boost in ranking score. This two-tier approach gives you both long-term popularity accuracy (from the batch-built Trie) and freshness (from the real-time trending overlay)." }
    },
    {
      "@type": "Question",
      "name": "How does debouncing reduce backend load for a typeahead system?",
      "acceptedAnswer": { "@type": "Answer", "text": "Without debouncing, every keystroke triggers a backend request. Typing \"search\" generates 6 requests: \"s\", \"se\", \"sea\", \"sear\", \"searc\", \"search\". Most of these are wasted — the user hasn't finished typing. Debouncing: wait N milliseconds (typically 100–200ms) after the last keystroke before sending a request. If the user types quickly, only the request for \"search\" is sent. This reduces request volume by 60–80% for typical typing speeds. Implementation: clear the previous setTimeout on each keystroke, set a new one. Also cancel in-flight requests when the user types more (using AbortController in browsers) — the response for \"sea\" might arrive after \"sear\" is already typed, causing stale suggestions to flash. Combined with browser-side response caching (store prefix → suggestions in a Map), repeated prefixes are served instantly without any network round trip." }
    }
  ]
}