Requirements
A typeahead service returns query suggestions as the user types. Google Search autocomplete, Twitter’s hashtag completion, and Stripe’s dashboard search are examples. Requirements: return the top 5-10 suggestions for a given prefix within 100ms; suggestions should be ranked by a combination of popularity and personalization; the system should update to reflect trending queries within minutes. At Google scale: 5 billion searches per day, 10+ keystrokes per query = 50 billion autocomplete requests per day. Even at medium scale (50M daily active users), this is millions of requests per second at typing speed.
Data Structure: Trie with Top-K Precomputation
A trie maps prefixes to matching completions. Naive trie: each leaf is a query string with a frequency count. Finding top-K for a prefix requires DFS over the entire subtree — O(subtree size). Too slow for the serving path. Optimization: at each trie node, precompute and store the top K most popular completions passing through that node. Serving is then O(prefix length) — just traverse to the prefix node and return its stored top-K list. Tradeoff: inserting or updating a query’s frequency requires updating the top-K list at every ancestor node — O(L * K) per update, where L = query length. This is acceptable because updates happen asynchronously (not in the serving path). Trie is built offline and served from memory. In-memory trie for 10M unique queries * avg 30 chars * ~40 bytes per node: ~12GB — fits in a large server or is partitioned across multiple.
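The precomputed top-K idea can be sketched in Python. This is a minimal in-memory version: K = 5 and the sorted-list update per node are illustrative choices, not a production layout.

```python
K = 5  # suggestions stored per node (text says top 5-10; 5 is an assumption)

class TrieNode:
    def __init__(self):
        self.children = {}  # char -> TrieNode
        self.top_k = []     # (count, query) pairs, sorted descending by count

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, query, count):
        # O(L * K) update: refresh the stored top-K list at every node
        # along the query's path (every ancestor of the leaf).
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            entries = [e for e in node.top_k if e[1] != query]
            entries.append((count, query))
            entries.sort(reverse=True)
            node.top_k = entries[:K]

    def suggest(self, prefix):
        # O(prefix length) serving path: walk to the prefix node and
        # return its precomputed list -- no subtree DFS.
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        return [q for _, q in node.top_k]

t = Trie()
t.insert("apple", 100)
t.insert("application", 80)
t.insert("apply", 60)
t.suggest("app")
```

The expensive part is `insert`, which is fine because updates run asynchronously in the index-building path, never during serving.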
Architecture
Two paths: serving (real-time, read-only) and index building (async, write). Serving path: client types a character → request hits the autocomplete API → API queries the in-memory trie service → returns top-K suggestions in < 5ms. Client-side debouncing: send requests at most every 100ms (not on every keystroke). Client-side caching: cache results per prefix for 60 seconds. These two optimizations reduce actual server requests by 80-90%. Index building path: query logs → batch aggregation job (Spark, hourly) → recompute prefix → top-K mapping → serialize new trie → atomic swap (write new trie to a new memory region, then swap the pointer). Zero-downtime updates: the old trie continues serving until the swap. Personalization: blend global popularity (80% weight) with the user's past queries (20% weight) to rerank the global top-K before returning to the client. Personalization happens at the API layer, not in the trie itself.
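The 80/20 personalization blend at the API layer might look like the following sketch. The 80/20 split is from the text; the min-max normalization of scores is an assumption made so the two signals are comparable.

```python
def rerank(global_top_k, user_history_counts,
           global_weight=0.8, user_weight=0.2):
    """Rerank the global top-K by blending global popularity with the
    user's own query history. global_top_k: list of (query, global_score);
    user_history_counts: dict of query -> count for this user."""
    max_global = max(s for _, s in global_top_k) or 1
    max_user = max(user_history_counts.values(), default=1) or 1

    def score(item):
        query, g = item
        # Normalize both signals to [0, 1] before weighting (assumption).
        u = user_history_counts.get(query, 0)
        return global_weight * (g / max_global) + user_weight * (u / max_user)

    return [q for q, _ in sorted(global_top_k, key=score, reverse=True)]
```

For example, a user who has repeatedly searched "apply" sees it promoted above a globally more popular suggestion, while the global leader stays on top.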
Trie Partitioning and Scale
A single trie server can handle hundreds of thousands of QPS (reads from memory are fast). For Google scale: partition the trie by prefix range. Shard by the first character or the first two characters (26 or 676 shards). Each shard handles queries starting with its assigned prefixes. Query routing: the API layer routes based on the query prefix; routing is deterministic, so a prefix like “ap” always goes to the same shard. (This is static range partitioning — consistent hashing over prefixes also works and eases rebalancing.) Replication: each shard has 3 replicas (active-active). Reads are load-balanced; writes go to all replicas. Hotspot: common prefixes like “a”, “b”, “the” are extremely hot. Solutions: assign more servers to hot prefix ranges, or add a result cache (Redis) in front of the trie for the top 10,000 most frequent prefixes (covers ~60% of traffic, per Zipf’s law).
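Two-character prefix sharding reduces to a deterministic routing function. A sketch, assuming lowercase a-z queries; padding single-character prefixes is an illustrative choice, not from the text.

```python
NUM_SHARDS = 26 * 26  # two-character sharding: 676 shards, as in the text

def shard_for(prefix):
    """Deterministic routing: the same prefix always maps to the same shard,
    and a longer query lands on the shard owning its two-character prefix."""
    # Pad single-character prefixes with 'a' (assumption for illustration).
    key = (prefix.lower() + "a")[:2]
    return (ord(key[0]) - ord("a")) * 26 + (ord(key[1]) - ord("a"))
```

Because `shard_for("apple") == shard_for("ap")`, the API layer can route every keystroke of a query to one shard, keeping per-shard caches effective.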
Trending Query Updates
A query that goes viral (breaking news, a meme) should appear in autocomplete within minutes, not hours. Real-time pipeline: query stream → Kafka → Flink sliding window (count queries per prefix per 5-minute window) → merge with historical counts → update trie. Sliding window merging: new_count = historical_count * decay_factor + realtime_count. Decay factor (e.g., 0.95 per hour) ensures recent queries get higher weight than old ones. Trie update is still async (not in serving path) but now refreshes every 5 minutes instead of hourly. Safety: new queries don’t appear until they’ve accumulated enough frequency (minimum count threshold) to avoid surfacing rare/offensive one-off queries. Blocklist: maintain a blocklist of phrases that should never appear in autocomplete (profanity, dangerous content). Applied as a filter layer after trie lookup, before returning to the client.
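The sliding-window merge and the minimum-count safety threshold can be sketched as below. The decay formula and the 0.95-per-hour factor follow the text; MIN_COUNT = 50 is an illustrative value, since the text only says a threshold exists.

```python
MIN_COUNT = 50  # safety threshold (illustrative value, not from the text)

def merged_count(historical_count, realtime_count, hours_elapsed,
                 decay_per_hour=0.95):
    """new_count = historical_count * decay_factor + realtime_count,
    with the decay compounded per hour since the historical snapshot."""
    new_count = historical_count * (decay_per_hour ** hours_elapsed) + realtime_count
    # Queries below the minimum count never surface, so rare or offensive
    # one-off queries cannot appear in suggestions.
    return new_count if new_count >= MIN_COUNT else 0.0
```

A query with 1,000 historical hits and 200 hits in the latest 5-minute window, one hour after the snapshot, scores 1000 * 0.95 + 200 = 1150; a brand-new query with a handful of hits stays below the threshold and is suppressed.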
Interview Tips
- Client-side caching and debouncing are essential — mention them early to show you understand the full system, not just the backend.
- Trie with precomputed top-K per node is the standard answer; know the tradeoff with naive DFS.
- Separate the serving path (read-only, in-memory, < 5ms) from the index-building path (async, batch or stream, minutes latency).
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How does client-side debouncing reduce autocomplete server load?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Without debouncing: every keystroke fires a request. Typing “search” generates 6 requests (s, se, sea, sear, searc, search). At 50M users typing simultaneously: hundreds of millions of requests per second. With debouncing: the client waits for a pause in typing (typically 100-200ms) before sending a request. Fast typists generate only 1-2 requests per search query instead of 6-10. Combined with client-side caching: if the user types “sea” and then backspaces to “se” (which was already fetched), the cached result is shown immediately without a server request. Implementation: use a debounce function that resets a timer on every keystroke. The request fires only when the timer expires without a new keystroke. In JavaScript: let timer = null; input.addEventListener(\"keyup\", () => { clearTimeout(timer); timer = setTimeout(fetchSuggestions, 150); }). This single optimization reduces autocomplete API traffic by 70-80% compared to naive per-keystroke requests."
      }
    },
    {
      "@type": "Question",
      "name": "How do you rank autocomplete suggestions to balance popularity and freshness?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Raw popularity (historical query count) favors established queries and lags behind trends. A query from 2 years ago with 10M searches may rank above a trending query with 100K searches this week. Time-decay ranking: score = historical_count * decay_factor + recent_count * (1 - decay_factor), where decay_factor (e.g., 0.7) weights historical data and (1 - decay_factor) weights recent data. Decay on historical counts: multiply historical_count by 0.95 per week — gradually reduces the weight of old queries. Trending boost: queries whose frequency grew > 2x in the past 24 hours relative to the previous 7-day average get a trending multiplier (e.g., 1.5x). Personalization reranking: after computing the global top-K, reorder based on the user’s past query history and click-through rates. User-specific terms should rank higher than global averages. Safe search filtering: apply safe search settings to filter age-restricted or sensitive completions. Different users see different ranked suggestions based on their preferences. A/B testing: test different ranking formulas to measure click-through rate on suggestions and adjust weights."
      }
    },
    {
      "@type": "Question",
      "name": "How do you handle multi-word or phrase autocomplete differently from single-word?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "A single-word trie works for completing individual tokens (“appl” → “apple”, “application”, “apply”). For phrases (“how to”, “best restaurants in”), the prefix space is much larger, and a plain trie must index the full phrase. Adaptations: (1) Phrase trie: index the entire phrase string in the trie. Works, but the trie is much larger. (2) N-gram model: store the most frequent (word1, word2) bigrams and trigrams. When the user types “how to”, look up frequent completions of “how to” as a unit. (3) Query segmentation: split the prefix at the last word boundary. Complete the last word using a word-level trie, then filter completions by the full context (preceding words + completed last word) to ensure the full phrase is popular. (4) Language model: use a neural language model (GPT-style) to generate completions that are contextually appropriate to the full prefix. Used by modern search engines for complex multi-word queries. For most interview purposes: the phrase trie or n-gram completion approach is sufficient. Neural approaches are mentioned as a production enhancement."
      }
    },
    {
      "@type": "Question",
      "name": "How would you design the autocomplete for an e-commerce product search?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "E-commerce autocomplete has additional complexity: products change, categories matter, and out-of-stock products should be deprioritized. Suggestions come from three sources: (1) Popular search queries: “iPhone 15 case”, “wireless headphones” — indexed in a query trie with frequency. (2) Product names: “Apple AirPods Pro”, “Sony WH-1000XM5” — indexed separately, with prefix match on the product title. (3) Categories and brands: “Apple”, “Nike”, “Headphones” — quick navigation shortcuts. Ranking: blend query popularity (past search behavior) + product inventory (in-stock products rank higher) + revenue (high-margin products slightly boosted). Result type annotations: show the result type in the dropdown (query suggestion vs product vs category). Real-time inventory updates: a product going out of stock is a business event that should remove the product from autocomplete suggestions within minutes. Update via a Kafka event → autocomplete index update → trie refresh. Personalization: for returning customers, boost brands and categories they’ve purchased from before. Implementation: Elasticsearch’s completion suggester handles most of this with configurable contexts and weights, which is production-ready faster than building a custom trie."
      }
    },
    {
      "@type": "Question",
      "name": "What is the time and space complexity of a trie vs a sorted list for autocomplete?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Trie prefix lookup: O(P) where P = prefix length. Returning the top-K stored at each node: O(1) if precomputed, O(subtree size) if not. Space: O(total characters across all words * branching factor). For a 26-character alphabet: each node has up to 26 child pointers. A trie with 1M words averaging 20 chars = up to 20M nodes * ~200 bytes per node = ~4GB worst case (actual usage is much lower due to prefix sharing). Sorted list with binary search: inserting a word is O(n) (shift). Binary search to find the first match: O(log n). Finding all K matches starting from that position: O(K). Space: O(total string lengths) — more memory-efficient than a trie for small datasets. Comparison: for large datasets with many shared prefixes (common in natural language), the trie has better lookup time and uses space proportional to unique characters, not unique words. For small datasets (< 100K words), a sorted list is simpler and may be more practical. For autocomplete at scale: a trie with precomputed top-K per node is standard. The precomputation tradeoff: O(1) lookup at query time, O(L * K) update time — acceptable for read-heavy autocomplete where updates are batched."
      }
    }
  ]
}