What Is a Typeahead System?
A typeahead (search autocomplete) system returns a ranked list of query completions as the user types each character. Google Search, Amazon search, and YouTube search all implement this. At scale it must return suggestions in under 100ms for every keystroke from hundreds of millions of users.
Functional Requirements
- Return top-5 completions for a given prefix within 100ms
- Suggestions ranked by historical query frequency
- Support for new trending queries (updated within minutes to hours)
- Prefix length up to 20 characters; support all Unicode characters
Data Structure: Trie
A trie (prefix tree) stores strings character by character. Each node represents a character; the path from root to a node spells a prefix. At each node, store the top-K completions for that prefix — precomputed and cached at node creation. This gives O(L) lookup time where L is the query length, independent of the number of stored queries.
class TrieNode:
    def __init__(self):
        self.children = {}     # char -> TrieNode
        self.top_queries = []  # list of (query, frequency) tuples, max 5

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def search(self, prefix: str) -> list:
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        return node.top_queries
At Google scale the trie is too large for one machine. Shard the trie by first two characters: prefix starting with “ab” goes to shard 1, “ac” to shard 2, etc. Each shard fits comfortably in memory.
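The two-character sharding scheme can be sketched as a simple routing function; the shard count and the padding rule for one-character prefixes are illustrative assumptions, not figures from a production system:

```python
NUM_SHARDS = 26 * 26  # one bucket per two-letter prefix (illustrative)

def shard_for_prefix(prefix: str) -> int:
    """Route a prefix to the trie shard that owns its first two characters."""
    key = (prefix + "  ")[:2].lower()  # pad prefixes shorter than 2 chars
    a = ord(key[0]) - ord("a") if key[0].isalpha() else 0
    b = ord(key[1]) - ord("a") if key[1].isalpha() else 0
    return (a * 26 + b) % NUM_SHARDS
```

Because routing depends only on the first two characters, "ab" and "abacus" always land on the same shard, so each shard owns a complete subtree.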
Caching With Redis
Trie lookups are fast but the trie service is stateful and expensive to scale. Layer Redis in front: cache the top suggestions for every popular prefix. Cache hit rate for common prefixes (“th”, “the”, “how”) approaches 99%. Use a sorted set to store suggestions by score (query frequency).
# Cache key: "autocomplete:{prefix}"
# Value: Redis sorted set of {query: frequency}

def get_suggestions(prefix: str) -> list:
    cache_key = f"autocomplete:{prefix}"
    # ZREVRANGE 0..4 returns the top-5 members by score, highest first
    cached = redis.zrevrange(cache_key, 0, 4, withscores=True)
    if cached:
        return cached
    # Cache miss: query the trie service and backfill the cache
    results = trie_service.search(prefix)
    for query, freq in results:
        redis.zadd(cache_key, {query: freq})
    redis.expire(cache_key, 3600)
    return results
Ranking Algorithm
Basic ranking uses raw query frequency: queries searched more often rank higher. Advanced ranking incorporates: (1) Recency — exponential decay so trending queries rise quickly. Score = frequency * e^(-lambda * days_old). (2) Personalization — if the user previously searched “python tutorials”, python-related completions get a boost. (3) Geographic relevance — “pizza” in Chicago completes differently than in rural areas. (4) Query diversity — avoid returning 5 nearly identical completions.
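The recency-decay formula above can be sketched directly; the decay constant lambda is a tuning knob, and the value here is chosen only for illustration:

```python
import math

def decayed_score(frequency: int, days_old: float, decay_lambda: float = 0.1) -> float:
    """Score = frequency * e^(-lambda * days_old).

    Higher decay_lambda makes older queries fade faster; 0.1 is an
    illustrative default, not a production-tuned value."""
    return frequency * math.exp(-decay_lambda * days_old)
```

With lambda = 0.1, a day-old query with 6,000 hits outscores a week-old query with 10,000 hits, which is exactly the behavior that lets trending queries rise quickly.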
Data Pipeline: Keeping Suggestions Fresh
New queries enter the system continuously. The pipeline: (1) Every search request is logged to Kafka. (2) A Spark Streaming job aggregates query counts in 1-hour windows. (3) Every hour, a trie builder job re-ingests the updated frequency table and rebuilds affected trie nodes. (4) The new trie version is deployed with blue-green switching — all traffic shifts atomically to the new trie, no downtime. For truly real-time trends (breaking news), a separate hot-query service tracks last-5-minute query spikes and injects them into suggestions without rebuilding the full trie.
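A minimal sketch of the hot-query spike detector, assuming an in-memory sliding window of event timestamps per query; the window length and spike threshold are illustrative values:

```python
import time
from collections import deque

class HotQueryTracker:
    """Flags queries whose count in a sliding window crosses a threshold."""

    def __init__(self, window_seconds: int = 300, spike_threshold: int = 1000):
        self.window_seconds = window_seconds
        self.spike_threshold = spike_threshold
        self.events = {}  # query -> deque of event timestamps

    def record(self, query, now=None):
        """Record one occurrence; return True if the query is spiking."""
        now = time.time() if now is None else now
        q = self.events.setdefault(query, deque())
        q.append(now)
        # Evict timestamps that fell out of the sliding window.
        while q and q[0] < now - self.window_seconds:
            q.popleft()
        return len(q) >= self.spike_threshold
```

A query flagged by `record` would be injected into the suggestion lists for its prefixes immediately, bypassing the hourly rebuild.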
Client-Side Optimization
Two techniques reduce request volume from the client: (1) Debouncing — only send a request after the user has paused typing for 150ms. A user typing “sys” quickly does not need intermediate results for “s” and “sy”. (2) Prefix caching — the client caches every prefix response. When the user types one more character, the client first checks the local cache: results for “sys” are a superset of results for “syste”, so the cached shorter-prefix results can be filtered locally instead of hitting the server. Together these techniques reduce server requests by 30-40%.
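The local filtering step can be sketched as follows (shown in Python for consistency with the rest of the article, though a real client would implement it in JavaScript; the data is illustrative):

```python
def filter_cached(cached_results, new_prefix, k=5):
    """Serve a longer prefix from a cached shorter-prefix response:
    keep only completions that still match, preserving rank order."""
    return [(q, f) for q, f in cached_results if q.startswith(new_prefix)][:k]
```

If the filtered list falls below k results, the client falls back to a server request, since completions outside the cached top-5 may now qualify.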
Scale Estimation
Google Search: 8.5 billion searches/day ≈ 100k RPS. Each search triggers ~5 autocomplete requests (5 keystrokes before submitting) ≈ 500k RPS to autocomplete. With a 99% cache hit rate in Redis, only ~5,000 RPS reach the trie service; at 50k RPS capacity per trie server, a single server handles the trie load easily. The bottleneck is Redis throughput — use a Redis cluster with read replicas to handle 500k RPS. Each Redis node handles ~100k GET/s, so about 5 read replicas per primary are sufficient.
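The arithmetic behind these estimates can be checked in a few lines; all inputs are the assumed figures above, not measurements:

```python
# Back-of-envelope check of the capacity estimates above.
SEARCHES_PER_DAY = 8.5e9
KEYSTROKES_PER_SEARCH = 5
CACHE_HIT_RATE = 0.99
REDIS_GETS_PER_NODE = 100_000

search_rps = SEARCHES_PER_DAY / 86_400                  # ~98k searches/sec
autocomplete_rps = search_rps * KEYSTROKES_PER_SEARCH   # ~492k RPS
trie_rps = autocomplete_rps * (1 - CACHE_HIT_RATE)      # ~4.9k RPS past the cache
redis_replicas = autocomplete_rps / REDIS_GETS_PER_NODE # ~5 nodes of read capacity
```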
Interview Tips
- Start by clarifying: prefix-only or substring match? Fuzzy matching? Multi-language?
- The trie data structure is expected — explain top-K precomputation at each node
- Always discuss the update pipeline — interviewers want to know how fresh data flows in
- Client debouncing is a strong signal that you think end-to-end, not just backend
- If asked to handle typos: n-gram matching or edit-distance based fallback
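A minimal sketch of the edit-distance fallback mentioned in the last tip, assuming a small set of known prefixes to match against (names and the distance cutoff are illustrative):

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via a single-row dynamic-programming table."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,           # delete from a
                        dp[j - 1] + 1,       # insert into a
                        prev + (ca != cb))   # substitute (free if chars match)
            prev = cur
    return dp[-1]

def typo_fallback(prefix, known_prefixes, max_dist=1):
    """Return the closest known prefix within max_dist edits, else None."""
    best = min(known_prefixes, key=lambda p: edit_distance(prefix, p), default=None)
    return best if best is not None and edit_distance(prefix, best) <= max_dist else None
```

In practice this fallback would only run on a cache-and-trie miss, since edit-distance scans are far more expensive than a prefix lookup.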
Frequently Asked Questions
How does a typeahead search system work at Google scale?
At Google scale, the typeahead system uses a sharded trie where each shard handles a subset of prefixes (sharded by first two characters). Each trie node stores precomputed top-K completions so lookups are O(L) where L is the prefix length. A Redis caching layer sits in front of the trie service — popular prefixes like "the" or "how" are served entirely from cache with 99%+ hit rate. The trie is rebuilt hourly from frequency tables produced by a streaming job (Spark Streaming or Flink) that aggregates query logs from Kafka. For real-time trending queries (breaking news), a hot-query service separately injects recent high-velocity queries into suggestions. Client-side debouncing (150ms pause) and prefix result caching reduce server requests by 30-40%.
What data structure should you use for search autocomplete?
A trie (prefix tree) is the standard data structure for autocomplete. Each node represents a character; paths from root to nodes spell out prefixes. The key optimization is precomputing the top-K completions at every node — this makes suggestions a single O(L) lookup rather than a subtree traversal. For systems that cannot fit the full trie in memory, alternative approaches include: storing only the top-10K queries in a hash map of prefix->suggestions (works for most traffic), or using a database with a prefix index (WHERE query LIKE "prefix%") behind aggressive caching. Redis sorted sets work well for smaller suggestion sets — store each prefix as a key with suggestions scored by frequency.
How do you keep autocomplete suggestions fresh and up to date?
Query logs flow into Kafka as users search. A stream processing job (Flink or Spark Streaming) aggregates query counts in sliding 1-hour windows. Every hour, an offline job reads the updated frequency table and rebuilds the affected trie nodes, then atomically swaps the old trie version for the new one using blue-green deployment — all traffic shifts to the new version without downtime. For real-time trends that cannot wait an hour, a separate hot-query service monitors query velocity (queries per minute) in a 5-minute window. When a query crosses a spike threshold, it is immediately injected into the top suggestions for matching prefixes, bypassing the hourly rebuild cycle. This hybrid approach gives both freshness and correctness.