System Design Interview: Design Google Search Autocomplete (Typeahead)
Search autocomplete (typeahead) is one of the most common system design questions, asked at Google, Amazon, LinkedIn, Twitter, and virtually every company with a search feature. It tests your understanding of trie data structures, caching, distributed systems, and latency optimization.
Requirements Clarification
Functional Requirements
- As the user types a query prefix, return the top 5-10 autocomplete suggestions
- Suggestions ranked by search frequency / trending score
- Support for multiple languages and special characters
- Results updated based on real search data (trending queries appear within hours)
- Filter harmful/inappropriate suggestions
Non-Functional Requirements
- Latency: P99 under 100ms (autocomplete feels instant)
- Scale: Google Search — 8.5B searches/day, 5M QPS for autocomplete
- Each keystroke generates a request — high QPS relative to search traffic
- High availability: 99.99%
- Freshness: trending queries appear in suggestions within 1-2 hours
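A quick back-of-envelope check of these numbers. The keystrokes-per-query and peak factors below are illustrative assumptions, not measured figures:

```python
# Sanity-check the stated scale (assumed inputs, not measured data)
SEARCHES_PER_DAY = 8.5e9
KEYSTROKES_PER_QUERY = 20   # assumption: ~20 characters typed per search
PEAK_FACTOR = 2.5           # assumption: peak traffic vs daily average

avg_qps = SEARCHES_PER_DAY * KEYSTROKES_PER_QUERY / 86_400
peak_qps = avg_qps * PEAK_FACTOR

print(f"average autocomplete QPS: {avg_qps / 1e6:.1f}M")   # ~2.0M
print(f"peak autocomplete QPS:    {peak_qps / 1e6:.1f}M")  # ~4.9M
```

This is roughly where the 5M QPS figure comes from: each search fans out into many keystroke-level requests, so autocomplete traffic dwarfs search traffic itself.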
High-Level Architecture
Client (browser/app)
↓ (debounced: 100-300ms after keystroke)
CDN / API Gateway
↓ (cache hit: 80% served from edge)
Autocomplete Service
↓ ↓
Trie Store (Redis) Elasticsearch
↓
Aggregation Pipeline
(Kafka → Spark/Flink → Trie builder)
Core Data Structure: Trie
import heapq
from collections import defaultdict  # used by the trie builder below

class TrieNode:
    def __init__(self):
        self.children: dict[str, 'TrieNode'] = {}
        self.is_end: bool = False
        self.frequency: int = 0
        # Optimization: cache top-k suggestions at each node
        # Avoids DFS on every query — O(1) lookup vs O(prefix_subtree)
        self.top_suggestions: list[tuple[int, str]] = []  # [(freq, word)]

class Trie:
    def __init__(self, top_k: int = 10):
        self.root = TrieNode()
        self.top_k = top_k

    def insert(self, word: str, frequency: int):
        """Insert word with frequency, updating the cached top-k at each prefix node"""
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
            # Update top-k suggestions cached at this node
            self._update_top_k(node, word, frequency)
        node.is_end = True
        node.frequency = frequency

    def _update_top_k(self, node: TrieNode, word: str, frequency: int):
        """Maintain the top-k list at each node"""
        # Remove the existing entry for this word if present
        node.top_suggestions = [
            (f, w) for f, w in node.top_suggestions if w != word
        ]
        node.top_suggestions.append((frequency, word))
        # Keep only the k highest-frequency entries
        node.top_suggestions = heapq.nlargest(
            self.top_k, node.top_suggestions, key=lambda x: x[0]
        )

    def search(self, prefix: str) -> list[str]:
        """Return top-k suggestions for prefix in O(len(prefix))"""
        node = self.root
        for char in prefix:
            if char not in node.children:
                return []  # No suggestions for this prefix
            node = node.children[char]
        # Return the pre-computed top-k sorted by frequency descending
        suggestions = sorted(node.top_suggestions, key=lambda x: -x[0])
        return [word for _, word in suggestions]

    def get_top_k_from_subtree(self, node: TrieNode) -> list[tuple[int, str]]:
        """DFS to collect all words in a subtree (used when building the index, not when serving)"""
        results = []
        stack = [(node, "")]
        while stack:
            curr, prefix = stack.pop()
            if curr.is_end:
                results.append((curr.frequency, prefix))
            for char, child in curr.children.items():
                stack.append((child, prefix + char))
        return heapq.nlargest(self.top_k, results, key=lambda x: x[0])
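To make the behavior concrete, here is a condensed, self-contained version of the same top-k-caching trie with a small demo (the words and frequencies are made up):

```python
import heapq

class TrieNode:
    def __init__(self):
        self.children = {}
        self.top = []  # cached [(freq, word)] for this prefix

class Trie:
    def __init__(self, top_k=10):
        self.root = TrieNode()
        self.top_k = top_k

    def insert(self, word, freq):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
            # Drop any stale entry for this word, then re-rank
            node.top = [(f, w) for f, w in node.top if w != word]
            node.top.append((freq, word))
            node.top = heapq.nlargest(self.top_k, node.top)

    def search(self, prefix):
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        return [w for _, w in sorted(node.top, reverse=True)]

trie = Trie(top_k=3)
for word, freq in [("python", 900), ("pycharm", 300),
                   ("pytest", 500), ("pandas", 700)]:
    trie.insert(word, freq)

print(trie.search("py"))   # ['python', 'pytest', 'pycharm']
print(trie.search("pyt"))  # ['python', 'pytest']
print(trie.search("zz"))   # []
```

Note that `search` never walks the subtree: the ranked list is already sitting at the prefix node, paid for at insert time.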
Distributed Trie with Redis
import redis
import json

class RedisTrieStore:
    """
    Store trie prefixes as Redis string values.
    Key pattern: trie:{prefix} -> {"suggestions": [...]}
    For 5M QPS: Redis Cluster with consistent hashing by prefix hash.
    Each prefix maps to a shard deterministically.
    """
    def __init__(self, redis_cluster):
        self.redis = redis_cluster
        self.ttl = 3600  # 1 hour TTL — refreshed by the aggregation pipeline

    @staticmethod
    def _key(prefix: str) -> str:
        # Normalize the same way on reads and writes
        return f"trie:{prefix.lower().strip()}"

    def get_suggestions(self, prefix: str, top_k: int = 10) -> list[str]:
        """O(1) Redis GET — all computation done at build time"""
        cached = self.redis.get(self._key(prefix))
        if cached:
            data = json.loads(cached)
            return data['suggestions'][:top_k]
        return []

    def update_suggestions(self, prefix: str, suggestions: list[str]):
        """Called by the aggregation pipeline when rebuilding the trie"""
        self.redis.setex(self._key(prefix), self.ttl, json.dumps({
            'suggestions': suggestions,
        }))

    def bulk_update(self, prefix_suggestions: dict[str, list[str]]):
        """Bulk update for trie rebuild — pipelined for efficiency"""
        pipe = self.redis.pipeline()
        for prefix, suggestions in prefix_suggestions.items():
            pipe.setex(self._key(prefix), self.ttl,
                       json.dumps({'suggestions': suggestions}))
        pipe.execute()
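The "each prefix maps to a shard deterministically" claim can be sketched as a hash-based router. The shard count and the use of md5 here are illustrative assumptions; a real Redis Cluster uses its own slot hashing (CRC16 over the key):

```python
import hashlib

NUM_SHARDS = 16  # illustrative shard count

def shard_for_prefix(prefix: str) -> int:
    # md5 rather than Python's built-in hash(): built-in hash is salted
    # per process, so it would route differently on each server.
    digest = hashlib.md5(prefix.lower().strip().encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# The same (normalized) prefix always routes to the same shard:
assert shard_for_prefix("py") == shard_for_prefix("PY ")
```

Deterministic routing is what lets every app server find a prefix without any lookup table, and it keeps all reads for a hot prefix on one shard, which is exactly why the hot-prefix problem discussed later needs a local cache in front.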
Aggregation Pipeline: Making Suggestions Fresh
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.window import TumblingEventTimeWindows
from pyflink.common.time import Time

# Simplified Flink job for real-time query frequency aggregation
class SearchQueryAggregator:
    """
    Reads search queries from Kafka.
    Aggregates frequency with 1-hour tumbling windows.
    Outputs per-query counts to the trie builder.
    """
    def process_queries(self, kafka_consumer):
        env = StreamExecutionEnvironment.get_execution_environment()
        stream = env.add_source(kafka_consumer)  # SearchEvent{query, timestamp, user_id}
        # Filter: deduplicate (one count per user per query per session)
        # Aggregate: count per query in 1-hour windows
        # Output: (query, count) per window
        aggregated = (stream
            .map(lambda e: (e.query, 1))  # one count per event
            .key_by(lambda pair: pair[0])
            .window(TumblingEventTimeWindows.of(Time.hours(1)))
            .reduce(lambda a, b: (a[0], a[1] + b[1]))
        )
        return aggregated
class TrieBuilder:
    """
    Receives aggregated counts, rebuilds the prefix→suggestions mapping.
    Runs hourly; a full rebuild takes ~10 minutes.
    """
    def rebuild(self, query_counts: list[tuple[str, int]],
                redis_store: RedisTrieStore, top_k: int = 10):
        """
        For each query, generate all prefixes and update suggestions.
        query_counts: [(query_string, frequency)]
        """
        # Build prefix → candidate (count, query) pairs
        prefix_to_candidates: dict[str, list[tuple[int, str]]] = defaultdict(list)
        for query, count in query_counts:
            # Generate all prefixes of this query
            for length in range(1, len(query) + 1):
                prefix = query[:length]
                prefix_to_candidates[prefix].append((count, query))
        # For each prefix, keep only the top-k
        prefix_suggestions = {}
        for prefix, candidates in prefix_to_candidates.items():
            top = heapq.nlargest(top_k, candidates, key=lambda x: x[0])
            prefix_suggestions[prefix] = [query for _, query in top]
        # Bulk write to Redis
        redis_store.bulk_update(prefix_suggestions)
        print(f"Rebuilt trie: {len(prefix_suggestions)} prefixes updated")
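A runnable miniature of the rebuild step, with a plain dict standing in for the Redis store (the query counts are made up):

```python
import heapq
from collections import defaultdict

def build_prefix_suggestions(query_counts, top_k=3):
    """Expand each query into all of its prefixes, then keep top-k per prefix."""
    candidates = defaultdict(list)
    for query, count in query_counts:
        for length in range(1, len(query) + 1):
            candidates[query[:length]].append((count, query))
    return {
        prefix: [q for _, q in heapq.nlargest(top_k, cands)]
        for prefix, cands in candidates.items()
    }

store = build_prefix_suggestions(
    [("python", 900), ("pytest", 500), ("pycharm", 300), ("pandas", 700)]
)
print(store["py"])  # ['python', 'pytest', 'pycharm']
print(store["p"])   # ['python', 'pandas', 'pytest']
```

The cost is the total prefix count (sum of query lengths), which is why the full rebuild is a batch job rather than something done per request.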
API and Caching Layers
from flask import Flask, request, jsonify
import time

app = Flask(__name__)
redis_store = RedisTrieStore(redis_cluster)  # assumes a configured cluster client
local_cache = {}   # In-process cache for the hottest prefixes
LOCAL_CACHE_TTL = 60  # seconds

@app.route('/autocomplete')
def autocomplete():
    prefix = request.args.get('q', '').strip().lower()
    if not prefix or len(prefix) > 100:
        return jsonify({'suggestions': []})
    # Three-tier caching
    # Tier 1: local in-process cache (sub-millisecond)
    entry = local_cache.get(prefix)
    if entry and time.time() - entry['time'] < LOCAL_CACHE_TTL:
        return jsonify({'suggestions': entry['suggestions']})
    # Tier 2: Redis trie store (~1ms)
    suggestions = redis_store.get_suggestions(prefix)
    # Tier 3: Elasticsearch fallback for long-tail prefixes
    if not suggestions:
        suggestions = search_elasticsearch(prefix)
    local_cache[prefix] = {'time': time.time(), 'suggestions': suggestions}
    return jsonify({'suggestions': suggestions})

def search_elasticsearch(prefix: str, size: int = 10) -> list[str]:
    """
    Elasticsearch for long-tail queries missing from the Redis trie.
    Uses a prefix query on the 'query_text' field, sorted by popularity.
    """
    query = {
        "query": {
            "bool": {
                "must": [{"prefix": {"query_text": prefix}}],
            }
        },
        "sort": [{"frequency": {"order": "desc"}}],
        "size": size
    }
    results = es_client.search(index="queries", body=query)
    return [hit['_source']['query_text'] for hit in results['hits']['hits']]
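The `local_cache` above is a plain dict; the tier-1 cache an interviewer will expect is bounded, with LRU eviction and a TTL so stale suggestions age out. A stdlib-only sketch (sizes and TTL are illustrative):

```python
import time
from collections import OrderedDict

class TTLCache:
    """Bounded in-process cache: LRU eviction plus per-entry TTL."""
    def __init__(self, max_size=10_000, ttl_seconds=60):
        self.max_size = max_size
        self.ttl = ttl_seconds
        self._data = OrderedDict()  # prefix -> (expires_at, suggestions)

    def get(self, prefix):
        entry = self._data.get(prefix)
        if entry is None:
            return None
        expires_at, suggestions = entry
        if time.time() >= expires_at:
            del self._data[prefix]      # expired: drop and miss
            return None
        self._data.move_to_end(prefix)  # mark as most recently used
        return suggestions

    def put(self, prefix, suggestions):
        self._data[prefix] = (time.time() + self.ttl, suggestions)
        self._data.move_to_end(prefix)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict least recently used

cache = TTLCache(max_size=2, ttl_seconds=60)
cache.put("py", ["python", "pytest"])
cache.put("go", ["golang"])
cache.get("py")            # touch "py" so "go" becomes least recently used
cache.put("ja", ["java"])  # exceeds max_size -> evicts "go"
print(cache.get("py"), cache.get("go"))  # ['python', 'pytest'] None
```

A short TTL matters more here than capacity: the whole point of the hourly rebuild is freshness, and an unbounded, never-expiring tier-1 cache would quietly undo it.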
Client-Side Optimizations
// Debouncing: don't send a request on every keystroke
let debounceTimer = null;

function onInput(event) {
  clearTimeout(debounceTimer);
  debounceTimer = setTimeout(() => {
    const prefix = event.target.value;
    if (prefix.length >= 2) {  // min 2 chars before querying
      fetchSuggestions(prefix);
    }
  }, 150);  // 150ms debounce — feels instant, reduces requests by ~70%
}

// Cache on the client too (unbounded here; cap or LRU-evict in production)
const clientCache = new Map();

async function fetchSuggestions(prefix) {
  if (clientCache.has(prefix)) {
    showSuggestions(clientCache.get(prefix));
    return;
  }
  const response = await fetch('/autocomplete?q=' + encodeURIComponent(prefix));
  const data = await response.json();
  clientCache.set(prefix, data.suggestions);
  // Prefetch: when the user types "py", prefetch "pyt", "pyw", etc.
  // Not practical for arbitrary prefixes but useful for common ones
  showSuggestions(data.suggestions);
}
Scale and Sharding
- Prefix space partitioning: Shard Redis by a hash of the prefix, so every lookup for a given prefix routes to one deterministic shard (all “go*” lookups → shard 1, “py*” → shard 2). Hashing rather than alphabetical ranges keeps load even; consistent hashing limits key movement when shards are added or removed.
- Hot prefix issue: “i”, “a”, “th” are queried billions of times/day. Handle with local in-process caching — once cached in each server’s memory, Redis is bypassed entirely.
- Trie size: English top 10M queries × avg 5 chars prefix length = 50M prefix entries. Each entry with 10 suggestions × 30 bytes = ~15GB — fits in Redis cluster.
- CDN for autocomplete: Cache most popular prefixes at CDN edge. High cache hit rate for common prefixes (80%+ served from CDN, ~5ms latency vs 50ms from origin).
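The arithmetic behind the trie-size bullet, using the article's own assumed inputs:

```python
# Capacity estimate (all inputs are the article's assumptions:
# 10M queries, ~5 prefixes each, 10 suggestions x 30 bytes per prefix)
QUERIES = 10_000_000
PREFIXES_PER_QUERY = 5
SUGGESTIONS_PER_PREFIX = 10
BYTES_PER_SUGGESTION = 30

prefix_entries = QUERIES * PREFIXES_PER_QUERY
total_bytes = prefix_entries * SUGGESTIONS_PER_PREFIX * BYTES_PER_SUGGESTION
print(f"{prefix_entries // 10**6}M prefix entries, ~{total_bytes // 10**9}GB")
# 50M prefix entries, ~15GB
```

Key and JSON overhead per entry would roughly double this in practice, but it still fits comfortably in a modest Redis cluster.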
Interview Tips
- Start with trie data structure — explain O(prefix_length) lookup
- Key optimization: cache top-k at each node (avoid DFS per query)
- Discuss freshness vs latency trade-off: rebuild trie hourly (fresh) vs real-time (complex)
- Mention client-side debouncing — a 150ms debounce cuts request volume roughly 3x (~70% fewer requests)
- Three-tier caching: local cache → Redis → Elasticsearch
- For Google scale: note that at 5M QPS even Redis becomes a bottleneck → a local in-process cache is essential