Typeahead search (also called autocomplete) is one of the most latency-sensitive features in a modern product. Users expect suggestions to appear in under 100 ms, which means the system must be pre-computed, cached aggressively, and sharded carefully. This post walks through a production-grade low-level design.
Requirements
- Return top-K (default 5) ranked suggestions for a given prefix in <100 ms p99.
- Handle 10,000 QPS at peak with personalized and global suggestions.
- Support at least 500 million distinct phrases in the global index.
- Personalization: weight the user’s own recent queries 2× over global frequency.
- Minimum prefix length: 2 characters (shorter inputs return nothing).
Core Data Structure: Trie with Embedded Top-K
A plain trie lets you find all words sharing a prefix, but traversing all descendants to rank them is O(n) per query — too slow. Instead, each trie node caches its own top-K list, maintained at index-build time.
class TrieNode:
children: Map[char, TrieNode]
top_k: List[SearchSuggestion] # precomputed, sorted by score desc
# top_k is updated during batch rebuild, not on every query
Query time becomes O(prefix_length) — you walk the trie to the prefix node, then read its top_k list directly. No traversal of descendants needed.
Data Models
SearchSuggestion:
phrase: String # e.g. "machine learning"
score: Float # global frequency score (normalized 0–1)
type: Enum # GLOBAL | TRENDING | PERSONAL
UserQueryHistory:
user_id: UUID
phrase: String
queried_at: Timestamp
count: Int # how many times this user typed this phrase
API
GET /suggest?q={prefix}&limit={k}&user_id={uid}
Response 200:
[
{"phrase": "machine learning", "score": 0.92},
{"phrase": "machine learning interview", "score": 0.87},
...
]
The user_id is optional. When absent, only global results are returned.
Ranking Formula
final_score(phrase, user) =
global_frequency_score(phrase) # base: log(query_count) / log(max_count)
+ recency_boost(phrase) # +0.1 if phrase trended in last 24 h
+ personalization_boost(phrase, user) # user_query_count * 2 * personal_weight
Personal weight is typically 0.15 — enough to surface a user’s own history without completely overriding global popularity.
Caching Strategy
- Global prefix cache: Redis hash. Key =
suggest:{prefix}, value = serialized top-K list. TTL = 60 seconds. Warmed on trie rebuild. - Per-user personalized cache: Redis hash. Key =
suggest:{user_id}:{prefix}, TTL = 30 seconds. Written lazily on first personalized request for a prefix. - Fallback: if personalized cache misses, merge global top-K with user’s recent history in-process and cache the result.
Trie Update Schedule
Updating the trie on every query would cause write contention across hundreds of nodes. Instead:
- Hourly batch rebuild: Spark job reads query logs from the last 24 h, aggregates phrase frequencies, rebuilds the full trie, and pushes it to all trie servers atomically via a blue/green swap.
- 5-minute hot-phrase update: A lighter job re-scores the top 10,000 phrases by recent velocity and patches only those nodes, avoiding a full rebuild.
Sharding
A single trie for 500 M phrases won’t fit in one machine’s memory (~50 GB+). Shard by the first character of the prefix (26 shards for a–z, plus one for digits/other). The query router hashes the first character and routes to the correct shard. For hot letters (e.g., “s”), further split by the first two characters using consistent hashing.
Edge Cases
- Profanity filter: Deny-list applied at index-build time; flagged phrases never enter the trie.
- Trending terms: Injected into top-K lists out-of-band by a trend-detection service that monitors query velocity spikes.
- Cold start for new users: No personal history — serve global results only.
- Unicode / CJK: Tokenize by character rather than byte; each CJK character is a valid prefix unit.
Google system design interviews cover search autocomplete at scale. See common questions for Google interview: search autocomplete and typeahead system design.
LinkedIn system design covers typeahead and people search. Review design patterns for LinkedIn interview: typeahead search and people search system design.
Uber system design includes location autocomplete and search. See design patterns for Uber interview: location search and autocomplete system design.
See also: Coinbase Interview Guide