Question 1

How do you store autocomplete suggestions efficiently for fast prefix lookup?

Accepted Answer

Use a trie in which each node stores the top-K (typically 5-10) suggestions among all phrases that pass through it. On insertion, walk the phrase's path from the root and offer the phrase to every prefix node along the way; when a node's top-K list overflows, evict its lowest-scoring suggestion. A lookup for any prefix then needs only a single traversal to the prefix node - no subtree scan. Maintain the top-K at each node as a min-heap, so the eviction candidate is always at the top of the heap. Storage: for a vocabulary of 1M phrases with an average length of 8 characters, the trie has ~5M nodes; with a top-10 list at each node, total memory is manageable at ~500MB (about 10 bytes per stored suggestion, which implies compact phrase IDs rather than full strings). Shard across multiple servers by first letter, or by a consistent hash of the leading characters of the prefix so that a prefix and all of its extensions land on the same shard.
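A minimal sketch of the structure described above, assuming each phrase is inserted once with its final score (as in a batch build). The names TopKTrie, TrieNode, and suggest are illustrative and not taken from any particular library.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)
    # Min-heap of (score, phrase); index 0 is always the weakest entry.
    top_k: list = field(default_factory=list)


class TopKTrie:
    def __init__(self, k: int = 10):
        self.k = k
        self.root = TrieNode()

    def insert(self, phrase: str, score: float) -> None:
        """Offer the phrase to every prefix node on its path from the root.

        Assumption: a phrase is inserted at most once; re-inserting the
        same phrase would create duplicate heap entries.
        """
        node = self.root
        for ch in phrase:
            node = node.children.setdefault(ch, TrieNode())
            self._offer(node, phrase, score)

    def _offer(self, node: TrieNode, phrase: str, score: float) -> None:
        entry = (score, phrase)
        if len(node.top_k) < self.k:
            heapq.heappush(node.top_k, entry)
        elif entry > node.top_k[0]:
            # The list is full: evict the lowest-scoring suggestion.
            heapq.heapreplace(node.top_k, entry)

    def suggest(self, prefix: str) -> list:
        """Walk to the prefix node and return its top-K, best first."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return [phrase for _, phrase in sorted(node.top_k, reverse=True)]
```

The lookup cost depends only on the prefix length plus a sort of at most K entries, which is what makes the per-node top-K worth its memory overhead.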

Question 2

How do you rank autocomplete suggestions?

Accepted Answer

Ranking score = global_frequency_score + recency_boost + personalization_bonus. global_frequency_score: log(1 + query_count), normalized to the 0-1 range. recency_boost: a decay function over recent query counts (last 24h weighted 3x, last week 1.5x). personalization_bonus: if the user has typed or clicked this phrase before, add a fixed bonus (e.g., +0.3). Clamp the total score to [0, 1]. Store the global components (frequency and recency) in the trie nodes and rebuild them hourly from query logs with a batch job; the personalization bonus is per-user, so apply it at request time. For trending queries (a spike in the last hour of more than 5x baseline), apply a trending multiplier. Present personalized results first when available, and fall back to the global ranking otherwise.
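The formula can be sketched as follows. The answer does not say how the recency weights turn into a boost, so this sketch makes two loud assumptions: the weighted recent count is normalized the same way as the global score, and the boost is capped at RECENCY_CAP = 0.2. Both, along with all function names, are hypothetical.

```python
import math

PERSONALIZATION_BONUS = 0.3  # fixed bonus from the answer above
RECENCY_CAP = 0.2            # assumption: maximum contribution of recency


def global_frequency_score(query_count: int, max_query_count: int) -> float:
    """log(1 + count), normalized to [0, 1] by the most frequent phrase."""
    if max_query_count <= 0:
        return 0.0
    return math.log1p(query_count) / math.log1p(max_query_count)


def recency_boost(count_24h: int, count_7d: int, max_query_count: int) -> float:
    """Weight the last 24h at 3x and the rest of the last week at 1.5x.

    Assumption: count_7d includes count_24h, so the older part of the
    week is count_7d - count_24h.
    """
    if max_query_count <= 0:
        return 0.0
    weighted = 3.0 * count_24h + 1.5 * max(count_7d - count_24h, 0)
    normalized = math.log1p(weighted) / math.log1p(max_query_count)
    return RECENCY_CAP * min(1.0, normalized)


def ranking_score(query_count: int, count_24h: int, count_7d: int,
                  max_query_count: int, user_has_history: bool) -> float:
    total = global_frequency_score(query_count, max_query_count)
    total += recency_boost(count_24h, count_7d, max_query_count)
    if user_has_history:
        total += PERSONALIZATION_BONUS
    # Clamp the total score to [0, 1].
    return max(0.0, min(1.0, total))
```

The trending multiplier is left out here; it would be applied to the result before clamping.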

Question 3

How do you cache autocomplete responses at scale?

Accepted Answer

Two-level cache: (1) Global prefix cache: one Redis sorted set per prefix, key=suggest:{prefix}, members=phrases, scores=ranking score, TTL=60s. This covers 80%+ of traffic, since popular prefixes repeat constantly. (2) Per-user personalized cache: one Redis string per user and prefix, key=usersuggest:{user_id}:{prefix}, value=JSON list, TTL=5 minutes. Populate it only if the user has enough query history to make personalization meaningful. CDN-level caching is not practical for personalized results. For cold-start prefixes that are not in the cache, query the trie service directly and populate the cache with the result. Evict stale entries on trie rebuild.
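The lookup order can be sketched with an in-memory stand-in for Redis. TTLCache below is a hypothetical helper, not a Redis client; query_trie and personalize are assumed callables, and personalize is assumed to return None when the user has too little history. Only the key formats and TTLs come from the answer above.

```python
import time

GLOBAL_TTL_SECONDS = 60   # suggest:{prefix}
USER_TTL_SECONDS = 300    # usersuggest:{user_id}:{prefix}


class TTLCache:
    """In-memory stand-in for a Redis instance with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, self._clock() + ttl_seconds)


def get_suggestions(cache, prefix, user_id, query_trie, personalize):
    # Level 2 first: a personalized entry wins if one exists.
    if user_id is not None:
        hit = cache.get(f"usersuggest:{user_id}:{prefix}")
        if hit is not None:
            return hit

    # Level 1: the global prefix cache, filled from the trie on a miss.
    global_key = f"suggest:{prefix}"
    suggestions = cache.get(global_key)
    if suggestions is None:
        suggestions = query_trie(prefix)
        cache.set(global_key, suggestions, GLOBAL_TTL_SECONDS)

    # Cache a personalized list only when personalization is meaningful.
    if user_id is not None:
        personalized = personalize(user_id, suggestions)
        if personalized is not None:
            cache.set(f"usersuggest:{user_id}:{prefix}", personalized,
                      USER_TTL_SECONDS)
            return personalized
    return suggestions
```

Injecting the clock keeps expiry testable without sleeping; a real deployment would rely on Redis key expiry instead.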

Question 4

How do you update the trie with new queries in real time?

Accepted Answer

Do not update the trie on every query - at 10K QPS that would cause write contention. Instead, stream all search queries to Kafka. A consumer aggregates query counts over 5-minute windows and writes them to a QueryFrequency table. A scheduler runs a trie rebuild job every hour: it reads all phrases above a frequency threshold, computes updated scores, and builds a new trie in memory. The new trie is then atomically swapped with the live trie (a pointer swap; the old trie is garbage collected once in-flight requests finish). For truly trending queries (spike detection), a near-real-time path runs every 5 minutes on a hot-phrases subset and patches only the changed nodes rather than doing a full rebuild.
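A sketch of the aggregate, rebuild, and swap steps. For brevity a plain dict from prefix to top-K phrases stands in for the trie, and the names LiveIndex, aggregate, and build_index are hypothetical. The swap assumes CPython, where rebinding an attribute is a single reference assignment that readers see either before or after, never half-done.

```python
from collections import Counter


def aggregate(windows) -> Counter:
    """Sum per-window query counts (e.g. one Counter per 5-minute window)."""
    total = Counter()
    for window in windows:
        total.update(window)
    return total


def build_index(counts, min_count: int, k: int = 10) -> dict:
    """Build a fresh prefix -> top-K index from phrases above the threshold."""
    candidates = {}
    for phrase, count in counts.items():
        if count < min_count:
            continue  # below the frequency threshold: not indexed
        for end in range(1, len(phrase) + 1):
            candidates.setdefault(phrase[:end], []).append((count, phrase))
    return {
        prefix: [p for _, p in sorted(entries, reverse=True)[:k]]
        for prefix, entries in candidates.items()
    }


class LiveIndex:
    """Holds the index currently serving reads; never mutated in place."""

    def __init__(self, index: dict):
        self._index = index

    def lookup(self, prefix: str) -> list:
        # Take one reference so a concurrent swap cannot change the
        # index in the middle of this request.
        index = self._index
        return index.get(prefix, [])

    def swap(self, new_index: dict) -> None:
        # Pointer swap: the old index is garbage collected once no
        # in-flight request still holds a reference to it.
        self._index = new_index
```

Because readers never see a partially built structure, the rebuild job needs no locks on the read path.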

Question 5

How do you handle typeahead search at 10,000 QPS with sub-100ms latency?

Accepted Answer

Architecture: suggestion API servers behind a load balancer, with no per-request or per-user state on the servers. Each server holds the full trie shard for its assigned prefix range in memory, read-only between rebuilds. Request routing: the load balancer hashes the prefix to determine the shard. Response: trie lookup is O(L), where L is the prefix length - typically <10ms. The bottleneck is network + serialization. Optimizations: keep the trie in memory (not on disk), use a lightweight serialization format (MessagePack, not JSON) for internal service calls, and return only phrase strings and scores (no metadata). Deploy multiple replicas of each shard for redundancy. Total infrastructure: 8 shards x 3 replicas = 24 trie servers, each holding ~60MB of trie data (8 x 60MB = ~480MB of distinct trie data).
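Routing by prefix hash might look like the sketch below. It assumes the shard key is the first character of the prefix, so that a prefix and all of its extensions reach the same shard; the server naming scheme and the round-robin replica choice are hypothetical. A stable hash from hashlib is used because Python's built-in hash() for strings varies between processes.

```python
import hashlib

NUM_SHARDS = 8
REPLICAS_PER_SHARD = 3
SHARD_KEY_LENGTH = 1  # assumption: shard on the first character only


def shard_for_prefix(prefix: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a prefix to a shard using a hash that is stable across processes."""
    key = prefix[:SHARD_KEY_LENGTH].lower()
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def pick_replica(shard: int, request_counter: int,
                 replicas: int = REPLICAS_PER_SHARD) -> str:
    """Round-robin over the replicas of one shard (hypothetical host names)."""
    return f"trie-shard{shard}-replica{request_counter % replicas}"


def route(prefix: str, request_counter: int) -> str:
    return pick_replica(shard_for_prefix(prefix), request_counter)
```

Sharding on a single character is simple but can be skewed toward common first letters; a longer shard key spreads load more evenly, at the cost of very short prefixes needing their own handling.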

Typeahead Search (Autocomplete) System Low-Level Design

Requirements

Core Data Structure: Trie with Embedded Top-K

Data Models

API

Ranking Formula

Caching Strategy

Trie Update Schedule

Sharding

Edge Cases