Search autocomplete (typeahead) suggests query completions as the user types, reducing keystrokes and guiding users toward popular or relevant searches. Google processes billions of autocomplete requests per day with sub-50ms latency. This guide covers the architecture of a production autocomplete system — from data collection to real-time suggestion serving — a frequently asked system design interview question.
Requirements and Scale
Functional: as the user types each character, return the top 5-10 matching suggestions ranked by relevance. Support prefix matching (typing “how to” returns “how to tie a tie”, “how to cook rice”). Non-functional: latency under 50ms (suggestions must appear faster than the user types), high availability (autocomplete is on the critical path of every search), and scale to 100,000+ requests per second. Data sources: (1) Historical query logs — the most-searched queries weighted by frequency. “weather” is suggested before “weatherization” because millions more people search for it. (2) Trending queries — recently spiking queries (breaking news, viral events). “election results” surges on election day. (3) Personalized history — the user’s own past searches. (4) Entity data — product names, city names, and celebrity names from a knowledge base. Estimation: 5 billion searches per day, with an average of 4 characters typed before selecting a suggestion = 20 billion autocomplete requests per day ≈ 230,000 requests per second. Each request must return in under 50ms.
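The traffic estimate works out as follows (a minimal arithmetic sketch using the numbers stated above):

```python
# Back-of-the-envelope traffic estimate from the stated assumptions.
searches_per_day = 5_000_000_000
chars_per_query = 4  # average keystrokes typed before selecting a suggestion

requests_per_day = searches_per_day * chars_per_query      # 20 billion
requests_per_second = requests_per_day / 86_400            # seconds per day

print(f"{requests_per_day:,} requests/day, about {requests_per_second:,.0f} req/s")
```

The exact figure is about 231,000 requests per second at peak-average, which is why the text rounds to 230,000.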
Trie-Based Approach
A trie (prefix tree) is the natural data structure for prefix matching. Each node represents a character; the path from root to a node represents a prefix. Store the top-K suggestions at each node (pre-computed). Lookup: traverse the trie following the input prefix. At the final node, return the stored top-K suggestions. Time: O(L) where L is the prefix length — independent of the total number of queries. Building the trie: (1) Aggregate query logs to compute frequency per query. (2) Insert each query into the trie. (3) At each node, compute the top-K suggestions from all queries passing through that node (using a heap or sorting). Store these suggestions at the node. Updating: do not rebuild the entire trie for every new query. Use offline batch updates: periodically (hourly or daily) rebuild the trie from updated query logs. For trending queries, maintain a separate small trie updated in real time. Merge results from both tries at query time. Memory: a trie for 100 million unique queries uses approximately 10-50 GB depending on implementation. This fits in memory on a single high-memory server, or can be sharded by prefix (a-m on shard 1, n-z on shard 2).
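The build-and-lookup steps above can be sketched as follows. This is a minimal in-memory version; the class and method names are illustrative, and a production trie would use a compressed representation to hit the memory figures cited.

```python
import heapq

class TrieNode:
    __slots__ = ("children", "top_k")
    def __init__(self):
        self.children = {}  # char -> TrieNode
        self.top_k = []     # pre-computed [(frequency, query)] for this prefix

class AutocompleteTrie:
    def __init__(self, k=5):
        self.root = TrieNode()
        self.k = k

    def build(self, query_freqs):
        """query_freqs: dict of query -> aggregated frequency from the logs."""
        for query, freq in query_freqs.items():
            node = self.root
            for ch in query:
                node = node.children.setdefault(ch, TrieNode())
                node.top_k.append((freq, query))  # query passes through this node
        # Trim each node's candidate list down to the K most frequent.
        stack = [self.root]
        while stack:
            node = stack.pop()
            node.top_k = heapq.nlargest(self.k, node.top_k)
            stack.extend(node.children.values())

    def suggest(self, prefix):
        """O(L) lookup: walk the prefix, return the pre-computed top-K."""
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        return [query for _, query in node.top_k]
```

Usage: `AutocompleteTrie(k=5).build({"how to tie a tie": 100, ...})`, then `suggest("how t")` returns the stored suggestions without scanning the full query set.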
Elasticsearch-Based Approach
For simpler implementation, use Elasticsearch with a completion suggester or edge ngram analyzer. Completion suggester: Elasticsearch builds an in-memory FST (Finite State Transducer) optimized for prefix lookups. Index queries with their weights. At query time, the completion suggester returns the top-K matches in under 5ms. Edge ngram approach: index each query with an edge ngram analyzer that tokenizes “how to cook” into [“h”, “ho”, “how”, “how “, “how t”, “how to”, …]. A prefix search becomes a standard term query. Less memory-efficient than the completion suggester but supports fuzzy matching and more complex scoring. Advantages over a custom trie: Elasticsearch handles distribution, replication, and failover. No custom sharding logic. Built-in relevance scoring (BM25 + custom boosting). Easy to update (index new documents). Disadvantage: slightly higher latency than an in-memory trie (10-30ms vs 1-5ms). For most applications, Elasticsearch is sufficient. Build a custom trie only when you need sub-5ms latency at extreme scale (Google, Bing).
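As a sketch, the request bodies for the completion-suggester approach look like the following (shown as Python dicts matching the Elasticsearch REST API; the field name `suggest` and the weights are illustrative assumptions):

```python
# Index mapping: a "completion" field builds the in-memory FST.
mapping = {
    "mappings": {
        "properties": {
            "suggest": {"type": "completion"}
        }
    }
}

# Indexing a query with a weight (its aggregated search frequency).
doc = {"suggest": {"input": "how to cook rice", "weight": 80}}

# Prefix lookup: returns the top-K completions ordered by weight.
suggest_query = {
    "suggest": {
        "query-suggest": {
            "prefix": "how t",
            "completion": {"field": "suggest", "size": 10}
        }
    }
}
```

These bodies would be sent via the standard index-creation, document-index, and `_search` endpoints respectively.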
Ranking Suggestions
Not all prefix matches are equally relevant. Ranking factors: (1) Query frequency — more popular queries rank higher. “python tutorial” ranks above “python terrarium” for prefix “python t”. (2) Recency — recently trending queries get a boost. Use an exponential decay: score = frequency * decay^(days_since_last_search). Recent queries have higher scores. (3) Personalization — the user’s own search history. If the user frequently searches for “python pandas,” boost “python pandas” for prefix “python p” for that specific user. Store recent queries per user in Redis. (4) Freshness for trending — detect query frequency spikes using a sliding window. If “earthquake” searches increased 10x in the last hour, boost it. (5) Query quality — filter out offensive, misspelled, or low-quality queries. Maintain a blocklist. Run queries through a spell-checker before adding to the suggestion index. Combined score: base_score = log(frequency) * recency_decay + trending_boost + personalization_boost. Return the top-K by combined score. The ranking model can be as simple as the formula above or as complex as an ML model trained on click-through data (which suggestion did the user select?).
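The combined-score formula can be written directly. A minimal sketch: the decay base of 0.95 and the example boost values are illustrative choices, not from the formula itself.

```python
import math

def suggestion_score(frequency, days_since_last_search,
                     trending_boost=0.0, personalization_boost=0.0,
                     decay=0.95):
    """base_score = log(frequency) * recency_decay + trending + personalization."""
    recency_decay = decay ** days_since_last_search
    return (math.log(frequency) * recency_decay
            + trending_boost + personalization_boost)

def rank_top_k(candidates, k=5):
    """candidates: list of (query, score_kwargs); returns top-k queries."""
    scored = [(suggestion_score(**kwargs), query) for query, kwargs in candidates]
    return [query for _, query in sorted(scored, reverse=True)[:k]]
```

For example, a low-frequency but sharply trending query can outrank a high-frequency stale one once its trending boost exceeds the frequency gap on the log scale.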
Client-Side Optimization
The client (browser or mobile app) must minimize requests and latency: (1) Debouncing — do not send a request for every keystroke. Wait 100-200ms after the user stops typing before sending the request. If the user types “how” in 150ms, send one request for “how” instead of three requests for “h”, “ho”, “how”. This reduces request volume by 60-80%. (2) Client-side caching — cache suggestions by prefix. If the user types “how” and gets suggestions, then types “how t”, the client can filter the cached “how” results locally for prefixes that start with “how t” (if the cached results include them). Only send a new request if the cached results are insufficient. (3) Pre-fetching — after receiving suggestions for “how”, pre-fetch suggestions for “how ” (with a space) in the background. When the user types the space, suggestions appear instantly. (4) Request cancellation — if the user types faster than the response arrives, cancel the in-flight request and send a new one for the latest prefix. Stale responses are useless. (5) Minimum prefix length — do not send autocomplete requests for prefixes shorter than 2-3 characters. Single-character prefixes return too many generic suggestions and waste server resources.
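The debouncing logic above can be sketched as follows. Shown in Python for consistency with the other examples, though a real client would implement this in JavaScript or native mobile code; the class name is illustrative.

```python
import threading

class Debouncer:
    """Fire the request only after the user has stopped typing for `wait` seconds."""
    def __init__(self, wait, send_request):
        self.wait = wait                  # e.g. 0.1-0.2s, per the guidance above
        self.send_request = send_request  # callback taking the prefix
        self._timer = None
        self._lock = threading.Lock()

    def on_keystroke(self, prefix):
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()  # newer keystroke: drop the pending request
            self._timer = threading.Timer(self.wait, self.send_request,
                                          args=(prefix,))
            self._timer.start()
```

Typing “h”, “ho”, “how” within the wait window cancels the first two pending timers, so only one request for “how” is actually sent.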
System Architecture
Components: (1) Query collection service — logs every search query with timestamp and user ID to Kafka. (2) Aggregation pipeline — a Spark/Flink job aggregates query logs: computes per-query frequency, detects trending queries, and builds the suggestion dataset. Runs hourly for frequency updates, every 5 minutes for trending detection. (3) Suggestion index — the trie or Elasticsearch index serving autocomplete queries. Updated from the aggregation pipeline output. For a trie: build a new trie offline and swap atomically (blue-green). For Elasticsearch: index new documents incrementally. (4) Serving layer — stateless API servers that receive prefix queries, look up the suggestion index, apply personalization (merge with the user’s recent queries from Redis), and return ranked suggestions. (5) CDN/edge caching — cache popular prefix responses at the CDN edge. “how to” is searched millions of times; cache the response. Personalized suggestions cannot be cached at the CDN (they differ per user), but non-personalized base suggestions can. Sharding: shard the suggestion index by prefix range. Route requests to the correct shard based on the first 1-2 characters. Each shard handles a subset of the alphabet.
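Prefix-range routing can be sketched as below. The two-shard split and shard names are illustrative; a production system would balance ranges by traffic (the “h” and “s” prefixes carry far more load than “x” or “z”), not by alphabet position.

```python
def shard_for_prefix(prefix, shards):
    """Route an autocomplete request to a shard by its first character."""
    first = prefix[0].lower()
    for shard_id, (lo, hi) in shards.items():
        if lo <= first <= hi:
            return shard_id
    return "default"  # digits, punctuation, non-Latin scripts

# Illustrative two-shard layout from the text: a-m and n-z.
shards = {"shard-1": ("a", "m"), "shard-2": ("n", "z")}
```

The serving layer runs this lookup before forwarding the request, so each shard holds only its slice of the trie or index.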