Search Autocomplete: Low-Level Design

A search autocomplete system provides real-time suggestions as users type, ranking completions by popularity or relevance. At Google or Amazon scale, it must respond in < 100ms to millions of simultaneous queries. The core data structure is a trie, but production systems often replace it with a more scalable alternative: a hash table mapping each prefix to its top-k completions, precomputed offline.

The Trie Data Structure

A trie (prefix tree) stores strings by their common prefixes. Each node represents a character; each path from root to a marked node represents a string. For autocomplete: each node stores the top-k completions for the prefix represented by that node’s path. Given prefix “app”, traverse the trie to the node for “app” and return its stored top-k list.
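The read path described above can be sketched as follows. This is a minimal illustration, not a production implementation; the class and field names are illustrative.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.top_k = []      # [(term, frequency)], highest frequency first

class AutocompleteTrie:
    def __init__(self, k=10):
        self.root = TrieNode()
        self.k = k

    def lookup(self, prefix):
        """Walk the trie along the prefix; the destination node already
        holds the precomputed top-k list, so reads are O(len(prefix))."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []    # unseen prefix: no suggestions
        return [term for term, _ in node.top_k]
```

Note that the lookup never scans the subtree below the prefix node; that work was paid at write time, which is exactly the trade-off the next paragraph examines.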

Maintaining the top-k list in each node: when inserting “apple” with frequency 1000, update all ancestor nodes (root → “a” → “ap” → “app” → “appl” → “apple”) to include “apple” in their top-k if its frequency is high enough to qualify. This makes reads O(prefix_length) but writes expensive — updating a popular term touches O(term_length) nodes, and each node must re-rank its sorted top-k list. For a large vocabulary, this write amplification is the primary scalability challenge of trie-based systems.
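The write path above can be made concrete with a short sketch. Assumptions: top-k lists are plain sorted Python lists, and a re-inserted term simply replaces its stale entry; names are illustrative.

```python
K = 10  # completions kept per node

class Node:
    def __init__(self):
        self.children = {}   # char -> Node
        self.top_k = []      # [(frequency, term)], highest frequency first

def _update_top_k(node, term, frequency):
    # drop any stale entry for this term, re-insert, re-rank, truncate
    node.top_k = [e for e in node.top_k if e[1] != term]
    node.top_k.append((frequency, term))
    node.top_k.sort(reverse=True)
    del node.top_k[K:]

def insert(root, term, frequency):
    """Insert or refresh a term. Every node on the root -> term path
    (including the root, which holds the global top-k) is updated,
    which is why writes cost O(len(term) * K)."""
    node = root
    nodes_on_path = [root]
    for ch in term:
        node = node.children.setdefault(ch, Node())
        nodes_on_path.append(node)
    for n in nodes_on_path:
        _update_top_k(n, term, frequency)
```

In practice the sort-and-truncate step would use a bounded heap, but the list version keeps the write amplification easy to see.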

Prefix Hash Table: Production Alternative

Many production systems use a precomputed hash table: prefix → [top-10 completions with scores]. Built offline from query logs, stored in Redis or an in-memory store. Query: O(1) hash lookup for the exact prefix. Advantages: simple, fast, horizontally scalable (Redis Cluster shards by prefix hash). Disadvantages: rigid — the completions are precomputed; new trending terms appear only after the next build cycle (typically hourly or daily). Build process: aggregate query logs → compute top-10 per prefix → write to Redis. Only generate entries for prefixes of length >= 2 (single-character prefixes have too many candidate completions and would consume too much storage).
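The offline build step can be sketched in a few lines. This is a simplified version that returns a plain dict instead of writing to Redis; the function name and parameters are illustrative.

```python
from collections import Counter, defaultdict

def build_prefix_table(query_log, k=10, min_prefix=2, max_prefix=20):
    """Aggregate query frequencies, then emit prefix -> top-k completions.
    Prefixes shorter than min_prefix are skipped, per the storage note above."""
    freq = Counter(query_log)
    candidates = defaultdict(list)   # prefix -> [(term, count)]
    for term, count in freq.items():
        for n in range(min_prefix, min(len(term), max_prefix) + 1):
            candidates[term[:n]].append((term, count))
    return {
        prefix: [t for t, _ in sorted(cands, key=lambda x: -x[1])[:k]]
        for prefix, cands in candidates.items()
    }
```

In a real pipeline the log aggregation would run in a batch job (e.g. MapReduce or Spark) and the final dict would be bulk-loaded into Redis, but the shape of the computation is the same.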

Ranking Signals

Simple frequency ranking (most popular globally) is insufficient for a good autocomplete experience. Additional ranking signals: (1) personalization — if the user previously searched “python interview questions”, rank Python-related completions higher when they type “py”; (2) recency — a trending topic this hour should rank above a historically popular term with higher lifetime frequency; (3) geographic relevance — “weather in Seattle” is more relevant to Seattle users typing “weather in S”; (4) freshness — weight recent queries more heavily than historical ones using exponential decay. Combine signals with a weighted scoring function: score = freq_weight * frequency + recency_weight * recency_score + personalization_weight * user_affinity.
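The weighted scoring function above might look like the following sketch. The weights, the half-life, and the `term_stats` shape are illustrative placeholders, not tuned values.

```python
import math
import time

def score(term_stats, user_affinity, now=None,
          freq_weight=1.0, recency_weight=2.0,
          personalization_weight=3.0, half_life_hours=6.0):
    """score = freq_weight * frequency
             + recency_weight * recency_score
             + personalization_weight * user_affinity"""
    if now is None:
        now = time.time()
    age_hours = (now - term_stats["last_seen"]) / 3600.0
    # exponential decay: a term last seen half_life_hours ago scores 0.5
    recency_score = math.exp(-math.log(2.0) * age_hours / half_life_hours)
    return (freq_weight * term_stats["frequency"]
            + recency_weight * recency_score
            + personalization_weight
            * user_affinity.get(term_stats["term"], 0.0))
```

One caveat: with raw frequencies, the frequency term dominates once counts grow large, so real systems typically normalize or log-scale it before blending; the linear form is kept here to match the formula in the text.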

Latency Optimization

Target: < 50ms end-to-end (client sends request, completion list renders). Budget: ~5ms network (CDN edge), ~10ms lookup, ~5ms network back, ~30ms render. Serve autocomplete from CDN edge nodes with cached prefixes: the top 10,000 most popular prefixes cover ~80% of user queries. Edge caches these results with a 5-minute TTL — serving them without hitting origin. For cache misses (tail prefixes): request goes to origin, result is cached at the edge for subsequent requests. Client-side debouncing: only send an autocomplete request after the user pauses typing for 200ms — reduces request rate by 5-10x without hurting UX.
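The edge-caching behavior described above (serve from cache within the TTL, fall through to origin on a miss) can be modeled with a small sketch; the class and parameter names are illustrative, and a real edge node would also handle eviction and stale-while-revalidate.

```python
import time

class TTLPrefixCache:
    """Edge-style cache: serve cached prefixes until their TTL expires,
    then refetch from origin and re-cache for subsequent requests."""

    def __init__(self, origin_lookup, ttl_seconds=300):
        self.origin_lookup = origin_lookup   # callable: prefix -> completions
        self.ttl = ttl_seconds
        self.cache = {}                      # prefix -> (expires_at, completions)

    def get(self, prefix, now=None):
        if now is None:
            now = time.time()
        entry = self.cache.get(prefix)
        if entry and entry[0] > now:
            return entry[1]                  # edge hit: no origin round trip
        completions = self.origin_lookup(prefix)
        self.cache[prefix] = (now + self.ttl, completions)
        return completions
```

With the popularity skew cited above (~80% of queries hitting the top 10,000 prefixes), most requests resolve in the hit branch and never pay the origin round trip.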

Handling Personalization at Scale

Personalized autocomplete requires per-user signals in the completion ranking. Architecture: the global top-k list provides the base candidates; a personalization layer re-ranks the top 50 candidates based on the user’s search history and preferences (using a lightweight scoring model). The personalization layer runs server-side at < 5ms: precompute user affinity vectors hourly and cache them in Redis (user_id → affinity_map). On each autocomplete query, fetch the user's affinity map and adjust scores before returning the top-10. Never include the personalization step in the CDN edge path — personalized results cannot be cached globally.
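The re-ranking step can be sketched as a pure function over the global candidates and the cached affinity map. The blending weight and the data shapes are illustrative assumptions.

```python
def rerank(candidates, affinity, personalization_weight=0.5, top_n=10):
    """Blend a per-user affinity map (term -> affinity, fetched from a
    cache such as Redis) into the global base scores, then return the
    top_n terms. candidates: [(term, base_score)]."""
    def adjusted(term, base_score):
        return base_score + personalization_weight * affinity.get(term, 0.0)
    ranked = sorted(candidates, key=lambda c: -adjusted(*c))
    return [term for term, _ in ranked[:top_n]]
```

Because the input is only the top ~50 global candidates, this stays well under the 5ms budget; the expensive part is keeping the affinity maps fresh, which the hourly precompute handles offline.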
