Typeahead / Autocomplete Service: Low-Level Design

A typeahead (autocomplete) service returns ranked completions for a query prefix in under 100ms. Google Search completes queries as you type; Spotify autocompletes song names; GitHub autocompletes repository names. The design must index millions of strings, support prefix search efficiently, and rank results by popularity or relevance — at latencies that feel instant to users.

Trie-Based Prefix Indexing

A trie (prefix tree) stores strings as paths from the root: each node represents one character, and a path from the root to a node marked as terminal represents a complete string (strings can be prefixes of other strings, so terminals are not always leaves). Prefix search: traverse the trie following the query characters, then collect all strings in the subtree below. With a branching factor of 26 (English alphabet), a trie holding 1M strings uses roughly 50-100MB in memory. The problem: collecting every string in a subtree is O(k), where k is the number of matching strings — too slow when a common prefix (“a”) matches 500,000 strings. Optimization: at each trie node, store the top-K (e.g., top-20) completions by score, precomputed. A prefix query then returns these pre-ranked results immediately — O(prefix_length) rather than O(subtree_size). When a string’s score changes, the update propagates from its terminal node to the root, revising the top-K list at each ancestor. This top-K trie is the core data structure for most production autocomplete systems.
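A minimal sketch of the top-K trie described above (K and the data are illustrative; a production version would store serialized nodes and handle Unicode normalization):

```python
K = 3  # top-K completions kept per node (small for the demo)

class TrieNode:
    def __init__(self):
        self.children = {}    # char -> TrieNode
        self.top_k = []       # (score, completion) pairs, highest score first
        self.is_terminal = False

class TopKTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, score):
        """Insert a completion, updating the precomputed top-K list
        of the root and every node along the word's path."""
        node = self.root
        self._update_top_k(node, word, score)
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
            self._update_top_k(node, word, score)
        node.is_terminal = True

    def _update_top_k(self, node, word, score):
        # Replace any existing entry for this word, then keep the best K.
        entries = [e for e in node.top_k if e[1] != word]
        entries.append((score, word))
        entries.sort(reverse=True)
        node.top_k = entries[:K]

    def complete(self, prefix):
        """O(len(prefix)) lookup: walk to the prefix node and return
        its precomputed top-K completions — no subtree traversal."""
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        return [word for _, word in node.top_k]
```

Inserting apple (90), apply (70), app (50), and apex (20), then calling `complete("app")`, returns `['apple', 'apply', 'app']` — already ranked, with no subtree walk.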

Alternative: Sorted Set Prefix Search in Redis

For smaller datasets or simpler deployments: store all completions in a Redis sorted set with lexicographic ordering. ZADD completions 0 “apple” — when all members share score 0, Redis orders them lexicographically. ZRANGEBYLEX completions “[app” “(app\xff” returns all members starting with “app” — efficient for prefix search. Rank by score: a separate sorted set stores (completion → popularity_score); the lexicographic set is used for prefix filtering, then a lookup fetches scores for result ranking. Limitation: ZRANGEBYLEX returns matches in lexicographic order, not popularity order — producing the top-K still requires fetching the candidates and re-sorting by score. Suffix indexing: index every suffix of every phrase (“google maps” is indexed as both “google maps” and “maps”) to enable mid-string matching. Redis sorted sets handle up to ~10M entries comfortably; beyond that, a dedicated trie service is more appropriate.
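The same range trick can be sketched in pure Python with the standard-library bisect module standing in for the sorted set (in a real deployment this would be redis-py’s zadd/zrangebylex against a Redis server; the sample data and score table here are illustrative):

```python
import bisect

# A sorted list stands in for the Redis sorted set (all members at
# score 0 -> lexicographic order). `scores` plays the role of the
# second sorted set that maps each completion to its popularity.
completions = sorted(["app", "apple", "apply", "apex", "banana"])
scores = {"app": 50, "apple": 90, "apply": 70, "apex": 20, "banana": 10}

def prefix_search(prefix, k):
    """Mimic ZRANGEBYLEX completions [<prefix> (<prefix>\xff:
    binary-search the half-open range of members starting with the
    prefix, then re-rank the matches by popularity and keep top-k."""
    lo = bisect.bisect_left(completions, prefix)
    hi = bisect.bisect_left(completions, prefix + "\xff")
    matches = completions[lo:hi]
    return sorted(matches, key=lambda w: scores[w], reverse=True)[:k]
```

`prefix_search("app", 2)` returns `['apple', 'apply']`: the lexicographic range finds app, apple, and apply; the score lookup re-ranks them.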

Ranking and Personalization

Ranking signals for autocomplete results: (1) Global popularity: click-through rate from the autocomplete result, number of searches, number of times the exact string was typed. Aggregated from search logs in a daily batch job. (2) Recency: trending queries from the last 24 hours (breaking news appears immediately). Computed from a streaming pipeline (Kafka → Flink → trie update). (3) Personalization: user’s previous searches, location, language. Personalization is layered on top of global results: re-rank the top-K global results using user-specific signals. Do not build a separate per-user trie — the global trie provides the candidates; a lightweight re-ranking layer applies personalization. (4) Query context: “java” → programming or coffee? Use session context (what the user has been searching for) to disambiguate and rank accordingly.
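The re-ranking layer described above can be sketched as a scoring pass over the global candidates. The boost constant and the use of raw search history as the only personal signal are illustrative assumptions; production systems learn these weights and blend several signals:

```python
def rerank(global_top_k, user_history, boost=100.0):
    """Layer personalization on top of global candidates.
    global_top_k: (completion, global_score) pairs from the trie.
    user_history: set of the user's past queries. The flat `boost`
    is a hypothetical weight, not a tuned production value."""
    def personalized_score(item):
        completion, score = item
        if completion in user_history:
            score += boost  # the user searched this before: promote it
        return score
    ranked = sorted(global_top_k, key=personalized_score, reverse=True)
    return [completion for completion, _ in ranked]
```

For a user who previously searched “java coffee”, the candidates [(“java tutorial”, 95), (“javascript”, 80), (“java coffee”, 60)] re-rank to put “java coffee” first — the global trie supplied the candidates; only the ordering is personal.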

Serving Architecture

Autocomplete must respond in < 100ms (often < 50ms). Architecture: (1) Trie service: in-memory trie with top-K precomputed at each node. Queries are pure memory lookups — 1-5ms. Serialize the trie to disk periodically; load into memory on startup. (2) Caching: the top queries are extremely repetitive (80% of queries are the same 1000 prefixes). Cache prefix → top-K results in Redis with a 5-minute TTL. CDN edge caching for popular prefixes (a GET request with the query prefix as a parameter). (3) Tiered query: first check CDN edge cache (sub-millisecond) → Redis cache → trie service. Only novel prefixes reach the trie service. (4) Debouncing on the client: don't send a request on every keystroke — wait until the user pauses typing for 50-100ms. Reduces request rate by 60-70%.
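The tiered read path (3) can be sketched with dicts standing in for the edge cache and Redis, and a callable standing in for the trie service; TTL and eviction handling are elided:

```python
def tiered_lookup(prefix, edge_cache, redis_cache, trie_service):
    """Tiered read path: edge cache -> Redis cache -> trie service.
    Each cache tier is modeled as a plain dict; `trie_service` is a
    callable that computes top-K for a novel prefix. Results are
    written back to both tiers so repeat prefixes never reach the
    trie. TTLs (5-minute Redis expiry) are elided."""
    if prefix in edge_cache:
        return edge_cache[prefix]            # sub-millisecond tier
    if prefix in redis_cache:
        edge_cache[prefix] = redis_cache[prefix]  # promote to the edge
        return redis_cache[prefix]
    results = trie_service(prefix)           # only novel prefixes land here
    redis_cache[prefix] = results
    edge_cache[prefix] = results
    return results
```

Calling it twice with the same prefix invokes the trie service exactly once; the second call is served from the edge cache, which is the behavior that lets 80% of traffic stop at the caches.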

Index Updates

The trie must reflect new popular queries. Two update modes: (1) Daily batch rebuild: aggregate the last 24 hours of search logs, compute popularity scores, rebuild the trie from scratch. Swap the new trie into service with zero downtime (blue-green or atomic pointer swap). Simple and consistent; new queries appear the next day. (2) Streaming incremental update: a stream processor (Flink) consumes search events in real time, updates popularity scores, and propagates score changes up the trie. More complex but enables trending queries to appear within minutes. Production systems combine both: daily batch for stability plus streaming for trending content. Rate-limit trie updates to prevent write amplification, where a single viral query would otherwise trigger a cascade of updates to every ancestor node up to the root on every event.
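One way to rate-limit the streaming path is to coalesce score increments per flush window before touching the trie, so thousands of events for one viral query become a single root-to-leaf propagation. A minimal sketch, assuming the trie exposes an apply(query, delta) callback and that windowing (e.g., Flink windows) is handled elsewhere:

```python
from collections import Counter

class CoalescingUpdater:
    """Buffer streaming search events and flush them in batches.
    `apply_fn(query, delta)` is a hypothetical callback that performs
    one trie score update (including ancestor top-K propagation);
    window scheduling by the stream processor is elided."""
    def __init__(self, apply_fn):
        self.pending = Counter()
        self.apply_fn = apply_fn

    def record(self, query):
        self.pending[query] += 1   # O(1) counter bump; no trie write yet

    def flush(self):
        # One trie update per distinct query, regardless of event volume.
        for query, delta in self.pending.items():
            self.apply_fn(query, delta)
        self.pending.clear()
```

A window containing 1,000 events for one viral query and 1 event for a rare query flushes as exactly two trie updates instead of 1,001.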

