Typeahead Search (Autocomplete) System Low-Level Design

Typeahead search (also called autocomplete) is one of the most latency-sensitive features in a modern product. Users expect suggestions to appear in under 100 ms, which means the system must be pre-computed, cached aggressively, and sharded carefully. This post walks through a production-grade low-level design.

Requirements

  • Return top-K (default 5) ranked suggestions for a given prefix in <100 ms p99.
  • Handle 10,000 QPS at peak with personalized and global suggestions.
  • Support at least 500 million distinct phrases in the global index.
  • Personalization: weight the user’s own recent queries 2× over global frequency.
  • Minimum prefix length: 2 characters (shorter inputs return nothing).

Core Data Structure: Trie with Embedded Top-K

A plain trie lets you find all words sharing a prefix, but traversing all descendants to rank them is O(n) per query — too slow. Instead, each trie node caches its own top-K list, maintained at index-build time.

class TrieNode:
    children: Map[char, TrieNode]
    top_k: List[SearchSuggestion]   # precomputed, sorted by score desc
    # top_k is updated during batch rebuild, not on every query

Query time becomes O(prefix_length) — you walk the trie to the prefix node, then read its top_k list directly. No traversal of descendants needed.

Data Models

SearchSuggestion:
    phrase:     String        # e.g. "machine learning"
    score:      Float         # global frequency score (normalized 0–1)
    type:       Enum          # GLOBAL | TRENDING | PERSONAL

UserQueryHistory:
    user_id:    UUID
    phrase:     String
    queried_at: Timestamp
    count:      Int           # how many times this user typed this phrase

API

GET /suggest?q={prefix}&limit={k}&user_id={uid}

Response 200:
[
  {"phrase": "machine learning", "score": 0.92},
  {"phrase": "machine learning interview", "score": 0.87},
  ...
]

The user_id is optional. When absent, only global results are returned.

Ranking Formula

final_score(phrase, user) =
    global_frequency_score(phrase)          # base: log(query_count) / log(max_count)
  + recency_boost(phrase)                   # +0.1 if phrase trended in last 24 h
  + personalization_boost(phrase, user)     # user_query_count * 2 * personal_weight

Personal weight is typically 0.15 — enough to surface a user’s own history without completely overriding global popularity.

Caching Strategy

  • Global prefix cache: Redis hash. Key = suggest:{prefix}, value = serialized top-K list. TTL = 60 seconds. Warmed on trie rebuild.
  • Per-user personalized cache: Redis hash. Key = suggest:{user_id}:{prefix}, TTL = 30 seconds. Written lazily on first personalized request for a prefix.
  • Fallback: if personalized cache misses, merge global top-K with user’s recent history in-process and cache the result.

Trie Update Schedule

Updating the trie on every query would cause write contention across hundreds of nodes. Instead:

  • Hourly batch rebuild: Spark job reads query logs from the last 24 h, aggregates phrase frequencies, rebuilds the full trie, and pushes it to all trie servers atomically via a blue/green swap.
  • 5-minute hot-phrase update: A lighter job re-scores the top 10,000 phrases by recent velocity and patches only those nodes, avoiding a full rebuild.

Sharding

A single trie for 500 M phrases won’t fit in one machine’s memory (~50 GB+). Shard by the first character of the prefix (26 shards for a–z, plus one for digits/other). The query router hashes the first character and routes to the correct shard. For hot letters (e.g., “s”), further split by the first two characters using consistent hashing.

Edge Cases

  • Profanity filter: Deny-list applied at index-build time; flagged phrases never enter the trie.
  • Trending terms: Injected into top-K lists out-of-band by a trend-detection service that monitors query velocity spikes.
  • Cold start for new users: No personal history — serve global results only.
  • Unicode / CJK: Tokenize by character rather than byte; each CJK character is a valid prefix unit.

Google system design interviews cover search autocomplete at scale. See common questions for Google interview: search autocomplete and typeahead system design.

LinkedIn system design covers typeahead and people search. Review design patterns for LinkedIn interview: typeahead search and people search system design.

Uber system design includes location autocomplete and search. See design patterns for Uber interview: location search and autocomplete system design.

See also: Coinbase Interview Guide

Scroll to Top