Search Suggestions (Autocomplete) System Low-Level Design

Requirements

  • Return top-k (k=10) query suggestions as a user types, within 100ms
  • Suggestions ranked by search frequency or relevance score
  • Handle 10M daily active users, 1B queries/day
  • Update suggestions as queries trend (near-real-time, not necessarily instant)
  • Personalized suggestions (recent searches, user-specific boosting)

Trie-Based Architecture

The core data structure is a prefix trie where each node stores the top-k suggestions for that prefix. On each query, traverse from root using the typed prefix — the suggestions are pre-computed at each node, so lookup is O(L) where L is the prefix length.

class TrieNode:
    def __init__(self):
        self.children = {}
        self.top_k = []  # pre-sorted list of (score, query) tuples

class SearchTrie:
    def search(self, prefix):
        node = self.root
        for c in prefix:
            if c not in node.children:
                return []
            node = node.children[c]
        return [q for score, q in node.top_k]

Pre-computing top-k at every node means suggestion retrieval is O(L) — just traverse the prefix. The tradeoff: updates require recomputing top-k at every node along the path from root to the updated term (O(L * k) per update).

Data Flow

User types → Suggestion API → Redis (prefix → top-k list, TTL=10min)
                            → Trie Service (if Redis miss)

Search query submitted → Kafka (query log)
                       → Frequency Aggregator (Spark/Flink, hourly batch)
                       → Trie Builder (rebuilds trie from aggregated frequencies)
                       → Trie stored in Redis + replicated to Suggestion API servers

Redis Caching Layer

Store precomputed suggestions in Redis: key=suggest:{prefix}, value=JSON array of top-10 queries, TTL=10 minutes. The suggestion API checks Redis first (O(1) lookup). On miss: query the trie service. Cache hit rate is very high for common prefixes (short prefixes cover 80%+ of queries). For a prefix of length 1-3 characters, cache permanently (high frequency, low count). For longer prefixes, TTL=10min.

Trie Update Strategy

Option A — Real-time updates: on each completed search, update the query frequency and recompute top-k at each trie node along the path. O(L * k log k) per query. At 10K QPS, this is too expensive for a single trie instance. Option B — Periodic batch rebuild (recommended for scale): aggregate query frequencies over 1-hour windows using Spark. Rebuild the full trie from the aggregated data. Swap the trie (blue-green deploy): all suggestion API servers hot-reload the new trie from shared storage (S3). Suggestions lag by up to 1 hour but the trie is always consistent. For trending queries (last 10 minutes), apply a small real-time boost using a separate counter (Redis ZINCRBY on a sorted set of recent queries).

Personalization

Two layers: (1) Global suggestions from the trie (population-level frequency). (2) Personal layer: store each user’s recent searches in Redis (key=user_recents:{user_id}, sorted set by timestamp, max 20 entries). Merge: start with personal recent searches that match the prefix, then fill remaining slots from global suggestions. Personal results are fetched from Redis in O(log n + k), merged client-side or in the suggestion API.

Query Normalization

Before inserting into the trie: lowercase, trim whitespace, strip punctuation, handle spelling variants. Spell correction: maintain a dictionary of common misspellings → canonical forms. On suggestion miss for the prefix, try normalized/corrected prefix. Phonetic matching (Soundex, Metaphone) for fuzzy suggestions is expensive at scale — prefer exact prefix match + spell correction for speed.

Key Design Decisions

  • Pre-compute top-k at each trie node — O(L) lookup, not O(L + n) search
  • Redis prefix cache — eliminates trie traversal for 90%+ of requests
  • Batch trie rebuild hourly — consistent data without per-query update overhead
  • Real-time trending boost via Redis sorted set — catches viral queries within minutes
  • Personal layer merged at serving time — no per-user trie, shared infrastructure

Google system design interviews frequently cover search autocomplete. See common questions for Google interview: search autocomplete and typeahead system design.

Amazon system design covers product search autocomplete. Review patterns for Amazon interview: search suggestions and product autocomplete design.

LinkedIn system design covers people search and autocomplete. See design patterns for LinkedIn interview: search autocomplete and people suggestion design.

Scroll to Top