Search Suggestions (Autocomplete) System Low-Level Design

Requirements

  • Return top-k (k=10) query suggestions as a user types, within 100ms
  • Suggestions ranked by search frequency or relevance score
  • Handle 10M daily active users, 1B queries/day
  • Update suggestions as queries trend (near-real-time, not necessarily instant)
  • Personalized suggestions (recent searches, user-specific boosting)

Trie-Based Architecture

The core data structure is a prefix trie where each node stores the top-k suggestions for that prefix. On each query, traverse from root using the typed prefix — the suggestions are pre-computed at each node, so lookup is O(L) where L is the prefix length.

class TrieNode:
    def __init__(self):
        self.children = {}
        self.top_k = []  # (score, query) tuples, sorted descending by score

class SearchTrie:
    def __init__(self):
        self.root = TrieNode()

    def search(self, prefix):
        node = self.root
        for c in prefix:
            if c not in node.children:
                return []
            node = node.children[c]
        return [q for score, q in node.top_k]

Pre-computing top-k at every node means suggestion retrieval is O(L) — just traverse the prefix. The tradeoff: updates require recomputing top-k at every node along the path from root to the updated term (O(L * k) per update).
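The update path can be sketched as follows. This is a self-contained variant of the classes above; the class-level K and the heap-based merge helper are illustrative choices, not the only way to maintain the per-node lists.

```python
import heapq

class TrieNode:
    def __init__(self):
        self.children = {}
        self.top_k = []  # (score, query) tuples, sorted descending by score

class SearchTrie:
    K = 10  # suggestions kept per node

    def __init__(self):
        self.root = TrieNode()

    def update(self, query, score):
        """Walk root-to-leaf for `query`, merging (score, query) into each
        node's top-k list along the way -- O(L * k) per update."""
        node = self.root
        for c in query:
            node = node.children.setdefault(c, TrieNode())
            self._merge(node, score, query)

    def _merge(self, node, score, query):
        # Drop any stale entry for the same query, then keep the best K.
        entries = [(s, q) for s, q in node.top_k if q != query]
        entries.append((score, query))
        node.top_k = heapq.nlargest(self.K, entries)

    def search(self, prefix):
        node = self.root
        for c in prefix:
            if c not in node.children:
                return []
            node = node.children[c]
        return [q for _, q in node.top_k]
```

Because every ancestor node is touched on each update, this is exactly the cost that pushes production systems toward the batch-rebuild strategy described below.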

Data Flow

User types → Suggestion API → Redis (prefix → top-k list, TTL=10min)
                            → Trie Service (if Redis miss)

Search query submitted → Kafka (query log)
                       → Frequency Aggregator (Spark/Flink, hourly batch)
                       → Trie Builder (rebuilds trie from aggregated frequencies)
                       → Trie stored in Redis + replicated to Suggestion API servers

Redis Caching Layer

Store precomputed suggestions in Redis: key=suggest:{prefix}, value=JSON array of the top-10 queries, TTL=10 minutes. The suggestion API checks Redis first (O(1) lookup) and falls back to the trie service on a miss. Cache hit rate is very high for common prefixes (short prefixes cover 80%+ of queries). Prefixes of 1-3 characters are cached permanently (high request frequency, small key space); longer prefixes use the 10-minute TTL.
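A cache-aside sketch of this lookup, assuming a redis-py-style client (`get`, and `set` with `ex` for TTL); the constants and function name are illustrative:

```python
import json

PERMANENT_PREFIX_LEN = 3   # prefixes this short are cached without expiry
CACHE_TTL_SECONDS = 600    # 10 minutes for longer prefixes

def get_suggestions(redis_client, trie, prefix):
    """Cache-aside lookup: Redis first, trie on miss, then backfill Redis."""
    key = f"suggest:{prefix}"
    cached = redis_client.get(key)
    if cached is not None:
        return json.loads(cached)
    suggestions = trie.search(prefix)
    if len(prefix) <= PERMANENT_PREFIX_LEN:
        redis_client.set(key, json.dumps(suggestions))  # no TTL: cache forever
    else:
        redis_client.set(key, json.dumps(suggestions), ex=CACHE_TTL_SECONDS)
    return suggestions
```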

Trie Update Strategy

Option A — Real-time updates: on each completed search, update the query frequency and recompute top-k at each trie node along the path. O(L * k log k) per query. At 10K QPS, this is too expensive for a single trie instance.

Option B — Periodic batch rebuild (recommended for scale): aggregate query frequencies over 1-hour windows using Spark. Rebuild the full trie from the aggregated data. Swap the trie (blue-green deploy): all suggestion API servers hot-reload the new trie from shared storage (S3). Suggestions lag by up to 1 hour but the trie is always consistent. For trending queries (last 10 minutes), apply a small real-time boost using a separate counter (Redis ZINCRBY on a sorted set of recent queries).
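The serving-time trending merge can be sketched as below. It assumes the trending counts have already been read out of the Redis sorted set (e.g. via ZREVRANGEBYSCORE) into a dict; the boost multiplier is an illustrative tuning knob.

```python
def apply_trending_boost(base, trending, prefix, k=10, boost=1.25):
    """Merge trie-derived suggestions with recent trending counts.

    base:     list of (score, query) pairs from the hourly-built trie
    trending: dict of query -> recent count from the Redis sorted set
    Trending queries matching the prefix get their count added with a
    multiplier, so viral queries can surface before the next trie rebuild.
    """
    scores = {q: s for s, q in base}
    for query, count in trending.items():
        if query.startswith(prefix):
            scores[query] = scores.get(query, 0) + count * boost
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [q for q, _ in ranked[:k]]
```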

Personalization

Two layers: (1) Global suggestions from the trie (population-level frequency). (2) Personal layer: store each user’s recent searches in Redis (key=user_recents:{user_id}, sorted set by timestamp, max 20 entries). Merge: start with personal recent searches that match the prefix, then fill remaining slots from global suggestions. Personal results are fetched from Redis in O(log n + k), merged client-side or in the suggestion API.
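The merge step might look like this pure-Python sketch; the function name and the k default are illustrative, and personal recents are assumed to arrive newest-first (as returned from the timestamp-sorted set):

```python
def merge_suggestions(personal_recents, global_suggestions, prefix, k=10):
    """Personal recents matching the prefix rank first (newest first),
    then global suggestions fill the remaining slots, deduplicated."""
    results = []
    seen = set()
    for query in personal_recents:
        if query.startswith(prefix) and query not in seen:
            results.append(query)
            seen.add(query)
    for query in global_suggestions:
        if len(results) >= k:
            break
        if query not in seen:
            results.append(query)
            seen.add(query)
    return results[:k]
```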

Query Normalization

Before inserting into the trie: lowercase, trim whitespace, strip punctuation, handle spelling variants. Spell correction: maintain a dictionary of common misspellings → canonical forms. On a suggestion miss, retry with the normalized/corrected prefix. Phonetic matching (Soundex, Metaphone) for fuzzy suggestions is expensive at scale — prefer exact prefix match plus spell correction for speed.
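A minimal normalization sketch; the misspelling dictionary entries here are illustrative placeholders for a real corrections table:

```python
import string

MISSPELLINGS = {"pyhton": "python", "recieve": "receive"}  # illustrative entries

def normalize_query(raw):
    """Lowercase, trim, strip punctuation, collapse whitespace,
    then map known misspellings to their canonical forms."""
    q = raw.strip().lower()
    q = q.translate(str.maketrans("", "", string.punctuation))
    q = " ".join(q.split())  # collapse internal runs of whitespace
    return " ".join(MISSPELLINGS.get(tok, tok) for tok in q.split())
```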

Key Design Decisions

  • Pre-compute top-k at each trie node — O(L) lookup, not O(L + n) search
  • Redis prefix cache — eliminates trie traversal for 90%+ of requests
  • Batch trie rebuild hourly — consistent data without per-query update overhead
  • Real-time trending boost via Redis sorted set — catches viral queries within minutes
  • Personal layer merged at serving time — no per-user trie, shared infrastructure


