System Design Interview: Design Google Search Autocomplete (Typeahead)

Search autocomplete (typeahead) is one of the most common system design questions, asked at Google, Amazon, LinkedIn, Twitter, and virtually every company with a search feature. It tests your understanding of trie data structures, caching, distributed systems, and latency optimization.

Requirements Clarification

Functional Requirements

  • As a user types a query prefix, return the top 5-10 autocomplete suggestions
  • Suggestions ranked by search frequency / trending score
  • Support for multiple languages and special characters
  • Results updated based on real search data (trending queries appear within hours)
  • Filter harmful/inappropriate suggestions

Non-Functional Requirements

  • Latency: P99 under 100ms (autocomplete feels instant)
  • Scale: at Google Search scale (~8.5B searches/day), assume roughly 5M QPS for autocomplete at peak
  • Each keystroke generates a request — high QPS relative to search traffic
  • High availability: 99.99%
  • Freshness: trending queries appear in suggestions within 1-2 hours
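The traffic numbers above are worth sanity-checking in the interview. A back-of-envelope sketch (the keystrokes-per-query and peak-factor values are illustrative assumptions, not measured figures):

```python
# Back-of-envelope traffic estimate with assumed multipliers.
searches_per_day = 8.5e9
avg_search_qps = searches_per_day / 86_400      # ~98K QPS average search traffic

keystrokes_per_query = 5    # assumed: requests per query after debouncing
peak_factor = 3             # assumed: peak-to-average ratio

autocomplete_qps = avg_search_qps * keystrokes_per_query * peak_factor
print(f"avg search QPS: {avg_search_qps:,.0f}")
print(f"peak autocomplete QPS: {autocomplete_qps:,.0f}")
```

With these assumptions the peak lands around 1.5M QPS; more aggressive multipliers push it toward the 5M figure used in this article. Either way, autocomplete traffic is a large multiple of search traffic.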

High-Level Architecture

Client (browser/app)
    ↓ (debounced: 100-300ms after keystroke)
CDN / API Gateway
    ↓ (cache hit: 80% served from edge)
Autocomplete Service
    ↓                    ↓
Trie Store (Redis)   Elasticsearch
    ↓
Aggregation Pipeline
(Kafka → Spark/Flink → Trie builder)

Core Data Structure: Trie

import heapq
from collections import defaultdict

class TrieNode:
    def __init__(self):
        self.children: dict[str, 'TrieNode'] = {}
        self.is_end: bool = False
        self.frequency: int = 0
        # Optimization: cache top-k suggestions at each node
        # Avoids DFS on every query — O(1) lookup vs O(prefix_subtree)
        self.top_suggestions: list[tuple[int, str]] = []  # [(freq, word)]

class Trie:
    def __init__(self, top_k: int = 10):
        self.root = TrieNode()
        self.top_k = top_k

    def insert(self, word: str, frequency: int):
        """Insert word with frequency, update top-k at each prefix node"""
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
            # Update top-k suggestions cached at this node
            self._update_top_k(node, word, frequency)
        node.is_end = True
        node.frequency = frequency

    def _update_top_k(self, node: TrieNode, word: str, frequency: int):
        """Maintain top-k heap at each node"""
        # Remove existing entry for this word if present
        node.top_suggestions = [
            (f, w) for f, w in node.top_suggestions if w != word
        ]
        node.top_suggestions.append((frequency, word))
        # Keep only the top-k entries by frequency (nlargest returns them sorted descending)
        node.top_suggestions = heapq.nlargest(
            self.top_k, node.top_suggestions, key=lambda x: x[0]
        )

    def search(self, prefix: str) -> list[str]:
        """Return top-k suggestions for prefix in O(len(prefix))"""
        node = self.root
        for char in prefix:
            if char not in node.children:
                return []  # No suggestions for this prefix
            node = node.children[char]

        # top_suggestions is kept sorted descending by insert(), so return directly
        return [word for _, word in node.top_suggestions]

    def get_top_k_from_subtree(self, node: TrieNode) -> list[tuple[int, str]]:
        """DFS to collect all words in subtree (used when building index, not serving)"""
        results = []
        stack = [(node, "")]

        while stack:
            curr, prefix = stack.pop()
            if curr.is_end:
                results.append((curr.frequency, prefix))
            for char, child in curr.children.items():
                stack.append((child, prefix + char))

        return heapq.nlargest(self.top_k, results, key=lambda x: x[0])
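As a quick sanity check, here is a condensed, self-contained version of the cached-top-k trie above with a usage run (the words and frequencies are made-up examples):

```python
import heapq

class _Node:
    def __init__(self):
        self.children = {}
        self.top = []  # [(freq, word)] cached at this node

def insert(root, word, freq, k=5):
    """Insert a word, refreshing the cached top-k at every prefix node."""
    node = root
    for ch in word:
        node = node.children.setdefault(ch, _Node())
        node.top = [(f, w) for f, w in node.top if w != word]
        node.top.append((freq, word))
        node.top = heapq.nlargest(k, node.top, key=lambda t: t[0])

def search(root, prefix):
    """Walk to the prefix node and return its cached suggestions."""
    node = root
    for ch in prefix:
        if ch not in node.children:
            return []
        node = node.children[ch]
    return [w for _, w in node.top]

root = _Node()
for word, freq in [("python", 100), ("pytorch", 80), ("pandas", 50)]:
    insert(root, word, freq)
print(search(root, "py"))  # ['python', 'pytorch']
print(search(root, "pa"))  # ['pandas']
```

The point to emphasize in the interview: `search` never walks the subtree, so lookup cost is O(len(prefix)) regardless of how many words share the prefix.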

Distributed Trie with Redis

import redis
import json

class RedisTrieStore:
    """
    Store trie as Redis hashes.
    Key pattern: trie:{prefix} -> {suggestions: [...], updated_at: timestamp}
    
    For 5M QPS: Redis Cluster with consistent hashing by prefix hash.
    Each prefix maps to a shard deterministically.
    """

    def __init__(self, redis_cluster):
        self.redis = redis_cluster
        self.ttl = 3600  # 1 hour TTL — refreshed by aggregation pipeline

    def get_suggestions(self, prefix: str, top_k: int = 10) -> list[str]:
        """O(1) Redis GET — all computation done at build time"""
        key = f"trie:{prefix.lower().strip()}"
        cached = self.redis.get(key)
        if cached:
            data = json.loads(cached)
            return data['suggestions'][:top_k]
        return []

    def update_suggestions(self, prefix: str, suggestions: list[str]):
        """Called by aggregation pipeline when rebuilding trie"""
        key = f"trie:{prefix.lower().strip()}"
        self.redis.setex(key, self.ttl, json.dumps({
            'suggestions': suggestions,
        }))

    def bulk_update(self, prefix_suggestions: dict[str, list[str]]):
        """Bulk update for trie rebuild — pipeline for efficiency"""
        pipe = self.redis.pipeline()
        for prefix, suggestions in prefix_suggestions.items():
            key = f"trie:{prefix.lower()}"
            pipe.setex(key, self.ttl, json.dumps({'suggestions': suggestions}))
        pipe.execute()
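To exercise this key layout without a live cluster, a small in-memory stub covering just the calls used above (`get`, `setex`, `pipeline`) is enough. The stub and the condensed store below are hypothetical test helpers, not part of the production design:

```python
import json

class FakeRedis:
    """Minimal in-memory stand-in for the Redis calls used by the store (test double)."""
    def __init__(self):
        self.data = {}
    def get(self, key):
        return self.data.get(key)
    def setex(self, key, ttl, value):
        self.data[key] = value  # TTL ignored in this stub
    def pipeline(self):
        return self  # naive: commands apply immediately
    def execute(self):
        pass

class PrefixStore:
    """Condensed version of RedisTrieStore above, for local experimentation."""
    def __init__(self, client, ttl=3600):
        self.redis, self.ttl = client, ttl
    def update(self, prefix, suggestions):
        key = f"trie:{prefix.lower().strip()}"
        self.redis.setex(key, self.ttl, json.dumps({'suggestions': suggestions}))
    def get_suggestions(self, prefix, top_k=10):
        cached = self.redis.get(f"trie:{prefix.lower().strip()}")
        return json.loads(cached)['suggestions'][:top_k] if cached else []

store = PrefixStore(FakeRedis())
store.update("py", ["python", "pytorch", "pycharm"])
print(store.get_suggestions("py", top_k=2))  # ['python', 'pytorch']
```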

Aggregation Pipeline: Making Suggestions Fresh

from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.window import TumblingEventTimeWindows
from pyflink.common.time import Time

# Simplified Flink job for real-time query frequency aggregation
class SearchQueryAggregator:
    """
    Reads search queries from Kafka.
    Aggregates frequency with 1-hour tumbling windows.
    Outputs top-k queries per prefix to trie builder.
    """

    def process_queries(self, kafka_consumer):
        env = StreamExecutionEnvironment.get_execution_environment()

        stream = env.add_source(kafka_consumer)  # SearchEvent{query, timestamp, user_id}

        # Map each event to a (query, 1) pair, then sum counts per query
        # within 1-hour tumbling windows.
        # (Per-user/per-session deduplication omitted for brevity.)
        # Output: (query: str, count: int) per window
        aggregated = (stream
            .map(lambda e: (e.query, 1))
            .key_by(lambda pair: pair[0])
            .window(TumblingEventTimeWindows.of(Time.hours(1)))
            .reduce(lambda a, b: (a[0], a[1] + b[1]))
        )

        return aggregated
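The windowing logic itself can be sanity-checked without Flink. A plain-Python sketch of 1-hour tumbling-window counting (the event tuples are made-up examples):

```python
from collections import defaultdict

def tumbling_counts(events, window_secs=3600):
    """events: [(query, epoch_ts)] -> {(window_start, query): count}"""
    counts = defaultdict(int)
    for query, ts in events:
        window_start = ts - (ts % window_secs)  # bucket the timestamp into its window
        counts[(window_start, query)] += 1
    return dict(counts)

events = [("python", 100), ("python", 200), ("pandas", 300), ("python", 3700)]
print(tumbling_counts(events))
# {(0, 'python'): 2, (0, 'pandas'): 1, (3600, 'python'): 1}
```

Each window closes independently, which is what lets the pipeline emit fresh per-hour counts to the trie builder.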

class TrieBuilder:
    """
    Receives aggregated counts, rebuilds trie prefix→suggestions mapping.
    Runs hourly, takes ~10 minutes for full rebuild.
    """

    def rebuild(self, query_counts: list[tuple[str, int]],
                redis_store: RedisTrieStore, top_k: int = 10):
        """
        For each query, generate all prefixes and update suggestions.
        query_counts: [(query_string, frequency)]
        """
        # Build prefix → sorted suggestions
        prefix_to_candidates: dict[str, list[tuple[int, str]]] = defaultdict(list)

        for query, count in query_counts:
            # Generate all prefixes for this query
            for length in range(1, len(query) + 1):
                prefix = query[:length]
                prefix_to_candidates[prefix].append((count, query))

        # For each prefix, keep only top-k
        prefix_suggestions = {}
        for prefix, candidates in prefix_to_candidates.items():
            top = heapq.nlargest(top_k, candidates, key=lambda x: x[0])
            prefix_suggestions[prefix] = [query for _, query in top]

        # Bulk write to Redis
        redis_store.bulk_update(prefix_suggestions)
        print(f"Rebuilt trie: {len(prefix_suggestions)} prefixes updated")

API and Caching Layers

from flask import Flask, request, jsonify
import time

app = Flask(__name__)
redis_store = RedisTrieStore(redis_cluster)  # redis_cluster: configured Redis Cluster client
local_cache = {}  # In-process cache for the hottest prefixes (production would use a bounded LRU)
LOCAL_CACHE_TTL = 60  # seconds

@app.route('/autocomplete')
def autocomplete():
    prefix = request.args.get('q', '').strip().lower()
    if not prefix or len(prefix) > 100:
        return jsonify({'suggestions': []})

    # Three-tier caching
    # Tier 1: Local in-process cache (sub-millisecond)
    if prefix in local_cache:
        entry = local_cache[prefix]
        if time.time() - entry['time'] < LOCAL_CACHE_TTL:
            return jsonify({'suggestions': entry['suggestions']})

    # Tier 2: Redis trie store (single-digit milliseconds)
    suggestions = redis_store.get_suggestions(prefix)

    # Tier 3: Elasticsearch fallback for long-tail prefixes missing from Redis
    if not suggestions:
        suggestions = elasticsearch_fallback(prefix)

    local_cache[prefix] = {'suggestions': suggestions, 'time': time.time()}
    return jsonify({'suggestions': suggestions})

def elasticsearch_fallback(prefix: str, size: int = 10) -> list[str]:
    """
    Elasticsearch for long-tail queries missing from the Redis trie.
    Uses a prefix query on the 'query_text' field with popularity scoring.
    """
    query = {
        "query": {
            "bool": {
                "must": [{"prefix": {"query_text": prefix}}],
            }
        },
        "sort": [{"frequency": {"order": "desc"}}],
        "size": size
    }
    results = es_client.search(index="queries", body=query)  # es_client: configured Elasticsearch client
    return [hit['_source']['query_text'] for hit in results['hits']['hits']]

Client-Side Optimizations

// Debouncing: don't send request on every keystroke
let debounceTimer = null;

function onInput(event) {
    clearTimeout(debounceTimer);
    debounceTimer = setTimeout(() => {
        const prefix = event.target.value;
        if (prefix.length >= 2) {  // min 2 chars before querying
            fetchSuggestions(prefix);
        }
    }, 150);  // 150ms debounce — feels instant, reduces requests by ~70%
}

// Cache on client too
const clientCache = new Map();

async function fetchSuggestions(prefix) {
    if (clientCache.has(prefix)) {
        showSuggestions(clientCache.get(prefix));
        return;
    }

    const response = await fetch('/autocomplete?q=' + encodeURIComponent(prefix));
    const data = await response.json();
    clientCache.set(prefix, data.suggestions);

    // Prefetch: when user types "py", prefetch "pyt", "pyw", etc.
    // Not practical for arbitrary prefixes but useful for common ones
    showSuggestions(data.suggestions);
}

Scale and Sharding

  • Prefix space partitioning: Shard Redis by a hash of the prefix, so each key maps to a shard deterministically (e.g., all "go*" keys land on one shard, "py*" on another). Consistent hashing keeps the distribution even and minimizes key movement when nodes are added or removed; naive range partitioning by first characters would skew badly toward hot letters.
  • Hot prefix issue: “i”, “a”, “th” are queried billions of times/day. Handle with local in-process caching — once cached in each server’s memory, Redis is bypassed entirely.
  • Trie size: English top 10M queries × avg 5 chars prefix length = 50M prefix entries. Each entry with 10 suggestions × 30 bytes = ~15GB — fits in Redis cluster.
  • CDN for autocomplete: Cache most popular prefixes at CDN edge. High cache hit rate for common prefixes (80%+ served from CDN, ~5ms latency vs 50ms from origin).
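The deterministic prefix-to-shard mapping from the first bullet can be sketched in a few lines (the shard count is illustrative; a real cluster would use consistent hashing with virtual nodes rather than plain modulo):

```python
import hashlib

NUM_SHARDS = 16  # illustrative; real deployments use consistent hashing with virtual nodes

def shard_for_prefix(prefix: str) -> int:
    """Deterministically map a normalized prefix to a shard by hashing it."""
    digest = hashlib.md5(prefix.lower().strip().encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Same prefix always routes to the same shard, so cache lookups stay local to one node.
print(shard_for_prefix("go"), shard_for_prefix("py"))
```

Plain modulo is shown for clarity; its drawback is that changing NUM_SHARDS remaps nearly every key, which is exactly the problem consistent hashing avoids.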

Interview Tips

  • Start with trie data structure — explain O(prefix_length) lookup
  • Key optimization: cache top-k at each node (avoid DFS per query)
  • Discuss freshness vs latency trade-off: rebuild trie hourly (fresh) vs real-time (complex)
  • Mention client-side debouncing — reduces QPS by 5-10x
  • Three-tier caching: local cache → Redis → Elasticsearch
  • For Google scale: mention that at 5M QPS, even Redis is bottleneck → local in-process cache essential