Typeahead Autocomplete Low-Level Design: Trie Structure, Prefix Scoring, and Real-Time Suggestions

Typeahead Autocomplete System: Overview

A typeahead (autocomplete) system returns ranked query completions as the user types, with latency under 50ms at p99. Low-level design covers the trie data structure, prefix scoring, Redis sorted set caching, personalization, and the offline/online pipeline that keeps suggestions fresh.

Trie Data Structure

A trie (prefix tree) is the canonical data structure for prefix matching. Each node represents one character. Traversing from root along the characters of a prefix reaches the node representing that prefix, from which all completions are descendants.

Naive trie traversal requires visiting all descendant nodes to collect completions. The optimization: each node stores a pre-computed list of the top-K completions (by score) for its prefix. This makes lookup O(prefix_length) — traverse to the prefix node and return the cached list.

Memory tradeoff: storing top-K at every node multiplies storage by K. Typical values: K=10, which is acceptable for tries with millions of nodes if compressed (Patricia trie / radix tree) and kept in memory.

Scoring Function

Each completion candidate has a composite score:

score = query_frequency * recency_weight * personalization_boost
  • query_frequency: raw count of times this query was submitted globally, normalized to [0, 1].
  • recency_weight: decay factor — queries submitted more recently score higher. Use exponential decay: recency = e^(-lambda * days_since_last_seen).
  • personalization_boost: multiplier based on user history. If the user has submitted or clicked this query before, boost by 1.2-2.0x. Cold users get boost = 1.0 (global ranking).

Redis Sorted Set for Prefix Cache

For ultra-low latency, common prefixes (2-5 characters covering 80% of traffic) are pre-cached in Redis sorted sets:

  • Key: typeahead:{prefix}
  • Member: query string (e.g., “python tutorial”)
  • Score: composite score (higher = better)

Fetch top-K: ZREVRANGE typeahead:{prefix} 0 {K-1} WITHSCORES

Cache is rebuilt nightly from the trie. Real-time score increments update both the trie and the Redis cache: ZINCRBY typeahead:{prefix} {delta} {query}.

Trie Rebuild Pipeline

An offline batch job runs nightly:

  1. Read QueryFrequency table — all queries with counts and last_seen timestamps.
  2. Compute composite scores for all candidates.
  3. Build the in-memory trie, populating top-K at each node via a bottom-up sweep.
  4. Serialize the trie to a binary format and upload to object storage.
  5. Serving nodes download the new trie at startup or via hot-reload signal.
  6. Pre-warm Redis sorted sets for the top 10,000 prefixes.

Real-Time Score Updates

When a user submits a query:

  1. Increment QueryFrequency.count for the submitted query (via a write-behind queue to avoid hot-row contention).
  2. For each prefix of the query (length 1 through query_length): ZINCRBY typeahead:{prefix} 1 {query} in Redis.
  3. If a query rises above the top-K threshold, the next nightly rebuild will reflect it in the trie.

Debounce on the client side: do not send a typeahead request until the user has paused for 200ms. This reduces request volume by ~60% on fast typists without noticeable UX degradation.

Personalization Layer

Global scores are blended with user-specific history at serve time:

final_score = alpha * global_score + (1 - alpha) * personal_score

Where alpha = 0.7 for cold users (sparse personal history) and decreases toward 0.3 for power users with rich history. Personal scores are fetched from a user-level Redis hash at serve time and blended in-process before returning results.

Latency Budget

Target: <50ms p99 end-to-end:

  • Redis cache hit: ~1-2ms network + lookup
  • Trie lookup (cache miss): ~5-10ms in-process
  • Personalization blend: ~1ms
  • Network (client to edge): ~10-30ms (CDN edge nodes reduce this)

Deploy typeahead servers at CDN edge locations with regionally replicated Redis to keep network latency under 15ms for 95% of users.

SQL Schema

-- Global query frequency table
CREATE TABLE QueryFrequency (
    query_text    TEXT PRIMARY KEY,
    count         BIGINT NOT NULL DEFAULT 1,
    last_seen_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_queryfreq_count ON QueryFrequency(count DESC);

-- Per-user personalized query history
CREATE TABLE PersonalizedQuery (
    user_id       BIGINT NOT NULL,
    query_text    TEXT NOT NULL,
    count         INT NOT NULL DEFAULT 1,
    last_used_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (user_id, query_text)
);
CREATE INDEX idx_personalquery_user ON PersonalizedQuery(user_id, count DESC);

-- Materialized cache for serving (populated by offline job)
CREATE TABLE TypeaheadCache (
    prefix       VARCHAR(32) PRIMARY KEY,
    suggestions  JSONB NOT NULL,   -- [{query, score}, ...]
    cached_at    TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

Python Implementation

import redis
import math
from typing import List, Optional
from dataclasses import dataclass

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

ALPHA_COLD = 0.7
ALPHA_POWER = 0.3
DEBOUNCE_MS = 200
TOP_K = 10

@dataclass
class Suggestion:
    query: str
    score: float

def get_suggestions(prefix: str, user_id: Optional[int] = None, limit: int = 10) -> List[Suggestion]:
    """Return ranked suggestions for prefix, blending global and personal scores."""
    prefix = prefix.lower().strip()
    cache_key = f"typeahead:{prefix}"

    # Try Redis cache first
    raw = r.zrevrange(cache_key, 0, limit - 1, withscores=True)

    if not raw:
        # Fall back to trie lookup (in-process)
        raw = trie_lookup(prefix, limit)

    global_suggestions = [Suggestion(query=q, score=s) for q, s in raw]

    if user_id is None:
        return global_suggestions[:limit]

    # Blend with personalization
    personal_scores = get_personal_scores(user_id, [s.query for s in global_suggestions])
    personal_count = sum(1 for v in personal_scores.values() if v > 0)
    alpha = ALPHA_COLD if personal_count  dict:
    """Fetch per-user query scores from Redis hash."""
    if not queries:
        return {}
    user_key = f"personal:{user_id}"
    values = r.hmget(user_key, queries)
    return {q: float(v) if v else 0.0 for q, v in zip(queries, values)}

def update_frequency(query_text: str, user_id: Optional[int] = None) -> None:
    """Increment global and personal query frequency, update Redis prefix caches."""
    query_text = query_text.lower().strip()

    # Async DB increment (via queue in production)
    db.execute(
        "INSERT INTO QueryFrequency(query_text, count, last_seen_at)"
        " VALUES(%s, 1, NOW())"
        " ON CONFLICT (query_text) DO UPDATE"
        "   SET count = QueryFrequency.count + 1, last_seen_at = NOW()",
        (query_text,)
    )

    if user_id:
        db.execute(
            "INSERT INTO PersonalizedQuery(user_id, query_text, count, last_used_at)"
            " VALUES(%s, %s, 1, NOW())"
            " ON CONFLICT (user_id, query_text) DO UPDATE"
            "   SET count = PersonalizedQuery.count + 1, last_used_at = NOW()",
            (user_id, query_text)
        )
        r.hincrbyfloat(f"personal:{user_id}", query_text, 1.0)

    # Update Redis prefix caches for all prefixes of this query
    pipe = r.pipeline(transaction=False)
    for i in range(1, len(query_text) + 1):
        prefix = query_text[:i]
        pipe.zincrby(f"typeahead:{prefix}", 1.0, query_text)
    pipe.execute()

def rebuild_trie() -> None:
    """Nightly job: recompute scores and rebuild trie from QueryFrequency table."""
    import time
    now = time.time()
    LAMBDA = 0.01  # decay rate per day
    rows = db.execute(
        "SELECT query_text, count, last_seen_at FROM QueryFrequency ORDER BY count DESC"
    ).fetchall()

    trie = {}
    for query_text, count, last_seen in rows:
        age_days = (now - last_seen.timestamp()) / 86400
        recency = math.exp(-LAMBDA * age_days)
        score = count * recency
        # Insert into trie structure (simplified)
        for i in range(1, len(query_text) + 1):
            prefix = query_text[:i]
            if prefix not in trie:
                trie[prefix] = []
            trie[prefix].append((score, query_text))

    # Keep only top-K per prefix and sort
    for prefix in trie:
        trie[prefix].sort(reverse=True)
        trie[prefix] = trie[prefix][:TOP_K]

    # Pre-warm Redis
    pipe = r.pipeline(transaction=False)
    for prefix, candidates in trie.items():
        key = f"typeahead:{prefix}"
        pipe.delete(key)
        for score, query in candidates:
            pipe.zadd(key, {query: score})
    pipe.execute()

def trie_lookup(prefix: str, limit: int) -> list:
    """Fallback: query TypeaheadCache table if Redis miss."""
    row = db.execute(
        "SELECT suggestions FROM TypeaheadCache WHERE prefix=%s", (prefix,)
    ).fetchone()
    if not row:
        return []
    import json
    suggestions = json.loads(row[0])
    return [(s["query"], s["score"]) for s in suggestions[:limit]]

Key Design Decisions Summary

  • Top-K pre-computation at each trie node reduces lookup to O(prefix_length) instead of O(subtree_size).
  • Redis sorted set caching for hot prefixes keeps p99 under 5ms on the server side.
  • 200ms client-side debounce eliminates the majority of mid-keystroke requests without UX impact.
  • Personalization blending uses alpha weighting to gracefully handle cold users with no history.
  • Nightly trie rebuild keeps global rankings fresh while real-time ZINCRBY handles trending queries intra-day.

See also: Netflix Interview Guide 2026: Streaming Architecture, Recommendation Systems, and Engineering Excellence

See also: Meta Interview Guide 2026: Facebook, Instagram, WhatsApp Engineering

See also: LinkedIn Interview Guide 2026: Social Graph Engineering, Feed Ranking, and Professional Network Scale

Scroll to Top