How do you optimize trie memory usage for a large typeahead system?

Use a compressed trie (Patricia trie or radix tree) that merges single-child nodes into one edge, reducing node count by 50-80% for typical query sets. Store only the top-K completions (K=5 to 10) at each node rather than all descendants. Serialize the trie using a compact binary format (e.g., protocol buffers) for serving nodes to load. For very large vocabularies, shard the trie by first character across multiple serving nodes.

How does a Redis sorted set store typeahead suggestions per prefix?

Each prefix maps to a Redis sorted set with key typeahead:{prefix}. Members are query strings (e.g., 'python tutorial') and scores are composite suggestion scores (higher is better). ZREVRANGE fetches the top-K in O(log N + K) time. Real-time updates use ZINCRBY to atomically increment a query's score across all its prefix keys. Common prefixes (covering most traffic) are pre-warmed at trie rebuild time.

How is personalization blended with global typeahead rankings?

At serve time, global scores are blended with user-specific scores using a weighted average: final = alpha * global + (1 - alpha) * personal. Alpha is higher (0.7) for cold users with sparse history, giving more weight to global rankings. As users accumulate query history, alpha decreases (toward 0.3), increasing the influence of personal scores. Personal scores are stored in a per-user Redis hash for sub-millisecond access.

How does a typeahead system handle misspellings?

Basic trie-based typeahead does not handle misspellings — it only matches exact prefixes. To handle typos, add a spell-correction layer upstream: detect when the prefix returns zero suggestions, apply edit distance candidate generation to find close valid prefixes, and re-query with the corrected prefix. Alternatively, store phonetic encodings (Soundex, Metaphone) of queries as secondary keys in the trie to catch phonetic misspellings.

How is a trie used for prefix matching in typeahead?

Each character of a prefix traverses one level of the trie; the node at the end of the prefix stores the top-K completions sorted by score; traversal is O(L) where L is the prefix length, independent of the number of completions.

How is the Redis sorted set structured for typeahead caching?

A sorted set keyed by prefix stores completions as members with their score as the sort key; ZREVRANGE returns the top-K completions in O(log N + K) time where N is the number of completions for that prefix.

How is personalization blended with global frequency scores?

The final score = alpha * global_score + (1 - alpha) * personal_score where alpha is tuned (e.g., 0.7); personal scores come from the user's own query history stored in PersonalizedQuery.

How does debouncing reduce typeahead server load?

The client waits 200ms after the last keystroke before sending a request; this eliminates intermediate-character requests and reduces server load by ~5x for typical typing speeds.

Typeahead Autocomplete Low-Level Design: Trie Structure, Prefix Scoring, and Real-Time Suggestions

⏱ 9 min read

Typeahead Autocomplete System: Overview

A typeahead (autocomplete) system returns ranked query completions as the user types, with latency under 50ms at p99. Low-level design covers the trie data structure, prefix scoring, Redis sorted set caching, personalization, and the offline/online pipeline that keeps suggestions fresh.

Trie Data Structure

A trie (prefix tree) is the canonical data structure for prefix matching. Each node represents one character. Traversing from root along the characters of a prefix reaches the node representing that prefix, from which all completions are descendants.

Naive trie traversal requires visiting all descendant nodes to collect completions. The optimization: each node stores a pre-computed list of the top-K completions (by score) for its prefix. This makes lookup O(prefix_length) — traverse to the prefix node and return the cached list.

Memory tradeoff: storing top-K at every node multiplies storage by K. Typical values: K=10, which is acceptable for tries with millions of nodes if compressed (Patricia trie / radix tree) and kept in memory.

Scoring Function

Each completion candidate has a composite score:

score = query_frequency * recency_weight * personalization_boost

query_frequency: raw count of times this query was submitted globally, normalized to [0, 1].
recency_weight: decay factor — queries submitted more recently score higher. Use exponential decay: recency = e^(-lambda * days_since_last_seen).
personalization_boost: multiplier based on user history. If the user has submitted or clicked this query before, boost by 1.2-2.0x. Cold users get boost = 1.0 (global ranking).

Redis Sorted Set for Prefix Cache

For ultra-low latency, common prefixes (2-5 characters covering 80% of traffic) are pre-cached in Redis sorted sets:

Key: typeahead:{prefix}
Member: query string (e.g., “python tutorial”)
Score: composite score (higher = better)

Fetch top-K: ZREVRANGE typeahead:{prefix} 0 {K-1} WITHSCORES

Cache is rebuilt nightly from the trie. Real-time score increments update both the trie and the Redis cache: ZINCRBY typeahead:{prefix} {delta} {query}.

Trie Rebuild Pipeline

An offline batch job runs nightly:

Read QueryFrequency table — all queries with counts and last_seen timestamps.
Compute composite scores for all candidates.
Build the in-memory trie, populating top-K at each node via a bottom-up sweep.
Serialize the trie to a binary format and upload to object storage.
Serving nodes download the new trie at startup or via hot-reload signal.
Pre-warm Redis sorted sets for the top 10,000 prefixes.

Real-Time Score Updates

When a user submits a query:

Increment QueryFrequency.count for the submitted query (via a write-behind queue to avoid hot-row contention).
For each prefix of the query (length 1 through query_length): ZINCRBY typeahead:{prefix} 1 {query} in Redis.
If a query rises above the top-K threshold, the next nightly rebuild will reflect it in the trie.

Debounce on the client side: do not send a typeahead request until the user has paused for 200ms. This reduces request volume by ~60% on fast typists without noticeable UX degradation.

Personalization Layer

Global scores are blended with user-specific history at serve time:

final_score = alpha * global_score + (1 - alpha) * personal_score

Where alpha = 0.7 for cold users (sparse personal history) and decreases toward 0.3 for power users with rich history. Personal scores are fetched from a user-level Redis hash at serve time and blended in-process before returning results.

Latency Budget

Target: <50ms p99 end-to-end:

Redis cache hit: ~1-2ms network + lookup
Trie lookup (cache miss): ~5-10ms in-process
Personalization blend: ~1ms
Network (client to edge): ~10-30ms (CDN edge nodes reduce this)

Deploy typeahead servers at CDN edge locations with regionally replicated Redis to keep network latency under 15ms for 95% of users.

SQL Schema

-- Global query frequency table
CREATE TABLE QueryFrequency (
    query_text    TEXT PRIMARY KEY,
    count         BIGINT NOT NULL DEFAULT 1,
    last_seen_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_queryfreq_count ON QueryFrequency(count DESC);

-- Per-user personalized query history
CREATE TABLE PersonalizedQuery (
    user_id       BIGINT NOT NULL,
    query_text    TEXT NOT NULL,
    count         INT NOT NULL DEFAULT 1,
    last_used_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (user_id, query_text)
);
CREATE INDEX idx_personalquery_user ON PersonalizedQuery(user_id, count DESC);

-- Materialized cache for serving (populated by offline job)
CREATE TABLE TypeaheadCache (
    prefix       VARCHAR(32) PRIMARY KEY,
    suggestions  JSONB NOT NULL,   -- [{query, score}, ...]
    cached_at    TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

Python Implementation

import redis
import math
from typing import List, Optional
from dataclasses import dataclass

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

ALPHA_COLD = 0.7
ALPHA_POWER = 0.3
DEBOUNCE_MS = 200
TOP_K = 10

@dataclass
class Suggestion:
    query: str
    score: float

def get_suggestions(prefix: str, user_id: Optional[int] = None, limit: int = 10) -> List[Suggestion]:
    """Return ranked suggestions for prefix, blending global and personal scores."""
    prefix = prefix.lower().strip()
    cache_key = f"typeahead:{prefix}"

    # Try Redis cache first
    raw = r.zrevrange(cache_key, 0, limit - 1, withscores=True)

    if not raw:
        # Fall back to trie lookup (in-process)
        raw = trie_lookup(prefix, limit)

    global_suggestions = [Suggestion(query=q, score=s) for q, s in raw]

    if user_id is None:
        return global_suggestions[:limit]

    # Blend with personalization
    personal_scores = get_personal_scores(user_id, [s.query for s in global_suggestions])
    personal_count = sum(1 for v in personal_scores.values() if v > 0)
    alpha = ALPHA_COLD if personal_count  dict:
    """Fetch per-user query scores from Redis hash."""
    if not queries:
        return {}
    user_key = f"personal:{user_id}"
    values = r.hmget(user_key, queries)
    return {q: float(v) if v else 0.0 for q, v in zip(queries, values)}

def update_frequency(query_text: str, user_id: Optional[int] = None) -> None:
    """Increment global and personal query frequency, update Redis prefix caches."""
    query_text = query_text.lower().strip()

    # Async DB increment (via queue in production)
    db.execute(
        "INSERT INTO QueryFrequency(query_text, count, last_seen_at)"
        " VALUES(%s, 1, NOW())"
        " ON CONFLICT (query_text) DO UPDATE"
        "   SET count = QueryFrequency.count + 1, last_seen_at = NOW()",
        (query_text,)
    )

    if user_id:
        db.execute(
            "INSERT INTO PersonalizedQuery(user_id, query_text, count, last_used_at)"
            " VALUES(%s, %s, 1, NOW())"
            " ON CONFLICT (user_id, query_text) DO UPDATE"
            "   SET count = PersonalizedQuery.count + 1, last_used_at = NOW()",
            (user_id, query_text)
        )
        r.hincrbyfloat(f"personal:{user_id}", query_text, 1.0)

    # Update Redis prefix caches for all prefixes of this query
    pipe = r.pipeline(transaction=False)
    for i in range(1, len(query_text) + 1):
        prefix = query_text[:i]
        pipe.zincrby(f"typeahead:{prefix}", 1.0, query_text)
    pipe.execute()

def rebuild_trie() -> None:
    """Nightly job: recompute scores and rebuild trie from QueryFrequency table."""
    import time
    now = time.time()
    LAMBDA = 0.01  # decay rate per day
    rows = db.execute(
        "SELECT query_text, count, last_seen_at FROM QueryFrequency ORDER BY count DESC"
    ).fetchall()

    trie = {}
    for query_text, count, last_seen in rows:
        age_days = (now - last_seen.timestamp()) / 86400
        recency = math.exp(-LAMBDA * age_days)
        score = count * recency
        # Insert into trie structure (simplified)
        for i in range(1, len(query_text) + 1):
            prefix = query_text[:i]
            if prefix not in trie:
                trie[prefix] = []
            trie[prefix].append((score, query_text))

    # Keep only top-K per prefix and sort
    for prefix in trie:
        trie[prefix].sort(reverse=True)
        trie[prefix] = trie[prefix][:TOP_K]

    # Pre-warm Redis
    pipe = r.pipeline(transaction=False)
    for prefix, candidates in trie.items():
        key = f"typeahead:{prefix}"
        pipe.delete(key)
        for score, query in candidates:
            pipe.zadd(key, {query: score})
    pipe.execute()

def trie_lookup(prefix: str, limit: int) -> list:
    """Fallback: query TypeaheadCache table if Redis miss."""
    row = db.execute(
        "SELECT suggestions FROM TypeaheadCache WHERE prefix=%s", (prefix,)
    ).fetchone()
    if not row:
        return []
    import json
    suggestions = json.loads(row[0])
    return [(s["query"], s["score"]) for s in suggestions[:limit]]

Key Design Decisions Summary

Top-K pre-computation at each trie node reduces lookup to O(prefix_length) instead of O(subtree_size).
Redis sorted set caching for hot prefixes keeps p99 under 5ms on the server side.
200ms client-side debounce eliminates the majority of mid-keystroke requests without UX impact.
Personalization blending uses alpha weighting to gracefully handle cold users with no history.
Nightly trie rebuild keeps global rankings fresh while real-time ZINCRBY handles trending queries intra-day.