Typeahead Autocomplete System: Overview
A typeahead (autocomplete) system returns ranked query completions as the user types, with latency under 50ms at p99. Low-level design covers the trie data structure, prefix scoring, Redis sorted set caching, personalization, and the offline/online pipeline that keeps suggestions fresh.
Trie Data Structure
A trie (prefix tree) is the canonical data structure for prefix matching. Each node represents one character. Traversing from root along the characters of a prefix reaches the node representing that prefix, from which all completions are descendants.
Naive trie traversal requires visiting all descendant nodes to collect completions. The optimization: each node stores a pre-computed list of the top-K completions (by score) for its prefix. This makes lookup O(prefix_length) — traverse to the prefix node and return the cached list.
Memory tradeoff: storing top-K at every node multiplies storage by K. Typical values: K=10, which is acceptable for tries with millions of nodes if compressed (Patricia trie / radix tree) and kept in memory.
Scoring Function
Each completion candidate has a composite score:
score = query_frequency * recency_weight * personalization_boost
- query_frequency: raw count of times this query was submitted globally, normalized to [0, 1].
- recency_weight: decay factor — queries submitted more recently score higher. Use exponential decay:
recency = e^(-lambda * days_since_last_seen). - personalization_boost: multiplier based on user history. If the user has submitted or clicked this query before, boost by 1.2-2.0x. Cold users get boost = 1.0 (global ranking).
Redis Sorted Set for Prefix Cache
For ultra-low latency, common prefixes (2-5 characters covering 80% of traffic) are pre-cached in Redis sorted sets:
- Key:
typeahead:{prefix} - Member: query string (e.g., “python tutorial”)
- Score: composite score (higher = better)
Fetch top-K: ZREVRANGE typeahead:{prefix} 0 {K-1} WITHSCORES
Cache is rebuilt nightly from the trie. Real-time score increments update both the trie and the Redis cache: ZINCRBY typeahead:{prefix} {delta} {query}.
Trie Rebuild Pipeline
An offline batch job runs nightly:
- Read QueryFrequency table — all queries with counts and last_seen timestamps.
- Compute composite scores for all candidates.
- Build the in-memory trie, populating top-K at each node via a bottom-up sweep.
- Serialize the trie to a binary format and upload to object storage.
- Serving nodes download the new trie at startup or via hot-reload signal.
- Pre-warm Redis sorted sets for the top 10,000 prefixes.
Real-Time Score Updates
When a user submits a query:
- Increment
QueryFrequency.countfor the submitted query (via a write-behind queue to avoid hot-row contention). - For each prefix of the query (length 1 through query_length):
ZINCRBY typeahead:{prefix} 1 {query}in Redis. - If a query rises above the top-K threshold, the next nightly rebuild will reflect it in the trie.
Debounce on the client side: do not send a typeahead request until the user has paused for 200ms. This reduces request volume by ~60% on fast typists without noticeable UX degradation.
Personalization Layer
Global scores are blended with user-specific history at serve time:
final_score = alpha * global_score + (1 - alpha) * personal_score
Where alpha = 0.7 for cold users (sparse personal history) and decreases toward 0.3 for power users with rich history. Personal scores are fetched from a user-level Redis hash at serve time and blended in-process before returning results.
Latency Budget
Target: <50ms p99 end-to-end:
- Redis cache hit: ~1-2ms network + lookup
- Trie lookup (cache miss): ~5-10ms in-process
- Personalization blend: ~1ms
- Network (client to edge): ~10-30ms (CDN edge nodes reduce this)
Deploy typeahead servers at CDN edge locations with regionally replicated Redis to keep network latency under 15ms for 95% of users.
SQL Schema
-- Global query frequency table
CREATE TABLE QueryFrequency (
query_text TEXT PRIMARY KEY,
count BIGINT NOT NULL DEFAULT 1,
last_seen_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_queryfreq_count ON QueryFrequency(count DESC);
-- Per-user personalized query history
CREATE TABLE PersonalizedQuery (
user_id BIGINT NOT NULL,
query_text TEXT NOT NULL,
count INT NOT NULL DEFAULT 1,
last_used_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
PRIMARY KEY (user_id, query_text)
);
CREATE INDEX idx_personalquery_user ON PersonalizedQuery(user_id, count DESC);
-- Materialized cache for serving (populated by offline job)
CREATE TABLE TypeaheadCache (
prefix VARCHAR(32) PRIMARY KEY,
suggestions JSONB NOT NULL, -- [{query, score}, ...]
cached_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
Python Implementation
import redis
import math
from typing import List, Optional
from dataclasses import dataclass
r = redis.Redis(host='localhost', port=6379, decode_responses=True)
ALPHA_COLD = 0.7
ALPHA_POWER = 0.3
DEBOUNCE_MS = 200
TOP_K = 10
@dataclass
class Suggestion:
query: str
score: float
def get_suggestions(prefix: str, user_id: Optional[int] = None, limit: int = 10) -> List[Suggestion]:
"""Return ranked suggestions for prefix, blending global and personal scores."""
prefix = prefix.lower().strip()
cache_key = f"typeahead:{prefix}"
# Try Redis cache first
raw = r.zrevrange(cache_key, 0, limit - 1, withscores=True)
if not raw:
# Fall back to trie lookup (in-process)
raw = trie_lookup(prefix, limit)
global_suggestions = [Suggestion(query=q, score=s) for q, s in raw]
if user_id is None:
return global_suggestions[:limit]
# Blend with personalization
personal_scores = get_personal_scores(user_id, [s.query for s in global_suggestions])
personal_count = sum(1 for v in personal_scores.values() if v > 0)
alpha = ALPHA_COLD if personal_count dict:
"""Fetch per-user query scores from Redis hash."""
if not queries:
return {}
user_key = f"personal:{user_id}"
values = r.hmget(user_key, queries)
return {q: float(v) if v else 0.0 for q, v in zip(queries, values)}
def update_frequency(query_text: str, user_id: Optional[int] = None) -> None:
"""Increment global and personal query frequency, update Redis prefix caches."""
query_text = query_text.lower().strip()
# Async DB increment (via queue in production)
db.execute(
"INSERT INTO QueryFrequency(query_text, count, last_seen_at)"
" VALUES(%s, 1, NOW())"
" ON CONFLICT (query_text) DO UPDATE"
" SET count = QueryFrequency.count + 1, last_seen_at = NOW()",
(query_text,)
)
if user_id:
db.execute(
"INSERT INTO PersonalizedQuery(user_id, query_text, count, last_used_at)"
" VALUES(%s, %s, 1, NOW())"
" ON CONFLICT (user_id, query_text) DO UPDATE"
" SET count = PersonalizedQuery.count + 1, last_used_at = NOW()",
(user_id, query_text)
)
r.hincrbyfloat(f"personal:{user_id}", query_text, 1.0)
# Update Redis prefix caches for all prefixes of this query
pipe = r.pipeline(transaction=False)
for i in range(1, len(query_text) + 1):
prefix = query_text[:i]
pipe.zincrby(f"typeahead:{prefix}", 1.0, query_text)
pipe.execute()
def rebuild_trie() -> None:
"""Nightly job: recompute scores and rebuild trie from QueryFrequency table."""
import time
now = time.time()
LAMBDA = 0.01 # decay rate per day
rows = db.execute(
"SELECT query_text, count, last_seen_at FROM QueryFrequency ORDER BY count DESC"
).fetchall()
trie = {}
for query_text, count, last_seen in rows:
age_days = (now - last_seen.timestamp()) / 86400
recency = math.exp(-LAMBDA * age_days)
score = count * recency
# Insert into trie structure (simplified)
for i in range(1, len(query_text) + 1):
prefix = query_text[:i]
if prefix not in trie:
trie[prefix] = []
trie[prefix].append((score, query_text))
# Keep only top-K per prefix and sort
for prefix in trie:
trie[prefix].sort(reverse=True)
trie[prefix] = trie[prefix][:TOP_K]
# Pre-warm Redis
pipe = r.pipeline(transaction=False)
for prefix, candidates in trie.items():
key = f"typeahead:{prefix}"
pipe.delete(key)
for score, query in candidates:
pipe.zadd(key, {query: score})
pipe.execute()
def trie_lookup(prefix: str, limit: int) -> list:
"""Fallback: query TypeaheadCache table if Redis miss."""
row = db.execute(
"SELECT suggestions FROM TypeaheadCache WHERE prefix=%s", (prefix,)
).fetchone()
if not row:
return []
import json
suggestions = json.loads(row[0])
return [(s["query"], s["score"]) for s in suggestions[:limit]]
Key Design Decisions Summary
- Top-K pre-computation at each trie node reduces lookup to O(prefix_length) instead of O(subtree_size).
- Redis sorted set caching for hot prefixes keeps p99 under 5ms on the server side.
- 200ms client-side debounce eliminates the majority of mid-keystroke requests without UX impact.
- Personalization blending uses alpha weighting to gracefully handle cold users with no history.
- Nightly trie rebuild keeps global rankings fresh while real-time ZINCRBY handles trending queries intra-day.
See also: Meta Interview Guide 2026: Facebook, Instagram, WhatsApp Engineering