The Core Requirement
Autocomplete must return the top-K relevant completions for a given prefix within 50ms p99 from the moment the user stops typing. This latency requirement is strict — any slower and the suggestions feel laggy and distracting rather than helpful. The architecture must prioritize read performance above all else.
Trie-Based Prefix Index
A trie (prefix tree) is the conceptual model for autocomplete. Each node represents one character. A path from root to a node represents a prefix. Each node stores the top-K completions for that prefix along with their scores:
root → "i" → "ip" → "iph" → "ipho" → "iphon" → "iphone"
top-K: ["iphone 13", "iphone 14 pro", ...]
To retrieve suggestions for prefix “iph”, look up the “iph” node and return its pre-computed top-K list. Lookup is O(length of prefix) — extremely fast. The tradeoff: every node must maintain its own top-K list, so updating a suggestion's score requires updating the top-K list at every node on the path from the root to the suggestion's final character.
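The per-node top-K bookkeeping can be sketched as a small in-memory structure. This is an illustrative sketch (class and method names are invented here); a production trie would use compressed nodes and bounded heaps:

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.top_k = []  # (score, suggestion) pairs, kept sorted descending


class AutocompleteTrie:
    """Minimal trie where every node caches its own top-K completions."""

    def __init__(self, k=5):
        self.root = TrieNode()
        self.k = k

    def insert(self, suggestion, score):
        # Walk the path for the full suggestion, updating top-K at each node.
        node = self.root
        for ch in suggestion:
            node = node.children.setdefault(ch, TrieNode())
            self._update_top_k(node, suggestion, score)

    def _update_top_k(self, node, suggestion, score):
        # Replace any existing entry for this suggestion, then keep the best K.
        node.top_k = [e for e in node.top_k if e[1] != suggestion]
        node.top_k.append((score, suggestion))
        node.top_k.sort(reverse=True)
        node.top_k = node.top_k[: self.k]

    def lookup(self, prefix):
        # O(len(prefix)) walk; return the pre-computed list at the prefix node.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return [s for _, s in node.top_k]
```

The expensive part is visible in `insert`: one top-K update per character of the suggestion, which is exactly the write amplification the Redis design below inherits.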
Redis Sorted Set Implementation
In production, tries are often implemented using Redis sorted sets rather than an in-process data structure, for persistence, replication, and horizontal scaling:
Key: prefix:{prefix_string}
Value: suggestion string
Score: popularity score (higher = better)
For suggestion “iphone 13” with score 95000, index all prefix keys:
ZADD prefix:i 95000 "iphone 13"
ZADD prefix:ip 95000 "iphone 13"
ZADD prefix:iph 95000 "iphone 13"
...
ZADD "prefix:iphone 13" 95000 "iphone 13"
Query: ZREVRANGE prefix:iph 0 4 returns the top-5 suggestions for “iph” in O(log N + K) time. (On Redis 6.2+, ZREVRANGE is deprecated in favor of ZRANGE prefix:iph 0 4 REV.)
Memory cost: each suggestion is stored in L prefix sets, where L = length of the suggestion. For a suggestion of average length 15 characters, this is 15 Redis entries per suggestion. Acceptable for a corpus of millions of suggestions.
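The indexing scheme reduces to a pure function that expands one suggestion into one ZADD entry per prefix. A minimal sketch (the function name is invented here; the redis-py pipelining shown in comments is an assumption about the client library, not part of the source):

```python
def prefix_index_entries(suggestion, score):
    """Expand a suggestion into one (key, score, member) entry per prefix."""
    return [
        (f"prefix:{suggestion[:i]}", score, suggestion)
        for i in range(1, len(suggestion) + 1)
    ]


# With redis-py, all entries for one suggestion would go in a single
# pipeline call (one network round trip):
#
#   pipe = r.pipeline()
#   for key, s, member in prefix_index_entries("iphone 13", 95000):
#       pipe.zadd(key, {member: s})
#   pipe.execute()
```

This makes the memory cost stated above concrete: a suggestion of length L produces exactly L sorted-set entries.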
Scoring Model
The score assigned to each suggestion determines its ranking. Components:
- Query frequency: how often users have searched for this exact query. Primary signal.
- Click-through rate: how often users clicked a result after searching. High CTR = high intent signal.
- Recency boost: trending queries receive a temporary score multiplier. Decays over time.
- Composite score:
score = base_freq * ctr_multiplier * recency_boost
Scores are recomputed periodically (hourly or daily) from query log aggregations and written back to Redis in bulk. For trending queries, an incremental update pipeline (Kafka → Flink → Redis) pushes score updates within minutes.
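The composite formula above can be sketched as follows. The half-life and boost parameters are illustrative choices for this sketch, not values from the source:

```python
import math


def composite_score(base_freq, ctr_multiplier, trend_boost, days_since_trend,
                    half_life_days=3.0):
    """score = base_freq * ctr_multiplier * recency_boost.

    The trending boost decays exponentially toward 1.0 (no boost) with a
    configurable half-life, so stale trends fall back to their base rank.
    """
    decay = 0.5 ** (days_since_trend / half_life_days)
    recency_boost = 1.0 + (trend_boost - 1.0) * decay
    return base_freq * ctr_multiplier * recency_boost
```

A freshly trending query gets the full boost; after a few half-lives the multiplier is effectively 1.0 and only frequency and CTR remain.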
Top-K Maintenance
When a suggestion's score changes, all its prefix keys in Redis must be updated. For a corpus of millions of suggestions updated hourly, this is a significant write load. Mitigations:
- Batch score updates: pipeline all ZADD commands for a suggestion's prefixes in a single Redis pipeline call.
- Only update when score changes significantly (e.g., >5% change) to reduce write volume.
- For each prefix key, cap the sorted set at top-K * 2 entries (ZREMRANGEBYRANK to trim). Prevents unbounded memory growth while keeping enough candidates for personalization re-ranking.
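The significant-change filter from the bullets above reduces to a small predicate (the threshold and function name are illustrative):

```python
def should_update(old_score, new_score, threshold=0.05):
    """Skip the write when the score moved less than `threshold` (5%)."""
    if old_score == 0:
        return new_score != 0
    return abs(new_score - old_score) / old_score > threshold


# Trimming idiom for the top-K*2 cap: after writes to a prefix key,
#   ZREMRANGEBYRANK key 0 -(2*K + 1)
# removes every member except the 2*K highest-scored ones.
```

Applied before pipelining, this predicate drops the bulk of hourly recomputation writes, since most suggestion scores barely move between runs.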
Personalization Layer
Global rankings work well for most users but miss personal context. A user who frequently searches for Python documentation should see Python-related completions ranked above unrelated popular results. Personalization blends two signals:
final_score = 0.7 * global_score + 0.3 * personal_score
The personal score comes from the user's own search history, stored in a per-user Redis sorted set of recent queries. At query time, fetch both the global top-K (from prefix:{prefix}) and the user's recent matching queries (e.g., ZSCAN user_history:{user_id} 0 MATCH {prefix}*), then re-rank the combined set using the blending formula before returning the top-K.
Personal score decays with time — queries from last week score higher than queries from last year. Use a time-weighted score: personal_score = base * e^(-λ * days_ago).
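The decay and blending formulas can be combined in a short re-ranking sketch. Function names are invented, the weights follow the 0.7/0.3 split above, and global and personal scores are assumed to be pre-scaled to comparable ranges (in practice they need normalization before blending):

```python
import math


def personal_score(base, days_ago, lam=0.1):
    """Time-decayed personal score: base * e^(-lambda * days_ago)."""
    return base * math.exp(-lam * days_ago)


def blended_rank(global_scores, personal_scores, alpha=0.7, k=5):
    """Re-rank the union of global and personal candidates by
    alpha * global + (1 - alpha) * personal."""
    candidates = set(global_scores) | set(personal_scores)

    def final(q):
        g = global_scores.get(q, 0.0)
        p = personal_scores.get(q, 0.0)
        return alpha * g + (1 - alpha) * p

    return sorted(candidates, key=final, reverse=True)[:k]
```

Note that a purely personal candidate can still win if its decayed score is large enough, which is exactly the intended behavior for a heavy repeat searcher.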
Latency Architecture
To hit <50ms p99:
- L1: In-process LRU cache. Each autocomplete service instance maintains an in-memory LRU cache of hot prefixes (top 10K by query volume). A cache hit returns results with ~1ms latency. Hit rate for popular prefixes (2–3 characters) is very high.
- L2: Redis. For L1 misses, query Redis. Redis latency is 1–3ms on a local network. This handles the long tail of prefixes not in L1 cache.
- Client-side debouncing. The client fires the autocomplete request only after 150ms of no typing, reducing request volume by 3–5x and filtering out mid-keystroke intermediate states.
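The L1 layer can be sketched as a plain LRU mapping prefix → suggestion list. This is a minimal OrderedDict-based version (class name invented; a production cache would add TTLs and thread safety):

```python
from collections import OrderedDict


class PrefixLRUCache:
    """In-process L1 cache of hot prefixes; evicts the least-recently-used."""

    def __init__(self, capacity=10_000):
        self.capacity = capacity
        self._cache = OrderedDict()

    def get(self, prefix):
        if prefix not in self._cache:
            return None  # miss: caller falls back to Redis (L2)
        self._cache.move_to_end(prefix)  # mark as recently used
        return self._cache[prefix]

    def put(self, prefix, suggestions):
        self._cache[prefix] = suggestions
        self._cache.move_to_end(prefix)
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict the LRU entry
```

Because 2–3 character prefixes dominate traffic, even a modest capacity keeps the hit rate high and the median response near in-memory dictionary latency.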
Prefix Tree Sharding
A single Redis instance cannot hold all prefix keys for a large corpus. Shard by the first 2 characters of the prefix: all keys starting with “ip” live on shard 1, “iq” on shard 2, etc. For lowercase a–z alone this gives ~676 natural shards (26 * 26), with digits and other leading characters adding more buckets; these can be distributed across a Redis cluster. Query routing is deterministic — a request for prefix “iphone” always goes to the shard for “ip”.
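The routing rule reduces to a deterministic bucket function. In this sketch the MD5-based bucketing and the padding of one-character prefixes are assumptions, not details from the source:

```python
import hashlib


def shard_for_prefix(prefix, num_shards=676):
    """Route a prefix to a shard by its first two characters.

    Every prefix sharing the same two-character bucket lands on the same
    shard, so all lookups for one suggestion family hit one node.
    """
    bucket = prefix[:2].ljust(2)  # pad one-character prefixes
    # Stable hash (unlike Python's built-in hash(), which is seeded per run).
    digest = hashlib.md5(bucket.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards
```

Determinism matters here: the same prefix must route to the same shard from every service instance, which rules out process-seeded hash functions.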
Handling Special Characters and Normalization
User input must be normalized before prefix lookup:
- Lowercase all input: “IPhone” → “iphone”
- Strip leading/trailing whitespace
- Normalize accented characters: “café” → “cafe” (or maintain both forms)
- Remove control characters and special punctuation
Normalization must be applied identically at index time (when building prefix keys) and query time (when looking up prefixes), or queries will miss indexed suggestions.
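A normalization function shared by the indexer and the query path might look like the sketch below. Unicode NFKD decomposition is one common way to strip accents; handling of punctuation is simplified here relative to a production normalizer:

```python
import unicodedata


def normalize_query(raw):
    """Normalize input identically at index time and query time."""
    text = raw.strip().lower()
    # Decompose accented characters, then drop combining marks: café -> cafe
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Drop control characters (Unicode category Cc)
    return "".join(c for c in stripped if unicodedata.category(c) != "Cc")
```

Sharing this single function between the indexing pipeline and the query service is the simplest way to guarantee the "identical at both ends" requirement.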
Spell Correction Integration
When a prefix returns zero results (e.g., user typed “iphne”), fall back to a spell correction layer. Generate candidate corrections (edit distance 1–2) and look up each in the prefix index. Return the suggestions from the highest-scoring correction. Limit spell correction to prefixes above a minimum length (e.g., >4 characters) to avoid false corrections on short prefixes.
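The fallback can be sketched with Norvig-style edit-distance-1 candidate generation. Function names and the shape of the lookup callback (returning (suggestion, score) pairs) are illustrative assumptions:

```python
def edit1_candidates(prefix, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """All strings within edit distance 1 of the prefix."""
    splits = [(prefix[:i], prefix[i:]) for i in range(len(prefix) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in alphabet]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def correct_prefix(prefix, lookup, min_length=5):
    """Try edit-distance-1 corrections when the prefix itself has no results.

    `lookup` queries the prefix index and returns (suggestion, score) pairs;
    the correction whose top suggestion scores highest wins.
    """
    if len(prefix) < min_length:
        return []  # too short: corrections on short prefixes are unreliable
    best = []
    for cand in edit1_candidates(prefix):
        results = lookup(cand)
        if results and (not best or results[0][1] > best[0][1]):
            best = results
    return best
```

Edit distance 2 can be layered on by expanding each edit-1 candidate again, at a significant cost in lookup fan-out, which is why it is usually reserved for longer prefixes.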
Summary
- Redis sorted sets keyed by prefix string; score = composite popularity metric.
- Each suggestion indexed at all prefix lengths (O(L) keys per suggestion).
- Scoring: query frequency * CTR multiplier * recency decay.
- Personalization: blend 70% global + 30% user history score, re-rank at query time.
- L1 in-process LRU cache for hot prefixes; L2 Redis for long tail.
- Client-side 150ms debounce to reduce request volume.
- Shard by first 2 characters of prefix across Redis cluster.
- Normalize input identically at index and query time; spell correct on zero-result prefixes.