Question 1

Why is a database LIKE query insufficient for real-time autocomplete?

Accepted Answer

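Consider the query SELECT * FROM Product WHERE name LIKE 'app%' ORDER BY popularity DESC LIMIT 8. A B-tree index on name can serve the prefix match as a range scan, but the query has two problems at scale: (1) Latency: on a 10M-row table, a short prefix such as 'a%' can match hundreds of thousands or millions of rows, and because the results are ordered by popularity rather than by name, the database must read and sort every match before it can apply LIMIT. At 1,000 autocomplete requests/second, the DB is overwhelmed. (2) Flexibility: an infix match such as LIKE '%pple%' (matching "apple" when the user types "pple") has a leading wildcard, so the B-tree index cannot be used and the query becomes a full table scan. Redis sorted sets keyed by prefix address the latency problem: reading the top entries of a sorted set is O(log N + M), and because each prefix set is pre-limited to the top 20 by score, N and M stay small regardless of dataset size, so responses are typically sub-millisecond. Infix matches can be covered in the same structure by also indexing the prefixes of each word in a suggestion, at the cost of more keys. The database is only queried to enrich the top results with current metadata.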
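The capped prefix sets described above can be sketched in plain Python. This is an in-memory stand-in, not a Redis client: each dict plays the role of one sorted set, and the comments name the Redis command each step would correspond to. The function names, the `suggest:{prefix}` key pattern in the comments, and the prefix length bounds (2 to 12) are assumptions for illustration.

```python
from collections import defaultdict

CAP = 20         # entries kept per prefix set
MIN_PREFIX = 2   # assumed shortest indexed prefix
MAX_PREFIX = 12  # assumed longest indexed prefix


def add_suggestion(index, text, score, cap=CAP):
    """Index `text` under each of its prefixes, keeping the top `cap` per prefix."""
    member = text.lower()
    last = min(len(member), MAX_PREFIX)
    for end in range(MIN_PREFIX, last + 1):
        bucket = index[member[:end]]
        bucket[member] = score                  # ZADD suggest:{prefix} score member
        if len(bucket) > cap:                   # ZREMRANGEBYRANK suggest:{prefix} 0 -(cap+1)
            del bucket[min(bucket, key=bucket.get)]


def suggest(index, prefix, limit=8):
    """Return up to `limit` members for `prefix`, highest score first."""
    bucket = index.get(prefix.lower(), {})
    ranked = sorted(bucket.items(), key=lambda kv: (-kv[1], kv[0]))
    return [member for member, _ in ranked[:limit]]  # ZREVRANGE suggest:{prefix} 0 limit-1


index = defaultdict(dict)
add_suggestion(index, "Apple", 90)
add_suggestion(index, "Application", 50)
add_suggestion(index, "Apricot", 70)
print(suggest(index, "ap"))
```

The lookup never touches more than one small bucket, which is the property that keeps per-keystroke latency flat as the catalog grows.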

Question 2

How do you keep autocomplete suggestions fresh as new content is created?

Accepted Answer

Three update paths: (1) Real-time: when a new product, article, or user is created, immediately index it into the Redis sorted sets for its prefixes. The write runs in the background, takes ~5ms, and does not block the creation flow. Set the initial score to a default (e.g., new product score = median popularity). (2) Batch score updates: nightly, query SearchEvent for the most popular queries and update their Redis scores. High-frequency queries float to the top of each prefix set; stale queries sink and are eventually evicted by ZREMRANGEBYRANK once the set exceeds its cap (top 20). (3) Deletion: when content is removed, remove its entry from every prefix set it was indexed under. This costs O(max_prefix_length) Redis writes, which is acceptable for occasional deletes.
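As a rough sketch of these update paths, the functions below only plan the Redis commands each path would issue and return them as tuples, so the example runs without a server. ZADD, ZREM, and ZREMRANGEBYRANK are real Redis commands; the `suggest:global:{prefix}` key pattern, the function names, and the prefix bounds are hypothetical.

```python
CAP = 20
MIN_PREFIX, MAX_PREFIX = 2, 12  # assumed prefix length bounds


def prefix_keys(text):
    """One key per indexed prefix of `text` (key pattern is an assumption)."""
    member = text.lower()
    last = min(len(member), MAX_PREFIX)
    return [f"suggest:global:{member[:end]}" for end in range(MIN_PREFIX, last + 1)]


def plan_index(text, score):
    """Real-time path: add (or rescore) the member, then trim each set to the cap."""
    commands = []
    for key in prefix_keys(text):
        commands.append(("ZADD", key, score, text.lower()))
        commands.append(("ZREMRANGEBYRANK", key, 0, -(CAP + 1)))
    return commands


def plan_batch_rescore(query_counts):
    """Nightly path: rescore every query from aggregated search counts."""
    commands = []
    for query, count in query_counts.items():
        commands.extend(plan_index(query, float(count)))
    return commands


def plan_delete(text):
    """Deletion path: one ZREM per prefix key the member was indexed under."""
    return [("ZREM", key, text.lower()) for key in prefix_keys(text)]


for command in plan_delete("Apple"):
    print(command)
```

In a real deployment the planned commands would typically be sent in a single pipeline so that one indexed item costs one network round trip.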

Question 3

How do you add fuzzy matching to catch typos like "iphone" typed as "iphoen"?

Accepted Answer

The Redis sorted set approach only handles exact prefixes: "iphoen" will not match "iphone" entries. Two strategies: (1) Client-side deferral: if there is still no exact match after 300ms, the client falls back to Elasticsearch's completion suggester with fuzziness: 1 (an edit distance of one; by default a transposition of two adjacent characters, as in "iphoen", counts as a single edit). ES queries typically take ~50-200ms. This two-tier approach keeps common queries fast (Redis) while covering typos (ES). (2) Phonetic indexing: compute a phonetic code (Metaphone, Soundex) for each word and index by phonetic prefix in addition to literal prefix. "iphoen" and "iphone" share the same phonetic code and match the same prefix entries. This is complex to implement; use ES fuzzy matching as the pragmatic solution.
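A minimal sketch of the two-tier fallback follows. The request body uses the shape of Elasticsearch's completion suggester with its `fuzzy` option; the field name `suggest`, the suggestion name `fuzzy_suggestions`, and both lookup callables are assumptions standing in for real Redis and ES clients. The 300ms deferral is a client-side timing concern and is not modeled here.

```python
def fuzzy_completion_body(prefix, field="suggest", size=8):
    """Build a completion-suggester request allowing one edit."""
    return {
        "suggest": {
            "fuzzy_suggestions": {          # hypothetical suggestion name
                "prefix": prefix,
                "completion": {
                    "field": field,         # assumed completion-typed field
                    "size": size,
                    "fuzzy": {"fuzziness": 1},
                },
            }
        }
    }


def tiered_suggest(prefix, exact_lookup, fuzzy_lookup):
    """Try the fast exact-prefix tier first; use the fuzzy tier only on a miss."""
    exact = exact_lookup(prefix)
    if exact:
        return exact
    return fuzzy_lookup(fuzzy_completion_body(prefix))


# Stub tiers so the sketch runs without Redis or Elasticsearch.
def fake_redis(prefix):
    return ["iphone 15", "iphone case"] if "iphone".startswith(prefix) else []


def fake_es(body):
    typed = body["suggest"]["fuzzy_suggestions"]["prefix"]
    return ["iphone 15"] if typed == "iphoen" else []


print(tiered_suggest("iph", fake_redis, fake_es))
print(tiered_suggest("iphoen", fake_redis, fake_es))
```

Because the fuzzy tier is only reached on a miss, the slower ES round trip is paid by the minority of keystrokes that contain a typo.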

Question 4

How do you personalize suggestions based on a user's search history?

Accepted Answer

Global suggestions use popularity scores. Personalized suggestions blend global with personal: at query time, fetch both global suggestions (from Redis) and user-specific suggestions (from a user-scoped sorted set: suggest:{user_id}:{prefix}). Merge the two lists with a blended score: final_score = 0.7 * global_score + 0.3 * user_history_boost. Surface user-boosted entries first, then global entries not already shown. The user-specific sorted sets are populated by recording each completed search: when a user submits the query "running shoes", increment their personal score for every prefix of that query. Personal sets need a shorter TTL (7 days) than the global sets, since user interests evolve. Limit each personal set to the top 50 entries per prefix. This adds one Redis lookup per keystroke, typically about 0.5ms, which is acceptable.
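The merge step can be sketched as a small pure function. It assumes both inputs are already fetched for the current prefix and normalized to a comparable range (for example 0 to 1); that normalization, the function name, and the sample data are assumptions, while the 0.7 / 0.3 weights come from the answer above.

```python
def blend(global_scores, user_boosts, limit=8, w_global=0.7, w_user=0.3):
    """Rank user-boosted entries first, then remaining global entries."""

    def final_score(member):
        return (w_global * global_scores.get(member, 0.0)
                + w_user * user_boosts.get(member, 0.0))

    # Entries the user has searched before, ordered by blended score.
    boosted = sorted(user_boosts, key=lambda m: (-final_score(m), m))
    # Global entries not already shown, ordered by global popularity.
    rest = sorted(
        (m for m in global_scores if m not in user_boosts),
        key=lambda m: (-global_scores[m], m),
    )
    return (boosted + rest)[:limit]


global_scores = {"running shoes": 0.9, "running shorts": 0.6, "run club": 0.4}
user_boosts = {"run club": 1.0}
print(blend(global_scores, user_boosts))
```

Keeping the blend on the application side means the weights can be tuned or A/B tested without rewriting any stored scores.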

Question 5

How much memory does a Redis autocomplete index require?

Accepted Answer

Estimate: assume 1 million unique suggestions, averaging 20 characters each, with about 10 indexed prefix lengths per suggestion (from a minimum of 2 chars up to ~12 chars). Each sorted set entry: member string (avg 20 bytes) + score (8 bytes) + Redis overhead (~70 bytes) = 98 bytes, or roughly 100 bytes. 1M suggestions × 10 prefixes × 100 bytes = 1GB of Redis memory. In practice, capping each prefix set at 20 entries (ZREMRANGEBYRANK) bounds memory: a short prefix that matches many thousands of suggestions still stores only its top 20, so low-ranked suggestions are dropped from most of the prefix sets they would otherwise occupy. Real-world systems at this scale use ~200-500MB for the suggestion index. A dedicated Redis instance with 1-2GB RAM is sufficient. Reduce memory further by lowercasing and deduplicating the member strings before storing.
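The arithmetic above can be checked with a short calculation. The per-entry figures come from the estimate; the count of distinct prefixes used for the capped bound is a purely illustrative assumption, since it depends on the actual vocabulary.

```python
def entry_bytes(member_bytes=20, score_bytes=8, overhead_bytes=70):
    """Approximate size of one sorted set entry."""
    return member_bytes + score_bytes + overhead_bytes


def uncapped_bytes(suggestions, prefixes_per_suggestion):
    """Every suggestion stored under every one of its prefixes."""
    return suggestions * prefixes_per_suggestion * entry_bytes()


def capped_upper_bound_bytes(distinct_prefixes, cap=20):
    """With a cap, no prefix set can hold more than `cap` entries."""
    return distinct_prefixes * cap * entry_bytes()


uncapped = uncapped_bytes(1_000_000, 10)
# 200_000 distinct prefixes is a hypothetical figure, not a measurement.
capped = capped_upper_bound_bytes(200_000)
print(f"uncapped: {uncapped / 1e9:.2f} GB")
print(f"capped upper bound: {capped / 1e6:.0f} MB")
```

The capped figure scales with the number of distinct prefixes rather than with the number of suggestions, which is why the cap matters more as the catalog grows.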

Search Suggestion (Autocomplete) Low-Level Design: Trie, Redis, and Elasticsearch

Core Data Model (Prefix Table Approach)

Redis Sorted Set Approach (Production Standard)

Elasticsearch Completion Suggester

Updating Scores from Search Analytics

Key Interview Points