Typeahead Service Low-Level Design: Prefix Matching, Debounce, and Distributed Index

Typeahead Service System Design Overview

A typeahead service extends basic search suggest by adding session context awareness, distributed prefix indexing across multiple data domains, and intelligent debounce batching to reduce server load. Where a simple suggest system returns global completions, a typeahead service learns from the current session and nearby entity context to surface the most relevant completions first.

Requirements

Functional Requirements

  • Return ranked completions for user, hashtag, location, and query entity types.
  • Boost results from entities the user has recently interacted with in the current session.
  • Batch client-side keystrokes with debounce to reduce unnecessary server round trips.
  • Support wildcard prefix matching for partial-word queries.
  • Allow product teams to register new entity type indexes without service redeployment.

Non-Functional Requirements

  • P99 latency under 80ms from server receipt to response.
  • Support 100,000 concurrent active typeahead sessions.
  • Index update propagation under 30 seconds for new entities.
  • Horizontal scalability with no shared mutable state between service instances.

Data Model

  • prefix_index: Redis sorted set per (entity_type, locale, prefix) storing top-100 entity_ids by base_score.
  • entity_metadata: Redis hash per entity_id: display_name, entity_type, avatar_url, follower_count, base_score.
  • session_context: Redis hash per session_id: recently viewed entity_ids with interaction timestamps, TTL 30 minutes.
  • index_registry: Postgres table (entity_type, shard_key, index_version, last_rebuilt_at) for operational visibility.

Core Algorithms

Distributed Prefix Index

Each entity type maintains its own prefix index sharded by the first character of the prefix. This spreads load across 26 shards per entity type and keeps each sorted set small enough for sub-millisecond ZREVRANGE lookups. Index writes are funneled through a Kafka topic per entity type. A pool of index writer workers consumes events, computes all prefixes for the entity name (all substrings from position 0 up to the full name, capped at 20 characters), and issues ZADD commands to the appropriate Redis shard. Writes are idempotent: re-indexing an entity with an updated score simply overwrites the existing ZADD entry.

Debounce Batching

The client SDK implements a 150ms debounce window. Keystrokes within 150ms of each other are collapsed into a single request carrying the latest prefix. The server also performs server-side coalescing: if two requests from the same session_id arrive within 50ms with overlapping prefixes, the earlier request is dropped and only the longer prefix is processed. This is implemented via a Redis key (session_id:inflight) set to the current prefix with a 50ms TTL; if the key already exists with the same or shorter prefix when a new request arrives, the old response future is cancelled.

Session-Context Boosting

Before returning results, the service reads the sessions recently interacted entity_ids from Redis. For each candidate entity in the result set, an affinity boost of up to 0.5 is added to the base_score if the entity appears in session context, scaled by recency (linear decay over 30 minutes). This ensures that if a user has just visited a profile, that profile appears at the top of typeahead when they begin typing its name again.

Cache Warming

Cold starts after index rebuilds cause latency spikes on the first request for each prefix. A warming job runs immediately after each index rebuild and issues synthetic requests for the top 5,000 prefixes per entity type, populating the local in-process LRU cache (512MB per service instance) with freshly fetched result sets. The LRU cache uses a 10-second TTL to stay consistent with index updates while absorbing repeated identical prefix lookups within the debounce window.

Scalability

Service instances are stateless; session context is stored exclusively in Redis. Horizontal scaling is triggered by CPU utilization above 70%, with a minimum of 10 instances and a maximum of 200. The Redis prefix index cluster uses 64 shards with replication factor 2. A circuit breaker per entity type trips after 5 consecutive timeouts, returning empty results for that type while continuing to serve other entity types, degrading gracefully rather than failing entirely.

API Design

GET /v1/typeahead

  • Query params: q (prefix), types (comma-separated entity types, default all), session_id, locale, limit (default 8 per type)
  • Response: grouped result object: {users: […], hashtags: […], locations: […], queries: […]}

POST /v1/typeahead/context

  • Body: session_id, entity_id, entity_type, interaction_type (view, click, select)
  • Response: 204 No Content
  • Called by client on every entity interaction to update session context.

See also: Netflix Interview Guide 2026: Streaming Architecture, Recommendation Systems, and Engineering Excellence

See also: Scale AI Interview Guide 2026: Data Infrastructure, RLHF Pipelines, and ML Engineering

See also: LinkedIn Interview Guide 2026: Social Graph Engineering, Feed Ranking, and Professional Network Scale

Scroll to Top