Problem Statement and Requirements
A pub/sub service enables one-to-many message delivery: publishers send messages to topics without knowing who will receive them, and subscribers receive messages matching their subscriptions. Unlike a message broker focused on point-to-point or consumer-group delivery, pub/sub emphasizes fanout — a single published message may be delivered to thousands of independent subscribers. The key design challenges are efficient fanout, attribute-based filtering to avoid unnecessary delivery, and reliable dead letter handling for messages that cannot be processed.
Functional Requirements
- Publishers send messages to named topics; messages include a body and key-value attributes
- Subscribers create subscriptions on topics; subscriptions support push (HTTP webhook) and pull delivery modes
- Attribute-based filtering: subscriptions specify filter expressions; only matching messages are delivered
- At-least-once delivery with configurable retry policy (max attempts, backoff schedule)
- Dead letter topics: messages that exceed retry attempts are forwarded to a configured dead letter subscription
- Message ordering: optional per-publisher ordering key guarantees ordered delivery within a key
Non-Functional Requirements
- Fanout: a single publish to a topic with 10,000 subscriptions must complete delivery initiation within 1 second
- End-to-end latency (publish to push delivery): p99 < 500 ms under normal conditions
- Message retention: undelivered messages retained for up to 7 days
- Throughput: 1 million publishes/sec globally with horizontal scaling
Core Data Model
Topics and Subscriptions
topics (
topic_id TEXT PRIMARY KEY,
project_id TEXT NOT NULL,
message_retention_days INTEGER DEFAULT 7,
schema_id TEXT, -- optional schema enforcement
created_at TIMESTAMPTZ NOT NULL
)
subscriptions (
subscription_id TEXT PRIMARY KEY,
topic_id TEXT NOT NULL REFERENCES topics(topic_id),
delivery_type ENUM('push','pull'),
push_endpoint TEXT, -- HTTPS URL for push delivery
ack_deadline_sec INTEGER DEFAULT 10,
filter_expr TEXT, -- attribute filter expression
ordering_key BOOLEAN DEFAULT FALSE,
dead_letter_topic TEXT,
max_delivery_attempts INTEGER DEFAULT 5,
retry_policy JSONB, -- {min_backoff, max_backoff}
created_at TIMESTAMPTZ NOT NULL
)
Message Storage
messages (
message_id TEXT PRIMARY KEY, -- globally unique ID assigned by broker
topic_id TEXT NOT NULL,
data BYTEA NOT NULL, -- base64-encoded body
attributes JSONB NOT NULL DEFAULT '{}',
ordering_key TEXT,
publish_time TIMESTAMPTZ NOT NULL,
expires_at TIMESTAMPTZ NOT NULL
)
delivery_attempts (
attempt_id BIGSERIAL PRIMARY KEY,
message_id TEXT NOT NULL,
subscription_id TEXT NOT NULL,
attempt_number INTEGER NOT NULL,
status ENUM('pending','delivering','acked','nacked','dead_lettered'),
next_delivery_at TIMESTAMPTZ NOT NULL,
ack_id TEXT UNIQUE, -- opaque token returned to subscriber for ack
delivered_at TIMESTAMPTZ,
acked_at TIMESTAMPTZ
)
Key Algorithms and Logic
Fanout on Publish
When a message is published, the broker must create a delivery record for every matching subscription. For topics with thousands of subscriptions, synchronous fanout in the publish path would be too slow. The broker uses a two-stage approach: the publish API writes the message to durable storage and enqueues a fanout task, returning a message ID to the publisher immediately (acknowledgment of receipt, not delivery). A fanout worker reads the topic's subscription list, evaluates filter expressions, and writes delivery_attempts rows in batches. For topics with >100 subscriptions, fanout workers run in parallel across a worker pool.
Attribute Filter Evaluation
Filter expressions match against message attributes using a restricted expression language: exact match (attributes.env = "prod"), existence (hasAttribute("priority")), and prefix match (attributes.topic_prefix: "payments/"). Filters are parsed at subscription creation time into an AST and stored. At fanout time, the compiled filter is evaluated in-memory against the message's attributes map. Invalid filters are rejected at subscription creation with a 400 error. Filters that match all messages (empty filter) are represented as a null AST for fast-path evaluation.
Push Delivery with Retry and Backoff
For push subscriptions, a delivery worker picks up pending delivery_attempts where next_delivery_at <= now() and issues an HTTPS POST to the configured endpoint with the message payload. A 2xx response triggers an ack (update status to acked). Any non-2xx or timeout triggers a nack: attempt_number increments and next_delivery_at is set to now() + min(max_backoff, min_backoff * 2^attempt_number) (exponential backoff with jitter). After max_delivery_attempts, the message is written to the dead letter topic (as a new publish) and the delivery_attempt is marked dead_lettered.
Pull Delivery and Ack Deadline
Pull subscribers call a Pull API to receive a batch of messages. Each message comes with an ack_id and the subscriber has ack_deadline_sec seconds to acknowledge it. If the deadline expires without an ack, the message becomes available for re-delivery (delivered again to any puller). Subscribers that need more processing time call ModifyAckDeadline to extend it. The ack deadline mechanism is implemented via the next_delivery_at field: setting it to now() + ack_deadline_sec on delivery, and re-queuing if it passes without an ack.
Ordered Delivery
When a subscription has ordering enabled, messages with the same ordering_key are delivered in publish order. The broker assigns messages with the same ordering key to the same delivery shard (consistent hashing on the key). Within a shard, messages are delivered strictly sequentially — the next message for a key is not delivered until the prior one is acked. This reduces throughput but guarantees ordering. A nack on an ordered message pauses delivery for that key until the message is acked or dead-lettered.
API Design
POST /v1/projects/{project}/topics/{topic}/publish: publish one or more messages; returns array of message IDsPOST /v1/projects/{project}/subscriptions/{sub}/pull?maxMessages=N: pull up to N pending messages; returns messages with ack IDsPOST /v1/projects/{project}/subscriptions/{sub}/acknowledge: ack a list of ack IDsPOST /v1/projects/{project}/subscriptions/{sub}/modifyAckDeadline: extend deadline for ack IDsPUT /v1/projects/{project}/subscriptions/{sub}: create or update subscription with filter, delivery config, dead letter configGET /v1/projects/{project}/subscriptions/{sub}/snapshot: get current delivery position for time-travel replayPOST /v1/projects/{project}/subscriptions/{sub}/seek: replay from a snapshot or timestamp
Scalability Considerations
Fanout Worker Scaling
Fanout workers are stateless and scale horizontally. Topic-to-worker assignment uses consistent hashing so that all publishes to a topic go to the same worker pool, enabling local caching of the subscription list. The subscription list is cached in memory with a short TTL (30 seconds) and invalidated on subscription create/delete events via a control plane notification. For topics with 10,000+ subscriptions, a dedicated high-fanout tier of workers is used with larger batch sizes for delivery_attempts inserts.
Dead Letter Handling Operations
The dead letter topic is a regular topic, allowing operators to inspect, replay, or discard failed messages using the same APIs. A common operational pattern is a dead letter monitor subscription that alerts on new dead letter messages and a remediation workflow that fixes the consumer bug, then re-publishes dead letter messages to the original topic via a seek-based replay or manual re-publish script.
Interview Discussion Points
- How do you prevent a slow push subscriber from blocking fanout for other subscriptions on the same topic?
- What are the trade-offs between push and pull delivery? (Push is lower latency; pull allows consumer-controlled rate)
- How do you implement exactly-once delivery semantics in a pub/sub system?
- How would you handle a subscriber endpoint that returns 200 but does not actually process the message (silent discard)?
- How do attribute-based filters interact with message ordering — can filters cause out-of-order delivery for a subscriber?
See also: Scale AI Interview Guide 2026: Data Infrastructure, RLHF Pipelines, and ML Engineering
See also: Meta Interview Guide 2026: Facebook, Instagram, WhatsApp Engineering