What Is a Fan-Out Service?
A Fan-Out Service takes a single event produced by one source and delivers it to many downstream consumers. Think of a social media post that must appear in thousands of followers' feeds, a config change that must propagate to hundreds of microservices, or a payment event that triggers emails, analytics, fraud checks, and ledger updates simultaneously. Fan-out decouples the producer from all consumers and ensures each gets its own copy of the event.
Data Model
Fan-out requires tracking both the source event and the per-consumer delivery state.
CREATE TABLE events (
id BIGINT PRIMARY KEY AUTO_INCREMENT,
topic VARCHAR(128) NOT NULL,
payload JSON NOT NULL,
produced_at DATETIME DEFAULT NOW(),
INDEX idx_topic_produced (topic, produced_at)
);
CREATE TABLE subscriptions (
id BIGINT PRIMARY KEY AUTO_INCREMENT,
topic VARCHAR(128) NOT NULL,
subscriber_id VARCHAR(128) NOT NULL,
endpoint VARCHAR(512) NOT NULL, -- HTTP URL, queue ARN, etc.
active BOOLEAN DEFAULT TRUE,
UNIQUE KEY uq_topic_sub (topic, subscriber_id)
);
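CREATE TABLE deliveries (
id BIGINT PRIMARY KEY AUTO_INCREMENT,
event_id BIGINT NOT NULL,
subscriber_id VARCHAR(128) NOT NULL,
status ENUM('pending', 'in_flight', 'delivered', 'failed') DEFAULT 'pending',
attempts INT DEFAULT 0,
next_attempt DATETIME DEFAULT NOW(),
locked_until DATETIME NULL, -- lease expiry while status = 'in_flight'
UNIQUE KEY uq_event_sub (event_id, subscriber_id), -- one row per (event, subscriber)
INDEX idx_delivery_pending (status, next_attempt),
FOREIGN KEY (event_id) REFERENCES events(id) -- explicit constraint: MySQL has historically parsed but ignored inline REFERENCES
);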
One row in deliveries per (event, subscriber) pair gives fine-grained retry control: a failure to one subscriber never blocks delivery to others.
Core Workflow
Step 1 — Produce: The source service inserts a row into events and returns to the caller immediately.
Step 2 — Fan-out dispatch: A dispatcher process (or DB trigger / stream consumer) reads the new event, queries subscriptions for all active subscribers on that topic, and inserts one deliveries row per subscriber with status = pending.
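Step 3 — Deliver: Worker threads poll deliveries for pending rows whose next_attempt has passed, lease each row by atomically setting status = in_flight and a locked_until expiry (the same UPDATE/locked_until pattern used by database-backed job queues), and push the event payload to the subscriber's endpoint.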
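Step 4 — Acknowledge: On HTTP 200 (or an equivalent ACK), set status = delivered. On failure or timeout, increment attempts, return the row to pending, and push next_attempt out with exponential backoff; once attempts reaches the retry limit, set status = failed instead.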
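The four steps can be sketched end to end as follows. This is a minimal illustration, not a production implementation: it uses Python's built-in sqlite3 as a stand-in for the MySQL schema above, passes the current time in as a plain number so the behaviour is deterministic, and assumes a locked_until column plus hypothetical constants (MAX_ATTEMPTS, BASE_DELAY_S, LEASE_S) that are not part of the original design.

```python
import sqlite3

MAX_ATTEMPTS = 5   # assumed retry budget before a delivery is marked failed
BASE_DELAY_S = 2   # assumed base delay for exponential backoff
LEASE_S = 30       # assumed lease duration for an in-flight delivery


def open_db():
    """Create an in-memory SQLite stand-in for the schema above."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE events (id INTEGER PRIMARY KEY, topic TEXT, payload TEXT);
        CREATE TABLE subscriptions (topic TEXT, subscriber_id TEXT,
                                    endpoint TEXT, active INTEGER DEFAULT 1);
        CREATE TABLE deliveries (
            id INTEGER PRIMARY KEY, event_id INTEGER, subscriber_id TEXT,
            status TEXT DEFAULT 'pending', attempts INTEGER DEFAULT 0,
            next_attempt REAL DEFAULT 0, locked_until REAL,
            UNIQUE (event_id, subscriber_id));
    """)
    return db


def dispatch(db, event_id):
    """Step 2: insert one pending delivery per active subscriber on the topic."""
    db.execute("""
        INSERT OR IGNORE INTO deliveries (event_id, subscriber_id)
        SELECT e.id, s.subscriber_id
        FROM events e JOIN subscriptions s ON s.topic = e.topic
        WHERE e.id = ? AND s.active = 1""", (event_id,))
    db.commit()


def lease_one(db, now):
    """Step 3: claim one due row; the guarded UPDATE makes the claim atomic."""
    row = db.execute("""
        SELECT id FROM deliveries
        WHERE status = 'pending' AND next_attempt <= ?
        ORDER BY id LIMIT 1""", (now,)).fetchone()
    if row is None:
        return None
    cur = db.execute("""
        UPDATE deliveries SET status = 'in_flight', locked_until = ?
        WHERE id = ? AND status = 'pending'""", (now + LEASE_S, row[0]))
    db.commit()
    # rowcount is 0 if another worker claimed the row between SELECT and UPDATE
    return row[0] if cur.rowcount == 1 else None


def record_result(db, delivery_id, ok, now):
    """Step 4: mark delivered, or schedule a retry with exponential backoff."""
    if ok:
        db.execute("UPDATE deliveries SET status = 'delivered', "
                   "locked_until = NULL WHERE id = ?", (delivery_id,))
    else:
        attempts = db.execute("SELECT attempts FROM deliveries WHERE id = ?",
                              (delivery_id,)).fetchone()[0] + 1
        status = "failed" if attempts >= MAX_ATTEMPTS else "pending"
        delay = BASE_DELAY_S * 2 ** (attempts - 1)  # 2s, 4s, 8s, ...
        db.execute("UPDATE deliveries SET status = ?, attempts = ?, "
                   "next_attempt = ?, locked_until = NULL WHERE id = ?",
                   (status, attempts, now + delay, delivery_id))
    db.commit()
```

A worker loop would call lease_one, push the payload to the subscriber's endpoint, and pass the outcome to record_result. The unique (event_id, subscriber_id) constraint makes dispatch safe to re-run if the dispatcher itself crashes partway through.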
Failure Handling
At-Least-Once Delivery
Each delivery row is retried independently. If a worker crashes mid-delivery, a timeout sweeper returns its in_flight rows to pending once their lease expires, so another worker can pick them up. Subscribers must therefore be idempotent, because the same event may arrive more than once. Include the event_id in the payload so subscribers can deduplicate.
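A timeout sweeper of the kind described here could look roughly like the sketch below. It works on in-memory records rather than a database, and the Delivery class and its locked_until field are illustrative assumptions; against the deliveries table the same idea would be a single UPDATE that resets expired in_flight rows to pending.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Delivery:
    """Hypothetical in-memory mirror of a deliveries row."""
    id: int
    status: str = "pending"
    locked_until: Optional[float] = None  # lease expiry, as a timestamp


def sweep_expired_leases(deliveries: List[Delivery], now: float) -> int:
    """Return in_flight rows with an expired lease to pending; return the count."""
    released = 0
    for d in deliveries:
        expired = d.locked_until is not None and d.locked_until < now
        if d.status == "in_flight" and expired:
            d.status = "pending"   # eligible for another worker to lease
            d.locked_until = None
            released += 1
    return released
```

Because the sweeper cannot tell a crashed worker from a slow one, a delivery may be re-sent while the first attempt is still in progress, which is exactly why the guarantee is at-least-once rather than exactly-once.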
Idempotency
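Because the fan-out service creates one delivery per (event_id, subscriber_id) pair, that pair is a natural idempotency key; within a single subscriber, event_id alone is enough. On receipt, check a local processed_events table before applying side effects, and record the event_id in the same transaction as those side effects. Return 200 on duplicate receipt so the fan-out service stops retrying.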
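On the subscriber side, the duplicate check might be sketched as below. The Subscriber class, the amount field, and the in-memory set standing in for the processed_events table are all assumptions made for illustration.

```python
class Subscriber:
    """Hypothetical subscriber that applies each event at most once."""

    def __init__(self):
        self.processed_events = set()  # stands in for a processed_events table
        self.balance = 0               # example side effect

    def handle(self, event: dict) -> int:
        """Apply the event once; return the HTTP status to send back."""
        event_id = event["event_id"]
        if event_id in self.processed_events:
            return 200  # duplicate: acknowledge without reapplying side effects
        # In a real service these two writes should share one transaction.
        self.balance += event["amount"]
        self.processed_events.add(event_id)
        return 200
```

Delivering the same event twice leaves the balance unchanged after the first call, and both calls return 200, so the fan-out service marks the delivery as done either way.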
Poison Messages
If a subscriber's endpoint is persistently down, or keeps returning errors for a particular event, the affected deliveries exhaust their retries and enter status = failed. The fan-out service should alert operators and optionally pause that subscription to stop wasting resources, while continuing delivery to healthy subscribers.
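One possible policy for deciding when to pause a subscription is to look for a run of consecutive failed deliveries. The sketch below is only one such heuristic; the function name and the FAILED_THRESHOLD value are hypothetical.

```python
from collections import defaultdict

FAILED_THRESHOLD = 3  # assumed: pause after this many consecutive failures


def subscribers_to_pause(deliveries, threshold=FAILED_THRESHOLD):
    """Pick subscribers whose latest terminal deliveries all failed.

    `deliveries` is an iterable of (subscriber_id, status) tuples ordered
    from oldest to newest. A successful delivery resets the streak.
    """
    streak = defaultdict(int)
    for subscriber_id, status in deliveries:
        if status == "failed":
            streak[subscriber_id] += 1
        elif status == "delivered":
            streak[subscriber_id] = 0
    return sorted(s for s, n in streak.items() if n >= threshold)
```

The returned ids can then be used to set active = FALSE on the matching subscriptions rows, which stops the dispatcher from creating new deliveries for them without touching other subscribers.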
Ordering Guarantees
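Per-subscriber ordering is preserved only if that subscriber's deliveries are processed sequentially (a single worker, or a partition key that routes all of a subscriber's deliveries to the same worker) and a later event is not attempted while an earlier one is still awaiting retry. Cross-subscriber ordering is not guaranteed and is usually not required. If strict ordering matters, a log-based broker such as Kafka is a better fit: Kafka preserves order within a partition, so key related events to the same partition and let each subscriber read that partition in offset order.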
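The sequential rule can be made concrete with a small selection function: for each subscriber, only the oldest undelivered event is eligible. This is a sketch under the assumption that event ids increase in production order; the function name is hypothetical.

```python
def next_deliverable(deliveries):
    """Map each subscriber to the one event it may be sent next.

    `deliveries` is an iterable of (event_id, subscriber_id, status) tuples.
    Later events for a subscriber stay blocked until the oldest undelivered
    one succeeds, which preserves per-subscriber order.
    """
    head = {}
    for event_id, subscriber_id, status in sorted(deliveries):
        if status != "delivered" and subscriber_id not in head:
            head[subscriber_id] = event_id
    return head
```

The cost of this rule is head-of-line blocking: a delivery that is stuck retrying, or has reached failed, holds back every later event for that subscriber until it is resolved.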
Scalability Considerations
- Write fan-out vs. read fan-out — writing N delivery rows at publish time (write fan-out) is simple but expensive for high-subscriber topics. Read fan-out (subscribers query a shared event log using a cursor/offset) eliminates N writes but requires each subscriber to manage its own offset.
- Hybrid approach — use write fan-out for small subscriber counts and read fan-out (e.g., Kafka consumer groups) for topics with thousands of consumers.
- Broker delegation — for very high throughput, replace the deliveries table with SNS/SQS fan-out, Kafka topics, or Google Pub/Sub. The data model above becomes the durable audit layer, not the hot path.
- Back-pressure — slow subscribers must not block fast ones. Per-subscriber delivery queues with independent depth monitoring prevent one lagging consumer from affecting others.
- Batching — group multiple events into a single HTTP call to reduce per-request overhead when subscriber endpoints support it.
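To illustrate the read fan-out and batching points above, here is a minimal in-memory sketch in which subscribers pull from a shared log using their own offsets. EventLog and CursorSubscriber are hypothetical names, and a real system would persist both the log and the offsets.

```python
class EventLog:
    """Hypothetical append-only log shared by all subscribers."""

    def __init__(self):
        self._events = []

    def append(self, payload) -> int:
        """Store one event and return its offset."""
        self._events.append(payload)
        return len(self._events) - 1

    def read_from(self, offset: int, limit: int = 100):
        """Return up to `limit` events starting at `offset`."""
        return self._events[offset:offset + limit]


class CursorSubscriber:
    """Hypothetical subscriber that tracks its own position in the log."""

    def __init__(self, log: EventLog):
        self.log = log
        self.offset = 0   # next offset to read
        self.seen = []    # events consumed so far

    def poll(self, limit: int = 100) -> int:
        """Fetch one batch, advance the cursor, and return the batch size."""
        batch = self.log.read_from(self.offset, limit)
        self.seen.extend(batch)
        self.offset += len(batch)
        return len(batch)
```

Publishing an event costs one append regardless of how many subscribers exist, and a slow subscriber only falls behind on its own offset. The trade-off is that the service no longer holds per-subscriber delivery state, so lag has to be monitored by comparing offsets against the end of the log.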
Summary
A Fan-Out Service decouples event producers from their consumers through a dispatch layer that creates independent, retryable delivery records per subscriber. The critical design choices are write fan-out vs. read fan-out, per-subscriber retry isolation, and whether to use a database-backed queue or a dedicated broker for the delivery hot path. In interviews, be ready to compare SNS+SQS fan-out (managed, push) against Kafka consumer groups (pull, offset-based) and explain when each is appropriate.