What Is a Fan-Out Service?
A Fan-Out Service takes a single event produced by one source and delivers it to many downstream consumers. Think of a social media post that must appear in thousands of followers' feeds, a config change that must propagate to hundreds of microservices, or a payment event that triggers emails, analytics, fraud checks, and ledger updates simultaneously. Fan-out decouples the producer from all consumers and ensures each gets its own copy of the event.
Data Model
Fan-out requires tracking both the source event and the per-consumer delivery state.
CREATE TABLE events (
  id BIGINT PRIMARY KEY AUTO_INCREMENT,
  topic VARCHAR(128) NOT NULL,
  payload JSON NOT NULL,
  produced_at DATETIME DEFAULT CURRENT_TIMESTAMP,
  INDEX idx_topic_produced (topic, produced_at)
);
CREATE TABLE subscriptions (
  id BIGINT PRIMARY KEY AUTO_INCREMENT,
  topic VARCHAR(128) NOT NULL,
  subscriber_id VARCHAR(128) NOT NULL,
  endpoint VARCHAR(512) NOT NULL, -- HTTP URL, queue ARN, etc.
  active BOOLEAN DEFAULT TRUE,
  UNIQUE KEY uq_topic_sub (topic, subscriber_id)
);
CREATE TABLE deliveries (
  id BIGINT PRIMARY KEY AUTO_INCREMENT,
  event_id BIGINT NOT NULL,
  subscriber_id VARCHAR(128) NOT NULL,
  status ENUM('pending', 'in_flight', 'delivered', 'failed') DEFAULT 'pending',
  attempts INT DEFAULT 0,
  next_attempt DATETIME DEFAULT CURRENT_TIMESTAMP,
  UNIQUE KEY uq_event_sub (event_id, subscriber_id),
  INDEX idx_delivery_pending (status, next_attempt),
  FOREIGN KEY (event_id) REFERENCES events(id) -- inline REFERENCES is parsed but ignored by MySQL
);
One row in deliveries per (event, subscriber) pair gives fine-grained retry control: a failure to one subscriber never blocks delivery to others.
Core Workflow
Step 1 — Produce: The source service inserts a row into events and returns to the caller immediately.
Step 2 — Fan-out dispatch: A dispatcher process (or DB trigger / stream consumer) reads the new event, queries subscriptions for all active subscribers on that topic, and inserts one deliveries row per subscriber with status = pending.
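Steps 1 and 2 can be sketched against the schema above. This is a minimal illustration using SQLite in place of MySQL (same table and column names, simplified types); the `produce` and `dispatch` helpers are illustrative, not a prescribed API:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER PRIMARY KEY, topic TEXT, payload TEXT,
                     produced_at TEXT DEFAULT CURRENT_TIMESTAMP);
CREATE TABLE subscriptions (id INTEGER PRIMARY KEY, topic TEXT,
                            subscriber_id TEXT, endpoint TEXT,
                            active INTEGER DEFAULT 1);
CREATE TABLE deliveries (id INTEGER PRIMARY KEY, event_id INTEGER,
                         subscriber_id TEXT, status TEXT DEFAULT 'pending',
                         attempts INTEGER DEFAULT 0,
                         next_attempt TEXT DEFAULT CURRENT_TIMESTAMP,
                         UNIQUE (event_id, subscriber_id));
""")

def produce(topic: str, payload: str) -> int:
    """Step 1: insert the source event and return to the caller immediately."""
    cur = conn.execute("INSERT INTO events (topic, payload) VALUES (?, ?)",
                       (topic, payload))
    conn.commit()
    return cur.lastrowid

def dispatch(event_id: int, topic: str) -> int:
    """Step 2: one pending delivery per active subscriber. The unique key on
    (event_id, subscriber_id) makes re-dispatch after a crash idempotent."""
    cur = conn.execute(
        """INSERT OR IGNORE INTO deliveries (event_id, subscriber_id)
           SELECT ?, subscriber_id FROM subscriptions
           WHERE topic = ? AND active = 1""",
        (event_id, topic))
    conn.commit()
    return cur.rowcount  # number of delivery rows actually created
```

The `INSERT ... SELECT` creates all delivery rows in one statement, and running `dispatch` twice for the same event inserts nothing the second time, which is what lets a crashed dispatcher safely re-process an event.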
Step 3 — Deliver: Worker threads poll deliveries for pending rows, lease them (same UPDATE/locked_until pattern as a job queue), and push the event payload to each subscriber's endpoint.
Step 4 — Acknowledge: On HTTP 200 (or equivalent ACK), set status = delivered. On failure, increment attempts and set next_attempt with exponential backoff.
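Steps 3 and 4 together form the worker loop: lease a due row, attempt delivery, then ACK or reschedule. A minimal sketch of that cycle (SQLite stand-in for the schema above; `MAX_ATTEMPTS` and the 30-second backoff base are illustrative assumptions, and the HTTP call itself is left out):

```python
import sqlite3

BASE_DELAY_S = 30   # illustrative backoff base
MAX_ATTEMPTS = 5    # illustrative cap; after this the delivery is marked failed

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE deliveries (
    id INTEGER PRIMARY KEY, event_id INTEGER, subscriber_id TEXT,
    status TEXT DEFAULT 'pending', attempts INTEGER DEFAULT 0,
    next_attempt TEXT DEFAULT CURRENT_TIMESTAMP)""")

def lease_one():
    """Step 3: claim one due pending delivery. The guarded UPDATE is the
    lease; a competing worker's UPDATE matches zero rows."""
    row = conn.execute(
        """SELECT id, event_id, subscriber_id FROM deliveries
           WHERE status = 'pending' AND next_attempt <= datetime('now')
           ORDER BY next_attempt LIMIT 1""").fetchone()
    if row is None:
        return None
    claimed = conn.execute(
        "UPDATE deliveries SET status = 'in_flight' WHERE id = ? AND status = 'pending'",
        (row[0],)).rowcount
    conn.commit()
    return row if claimed else None

def complete(delivery_id: int, ok: bool) -> None:
    """Step 4: ACK on success; otherwise retry with exponential backoff."""
    if ok:
        conn.execute("UPDATE deliveries SET status = 'delivered' WHERE id = ?",
                     (delivery_id,))
    else:
        attempts = conn.execute("SELECT attempts FROM deliveries WHERE id = ?",
                                (delivery_id,)).fetchone()[0] + 1
        status = 'failed' if attempts >= MAX_ATTEMPTS else 'pending'
        delay = BASE_DELAY_S * 2 ** (attempts - 1)   # 30s, 60s, 120s, ...
        conn.execute(
            """UPDATE deliveries SET attempts = ?, status = ?,
               next_attempt = datetime('now', ?) WHERE id = ?""",
            (attempts, status, f'+{delay} seconds', delivery_id))
    conn.commit()
```

A failed attempt returns the row to `pending` with `next_attempt` pushed into the future, so it simply drops out of the leasing query until its backoff expires.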
Failure Handling
At-Least-Once Delivery
Each delivery row is independently retried. A crashed worker releases its lease via the timeout sweeper. Subscribers must be idempotent — the same event may arrive more than once. Include the event_id in the payload so subscribers can deduplicate.
Idempotency
Subscribers should use event_id + subscriber_id as a natural idempotency key. On receipt, check a local processed_events table before applying side effects. Return 200 on duplicate receipt so the fan-out service stops retrying.
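On the subscriber side, this dedup check can be as small as one table and one guarded insert. A sketch (SQLite; within the subscriber's own `processed_events` table the `event_id` alone suffices as the key, and `apply_side_effects` is a hypothetical handler):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_events (event_id INTEGER PRIMARY KEY)")
side_effects_applied = []  # stand-in for real side effects, for illustration

def apply_side_effects(payload: str) -> None:
    side_effects_applied.append(payload)

def handle_event(event_id: int, payload: str) -> int:
    """Return 200 even on duplicates so the fan-out service stops retrying."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
        (event_id,))
    if cur.rowcount == 1:       # first time this event has been seen
        apply_side_effects(payload)
    conn.commit()
    return 200
```

In a real subscriber the `processed_events` insert and the side effects should share one transaction, so a crash between them cannot record an event as processed without applying it.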
Poison Messages
If a subscriber's endpoint is persistently down or returns errors, deliveries exhaust retries and enter status = failed. The fan-out service should alert operators and optionally pause that subscription to stop wasting resources, while continuing delivery to healthy subscribers.
Ordering Guarantees
Per-subscriber ordering is preserved if you process each subscriber's deliveries sequentially (a single worker, or a consistent partition key). Cross-subscriber ordering is not guaranteed and usually not required. If strict ordering matters, key events by subscriber_id so that all of a subscriber's events land on the same Kafka partition and are consumed in order.
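The partition-key idea can be illustrated with a stable hash: route every delivery for a given subscriber to the same worker, and that worker's sequential processing preserves per-subscriber order. A sketch (the worker count is an illustrative assumption):

```python
import hashlib

NUM_WORKERS = 8  # illustrative

def worker_for(subscriber_id: str) -> int:
    """Stable hash: the same subscriber always maps to the same worker,
    while different subscribers spread across the pool."""
    digest = hashlib.sha256(subscriber_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_WORKERS
```

Note the use of a cryptographic digest rather than Python's built-in `hash()`, which is randomized per process and would route the same subscriber differently after a restart.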
Scalability Considerations
- Write fan-out vs. read fan-out — writing N delivery rows at publish time (write fan-out) is simple but expensive for high-subscriber topics. Read fan-out (subscribers query a shared event log using a cursor/offset) eliminates N writes but requires each subscriber to manage its own offset.
- Hybrid approach — use write fan-out for small subscriber counts and read fan-out (e.g., Kafka consumer groups) for topics with thousands of consumers.
- Broker delegation — for very high throughput, replace the deliveries table with SNS/SQS fan-out, Kafka topics, or Google Pub/Sub. The data model above becomes the durable audit layer, not the hot path.
- Back-pressure — slow subscribers must not block fast ones. Per-subscriber delivery queues with independent depth monitoring prevent one lagging consumer from affecting others.
- Batching — group multiple events into a single HTTP call to reduce per-request overhead when subscriber endpoints support it.
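The batching point above reduces to a simple grouping step: collect due deliveries, bucket them by endpoint, and issue one request per bucket instead of one per event. A sketch of the grouping (the actual HTTP call is omitted):

```python
from collections import defaultdict

def group_by_endpoint(deliveries):
    """deliveries: iterable of (endpoint, payload) pairs.
    Returns {endpoint: [payloads]} so each endpoint receives one batched call."""
    batches = defaultdict(list)
    for endpoint, payload in deliveries:
        batches[endpoint].append(payload)
    return dict(batches)
```

Three pending deliveries to two endpoints collapse into two outbound requests, and the per-request overhead amortizes across each batch.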
Summary
A Fan-Out Service decouples event producers from their consumers through a dispatch layer that creates independent, retryable delivery records per subscriber. The critical design choices are write fan-out vs. read fan-out, per-subscriber retry isolation, and whether to use a database-backed queue or a dedicated broker for the delivery hot path. In interviews, be ready to compare SNS+SQS fan-out (managed, push) against Kafka consumer groups (pull, offset-based) and explain when each is appropriate.
See also: Meta Interview Guide 2026: Facebook, Instagram, WhatsApp Engineering
See also: Scale AI Interview Guide 2026: Data Infrastructure, RLHF Pipelines, and ML Engineering