What Is Webhook Fan-Out?
Webhook fan-out is the process of delivering a single event to multiple registered external endpoints (webhooks). When a payment platform processes a charge, it may need to notify a fraud system, an accounting service, a CRM, and a customer dashboard — all registered as webhook endpoints for the “charge.succeeded” event type. The fan-out system receives the event once and creates independent, isolated delivery attempts for each matching endpoint.
The design challenges of webhook fan-out differ from general message queues: endpoints are external HTTP servers with unpredictable latency and reliability, the number of registered endpoints per event type varies widely (from 1 to thousands), and slow or failing endpoints must never delay delivery to healthy endpoints.
Event Routing
Each WebhookEndpoint registers a topic_filter — a pattern matching the event types it wants to receive. Filters can be exact matches (“charge.succeeded”), prefix matches (“charge.*”), or wildcard patterns (“*.succeeded”). When an event arrives, the routing layer queries WebhookEndpoint for all endpoints whose topic_filter matches the event type. This lookup should be cached (the set of active endpoints rarely changes) and filtered in memory after loading from the database.
For high-throughput platforms with thousands of registered endpoints per event type, the routing query should use a denormalized or pre-computed mapping: at endpoint registration time, insert rows into a TopicEndpointMap table keyed by (event_type, endpoint_id). This makes routing a simple indexed lookup rather than a pattern-matching scan.
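One way to build such a pre-computed mapping is to expand each filter against the platform's event-type catalog at registration time. The sketch below keeps the map in memory; `EVENT_CATALOG`, `register_endpoint`, and `_matches` are illustrative names, not part of the schema described later.

```python
# Registration-time expansion of a topic_filter into a TopicEndpointMap-style
# lookup. EVENT_CATALOG stands in for the platform's real event-type list.
EVENT_CATALOG = ["charge.succeeded", "charge.failed", "refund.succeeded"]

def register_endpoint(endpoint_id: int, topic_filter: str,
                      topic_map: dict[str, set[int]]) -> None:
    """Record one (event_type -> endpoint_id) entry per matching event type."""
    for event_type in EVENT_CATALOG:
        if _matches(event_type, topic_filter):
            topic_map.setdefault(event_type, set()).add(endpoint_id)

def _matches(event_type: str, pattern: str) -> bool:
    # Exact, "prefix.*", "*.suffix", or full "*" patterns, as described above.
    if pattern == "*":
        return True
    if pattern.endswith(".*"):
        return event_type.startswith(pattern[:-1])  # keep the trailing dot
    if pattern.startswith("*."):
        return event_type.endswith(pattern[1:])     # keep the leading dot
    return event_type == pattern
```

Routing an incoming event is then a single dictionary lookup by event type, with no per-event pattern scan.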
Fan-Out Worker and Parallel Delivery
The fan-out worker receives events from the event bus (Kafka, SQS, or internal queue) and, for each matching endpoint, creates a WebhookDelivery record and enqueues a delivery task. Delivery tasks are processed by a pool of delivery workers — HTTP client threads that POST the event payload to the endpoint URL and record the result.
Fan-out and delivery are intentionally separated:
- Fan-out is fast (database writes only, no HTTP calls) and can create thousands of delivery tasks per event in milliseconds.
- Delivery is slow and unpredictable (external HTTP calls, variable latency), but isolated per endpoint.
This separation ensures that fan-out throughput is bounded by database write capacity, not by the slowest endpoint's HTTP response time.
Queue Isolation Per Endpoint
Without queue isolation, a slow endpoint can back up the shared delivery queue and delay delivery to all other endpoints. The solution: each endpoint has its own virtual delivery queue — in practice, a delivery worker pool partitioned by endpoint_id, or a priority queue where tasks are dequeued in a round-robin or least-recently-served order per endpoint.
In Celery or similar task queues, this is implemented by routing delivery tasks to per-endpoint queues (endpoint_{id}) or by using a broker that supports per-key rate limiting and fair scheduling. In Kafka, use a topic partitioned by endpoint_id — each partition is consumed by a dedicated delivery worker, and a slow partition does not block other partitions.
Per-Endpoint Rate Limiting and Concurrency Control
Sending too many concurrent requests to a slow endpoint exhausts its connection pool and causes cascade failures. Per-endpoint rate limiting caps the number of in-flight delivery attempts for a given endpoint. Implementation options:
- Semaphore-based: each delivery worker checks a Redis counter for the endpoint before starting a delivery. If the counter equals max_concurrency, the task is requeued with a short delay.
- Queue depth cap: if the endpoint's delivery queue depth exceeds a threshold, reject new fan-out tasks for that endpoint (returning a backpressure signal to the fan-out worker).
max_concurrency is stored in WebhookEndpoint.max_concurrency (e.g., 5 concurrent deliveries per endpoint) and respected by all delivery workers.
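A minimal sketch of the semaphore-style check, simulating the Redis INCR/DECR counters with a plain dict; `try_acquire` and `release` are illustrative names, and production code would use atomic Redis operations instead of a local dict.

```python
# In-flight concurrency gate mirroring the Redis counter scheme: increment
# before delivery, undo if over the cap, decrement when the HTTP call ends.
def try_acquire(in_flight: dict, endpoint_id: int,
                max_concurrency: int) -> bool:
    """Reserve a delivery slot; caller must call release() afterwards."""
    count = in_flight.get(endpoint_id, 0) + 1   # INCR
    in_flight[endpoint_id] = count
    if count > max_concurrency:
        in_flight[endpoint_id] = count - 1      # over the cap: undo (DECR)
        return False                            # requeue with a short delay
    return True

def release(in_flight: dict, endpoint_id: int) -> None:
    """Free the slot after the delivery completes, success or failure."""
    in_flight[endpoint_id] = max(0, in_flight.get(endpoint_id, 0) - 1)
```

Releasing on every completion path (including exceptions) is essential; a leaked slot permanently reduces the endpoint's effective concurrency.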
Backpressure
When an endpoint queue depth exceeds a configured threshold (e.g., 10,000 pending deliveries), the fan-out worker stops enqueuing new tasks for that endpoint. New events for that endpoint are either dropped (with a log entry), written to a backpressure buffer (a secondary queue with lower priority), or throttled at the fan-out layer. The endpoint owner is notified via an alert that their endpoint is falling behind. Backpressure prevents the delivery queue from growing unboundedly, which would cause memory exhaustion and delay recovery when the endpoint becomes healthy again.
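The drop-or-buffer decision at the fan-out layer might look like the following sketch; the threshold values and the `enqueue_with_backpressure` helper are illustrative assumptions, not numbers prescribed by the text.

```python
# Backpressure routing: healthy endpoints go to the primary queue, lagging
# endpoints to a lower-priority buffer, and severely backed-up endpoints
# have their tasks dropped (with an alert to the endpoint owner).
def enqueue_with_backpressure(task: dict, queue_depths: dict,
                              primary: list, buffer: list,
                              soft_limit: int = 10_000,
                              hard_limit: int = 50_000) -> str:
    """Route a delivery task based on the endpoint's current queue depth."""
    endpoint_id = task["endpoint_id"]
    depth = queue_depths.get(endpoint_id, 0)
    if depth >= hard_limit:
        return "dropped"        # log, alert the owner, do not enqueue
    if depth >= soft_limit:
        buffer.append(task)     # secondary queue, served at lower priority
        return "buffered"
    primary.append(task)
    queue_depths[endpoint_id] = depth + 1
    return "enqueued"
```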
HMAC Request Signing
Webhook receivers need to verify that requests come from the legitimate sender, not a third party. The standard approach: HMAC-SHA256 of the request body using a shared secret, sent as the X-Webhook-Signature header. The endpoint registers a secret at creation time; the secret is stored encrypted at rest or in a secret manager, since the sender must recover the raw secret to sign each request (a one-way hash alone would make signing impossible). The delivery worker computes HMAC-SHA256(secret, payload_bytes) and includes it in the delivery request. The receiver recomputes the HMAC with its stored secret and compares using a constant-time comparison function (to prevent timing attacks).
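On the receiving side, verification might look like the following sketch, assuming the `sha256=<hex>` header format used elsewhere in this article; `verify_signature` is an illustrative name.

```python
import hashlib
import hmac

def verify_signature(raw_body: bytes, header_value: str, secret: str) -> bool:
    """Recompute HMAC-SHA256 over the raw body bytes and compare against the
    X-Webhook-Signature header value in constant time."""
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    provided = header_value.removeprefix("sha256=")
    # compare_digest prevents timing attacks; never use == on signatures.
    return hmac.compare_digest(expected, provided)
```

Note that the raw body bytes must be used exactly as received; re-serializing parsed JSON can reorder keys or change whitespace and break the signature.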
Delivery Observability
Per-endpoint operational metrics are critical for diagnosing delivery issues before they escalate. Key metrics tracked per endpoint per time window:
- success_rate — fraction of deliveries that received a 2xx HTTP response
- avg_latency_ms — mean HTTP response time for successful deliveries
- queue_depth — current number of pending delivery tasks
- error_rate — fraction of deliveries that failed after all retries
- retry_rate — fraction of deliveries that required at least one retry
These metrics are aggregated into hourly DeliveryMetric rows and surfaced in a per-endpoint observability dashboard. Anomaly detection on success_rate and queue_depth can trigger automatic endpoint suspension (stop delivering to an endpoint that has been failing for 24 hours) and alert the endpoint owner.
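An automatic-suspension check over those hourly rows could be sketched as follows; the thresholds and the `should_suspend` helper are illustrative assumptions layered on the DeliveryMetric shape described above.

```python
# Suspend an endpoint only when every hourly window in the lookback period
# shows a success rate below the floor. Idle hours count as "no evidence of
# failure" and block suspension.
def should_suspend(hourly_metrics: list, lookback_hours: int = 24,
                   min_success_rate: float = 0.05) -> bool:
    """hourly_metrics: most-recent-first dicts with success_count/failure_count."""
    recent = hourly_metrics[:lookback_hours]
    if len(recent) < lookback_hours:
        return False  # not enough history to judge
    for m in recent:
        total = m["success_count"] + m["failure_count"]
        if total == 0:
            return False  # idle hour, no deliveries attempted
        if m["success_count"] / total >= min_success_rate:
            return False  # at least one healthy-enough hour
    return True
```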
SQL Data Model
-- Registered webhook endpoints
CREATE TABLE WebhookEndpoint (
    id              BIGSERIAL PRIMARY KEY,
    url             TEXT NOT NULL,
    topic_filter    VARCHAR(255) NOT NULL,  -- event type pattern (e.g., "charge.*")
    secret_hash     TEXT NOT NULL,          -- signing secret, encrypted or a secret-manager reference
    max_concurrency INT NOT NULL DEFAULT 5,
    status          VARCHAR(32) NOT NULL DEFAULT 'active',  -- active, suspended, disabled
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Individual delivery attempts
CREATE TABLE WebhookDelivery (
    id            BIGSERIAL PRIMARY KEY,
    endpoint_id   BIGINT NOT NULL REFERENCES WebhookEndpoint(id),
    event_id      VARCHAR(255) NOT NULL,
    payload       JSONB NOT NULL,
    status        VARCHAR(32) NOT NULL DEFAULT 'pending',  -- pending, delivered, failed
    attempts      INT NOT NULL DEFAULT 0,
    next_retry_at TIMESTAMPTZ,
    created_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
    delivered_at  TIMESTAMPTZ
);

-- Hourly delivery metrics per endpoint
CREATE TABLE DeliveryMetric (
    endpoint_id    BIGINT NOT NULL REFERENCES WebhookEndpoint(id),
    hour           TIMESTAMPTZ NOT NULL,
    success_count  BIGINT NOT NULL DEFAULT 0,
    failure_count  BIGINT NOT NULL DEFAULT 0,
    avg_latency_ms NUMERIC(10,2),
    PRIMARY KEY (endpoint_id, hour)
);

CREATE INDEX idx_delivery_endpoint_status ON WebhookDelivery(endpoint_id, status, next_retry_at);
CREATE INDEX idx_delivery_event ON WebhookDelivery(event_id);
CREATE INDEX idx_metric_endpoint_hour ON DeliveryMetric(endpoint_id, hour DESC);
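The next_retry_at column implies a retry schedule that the article does not spell out; a common choice is exponential backoff with jitter. A hedged sketch (the base delay, cap, and `next_retry_delay_s` helper are illustrative, not values from the schema):

```python
import random

def next_retry_delay_s(attempts: int, base_s: float = 30.0,
                       cap_s: float = 6 * 3600.0) -> float:
    """Delay in seconds before the next attempt, for scheduling next_retry_at.

    Exponential growth (base * 2^attempts) capped at cap_s, with full jitter
    so retries from many failed deliveries do not arrive in synchronized
    bursts when the endpoint recovers.
    """
    delay = min(cap_s, base_s * (2 ** attempts))
    return random.uniform(0, delay)
```

A worker picking up due retries would scan WebhookDelivery rows with status = 'pending' and next_retry_at <= now(), which is exactly what idx_delivery_endpoint_status supports.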
Python Implementation Sketch
import hashlib, hmac, time, json
import urllib.request, urllib.error
from dataclasses import dataclass
from typing import Optional

@dataclass
class DeliveryResult:
    success: bool
    status_code: Optional[int]
    latency_ms: float
    error: Optional[str] = None

def fan_out_event(event_type: str, payload: dict, event_id: str,
                  endpoints_db: list[dict], delivery_queue: list) -> int:
    """Create delivery tasks for all matching endpoints. Returns task count."""
    created = 0
    for endpoint in endpoints_db:
        if not _matches_topic(event_type, endpoint["topic_filter"]):
            continue
        if endpoint.get("status") != "active":
            continue
        delivery_task = {
            "endpoint_id": endpoint["id"],
            "event_id": event_id,
            "payload": payload,
            "url": endpoint["url"],
            "secret": endpoint["secret_hash"],
            "attempts": 0,
        }
        delivery_queue.append(delivery_task)
        created += 1
    return created

def _matches_topic(event_type: str, pattern: str) -> bool:
    """Glob matching: exact, prefix ("charge.*"), suffix ("*.succeeded"), or "*"."""
    if pattern == "*":
        return True
    if pattern.endswith(".*"):
        return event_type.startswith(pattern[:-1])  # "charge.*" -> "charge."
    if pattern.startswith("*."):
        return event_type.endswith(pattern[1:])  # "*.succeeded" -> ".succeeded"
    return event_type == pattern

def deliver_to_endpoint(delivery: dict) -> DeliveryResult:
    """HTTP POST payload to endpoint URL with HMAC signature."""
    payload_bytes = json.dumps(delivery["payload"]).encode()
    signature = compute_hmac(delivery["secret"], payload_bytes)
    headers = {
        "Content-Type": "application/json",
        "X-Webhook-Signature": f"sha256={signature}",
        "X-Event-ID": delivery["event_id"],
        "User-Agent": "WebhookDelivery/1.0",
    }
    start = time.monotonic()
    try:
        req = urllib.request.Request(
            delivery["url"], data=payload_bytes, headers=headers, method="POST"
        )
        with urllib.request.urlopen(req, timeout=10) as resp:
            latency_ms = (time.monotonic() - start) * 1000
            return DeliveryResult(
                success=200 <= resp.status < 300,
                status_code=resp.status,
                latency_ms=latency_ms,
            )
    except urllib.error.HTTPError as e:
        return DeliveryResult(success=False, status_code=e.code,
                              latency_ms=(time.monotonic() - start) * 1000,
                              error=str(e))
    except OSError as e:  # URLError, timeouts, connection failures
        return DeliveryResult(success=False, status_code=None,
                              latency_ms=(time.monotonic() - start) * 1000,
                              error=str(e))

def compute_hmac(secret: str, payload: bytes) -> str:
    """Compute HMAC-SHA256 of payload with secret."""
    return hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()

def rate_limit_check(endpoint_id: int, in_flight_counts: dict,
                     max_concurrency: int) -> bool:
    """Return True if delivery is allowed under rate limit."""
    current = in_flight_counts.get(endpoint_id, 0)
    return current < max_concurrency

def compute_endpoint_metrics(endpoint_id: int, deliveries: list[dict],
                             window_hours: int = 24) -> dict:
    """Compute success rate, avg latency, failure count for an endpoint."""
    cutoff = time.time() - window_hours * 3600
    recent = [d for d in deliveries
              if d["endpoint_id"] == endpoint_id and d.get("created_at", 0) >= cutoff]
    if not recent:
        return {"success_rate": None, "avg_latency_ms": None,
                "success_count": 0, "failure_count": 0}
    successes = [d for d in recent if d.get("status") == "delivered"]
    failures = [d for d in recent if d.get("status") == "failed"]
    latencies = [d["latency_ms"] for d in successes if "latency_ms" in d]
    return {
        "success_rate": len(successes) / len(recent),
        "avg_latency_ms": sum(latencies) / len(latencies) if latencies else None,
        "success_count": len(successes),
        "failure_count": len(failures),
    }
Frequently Asked Questions
Why is queue isolation per endpoint important in webhook fan-out?

Without queue isolation, all delivery tasks for all endpoints share a single queue. A slow or consistently failing endpoint causes its tasks to pile up and consume worker capacity, delaying delivery to all other healthy endpoints. Queue isolation gives each endpoint its own virtual queue or worker partition. A failing endpoint can back up its own queue to thousands of tasks while other endpoints continue receiving deliveries in near-real-time. In Kafka-based implementations, partitioning by endpoint_id provides natural isolation: each partition is consumed independently, and a backed-up partition does not affect others.

How does per-endpoint rate limiting work in practice?

Per-endpoint rate limiting caps the number of concurrent in-flight HTTP deliveries for a given endpoint. Before starting a delivery, the worker atomically increments a Redis counter for the endpoint (INCR endpoint:{id}:inflight). If the counter exceeds max_concurrency, the worker requeues the task with a short delay and decrements the counter. After a delivery completes (success or failure), the counter is decremented (DECR). This prevents overwhelming slow receivers with concurrent requests. max_concurrency is configured per endpoint based on the receiver's capacity and stored in WebhookEndpoint.max_concurrency.

How should receivers verify HMAC webhook signatures?

The receiver reads the raw request body bytes (before any JSON parsing that might reorder keys) and recomputes HMAC-SHA256 using its stored shared secret. It then compares the computed signature with the value in the X-Webhook-Signature header using a constant-time comparison function (hmac.compare_digest in Python, crypto.timingSafeEqual in Node.js). Constant-time comparison is required to prevent timing attacks: a naive string equality check returns early on the first mismatched character, allowing an attacker to measure response time and brute-force the signature one byte at a time. The receiver should also reject requests whose timestamp (if included in a signed header) is more than 5 minutes old, to prevent replay attacks.

What observability metrics matter most for webhook delivery?

The most actionable webhook delivery metrics are: success_rate (the fraction of delivery attempts that received a 2xx HTTP response; a drop below 95% warrants investigation), queue_depth (the number of pending tasks for an endpoint; steady growth indicates the endpoint cannot keep up), avg_latency_ms (slow receivers increase worker occupation time and reduce throughput), and retry_rate (high retry rates indicate intermittent failures that may become permanent). At the system level, track fan-out lag (time from event creation to fan-out worker processing) and delivery lag (time from task creation to first delivery attempt). Alert on queue_depth exceeding a threshold and on endpoints with success_rate below 80% for more than 30 minutes.