Webhook Fan-Out System Low-Level Design: Event Routing, Parallel Delivery, Backpressure, and Observability

What Is Webhook Fan-Out?

Webhook fan-out is the process of delivering a single event to multiple registered external endpoints (webhooks). When a payment platform processes a charge, it may need to notify a fraud system, an accounting service, a CRM, and a customer dashboard — all registered as webhook endpoints for the “charge.succeeded” event type. The fan-out system receives the event once and creates independent, isolated delivery attempts for each matching endpoint.

The design challenges of webhook fan-out differ from general message queues: endpoints are external HTTP servers with unpredictable latency and reliability, the number of registered endpoints per event type varies widely (from 1 to thousands), and slow or failing endpoints must never delay delivery to healthy endpoints.

Event Routing

Each WebhookEndpoint registers a topic_filter — a pattern matching the event types it wants to receive. Filters can be exact matches (“charge.succeeded”), prefix matches (“charge.*”), or wildcard patterns (“*.succeeded”). When an event arrives, the routing layer queries WebhookEndpoint for all endpoints whose topic_filter matches the event type. This lookup should be cached (the set of active endpoints rarely changes) and filtered in memory after loading from the database.

For high-throughput platforms with thousands of registered endpoints per event type, the routing query should use a denormalized or pre-computed mapping: at endpoint registration time, insert rows into a TopicEndpointMap table keyed by (event_type, endpoint_id). This makes routing a simple indexed lookup rather than a pattern-matching scan.
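The registration-time expansion can be sketched in Python. This sketch assumes the platform maintains a catalog of known event types against which each pattern is expanded; expand_topic_filter and the catalog contents are illustrative names, not part of any specific library.

```python
def expand_topic_filter(pattern: str, endpoint_id: int,
                        catalog: list[str]) -> list[tuple[str, int]]:
    """Return (event_type, endpoint_id) rows for TopicEndpointMap:
    one row per catalog event type the pattern matches."""
    def matches(event_type: str) -> bool:
        if pattern == "*":
            return True
        if pattern.endswith(".*"):
            return event_type.startswith(pattern[:-1])  # keep the trailing dot
        if pattern.startswith("*."):
            return event_type.endswith(pattern[1:])     # keep the leading dot
        return event_type == pattern
    return [(et, endpoint_id) for et in catalog if matches(et)]

catalog = ["charge.succeeded", "charge.failed", "refund.succeeded"]
rows = expand_topic_filter("charge.*", endpoint_id=42, catalog=catalog)
# rows -> [("charge.succeeded", 42), ("charge.failed", 42)]
```

One consequence of pre-computation: when a new event type is added to the catalog, wildcard filters must be re-expanded so existing endpoints pick up the new type.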

Fan-Out Worker and Parallel Delivery

The fan-out worker receives events from the event bus (Kafka, SQS, or internal queue) and, for each matching endpoint, creates a WebhookDelivery record and enqueues a delivery task. Delivery tasks are processed by a pool of delivery workers — HTTP client threads that POST the event payload to the endpoint URL and record the result.

Fan-out and delivery are intentionally separated:

  • Fan-out is fast (database writes only, no HTTP calls) and can create thousands of delivery tasks per event in milliseconds.
  • Delivery is slow and unpredictable (external HTTP calls, variable latency), but isolated per endpoint.

This separation ensures that fan-out throughput is bounded by database write capacity, not by the slowest endpoint's HTTP response time.

Queue Isolation Per Endpoint

Without queue isolation, a slow endpoint can back up the shared delivery queue and delay delivery to all other endpoints. The solution: each endpoint has its own virtual delivery queue — in practice, a delivery worker pool partitioned by endpoint_id, or a priority queue where tasks are dequeued in a round-robin or least-recently-served order per endpoint.

In Celery or similar task queues, this is implemented by routing delivery tasks to per-endpoint queues (endpoint_{id}) or by using a broker that supports per-key rate limiting and fair scheduling. In Kafka, use a topic partitioned by endpoint_id — each partition is consumed by a dedicated delivery worker, and a slow partition does not block other partitions.
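A minimal in-process sketch of per-endpoint virtual queues with round-robin (least-recently-served) dequeue, using only the standard library. In production these queues would live in a broker, but the fairness logic is the same; FairDeliveryQueue is an illustrative name.

```python
from collections import defaultdict, deque
from typing import Optional

class FairDeliveryQueue:
    """Per-endpoint virtual queues, dequeued round-robin across endpoints."""

    def __init__(self):
        self.queues = defaultdict(deque)  # endpoint_id -> pending tasks
        self.ring = deque()               # round-robin order of schedulable endpoints

    def enqueue(self, endpoint_id: int, task: dict) -> None:
        if not self.queues[endpoint_id]:
            self.ring.append(endpoint_id)  # endpoint becomes schedulable
        self.queues[endpoint_id].append(task)

    def dequeue(self) -> Optional[dict]:
        if not self.ring:
            return None
        endpoint_id = self.ring.popleft()
        task = self.queues[endpoint_id].popleft()
        if self.queues[endpoint_id]:
            self.ring.append(endpoint_id)  # still has work: back of the ring
        return task
```

Even if one endpoint has thousands of queued tasks, the ring guarantees every other endpoint gets a delivery slot on each pass.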

Per-Endpoint Rate Limiting and Concurrency Control

Sending too many concurrent requests to a slow endpoint exhausts its connection pool and causes cascade failures. Per-endpoint rate limiting caps the number of in-flight delivery attempts for a given endpoint. Implementation options:

  • Semaphore-based: each delivery worker checks a Redis counter for the endpoint before starting a delivery. If the counter equals max_concurrency, the task is requeued with a short delay.
  • Queue depth cap: if the endpoint's delivery queue depth exceeds a threshold, reject new fan-out tasks for that endpoint (returning a backpressure signal to the fan-out worker).

max_concurrency is stored in WebhookEndpoint.max_concurrency (e.g., 5 concurrent deliveries per endpoint) and respected by all delivery workers.
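The semaphore-based option reduces to an acquire/release pair around each HTTP attempt. In this sketch a plain dict stands in for Redis, so it is only correct within one process; a real deployment would use Redis INCR/DECR (ideally with a TTL so slots held by crashed workers expire). try_acquire and release are illustrative names.

```python
def try_acquire(in_flight: dict, endpoint_id: int, max_concurrency: int) -> bool:
    """Claim a delivery slot; mirrors the INCR-then-check pattern in Redis."""
    count = in_flight.get(endpoint_id, 0) + 1
    in_flight[endpoint_id] = count
    if count > max_concurrency:
        in_flight[endpoint_id] = count - 1  # over the cap: give the slot back
        return False
    return True

def release(in_flight: dict, endpoint_id: int) -> None:
    """Release the slot after the HTTP attempt completes (mirrors DECR)."""
    in_flight[endpoint_id] = max(0, in_flight.get(endpoint_id, 0) - 1)
```

A worker calls try_acquire before the POST; on False it requeues the task with a short delay, and on completion (success or failure) it always calls release.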

Backpressure

When an endpoint's queue depth exceeds a configured threshold (e.g., 10,000 pending deliveries), the fan-out worker stops enqueuing new tasks for that endpoint. New events for that endpoint are either dropped (with a log entry), written to a backpressure buffer (a secondary queue with lower priority), or throttled at the fan-out layer. The endpoint owner is notified via an alert that their endpoint is falling behind. Backpressure prevents the delivery queue from growing without bound, which would cause memory exhaustion and delay recovery when the endpoint becomes healthy again.
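The fan-out-side decision, using the overflow-buffer variant, might look like the following sketch; BACKPRESSURE_THRESHOLD and route_with_backpressure are illustrative names under the threshold stated above.

```python
BACKPRESSURE_THRESHOLD = 10_000  # pending deliveries per endpoint

def route_with_backpressure(task: dict, queue_depths: dict,
                            primary: list, overflow: list) -> str:
    """Enqueue normally, or divert to the low-priority overflow buffer
    when the endpoint's queue is already past the threshold."""
    depth = queue_depths.get(task["endpoint_id"], 0)
    if depth >= BACKPRESSURE_THRESHOLD:
        overflow.append(task)  # secondary buffer, drained at lower priority
        return "overflow"
    primary.append(task)
    queue_depths[task["endpoint_id"]] = depth + 1
    return "enqueued"
```

A drop-instead-of-buffer policy would replace the overflow append with a log entry and a metrics increment; which policy to use is a product decision per endpoint.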

HMAC Request Signing

Webhook receivers need to verify that requests come from the legitimate sender, not a third party. The standard approach: HMAC-SHA256 of the request body using a shared secret, sent as the X-Webhook-Signature header. The endpoint registers a secret at creation time, stored encrypted at rest or in a secret manager (not as a one-way hash, since the delivery worker needs the raw secret to sign). The delivery worker computes HMAC-SHA256(secret, payload_bytes) and includes it in the delivery request. The receiver recomputes the HMAC with its stored secret and compares using a constant-time comparison function (to prevent timing attacks).
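Receiver-side verification matching this signing scheme might look like the sketch below. The staleness check guards against replay; it assumes the sender includes a signed timestamp, which this design does not specify, so sent_at and max_age_s are illustrative.

```python
import hashlib
import hmac
import time

def verify_signature(secret: str, body: bytes, header_value: str,
                     sent_at: float, max_age_s: int = 300) -> bool:
    """Recompute HMAC-SHA256 over the raw body bytes (before any JSON
    parsing), compare in constant time, and reject stale requests."""
    if time.time() - sent_at > max_age_s:
        return False  # older than the replay window
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # hmac.compare_digest avoids early-exit string comparison (timing attacks)
    return hmac.compare_digest(f"sha256={expected}", header_value)
```

Note the comparison covers the full "sha256=..." header value, so a malformed or differently prefixed header simply fails verification.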

Delivery Observability

Per-endpoint operational metrics are critical for diagnosing delivery issues before they escalate. Key metrics tracked per endpoint per time window:

  • success_rate — fraction of deliveries that received a 2xx HTTP response
  • avg_latency_ms — mean HTTP response time for successful deliveries
  • queue_depth — current number of pending delivery tasks
  • error_rate — fraction of deliveries that failed after all retries
  • retry_rate — fraction of deliveries that required at least one retry

These metrics are aggregated into hourly DeliveryMetric rows and surfaced in a per-endpoint observability dashboard. Anomaly detection on success_rate and queue_depth can trigger automatic endpoint suspension (stop delivering to an endpoint that has been failing for 24 hours) and alert the endpoint owner.
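The hourly roll-up into DeliveryMetric rows could be sketched as follows. Input record fields mirror the WebhookDelivery columns, with created_at simplified to an epoch timestamp; aggregate_hourly is an illustrative name for the background job's core step.

```python
from collections import defaultdict

def aggregate_hourly(deliveries: list[dict]) -> dict:
    """Group delivery outcomes into (endpoint_id, hour) buckets,
    producing the counts and average latency stored in DeliveryMetric."""
    buckets = defaultdict(lambda: {"success_count": 0, "failure_count": 0,
                                   "latencies": []})
    for d in deliveries:
        hour = int(d["created_at"] // 3600) * 3600  # truncate to the hour
        key = (d["endpoint_id"], hour)
        if d["status"] == "delivered":
            buckets[key]["success_count"] += 1
            buckets[key]["latencies"].append(d.get("latency_ms", 0.0))
        else:
            buckets[key]["failure_count"] += 1
    return {k: {"success_count": v["success_count"],
                "failure_count": v["failure_count"],
                "avg_latency_ms": (sum(v["latencies"]) / len(v["latencies"])
                                   if v["latencies"] else None)}
            for k, v in buckets.items()}
```

The dashboard then reads these pre-aggregated rows instead of scanning the full delivery log.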

SQL Data Model

-- Registered webhook endpoints
CREATE TABLE WebhookEndpoint (
    id              BIGSERIAL PRIMARY KEY,
    url             TEXT NOT NULL,
    topic_filter    VARCHAR(255) NOT NULL,  -- event type pattern (e.g., "charge.*")
    secret_hash     TEXT NOT NULL,          -- HMAC secret, encrypted at rest or a secret-manager reference
    max_concurrency INT NOT NULL DEFAULT 5,
    status          VARCHAR(32) NOT NULL DEFAULT 'active',  -- active, suspended, disabled
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Individual delivery attempts
CREATE TABLE WebhookDelivery (
    id              BIGSERIAL PRIMARY KEY,
    endpoint_id     BIGINT NOT NULL REFERENCES WebhookEndpoint(id),
    event_id        VARCHAR(255) NOT NULL,
    payload         JSONB NOT NULL,
    status          VARCHAR(32) NOT NULL DEFAULT 'pending',  -- pending, delivered, failed
    attempts        INT NOT NULL DEFAULT 0,
    next_retry_at   TIMESTAMPTZ,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    delivered_at    TIMESTAMPTZ
);

-- Hourly delivery metrics per endpoint
CREATE TABLE DeliveryMetric (
    endpoint_id         BIGINT NOT NULL REFERENCES WebhookEndpoint(id),
    hour                TIMESTAMPTZ NOT NULL,
    success_count       BIGINT NOT NULL DEFAULT 0,
    failure_count       BIGINT NOT NULL DEFAULT 0,
    avg_latency_ms      NUMERIC(10,2),
    PRIMARY KEY (endpoint_id, hour)
);

CREATE INDEX idx_delivery_endpoint_status ON WebhookDelivery(endpoint_id, status, next_retry_at);
CREATE INDEX idx_delivery_event ON WebhookDelivery(event_id);
CREATE INDEX idx_metric_endpoint_hour ON DeliveryMetric(endpoint_id, hour DESC);

Python Implementation Sketch

import hashlib, hmac, time, json
import urllib.request, urllib.error
from dataclasses import dataclass
from typing import Optional

@dataclass
class DeliveryResult:
    success: bool
    status_code: Optional[int]
    latency_ms: float
    error: Optional[str] = None

def fan_out_event(event_type: str, payload: dict, event_id: str,
                  endpoints_db: list[dict], delivery_queue: list) -> int:
    """Create delivery tasks for all matching endpoints. Returns task count."""
    created = 0
    for endpoint in endpoints_db:
        if not _matches_topic(event_type, endpoint["topic_filter"]):
            continue
        if endpoint.get("status") != "active":
            continue
        delivery_task = {
            "endpoint_id": endpoint["id"],
            "event_id": event_id,
            "payload": payload,
            "url": endpoint["url"],
            "secret": endpoint["secret_hash"],
            "attempts": 0,
        }
        delivery_queue.append(delivery_task)
        created += 1
    return created

def _matches_topic(event_type: str, pattern: str) -> bool:
    """Glob matching: exact, "prefix.*", "*.suffix", or full wildcard "*"."""
    if pattern == "*":
        return True
    if pattern.endswith(".*"):
        # Keep the trailing dot so "charge.*" does not match "chargeback.x"
        return event_type.startswith(pattern[:-1])
    if pattern.startswith("*."):
        return event_type.endswith(pattern[1:])  # e.g. "*.succeeded"
    return event_type == pattern

def deliver_to_endpoint(delivery: dict) -> DeliveryResult:
    """HTTP POST payload to endpoint URL with HMAC signature."""
    payload_bytes = json.dumps(delivery["payload"]).encode()
    signature = compute_hmac(delivery["secret"], payload_bytes)
    headers = {
        "Content-Type": "application/json",
        "X-Webhook-Signature": f"sha256={signature}",
        "X-Event-ID": delivery["event_id"],
        "User-Agent": "WebhookDelivery/1.0",
    }
    start = time.monotonic()
    try:
        req = urllib.request.Request(
            delivery["url"], data=payload_bytes, headers=headers, method="POST"
        )
        with urllib.request.urlopen(req, timeout=10) as resp:
            latency_ms = (time.monotonic() - start) * 1000
            return DeliveryResult(
                success=200 <= resp.status < 300,
                status_code=resp.status,
                latency_ms=latency_ms,
            )
    except urllib.error.HTTPError as e:
        latency_ms = (time.monotonic() - start) * 1000
        return DeliveryResult(success=False, status_code=e.code,
                              latency_ms=latency_ms, error=str(e))
    except (urllib.error.URLError, TimeoutError) as e:
        latency_ms = (time.monotonic() - start) * 1000
        return DeliveryResult(success=False, status_code=None,
                              latency_ms=latency_ms, error=str(e))

def compute_hmac(secret: str, payload: bytes) -> str:
    """Compute HMAC-SHA256 of payload with secret."""
    return hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()

def rate_limit_check(endpoint_id: int, in_flight_counts: dict,
                     max_concurrency: int) -> bool:
    """Return True if delivery is allowed under rate limit."""
    current = in_flight_counts.get(endpoint_id, 0)
    return current < max_concurrency

def compute_endpoint_metrics(endpoint_id: int, deliveries: list[dict],
                             window_hours: int = 1) -> dict:
    """Compute success rate, avg latency, failure count for an endpoint."""
    cutoff = time.time() - window_hours * 3600
    recent = [d for d in deliveries
              if d["endpoint_id"] == endpoint_id and d.get("created_at", 0) >= cutoff]
    if not recent:
        return {"success_rate": None, "avg_latency_ms": None,
                "success_count": 0, "failure_count": 0}
    successes = [d for d in recent if d.get("status") == "delivered"]
    failures = [d for d in recent if d.get("status") == "failed"]
    latencies = [d["latency_ms"] for d in successes if "latency_ms" in d]
    return {
        "success_rate": len(successes) / len(recent),
        "avg_latency_ms": sum(latencies) / len(latencies) if latencies else None,
        "success_count": len(successes),
        "failure_count": len(failures),
    }

Frequently Asked Questions

{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "Why is queue isolation per endpoint important in webhook fan-out?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Without queue isolation, all delivery tasks for all endpoints share a single queue. A slow or consistently failing endpoint causes its tasks to pile up and consume worker capacity, delaying delivery to all other healthy endpoints. Queue isolation gives each endpoint its own virtual queue or worker partition. A failing endpoint can back up its own queue to thousands of tasks while other endpoints continue receiving deliveries in near-real-time. In Kafka-based implementations, partitioning by endpoint_id provides natural isolation: each partition is consumed independently, and a backed-up partition does not affect others."
      }
    },
    {
      "@type": "Question",
      "name": "How does per-endpoint rate limiting work in practice?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Per-endpoint rate limiting caps the number of concurrent in-flight HTTP deliveries for a given endpoint. Implementation: before starting a delivery, the worker atomically increments a Redis counter for the endpoint (INCR endpoint:{id}:inflight). If the counter exceeds max_concurrency, the worker requeues the task with a short delay and decrements the counter. After a delivery completes (success or failure), the counter is decremented (DECR). This prevents overwhelming slow receivers with concurrent requests. max_concurrency is configured per endpoint based on the receiver's capacity and stored in WebhookEndpoint.max_concurrency."
      }
    },
    {
      "@type": "Question",
      "name": "How should receivers verify HMAC webhook signatures?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "The receiver reads the raw request body bytes (before any JSON parsing that might reorder keys) and recomputes HMAC-SHA256 using its stored shared secret. It then compares the computed signature with the value in the X-Webhook-Signature header using a constant-time comparison function (hmac.compare_digest in Python, crypto.timingSafeEqual in Node.js). Constant-time comparison is required to prevent timing attacks: a naive string equality check returns early on the first mismatched character, allowing an attacker to measure response time and brute-force the signature one byte at a time. The receiver should also reject requests whose timestamp (if included in a signed header) is more than 5 minutes old, to prevent replay attacks."
      }
    },
    {
      "@type": "Question",
      "name": "What observability metrics matter most for webhook delivery?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "The most actionable webhook delivery metrics are: success_rate (the fraction of delivery attempts that received a 2xx HTTP response, where a drop below 95% warrants investigation), queue_depth (the number of pending tasks for an endpoint, where steady growth indicates the endpoint cannot keep up), avg_latency_ms (slow receivers increase worker occupation time and reduce throughput), and retry_rate (high retry rates indicate intermittent failures that may become permanent). At the system level, track fan-out lag (time from event creation to fan-out worker processing) and delivery lag (time from task creation to first delivery attempt). Alert on queue_depth exceeding a threshold and on endpoints with success_rate below 80% for more than 30 minutes."
      }
    }
  ]
}

