Webhook Fan-Out System Low-Level Design: Event Routing, Parallel Delivery, Backpressure, and Observability

⏱ 11 min read

What Is Webhook Fan-Out?

Webhook fan-out is the process of delivering a single event to multiple registered external endpoints (webhooks). When a payment platform processes a charge, it may need to notify a fraud system, an accounting service, a CRM, and a customer dashboard — all registered as webhook endpoints for the “charge.succeeded” event type. The fan-out system receives the event once and creates independent, isolated delivery attempts for each matching endpoint.

The design challenges of webhook fan-out differ from general message queues: endpoints are external HTTP servers with unpredictable latency and reliability, the number of registered endpoints per event type varies widely (from 1 to thousands), and slow or failing endpoints must never delay delivery to healthy endpoints.

Event Routing

Each WebhookEndpoint registers a topic_filter — a pattern matching the event types it wants to receive. Filters can be exact matches (“charge.succeeded”), prefix matches (“charge.*”), or wildcard patterns (“*.succeeded”). When an event arrives, the routing layer queries WebhookEndpoint for all endpoints whose topic_filter matches the event type. This lookup should be cached (the set of active endpoints rarely changes) and filtered in memory after loading from the database.

For high-throughput platforms with thousands of registered endpoints per event type, the routing query should use a denormalized or pre-computed mapping: at endpoint registration time, insert rows into a TopicEndpointMap table keyed by (event_type, endpoint_id). This makes routing a simple indexed lookup rather than a pattern-matching scan.

Fan-Out Worker and Parallel Delivery

The fan-out worker receives events from the event bus (Kafka, SQS, or internal queue) and, for each matching endpoint, creates a WebhookDelivery record and enqueues a delivery task. Delivery tasks are processed by a pool of delivery workers — HTTP client threads that POST the event payload to the endpoint URL and record the result.

Fan-out and delivery are intentionally separated:

Fan-out is fast (database writes only, no HTTP calls) and can create thousands of delivery tasks per event in milliseconds.
Delivery is slow and unpredictable (external HTTP calls, variable latency), but isolated per endpoint.

This separation ensures that fan-out throughput is bounded by database write capacity, not by the slowest endpoint's HTTP response time.

Queue Isolation Per Endpoint

Without queue isolation, a slow endpoint can back up the shared delivery queue and delay delivery to all other endpoints. The solution: each endpoint has its own virtual delivery queue — in practice, a delivery worker pool partitioned by endpoint_id, or a priority queue where tasks are dequeued in a round-robin or least-recently-served order per endpoint.

In Celery or similar task queues, this is implemented by routing delivery tasks to per-endpoint queues (endpoint_{id}) or by using a broker that supports per-key rate limiting and fair scheduling. In Kafka, use a topic partitioned by endpoint_id — each partition is consumed by a dedicated delivery worker, and a slow partition does not block other partitions.

Per-Endpoint Rate Limiting and Concurrency Control

Sending too many concurrent requests to a slow endpoint exhausts its connection pool and causes cascade failures. Per-endpoint rate limiting caps the number of in-flight delivery attempts for a given endpoint. Implementation options:

Semaphore-based: each delivery worker checks a Redis counter for the endpoint before starting a delivery. If the counter equals max_concurrency, the task is requeued with a short delay.
Queue depth cap: if the endpoint's delivery queue depth exceeds a threshold, reject new fan-out tasks for that endpoint (returning a backpressure signal to the fan-out worker).

max_concurrency is stored in WebhookEndpoint.max_concurrency (e.g., 5 concurrent deliveries per endpoint) and respected by all delivery workers.

Backpressure

When an endpoint queue depth exceeds a configured threshold (e.g., 10,000 pending deliveries), the fan-out worker stops enqueuing new tasks for that endpoint. New events for that endpoint are either dropped (with a log entry), written to a backpressure buffer (a secondary queue with lower priority), or throttled at the fan-out layer. The endpoint owner is notified via an alert that their endpoint is falling behind. Backpressure prevents the delivery queue from growing unboundedly, which would cause memory exhaustion and delay recovery when the endpoint becomes healthy again.

HMAC Request Signing

Webhook receivers need to verify that requests come from the legitimate sender, not a third party. The standard approach: HMAC-SHA256 of the request body using a shared secret, sent as the X-Webhook-Signature header. The endpoint registers a secret at creation time (stored as a hash in the database, or in a secret manager). The delivery worker computes HMAC-SHA256(secret, payload_bytes) and includes it in the delivery request. The receiver recomputes the HMAC with its stored secret and compares using a constant-time comparison function (to prevent timing attacks).

Delivery Observability

Per-endpoint operational metrics are critical for diagnosing delivery issues before they escalate. Key metrics tracked per endpoint per time window:

success_rate — fraction of deliveries that received a 2xx HTTP response
avg_latency_ms — mean HTTP response time for successful deliveries
queue_depth — current number of pending delivery tasks
error_rate — fraction of deliveries that failed after all retries
retry_rate — fraction of deliveries that required at least one retry

These metrics are aggregated into hourly DeliveryMetric rows and surfaced in a per-endpoint observability dashboard. Anomaly detection on success_rate and queue_depth can trigger automatic endpoint suspension (stop delivering to an endpoint that has been failing for 24 hours) and alert the endpoint owner.

SQL Data Model

-- Registered webhook endpoints
CREATE TABLE WebhookEndpoint (
    id              BIGSERIAL PRIMARY KEY,
    url             TEXT NOT NULL,
    topic_filter    VARCHAR(255) NOT NULL,  -- event type pattern (e.g., "charge.*")
    secret_hash     TEXT NOT NULL,          -- HMAC secret stored as hash or in secret manager
    max_concurrency INT NOT NULL DEFAULT 5,
    status          VARCHAR(32) NOT NULL DEFAULT 'active',  -- active, suspended, disabled
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Individual delivery attempts
CREATE TABLE WebhookDelivery (
    id              BIGSERIAL PRIMARY KEY,
    endpoint_id     BIGINT NOT NULL REFERENCES WebhookEndpoint(id),
    event_id        VARCHAR(255) NOT NULL,
    payload         JSONB NOT NULL,
    status          VARCHAR(32) NOT NULL DEFAULT 'pending',  -- pending, delivered, failed
    attempts        INT NOT NULL DEFAULT 0,
    next_retry_at   TIMESTAMPTZ,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    delivered_at    TIMESTAMPTZ
);

-- Hourly delivery metrics per endpoint
CREATE TABLE DeliveryMetric (
    endpoint_id         BIGINT NOT NULL REFERENCES WebhookEndpoint(id),
    hour                TIMESTAMPTZ NOT NULL,
    success_count       BIGINT NOT NULL DEFAULT 0,
    failure_count       BIGINT NOT NULL DEFAULT 0,
    avg_latency_ms      NUMERIC(10,2),
    PRIMARY KEY (endpoint_id, hour)
);

CREATE INDEX idx_delivery_endpoint_status ON WebhookDelivery(endpoint_id, status, next_retry_at);
CREATE INDEX idx_delivery_event ON WebhookDelivery(event_id);
CREATE INDEX idx_metric_endpoint_hour ON DeliveryMetric(endpoint_id, hour DESC);

Python Implementation Sketch

import hashlib, hmac, time, json
import urllib.request, urllib.error
from dataclasses import dataclass
from typing import Optional

@dataclass
class DeliveryResult:
    success: bool
    status_code: Optional[int]
    latency_ms: float
    error: Optional[str] = None

def fan_out_event(event_type: str, payload: dict, event_id: str,
                  endpoints_db: list[dict], delivery_queue: list) -> int:
    """Create delivery tasks for all matching endpoints. Returns task count."""
    created = 0
    for endpoint in endpoints_db:
        if not _matches_topic(event_type, endpoint["topic_filter"]):
            continue
        if endpoint.get("status") != "active":
            continue
        delivery_task = {
            "endpoint_id": endpoint["id"],
            "event_id": event_id,
            "payload": payload,
            "url": endpoint["url"],
            "secret": endpoint["secret_hash"],
            "attempts": 0,
        }
        delivery_queue.append(delivery_task)
        created += 1
    return created

def _matches_topic(event_type: str, pattern: str) -> bool:
    """Simple glob matching: exact, prefix with *, or full wildcard."""
    if pattern == "*":
        return True
    if pattern.endswith(".*"):
        return event_type.startswith(pattern[:-2])
    return event_type == pattern

def deliver_to_endpoint(delivery: dict) -> DeliveryResult:
    """HTTP POST payload to endpoint URL with HMAC signature."""
    payload_bytes = json.dumps(delivery["payload"]).encode()
    signature = compute_hmac(delivery["secret"], payload_bytes)
    headers = {
        "Content-Type": "application/json",
        "X-Webhook-Signature": f"sha256={signature}",
        "X-Event-ID": delivery["event_id"],
        "User-Agent": "WebhookDelivery/1.0",
    }
    start = time.monotonic()
    try:
        req = urllib.request.Request(
            delivery["url"], data=payload_bytes, headers=headers, method="POST"
        )
        with urllib.request.urlopen(req, timeout=10) as resp:
            latency_ms = (time.monotonic() - start) * 1000
            return DeliveryResult(
                success=200 <= resp.status  str:
    """Compute HMAC-SHA256 of payload with secret."""
    return hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()

def rate_limit_check(endpoint_id: int, in_flight_counts: dict,
                     max_concurrency: int) -> bool:
    """Return True if delivery is allowed under rate limit."""
    current = in_flight_counts.get(endpoint_id, 0)
    return current  dict:
    """Compute success rate, avg latency, failure count for an endpoint."""
    cutoff = time.time() - window_hours * 3600
    recent = [d for d in deliveries
              if d["endpoint_id"] == endpoint_id and d.get("created_at", 0) >= cutoff]
    if not recent:
        return {"success_rate": None, "avg_latency_ms": None,
                "success_count": 0, "failure_count": 0}
    successes = [d for d in recent if d.get("status") == "delivered"]
    failures = [d for d in recent if d.get("status") == "failed"]
    latencies = [d["latency_ms"] for d in successes if "latency_ms" in d]
    return {
        "success_rate": len(successes) / len(recent),
        "avg_latency_ms": sum(latencies) / len(latencies) if latencies else None,
        "success_count": len(successes),
        "failure_count": len(failures),
    }

Frequently Asked Questions

Queue Isolation Design

Without isolation, a failing endpoint backs up the shared queue and delays delivery to all healthy endpoints. Per-endpoint virtual queues (or Kafka partitioning by endpoint_id) ensure each endpoint's failure is contained — its tasks back up only its own queue while other endpoints continue unaffected.

Per-Endpoint Rate Limiting

Before each delivery, atomically increment a Redis counter for the endpoint. If counter exceeds max_concurrency, requeue the task with a short delay. Decrement the counter on delivery completion. This prevents overwhelming slow receivers with concurrent requests. max_concurrency is configured per endpoint and stored in the WebhookEndpoint table.

HMAC Verification

Receivers compute HMAC-SHA256 of the raw request body bytes using the shared secret, then compare with the X-Webhook-Signature header using constant-time comparison (hmac.compare_digest). Constant-time comparison prevents timing attacks. Reject requests with timestamps more than 5 minutes old to prevent replay attacks.

Observability Metrics

Most actionable metrics: success_rate (alert below 95%), queue_depth (steady growth means endpoint cannot keep up), avg_latency_ms (slow receivers reduce worker throughput), retry_rate (high values signal intermittent failures becoming permanent). Also track fan-out lag and delivery lag at the system level.