Webhook Delivery System Low-Level Design

What is a Webhook?

A webhook is a user-defined HTTP callback. When an event occurs in a platform (payment succeeded, order shipped, file uploaded), the platform sends an HTTP POST request to a URL configured by the user. Webhooks power integrations: Stripe notifies your server of payment events, GitHub notifies CI/CD pipelines of push events, Shopify notifies fulfillment services of new orders.

Requirements

  • Allow users to register webhook endpoints (URL, event types to subscribe to)
  • Make the first delivery attempt to each registered endpoint within 5 seconds of the event (retries may take longer)
  • At-least-once delivery: retry on failure with exponential backoff
  • Secure payloads (HMAC signature), prevent replay attacks
  • 10M events/day, 1M registered webhooks
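
A quick back-of-the-envelope estimate helps size the delivery workers. The sketch below converts the daily volume above into an average per-second rate. The fanout factor and peak multiplier are illustrative assumptions, not figures from the requirements.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate_per_second(items_per_day: float) -> float:
    """Convert a daily volume into an average per-second rate."""
    return items_per_day / SECONDS_PER_DAY


EVENTS_PER_DAY = 10_000_000  # from the requirements

# Assumptions for illustration only (not stated in the requirements):
ASSUMED_FANOUT = 2           # average endpoints matched per event
ASSUMED_PEAK_MULTIPLIER = 5  # peak traffic relative to the daily average

avg_events_per_sec = average_rate_per_second(EVENTS_PER_DAY)
avg_deliveries_per_sec = avg_events_per_sec * ASSUMED_FANOUT
peak_deliveries_per_sec = avg_deliveries_per_sec * ASSUMED_PEAK_MULTIPLIER

print(f"average events/s:     {avg_events_per_sec:.0f}")
print(f"average deliveries/s: {avg_deliveries_per_sec:.0f}")
print(f"peak deliveries/s:    {peak_deliveries_per_sec:.0f}")
```

10M events/day works out to roughly 116 events per second on average, so the average load is modest and the design effort goes into fanout, retries, and slow endpoints rather than raw ingest.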

Data Model

WebhookEndpoint(endpoint_id UUID, user_id, url VARCHAR, secret VARCHAR,
                event_types VARCHAR[], status ENUM(ACTIVE,DISABLED),
                created_at, last_success_at, failure_count)

WebhookDelivery(delivery_id UUID, endpoint_id, event_id, event_type,
                payload JSONB, status ENUM(PENDING,SUCCESS,FAILED,ABANDONED),
                attempt_count, next_retry_at, created_at, delivered_at,
                response_code, response_body)
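
The two records above could be expressed in application code roughly as follows. This is a minimal sketch using Python dataclasses; field types the schema leaves unspecified (for example user_id and event_id as strings) and all default values are assumptions.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List, Optional


class EndpointStatus(Enum):
    ACTIVE = "ACTIVE"
    DISABLED = "DISABLED"


class DeliveryStatus(Enum):
    PENDING = "PENDING"
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"
    ABANDONED = "ABANDONED"


def utc_now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class WebhookEndpoint:
    user_id: str                 # assumed type
    url: str
    secret: str                  # per-endpoint HMAC signing key
    event_types: List[str]
    endpoint_id: uuid.UUID = field(default_factory=uuid.uuid4)
    status: EndpointStatus = EndpointStatus.ACTIVE
    created_at: datetime = field(default_factory=utc_now)
    last_success_at: Optional[datetime] = None
    failure_count: int = 0


@dataclass
class WebhookDelivery:
    endpoint_id: uuid.UUID
    event_id: str                # assumed type
    event_type: str
    payload: dict                # stored as JSONB in the database
    delivery_id: uuid.UUID = field(default_factory=uuid.uuid4)
    status: DeliveryStatus = DeliveryStatus.PENDING
    attempt_count: int = 0
    next_retry_at: Optional[datetime] = None
    created_at: datetime = field(default_factory=utc_now)
    delivered_at: Optional[datetime] = None
    response_code: Optional[int] = None
    response_body: Optional[str] = None
```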

Delivery Architecture

Event Source → Kafka (event_type topic) → Webhook Fanout Service
                                        → For each matching endpoint:
                                          WebhookDelivery record created (PENDING)
                                          → Delivery Queue (Kafka or delayed job)
                                          → Webhook Delivery Worker
                                          → HTTP POST to endpoint URL
                                          → Update delivery status

Fanout: Event to Endpoints

When event E of type payment.succeeded occurs: query all WebhookEndpoints where user_id=owner AND event_types contains payment.succeeded AND status=ACTIVE. For large platforms (many endpoints), cache the endpoint lookup in Redis: key=endpoints:{user_id}:{event_type}, TTL=5min. For each matching endpoint, create a WebhookDelivery record and enqueue a delivery job.
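
The fanout step can be sketched as below. An in-memory dictionary stands in for Redis and a Python list stands in for the WebhookEndpoint table; the names EndpointCache, query_endpoints, and fan_out are hypothetical, and a real service would also need to invalidate the cache when an endpoint is added, changed, or disabled.

```python
import time
import uuid
from typing import Callable, Dict, List, Tuple

CACHE_TTL_SECONDS = 300  # the 5-minute TTL from the text

# Hypothetical in-memory stand-in for the WebhookEndpoint table.
ENDPOINTS = [
    {"endpoint_id": "ep_1", "user_id": "u1", "url": "https://example.com/a",
     "event_types": ["payment.succeeded"], "status": "ACTIVE"},
    {"endpoint_id": "ep_2", "user_id": "u1", "url": "https://example.com/b",
     "event_types": ["order.shipped"], "status": "ACTIVE"},
    {"endpoint_id": "ep_3", "user_id": "u1", "url": "https://example.com/c",
     "event_types": ["payment.succeeded"], "status": "DISABLED"},
]


def query_endpoints(user_id: str, event_type: str) -> List[dict]:
    """Owner matches, event type is subscribed, endpoint is ACTIVE."""
    return [e for e in ENDPOINTS
            if e["user_id"] == user_id
            and event_type in e["event_types"]
            and e["status"] == "ACTIVE"]


class EndpointCache:
    """TTL cache keyed by endpoints:{user_id}:{event_type} (Redis stand-in)."""

    def __init__(self, loader: Callable[[str, str], List[dict]],
                 ttl: float = CACHE_TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[dict]]] = {}

    def get(self, user_id: str, event_type: str) -> List[dict]:
        key = f"endpoints:{user_id}:{event_type}"
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit, not yet expired
        endpoints = self._loader(user_id, event_type)
        self._entries[key] = (now + self._ttl, endpoints)
        return endpoints


def fan_out(event: dict, cache: EndpointCache,
            enqueue: Callable[[dict], None]) -> List[dict]:
    """Create one PENDING delivery per matching endpoint and enqueue it."""
    deliveries = []
    for endpoint in cache.get(event["user_id"], event["event_type"]):
        delivery = {
            "delivery_id": str(uuid.uuid4()),
            "endpoint_id": endpoint["endpoint_id"],
            "event_id": event["event_id"],
            "event_type": event["event_type"],
            "payload": event["payload"],
            "status": "PENDING",
            "attempt_count": 0,
        }
        enqueue(delivery)
        deliveries.append(delivery)
    return deliveries
```

Passing the loader and the clock into the cache keeps the sketch testable; in practice the loader would be the database query and the cache would be shared across fanout instances.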

Delivery and Retry

Delivery worker sends HTTP POST to the endpoint URL with timeout=30s. On success (2xx response): update delivery status=SUCCESS, reset endpoint.failure_count=0. On failure (non-2xx, timeout, DNS failure): increment attempt_count, schedule retry with exponential backoff:

retry_delays = [5s, 30s, 2min, 10min, 30min, 2h, 6h, 24h]
# attempt_count 1: retry after 5s
# attempt_count 2: retry after 30s
# attempt_count 8: retry after 24h
# attempt_count > 8: status=ABANDONED

If an endpoint keeps failing for a sustained period (for example, failure_count exceeds a threshold N, or every delivery has failed for 3 days), disable it: set status=DISABLED and notify the user by email.
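
The retry schedule can be sketched as a small lookup function. The helper names next_retry_delay and record_failure are hypothetical, and the sketch leaves the delivery status untouched while retries remain, since the text does not say which status a delivery holds between attempts.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# The retry schedule from the text: 5s, 30s, 2min, 10min, 30min, 2h, 6h, 24h.
RETRY_DELAYS = [
    timedelta(seconds=5),
    timedelta(seconds=30),
    timedelta(minutes=2),
    timedelta(minutes=10),
    timedelta(minutes=30),
    timedelta(hours=2),
    timedelta(hours=6),
    timedelta(hours=24),
]


def next_retry_delay(attempt_count: int) -> Optional[timedelta]:
    """Delay before the next retry after attempt_count failed attempts.

    Returns None once the schedule is exhausted (delivery is abandoned).
    """
    if attempt_count < 1:
        raise ValueError("attempt_count must be at least 1")
    if attempt_count > len(RETRY_DELAYS):
        return None
    return RETRY_DELAYS[attempt_count - 1]


def record_failure(delivery: dict, now: datetime) -> dict:
    """Increment attempt_count, then schedule a retry or abandon."""
    delivery["attempt_count"] += 1
    delay = next_retry_delay(delivery["attempt_count"])
    if delay is None:
        delivery["status"] = "ABANDONED"
        delivery["next_retry_at"] = None
    else:
        # Assumption: status is left as-is while retries remain.
        delivery["next_retry_at"] = now + delay
    return delivery


delivery = {"status": "PENDING", "attempt_count": 0, "next_retry_at": None}
record_failure(delivery, datetime.now(timezone.utc))
print(delivery["attempt_count"], delivery["next_retry_at"])
```

With this schedule the delays add up to about 32.7 hours, so a delivery is abandoned a little under a day and a half after its first failure.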

Payload Signing (HMAC)

Sign each payload with the endpoint’s secret key so the recipient can verify authenticity:

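import hashlib
import hmac
import time

class ReplayAttack(Exception):
    """Raised when the timestamp falls outside the tolerance window."""

def sign_payload(secret, payload_bytes, timestamp):
    # Sign "<timestamp>.<payload>" so the timestamp cannot be changed
    # without invalidating the signature.
    signed_content = f"{timestamp}.".encode() + payload_bytes
    signature = hmac.new(secret.encode(), signed_content, hashlib.sha256).hexdigest()
    return f"t={timestamp},v1={signature}"

# Delivery adds headers:
# Webhook-Signature: t=1620000000,v1=abc123...
# Webhook-Timestamp: 1620000000

# Recipient verifies (signature_str is the full Webhook-Signature header value):
def verify(secret, payload_bytes, timestamp_str, signature_str):
    if abs(time.time() - int(timestamp_str)) > 300:  # 5 min tolerance
        raise ReplayAttack("timestamp outside the 5-minute tolerance window")
    expected = sign_payload(secret, payload_bytes, timestamp_str)
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(expected, signature_str)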
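
A recipient that reads only the Webhook-Signature header has to parse the t= and v1= fields before comparing. The following is a minimal, self-contained sketch of that path, assuming the header format shown above; the function names are hypothetical, and it returns False rather than raising on stale or malformed input.

```python
import hashlib
import hmac
import time
from typing import Optional, Tuple

TOLERANCE_SECONDS = 300  # 5-minute window from the text


def parse_signature_header(header: str) -> Tuple[int, str]:
    """Split 't=<unix_ts>,v1=<hex_digest>' into (timestamp, digest)."""
    parts = dict(item.strip().split("=", 1) for item in header.split(","))
    return int(parts["t"]), parts["v1"]


def verify_signature_header(secret: str, payload_bytes: bytes, header: str,
                            now: Optional[float] = None) -> bool:
    """Return True only for a fresh, correctly signed payload."""
    try:
        timestamp, received = parse_signature_header(header)
    except (KeyError, ValueError):
        return False  # malformed header

    current = time.time() if now is None else now
    if abs(current - timestamp) > TOLERANCE_SECONDS:
        return False  # outside the replay tolerance window

    # Recompute the digest over "<timestamp>.<raw body>".
    signed_content = f"{timestamp}.".encode() + payload_bytes
    expected = hmac.new(secret.encode(), signed_content,
                        hashlib.sha256).hexdigest()
    # Compare as bytes in constant time.
    return hmac.compare_digest(expected.encode(), received.encode())
```

The signature must be checked against the raw request body as received; re-serializing parsed JSON can change the bytes and make a valid signature fail.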

Ordering and Idempotency

Webhook deliveries may arrive out of order due to retries. Include event_id and created_at in each payload. Recipients should use event_id for idempotency (deduplicate on their end) and created_at to detect out-of-order delivery. Provide a delivery_id to allow idempotent retries — if the recipient processed a delivery but the acknowledgment timed out, the re-delivered payload has the same delivery_id.
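
On the recipient side, deduplication and out-of-order detection might look like the sketch below. It assumes created_at is a numeric Unix timestamp and that each payload carries a resource_id naming the object the event refers to; resource_id and the WebhookReceiver class are hypothetical, and the in-memory set and dict stand in for a durable store.

```python
from typing import Callable, Dict, Set


class WebhookReceiver:
    """Applies each event at most once and skips stale updates."""

    def __init__(self, apply_event: Callable[[dict], None]):
        self._apply_event = apply_event
        self._seen_event_ids: Set[str] = set()
        self._latest_created_at: Dict[str, float] = {}

    def handle(self, payload: dict) -> str:
        event_id = payload["event_id"]
        if event_id in self._seen_event_ids:
            return "duplicate"  # already handled; acknowledge with 2xx again

        resource_id = payload["resource_id"]  # hypothetical field
        created_at = payload["created_at"]
        latest = self._latest_created_at.get(resource_id)
        if latest is not None and created_at < latest:
            # A newer event for this resource was already applied.
            self._seen_event_ids.add(event_id)
            return "stale"

        self._apply_event(payload)
        # Mark as seen only after applying, so a crash mid-apply allows a retry.
        self._seen_event_ids.add(event_id)
        self._latest_created_at[resource_id] = created_at
        return "applied"


applied = []
receiver = WebhookReceiver(applied.append)
print(receiver.handle({"event_id": "evt_2", "resource_id": "order_1", "created_at": 200}))
print(receiver.handle({"event_id": "evt_2", "resource_id": "order_1", "created_at": 200}))
print(receiver.handle({"event_id": "evt_1", "resource_id": "order_1", "created_at": 100}))
```

Whether a stale event should be dropped or still recorded depends on the integration; dropping suits "latest state wins" updates but not append-only event logs.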

Key Design Decisions

  • Kafka for event ingestion: decouples event sources from the fanout service
  • WebhookDelivery record per delivery: full audit trail, enables re-delivery
  • Exponential backoff with cap: gives failing endpoints time to recover and spreads out retry load
  • HMAC-SHA256 signature: the recipient can verify that the payload came from the platform and was not modified, using only the shared secret
  • Timestamp in signature: limits replay attacks to a 5-minute tolerance window
