What is a Webhook?
A webhook is a user-defined HTTP callback. When an event occurs in a platform (payment succeeded, order shipped, file uploaded), the platform sends an HTTP POST request to a URL configured by the user. Webhooks power integrations: Stripe notifies your server of payment events, GitHub notifies CI/CD pipelines of push events, Shopify notifies fulfillment services of new orders.
Requirements
- Allow users to register webhook endpoints (URL, event types to subscribe to)
- Deliver events to registered endpoints within 5 seconds
- At-least-once delivery: retry on failure with exponential backoff
- Secure payloads (HMAC signature), prevent replay attacks
- 10M events/day, 1M registered webhooks
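As a rough sizing exercise, the volume requirement can be turned into per-second rates. The sketch below works from the 10M events/day figure, which averages to roughly 116 events per second. The fan-out factor and peak multiplier are illustrative assumptions, not numbers from the requirements.

```python
# Back-of-envelope throughput estimate from the stated requirements.
EVENTS_PER_DAY = 10_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

avg_events_per_sec = EVENTS_PER_DAY / SECONDS_PER_DAY

# Assumptions (hypothetical, for illustration only): each event matches
# 2 endpoints on average, and peak traffic is 5x the daily average.
ASSUMED_FANOUT = 2
ASSUMED_PEAK_FACTOR = 5

avg_deliveries_per_sec = avg_events_per_sec * ASSUMED_FANOUT
peak_deliveries_per_sec = avg_deliveries_per_sec * ASSUMED_PEAK_FACTOR

print(f"average events/s:     {avg_events_per_sec:.0f}")
print(f"average deliveries/s: {avg_deliveries_per_sec:.0f}")
print(f"peak deliveries/s:    {peak_deliveries_per_sec:.0f}")
```

Under these assumptions the delivery workers would need to sustain on the order of a thousand outbound requests per second at peak, before counting retries.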
Data Model
WebhookEndpoint(endpoint_id UUID, user_id, url VARCHAR, secret VARCHAR,
                event_types VARCHAR[], status ENUM(ACTIVE,DISABLED),
                created_at, last_success_at, failure_count)
WebhookDelivery(delivery_id UUID, endpoint_id, event_id, event_type,
                payload JSONB, status ENUM(PENDING,SUCCESS,FAILED,ABANDONED),
                attempt_count, next_retry_at, created_at, delivered_at,
                response_code, response_body)
Delivery Architecture
Event Source → Kafka (event_type topic) → Webhook Fanout Service
  → For each matching endpoint:
      WebhookDelivery record created (PENDING)
      → Delivery Queue (Kafka or delayed job)
      → Webhook Delivery Worker
          → HTTP POST to endpoint URL
          → Update delivery status
Fanout: Event to Endpoints
When event E of type payment.succeeded occurs, query all WebhookEndpoints where user_id is the event's owner, event_types contains payment.succeeded, and status=ACTIVE. On large platforms with many endpoints, cache the lookup result in Redis under key=endpoints:{user_id}:{event_type} with TTL=5min. For each matching endpoint, create a WebhookDelivery record and enqueue a delivery job.
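The fanout step can be sketched as follows. This is a minimal in-memory illustration, not a production design: `EndpointCache` is a hypothetical stand-in for Redis, `db` is a plain list standing in for the WebhookEndpoint table, and `queue` is a list standing in for the delivery queue. The event dictionary fields are also assumptions.

```python
import time
import uuid
from dataclasses import dataclass


@dataclass
class Endpoint:
    endpoint_id: str
    user_id: str
    url: str
    event_types: list
    status: str = "ACTIVE"


class EndpointCache:
    """Tiny in-process TTL cache standing in for Redis (hypothetical helper)."""

    def __init__(self, ttl_seconds=300):  # TTL=5min, as in the text
        self.ttl = ttl_seconds
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None or entry[0] <= time.time():
            return None
        return entry[1]

    def set(self, key, value):
        self._entries[key] = (time.time() + self.ttl, value)


def find_endpoints(db, cache, user_id, event_type):
    """Return ACTIVE endpoints of this user subscribed to event_type."""
    key = f"endpoints:{user_id}:{event_type}"
    cached = cache.get(key)
    if cached is not None:  # an empty list is a valid cached result
        return cached
    matches = [
        e for e in db
        if e.user_id == user_id
        and event_type in e.event_types
        and e.status == "ACTIVE"
    ]
    cache.set(key, matches)
    return matches


def fan_out(event, db, cache, queue):
    """Create one PENDING delivery per matching endpoint and enqueue it."""
    deliveries = []
    for endpoint in find_endpoints(db, cache, event["user_id"], event["type"]):
        delivery = {
            "delivery_id": str(uuid.uuid4()),
            "endpoint_id": endpoint.endpoint_id,
            "event_id": event["event_id"],
            "event_type": event["type"],
            "payload": event["payload"],
            "status": "PENDING",
            "attempt_count": 0,
        }
        deliveries.append(delivery)
        queue.append(delivery["delivery_id"])
    return deliveries
```

One consequence of the TTL-based cache is that a newly registered or disabled endpoint may take up to five minutes to be reflected in fanout unless the cache entry is also invalidated on write.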
Delivery and Retry
The delivery worker sends an HTTP POST to the endpoint URL with timeout=30s. On success (2xx response), it sets the delivery status to SUCCESS and resets endpoint.failure_count to 0. On failure (non-2xx response, timeout, or DNS failure), it increments attempt_count and schedules a retry with exponential backoff:
retry_delays = [5s, 30s, 2min, 10min, 30min, 2h, 6h, 24h]
# attempt_count 1: retry after 5s
# attempt_count 2: retry after 30s
# ...
# attempt_count 8: retry after 24h
# attempt_count > 8: status=ABANDONED
If an endpoint keeps failing for a sustained period (for example, N consecutive failures or 3 days without a successful delivery), disable it: set status=DISABLED and notify the user via email.
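The retry schedule above can be sketched as a small lookup. This is an illustrative sketch: the delivery record is a plain dictionary, and it assumes that FAILED means "the last attempt failed and a retry is scheduled", which the data model does not spell out.

```python
from datetime import datetime, timedelta, timezone

# Delays from the schedule above, in seconds:
# 5s, 30s, 2min, 10min, 30min, 2h, 6h, 24h
RETRY_DELAYS_SECONDS = [5, 30, 120, 600, 1800, 7200, 21600, 86400]


def next_retry_delay(attempt_count):
    """Delay in seconds after `attempt_count` failed attempts, or None to abandon."""
    if attempt_count < 1:
        raise ValueError("attempt_count must be at least 1")
    if attempt_count > len(RETRY_DELAYS_SECONDS):
        return None
    return RETRY_DELAYS_SECONDS[attempt_count - 1]


def record_failure(delivery, now):
    """Update a delivery record (a dict) after a failed attempt."""
    delivery["attempt_count"] += 1
    delay = next_retry_delay(delivery["attempt_count"])
    if delay is None:
        delivery["status"] = "ABANDONED"
        delivery["next_retry_at"] = None
    else:
        # Assumption: FAILED marks a delivery that is waiting for its next retry.
        delivery["status"] = "FAILED"
        delivery["next_retry_at"] = now + timedelta(seconds=delay)
    return delivery


if __name__ == "__main__":
    record = {"attempt_count": 0, "status": "PENDING", "next_retry_at": None}
    print(record_failure(record, datetime.now(timezone.utc)))
```

A fixed schedule like this retries all deliveries that failed together at the same moments. Adding random jitter to each delay is a common refinement, though the schedule in the text does not include it.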
Payload Signing (HMAC)
Sign each payload with the endpoint’s secret key so the recipient can verify authenticity:
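import hashlib
import hmac
import time

class ReplayAttack(Exception):
    """Raised when the timestamp is outside the tolerance window."""

def sign_payload(secret, payload_bytes, timestamp):
    # Sign "<timestamp>.<payload>" so the timestamp cannot be altered separately.
    signed_content = f"{timestamp}.".encode() + payload_bytes
    signature = hmac.new(secret.encode(), signed_content, hashlib.sha256).hexdigest()
    return f"t={timestamp},v1={signature}"

# Delivery adds headers:
#   Webhook-Signature: t=1620000000,v1=abc123...
#   Webhook-Timestamp: 1620000000

# Recipient verifies:
def verify(secret, payload_bytes, timestamp_str, signature_str):
    # signature_str is the full Webhook-Signature header value.
    if abs(time.time() - int(timestamp_str)) > 300:  # 5 min tolerance
        raise ReplayAttack("timestamp outside the 5-minute tolerance window")
    expected = sign_payload(secret, payload_bytes, timestamp_str)
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(expected, signature_str)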
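A recipient that relies only on the Webhook-Signature header has to split it into its timestamp and signature parts before checking it. The sketch below shows one way to do that, assuming the `t=...,v1=...` format shown above. The function names and the explicit `now` parameter are illustrative choices, not part of any specific platform's API.

```python
import hashlib
import hmac


def parse_signature_header(header):
    """Split 't=<unix_ts>,v1=<hex>' into (timestamp, signature).

    Raises ValueError or KeyError if the header is malformed.
    """
    parts = dict(item.split("=", 1) for item in header.split(","))
    return int(parts["t"]), parts["v1"]


def verify_header(secret, payload_bytes, header, now, tolerance=300):
    """Return True only if the signature matches and the timestamp is fresh."""
    timestamp, received = parse_signature_header(header)
    if abs(now - timestamp) > tolerance:  # outside the 5-minute window
        return False
    signed_content = f"{timestamp}.".encode() + payload_bytes
    expected = hmac.new(secret.encode(), signed_content, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, received)
```

The signature should be checked against the raw request body bytes exactly as received. Parsing the JSON and serializing it again can change whitespace or key order, which would make a valid signature fail.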
Ordering and Idempotency
Webhook deliveries may arrive out of order due to retries. Include event_id and created_at in each payload. Recipients should use event_id for idempotency (deduplicate on their end) and created_at to detect out-of-order delivery. Provide a delivery_id to allow idempotent retries — if the recipient processed a delivery but the acknowledgment timed out, the re-delivered payload has the same delivery_id.
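On the recipient side, deduplication and out-of-order detection might look like the sketch below. It is a minimal in-memory illustration: a real receiver would keep this state in a durable store, and the `resource_id` field is a hypothetical addition used here to compare created_at values per object. Skipping stale events is one possible policy, not the only one.

```python
class WebhookReceiver:
    """In-memory sketch of idempotent, order-aware webhook handling."""

    def __init__(self):
        self.seen_event_ids = set()
        self.latest_created_at = {}  # resource_id -> newest created_at applied
        self.applied = []            # event_ids that were actually applied

    def handle(self, payload):
        event_id = payload["event_id"]

        # Idempotency: a retried delivery carries the same event_id.
        if event_id in self.seen_event_ids:
            return "duplicate"
        self.seen_event_ids.add(event_id)

        # Ordering: ignore events older than the newest one already applied
        # to the same resource (resource_id is a hypothetical payload field).
        resource_id = payload["resource_id"]
        newest = self.latest_created_at.get(resource_id)
        if newest is not None and payload["created_at"] < newest:
            return "stale"

        self.latest_created_at[resource_id] = payload["created_at"]
        self.applied.append(event_id)
        return "applied"
```

Returning a 2xx response for duplicates as well as for applied events keeps the sender from retrying a delivery that has already been processed.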
Key Design Decisions
- Kafka for event ingestion: decouples event sources from the fanout service
- WebhookDelivery record per delivery: full audit trail, enables re-delivery
- Exponential backoff with cap: avoids hammering failing endpoints and spreads retries out over time
- HMAC-SHA256 signature: recipient can verify that the payload came from the platform and was not altered, using only the shared secret
- Timestamp in signature: limits replay attacks to a 5-minute tolerance window