Webhook Subscription System Low-Level Design: Registration, Challenge Verification, Event Filtering, and SSRF Prevention

A webhook subscription system lets external developers register URLs to receive event notifications. Unlike the internal webhook delivery system (which fans out to preconfigured endpoints), a subscription system exposes a public API: developers register endpoints, select event types to subscribe to, verify ownership, and manage their subscriptions. This design covers the registration flow, event type filtering, secret rotation, and the self-service management API used by platform integrations.

Core Data Model

CREATE TABLE WebhookSubscription (
    subscription_id  BIGSERIAL PRIMARY KEY,
    app_id           BIGINT NOT NULL,           -- the developer application
    endpoint_url     VARCHAR(2000) NOT NULL,
    description      VARCHAR(500),
    signing_secret   VARCHAR(100) NOT NULL,     -- HMAC key for signature verification
    status           VARCHAR(20) NOT NULL DEFAULT 'pending_verification',
        -- pending_verification, active, paused, disabled
    event_types      TEXT[] NOT NULL,           -- ['payment.succeeded', 'payment.failed', '*']
    api_version      VARCHAR(20) NOT NULL DEFAULT 'v1',
    created_at       TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at       TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    last_success_at  TIMESTAMPTZ,
    consecutive_failures INT NOT NULL DEFAULT 0
);

CREATE TABLE SubscriptionVerification (
    verification_id  BIGSERIAL PRIMARY KEY,
    subscription_id  BIGINT NOT NULL REFERENCES WebhookSubscription(subscription_id) ON DELETE CASCADE,
    challenge_token  VARCHAR(64) NOT NULL UNIQUE,
    expires_at       TIMESTAMPTZ NOT NULL,
    verified_at      TIMESTAMPTZ,
    method           VARCHAR(20) NOT NULL DEFAULT 'challenge'  -- challenge, test_event
);

CREATE TABLE SubscriptionEventType (
    event_type       VARCHAR(100) PRIMARY KEY,
    description      TEXT,
    schema_url       VARCHAR(500),    -- JSON Schema for the event payload
    api_version      VARCHAR(20) NOT NULL DEFAULT 'v1'
);

CREATE INDEX ON WebhookSubscription(app_id, status);
CREATE INDEX ON WebhookSubscription USING GIN (event_types);

Registration and Verification

import secrets, datetime, hashlib, hmac, requests

def register_subscription(app_id: int, endpoint_url: str,
                           event_types: list, description: str = None) -> dict:
    """
    Register a new webhook subscription. Returns subscription with challenge token.
    The endpoint must respond to a challenge request before receiving live events.
    """
    # Validate event types
    valid_types = {r['event_type'] for r in db.fetchall("SELECT event_type FROM SubscriptionEventType")}
    for et in event_types:
        if et != '*' and et not in valid_types:
            raise ValueError(f"Unknown event type: {et}")

    # Validate URL scheme and block SSRF targets
    _validate_url(endpoint_url)

    signing_secret = f"whsec_{secrets.token_hex(32)}"

    sub = db.fetchone("""
        INSERT INTO WebhookSubscription
            (app_id, endpoint_url, description, signing_secret, event_types)
        VALUES (%s,%s,%s,%s,%s)
        RETURNING *
    """, (app_id, endpoint_url, description, signing_secret, event_types))

    # Issue a challenge verification
    challenge = _issue_challenge(sub['subscription_id'])

    return {**sub, 'challenge_token': challenge['challenge_token'],
            'signing_secret': signing_secret}

def _issue_challenge(subscription_id: int) -> dict:
    challenge_token = secrets.token_urlsafe(32)
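    # Use a timezone-aware UTC timestamp so it maps cleanly onto the TIMESTAMPTZ column
    expires_at = datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(hours=24)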
    return db.fetchone("""
        INSERT INTO SubscriptionVerification
            (subscription_id, challenge_token, expires_at)
        VALUES (%s,%s,%s) RETURNING *
    """, (subscription_id, challenge_token, expires_at))

def verify_subscription(subscription_id: int, challenge_token: str) -> bool:
    """
    Complete challenge verification for a subscription.
    Called once the developer's endpoint has returned the challenge token,
    which shows that the developer controls the endpoint. Returns False for an
    unknown, already-used, or expired token.
    """
    verification = db.fetchone("""
        SELECT * FROM SubscriptionVerification
        WHERE subscription_id=%s AND challenge_token=%s AND verified_at IS NULL
    """, (subscription_id, challenge_token))

    if not verification:
        return False
    # expires_at is TIMESTAMPTZ (timezone-aware), so compare against an aware "now"
    if verification['expires_at'] < datetime.datetime.now(datetime.timezone.utc):
        return False

    db.execute("""
        UPDATE SubscriptionVerification SET verified_at=NOW() WHERE verification_id=%s
    """, (verification['verification_id'],))

    db.execute("""
        UPDATE WebhookSubscription SET status='active', updated_at=NOW()
        WHERE subscription_id=%s
    """, (subscription_id,))

    return True

def _validate_url(url: str):
    """Block SSRF: reject non-HTTPS URLs and hosts that resolve to private IPs."""
    import ipaddress, socket, urllib.parse
    parsed = urllib.parse.urlparse(url)
    if parsed.scheme != 'https':
        raise ValueError("Webhook endpoints must use HTTPS")
    if not parsed.hostname:
        raise ValueError("Webhook endpoint URL has no hostname")
    # Resolve every A/AAAA record; gethostbyname would return a single IPv4 address only
    try:
        infos = socket.getaddrinfo(parsed.hostname, parsed.port or 443,
                                   proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        raise ValueError(f"Cannot resolve hostname: {parsed.hostname}")
    for info in infos:
        addr = ipaddress.ip_address(info[4][0])
        # is_link_local also covers 169.254.169.254, the cloud metadata endpoint
        if addr.is_private or addr.is_loopback or addr.is_link_local:
            raise ValueError("Webhook endpoint cannot target private IP addresses")
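The same check is worth repeating immediately before each delivery, since a hostname that resolved to a public address at registration can later point somewhere else. The following is a minimal sketch of such a delivery-time check, not the article's implementation: the helper names `is_public_address` and `resolve_and_check` are hypothetical, and it assumes the delivery worker then connects to one of the returned addresses rather than resolving the hostname a second time.

```python
import ipaddress
import socket


def is_public_address(ip: str) -> bool:
    """Return True only for globally routable, non-multicast addresses."""
    addr = ipaddress.ip_address(ip)
    return addr.is_global and not addr.is_multicast


def resolve_and_check(hostname: str, port: int = 443) -> list:
    """Resolve all A/AAAA records and reject the host if any is non-public.

    Hypothetical helper: returns the vetted addresses so the caller can pin
    the outgoing connection to one of them.
    """
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        raise ValueError(f"Cannot resolve hostname: {hostname}") from exc

    addresses = sorted({info[4][0] for info in infos})
    for ip in addresses:
        if not is_public_address(ip):
            raise ValueError(f"{hostname} resolves to non-public address {ip}")
    return addresses
```

Using `is_global` is stricter than checking `is_private`, `is_loopback`, and `is_link_local` individually, because it also excludes ranges such as carrier-grade NAT space. Whether that strictness is appropriate depends on where subscribers are expected to host their endpoints.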

Event Fan-Out to Matching Subscriptions

def dispatch_event(event_type: str, event_payload: dict, app_id: int = None):
    """
    Find subscriptions matching the event type and enqueue delivery jobs.
    event_types GIN index makes the array containment query fast.
    """
    query = """
        SELECT subscription_id, endpoint_url, signing_secret, api_version
        FROM WebhookSubscription
        WHERE status='active'
          AND (event_types @> ARRAY[%s] OR event_types @> ARRAY['*'])
    """
    params = [event_type]
    if app_id is not None:
        query += " AND app_id=%s"
        params.append(app_id)

    subscriptions = db.fetchall(query, params)

    for sub in subscriptions:
        payload = _serialize_event(event_type, event_payload, sub['api_version'])
        signature = _sign_payload(payload, sub['signing_secret'])
        enqueue('deliver_webhook', {
            'subscription_id': sub['subscription_id'],
            'endpoint_url': sub['endpoint_url'],
            'payload': payload,
            'signature': signature,
        }, queue_name='webhooks')

def _sign_payload(payload: str, secret: str) -> str:
    # utcnow() is naive, so .timestamp() would be skewed on hosts not running in UTC
    timestamp = str(int(datetime.datetime.now(datetime.timezone.utc).timestamp()))
    signed_content = f"{timestamp}.{payload}"
    sig = hmac.new(
        secret.encode(), signed_content.encode(), hashlib.sha256
    ).hexdigest()
    return f"t={timestamp},v1={sig}"

def _serialize_event(event_type: str, payload: dict, api_version: str) -> str:
    import json, uuid
    return json.dumps({
        'id': str(uuid.uuid4()),
        'type': event_type,
        'api_version': api_version,
        'created': int(datetime.datetime.now(datetime.timezone.utc).timestamp()),
        'data': payload,
    })
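On the receiving side, the subscriber recomputes the HMAC over the same `timestamp.payload` string and compares it with the `v1` value in the header. The sketch below shows one way a subscriber might do this for the `t=...,v1=...` format produced by `_sign_payload`. The function name `verify_signature`, the five-minute tolerance, and the `now` parameter (included only to make the check testable) are assumptions, not part of the original design.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_signature(payload: str, header: str, secret: str,
                     tolerance_seconds: int = 300,
                     now: Optional[float] = None) -> bool:
    """Check a 't=<unix ts>,v1=<hex hmac>' header against the raw payload."""
    # Parse the comma-separated key=value pairs; a key may appear more than once
    parts = {}
    for item in header.split(","):
        key, _, value = item.partition("=")
        parts.setdefault(key.strip(), []).append(value.strip())

    try:
        timestamp = int(parts["t"][0])
    except (KeyError, ValueError):
        return False

    # Reject stale or far-future timestamps to limit replay of captured requests
    current = time.time() if now is None else now
    if abs(current - timestamp) > tolerance_seconds:
        return False

    expected = hmac.new(
        secret.encode(), f"{timestamp}.{payload}".encode(), hashlib.sha256
    ).hexdigest()

    # Constant-time comparison avoids leaking how many characters matched
    return any(
        hmac.compare_digest(expected.encode(), candidate.encode())
        for candidate in parts.get("v1", [])
    )
```

The payload must be the raw request body exactly as received. Re-serializing parsed JSON before verifying can change whitespace or key order and cause valid signatures to fail.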

Key Design Decisions

  • Challenge verification proves endpoint ownership: an endpoint that does not respond correctly to the challenge never receives live events. This stops a developer from registering a URL that belongs to someone else and pointing your platform's traffic at it, and it also catches misconfigured endpoints before they go live.
  • GIN index on the event_types array: the array containment operator (@>) with a GIN index makes "find all subscriptions that match event type X or '*'" fast even with thousands of subscriptions. Without the GIN index, this requires a full table scan on every event.
  • SSRF prevention at registration time: resolving the hostname and checking it against private IP ranges at registration prevents developers from using your platform to probe your internal network. Re-validate on each delivery attempt, because DNS rebinding can change the resolved IP after registration.
  • Per-subscription signing secret: each subscription has its own signing secret rather than sharing one global key. If a developer's secret is compromised, only that subscription is affected, and rotating it does not impact other developers. To rotate a secret, generate a new one and provide a 24-hour dual-validation window during which both the old and the new secret are accepted.
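One way to realize the dual-validation window is for the platform to sign each delivery with both secrets while the window is open, so a receiver holding either secret can verify the request. The sketch below assumes that approach. The function `sign_with_rotation` and the `previous_secret` / `previous_expires_at` values are hypothetical (the schema above stores a single `signing_secret`), and the header simply repeats the `v1=` field once per secret.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_with_rotation(payload: str, current_secret: str,
                       previous_secret: Optional[str] = None,
                       previous_expires_at: Optional[float] = None,
                       now: Optional[float] = None) -> str:
    """Build a 't=...,v1=...' header, adding a second v1 during rotation.

    previous_expires_at is a Unix timestamp marking the end of the
    dual-validation window (for example, rotation time plus 24 hours).
    """
    current_time = time.time() if now is None else now
    timestamp = int(current_time)

    # The new secret always signs; the old one only while the window is open
    secrets_in_use = [current_secret]
    if (previous_secret is not None and previous_expires_at is not None
            and current_time < previous_expires_at):
        secrets_in_use.append(previous_secret)

    signed_content = f"{timestamp}.{payload}".encode()
    signatures = [
        hmac.new(s.encode(), signed_content, hashlib.sha256).hexdigest()
        for s in secrets_in_use
    ]
    return f"t={timestamp}," + ",".join(f"v1={sig}" for sig in signatures)
```

A receiver that accepts the request when any `v1` value matches its configured secret can switch secrets at any point inside the window without dropping events.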

Frequently Asked Questions

How does a developer verify their webhook endpoint ownership?

The challenge verification pattern: when a developer registers a URL, the system sends a GET request to that URL with a challenge query parameter (?challenge=abc123). The developer's server must echo back the challenge value in the response body. This proves the developer controls the server at that URL; a third party cannot register someone else's URL because they cannot intercept the challenge request. Implementation: GET https://developer.example.com/webhook?challenge=abc123 → response body: "abc123". The system checks that the response matches the sent challenge and activates the subscription. Alternative: the system sends a test event to the endpoint, and the developer clicks "I received it" in the dashboard. This is weaker because there is no automated proof. Some systems (Stripe) skip verification entirely and rely on HMAC signature validation at the subscriber end instead.

What event types should a platform expose through its webhook subscription API?

Expose events at the level of granularity developers need without overwhelming the schema. Principles: (1) noun.verb naming: "payment.succeeded", "subscription.cancelled", "user.created", following Stripe and GitHub conventions; (2) expose lifecycle events for every primary entity: created, updated, deleted for core objects; (3) high-value state transitions: "payment.refunded", "order.shipped", "invoice.overdue", not just CRUD; (4) avoid exposing internal technical events (database.row_updated) and publish only business-meaningful events; (5) version events explicitly: "payment.succeeded.v2" when the payload schema changes; (6) provide a wildcard ("*") for developers who want all events; (7) document the payload schema for each event type with a JSON Schema or OpenAPI spec. Start with 10–20 high-value events rather than 200 low-signal ones, since developers rarely subscribe to everything.

How do you handle webhook endpoint SSL certificate errors?

HTTPS is required for webhook endpoints (reject HTTP). SSL certificate validation ensures the developer's server is who it claims to be and prevents man-in-the-middle interception of webhook payloads. Validation: verify the endpoint's TLS certificate chain at delivery time. Self-signed certificates should be rejected in production because they don't prove identity. Configuration: use requests.get(url, verify=True) in Python (the default), which validates against the system's trusted CA bundle. For certificate errors: (1) SSL handshake failure → log the error, treat as a delivery failure, and retry with standard backoff; (2) expired certificate → fail delivery and email the endpoint owner: "Your webhook endpoint's SSL certificate expired on DATE. Please renew it."; (3) hostname mismatch → fail permanently without retrying, because a hostname mismatch suggests DNS hijacking or misconfiguration. Track SSL errors separately from HTTP errors in the delivery failure reason.

How do you test your webhook subscription implementation as a developer?

Developers need to test webhook integration without triggering real production events. Provide: (1) Test event API: POST /webhooks/{subscription_id}/test_event with {"event_type":"payment.succeeded","is_test":true} sends a synthetic event with a test payload to the subscription's endpoint, so developers can verify that their endpoint receives and processes the event. (2) Webhook CLI tool: a tool that opens a tunnel to localhost (like ngrok) and proxies webhooks to a local development server. With the Stripe CLI, for example, developers run `stripe listen --forward-to localhost:3000/webhook`. (3) Replay API: POST /webhooks/deliveries/{delivery_id}/replay re-sends a specific historical delivery, which is useful when debugging a failed handler. (4) Delivery log: show the last 100 deliveries with request payload, response status, response body, and timing, so developers can see exactly what their endpoint received and how it responded.

How do you handle a webhook endpoint that consistently returns 200 but is not actually processing events?

A "black hole" endpoint acknowledges every webhook with a 200 response but discards the payload; it may be a misconfigured load balancer's default response. The system marks all deliveries as "succeeded" while the developer's integration is silently broken. Detection: (1) add a response validation step that requires endpoints to echo back the event ID in a custom response header (X-Processed-Event-ID: {event_id}); if the header is missing, treat the delivery as a soft failure; (2) track "acknowledged without processing" via a separate integration health API, where developers confirm event processing by calling POST /webhooks/events/{event_id}/acknowledge; (3) for critical integrations, implement idempotency confirmation: after delivering a payment.succeeded event, query the developer's system via a separate API to confirm the payment record exists. In practice, most platforms rely on developers to self-report processing issues and provide rich delivery logs for debugging rather than implementing response validation.
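The challenge-echo check described in the first answer (GET with a `challenge` query parameter, response body must equal the token) can be sketched as follows. This is an illustrative sketch under stated assumptions: the helper names `build_challenge_url` and `endpoint_echoes_challenge` are hypothetical, and the HTTP call is passed in as an `http_get` callable so the logic can be exercised without a network.

```python
import hmac
from typing import Callable
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_challenge_url(endpoint_url: str, challenge_token: str) -> str:
    """Append ?challenge=<token> while preserving any existing query string."""
    parts = urlsplit(endpoint_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("challenge", challenge_token))
    return urlunsplit(parts._replace(query=urlencode(query)))


def endpoint_echoes_challenge(endpoint_url: str, challenge_token: str,
                              http_get: Callable[[str], str]) -> bool:
    """Return True if the endpoint's response body equals the challenge token.

    http_get takes a URL and returns the response body as text; any exception
    it raises (timeout, TLS error, connection refused) counts as a failure.
    """
    try:
        body = http_get(build_challenge_url(endpoint_url, challenge_token))
    except Exception:
        return False
    # Compare in constant time and ignore surrounding whitespace or newlines
    return hmac.compare_digest(body.strip().encode(), challenge_token.encode())
```

In a real deployment `http_get` might wrap something like `requests.get(url, timeout=5, allow_redirects=False).text`, called only after the URL has passed the SSRF checks. Disabling redirects keeps the endpoint from bouncing the challenge request to an address that was never validated.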
