Idempotency Key Service Low-Level Design: Deduplication, Atomic Claims, and Concurrent Retries

An idempotency key service ensures that retried API requests do not cause duplicate side effects — a critical requirement for payment APIs, order placement, and any operation that mutates state. Calling POST /charge twice with the same idempotency key must produce exactly one charge. Core challenges: storing request-response pairs efficiently, handling concurrent duplicate requests, expiring old keys to reclaim storage, and returning the original response even when the underlying state has changed.

Core Data Model

CREATE TABLE IdempotencyRecord (
    idempotency_key  TEXT NOT NULL,
    endpoint         TEXT NOT NULL,          -- '/v1/charges'
    request_hash     CHAR(64) NOT NULL,       -- SHA-256 of canonical request body
    user_id          UUID NOT NULL,           -- scope keys per user/tenant
    status           TEXT NOT NULL DEFAULT 'processing',  -- 'processing','done','failed'
    response_code    SMALLINT,               -- HTTP status code
    response_body    JSONB,                  -- original response payload
    created_at       TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    completed_at     TIMESTAMPTZ,
    expires_at       TIMESTAMPTZ NOT NULL DEFAULT NOW() + interval '24 hours',
    PRIMARY KEY (idempotency_key, endpoint, user_id)
);
CREATE INDEX idx_idem_expires ON IdempotencyRecord (expires_at);
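
To see how the composite primary key behaves, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, an assumption made only so the example runs without a server. SQLite's INSERT OR IGNORE plays the role of ON CONFLICT DO NOTHING here, the table is trimmed to the key columns, and the helper names make_db and try_claim are hypothetical.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    # In-memory SQLite stand-in for the PostgreSQL table (illustration only).
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE IdempotencyRecord (
            idempotency_key TEXT NOT NULL,
            endpoint        TEXT NOT NULL,
            user_id         TEXT NOT NULL,
            status          TEXT NOT NULL DEFAULT 'processing',
            PRIMARY KEY (idempotency_key, endpoint, user_id)
        )
    """)
    return conn

def try_claim(conn: sqlite3.Connection, key: str, endpoint: str, user_id: str) -> bool:
    """Return True if this call inserted the row, False if the row already existed."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO IdempotencyRecord (idempotency_key, endpoint, user_id) "
        "VALUES (?, ?, ?)",
        (key, endpoint, user_id),
    )
    conn.commit()
    return cur.rowcount == 1

conn = make_db()
first = try_claim(conn, "pay-123", "/v1/charges", "user-a")  # new row: claim succeeds
retry = try_claim(conn, "pay-123", "/v1/charges", "user-a")  # same triple: insert ignored
other = try_claim(conn, "pay-123", "/v1/charges", "user-b")  # other user: separate row
row_count = conn.execute("SELECT COUNT(*) FROM IdempotencyRecord").fetchone()[0]
```

The retry is rejected by the key constraint itself, while the same key string under a different user_id creates an independent record.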

Idempotency Middleware

import hashlib, json, time
import psycopg2

def idempotency_middleware(conn, idempotency_key: str, endpoint: str, user_id: str,
                            request_body: dict, handler_fn):
    """
    Wraps any mutating API handler with idempotency.
    Returns cached response if key already processed.
    Raises on request body mismatch (different request, same key — client error).
    """
    request_hash = hashlib.sha256(
        json.dumps(request_body, sort_keys=True).encode()
    ).hexdigest()

    # Step 1: Try to insert a 'processing' row (atomic claim).
    # RETURNING yields a row only when this request actually inserted it.
    with conn.cursor() as cur:
        cur.execute("""
            INSERT INTO IdempotencyRecord
            (idempotency_key, endpoint, request_hash, user_id)
            VALUES (%s, %s, %s, %s)
            ON CONFLICT (idempotency_key, endpoint, user_id) DO NOTHING
            RETURNING idempotency_key
        """, (idempotency_key, endpoint, request_hash, user_id))
        claimed = cur.fetchone() is not None
    conn.commit()  # commit the 'processing' insert so concurrent requests can see it

    if not claimed:
        # Conflict: key already exists, so fetch the existing record
        with conn.cursor() as cur:
            cur.execute("""
                SELECT status, request_hash, response_code, response_body
                FROM IdempotencyRecord
                WHERE idempotency_key=%s AND endpoint=%s AND user_id=%s
            """, (idempotency_key, endpoint, user_id))
            result = cur.fetchone()
        if result is None:
            # Row vanished between INSERT and SELECT (for example, expiry cleanup)
            raise RuntimeError("Idempotency record disappeared; retry the request")
        status, stored_hash, resp_code, resp_body = result

        # Mismatch: same key, different request body is not allowed
        if stored_hash != request_hash:
            raise ValueError(
                "Idempotency key already used with a different request body. "
                "Use a new idempotency key for a different request."
            )

        if status in ('done', 'failed'):
            # Replay the stored response; stored failures are replayed, not re-executed
            return {"from_cache": True, "status_code": resp_code, "body": resp_body}

        # status == 'processing': in-flight duplicate, so wait briefly and check again
        for _ in range(10):
            time.sleep(0.5)
            with conn.cursor() as cur:
                cur.execute(
                    "SELECT status, response_code, response_body FROM IdempotencyRecord "
                    "WHERE idempotency_key=%s AND endpoint=%s AND user_id=%s",
                    (idempotency_key, endpoint, user_id)
                )
                r = cur.fetchone()
            if r and r[0] in ('done', 'failed'):
                return {"from_cache": True, "status_code": r[1], "body": r[2]}
        raise TimeoutError("Concurrent request still processing; retry later")

    # Step 2: Execute the actual handler
    try:
        response_code, response_body = handler_fn(conn, request_body)
        _complete_idempotency_record(conn, idempotency_key, endpoint, user_id,
                                      'done', response_code, response_body)
        return {"from_cache": False, "status_code": response_code, "body": response_body}
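    except Exception as e:
        # Discard the handler's partial work; this also clears an aborted transaction
        # so the status update below can run.
        conn.rollback()
        _complete_idempotency_record(conn, idempotency_key, endpoint, user_id,
                                      'failed', 500, {"error": str(e)})
        raise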

def _complete_idempotency_record(conn, key, endpoint, user_id, status, code, body):
    with conn.cursor() as cur:
        cur.execute("""
            UPDATE IdempotencyRecord
            SET status=%s, response_code=%s, response_body=%s, completed_at=NOW()
            WHERE idempotency_key=%s AND endpoint=%s AND user_id=%s
        """, (status, code, json.dumps(body), key, endpoint, user_id))
    conn.commit()
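
The request-hash check depends on the hash being stable across semantically identical bodies. The sketch below illustrates that property in isolation. The function name canonical_request_hash is hypothetical, and the compact separators are an added assumption that is not part of the middleware, so the two would produce different digests for the same body unless they share one canonicalization.

```python
import hashlib
import json

def canonical_request_hash(body: dict) -> str:
    # sort_keys fixes the key order; compact separators remove whitespace variation.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Same fields in a different order hash identically.
a = canonical_request_hash({"amount": 5000, "currency": "usd"})
b = canonical_request_hash({"currency": "usd", "amount": 5000})

# A changed amount yields a different hash, which the middleware treats as key misuse.
c = canonical_request_hash({"amount": 5001, "currency": "usd"})
```

Note that JSON canonicalization of this kind does not normalize value representations: 5000 and 5000.0 serialize differently and therefore hash differently.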

Key Generation Guidance for API Clients

"""
Idempotency key design principles (for documentation):

Good keys (unique per intent):
  - UUID v4 generated once per logical operation and reused on every retry: "550e8400-e29b-41d4-a716-446655440000"
  - Content-based: SHA-256 of (user_id + order_id + amount): stable across retries

Bad keys (cause problems):
  - Timestamp-based: different on each retry → different keys → duplicate charges
  - Sequential integers: predictable → attackers can guess and replay
  - Session ID alone: multiple different charges per session → collisions

Stripe's convention: clients generate a UUID per logical operation,
store it alongside the pending request, and reuse it on all retries.
The key is tied to the operation, not the attempt.
"""

import uuid

def generate_idempotency_key(user_id: str, operation: str, resource_id: str) -> str:
    """
    Generate a deterministic idempotency key for a specific operation.
    Same inputs → same key → safe for retries.
    """
    seed = f"{user_id}:{operation}:{resource_id}"
    # UUID v5 is deterministic (namespace + seed → fixed UUID)
    return str(uuid.uuid5(uuid.NAMESPACE_URL, seed))
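
A short usage sketch of the deterministic key function, repeated here so the snippet runs on its own. The user, operation, and order identifiers are made-up sample values.

```python
import uuid

def generate_idempotency_key(user_id: str, operation: str, resource_id: str) -> str:
    # Same construction as above: UUID v5 over a stable seed string.
    seed = f"{user_id}:{operation}:{resource_id}"
    return str(uuid.uuid5(uuid.NAMESPACE_URL, seed))

k1 = generate_idempotency_key("user-42", "charge", "order-abc123")  # first attempt
k2 = generate_idempotency_key("user-42", "charge", "order-abc123")  # retry: same key
k3 = generate_idempotency_key("user-42", "refund", "order-abc123")  # different operation
```

Because the key is derived only from stable inputs, a client that crashes and restarts can regenerate the same key without having stored it. The trade-off is that two intentionally separate charges for the same order would collide unless something in the seed distinguishes them.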

Expiry and Cleanup

def cleanup_expired_records(conn) -> int:
    """
    Nightly job: delete IdempotencyRecord rows past their expires_at.
    Bounded batch to avoid long lock times.
    """
    deleted = 0
    while True:
        with conn.cursor() as cur:
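            # Match on the full primary key: matching on idempotency_key alone would
            # also delete unexpired rows that share the key under another user or endpoint.
            cur.execute("""
                DELETE FROM IdempotencyRecord
                WHERE (idempotency_key, endpoint, user_id) IN (
                    SELECT idempotency_key, endpoint, user_id
                    FROM IdempotencyRecord
                    WHERE expires_at < NOW()
                    LIMIT 1000
                )
            """)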
            batch = cur.rowcount
        conn.commit()
        deleted += batch
        if batch < 1000:
            break
    return deleted

Key Interview Points

  • Insert-first atomic claim: The INSERT … ON CONFLICT DO NOTHING pattern atomically claims the idempotency key. Only one concurrent request succeeds in inserting — others get the conflict signal and read the existing record. This prevents duplicate execution even under concurrent retries. Without this atomic claim, two threads can both read “no record found” and both execute the handler.
  • Request body hash check: The same idempotency key with a different request body is a client bug — reject with 422. This prevents accidental key reuse across different operations. The hash is SHA-256 of the canonicalized request body (sorted keys for JSON determinism). Never allow different operations to share a key — a key is tied to one logical operation.
  • Failed request idempotency: If the handler throws an exception, store status='failed' in the idempotency record. The design question is whether a retry should receive the stored failure (replay) or re-execute the handler. Stripe replays the failure so that behavior is deterministic. A custom policy can re-execute after transient failures (network timeout) and replay permanent ones (invalid card); a failure_retryable boolean column, which the schema above does not include, would record the distinction.
  • Scope by user: Include user_id in the primary key. Without scoping, user A could use user B’s idempotency key to see or interfere with their responses. Primary key: (idempotency_key, endpoint, user_id) ensures keys are isolated per user. API gateway should inject user_id from the auth token before the middleware runs.
  • Idempotency key expiry: Stripe uses a 24-hour window — after 24 hours, a key can be reused. For payments, 24 hours is sufficient — retries beyond a day indicate a bug, not a network retry. For longer operations (multi-day provisioning), extend expires_at. The expiry also bounds storage growth: at 1M requests/day × 1KB/record × 1 day = 1GB — manageable with daily cleanup.
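
The replay-versus-retry decision for failed requests can be sketched as a small exception classifier. This is an illustration under stated assumptions, not a prescribed policy: CardError is a hypothetical exception class, the choice of which built-in exceptions count as transient is an assumption, and treating unknown exceptions as permanent is a conservative default chosen for this sketch.

```python
class CardError(Exception):
    """Hypothetical permanent failure raised by a payment handler (e.g. card declined)."""

# Assumed set of transient failures; a real service would tune this list.
TRANSIENT_ERRORS = (ConnectionError, TimeoutError)

def failure_action(exc: Exception) -> str:
    """
    Decide what to do with the idempotency record after a handler failure.
    'clear' -> delete the record so a retry re-executes the handler.
    'store' -> keep status='failed' and replay the failure on retry.
    """
    if isinstance(exc, TRANSIENT_ERRORS):
        return "clear"
    return "store"  # unknown errors are replayed rather than risk a duplicate charge

declined = failure_action(CardError("card_declined"))
network = failure_action(ConnectionError("processor unreachable"))
timed_out = failure_action(TimeoutError("processor timed out"))
```

One caveat: a timeout that occurs after the request reached the payment processor is ambiguous, since the charge may have gone through. Clearing the record is only safe when the downstream call is itself idempotent or can be reconciled.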

Frequently Asked Questions

Why is the INSERT … ON CONFLICT DO NOTHING pattern better than SELECT-then-INSERT for idempotency?

SELECT-then-INSERT (check whether the key exists, then insert if it does not) has a race condition: two concurrent requests both run the SELECT and find no record, both proceed to INSERT, and both execute the operation, causing a duplicate charge or duplicate order. INSERT … ON CONFLICT DO NOTHING is atomic at the database level: the primary key on (idempotency_key, endpoint, user_id) ensures that only one of two concurrent inserts succeeds. The losing insert affects 0 rows, and the middleware then reads the existing record instead of executing the handler. The database engine guarantees this atomicity, so no application-level locking or distributed coordination is needed. This is why idempotency is typically backed by a durable store with an atomic claim rather than by per-process in-memory checks.

How does an idempotency system handle a request that is still in-flight?

A client may retry while the first attempt is still processing (slow handler, network timeout). The retry finds the idempotency record with status='processing'. There are two options: (1) return HTTP 409 Conflict with a Retry-After header, so the client retries after the suggested delay; (2) wait briefly for the in-flight request to complete by polling the database, at a fixed interval or with exponential backoff, for up to N seconds. Option 2 suits APIs with fast handlers (under 5 seconds): the retry returns the actual response as if it had processed the request itself, not a "try again" error. Stripe's API reference lists 409 Conflict for a request that conflicts with another request, for example one using the same idempotency key. The middleware above takes option 2 by re-reading the row in a polling loop (10 attempts × 500 ms = 5 seconds maximum wait); no row lock is required.

How should API clients generate idempotency keys that survive retries?

The key must be the same on every retry of the same logical operation. Bad: a fresh UUID generated on each attempt, because each retry carries a different key and can cause duplicate charges. Bad: a timestamp, which also differs across retries. Good: generate a UUID per user intent before the first attempt and store it. For "I want to charge order abc123 for $50", set key = uuid4(), store it with the order, and retry the charge using the same stored UUID. UUID v5 (deterministic from namespace + seed) is useful when the key can be derived from stable inputs: uuid5(NAMESPACE_URL, f"charge:{order_id}:v1"). The ":v1" suffix is an intent version that can be incremented if the user explicitly wants a new charge (for example, retrying after fixing a card) for the same order_id.

When should you store a 'failed' idempotency response versus allow retry?

Permanent failures (card declined with code=card_declined, insufficient funds, invalid card number) should be stored as status='failed' and replayed on retry: the client gets the original failure response without re-execution. Re-executing would only attempt the charge again and fail the same way. Transient failures (network timeout to the payment processor, DB connection pool exhausted) should NOT be stored: delete the idempotency record on a transient failure so that the retry re-executes the handler. Without this distinction, a network blip during the Stripe API call would permanently mark the payment as "failed" even though the charge may never have been attempted. Detection: classify exceptions, with CardError as permanent (store the failure) and ConnectionError as transient (clear the record for retry).

How do you scope idempotency keys to prevent cross-user replay attacks?

Without user scoping, user A could submit a request with user B's idempotency key and receive user B's response, potentially exposing sensitive payment or order data. Scope the primary key to include user_id: PRIMARY KEY (idempotency_key, endpoint, user_id). The API gateway injects user_id from the verified auth token before the idempotency middleware runs. User A's key "my-payment-123" and user B's key "my-payment-123" are then completely separate records with separate responses. Additionally, validate that the request body hash matches the stored hash for the same user; this prevents a user from reusing their own key with a different request body (for example, using the same key to charge a different amount).
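
The polling option for in-flight requests can be sketched with exponential backoff as follows. This is an illustrative helper, not the middleware's code: wait_for_completion, its parameters, and the injected fetch and sleep callables are all hypothetical, introduced so the loop can be exercised without a database.

```python
import time
from typing import Callable, Optional, Tuple

# (status, response_code, response_body), mirroring the columns read by the middleware
Record = Tuple[str, Optional[int], Optional[dict]]

def wait_for_completion(fetch: Callable[[], Optional[Record]],
                        max_wait: float = 5.0,
                        first_delay: float = 0.1,
                        sleep: Callable[[float], None] = time.sleep) -> Optional[Record]:
    """Poll fetch() with doubling delays until the status leaves 'processing'
    or the total sleep time reaches max_wait. Returns None on timeout."""
    waited, delay = 0.0, first_delay
    while waited < max_wait:
        pause = min(delay, max_wait - waited)  # never sleep past the budget
        sleep(pause)
        waited += pause
        record = fetch()
        if record is not None and record[0] != "processing":
            return record
        delay *= 2
    return None  # caller can respond with 409 and a Retry-After header

# Usage with a fake lookup that completes on the third poll.
_states = iter([("processing", None, None),
                ("processing", None, None),
                ("done", 200, {"id": "ch_1"})])
delays = []
result = wait_for_completion(lambda: next(_states), sleep=delays.append)
```

Injecting sleep keeps the example fast to test; in a service, fetch would wrap the SELECT against IdempotencyRecord and the default time.sleep would apply.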

