Idempotency Service Low-Level Design: Idempotency Keys, Request Deduplication, and Response Caching

What Is an Idempotency Service?

An idempotency service ensures that duplicate requests — caused by client retries, network timeouts, or message redelivery — produce the same result as the first request without executing the operation twice. It is essential for payment processing, order creation, and any mutation that must not be duplicated.

Idempotency Key Generation

The client generates a unique idempotency key per logical operation and includes it in every retry of the same request:

POST /payments
Idempotency-Key: 550e8400-e29b-41d4-a716-446655440000
Content-Type: application/json

{"amount": 9999, "currency": "USD", "card_token": "tok_abc"}

The key should be a UUID v4 generated client-side before the first attempt. The client uses the same key for all retries of the same logical payment. Different payments get different keys.
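The client side of this rule can be sketched as follows. This is a minimal illustration, not a prescribed client: `send_with_retries`, `TransientError`, and the fake transport `flaky_send` are hypothetical names introduced here. The point is only that the key is generated once, before the first attempt, and reused on every retry.

```python
import uuid


class TransientError(Exception):
    """Stands in for a timeout or connection reset (hypothetical)."""


def send_with_retries(send, payload, max_attempts=3):
    """Generate one idempotency key and reuse it on every retry."""
    idempotency_key = str(uuid.uuid4())  # created once, before the first attempt
    headers = {"Idempotency-Key": idempotency_key}
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(headers, payload)
        except TransientError as exc:
            last_error = exc  # retry with the SAME headers, hence the same key
    raise last_error


# Usage: a fake transport that fails once, then succeeds.
seen_keys = []


def flaky_send(headers, payload):
    seen_keys.append(headers["Idempotency-Key"])
    if len(seen_keys) == 1:
        raise TransientError("simulated timeout")
    return {"status_code": 200}


response = send_with_retries(flaky_send, {"amount": 9999, "currency": "USD"})
```

A second logical payment would call `send_with_retries` again and therefore receive a fresh key.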

Server-Side Deduplication with Redis

On receiving a request, the server checks whether it has seen this idempotency key before:

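# Sketch using redis-py. process_payment, serialize, and deserialize are placeholders.
import redis

r = redis.Redis()

def handle_payment(idempotency_key, request):
    key = "idempotency:" + idempotency_key

    existing = r.get(key)
    if existing is not None:
        return deserialize(existing)   # return cached response

    # Process the request
    result = process_payment(request)

    # Store the response with a 24h TTL
    r.set(key, serialize(result), ex=86400)
    return result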

The response stored in Redis includes both the HTTP status code and the response body, so retries receive byte-identical responses.

Handling Concurrent Duplicate Requests

A subtle problem: two retries arrive simultaneously, both miss the Redis cache, and both attempt to process the payment. The solution is a distributed lock using Redis SET NX:

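import time
import uuid
import redis

r = redis.Redis()

def handle_payment(idempotency_key, request):
    key = "idempotency:" + idempotency_key
    lock_key = "idempotency:lock:" + idempotency_key
    token = str(uuid.uuid4())   # identifies this request as the lock holder

    # Try to acquire the lock (atomic)
    acquired = r.set(lock_key, token, nx=True, ex=30)

    if not acquired:
        # Another request is processing: wait and poll for its result
        for attempt in range(10):
            time.sleep(0.5)
            existing = r.get(key)
            if existing is not None:
                return deserialize(existing)
        return {"status_code": 503, "body": "Processing in progress, retry"}

    try:
        # Re-check: the result may have been stored before we got the lock
        existing = r.get(key)
        if existing is not None:
            return deserialize(existing)

        result = process_payment(request)
        r.set(key, serialize(result), ex=86400)
        return result
    finally:
        # Release only our own lock; after 30s it may belong to another request.
        # GET then DEL is not atomic; a Lua script would close that gap.
        if r.get(lock_key) == token.encode():
            r.delete(lock_key)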

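The lock ensures only one request processes the operation at a time, as long as processing completes within the lock's 30-second expiry. Concurrent duplicates wait and return the cached result once it is available; if it does not appear within the 5-second polling window, they return a 503 so the client can retry.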
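To see the locking pattern behave under contention without a Redis server, the sketch below swaps in an in-memory stand-in. `FakeStore` and its `set_nx` method are hypothetical and only mimic the semantics of SET with NX; the fake lock has no expiry, which a real deployment would need.

```python
import threading
import time


class FakeStore:
    """In-memory stand-in for Redis; set_nx mimics SET with the NX flag."""

    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()

    def set_nx(self, key, value):
        with self._mutex:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def set(self, key, value):
        with self._mutex:
            self._data[key] = value

    def get(self, key):
        with self._mutex:
            return self._data.get(key)

    def delete(self, key):
        with self._mutex:
            self._data.pop(key, None)


store = FakeStore()
executions = []  # records each time the side effect actually runs


def handle(idempotency_key):
    key = "idempotency:" + idempotency_key
    lock_key = "idempotency:lock:" + idempotency_key
    if not store.set_nx(lock_key, "1"):
        for _ in range(100):  # lock is held elsewhere: poll for the result
            time.sleep(0.01)
            cached = store.get(key)
            if cached is not None:
                return cached
        return {"status_code": 503}
    try:
        cached = store.get(key)  # re-check after acquiring the lock
        if cached is not None:
            return cached
        executions.append(idempotency_key)  # the side effect we must not repeat
        time.sleep(0.05)  # simulate slow processing
        result = {"status_code": 200, "body": {"payment_id": "pay_xyz"}}
        store.set(key, result)
        return result
    finally:
        store.delete(lock_key)


results = []
threads = [
    threading.Thread(target=lambda: results.append(handle("key-1")))
    for _ in range(5)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Five threads submit the same key; the side effect is recorded once and every caller receives the same response.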

Response Caching: What to Store

Store the complete response envelope:

{
  "status_code": 200,
  "headers": {"Content-Type": "application/json"},
  "body": {"payment_id": "pay_xyz", "status": "completed", "amount": 9999}
}

Include the status code — if the original request returned a 422 validation error, retries should receive the same 422, not a fresh attempt. Do not re-process failed requests with idempotency keys; cache the error response too.
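One possible way to model the envelope is a small dataclass that round-trips through JSON. The names `CachedResponse` and `respond_once` are hypothetical, and a plain dict stands in for Redis. The sketch shows a 422 being cached and replayed without the handler running a second time.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class CachedResponse:
    status_code: int
    headers: dict
    body: dict

    def dumps(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def loads(cls, raw: str) -> "CachedResponse":
        return cls(**json.loads(raw))


cache = {}  # stands in for Redis


def respond_once(idempotency_key, handler):
    raw = cache.get(idempotency_key)
    if raw is not None:
        return CachedResponse.loads(raw)
    response = handler()
    cache[idempotency_key] = response.dumps()  # stored whether 2xx or 4xx
    return response


calls = []


def rejecting_handler():
    calls.append(1)
    return CachedResponse(
        422, {"Content-Type": "application/json"}, {"error": "invalid card_token"}
    )


first = respond_once("key-422", rejecting_handler)
second = respond_once("key-422", rejecting_handler)
```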

TTL Selection

  • Payments: 24 hours — retry windows are typically within minutes to hours
  • Order creation: 24-48 hours
  • Search / read operations: No idempotency key needed (GET is naturally idempotent)
  • Webhook deliveries: 72 hours to cover delivery retry windows

The TTL defines how long after the first request a client can retry and receive the cached response. After TTL expiry, the same key used again will trigger a fresh execution — document this clearly in your API.
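The TTL choices above could be kept in one table so that the expiry rule is explicit. This is only a sketch: the operation names are hypothetical, and 48 hours is assumed for orders as the upper end of the 24-48 hour range.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical per-operation TTLs mirroring the list above.
TTL_BY_OPERATION = {
    "payment": timedelta(hours=24),
    "order": timedelta(hours=48),  # assumed: upper end of the 24-48h range
    "webhook": timedelta(hours=72),
}


def expires_at(operation, first_seen):
    return first_seen + TTL_BY_OPERATION[operation]


def is_replayable(operation, first_seen, now):
    """True while a retry would still receive the cached response."""
    return now < expires_at(operation, first_seen)


first_seen = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
within = is_replayable("payment", first_seen, first_seen + timedelta(hours=23))
after = is_replayable("payment", first_seen, first_seen + timedelta(hours=25))
```

Here `within` is true and `after` is false: a retry at hour 25 would no longer find the cached response.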

Database-Level Idempotency

For operations that must survive Redis failure, persist the idempotency key to the database with a unique constraint:

CREATE TABLE idempotency_records (
    idempotency_key VARCHAR(64) PRIMARY KEY,
    response_status INT          NOT NULL,
    response_body   JSONB        NOT NULL,
    created_at      TIMESTAMPTZ  NOT NULL DEFAULT now(),
    expires_at      TIMESTAMPTZ  NOT NULL
);

On duplicate key violation (unique constraint), return the stored response. This is slower than Redis but durable — useful for financial systems where losing idempotency records is unacceptable.
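The duplicate-key path can be sketched with Python's built-in sqlite3 module. SQLite is assumed here only so the example runs without a server; the table above targets PostgreSQL, and the column types are simplified accordingly. `record_or_replay` is a hypothetical helper name.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE idempotency_records (
        idempotency_key TEXT PRIMARY KEY,
        response_status INTEGER NOT NULL,
        response_body   TEXT    NOT NULL
    )
    """
)


def record_or_replay(key, status, body):
    """Insert the response; on a duplicate key, return the stored one instead."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO idempotency_records VALUES (?, ?, ?)",
                (key, status, json.dumps(body)),
            )
        return status, body, False
    except sqlite3.IntegrityError:
        row = conn.execute(
            "SELECT response_status, response_body FROM idempotency_records"
            " WHERE idempotency_key = ?",
            (key,),
        ).fetchone()
        return row[0], json.loads(row[1]), True


first = record_or_replay("k1", 200, {"payment_id": "pay_xyz"})
second = record_or_replay("k1", 200, {"payment_id": "pay_other"})
```

The third tuple element reports whether the response was replayed. In a real system the insert would likely share a transaction with the business write, so the record and the side effect commit or roll back together.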

Request Fingerprinting

Some APIs use request fingerprinting as a fallback when clients do not provide an idempotency key: hash the combination of (user_id, endpoint, request_body). This is fragile, because two distinct operations with identical payloads from the same user (for example, two intentional purchases of the same item) would collide and the second would be treated as a duplicate. Fingerprinting should therefore supplement, not replace, explicit idempotency keys.
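A fingerprint of this kind might be computed as below. The canonical JSON encoding and the choice of SHA-256 are assumptions made for the sketch; the text above only specifies which fields are combined.

```python
import hashlib
import json


def fingerprint(user_id, endpoint, body):
    # Canonicalize the body so key order and whitespace do not change the hash.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    material = "\n".join([user_id, endpoint, canonical])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


a = fingerprint("user_1", "POST /payments", {"amount": 9999, "currency": "USD"})
b = fingerprint("user_1", "POST /payments", {"currency": "USD", "amount": 9999})
c = fingerprint("user_2", "POST /payments", {"amount": 9999, "currency": "USD"})
```

Here `a` and `b` match because only the key order differs, while `c` differs because the user differs. The same property is what makes two deliberate, identical purchases by one user indistinguishable.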

Idempotency vs. Delivery Semantics

  • At-most-once: Fire and forget. May lose requests. No retries.
  • At-least-once: Retry on failure. May execute multiple times. Requires idempotent consumers.
  • Exactly-once: Not truly achievable in distributed systems without coordination overhead. Approximated by at-least-once delivery with idempotent processing.

An idempotency service converts at-least-once delivery into effectively exactly-once outcomes at the application layer.
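The conversion can be illustrated with a toy consumer. The message shape and the names `consume`, `processed`, and `balance` are hypothetical; the dict plays the role of the idempotency store.

```python
processed = {}  # message_id -> result; stands in for the idempotency store
balance = {"value": 0}


def consume(message):
    """Idempotent consumer: apply each message id at most once."""
    if message["id"] in processed:
        return processed[message["id"]]  # duplicate: replay the stored result
    balance["value"] += message["amount"]
    processed[message["id"]] = balance["value"]
    return processed[message["id"]]


# At-least-once delivery: message m1 is redelivered.
deliveries = [
    {"id": "m1", "amount": 50},
    {"id": "m1", "amount": 50},
    {"id": "m2", "amount": 25},
]
results = [consume(m) for m in deliveries]
```

Although `m1` is delivered twice, the balance ends at 75 rather than 125, and the duplicate delivery receives the same result as the first.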

Key Expiry and Client Retry Windows

Clients should not reuse an idempotency key after its TTL has expired. Once the record is gone, the server treats the request as new and executes the operation again, so a late retry can cause exactly the duplicate the key was meant to prevent. Document this: “Idempotency keys expire after 24 hours. Do not retry a request with the same key after this window.” After expiry, generate a new key for any new attempt.

