Idempotency Keys — Low-Level Design
Idempotency ensures that retrying an operation produces the same result as executing it once. This is critical for payment APIs, order creation, and any mutation that must not be applied twice. Payment processors such as Stripe accept a client-supplied idempotency key on charge requests for exactly this reason.
Core Data Model
IdempotencyRecord
  idempotency_key  TEXT NOT NULL          -- client-provided, UUID recommended
  user_id          BIGINT NOT NULL        -- keys are scoped per user (a cross-user match is a bug)
  request_path     TEXT NOT NULL          -- '/v1/charges'
  request_hash     TEXT NOT NULL          -- SHA-256 of request body
  response_status  INT                    -- NULL until the operation completes
  response_body    TEXT
  result_id        BIGINT                 -- ID of the created resource (e.g., charge_id)
  created_at       TIMESTAMPTZ NOT NULL
  expires_at       TIMESTAMPTZ NOT NULL   -- TTL: 24 hours is typical
  locked_until     TIMESTAMPTZ            -- set while request is in-flight
  PRIMARY KEY (user_id, idempotency_key)  -- same key string from two users never collides
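To make the meaning of the nullable columns concrete, here is a minimal Python sketch that mirrors the record in memory and derives its lifecycle state from response_status, locked_until, and expires_at. The state names ("claimable", "in_flight", "completed", "expired") and the helper request_fingerprint are illustrative assumptions, not part of the schema above.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class IdempotencyRecord:
    idempotency_key: str
    user_id: int
    request_path: str
    request_hash: str
    created_at: datetime
    expires_at: datetime
    response_status: Optional[int] = None
    response_body: Optional[str] = None
    result_id: Optional[int] = None
    locked_until: Optional[datetime] = None

    def state(self, now: datetime) -> str:
        """Classify the record (state names are hypothetical labels)."""
        if now >= self.expires_at:
            return "expired"      # eligible for cleanup
        if self.response_status is not None:
            return "completed"    # replay the stored response
        if self.locked_until is not None and self.locked_until > now:
            return "in_flight"    # another worker holds the lock
        return "claimable"        # new, failed, or abandoned attempt


def request_fingerprint(body: bytes) -> str:
    """SHA-256 hex digest of the raw request body."""
    return hashlib.sha256(body).hexdigest()
```

A record with no stored response and an expired lock falls back to "claimable", which is what lets a retry proceed after a worker crash.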
Request Handling Flow
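import hashlib
import json

def handle_request(idempotency_key, user_id, path, body):
    # body is the raw request bytes. db, execute_operation, Response and the
    # error classes are application helpers; db.execute is assumed to return
    # the first result row, or None when the statement returns no rows.
    body_hash = hashlib.sha256(body).hexdigest()
    params = {'key': idempotency_key, 'uid': user_id, 'path': path, 'hash': body_hash}

    # Step 1: Try to claim the key (atomic upsert). RETURNING yields a row only
    # if this worker inserted the record or took over an expired lock, so the
    # claiming worker can tell itself apart from a concurrent duplicate.
    # The conflict target assumes a unique key on (user_id, idempotency_key).
    claimed = db.execute("""
        INSERT INTO IdempotencyRecord
            (idempotency_key, user_id, request_path, request_hash,
             locked_until, created_at, expires_at)
        VALUES
            (%(key)s, %(uid)s, %(path)s, %(hash)s,
             NOW() + INTERVAL '30 seconds', NOW(), NOW() + INTERVAL '24 hours')
        ON CONFLICT (user_id, idempotency_key) DO UPDATE
            SET locked_until = NOW() + INTERVAL '30 seconds'
            WHERE IdempotencyRecord.response_status IS NULL
              AND IdempotencyRecord.request_hash = EXCLUDED.request_hash
              AND (IdempotencyRecord.locked_until IS NULL
                   OR IdempotencyRecord.locked_until <= NOW())
        RETURNING *
    """, params)

    if claimed is None:
        existing = db.execute("""
            SELECT * FROM IdempotencyRecord
            WHERE user_id = %(uid)s AND idempotency_key = %(key)s
        """, params)
        # Same key with a different body is a client error, completed or not
        if existing.request_hash != body_hash:
            raise ConflictError('Idempotency key reused with different request body')
        # Step 2: Replay completed response
        if existing.response_status is not None:
            return Response(existing.response_status, existing.response_body)
        # Step 3: Request is in-flight (another worker is processing it)
        raise RetryAfterError('Request in progress, retry in a moment')

    # Step 4: Execute the operation (this worker holds the lock)
    try:
        result = execute_operation(body)  # assumed to return a JSON-serializable dict with an 'id'
        response_body = json.dumps(result)
        db.execute("""
            UPDATE IdempotencyRecord
            SET response_status=%(status)s, response_body=%(body)s,
                result_id=%(rid)s, locked_until=NULL
            WHERE user_id=%(uid)s AND idempotency_key=%(key)s
        """, {'status': 200, 'body': response_body, 'rid': result['id'],
              'uid': user_id, 'key': idempotency_key})
        # Return the same serialized body that a later replay will return
        return Response(200, response_body)
    except Exception:
        # Do NOT store failed responses; release the lock to allow a retry
        db.execute("""
            UPDATE IdempotencyRecord SET locked_until=NULL
            WHERE user_id=%(uid)s AND idempotency_key=%(key)s
        """, {'uid': user_id, 'key': idempotency_key})
        raise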
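The flow above depends on a database, so it is hard to try in isolation. The following is a minimal in-memory sketch of the same four steps, assuming a single process where a threading.Lock stands in for the atomic upsert. The class name IdempotentHandler, the exception names, and the (status, body) tuple response are all hypothetical and chosen only for illustration.

```python
import hashlib
import json
import threading
import time


class KeyReuseError(Exception):
    """Same key was sent with a different request body."""


class InProgressError(Exception):
    """Another worker currently holds the lock for this key."""


class IdempotentHandler:
    LOCK_SECONDS = 30

    def __init__(self, operation, clock=time.monotonic):
        self._operation = operation      # callable(dict) -> JSON-serializable dict
        self._clock = clock
        self._records = {}               # (user_id, key) -> record dict
        self._mutex = threading.Lock()   # stands in for the atomic upsert

    def handle(self, user_id, key, body: bytes):
        body_hash = hashlib.sha256(body).hexdigest()
        with self._mutex:
            # Step 1: create the record if absent, then try to claim it
            rec = self._records.setdefault(
                (user_id, key),
                {"hash": body_hash, "response": None, "locked_until": None},
            )
            if rec["hash"] != body_hash:
                raise KeyReuseError("idempotency key reused with different body")
            # Step 2: replay a completed response
            if rec["response"] is not None:
                return rec["response"]
            # Step 3: reject while another worker holds an unexpired lock
            if rec["locked_until"] is not None and rec["locked_until"] > self._clock():
                raise InProgressError("request in progress")
            rec["locked_until"] = self._clock() + self.LOCK_SECONDS
        # Step 4: execute outside the mutex, then store or release
        try:
            result = self._operation(json.loads(body))
        except Exception:
            with self._mutex:
                rec["locked_until"] = None   # failure is not stored; retry allowed
            raise
        response = (200, json.dumps(result))
        with self._mutex:
            rec["response"] = response
            rec["locked_until"] = None
        return response
```

Calling handle twice with the same user, key, and body runs the operation once and returns the stored tuple the second time. The same key under a different user_id is treated as an unrelated request.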
Idempotency for Payment Charges
-- Client generates key before sending (UUID v4)
POST /v1/charges
Idempotency-Key: 550e8400-e29b-41d4-a716-446655440000

{
  "amount": 2000,
  "currency": "usd",
  "payment_method": "pm_xxx"
}
-- Server behavior:
-- First call: creates charge, stores response, returns 200
-- Retry (same key, same body): returns stored response, never charges again
-- Retry (same key, different body): returns 409 Conflict
-- Key scoping: always scope to (user_id, idempotency_key)
-- Two different users can use the same UUID key without collision
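On the client side, the important detail is that the key is generated once, before the first attempt, and reused unchanged on every retry. The sketch below illustrates that under stated assumptions: send is a hypothetical transport callable (headers, payload) that returns a response or raises ConnectionError, and the retry count and backoff values are arbitrary.

```python
import time
import uuid


def post_with_retries(send, payload, max_attempts=3, backoff_seconds=0.5):
    """Retry a request while reusing one idempotency key.

    `send` is a hypothetical callable(headers, payload) supplied by the caller;
    it is not a real library API.
    """
    # Generated once, before the first attempt, and reused on every retry
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    last_error = None
    for attempt in range(max_attempts):
        try:
            return send(headers, payload)
        except ConnectionError as exc:
            last_error = exc
            # Exponential backoff between attempts (skipped after the last one)
            if attempt < max_attempts - 1:
                time.sleep(backoff_seconds * (2 ** attempt))
    raise last_error
```

Generating the key inside the retry loop would give each attempt a fresh key, and the server would treat every retry as a new charge.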
Idempotency vs At-Least-Once Delivery
-- Pattern: consumer makes operation idempotent, not the queue
-- Queue delivers messages at-least-once
-- Consumer uses the message ID as the idempotency key
def process_message(message):
    idempotency_key = f'msg-{message.id}'
    # Claim the key and run the business logic in one transaction, so the
    # marker row and the side effects commit (or roll back) together.
    with db.transaction():
        claimed = db.execute("""
            INSERT INTO ProcessedMessage (key, processed_at)
            VALUES (%(key)s, NOW())
            ON CONFLICT (key) DO NOTHING
            RETURNING key
        """, {'key': idempotency_key})
        if claimed is None:
            return  # Already handled
        execute_business_logic(message.body)
-- INSERT ... ON CONFLICT DO NOTHING makes the check-and-create atomic.
-- A separate exists() check followed by create() would let two consumers
-- both pass the check and process the same message.
-- RETURNING yields no row if the key was already processed
-- RETURNING yields one row if this is the first time
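The atomic check-and-create can be exercised locally with SQLite, which accepts the same ON CONFLICT ... DO NOTHING clause (SQLite 3.24 or newer). This is a sketch under assumptions: the Ledger table stands in for whatever the business logic writes, and the fail flag exists only to simulate a crash mid-processing.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ProcessedMessage (key TEXT PRIMARY KEY, processed_at TEXT)")
    # Hypothetical table standing in for the business logic's side effects
    conn.execute("CREATE TABLE Ledger (message_id TEXT, amount INTEGER)")
    conn.commit()
    return conn


def process_message(conn, message_id, amount, fail=False):
    """Return True if the message was processed, False if it was a duplicate."""
    key = f"msg-{message_id}"
    with conn:  # commits on success, rolls back if an exception escapes
        cur = conn.execute(
            "INSERT INTO ProcessedMessage (key, processed_at) "
            "VALUES (?, datetime('now')) ON CONFLICT (key) DO NOTHING",
            (key,),
        )
        if cur.rowcount == 0:
            return False  # marker already present: duplicate delivery
        if fail:
            raise RuntimeError("simulated failure")  # marker is rolled back too
        conn.execute(
            "INSERT INTO Ledger (message_id, amount) VALUES (?, ?)",
            (message_id, amount),
        )
    return True
```

Because the marker row and the ledger write share one transaction, a failure leaves no marker behind and a redelivery is processed normally. This only holds when the business logic writes to the same database as the marker table.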
Handling the “in-flight” Problem
Two simultaneous requests with the same key need coordination:
Scenario:
  T=0: Client sends request A (key=X). Worker 1 starts processing.
  T=1: Client times out and resends (key=X). Worker 2 picks it up.
  T=2: Without coordination, Worker 1 and Worker 2 both execute the operation.
Solution: locked_until timestamp
  Worker 1: INSERT → sets locked_until = NOW() + 30s
  Worker 2: sees locked_until > NOW() and response_status IS NULL
            → returns 409 "request in progress, retry after {locked_until}"
  Worker 1: completes → sets response_status=200, locked_until=NULL
  Worker 2 (after retry): sees response_status=200 → replays response
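The timeline above can be replayed deterministically by passing the clock in as a plain number. The sketch below is single-threaded and only models the decision logic; in a real deployment the claim must be one atomic database statement. The names Slot, try_claim, and complete, and the tuple return values, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

LOCK_SECONDS = 30


@dataclass
class Slot:
    locked_until: Optional[float] = None
    response_status: Optional[int] = None


def try_claim(slots: Dict[str, Slot], key: str, now: float) -> Tuple[str, Optional[float]]:
    """Return ('claimed', None), ('replay', status) or ('in_progress', retry_after)."""
    slot = slots.setdefault(key, Slot())
    if slot.response_status is not None:
        return ("replay", slot.response_status)
    if slot.locked_until is not None and slot.locked_until > now:
        # Seconds until the lock expires, suitable for a Retry-After hint
        return ("in_progress", slot.locked_until - now)
    # No response and no live lock: first attempt, or the holder crashed
    slot.locked_until = now + LOCK_SECONDS
    return ("claimed", None)


def complete(slots: Dict[str, Slot], key: str, status: int) -> None:
    slot = slots[key]
    slot.response_status = status
    slot.locked_until = None
```

The lock timeout is what keeps a crashed worker from blocking the key forever: once locked_until passes with no stored response, the next attempt claims the key and runs the operation. The timeout therefore needs to exceed the longest expected execution time, or a slow worker could be overtaken by a retry.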
Key Interview Points
- Scope keys to user + endpoint: A key from user A should never match a key from user B. Always include user context in the lookup.
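- Do not store transient failure responses: If the operation fails with a transient error (timeout, 5xx), clear the lock and allow retries. Storing a 500 response would prevent a valid retry from ever succeeding. Deterministic business failures (e.g., 402 insufficient funds) can be cached, since a retry will produce the same result.
- Key expiry: A 24-hour TTL is typical; Stripe, for example, prunes keys once they are at least 24 hours old. Keys must expire so the table does not grow unbounded. Clean up with a background job or partitioned table.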
- Client responsibility: The client must generate the key before the first attempt and reuse the exact same key on all retries. The server must never generate idempotency keys — that defeats the purpose.
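For the key-expiry point, a background cleanup job can delete expired rows in small batches so that no single statement holds locks for long. The sketch below uses SQLite for portability and makes several assumptions: expires_at is stored as an ISO-8601 UTC string (so string comparison matches time order), the table keeps SQLite's implicit rowid, and the function name and batch size are illustrative.

```python
import sqlite3


def delete_expired(conn: sqlite3.Connection, now_iso: str, batch_size: int = 1000) -> int:
    """Delete expired idempotency records in batches; return rows removed."""
    total = 0
    while True:
        with conn:  # one short transaction per batch
            cur = conn.execute(
                "DELETE FROM IdempotencyRecord WHERE rowid IN ("
                "  SELECT rowid FROM IdempotencyRecord"
                "  WHERE expires_at < ? LIMIT ?)",
                (now_iso, batch_size),
            )
        if cur.rowcount == 0:
            return total  # nothing left to delete
        total += cur.rowcount
```

An index on expires_at keeps each batch's subquery cheap. On PostgreSQL the same batching idea applies, with the primary key columns taking the place of rowid.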
Frequently Asked Questions
What is an idempotency key and why is it the client's responsibility?
An idempotency key is a unique identifier the client generates before making a request. It must be generated before the first attempt so the same key can be reused on all retries. If the server generated the key, it would be different on each request, defeating the purpose. The client uses a UUID v4 (or similar) and includes it in the Idempotency-Key header. The server stores the key with the response on first execution; on subsequent requests with the same key, it replays the stored response without re-executing the operation.
What happens if the server receives two simultaneous requests with the same idempotency key?
Use a locked_until timestamp on the IdempotencyRecord. The first request sets locked_until = NOW() + 30 seconds. The second concurrent request sees locked_until in the future with no response stored yet, and returns 409 Conflict with a Retry-After header indicating when the lock expires. The client should retry after the lock expires. The first request completes and stores its response. When the second request retries, it finds the stored response and replays it. This prevents two workers from executing the same operation simultaneously.
Should you store failed responses in the idempotency record?
No. Only store successful responses (2xx). If the operation fails with a transient error (network timeout, 5xx), clear the in-flight lock and allow the client to retry: the operation was not completed, so a retry is correct. If you stored a 500 response, the client would keep receiving the cached error on every retry, even after the underlying issue was fixed. The exception is for business logic failures (e.g., 402 insufficient funds): these are permanent and should be cached, since retrying will produce the same result.
How do you scope idempotency keys to prevent collisions across users?
Always include user_id in the idempotency key lookup: WHERE idempotency_key=%(key)s AND user_id=%(uid)s. Two different users can independently use the same UUID string as their key without collision. Without user scoping, a key generated by User A could accidentally match a key from User B, causing User B's request to replay User A's response, which is a serious security bug. Also scope keys per API endpoint/operation so a key used for a charge does not match a key used for a refund.
How do you clean up expired idempotency records?
Set an expires_at timestamp on each record (24 hours is the Stripe standard). Run a background cleanup job (or a cron) that deletes records where expires_at < NOW(). Run this job during off-peak hours to avoid contention. Alternatively, partition the IdempotencyRecord table by day and drop entire partitions: CREATE TABLE idempotency_2024_01_15 PARTITION OF IdempotencyRecord FOR VALUES FROM ('2024-01-15') TO ('2024-01-16'). Dropping a partition is near-instant vs. a slow DELETE of millions of rows.