What is idempotency and why does it matter in distributed systems?

An operation is idempotent if performing it multiple times produces the same result as performing it once. In distributed systems, network failures cause clients to retry requests without knowing if the original succeeded. Without idempotency, a charge might be processed twice, an email sent twice, or a record created twice. Idempotency allows safe retries: the client can retry indefinitely knowing the server will deduplicate and return the same result. This is foundational to building reliable systems that handle the inevitable partial failures of distributed networks.

How do you implement an idempotency key pattern?

The client generates a UUID per logical operation (not per retry) and sends it in an Idempotency-Key header. The server checks whether the key exists in storage (Redis or a database). If found, it returns the cached response without re-executing the operation. If not found, it executes the operation and stores the key with the result. Critically, claiming the key must be atomic: use Redis SET key value NX EX ttl (set if not exists, with a TTL) or a database unique constraint on the key, which rejects the second insert at any isolation level. Without atomicity, two concurrent requests with the same key can both execute.

What is the difference between at-least-once and exactly-once message delivery?

At-least-once: the messaging system guarantees the message is delivered but may deliver it more than once on retransmission. Consumers must be idempotent to handle duplicates. Exactly-once: the system guarantees each message is processed exactly once, even on retries. Kafka implements exactly-once via idempotent producers (deduplication by producer ID and sequence number) and transactions that commit produced messages and consumer offsets atomically. That guarantee covers reads and writes inside Kafka; side effects in external systems still need idempotent handling. Exactly-once is much harder to achieve and has higher overhead; at-least-once with idempotent consumers is the preferred pattern for most systems.
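
At-least-once delivery with an idempotent consumer can be sketched as follows. The message shape (a dict with `id` and `amount`) and the in-memory `processed_ids` set are assumptions made for illustration; a real consumer would likely keep processed IDs in durable storage, updated in the same transaction as the side effect.

```python
class IdempotentConsumer:
    """Applies each message at most once, keyed on a message ID."""

    def __init__(self):
        self.processed_ids = set()  # hypothetical dedup store (in-memory here)
        self.balance = 0

    def handle(self, message):
        """Return True if the message was applied, False if it was a duplicate."""
        if message["id"] in self.processed_ids:
            return False  # redelivery: skip the side effect
        self.balance += message["amount"]
        self.processed_ids.add(message["id"])
        return True


consumer = IdempotentConsumer()
deliveries = [
    {"id": "m1", "amount": 50},
    {"id": "m2", "amount": 25},
    {"id": "m1", "amount": 50},  # the broker redelivers m1
]
results = [consumer.handle(m) for m in deliveries]
```

The redelivered message is recognized by its ID and skipped, so the balance reflects each message once.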

How does Stripe implement idempotency for payment APIs?

Stripe accepts an Idempotency-Key header (a client-provided unique value, typically a UUID) on all POST requests. The key is scoped to the API key, so the same value sent under two different API keys identifies two separate operations, and it has a 24-hour TTL. When a key is first seen, Stripe executes the request and stores the full response (including the HTTP status code). On subsequent requests with the same key, Stripe returns the stored response without re-processing. If a request is in-flight, concurrent requests with the same key receive a 409 Conflict until the first completes. Stripe returns the original response even if the state of the underlying resource has changed since.

How do you make a database write idempotent?

Several approaches: (1) Natural idempotency: use INSERT ... ON CONFLICT DO NOTHING with a unique constraint on the business key (e.g., payment_reference). Re-inserting the same payment silently does nothing. (2) Upsert: INSERT ... ON CONFLICT DO UPDATE SET ... — updates the record if already exists (idempotent only if the update is idempotent, e.g., setting fields to fixed values, not incrementing). (3) Conditional update: UPDATE ... WHERE status = 'pending' — only updates if in expected state; no-op otherwise. (4) Check-then-act: SELECT then decide — safe only with SERIALIZABLE isolation or SELECT FOR UPDATE to prevent concurrent races.
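
As a rough illustration of approach (3), the sketch below uses Python's built-in sqlite3 module. The `orders` table, its columns, and the `mark_paid` helper are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("INSERT INTO orders (id, status) VALUES (1, 'pending')")
conn.commit()


def mark_paid(conn, order_id):
    """Move an order from 'pending' to 'paid'; return True only if this call changed it."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "UPDATE orders SET status = 'paid' WHERE id = ? AND status = 'pending'",
            (order_id,),
        )
    return cur.rowcount == 1


first = mark_paid(conn, 1)   # the order is pending, so the update applies
second = mark_paid(conn, 1)  # the order is already paid, so this is a no-op
```

The retry matches zero rows, so any follow-up work guarded by the return value (for example, sending a receipt) runs only once.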

Low Level Design: Idempotency Patterns

Idempotency is the property that an operation produces the same result whether it is applied once or multiple times. In distributed systems, network failures, timeouts, and retries are facts of life — idempotency is the mechanism that makes retries safe.

Why Idempotency Matters

When a client sends a request and doesn’t receive a response, it doesn’t know if the request was received and processed, received and failed, or never received. The safe behavior is to retry. Without idempotency on the server side, that retry can cause: a second charge to a customer’s card, a second email confirmation sent, a second order created, or a second row inserted into a database.

These are not theoretical concerns. Network partitions happen. Load balancers have timeouts. Servers crash mid-request. Any production system that handles payments, sends communications, or modifies financial records must treat idempotency as a first-class design requirement, not an afterthought.

The Idempotency Key Pattern

The idempotency key pattern solves retries at the API level. The client generates a UUID for each logical operation (not for each HTTP request) and sends it in an Idempotency-Key header. The server uses this key to deduplicate: if it has seen this key before, it returns the stored response instead of re-executing the operation.

The key is generated once per logical operation by the client. If the same charge needs to be retried three times due to timeouts, all three requests carry the same idempotency key. The server executes the charge once and returns the same response on subsequent attempts. From the client’s perspective, retrying is safe because the server executes the operation at most once per key and replays the stored response for every duplicate.
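
The client side of this can be sketched as below. `FlakyServer` is a hypothetical in-process stand-in for a real API that deduplicates on the key but loses its first response, and `charge_with_retries` is an illustrative helper, not part of any library.

```python
import uuid


class FlakyServer:
    """Hypothetical server: deduplicates on the key, but its first response is lost."""

    def __init__(self):
        self.responses = {}  # idempotency key -> stored response
        self.charges = 0     # how many times the charge actually ran
        self.calls = 0       # how many requests arrived

    def charge(self, idempotency_key, amount):
        self.calls += 1
        if idempotency_key not in self.responses:
            self.charges += 1
            self.responses[idempotency_key] = {"status": 200, "charged": amount}
        if self.calls == 1:
            raise TimeoutError("response lost in transit")
        return self.responses[idempotency_key]


def charge_with_retries(server, amount, max_attempts=3):
    key = str(uuid.uuid4())  # generated once per logical operation
    for _ in range(max_attempts):
        try:
            return server.charge(key, amount)  # every retry reuses the same key
        except TimeoutError:
            continue
    raise RuntimeError("gave up after retries")


server = FlakyServer()
response = charge_with_retries(server, 1000)
```

The first request is processed but its response never arrives; the retry carries the same key, so the server replays the stored response instead of charging again.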

Idempotency Key Storage

The idempotency key store must be fast (reads happen on every request) and persistent (keys must survive server restarts). Two common approaches:

Redis with TTL: store idempotency:{key} → serialized response. Set TTL to match the business retention period (Stripe uses 24 hours). Fast reads, automatic cleanup. Risk: Redis eviction under memory pressure could cause a key to disappear, allowing a duplicate execution.
Database table: columns: idempotency_key (unique), status (pending/complete), response_body, created_at, expires_at. Durable, transactional, can JOIN with business data. Slower than Redis but appropriate for financial operations where durability is non-negotiable.

For financial operations, the database approach is safer. The key can be stored in the same transaction as the business operation, giving atomic check-and-execute semantics without a separate coordination mechanism.
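
A minimal sketch of the same-transaction approach, using Python's built-in sqlite3 module. The `charges` table and `create_charge` helper are hypothetical, and the key table is trimmed to two columns (the status, created_at, and expires_at columns described above are omitted for brevity).

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE idempotency_keys (
        idempotency_key TEXT PRIMARY KEY,
        response_body   TEXT NOT NULL
    );
    CREATE TABLE charges (
        id     INTEGER PRIMARY KEY AUTOINCREMENT,
        amount INTEGER NOT NULL
    );
""")


def stored_response(conn, key):
    row = conn.execute(
        "SELECT response_body FROM idempotency_keys WHERE idempotency_key = ?", (key,)
    ).fetchone()
    return json.loads(row[0]) if row else None


def create_charge(conn, key, amount):
    """Insert the charge and its idempotency key in one transaction."""
    cached = stored_response(conn, key)
    if cached is not None:
        return cached
    try:
        with conn:  # both inserts commit together, or both roll back
            cur = conn.execute("INSERT INTO charges (amount) VALUES (?)", (amount,))
            response = {"charge_id": cur.lastrowid, "amount": amount}
            conn.execute(
                "INSERT INTO idempotency_keys (idempotency_key, response_body) VALUES (?, ?)",
                (key, json.dumps(response)),
            )
        return response
    except sqlite3.IntegrityError:
        # Another request committed this key first; our charge row was rolled back.
        return stored_response(conn, key)
```

Because the primary key on idempotency_key rejects a second insert, a duplicate that slips past the initial lookup rolls back its own charge row and returns the winner's response.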

Two-Phase Idempotency

The execution flow for an idempotent endpoint:

Phase 1 — Check: look up the idempotency key in the store. If found and status is complete, return the cached response immediately. If found and status is pending (another request is currently executing), return 409 Conflict or wait with a short poll.
Phase 2 — Execute: mark the key as pending (to block concurrent duplicates), execute the business operation, store the key with status complete and the full response, return the response.

The pending state handles the concurrent duplicate case: two retries arrive simultaneously before either completes. Without the pending marker, both could pass the "key not found" check and both execute the operation. The pending marker serializes them — the second request sees pending and waits or returns a 409.
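
The two phases, including the pending marker, might look like this in outline. The in-memory `IdempotencyStore` (guarded by a lock) and the response dictionaries are assumptions made for the sketch; a production store would be Redis or a database table.

```python
import threading

PENDING = "pending"
COMPLETE = "complete"


class IdempotencyStore:
    """Hypothetical in-memory key store; the lock makes each claim atomic."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}  # key -> (status, response)

    def begin(self, key):
        """Atomically claim the key. Returns (claimed, status, response)."""
        with self._lock:
            if key in self._entries:
                status, response = self._entries[key]
                return False, status, response
            self._entries[key] = (PENDING, None)
            return True, PENDING, None

    def complete(self, key, response):
        with self._lock:
            self._entries[key] = (COMPLETE, response)


def handle_request(store, key, operation):
    claimed, status, response = store.begin(key)
    if not claimed:
        if status == COMPLETE:
            return response  # Phase 1: replay the stored response
        return {"status": 409, "body": "in progress"}  # duplicate while in flight
    result = operation()  # Phase 2: this request owns the key, so execute once
    response = {"status": 200, "body": result}
    store.complete(key, response)
    return response
```

This sketch leaves the key pending forever if the operation raises. A fuller implementation would need a policy for that case, such as expiring pending entries after a timeout.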

Atomic Check-and-Store with Redis

Race conditions in the check-then-execute pattern are real. Between checking that a key doesn’t exist and inserting it, another concurrent request can do the same check and also find nothing.

Solution: use Redis SET key value NX EX ttl. The NX flag means "set only if not exists." The command is atomic: Redis executes it as a single operation with no interleaving. If the key was set successfully, it returns OK; if the key already exists, it returns nil. Only one concurrent request wins the SET and goes on to execute the operation. The others see nil and read the stored entry: they return the cached response if the winner has finished, or a 409 (or a short wait) if it is still pending. This closes the race between the check and the insert.
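
A sketch of the claim step in Python. It assumes a client that behaves like redis-py's `redis.Redis`, whose `set(name, value, nx=True, ex=seconds)` returns True when the key was set and None when it already existed. `InMemoryRedis` is a hypothetical stand-in so the example runs without a Redis server, and the 86400-second default TTL mirrors the 24-hour retention mentioned earlier.

```python
def claim_key(client, key, ttl_seconds=86400):
    """Try to claim an idempotency key; return True only for the first caller."""
    # Equivalent to: SET idempotency:{key} pending NX EX ttl_seconds
    result = client.set(f"idempotency:{key}", "pending", nx=True, ex=ttl_seconds)
    return bool(result)


class InMemoryRedis:
    """Stand-in that mimics only SET with NX; the expiry is accepted but not enforced."""

    def __init__(self):
        self.data = {}

    def set(self, name, value, nx=False, ex=None):
        if nx and name in self.data:
            return None
        self.data[name] = value
        return True


client = InMemoryRedis()
first = claim_key(client, "3f1c")   # wins the claim and should execute the operation
second = claim_key(client, "3f1c")  # loses the claim and should look up the stored entry
```

With a real server, `client` would be something like `redis.Redis(host=..., port=...)`; the function body stays the same.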

Database Natural Idempotency

For insert operations, the database can enforce idempotency directly via unique constraints on business keys:

INSERT ... ON CONFLICT DO NOTHING — inserts the row if the unique key doesn’t exist, silently does nothing if it does. Returns successfully either way.
INSERT ... ON CONFLICT DO UPDATE SET ... (UPSERT) — inserts or updates. Idempotent as long as the update is also idempotent (setting values, not incrementing).

Example: a payments table with a unique index on payment_reference. Retrying the insert with the same reference hits the conflict clause and returns without creating a duplicate. The business key (payment_reference) is the natural idempotency key, and the database enforces uniqueness without any application-level coordination.
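
The payments example can be tried with Python's built-in sqlite3 module, which accepts the ON CONFLICT ... DO NOTHING clause when the bundled SQLite is version 3.24 or newer. The `amount` column and the `record_payment` helper are assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE payments (
        payment_reference TEXT NOT NULL UNIQUE,
        amount            INTEGER NOT NULL
    )
""")


def record_payment(conn, reference, amount):
    """Insert the payment; return True if a row was created, False on a duplicate."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "INSERT INTO payments (payment_reference, amount) VALUES (?, ?) "
            "ON CONFLICT (payment_reference) DO NOTHING",
            (reference, amount),
        )
    return cur.rowcount == 1


created = record_payment(conn, "pay_001", 1000)
retried = record_payment(conn, "pay_001", 1000)  # same reference: silently skipped
```

Both calls succeed from the caller's point of view; only the row count reveals that the second one changed nothing.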

Conditional Writes

Conditional writes make updates idempotent by only writing when the record is in the expected state. In DynamoDB: ConditionExpression: version_id = :expected_version. The write succeeds only if the current version matches what the client read. If another request already updated the record (incrementing the version), the condition fails and returns an error.

This is optimistic locking applied to idempotency. The client includes the version it read alongside its update. The first write wins; subsequent retries with the same old version fail the condition check — the client can then re-read and decide whether a retry is needed or the operation already succeeded.
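
The version check and the "re-read and decide" step can be illustrated without DynamoDB. The sketch below swaps in a single-threaded, in-memory `VersionedStore` for the conditional write; the class, the `ConditionFailed` exception, and `set_value` are hypothetical names, and a real store would evaluate the condition atomically on the server.

```python
class ConditionFailed(Exception):
    """Raised when the stored version differs from the expected one."""


class VersionedStore:
    """In-memory stand-in for a store that supports conditional writes."""

    def __init__(self):
        self.items = {}  # item id -> {"value": ..., "version_id": int}

    def put(self, item_id, value, expected_version):
        current = self.items.get(item_id, {"value": None, "version_id": 0})
        if current["version_id"] != expected_version:
            raise ConditionFailed(item_id)
        self.items[item_id] = {"value": value, "version_id": expected_version + 1}


def set_value(store, item_id, value, read_version):
    """Return 'written' or 'already applied'; re-raise if a different write won."""
    try:
        store.put(item_id, value, expected_version=read_version)
        return "written"
    except ConditionFailed:
        current = store.items.get(item_id)  # re-read after the failed condition
        if current is not None and current["value"] == value:
            return "already applied"  # an earlier attempt of ours succeeded
        raise  # someone else changed the item; the caller must decide what to do


store = VersionedStore()
first = set_value(store, "order-1", "shipped", read_version=0)
retry = set_value(store, "order-1", "shipped", read_version=0)
```

Comparing the stored value with the intended value is one simple way to tell "my write already landed" from "a different write won"; whether that comparison is sufficient depends on the data.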

PUT vs POST Idempotency

HTTP specifies that PUT is idempotent: PUT /orders/123 with a full order body sets the order to that exact state. Calling it ten times produces the same result as calling it once. Design resource-modifying APIs as PUT where possible.

POST is not idempotent by spec — POST /orders creates a new order each time. When you must use POST for creation (because the server assigns the ID), require an Idempotency-Key header and implement the key pattern described above. Make the header required, not optional, for any POST endpoint that creates or charges.

Stripe’s Idempotency Implementation

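Stripe’s public implementation details are a useful reference. Key characteristics: 24-hour TTL on idempotency keys. Keys are namespaced per API key, so the same idempotency key from two different API keys is treated as two separate operations. The full response is stored, including the HTTP status code, whether the request succeeded or failed: if the original request returned a 500, the retry gets the same 500 rather than a re-execution. Per Stripe’s API documentation, results are saved only once endpoint execution begins, so a request rejected during parameter validation is not stored and can be retried.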

If a request is still in-flight when a duplicate arrives, Stripe returns a 409 Conflict immediately rather than waiting. The client is expected to back off and retry. This prevents request queues from building up under retry storms.

Idempotency Across Distributed Transactions

For operations that span a database write and an event publication (e.g., charge a payment and publish a PaymentCharged event), idempotency must cover both sides. The Outbox Pattern solves this: write the business record and an outbox event row in the same database transaction. Mark the idempotency key as complete in the same transaction.

A separate relay process reads the outbox and publishes events to the message broker, deleting the outbox row on successful publish. If the relay crashes after publishing but before deleting the row, it publishes the same event again on restart, so delivery is at-least-once. Duplicates are absorbed downstream: by broker-level deduplication where it exists (for example, message deduplication IDs on SQS FIFO queues) or by consumers that deduplicate on an event ID. The result is that the DB write happens once and the event takes effect once, even across failures, without distributed transactions.
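
A compact sketch of the outbox write and the relay loop, using Python's built-in sqlite3 module. The table layouts, the `charged:{reference}` event ID format, and the `publish` callback are assumptions made for illustration; a real relay would publish to a broker client.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payments (payment_reference TEXT PRIMARY KEY, amount INTEGER NOT NULL);
    CREATE TABLE outbox (event_id TEXT PRIMARY KEY, payload TEXT NOT NULL);
""")


def charge(conn, reference, amount):
    """Write the payment and its event in one transaction; a retry is a no-op."""
    event = {"type": "PaymentCharged", "payment_reference": reference, "amount": amount}
    try:
        with conn:  # both rows commit together, or neither does
            conn.execute("INSERT INTO payments VALUES (?, ?)", (reference, amount))
            conn.execute(
                "INSERT INTO outbox VALUES (?, ?)",
                (f"charged:{reference}", json.dumps(event)),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # already charged; the transaction was rolled back


def relay_once(conn, publish):
    """Publish each pending outbox row, deleting it only after publish returns."""
    rows = conn.execute("SELECT event_id, payload FROM outbox ORDER BY rowid").fetchall()
    for event_id, payload in rows:
        publish(event_id, json.loads(payload))  # can repeat if we crash before the delete
        with conn:
            conn.execute("DELETE FROM outbox WHERE event_id = ?", (event_id,))
    return len(rows)
```

Passing the stable `event_id` to `publish` gives the broker or the consumer something to deduplicate on when the relay republishes after a crash.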