Problem Overview
An inventory reservation system allows multiple systems (e-commerce checkout, warehouse management, wholesale orders) to place temporary holds on stock before committing to a purchase. The design must handle concurrent reservations, prevent double-allocation, automatically reclaim expired holds, and remain consistent across distributed services.
Requirements and Constraints
Functional Requirements
- Create a time-bounded soft lock (reservation) on N units of a SKU
- Confirm a reservation, decrementing available inventory permanently
- Release a reservation early (cancellation) or let it expire automatically
- Query available inventory for a SKU (total minus confirmed minus active reservations)
- Support partial reservation fulfillment (reserve what is available, reject if insufficient)
Non-Functional Requirements
- Reservation creation under 50 ms p99
- Available inventory query under 10 ms p99
- Expired reservations reclaimed within 60 seconds of expiry
- Tolerate one node failure without losing reservation state
- Consistent reads: a SKU that reaches 0 available must never accept a new reservation
Core Data Model
CREATE TABLE inventory (
sku_id BIGINT PRIMARY KEY,
total_units INT NOT NULL,
confirmed_sold INT NOT NULL DEFAULT 0,
-- available = total_units - confirmed_sold - SUM(active reservation quantities)
-- computed at query time; not stored to avoid staleness
version BIGINT NOT NULL DEFAULT 0
);
CREATE TABLE reservations (
reservation_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
sku_id BIGINT NOT NULL,
owner_id VARCHAR(128) NOT NULL, -- order_id, session_id, etc.
quantity INT NOT NULL,
status ENUM('active','confirmed','expired','cancelled') NOT NULL DEFAULT 'active',
created_at TIMESTAMP(3) NOT NULL DEFAULT NOW(3),
expires_at TIMESTAMP(3) NOT NULL,
confirmed_at TIMESTAMP(3),
INDEX idx_sku_status (sku_id, status),
INDEX idx_expiry (expires_at, status) -- for expiry sweeper
);
Available inventory is computed as a derived value to avoid consistency issues with a cached counter. The query is: total_units - confirmed_sold - (SELECT COALESCE(SUM(quantity),0) FROM reservations WHERE sku_id=? AND status='active' AND expires_at > NOW()).
Key Algorithms and Logic
Reservation Creation with Serializable Isolation
To prevent two concurrent reservations from both seeing sufficient inventory and both succeeding when only one can be accommodated:
- Begin a transaction with SERIALIZABLE isolation (or use SELECT FOR UPDATE on the inventory row).
- Compute available: total_units – confirmed_sold – SUM(active reservation quantities) FOR UPDATE.
- If available < requested quantity: rollback and return “insufficient inventory.”
- INSERT the reservation row with expires_at = NOW() + TTL.
- Commit.
SELECT FOR UPDATE on the inventory row serializes concurrent reservation attempts for the same SKU. For high-throughput SKUs, this can bottleneck. An alternative is to use an advisory lock (PostgreSQL pg_advisory_xact_lock(sku_id)) which provides the same serialization with lower overhead than a full row lock.
Distributed Soft Lock with Redis
For SKUs with extremely high reservation rates, use Redis as the concurrency gate:
- Maintain
sku:{sku_id}:availableas a Redis integer initialized from the DB. - Reservation: Lua script atomically reads the counter; if >= requested quantity, decrements and returns success.
- On success, write the reservation to the DB asynchronously. If the DB write fails, compensate by incrementing the Redis counter back.
- This is an optimistic fast path. Conflicts are resolved by treating the DB as source of truth and periodically syncing the Redis counter from the DB.
Expiry Cleanup — Sweeper Pattern
A background sweeper process runs every 30 seconds:
UPDATE reservations
SET status = 'expired'
WHERE status = 'active'
AND expires_at < NOW()
LIMIT 500; -- process in batches to avoid long-running transactions
After each batch, the sweeper publishes an inventory-updated event so any Redis counter can be reconciled. The LIMIT 500 prevents the sweeper from holding locks too long. The idx_expiry index makes this query efficient even with millions of reservation rows.
Reservation Extension
A client can extend a reservation if the TTL is about to expire (e.g., user is still in checkout). Extension is an UPDATE that sets expires_at = NOW() + TTL where status = 'active'. Extensions are capped at a maximum total reservation duration (e.g., 30 minutes) to prevent inventory being held indefinitely.
Idempotency
Reservation creation accepts an optional idempotency_key from the caller. A unique index on (owner_id, sku_id) or on idempotency_key ensures that a retry of the same request returns the existing reservation rather than creating a duplicate hold.
API Design
POST /v1/inventory/{sku_id}/reserve
Body: { "owner_id": "order-8821", "quantity": 3, "ttl_seconds": 300, "idempotency_key": "uuid" }
Response 200: { "reservation_id": "uuid", "expires_at": "...", "quantity": 3 }
Response 409: { "reason": "insufficient_inventory", "available": 1 }
POST /v1/reservations/{reservation_id}/confirm
Response 200: { "reservation_id": "uuid", "status": "confirmed" }
DELETE /v1/reservations/{reservation_id}
Response 204: No Content
GET /v1/inventory/{sku_id}/available
Response: { "sku_id": "...", "available": 47, "active_reservations": 12 }
Scalability Considerations
- Read replicas for availability queries: Route GET available queries to a read replica. Acceptable stale reads (up to 100 ms lag) do not affect correctness because the reservation creation path runs on primary with serializable isolation.
- Sharding: Shard reservations by sku_id to distribute lock contention across DB nodes. Hot SKUs (viral products) can be isolated to dedicated shards.
- Partitioned expiry table: Partition the reservations table by expires_at month. Sweeper queries hit only the current and prior partition, keeping the expiry index small and fast.
- Event sourcing for auditability: Append every state transition (created, extended, confirmed, expired, cancelled) to an immutable reservation_events table. This provides a full audit trail without complicating the main reservations table.
Failure Modes and Mitigations
- Sweeper lag: If the sweeper falls behind, available inventory query returns lower-than-actual values (false negatives, not false positives). Acceptable because it is conservative — no oversell. Alert if sweeper lag exceeds 120 seconds.
- DB primary failover: Reservations written before failover are durable (WAL-replicated). In-flight reservation requests during failover fail with a 503 and must be retried by the caller. TTL on reservations means no permanently stuck state.
- Redis desync: On startup or after a Redis flush, reload the available counter from the DB. The reload query is the same derived computation. Use a startup lock to prevent reservation requests during the reload window.
See also: Scale AI Interview Guide 2026: Data Infrastructure, RLHF Pipelines, and ML Engineering
See also: Shopify Interview Guide
See also: Stripe Interview Guide 2026: Process, Bug Bash Round, and Payment Systems
See also: Airbnb Interview Guide 2026: Search Systems, Trust and Safety, and Full-Stack Engineering