Payment State Machine
A payment must move through well-defined states. Never allow arbitrary transitions or skipped states.
INITIATED -> PROCESSING -> COMPLETED
-> FAILED
-> CANCELLED
INITIATED: payment record created, idempotency key stored
PROCESSING: submitted to payment processor, awaiting confirmation
COMPLETED: processor confirmed success, funds moved
FAILED: processor declined or timeout after retries
CANCELLED: user or system cancelled before processing completed
Each state transition is a separate database write with a timestamp. The payment_events audit table records every transition. You can reconstruct the full lifecycle of any payment from this log. Never update a payment record in place without appending an event.
Idempotency Keys
The client generates a UUID before initiating a payment and sends it in a header (e.g., Idempotency-Key: uuid). The server stores the key alongside the response.
# Server logic on payment request
def create_payment(request):
key = request.headers['Idempotency-Key']
# Check Redis first (hot path)
cached = redis.get(f"idem:{key}")
if cached:
return deserialize(cached) # return exact same response
# Check DB (slower, handles Redis eviction)
existing = db.query("SELECT * FROM payments WHERE idempotency_key = %s", key)
if existing:
response = build_response(existing)
redis.setex(f"idem:{key}", 86400, serialize(response))
return response
# New payment - proceed
payment = process_new_payment(request, key)
response = build_response(payment)
redis.setex(f"idem:{key}", 86400, serialize(response))
return response
Idempotency keys prevent double-charging on network retries. TTL is 24 hours in Redis; also persisted in the DB permanently for audit purposes. The key must be unique per payment attempt, not per user or session.
Database Schema
CREATE TABLE payments (
id BIGINT PRIMARY KEY AUTO_INCREMENT,
idempotency_key VARCHAR(64) UNIQUE NOT NULL,
user_id BIGINT NOT NULL,
amount BIGINT NOT NULL, -- in minor units (cents)
currency CHAR(3) NOT NULL, -- ISO 4217
status VARCHAR(20) NOT NULL,
processor_ref VARCHAR(100), -- payment processor transaction ID
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
INDEX idx_user_id (user_id),
INDEX idx_status_created (status, created_at)
);
CREATE TABLE payment_events (
id BIGINT PRIMARY KEY AUTO_INCREMENT,
payment_id BIGINT NOT NULL REFERENCES payments(id),
from_status VARCHAR(20),
to_status VARCHAR(20) NOT NULL,
occurred_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
metadata JSON,
INDEX idx_payment_id (payment_id)
);
Double-Entry Accounting
Every financial movement has exactly two entries: a debit (money leaves an account) and a credit (money enters an account). The sum of all debits always equals the sum of all credits – books never go out of balance.
CREATE TABLE accounts (
id BIGINT PRIMARY KEY,
name VARCHAR(100) NOT NULL,
balance BIGINT NOT NULL DEFAULT 0, -- in cents
account_type VARCHAR(20) NOT NULL -- asset, liability, revenue
);
CREATE TABLE ledger_entries (
id BIGINT PRIMARY KEY AUTO_INCREMENT,
payment_id BIGINT NOT NULL,
debit_account BIGINT NOT NULL REFERENCES accounts(id),
credit_account BIGINT NOT NULL REFERENCES accounts(id),
amount BIGINT NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Example: user pays $10. Debit user’s wallet account (balance -1000 cents), credit merchant’s account (balance +1000 cents). To verify integrity: SELECT SUM(amount) FROM ledger_entries WHERE debit_account = X must equal account balance change.
Distributed Transaction Problem
The fundamental challenge: you need to call the payment processor API (external, over the network) AND update your database. What happens if your server crashes between these two steps?
- Crash after calling processor but before DB update: processor charged the user, your DB shows no payment. User is charged, you have no record.
- Crash after DB update but before calling processor: your DB shows completed payment, processor never ran it. User not charged, you think they were.
Two-phase commit across systems is impractical. The solution is the outbox pattern.
Outbox Pattern
Write to your DB and an outbox table in a single local transaction. A background worker then calls the external processor.
CREATE TABLE outbox (
id BIGINT PRIMARY KEY AUTO_INCREMENT,
payment_id BIGINT NOT NULL,
payload JSON NOT NULL,
status VARCHAR(20) DEFAULT 'PENDING',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
processed_at TIMESTAMP NULL
);
# Application code - single DB transaction
def initiate_payment(payment_data):
with db.transaction():
payment = db.insert("INSERT INTO payments ...", payment_data)
db.insert("INSERT INTO outbox (payment_id, payload) VALUES (%s, %s)",
payment.id, serialize(payment_data))
# Transaction commits - both rows exist or neither does
# Background worker - runs every few seconds
def process_outbox():
rows = db.query("SELECT * FROM outbox WHERE status = 'PENDING' LIMIT 100")
for row in rows:
result = payment_processor.charge(row.payload) # external call
db.execute("UPDATE outbox SET status = 'PROCESSED', processed_at = NOW() WHERE id = %s", row.id)
db.execute("UPDATE payments SET status = %s, processor_ref = %s WHERE id = %s",
result.status, result.ref, row.payment_id)
If the worker crashes after calling the processor but before marking the outbox entry processed, it will retry. The processor call must be idempotent (use the payment ID as the processor’s idempotency key). This guarantees at-least-once delivery with idempotent processing = effectively exactly-once.
Webhook Processing
Payment processors send asynchronous webhooks when payment status changes. Handle them defensively.
def handle_webhook(request):
# 1. Validate HMAC signature
sig = request.headers['X-Webhook-Signature']
expected = hmac.new(WEBHOOK_SECRET, request.body, sha256).hexdigest()
if not hmac.compare_digest(sig, expected):
return 401
event = parse(request.body)
# 2. Idempotent processing using event ID
if redis.exists(f"webhook:{event.id}"):
return 200 # already processed
# 3. Process
update_payment_status(event.payment_ref, event.status)
trigger_downstream_events(event)
# 4. Mark processed
redis.setex(f"webhook:{event.id}", 86400, "1")
return 200
Always respond 200 quickly to webhooks. Do heavy processing asynchronously via a queue if needed. Processors will retry on non-200 responses – your handler must be idempotent.
Reconciliation
End-of-day reconciliation catches discrepancies between your system and the payment processor.
- Payment processor provides a settlement report: list of all transactions, amounts, and statuses for the day
- Compare against your
paymentstable for the same period - Flag discrepancies: payments in processor report not in your DB (you missed a webhook), payments in your DB not in processor report (processor rejected silently), amount mismatches
- Route flagged records to a manual review queue
- Run daily, alert on reconciliation failure rate above threshold
Fraud Detection Hooks
Fraud checks run alongside (or just before) payment processing without blocking the happy path.
# Velocity check using Redis sliding window
def check_velocity(user_id, amount):
key = f"velocity:{user_id}"
now = time.time()
window = 3600 # 1 hour
pipe = redis.pipeline()
pipe.zremrangebyscore(key, 0, now - window) # remove old entries
pipe.zadd(key, {str(now): now})
pipe.zcard(key)
pipe.expire(key, window)
results = pipe.execute()
tx_count = results[2]
return tx_count > MAX_TRANSACTIONS_PER_HOUR
# Rule engine checks (synchronous for high-risk, async otherwise)
def fraud_check(payment):
signals = []
if payment.amount > HIGH_VALUE_THRESHOLD:
signals.append('high_value')
if is_new_country(payment.user_id, payment.ip_country):
signals.append('new_country')
if check_velocity(payment.user_id, payment.amount):
signals.append('velocity_exceeded')
risk_score = score_signals(signals)
if risk_score >= BLOCK_THRESHOLD:
return 'BLOCK' # synchronous block
elif risk_score >= REVIEW_THRESHOLD:
enqueue_for_review(payment) # async, do not block
return 'ALLOW'
return 'ALLOW'
Only block synchronously for the highest-risk signals (velocity exceeded, known fraud patterns). Everything else is async review. Blocking legitimate payments is expensive – false positive rate matters as much as catch rate.
Currency Handling
Never store monetary amounts as floats. Floating-point arithmetic is imprecise and introduces rounding errors that compound over millions of transactions.
- Store all amounts as integers in the smallest currency unit: $10.99 = 1099 cents, JPY 1000 = 1000 yen (JPY has no subunit)
- Include a currency code (ISO 4217) alongside every amount
- Store the number of decimal places per currency in a reference table (USD = 2, JPY = 0, KWD = 3)
- Only convert to decimal for display, never for storage or calculation
- Use a Decimal/BigDecimal type if intermediate calculations require decimals, then convert back to integer minor units before storing
Stripe system design interviews deeply cover payment architecture. See design patterns for Stripe interview: payment system design.
Coinbase system design covers transactional payment systems. See design patterns for Coinbase interview: payment and transaction system design.
Robinhood system design covers trade and payment processing. See design patterns for Robinhood interview: financial transaction system design.