System Design: Payment Processing Platform – Authorization, Settlement, and Fraud Detection (2025)

Scope and Requirements

A payment processing platform handles money movement between buyers and sellers through external payment processors (Stripe, Adyen, Braintree). Getting this wrong means lost money, duplicate charges, or fraud. This design covers the full lifecycle: authorization, capture, settlement, refunds, fraud screening, and reconciliation.

Functional Requirements

Authorize, capture, and settle payments
Support refunds and partial refunds
Detect and block fraudulent transactions
Reconcile settled funds with processor statements

Non-Functional Requirements

Effectively exactly-once processing: retries must never produce a double charge
99.99% availability for authorization path
Audit trail for every state transition
PCI DSS compliance (never store raw card data)

Payment Lifecycle: Authorization, Capture, Settlement

The three-phase model is fundamental. Interviewers expect you to distinguish these phases clearly.

Phase 1: Authorization

Authorization reserves funds on the cardholder’s account but does not move money. The issuing bank verifies that the card is valid and has sufficient funds, then approves a hold. The response carries an auth code and an authorization ID. An authorization is reversible: voiding it before capture releases the hold.

Phase 2: Capture

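Capture instructs the processor to collect the authorized funds. It must happen within the authorization window, which varies by card network, processor, and merchant category (about 7 days is a common default for card-not-present transactions). The captured amount may be lower than the authorized amount (e.g., a hotel authorizes $500 and captures the actual room cost of $420), in which case the rest of the hold is released. After capture, funds are in transit.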
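
The two capture constraints above (stay inside the authorization window, never capture more than was authorized) can be checked before calling the processor. The following is a minimal sketch; the 7-day constant and the function name can_capture are assumptions for illustration, not a processor API.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal

# Assumption: a 7-day window. Real windows depend on network and processor.
AUTH_WINDOW = timedelta(days=7)

def can_capture(authorized_amount, authorized_at, capture_amount, now=None):
    """Return (ok, reason) for a proposed capture against an authorization."""
    now = now or datetime.now(timezone.utc)
    if capture_amount <= 0:
        return False, "capture amount must be positive"
    if capture_amount > authorized_amount:
        return False, "capture exceeds authorized amount"
    if now - authorized_at > AUTH_WINDOW:
        return False, "authorization window expired"
    return True, "ok"

authorized_at = datetime(2025, 1, 1, tzinfo=timezone.utc)
# Hotel example: authorize $500, capture $420 two days later.
partial = can_capture(Decimal("500.00"), authorized_at, Decimal("420.00"),
                      now=datetime(2025, 1, 3, tzinfo=timezone.utc))
# The same capture attempted ten days later falls outside the window.
late = can_capture(Decimal("500.00"), authorized_at, Decimal("420.00"),
                   now=datetime(2025, 1, 11, tzinfo=timezone.utc))
```

Here partial evaluates to (True, "ok") and late is rejected because the window has expired.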

Phase 3: Settlement

Settlement is the actual transfer of funds from issuing bank to acquiring bank, minus processor fees. Happens in batch, typically T+1 or T+2. Processor sends settlement files that you must reconcile against your captured transactions.

State Machine

CREATED -> AUTHORIZING -> AUTHORIZED -> CAPTURING -> CAPTURED -> SETTLING -> SETTLED
                |                          |            |                       |
           FAILED (auth)              FAILED (capture)  +-----------+-----------+
                                                                    |
                                                                REFUNDING -> REFUNDED
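
A state machine like this is usually enforced in code as an explicit transition table, so that an illegal jump (for example CREATED straight to CAPTURED) is rejected rather than silently written. The sketch below is one way to do it. The state names FAILED_AUTH and FAILED_CAPTURE, and the assumption that refunds start only from CAPTURED or SETTLED, are illustrative choices.

```python
# Allowed transitions, keyed by current state.
# Assumption: refunds are only possible once money has been captured or settled.
ALLOWED_TRANSITIONS = {
    "CREATED": {"AUTHORIZING"},
    "AUTHORIZING": {"AUTHORIZED", "FAILED_AUTH"},
    "AUTHORIZED": {"CAPTURING"},
    "CAPTURING": {"CAPTURED", "FAILED_CAPTURE"},
    "CAPTURED": {"SETTLING", "REFUNDING"},
    "SETTLING": {"SETTLED"},
    "SETTLED": {"REFUNDING"},
    "REFUNDING": {"REFUNDED"},
}

class InvalidTransition(Exception):
    """Raised when a requested state change is not in the transition table."""

def transition(current, target):
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise InvalidTransition(f"{current} -> {target} is not allowed")
    return target

state = "CREATED"
for step in ["AUTHORIZING", "AUTHORIZED", "CAPTURING", "CAPTURED"]:
    state = transition(state, step)
```

Terminal states (the two failure states and REFUNDED) have no entry in the table, so any attempt to leave them raises.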

Idempotency Keys – The Most Critical Concept

Network failures are inevitable. Without idempotency, a retry after a timeout could charge the customer twice. Idempotency keys solve this – they guarantee that retrying a request produces the same result as the first call.

Implementation Pattern

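import json

def process_payment(idempotency_key, amount, payment_method_id, user_id):
    # 1. Check if we have already seen this key (keys are scoped per user)
    existing = db.query(
        "SELECT id, status, response FROM payment_requests WHERE user_id = %s AND idempotency_key = %s",
        (user_id, idempotency_key)
    )
    if existing:
        if existing.status == 'PROCESSING':
            # The original request is still in flight - wait for its result
            return wait_and_fetch(idempotency_key)
        # Return the stored response - do NOT reprocess
        return json.loads(existing.response)

    # 2. Claim the key atomically (a unique constraint on
    #    (user_id, idempotency_key) prevents the race between check and insert)
    try:
        db.execute(
            "INSERT INTO payment_requests (idempotency_key, user_id, amount, status) VALUES (%s, %s, %s, 'PROCESSING')",
            (idempotency_key, user_id, amount)
        )
    except UniqueConstraintViolation:
        # Another request with same key is in-flight, wait and return its result
        return wait_and_fetch(idempotency_key)

    # 3. Call external processor, forwarding the key so it can deduplicate too
    result = payment_processor.authorize(amount, payment_method_id, idempotency_key=idempotency_key)

    # 4. Persist the result on the idempotency record. This write is separate
    #    from the processor call: a crash in between leaves the row in
    #    PROCESSING, so a recovery job must resolve stale rows by asking the processor.
    #    Field names on result are assumed; store a plain dict so it serializes.
    response = {"status": result.status, "authorization_id": result.authorization_id}
    db.execute(
        "UPDATE payment_requests SET status = %s, response = %s WHERE user_id = %s AND idempotency_key = %s",
        (result.status, json.dumps(response), user_id, idempotency_key)
    )
    return response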
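
The pattern above calls wait_and_fetch without defining it. A minimal polling sketch follows. Everything in it is an assumption for illustration: the record lookup is passed in as a hypothetical fetch_record callable (in the pattern above it would wrap the payment_requests query), and the timeout and poll interval are arbitrary defaults.

```python
import time

def wait_and_fetch(fetch_record, idempotency_key, timeout_s=10.0,
                   poll_interval_s=0.2, sleep=time.sleep, clock=time.monotonic):
    """Poll until the in-flight request for this key has a final result.

    fetch_record(key) is a hypothetical callable returning None or a dict
    with "status" and "response" keys.
    """
    deadline = clock() + timeout_s
    while True:
        record = fetch_record(idempotency_key)
        if record is not None and record["status"] != "PROCESSING":
            return record["response"]
        if clock() >= deadline:
            # Give up without reprocessing; the caller should surface a retryable error.
            raise TimeoutError(f"request {idempotency_key} still in flight")
        sleep(poll_interval_s)
```

On timeout the helper raises instead of re-running the payment, which keeps the rule that a duplicate request never triggers a second processor call.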

Key Rules for Idempotency Keys

Client generates the key (UUID v4), sends it with every request
Keys are scoped to a user or merchant – key “abc123” for user A does not conflict with key “abc123” for user B
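Store each key with a TTL (commonly between 24 hours and 30 days, depending on the processor); once a key expires it is treated as a new request, so the TTL must outlast the longest client retry horizon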
The stored response must be identical to the original – same HTTP status, same body
Never re-run business logic when serving a cached idempotency response
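
The scoping and TTL rules above can be illustrated with a small in-memory store. This is a sketch only: the class name IdempotencyStore is hypothetical, and a production system would keep these records in the database (or Redis plus the database) rather than in a Python dict.

```python
import time

class IdempotencyStore:
    """In-memory illustration of scoped idempotency keys with expiry."""

    def __init__(self, ttl_seconds=30 * 24 * 3600, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # (scope, key) -> (expires_at, response)

    def get(self, scope, key):
        """Return the stored response, or None if absent or expired."""
        entry = self._entries.get((scope, key))
        if entry is None:
            return None
        expires_at, response = entry
        if self._clock() >= expires_at:
            del self._entries[(scope, key)]
            return None
        return response

    def put(self, scope, key, response):
        """Store the response exactly as it was first returned."""
        self._entries[(scope, key)] = (self._clock() + self._ttl, response)

now = [0.0]
store = IdempotencyStore(ttl_seconds=60, clock=lambda: now[0])
store.put("user_A", "abc123", {"http_status": 200, "body": "authorized"})
same_key_other_user = store.get("user_B", "abc123")  # scoped: no conflict
cached = store.get("user_A", "abc123")
now[0] = 61.0
after_expiry = store.get("user_A", "abc123")
```

Because the scope is part of the lookup key, user B sees nothing for "abc123", and after the TTL elapses user A's entry is gone as well.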

Double-Spend Prevention

Beyond idempotency keys, you need defense-in-depth against double charging.

Database Unique Constraints

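CREATE TABLE payments (
    id          BIGINT PRIMARY KEY,
    order_id    BIGINT NOT NULL,
    status      VARCHAR(20) NOT NULL,
    amount      DECIMAL(10,2) NOT NULL,
    version     INT NOT NULL DEFAULT 0  -- used by optimistic locking
);

-- Prevent two live or successful charges for the same order.
-- PostgreSQL partial indexes are created separately from the table.
-- The index covers order_id only: indexing (order_id, status) would still
-- allow one AUTHORIZED row and one CAPTURED row for the same order.
CREATE UNIQUE INDEX idx_order_success ON payments (order_id)
    WHERE status IN ('AUTHORIZED', 'CAPTURING', 'CAPTURED', 'SETTLING', 'SETTLED');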
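
To see a constraint of this kind reject a second charge, the sketch below uses Python's built-in sqlite3 module, which also supports partial unique indexes. It is an illustration rather than the production schema: the table is simplified, amounts are stored as integer cents, and the index is placed on order_id alone so that any second row in a successful state for the same order is refused.

```python
import sqlite3

def second_charge_is_rejected():
    """Return True if the partial unique index blocks a second successful charge."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE payments (
            id           INTEGER PRIMARY KEY,
            order_id     INTEGER NOT NULL,
            status       TEXT NOT NULL,
            amount_cents INTEGER NOT NULL
        )
    """)
    # Uniqueness applies only to rows in a successful state.
    conn.execute("""
        CREATE UNIQUE INDEX idx_order_success ON payments (order_id)
        WHERE status IN ('AUTHORIZED', 'CAPTURED', 'SETTLED')
    """)
    # A failed attempt and one successful attempt for the same order are fine.
    conn.execute("INSERT INTO payments (order_id, status, amount_cents) VALUES (1, 'FAILED', 4200)")
    conn.execute("INSERT INTO payments (order_id, status, amount_cents) VALUES (1, 'AUTHORIZED', 4200)")
    try:
        # A second successful row for order 1 must violate the index.
        conn.execute("INSERT INTO payments (order_id, status, amount_cents) VALUES (1, 'CAPTURED', 4200)")
        return False
    except sqlite3.IntegrityError:
        return True
    finally:
        conn.close()

rejected = second_charge_is_rejected()
```

Failed attempts stay outside the index, so an order can be retried after a decline without tripping the constraint.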

Optimistic Locking for State Transitions

def capture_payment(payment_id, expected_version):
    # Read current state
    payment = db.get_payment(payment_id)
    if payment.status != 'AUTHORIZED':
        # Explicit check instead of assert, which is stripped under python -O
        raise InvalidStateError("Can only capture authorized payments")

    # Claim the transition with a version check - fails if another process changed state
    rows_updated = db.execute(
        "UPDATE payments SET status='CAPTURING', version=version+1 WHERE id=%s AND version=%s AND status='AUTHORIZED'",
        (payment_id, expected_version)
    )
    if rows_updated == 0:
        raise ConcurrentModificationError("Payment state changed - retry")

    # Call the processor only after the CAPTURING transition has been claimed
    result = processor.capture(payment.processor_auth_id)
    # Guard the final write on the state and version we set above
    db.execute(
        "UPDATE payments SET status=%s, version=version+1 WHERE id=%s AND version=%s AND status='CAPTURING'",
        (result.status, payment_id, expected_version + 1)
    )
    return result
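
The version check can be exercised end to end with sqlite3. In this sketch two callers both read version 0 and both try to claim the capture; only the first UPDATE matches a row. The helper name try_claim_capture and the trimmed table are assumptions for illustration.

```python
import sqlite3

def try_claim_capture(conn, payment_id, expected_version):
    """Attempt AUTHORIZED -> CAPTURING; return True only if this caller won."""
    cur = conn.execute(
        "UPDATE payments SET status='CAPTURING', version=version+1 "
        "WHERE id=? AND version=? AND status='AUTHORIZED'",
        (payment_id, expected_version),
    )
    conn.commit()
    # rowcount is 1 if the row matched, 0 if someone else changed it first.
    return cur.rowcount == 1

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, status TEXT, version INTEGER)")
conn.execute("INSERT INTO payments VALUES (1, 'AUTHORIZED', 0)")

first = try_claim_capture(conn, 1, 0)   # wins the transition
second = try_claim_capture(conn, 1, 0)  # stale version, matches no row
final_row = conn.execute("SELECT status, version FROM payments WHERE id=1").fetchone()
```

The losing caller sees zero rows updated and must not call the processor; it should reload the payment and decide what to do from the new state.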

Retry Logic with Exponential Backoff

External processor calls (Stripe, Adyen) can fail transiently. You must retry with backoff, but only for idempotent operations or after securing an idempotency key.

import time
import random

def call_processor_with_retry(fn, idempotency_key, max_attempts=4):
    base_delay = 0.5  # 500ms
    max_delay = 30.0

    for attempt in range(max_attempts):
        try:
            return fn(idempotency_key=idempotency_key)
        except ProcessorDeclineError:
            # Hard decline - do not retry
            raise
        except ProcessorTimeoutError:
            # Handled before ProcessorNetworkError in case it is a subclass.
            # Timeout: payment may have succeeded - check status before retry
            status = processor.check_status(idempotency_key)
            if status.succeeded:
                return status
            if attempt == max_attempts - 1:
                raise
            # Did not succeed, safe to retry with same idempotency key
        except ProcessorNetworkError:
            if attempt == max_attempts - 1:
                raise
        # Exponential backoff with jitter before the next attempt
        # (applies to timeouts as well as network errors)
        delay = min(base_delay * (2 ** attempt), max_delay)
        jitter = random.uniform(0, delay * 0.1)
        time.sleep(delay + jitter)

Critical: What to Retry vs What Not To

Safe to retry: network errors, 5xx from processor (with same idempotency key)
Never retry: hard declines (insufficient funds, stolen card), invalid card data
Check before retry: timeouts – the operation may have succeeded
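
The backoff arithmetic and the three retry categories above can be written as two small pure functions, which makes them easy to unit test. This is a sketch: the error-kind strings are hypothetical labels, not processor error codes, and the random source is injectable so the schedule can be inspected without jitter.

```python
import random

def backoff_delay(attempt, base_delay=0.5, max_delay=30.0,
                  jitter_fraction=0.1, rng=random.random):
    """Delay in seconds before retry number `attempt` (0-based)."""
    delay = min(base_delay * (2 ** attempt), max_delay)
    # Up to jitter_fraction of extra delay so clients do not retry in lockstep.
    return delay + rng() * delay * jitter_fraction

def retry_action(error_kind):
    """Map a (hypothetical) error category to the action described above."""
    actions = {
        "network": "retry",
        "server_error": "retry",
        "timeout": "check_status_then_retry",
    }
    # Declines and invalid card data fall through to no retry.
    return actions.get(error_kind, "do_not_retry")

# Schedule with jitter disabled, to show the doubling and the 30-second cap.
schedule = [backoff_delay(a, rng=lambda: 0.0) for a in range(8)]
# schedule == [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

With four attempts the caller sleeps at most three times (about 0.5 s, 1 s, and 2 s plus jitter), so the cap only matters for longer retry budgets.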

Fraud Detection

Rule-Based Layer (Fast, Synchronous)

Rule-based checks run in the authorization path – must complete in under 50ms.

FRAUD_RULES = [
    # Velocity checks
    {"rule": "max_transactions_per_hour", "threshold": 10, "action": "BLOCK"},
    {"rule": "max_amount_per_day", "threshold": 5000, "action": "BLOCK"},
    {"rule": "card_used_more_than_3_countries_24h", "action": "REVIEW"},
    # Geo anomaly
    {"rule": "transaction_country_mismatch_billing_country", "action": "REVIEW"},
    {"rule": "impossible_travel",  # purchase in NYC, 10 min later in London
     "threshold_km_per_hour": 900, "action": "BLOCK"},
    # Account signals
    {"rule": "account_age_less_than_24h", "action": "REVIEW"},
    {"rule": "first_purchase_above_threshold", "threshold": 500, "action": "REVIEW"},
]

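from enum import IntEnum

class FraudDecision(IntEnum):
    # Ordered by severity so decisions can be compared with max()
    ALLOW = 0
    REVIEW = 1
    BLOCK = 2

def run_rule_engine(transaction, user_context) -> FraudDecision:
    # Return the most severe action among all matching rules, so a REVIEW rule
    # listed earlier cannot mask a BLOCK rule listed later.
    decision = FraudDecision.ALLOW
    for rule in FRAUD_RULES:
        if evaluate_rule(rule, transaction, user_context):
            decision = max(decision, FraudDecision[rule['action']])
            if decision == FraudDecision.BLOCK:
                break  # nothing is more severe than BLOCK
    return decision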
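
The rule engine relies on an evaluate_rule function that is not shown. As one example of what a single rule might compute, here is a sketch of the impossible_travel check using the haversine great-circle distance. The event tuple layout, the function names, and the 50 km noise floor are assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def is_impossible_travel(prev, curr, threshold_km_per_hour=900, min_distance_km=50):
    """prev and curr are (lat, lon, unix_seconds) tuples for consecutive purchases."""
    distance = haversine_km(prev[0], prev[1], curr[0], curr[1])
    if distance < min_distance_km:
        return False  # ignore geolocation noise within the same area
    hours = (curr[2] - prev[2]) / 3600
    if hours <= 0:
        return True  # far apart at the same instant
    return distance / hours > threshold_km_per_hour

nyc = (40.7128, -74.0060)
london = (51.5074, -0.1278)
ten_minutes_apart = is_impossible_travel((*nyc, 0), (*london, 600))
eight_hours_apart = is_impossible_travel((*nyc, 0), (*london, 8 * 3600))
```

New York to London is roughly 5,570 km, so ten minutes between purchases implies a speed far above the 900 km/h threshold, while eight hours is consistent with a commercial flight.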

ML Scoring Pipeline (Asynchronous)

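ML models provide a fraud probability score (0.0 to 1.0) but may take 100-500ms. Run them asynchronously for most transactions and synchronously only for high-value amounts. An asynchronous score arrives after the authorization decision, so a high score is acted on afterwards, for example by voiding the authorization or holding the capture for review.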

Features: transaction amount, merchant category, time of day, device fingerprint, velocity signals, user historical patterns
Score thresholds: < 0.3 allow, 0.3-0.7 additional verification (3DS), > 0.7 block
Model: gradient boosting (XGBoost/LightGBM) for tabular features, updated weekly on labeled chargebacks
Feedback loop: chargebacks label transactions as fraud post-hoc, retrain on labeled data
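
The score thresholds listed above translate directly into a small decision function. This is a sketch: the function name and return labels are hypothetical, and scores exactly at 0.3 or 0.7 are assigned to the middle band, following the ranges as written.

```python
def decide_from_score(score, allow_below=0.3, block_above=0.7):
    """Map a fraud probability in [0.0, 1.0] to an action."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score < allow_below:
        return "ALLOW"
    if score > block_above:
        return "BLOCK"
    # 0.3 to 0.7 inclusive: ask for additional verification such as 3DS.
    return "VERIFY_3DS"

decisions = [decide_from_score(s) for s in (0.1, 0.3, 0.5, 0.7, 0.9)]
# decisions == ["ALLOW", "VERIFY_3DS", "VERIFY_3DS", "VERIFY_3DS", "BLOCK"]
```

Keeping the thresholds as parameters makes it straightforward to tune them per merchant category or as the model is retrained.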

Reconciliation

Processors send daily settlement files. Your records and theirs must match to the cent – discrepancies indicate lost transactions, processor errors, or fraud.

from datetime import timedelta

def reconcile_daily_settlement(processor_file_date, settlement_lag_days=1):
    # Load processor settlement file (CSV/SFTP)
    processor_txns = load_settlement_file(processor_file_date)
    # Settlement lags capture (T+1 or T+2), so match the file against
    # transactions captured settlement_lag_days earlier. The lag is an
    # assumption here; confirm it against your processor's settlement schedule.
    capture_date = processor_file_date - timedelta(days=settlement_lag_days)
    our_txns = db.query(
        "SELECT processor_charge_id, amount, status FROM payments WHERE captured_at::date = %s",
        (capture_date,)
    )

    processor_map = {t.charge_id: t for t in processor_txns}
    our_map = {t.processor_charge_id: t for t in our_txns}

    discrepancies = []
    for charge_id, our_txn in our_map.items():
        if charge_id not in processor_map:
            discrepancies.append({"type": "MISSING_FROM_PROCESSOR", "id": charge_id})
        # Amounts must be Decimal or integer minor units on both sides, never
        # float; compare gross amounts if the file also reports fees.
        elif processor_map[charge_id].amount != our_txn.amount:
            discrepancies.append({"type": "AMOUNT_MISMATCH", "id": charge_id,
                                  "ours": our_txn.amount,
                                  "theirs": processor_map[charge_id].amount})
    for charge_id in processor_map:
        if charge_id not in our_map:
            discrepancies.append({"type": "UNEXPECTED_IN_PROCESSOR", "id": charge_id})

    # Only page the finance team when there is something to investigate
    if discrepancies:
        alert_finance_team(discrepancies)
    return discrepancies
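
The comparison at the heart of reconciliation can be isolated as a pure function over two mappings, which makes each of the three discrepancy types easy to test without a database or an SFTP drop. The sketch below assumes both sides have already been reduced to charge ID and Decimal amount; the function name diff_ledgers and the sample charge IDs are made up for illustration.

```python
from decimal import Decimal

def diff_ledgers(ours, theirs):
    """Compare two {charge_id: Decimal amount} mappings and list discrepancies."""
    discrepancies = []
    for charge_id, amount in ours.items():
        if charge_id not in theirs:
            discrepancies.append({"type": "MISSING_FROM_PROCESSOR", "id": charge_id})
        elif theirs[charge_id] != amount:
            discrepancies.append({"type": "AMOUNT_MISMATCH", "id": charge_id,
                                  "ours": amount, "theirs": theirs[charge_id]})
    for charge_id in theirs:
        if charge_id not in ours:
            discrepancies.append({"type": "UNEXPECTED_IN_PROCESSOR", "id": charge_id})
    return discrepancies

ours = {"ch_1": Decimal("42.00"), "ch_2": Decimal("10.00"), "ch_3": Decimal("5.50")}
theirs = {"ch_1": Decimal("42.00"), "ch_2": Decimal("10.01"), "ch_4": Decimal("7.00")}
result = diff_ledgers(ours, theirs)
found_types = [d["type"] for d in result]
# found_types == ["AMOUNT_MISMATCH", "MISSING_FROM_PROCESSOR", "UNEXPECTED_IN_PROCESSOR"]
```

Decimal is used because matching to the cent is unreliable with binary floats; integer minor units (cents) would work equally well.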

Architecture Diagram (Key Components)

Client
  |
  v
Payment API Service
  |-- Idempotency Layer (Redis + DB)
  |-- Fraud Engine (rules sync, ML async)
  |-- Payment State Machine (Postgres)
  |-- Processor Gateway (Stripe/Adyen SDK)
  |
  v
Message Queue (Kafka)
  |-- Settlement Consumer -> Reconciliation DB
  |-- Fraud Event Consumer -> ML Feature Store
  |-- Audit Event Consumer -> Audit Log (append-only)

Interview Talking Points

Lead with idempotency – it is the single most important concept in payment systems
Distinguish the three phases; know that authorization != money movement
Mention PCI DSS – you never store raw card data, you tokenize immediately
Reconciliation is often forgotten – bring it up proactively
For fraud: rule engine is fast/synchronous, ML is async/eventual – know the tradeoff
Exponential backoff with jitter prevents thundering herd on processor retries

Frequently Asked Questions

What is an idempotency key and why is it critical for payment systems?

An idempotency key is a unique identifier (typically a UUID) that the client generates and sends with each payment request. The server stores the key and its result. On retry, instead of re-processing the payment, it returns the stored result. This prevents double charges when network failures cause the client to retry a request that may have already succeeded on the server. Without idempotency keys, any network timeout in a payment system risks charging the customer twice.

What is the difference between authorization and capture in payment processing?

Authorization reserves funds on the cardholder’s account: it verifies that the card is valid and the funds are available, but does not move money. Capture is the instruction to actually collect those reserved funds. An authorization can be voided before capture, which releases the hold without moving any funds; capture initiates the settlement process. Hotels and car rentals commonly authorize at booking and capture the final amount at checkout. Some merchants authorize and capture in a single step (auth-capture), while others separate them for flexibility.

How do you prevent double charges in a distributed payment system?

Use multiple layers: (1) Idempotency keys stored in the database with a unique constraint: the first request claims the key, retries get the cached result. (2) A partial unique index on order_id restricted to successful payment states, so the DB rejects a second successful charge for the same order. (3) Optimistic locking for state transitions: update with a version check so only one concurrent process can advance the payment state. Defense in depth is required because any single layer can fail.

How should you retry failed calls to an external payment processor?

Use exponential backoff with jitter: start at 500ms, double the delay on each retry, cap it at 30 seconds, and add up to 10% random jitter to prevent a thundering herd. Always use the same idempotency key on retries so the processor treats them as the same request. Never retry hard declines (insufficient funds, stolen card) or other permanent errors such as invalid card data; the answer will not change. For timeouts, check the transaction status before retrying, because the operation may have succeeded. Limit retries to 3-4 total attempts, then fail and alert.

What is payment reconciliation and why does it matter?

Reconciliation is the daily process of matching your internal transaction records against the settlement files sent by the payment processor. Discrepancies reveal lost transactions, processor errors, or fraud. You compare each captured transaction in your DB against the processor’s file by charge ID and amount. Missing entries, amount mismatches, or unexpected processor entries all require investigation. In regulated industries, reconciliation is a compliance requirement – unreconciled discrepancies must be resolved and reported.