Payment Refund System Low-Level Design: Double-Refund Prevention, Processor Integration, and Async Processing

Payment Refund System: Low-Level Design

A payment refund system reverses a completed charge — partially or fully — and ensures the customer’s account is credited while the merchant’s revenue ledger is accurately adjusted. The critical design challenges are preventing double refunds, handling partial refunds correctly (multiple partial refunds on one payment), coordinating with payment processors (Stripe, Braintree), and providing a complete audit trail for disputes and chargebacks.

Core Data Model

CREATE TABLE Payment (
    payment_id         BIGSERIAL PRIMARY KEY,
    order_id           BIGINT NOT NULL,
    customer_id        BIGINT NOT NULL,
    amount_cents       INT NOT NULL,
    currency           CHAR(3) NOT NULL DEFAULT 'USD',
    status             VARCHAR(20) NOT NULL,  -- succeeded, refunded, partially_refunded
    processor          VARCHAR(30) NOT NULL,  -- stripe, braintree, paypal
    processor_charge_id VARCHAR(200) NOT NULL UNIQUE,  -- e.g. ch_3NxvXY2eZvKYlo2C
    captured_at        TIMESTAMPTZ NOT NULL,
    refundable_cents   INT NOT NULL,          -- amount - sum(refunds); decremented on each refund
    created_at         TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE Refund (
    refund_id          BIGSERIAL PRIMARY KEY,
    payment_id         BIGINT NOT NULL REFERENCES Payment(payment_id),
    customer_id        BIGINT NOT NULL,
    requested_by       BIGINT NOT NULL,       -- user_id of agent or customer
    amount_cents       INT NOT NULL,
    currency           CHAR(3) NOT NULL DEFAULT 'USD',
    reason             VARCHAR(50) NOT NULL,  -- customer_request, fraud, defective, duplicate
    status             VARCHAR(20) NOT NULL DEFAULT 'pending',
        -- pending, processing, succeeded, failed
    processor_refund_id VARCHAR(200) UNIQUE,  -- stripe: re_3Nxv...; set on success
    idempotency_key    VARCHAR(200) UNIQUE NOT NULL,
    notes              TEXT,
    created_at         TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    processed_at       TIMESTAMPTZ,
    failed_reason      TEXT
);

CREATE TABLE RefundAuditEvent (
    event_id           BIGSERIAL PRIMARY KEY,
    refund_id          BIGINT NOT NULL REFERENCES Refund(refund_id),
    event_type         VARCHAR(50) NOT NULL,  -- created, processor_submitted, succeeded, failed
    actor_id           BIGINT,
    metadata           JSONB,
    created_at         TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX ON Refund(payment_id, status);
CREATE INDEX ON Payment(processor_charge_id);

Refund Processing

import uuid, stripe

stripe.api_key = STRIPE_SECRET_KEY

def request_refund(payment_id: int, amount_cents: int, reason: str,
                   requested_by: int, notes: str = None) -> int:
    """
    Initiate a refund. Validates amount against remaining refundable balance.
    Uses idempotency key to prevent duplicate refund submissions.
    Returns refund_id.
    """
    # Lock the payment row to prevent concurrent over-refund
    payment = db.fetchone("""
        SELECT * FROM Payment WHERE payment_id=%s FOR UPDATE
    """, (payment_id,))

    if not payment:
        raise PaymentNotFoundError(f"Payment {payment_id} not found")
    if payment['status'] not in ('succeeded', 'partially_refunded'):
        raise RefundNotAllowedError(f"Cannot refund payment in status: {payment['status']}")
    if amount_cents  payment['refundable_cents']:
        raise InvalidRefundAmountError(
            f"Refund amount {amount_cents} exceeds refundable balance {payment['refundable_cents']}"
        )

    idem_key = f"refund:{payment_id}:{amount_cents}:{requested_by}:{str(uuid.uuid4())[:8]}"

    refund_id = db.fetchone("""
        INSERT INTO Refund
            (payment_id, customer_id, requested_by, amount_cents, currency,
             reason, idempotency_key, notes)
        VALUES (%s,%s,%s,%s,%s,%s,%s,%s)
        RETURNING refund_id
    """, (payment_id, payment['customer_id'], requested_by, amount_cents,
          payment['currency'], reason, idem_key, notes))['refund_id']

    # Reserve the refund amount immediately (prevent concurrent over-refund)
    db.execute("""
        UPDATE Payment
        SET refundable_cents = refundable_cents - %s,
            status = CASE WHEN refundable_cents - %s = 0 THEN 'refunded'
                          ELSE 'partially_refunded' END
        WHERE payment_id=%s
    """, (amount_cents, amount_cents, payment_id))

    _audit(refund_id, 'created', requested_by, {'amount_cents': amount_cents})

    # Process asynchronously via job queue for resilience
    enqueue('process_refund', {'refund_id': refund_id}, queue_name='payments',
            idempotency_key=f"process:{refund_id}")

    return refund_id

def process_refund(refund_id: int):
    """
    Job handler: submit refund to payment processor.
    Idempotent: if processor_refund_id already set, the refund was already submitted.
    """
    refund = db.fetchone("""
        SELECT r.*, p.processor, p.processor_charge_id
        FROM Refund r JOIN Payment p USING(payment_id)
        WHERE r.refund_id=%s
    """, (refund_id,))

    if refund['status'] == 'succeeded':
        return  # Already processed (idempotency)

    db.execute(
        "UPDATE Refund SET status='processing' WHERE refund_id=%s", (refund_id,)
    )
    _audit(refund_id, 'processor_submitted', None, {})

    try:
        if refund['processor'] == 'stripe':
            result = stripe.Refund.create(
                charge=refund['processor_charge_id'],
                amount=refund['amount_cents'],
                reason=_map_reason_to_stripe(refund['reason']),
                idempotency_key=refund['idempotency_key'],
            )
            processor_refund_id = result['id']
        else:
            raise NotImplementedError(f"Processor {refund['processor']} not supported")

        db.execute("""
            UPDATE Refund
            SET status='succeeded', processor_refund_id=%s, processed_at=NOW()
            WHERE refund_id=%s
        """, (processor_refund_id, refund_id))
        _audit(refund_id, 'succeeded', None, {'processor_refund_id': processor_refund_id})

    except stripe.error.StripeError as e:
        _handle_processor_failure(refund_id, refund['payment_id'], refund['amount_cents'], str(e))
        raise  # Re-raise so job queue retries

def _handle_processor_failure(refund_id: int, payment_id: int,
                               amount_cents: int, error: str):
    """On permanent failure, restore the refundable balance."""
    db.execute("""
        UPDATE Refund SET status='failed', failed_reason=%s WHERE refund_id=%s
    """, (error[:500], refund_id))

    # Restore refundable amount so the agent can retry
    db.execute("""
        UPDATE Payment
        SET refundable_cents = refundable_cents + %s,
            status = CASE WHEN status='refunded' THEN 'succeeded' ELSE status END
        WHERE payment_id=%s
    """, (amount_cents, payment_id))
    _audit(refund_id, 'failed', None, {'error': error})

def _map_reason_to_stripe(reason: str) -> str:
    return {'fraud': 'fraudulent', 'duplicate': 'duplicate', 'customer_request': 'requested_by_customer'}.get(reason, 'requested_by_customer')

def _audit(refund_id, event_type, actor_id, metadata):
    import json
    db.execute("""
        INSERT INTO RefundAuditEvent (refund_id, event_type, actor_id, metadata)
        VALUES (%s,%s,%s,%s)
    """, (refund_id, event_type, actor_id, json.dumps(metadata)))

Key Design Decisions

  • FOR UPDATE + immediate refundable_cents decrement prevents double refunds: the payment row is locked before the refund row is created. Two concurrent refund requests for the same payment: the first decrements refundable_cents; the second reads the updated value and fails the amount validation. No distributed locks or application-level checks needed — the database enforces it atomically.
  • Reserve refundable amount before calling the processor: decrement refundable_cents at request time, not after the processor confirms. If the processor call is made first and the server crashes before recording the response, a retry would submit a second refund. By reserving first and restoring on permanent failure, there is no window for accidental double submission.
  • Processor idempotency key: Stripe accepts an Idempotency-Key header — if the same key is resubmitted (due to retry), Stripe returns the same response without processing a second refund. The refund’s idempotency_key column maps directly to Stripe’s header.
  • Async processing via job queue: processor calls are slow (200–800ms) and can fail. Running them in a job queue with retry logic ensures refunds are eventually processed without blocking the user-facing API and without losing the request if the server crashes.

Payment refund and transaction reversal system design is discussed in Stripe system design interview questions.

Payment refund and merchant order management design is covered in Shopify system design interview preparation.

Payment refund and booking cancellation system design is discussed in Airbnb system design interview guide.

Scroll to Top