Low Level Design: Returns and Refunds Service

Overview

A returns and refunds service handles the reverse logistics of e-commerce: accepting return requests, authorizing them against policy, routing physical goods back to the warehouse, restocking eligible inventory, calculating refund amounts, issuing credits or reversals through the payment gateway, and reconciling the financial ledger. The service must be correct (no double refunds, no lost credits), auditable (every dollar traceable), and fraud-resistant (not exploitable by bad actors filing false claims).

Data Model

-- One return request per return event (a customer may return multiple times per order)
CREATE TABLE return_requests (
    return_id       BIGINT         PRIMARY KEY AUTO_INCREMENT,
    order_id        BIGINT         NOT NULL,
    customer_id     BIGINT         NOT NULL,
    status          ENUM('requested','authorized','in_transit','received','inspected','refund_pending','refunded','rejected','closed') NOT NULL DEFAULT 'requested',
    reason_code     VARCHAR(64)    NOT NULL,
    reason_note     TEXT,
    rma_number      VARCHAR(64)    UNIQUE,          -- Return Merchandise Authorization
    label_url       TEXT,                           -- pre-paid shipping label
    fraud_score     TINYINT UNSIGNED,               -- 0-100, higher = more suspicious
    requested_at    DATETIME       NOT NULL DEFAULT CURRENT_TIMESTAMP,
    authorized_at   DATETIME,
    received_at     DATETIME,
    closed_at       DATETIME,
    updated_at      DATETIME       NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    INDEX idx_order      (order_id),
    INDEX idx_customer   (customer_id),
    INDEX idx_status     (status),
    INDEX idx_rma        (rma_number)
);

-- Line items being returned (subset of the original order items)
CREATE TABLE return_items (
    ri_id           BIGINT        PRIMARY KEY AUTO_INCREMENT,
    return_id       BIGINT        NOT NULL,
    order_item_id   BIGINT        NOT NULL,
    qty_returned    INT           NOT NULL,
    condition_code  ENUM('new','like_new','damaged','unsellable') DEFAULT NULL,  -- set after inspection
    restock_qty     INT           NOT NULL DEFAULT 0,
    FOREIGN KEY (return_id) REFERENCES return_requests(return_id),
    INDEX idx_return (return_id)
);

-- Financial outcome of each return
CREATE TABLE refunds (
    refund_id           BIGINT         PRIMARY KEY AUTO_INCREMENT,
    return_id           BIGINT         NOT NULL UNIQUE,   -- one refund record per return
    gross_item_amount   NUMERIC(12,2)  NOT NULL,          -- sum of returned line totals
    restocking_fee      NUMERIC(12,2)  NOT NULL DEFAULT 0,
    shipping_refund     NUMERIC(12,2)  NOT NULL DEFAULT 0,
    net_refund_amount   NUMERIC(12,2)  NOT NULL,          -- gross - fee + shipping_refund
    refund_method       ENUM('original_payment','store_credit','check') NOT NULL,
    payment_gateway_ref VARCHAR(128),                     -- gateway transaction ID
    status              ENUM('pending','processing','completed','failed') NOT NULL DEFAULT 'pending',
    issued_at           DATETIME,
    FOREIGN KEY (return_id) REFERENCES return_requests(return_id)
);

-- Append-only ledger: every financial movement
CREATE TABLE refund_ledger (
    ledger_id     BIGINT        PRIMARY KEY AUTO_INCREMENT,
    refund_id     BIGINT        NOT NULL,
    entry_type    ENUM('debit','credit') NOT NULL,
    amount        NUMERIC(12,2) NOT NULL,
    description   VARCHAR(255)  NOT NULL,
    created_at    DATETIME      NOT NULL DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (refund_id) REFERENCES refunds(refund_id)
);

-- Fraud signal history per customer
CREATE TABLE return_fraud_signals (
    signal_id    BIGINT       PRIMARY KEY AUTO_INCREMENT,
    customer_id  BIGINT       NOT NULL,
    return_id    BIGINT       NOT NULL,
    signal_type  VARCHAR(64)  NOT NULL,   -- e.g., high_return_rate, wardrobing, fake_damage
    score_delta  TINYINT      NOT NULL,
    created_at   DATETIME     NOT NULL DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_customer (customer_id)
);

Return Authorization Workflow

  1. Eligibility check: verify the order is in a returnable state (fulfilled or partially_fulfilled), the return window has not expired (configurable per product category), and the requested quantity does not exceed the shipped quantity minus any prior returns for the same item.
  2. Fraud scoring: compute a fraud_score (0-100) based on signals: customer lifetime return rate, days since delivery, number of open returns, item category risk, and whether a physical label is actually used. Score >= 80 triggers manual review queue; the return is held at requested status.
  3. Authorization: if score < 80 and policy checks pass, set status to authorized, generate an RMA number, and issue a pre-paid return shipping label via the carrier API. Write all of this in one DB transaction.

Restocking Logic

When the warehouse receives the physical return it performs an inspection and records condition_code on each return_item. The restocking decision table:

condition_code = new        --> restock_qty = qty_returned  (full restock)
condition_code = like_new   --> restock_qty = qty_returned  (restock, may discount)
condition_code = damaged     --> restock_qty = 0            (quarantine, file vendor claim)
condition_code = unsellable  --> restock_qty = 0            (dispose or liquidate)

For each row where restock_qty > 0, the service calls the IMS receiveStock endpoint to increment on_hand. This is done item by item so partial failures do not block the entire return.

Refund Calculation

gross_item_amount  = SUM(order_item.unit_price * return_item.qty_returned)
restocking_fee     = gross_item_amount * policy.restocking_fee_pct   -- e.g. 0.15 if item was opened
shipping_refund    = original_order.shipping_cost  IF all items returned ELSE 0
net_refund_amount  = gross_item_amount - restocking_fee + shipping_refund

Partial refunds arise when only some items in an order are returned, when a restocking fee applies, or when the return is a goodwill credit (e.g., item arrived damaged: full refund including original shipping).

Partial Refund Flow

  1. Compute net_refund_amount.
  2. Write a refunds row with status pending and the computed amounts (single transaction also inserting ledger credit entry).
  3. Call payment gateway refund API with the original payment_gateway_ref from the order and the partial amount. Most gateways support partial refunds natively; the OMS stored the charge ID at payment capture time.
  4. On gateway success: update refunds.status to completed, store the gateway transaction reference, insert a ledger debit entry to balance the credit, and transition the return to refunded.
  5. On gateway failure: mark refunds.status as failed, enqueue a retry job with exponential backoff, and alert the ops team after N consecutive failures.

Fraud Detection

Fraud signals are evaluated at authorization time and again at inspection time:

  • High return rate: customer's return count / order count over 90 days > threshold (e.g., 40%).
  • Wardrobing: high-value item returned just before the window closes, condition is "like new", pattern matches previous returns of the same SKU by this customer.
  • Label not scanned: RMA issued but carrier never records a scan within N days — likely a claim without actual shipment.
  • Damage inconsistency: customer claims damage but inspection finds item in new condition — potential false claim.

Each detected signal inserts a row into return_fraud_signals. Cumulative score across recent returns feeds into the next authorization decision. Customers above a lifetime threshold are flagged for manual review on all future returns.

Key Design Decisions and Trade-offs

  • Separate refund row from return request row: a return can exist without a refund (e.g., exchange-only returns, rejected returns). Keeping the financial record in a dedicated table with its own ledger makes accounting queries straightforward and avoids NULLs in the return table.
  • Append-only ledger: refund_ledger is never updated. Every financial event (credit for refund amount, debit after gateway confirmation, debit for restocking fee) is a new row. Running SUM over the ledger always reconciles to the current state — no need to trust a mutable balance column.
  • Fraud score at authorization, not only at receipt: early scoring allows the service to deny fraudulent requests before issuing a label and before the customer has any expectation of a refund. Later re-scoring at inspection catches fraud that can only be detected physically.
  • Store credit vs. original payment method: store credit bypasses the payment gateway entirely — it is a ledger entry in the customer wallet service. This is faster and has no gateway failure risk. Offering store credit at a slight premium (e.g., 105% of net refund) reduces gateway fees and retains revenue.

Failure Handling and Edge Cases

  • Gateway partial refund rejection: some gateways reject partial refunds if the charge is older than a threshold (e.g., 180 days). The service falls back to store credit automatically and notifies the customer.
  • Return received but no matching RMA: the WMS may receive an unlabeled or misrouted package. A manual review queue handles these; if resolved, the return is retroactively linked and the normal flow resumes.
  • Double refund prevention: the UNIQUE constraint on refunds.return_id prevents inserting two refund records for the same return. The gateway call is wrapped in an idempotency key so retrying after a timeout does not issue a second charge reversal.
  • Inspection updates after refund issued: if the inspection finds the item unsellable after the refund was already processed, no financial reversal is issued (the customer already received credit); instead, a vendor claim or write-off entry is appended to the ledger as a debit against cost of goods.
  • Return window expiry race: the authorization check and the window expiry check must be atomic. A row-level lock on the return request (SELECT FOR UPDATE) prevents a concurrent request from slipping through after another request already consumed the last valid authorization slot.

Scalability Considerations

  • Async inspection processing: warehouse inspection events arrive as webhooks; they are written to a queue and processed asynchronously so the WMS HTTP call returns immediately. The return status transitions happen in a consumer process.
  • Ledger partitioning: partition refund_ledger by month. Accounting reconciliation queries always target a bounded date range so partition pruning keeps them fast even as the table grows to hundreds of millions of rows.
  • Fraud model refresh: fraud scoring uses pre-computed customer-level aggregates (return rate, lifetime score) stored in a materialized table refreshed hourly by a batch job. Real-time scoring reads the materialized table rather than aggregating raw event data on every authorization request.
  • Return volume spikes: post-holiday return spikes can be 10x normal volume. The authorization path is stateless and horizontally scalable; the bottleneck is the payment gateway. Implement a token bucket rate limiter per gateway credential and queue excess refund processing jobs for off-peak execution.

Summary

A returns and refunds service is a correctness-critical financial system layered on top of reverse logistics. The data model separates commercial returns (return_requests, return_items) from financial outcomes (refunds, refund_ledger) to keep each concern independently queryable and auditable. The authorization workflow gates on eligibility, fraud score, and policy rules before any physical or financial commitment is made. Restocking feeds directly into inventory via the IMS. Refund calculation handles partial returns, restocking fees, and shipping credits via a deterministic formula. An append-only ledger ensures that every dollar in and out is traceable, and idempotency keys at the gateway boundary prevent double refunds under retry conditions.

See also: Scale AI Interview Guide 2026: Data Infrastructure, RLHF Pipelines, and ML Engineering

See also: Shopify Interview Guide

See also: Airbnb Interview Guide 2026: Search Systems, Trust and Safety, and Full-Stack Engineering

Scroll to Top