Fulfillment Service Low-Level Design: Order Splitting, Warehouse Routing, and SLA Tracking

Requirements and Constraints

A fulfillment service sits between order management and warehouse execution. It receives orders, decides which warehouse(s) should fulfill them, splits multi-item orders across facilities when necessary, tracks SLA deadlines, and surfaces real-time fulfillment status to upstream consumers. Functional requirements: single- and multi-warehouse order routing, line-level splitting, SLA-aware prioritization, and status webhooks. Non-functional requirements: routing decisions in under 200 ms, throughput of 5,000 orders per minute at peak (roughly 83 orders per second), at-least-once delivery of status events, and warehouse onboarding without code changes.

Core Data Model

fulfillment_orders(fulfillment_order_id PK, source_order_id, customer_id, status ENUM('new','routed','in_pick','packed','shipped','cancelled'), sla_deadline, created_at)
fulfillment_lines(line_id PK, fulfillment_order_id FK, sku_id, qty_ordered, qty_allocated, warehouse_id FK, shipment_id, status)
warehouses(warehouse_id PK, region, timezone, active, max_daily_capacity, cutoff_time)
warehouse_inventory_snapshot(warehouse_id, sku_id, qty_available, updated_at, PK(warehouse_id, sku_id)): denormalized cache refreshed via inventory events
sla_rules(rule_id PK, shipping_method, carrier, cutoff_offset_minutes, priority)
fulfillment_events(event_id PK, fulfillment_order_id, event_type, payload JSONB, occurred_at, delivered_at): outbox for webhooks; delivered_at stays NULL until the relay receives a 2xx response

Order Splitting Logic

When an order arrives, the router evaluates each line against the warehouse inventory snapshot. The goal is to minimize the number of shipments (consolidation) while respecting stock availability. The algorithm is a greedy set-cover heuristic that treats each warehouse as the set of lines it can fully fill:

For each candidate warehouse, compute a coverage score: the fraction of the still-unassigned lines it can fully satisfy from its snapshot stock.
Select the warehouse with the highest coverage score and assign all of those lines to it.
Repeat with the remaining unassigned lines and the remaining warehouses.
If a line cannot be satisfied by any single warehouse, flag it as a backorder or allow partial fulfillment based on order policy.

Splits are capped at a configurable max_shipments_per_order (typically 2 or 3) to control customer experience and shipping cost. Lines that would push the order past this cap are held until consolidation becomes possible or backorder rules trigger.
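The four steps above can be sketched as a greedy set-cover loop. This is a minimal Python sketch, not the service's actual router: it assumes lines arrive as a hypothetical `line_id -> (sku_id, qty_ordered)` map, the snapshot as `warehouse_id -> {sku_id: qty_available}`, one shipment per warehouse, and alphabetical tie-breaking between equally good warehouses. Lines it cannot place are returned in `held` for the backorder or partial-fulfillment policy to decide.

```python
from dataclasses import dataclass, field


@dataclass
class SplitResult:
    # warehouse_id -> line_ids assigned to it (one shipment per warehouse)
    assignments: dict[str, list[str]] = field(default_factory=dict)
    # line_ids left for the backorder / partial-fulfillment policy
    held: list[str] = field(default_factory=list)


def _coverable(stock: dict[str, int], line_ids: list[str],
               lines: dict[str, tuple[str, int]]) -> list[str]:
    """Lines this warehouse can fully fill, drawing down a scratch copy of
    its stock so two lines for the same SKU are not double-counted."""
    remaining = dict(stock)
    covered = []
    for lid in line_ids:
        sku, qty = lines[lid]
        if remaining.get(sku, 0) >= qty:
            remaining[sku] -= qty
            covered.append(lid)
    return covered


def split_order(lines: dict[str, tuple[str, int]],
                snapshot: dict[str, dict[str, int]],
                max_shipments_per_order: int = 3) -> SplitResult:
    """Greedy set cover: repeatedly pick the warehouse that covers the most
    still-unassigned lines, using at most max_shipments_per_order warehouses."""
    result = SplitResult()
    unassigned = sorted(lines)
    while unassigned and len(result.assignments) < max_shipments_per_order:
        best_wh, best_cover = None, []
        # Every warehouse is scored against the same unassigned set, so comparing
        # raw counts ranks warehouses the same way as comparing coverage fractions.
        for wh in sorted(snapshot):  # sorted() gives deterministic tie-breaking
            if wh in result.assignments:
                continue
            cover = _coverable(snapshot[wh], unassigned, lines)
            if len(cover) > len(best_cover):
                best_wh, best_cover = wh, cover
        if best_wh is None:
            break  # no remaining warehouse can fully fill any remaining line
        result.assignments[best_wh] = best_cover
        unassigned = [lid for lid in unassigned if lid not in best_cover]
    result.held = unassigned
    return result
```

For example, with lines L1 (1 × A), L2 (2 × B), L3 (1 × C), L4 (10 × A) and stock WH1 {A: 5, B: 1}, WH2 {B: 3, C: 1}, WH3 {C: 4}, WH2 covers two lines and is chosen first, WH1 then takes L1, and L4 is held because no single warehouse has 10 units of A. With `max_shipments_per_order=1`, L1 would be held as well.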

SLA Tracking and Prioritization

Each order receives an SLA deadline computed from the shipping method, the warehouse cutoff time, and the current time. A scheduled job runs every minute and queries for orders whose sla_deadline falls within a configurable warning window and whose status is neither 'shipped' nor 'cancelled'. These orders are escalated: their pick tasks are promoted to high priority in the WMS task queue, and an alert fires if the deadline is breached.
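The text does not pin down how the cutoff and the shipping method combine, so the following sketch makes one explicit assumption: the deadline is the next occurrence of the warehouse's local cutoff_time, shifted by the matching sla_rules row's cutoff_offset_minutes, with the offset assumed to be under one day in magnitude. The function name and signature are hypothetical.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo


def compute_sla_deadline(now: datetime, cutoff_time: time, warehouse_tz: tzinfo,
                         cutoff_offset_minutes: int) -> datetime:
    """Return the next effective cutoff after `now` as an aware UTC datetime.

    Assumption: effective cutoff = warehouse-local cutoff_time shifted by the
    sla_rules offset, and |cutoff_offset_minutes| is less than one day.
    """
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    local_now = now.astimezone(warehouse_tz)
    offset = timedelta(minutes=cutoff_offset_minutes)
    # Checking today, tomorrow and the day after covers any offset under one day.
    for days_ahead in range(3):
        day = local_now.date() + timedelta(days=days_ahead)
        candidate = datetime.combine(day, cutoff_time, tzinfo=warehouse_tz) + offset
        if candidate > local_now:
            return candidate.astimezone(timezone.utc)
    raise ValueError("cutoff_offset_minutes must be less than one day in magnitude")
```

In the service, `warehouse_tz` would typically be `ZoneInfo(warehouse.timezone)` from the standard-library `zoneinfo` module (Python 3.9+), so daylight-saving transitions are respected; a fixed-offset `timezone` is enough for testing. Returning UTC keeps sla_deadline comparable across warehouses in different time zones.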

Priority lanes are modeled as integer priority fields on pick_tasks. The worker mobile app fetches tasks ordered by priority DESC, so escalated tasks surface automatically without workflow changes. SLA breach events are written to fulfillment_events and delivered via webhook to the OMS for customer communication.
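The minute-by-minute sweep and the priority-ordered fetch might look like this, with SQLite standing in for the production database. The pick_tasks columns other than priority (task_id, fulfillment_order_id, status), the 'open' task status, and the value 100 for the escalated lane are hypothetical, and sla_deadline is assumed to be stored as a uniformly formatted ISO-8601 string so text comparison orders it correctly.

```python
import sqlite3
from datetime import datetime, timedelta

ESCALATED_PRIORITY = 100  # hypothetical value for the high-priority lane


def escalate_at_risk(conn: sqlite3.Connection, now: datetime,
                     warning_window: timedelta) -> int:
    """Promote open pick tasks of at-risk orders; returns the number promoted."""
    horizon = (now + warning_window).isoformat()
    with conn:  # the whole sweep commits as one transaction
        cur = conn.execute(
            """
            UPDATE pick_tasks SET priority = ?
            WHERE status = 'open' AND priority < ?
              AND fulfillment_order_id IN (
                SELECT fulfillment_order_id FROM fulfillment_orders
                WHERE sla_deadline <= ?
                  AND status NOT IN ('shipped', 'cancelled'))
            """,
            (ESCALATED_PRIORITY, ESCALATED_PRIORITY, horizon),
        )
    return cur.rowcount


def breached_orders(conn: sqlite3.Connection, now: datetime) -> list[str]:
    """Orders past their deadline and still unshipped: breach-alert candidates."""
    rows = conn.execute(
        "SELECT fulfillment_order_id FROM fulfillment_orders "
        "WHERE sla_deadline < ? AND status NOT IN ('shipped', 'cancelled') "
        "ORDER BY sla_deadline",
        (now.isoformat(),),
    )
    return [r[0] for r in rows]


def next_tasks(conn: sqlite3.Connection, limit: int = 10) -> list[str]:
    """What the worker app would fetch: highest priority first."""
    rows = conn.execute(
        "SELECT task_id FROM pick_tasks WHERE status = 'open' "
        "ORDER BY priority DESC, task_id LIMIT ?",
        (limit,),
    )
    return [r[0] for r in rows]
```

Because the UPDATE only touches tasks still below the escalated priority, re-running the sweep every minute is idempotent: an order that stays at risk is not re-promoted or counted twice.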

Warehouse Routing Rules

Routing decisions are governed by a rules engine backed by the sla_rules and warehouse configuration tables. Rules are evaluated in priority order and short-circuit on the first match. Criteria include ship-to region (route to the geographically closest warehouse to minimize transit time), inventory availability, daily capacity headroom per warehouse, carrier service availability per warehouse, and hazmat restrictions per SKU. Adding a new warehouse requires only inserting its configuration and capacity records; no code deployment is needed.
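A first-match rules engine can stay entirely data-driven, which is what makes code-free warehouse onboarding possible. In this sketch the `Warehouse`, `Rule`, and `OrderContext` shapes are hypothetical projections of the configuration tables; rules are evaluated highest priority first (an assumption mirroring the pick-task convention), and a rule counts as a match only if it yields at least one warehouse that passes the capacity, carrier, and hazmat checks. Inventory availability is left to the splitting step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Warehouse:  # projection of warehouse configuration (fields partly hypothetical)
    warehouse_id: str
    active: bool
    capacity_headroom: int    # max_daily_capacity minus units already committed today
    carriers: frozenset[str]  # carrier services available at this site
    hazmat_ok: bool


@dataclass(frozen=True)
class Rule:  # one row of routing configuration (hypothetical shape)
    priority: int
    ship_to_region: Optional[str]          # None matches any region
    carrier: Optional[str]                 # None matches any carrier
    preferred_warehouses: tuple[str, ...]  # closest first, maintained as data


@dataclass(frozen=True)
class OrderContext:
    ship_to_region: str
    carrier: str
    units: int
    has_hazmat: bool


def _eligible(wh: Warehouse, order: OrderContext) -> bool:
    return (wh.active
            and wh.capacity_headroom >= order.units
            and order.carrier in wh.carriers
            and (wh.hazmat_ok or not order.has_hazmat))


def route(order: OrderContext, rules: list[Rule],
          warehouses: dict[str, Warehouse]) -> list[str]:
    """Candidate warehouses (closest first) from the first matching rule."""
    for rule in sorted(rules, key=lambda r: r.priority, reverse=True):
        if rule.ship_to_region not in (None, order.ship_to_region):
            continue
        if rule.carrier not in (None, order.carrier):
            continue
        candidates = [w for w in rule.preferred_warehouses
                      if w in warehouses and _eligible(warehouses[w], order)]
        if candidates:
            return candidates  # short-circuit: later rules are never evaluated
    return []
```

Onboarding a warehouse then means adding its configuration row and listing it in some rule's `preferred_warehouses`; `route()` itself does not change.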

Real-Time Status Tracking

Each status change is written in the same database transaction as its event row in the fulfillment_events outbox, so a status update is never committed without its event. A relay worker polls the outbox for undelivered events, calls the registered webhook URLs, and marks an event as delivered on an HTTP 2xx response. Failed deliveries are retried on an escalating backoff schedule (1s, 5s, 30s, 5m) for up to 24 hours. Because delivery is at-least-once, consumers may receive duplicates and should deduplicate on event_id. Consumers can also poll GET /fulfillment-orders/{id} for current state; that endpoint aggregates across all fulfillment lines to compute a rolled-up order status.
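The transactional outbox and its polling relay can be sketched with SQLite. The bookkeeping columns (attempts, next_attempt_at, delivered_at, dead), the `send` callback, and the 'status_changed' event type are hypothetical; `send` stands in for an HTTP POST to a registered webhook and returns its status code. After the four listed delays the sketch keeps retrying every 5 minutes (an assumption), and it marks an event dead once the next attempt would fall more than 24 hours after the event occurred.

```python
import json
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Callable

# Retry schedule from the text; assumption: the last delay repeats after that.
BACKOFF = [timedelta(seconds=1), timedelta(seconds=5),
           timedelta(seconds=30), timedelta(minutes=5)]
MAX_RETRY_AGE = timedelta(hours=24)

SCHEMA = """
CREATE TABLE fulfillment_orders (fulfillment_order_id TEXT PRIMARY KEY, status TEXT NOT NULL);
CREATE TABLE fulfillment_events (
  event_id INTEGER PRIMARY KEY AUTOINCREMENT,
  fulfillment_order_id TEXT NOT NULL,
  event_type TEXT NOT NULL,
  payload TEXT NOT NULL,              -- JSON text (JSONB in the real schema)
  occurred_at TEXT NOT NULL,
  attempts INTEGER NOT NULL DEFAULT 0,
  next_attempt_at TEXT NOT NULL,
  delivered_at TEXT,                  -- NULL until a 2xx response
  dead INTEGER NOT NULL DEFAULT 0     -- 1 once retries give up after 24h
);
"""


def _ts(dt: datetime) -> str:
    # Uniform UTC ISO-8601 strings compare correctly as text.
    return dt.astimezone(timezone.utc).isoformat(timespec="seconds")


def set_status(conn: sqlite3.Connection, order_id: str, status: str, now: datetime) -> None:
    with conn:  # status row and outbox row commit or roll back together
        conn.execute("UPDATE fulfillment_orders SET status = ? WHERE fulfillment_order_id = ?",
                     (status, order_id))
        conn.execute(
            "INSERT INTO fulfillment_events (fulfillment_order_id, event_type, payload,"
            " occurred_at, next_attempt_at) VALUES (?, 'status_changed', ?, ?, ?)",
            (order_id, json.dumps({"status": status}), _ts(now), _ts(now)))


def relay_once(conn: sqlite3.Connection, send: Callable[[dict], int], now: datetime) -> int:
    """One polling pass over due events; returns how many were delivered."""
    due = conn.execute(
        "SELECT event_id, fulfillment_order_id, event_type, payload, occurred_at, attempts"
        " FROM fulfillment_events WHERE delivered_at IS NULL AND dead = 0"
        " AND next_attempt_at <= ? ORDER BY event_id", (_ts(now),)).fetchall()
    delivered = 0
    for event_id, order_id, event_type, payload, occurred_at, attempts in due:
        event = {"event_id": event_id, "fulfillment_order_id": order_id,
                 "event_type": event_type, "payload": json.loads(payload)}
        try:
            ok = 200 <= send(event) < 300
        except OSError:  # connection errors count as a failed attempt
            ok = False
        with conn:
            if ok:
                conn.execute("UPDATE fulfillment_events SET delivered_at = ? WHERE event_id = ?",
                             (_ts(now), event_id))
                delivered += 1
            else:
                retry_at = now + BACKOFF[min(attempts, len(BACKOFF) - 1)]
                give_up = retry_at - datetime.fromisoformat(occurred_at) > MAX_RETRY_AGE
                conn.execute(
                    "UPDATE fulfillment_events SET attempts = attempts + 1,"
                    " next_attempt_at = ?, dead = ? WHERE event_id = ?",
                    (_ts(retry_at), int(give_up), event_id))
    return delivered
```

If the relay crashes after `send` succeeds but before `delivered_at` is written, the event is sent again on the next pass. That duplicate is the at-least-once guarantee working as designed, which is why consumers need to be idempotent per event_id.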

Scalability Considerations

Snapshot staleness: Inventory snapshots may lag by seconds. Optimistic allocation proceeds against the snapshot; the WMS performs final reservation atomically and returns a failure if inventory is insufficient, triggering a re-route.
Horizontal scaling: The router service is stateless; scale horizontally behind a load balancer. Order-level idempotency keys prevent duplicate routing on retry.
Database partitioning: Partition fulfillment_orders by created_at month; archive orders in a terminal state (shipped or cancelled) older than 90 days to cold storage.
Webhook fan-out: Use a queue per webhook consumer rather than synchronous delivery to avoid slow consumers blocking routing throughput.
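The snapshot-staleness and idempotency points above can be combined in one sketch of optimistic allocation. Here a conditional UPDATE on a hypothetical `wms_stock` table stands in for the WMS's atomic reservation, candidates are tried in alphabetical order purely for determinism (in the service the order would come from the splitter and rules engine), and an in-process dict stands in for the shared idempotency store that a stateless router would actually need.

```python
import sqlite3
from typing import Optional

# Stand-in for a shared idempotency store. The router is stateless, so in
# practice this would live in the database, keyed by the order-level key.
_idempotency_store: dict[str, dict[str, Optional[str]]] = {}


def try_reserve(conn: sqlite3.Connection, warehouse_id: str, sku_id: str, qty: int) -> bool:
    """Atomic conditional decrement standing in for the WMS reservation call."""
    with conn:
        cur = conn.execute(
            "UPDATE wms_stock SET qty_on_hand = qty_on_hand - ?"
            " WHERE warehouse_id = ? AND sku_id = ? AND qty_on_hand >= ?",
            (qty, warehouse_id, sku_id, qty))
    return cur.rowcount == 1


def allocate_order(conn: sqlite3.Connection, idempotency_key: str,
                   lines: dict[str, tuple[str, int]],
                   snapshot: dict[str, dict[str, int]]) -> dict[str, Optional[str]]:
    """Map line_id -> reserved warehouse_id, or None if every candidate failed."""
    if idempotency_key in _idempotency_store:
        return _idempotency_store[idempotency_key]  # retry: no second reservation
    result: dict[str, Optional[str]] = {}
    for line_id, (sku_id, qty) in sorted(lines.items()):
        rejected: set[str] = set()
        result[line_id] = None
        while True:
            # Optimistic: trust the (possibly stale) snapshot to pick a candidate...
            candidates = [wh for wh in sorted(snapshot)
                          if wh not in rejected and snapshot[wh].get(sku_id, 0) >= qty]
            if not candidates:
                break
            wh = candidates[0]
            # ...and let the authoritative reservation have the final say.
            if try_reserve(conn, wh, sku_id, qty):
                result[line_id] = wh
                break
            rejected.add(wh)  # snapshot was stale here; re-route to the next candidate
    _idempotency_store[idempotency_key] = result
    return result
```

Each failed reservation removes that warehouse from consideration for the line, so the re-route loop makes at most one attempt per warehouse before giving up.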

API Design

POST /fulfillment-orders — accepts source order payload, runs routing, returns fulfillment_order_id and line assignments
GET /fulfillment-orders/{id} — current status, line-level detail, estimated ship dates
POST /fulfillment-orders/{id}/cancel — cancels lines not yet picked; returns lines that cannot be cancelled
GET /warehouses/{id}/capacity — current daily throughput vs. limit for capacity planning
POST /webhooks — register a callback URL for fulfillment events
GET /sla-report — aggregated on-time vs. breached SLA counts by warehouse and shipping method
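Two small pieces of logic sit behind the read and cancel endpoints: rolling line statuses up into an order status for GET /fulfillment-orders/{id}, and deciding which lines POST /fulfillment-orders/{id}/cancel can still cancel. This sketch assumes line statuses reuse the fulfillment_orders status values, that the order's status is the least-advanced status among its non-cancelled lines (so an order reads 'shipped' only once every live line has shipped), and that a line is cancellable only while 'new' or 'routed'. The function names are hypothetical.

```python
# Assumption: line statuses reuse the fulfillment_orders enum, listed in the
# order a line normally advances through them.
STATUS_ORDER = ["new", "routed", "in_pick", "packed", "shipped"]
CANCELLABLE = {"new", "routed"}  # assumption: picking has not started


def rollup_status(line_statuses: list[str]) -> str:
    """Order status = least-advanced status among non-cancelled lines."""
    if not line_statuses:
        raise ValueError("an order must have at least one line")
    active = [s for s in line_statuses if s != "cancelled"]
    if not active:
        return "cancelled"
    return min(active, key=STATUS_ORDER.index)


def cancel_lines(lines: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Cancel lines not yet picked; return new statuses and rejected line_ids."""
    updated = dict(lines)
    rejected = []
    for line_id, status in sorted(lines.items()):
        if status in CANCELLABLE:
            updated[line_id] = "cancelled"
        elif status != "cancelled":  # already-cancelled lines are a no-op
            rejected.append(line_id)
    return updated, rejected
```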
