Shipping Tracker Low-Level Design: Carrier Integration, Event Normalization, and ETA Prediction

Requirements and Constraints

A shipping tracker aggregates parcel status from multiple carriers (FedEx, UPS, USPS, DHL, regional last-mile), normalizes heterogeneous event formats into a unified model, stores checkpoint history, and predicts estimated delivery time. Functional requirements: ingest carrier webhooks and polling responses, normalize them to canonical event types, store the full checkpoint history per shipment, expose tracking APIs to customers and internal systems, and provide ETA prediction. Non-functional requirements: ingest 50,000 events per minute (roughly 833 per second on average), deliver status updates to end users within 5 seconds of receiving the carrier event, retain 5 years of tracking history, and predict delivery time to within 2 hours of actual delivery for 80% of shipments.

Core Data Model

  • shipments(shipment_id PK, tracking_number, carrier_id FK, service_level, origin_zip, dest_zip, weight_oz, label_created_at, estimated_delivery_date, actual_delivery_at, status ENUM('label_created','in_transit','out_for_delivery','delivered','exception'))
  • carriers(carrier_id PK, name, webhook_secret, polling_enabled, poll_interval_seconds, api_base_url)
  • checkpoints(checkpoint_id PK, shipment_id FK, carrier_event_code, canonical_event_type, location_city, location_state, location_zip, occurred_at, received_at, raw_payload JSONB)
  • eta_predictions(prediction_id PK, shipment_id FK, predicted_delivery_at, confidence_score, model_version, features_snapshot JSONB, created_at)
  • carrier_polling_cursors(carrier_id PK FK, last_polled_at, next_poll_at)

Carrier Integration Architecture

Each carrier integration is a plugin implementing a common interface with two methods: verify_webhook(headers, body), which returns a boolean, and parse_events(payload), which returns a list of RawEvent objects. Webhook endpoints are carrier-specific paths (/webhooks/fedex, /webhooks/ups) that route to the matching plugin. The plugin verifies the HMAC signature against the stored webhook_secret, and only a verified payload is parsed. For carriers that do not support push webhooks, a polling adapter reads carrier_polling_cursors, calls the carrier tracking API for shipments due for a poll, and emits synthetic RawEvent objects that flow through the same normalization pipeline.
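The plugin interface described above could look roughly like the following Python sketch. The carrier, the X-Signature header name, the HMAC-SHA256 hex encoding, and the payload shape are all assumptions made for illustration; real carriers each define their own signing scheme.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class RawEvent:
    tracking_number: str
    carrier_event_code: str
    occurred_at: str  # timestamp string exactly as the carrier sent it
    raw_payload: dict


class CarrierPlugin(Protocol):
    def verify_webhook(self, headers: dict[str, str], body: bytes) -> bool: ...
    def parse_events(self, payload: bytes) -> list[RawEvent]: ...


class ExampleCarrierPlugin:
    """Hypothetical carrier: header name and JSON shape are assumptions."""

    SIGNATURE_HEADER = "X-Signature"

    def __init__(self, webhook_secret: bytes) -> None:
        self._secret = webhook_secret

    def verify_webhook(self, headers: dict[str, str], body: bytes) -> bool:
        sent = headers.get(self.SIGNATURE_HEADER, "")
        expected = hmac.new(self._secret, body, hashlib.sha256).hexdigest()
        # Constant-time comparison avoids leaking how many characters matched.
        return hmac.compare_digest(sent.encode(), expected.encode())

    def parse_events(self, payload: bytes) -> list[RawEvent]:
        doc = json.loads(payload)
        return [
            RawEvent(
                tracking_number=e["tracking_number"],
                carrier_event_code=e["code"],
                occurred_at=e["occurred_at"],
                raw_payload=e,
            )
            for e in doc.get("events", [])
        ]


def handle_webhook(carrier: str, headers: dict[str, str], body: bytes,
                   plugins: dict[str, CarrierPlugin]) -> tuple[int, list[RawEvent]]:
    """Route by carrier path segment, verify the signature, then parse."""
    plugin = plugins.get(carrier)
    if plugin is None:
        return 404, []
    if not plugin.verify_webhook(headers, body):
        return 401, []
    return 200, plugin.parse_events(body)
```

In a real service the parsed events would be published to the raw-events topic rather than returned to the caller; returning them here keeps the sketch self-contained.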

Raw events are published to a partitioned Kafka topic (carrier-events-raw) keyed by tracking_number. This decouples ingestion from processing, provides a durable replay buffer, and allows horizontal scaling of normalization workers.
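To illustrate why keying by tracking_number matters, here is a minimal sketch of key-based partition assignment. It uses CRC32 purely as a stand-in hash (Kafka's default partitioner for keyed records uses murmur2), and the partition count is an arbitrary assumption; the point is only that equal keys always map to the same partition.

```python
import zlib


def partition_for(tracking_number: str, num_partitions: int) -> int:
    """Map a key to a partition. Stand-in hash, not Kafka's murmur2."""
    return zlib.crc32(tracking_number.encode("utf-8")) % num_partitions


# Every event for one tracking number lands on the same partition,
# so a single consumer sees that shipment's events in publish order.
events = ["1Z999AA1", "9400111", "1Z999AA1", "1Z999AA1"]
assignments = [partition_for(t, num_partitions=12) for t in events]
```

Because ordering is only guaranteed within a partition, changing the partition count later remaps keys, which is worth planning for before the topic is created.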

Event Normalization

Each carrier uses different event codes and schemas. A normalization worker consumes from carrier-events-raw and applies carrier-specific mapping tables stored in the database (event_code_mappings: carrier_id, raw_code, canonical_event_type, description). Canonical types are a fixed enum: LABEL_CREATED, PICKED_UP, IN_TRANSIT, OUT_FOR_DELIVERY, DELIVERED, DELIVERY_ATTEMPTED, EXCEPTION, RETURNED_TO_SENDER.

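After mapping, the worker writes a checkpoint row and updates the shipment status. Status transitions are validated against an allowed-transitions matrix to prevent out-of-order events from regressing status (e.g., an IN_TRANSIT event cannot overwrite a DELIVERED status). A late event is still stored as a checkpoint so the history stays complete; only the status update is skipped. Duplicate detection uses a composite unique index on (shipment_id, carrier_event_code, occurred_at), so a redelivered webhook or an overlapping poll result is rejected on insert.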
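The mapping, transition check, and duplicate check can be sketched in memory as follows. The mapping rows, the canonical-type-to-status table, and the transition matrix are illustrative assumptions (the source does not specify them), and a set stands in for the database unique index.

```python
from dataclasses import dataclass, field

# Hypothetical event_code_mappings rows: (carrier_id, raw_code) -> canonical type.
EVENT_CODE_MAPPINGS = {
    ("carrier_a", "PU"): "PICKED_UP",
    ("carrier_a", "AR"): "IN_TRANSIT",
    ("carrier_a", "OD"): "OUT_FOR_DELIVERY",
    ("carrier_a", "DL"): "DELIVERED",
}

# Assumed projection from canonical event type to shipments.status.
STATUS_FOR_EVENT = {
    "LABEL_CREATED": "label_created",
    "PICKED_UP": "in_transit",
    "IN_TRANSIT": "in_transit",
    "OUT_FOR_DELIVERY": "out_for_delivery",
    "DELIVERED": "delivered",
    "DELIVERY_ATTEMPTED": "exception",
    "EXCEPTION": "exception",
    "RETURNED_TO_SENDER": "exception",
}

# Assumed allowed-transitions matrix: current status -> statuses it may move to.
ALLOWED_TRANSITIONS = {
    "label_created": {"in_transit", "out_for_delivery", "delivered", "exception"},
    "in_transit": {"out_for_delivery", "delivered", "exception"},
    "out_for_delivery": {"in_transit", "delivered", "exception"},
    "exception": {"in_transit", "out_for_delivery", "delivered"},
    "delivered": set(),  # terminal in this sketch
}


@dataclass
class Shipment:
    shipment_id: int
    status: str = "label_created"
    checkpoints: list = field(default_factory=list)
    seen_keys: set = field(default_factory=set)  # stands in for the unique index


def apply_event(shipment: Shipment, carrier_id: str, raw_code: str, occurred_at: str) -> str:
    key = (shipment.shipment_id, raw_code, occurred_at)
    if key in shipment.seen_keys:
        return "duplicate"
    canonical = EVENT_CODE_MAPPINGS.get((carrier_id, raw_code))
    if canonical is None:
        return "unmapped"  # a real worker might log or dead-letter this
    # Always record the checkpoint, even if it arrives out of order.
    shipment.seen_keys.add(key)
    shipment.checkpoints.append((canonical, occurred_at))
    new_status = STATUS_FOR_EVENT[canonical]
    if new_status in ALLOWED_TRANSITIONS[shipment.status]:
        shipment.status = new_status
        return "status_updated"
    return "recorded"  # history kept, status left unchanged
```

With these tables, a DELIVERED event followed by a late IN_TRANSIT event leaves the status at delivered while both checkpoints remain in the history.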

ETA Prediction

ETA prediction is a regression model trained on historical shipment data. Features include: carrier, service level, origin-dest zip pair distance bucket, current checkpoint type, days since label created, day of week, and holiday proximity flag. The model outputs a predicted delivery timestamp and a confidence score.

Prediction runs asynchronously after each new checkpoint is written. The prediction service fetches recent checkpoints for the shipment, assembles the feature vector, calls the model serving endpoint (ONNX Runtime or a lightweight HTTP model server), and upserts the latest eta_predictions row. The features_snapshot column stores the exact input vector for auditability and offline model evaluation.
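As a rough illustration of the feature-assembly step, the sketch below builds the feature set listed earlier and serializes it for features_snapshot. The bucket edges, the holiday window, and the function names are hypothetical, and the call to the model server is deliberately left out.

```python
import json
from datetime import date, datetime, timezone

# Hypothetical bucket edges in miles; real edges would come from training data.
DISTANCE_BUCKET_EDGES_MILES = (50, 150, 300, 600, 1000, 1800)


def distance_bucket(miles: float) -> int:
    """Index of the bucket: the number of edges the distance exceeds."""
    return sum(miles > edge for edge in DISTANCE_BUCKET_EDGES_MILES)


def near_holiday(day: date, holidays: set[date], window_days: int = 3) -> bool:
    """True if any holiday falls within window_days of the given day."""
    return any(abs((day - h).days) <= window_days for h in holidays)


def build_features(carrier: str, service_level: str, distance_miles: float,
                   current_checkpoint_type: str, label_created_at: datetime,
                   now: datetime, holidays: set[date]) -> dict:
    return {
        "carrier": carrier,
        "service_level": service_level,
        "distance_bucket": distance_bucket(distance_miles),
        "current_checkpoint_type": current_checkpoint_type,
        "days_since_label_created": (now - label_created_at).total_seconds() / 86400.0,
        "day_of_week": now.weekday(),  # Monday is 0
        "holiday_proximity": near_holiday(now.date(), holidays),
    }


features = build_features(
    carrier="carrier_a",
    service_level="ground",
    distance_miles=400.0,
    current_checkpoint_type="IN_TRANSIT",
    label_created_at=datetime(2024, 12, 20, 0, 0, tzinfo=timezone.utc),
    now=datetime(2024, 12, 23, 12, 0, tzinfo=timezone.utc),
    holidays={date(2024, 12, 25)},
)
# The exact vector sent to the model is what gets stored for auditability.
features_snapshot = json.dumps(features, sort_keys=True)
```

Storing the serialized dictionary rather than recomputing it later means offline evaluation sees exactly what the model saw, even if feature logic changes between model versions.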

For interview purposes: discuss how you would retrain the model (nightly batch job on delivered shipments with ground truth), how to handle cold start for new carriers (fall back to carrier-published SLA windows), and how to A/B test model versions using the model_version column.

Scalability Considerations

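  • Hot tracking numbers: High-volume shippers generate bursts of events for the same tracking numbers. Kafka guarantees ordering only within a partition, so keying by tracking_number keeps each shipment's events in order while different shipments are processed in parallel across partitions.
  • Checkpoint storage: checkpoints is an append-only table; partition it by month. If the store is PostgreSQL (which the JSONB columns suggest), a unique index on a partitioned table must include the partition key, so partitioning by received_at conflicts with the duplicate-detection index on (shipment_id, carrier_event_code, occurred_at); one option is to partition by occurred_at instead. Archive old partitions to object storage with a metadata index for rare historical lookups.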
  • Fanout notifications: After status update, publish to a shipment-status-changed topic consumed by a notification service (SMS/email), the fulfillment service, and customer-facing WebSocket push servers.
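  • Polling efficiency: Batch carrier polling calls; use bulk tracking APIs where available to reduce API quota consumption. Batch limits vary by carrier (FedEx, for example, accepts up to 30 tracking numbers per track request), so confirm the current limit in each carrier's API documentation.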
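The polling batch step can be sketched as a simple chunking helper. The batch size of 30 is taken from the figure quoted above and should be treated as a per-carrier configuration value, not a constant; the function name is hypothetical.

```python
from typing import Iterator, Sequence


def chunk(tracking_numbers: Sequence[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive batches of at most `size` tracking numbers."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(tracking_numbers), size):
        yield list(tracking_numbers[start:start + size])


due_for_poll = [f"TN{i:04d}" for i in range(65)]
batches = list(chunk(due_for_poll, size=30))
# 65 shipments become 3 API calls instead of 65.
```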

API Design

  • POST /shipments — register a new shipment with tracking number and carrier; triggers initial status poll
  • GET /shipments/{tracking_number} — returns current status, latest ETA, and full checkpoint history
  • GET /shipments/{tracking_number}/checkpoints — paginated checkpoint list with canonical and raw event data
  • POST /webhooks/{carrier} — carrier-facing webhook receiver; verifies the signature, enqueues the raw payload, and returns 200 immediately; normalization runs asynchronously
  • GET /shipments/{tracking_number}/eta — latest ETA prediction with confidence score and model version
  • GET /carriers/{id}/exception-rate — operational dashboard: exception rate over rolling 24h window by carrier
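For the exception-rate endpoint, one plausible definition is the share of a carrier's checkpoints in the last 24 hours whose canonical type is EXCEPTION. That definition is an assumption (the rate could equally be computed per shipment), and the sketch below works over in-memory tuples rather than a query.

```python
from datetime import datetime, timedelta, timezone


def exception_rate(checkpoints: list[tuple[str, datetime]], now: datetime,
                   window: timedelta = timedelta(hours=24)) -> float:
    """Fraction of checkpoints inside the window that are EXCEPTION events.

    Each checkpoint is (canonical_event_type, occurred_at).
    """
    cutoff = now - window
    recent = [c for c in checkpoints if cutoff < c[1] <= now]
    if not recent:
        return 0.0
    exceptions = sum(1 for event_type, _ in recent if event_type == "EXCEPTION")
    return exceptions / len(recent)


now = datetime(2024, 6, 2, 12, 0, tzinfo=timezone.utc)
sample = [
    ("IN_TRANSIT", now - timedelta(hours=1)),
    ("EXCEPTION", now - timedelta(hours=5)),
    ("DELIVERED", now - timedelta(hours=10)),
    ("OUT_FOR_DELIVERY", now - timedelta(hours=20)),
    ("EXCEPTION", now - timedelta(hours=30)),  # outside the window, ignored
]
rate = exception_rate(sample, now)
```

At the stated ingest volume this would normally be served from a pre-aggregated counter or materialized view rather than scanning checkpoints per request.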
