A shadow traffic system mirrors live production requests to a new service version asynchronously, compares responses without affecting users, and alerts engineers when divergences appear. This design covers the mirroring pipeline, async replay architecture, response diffing engine, and divergence alerting.
Requirements
Functional
- Mirror a configurable percentage of production requests to a shadow service.
- Execute mirrored requests asynchronously without adding latency to the primary path.
- Compare shadow and primary responses: status code, body fields, latency.
- Throttle replay to protect the shadow service from traffic spikes.
- Alert when divergence rate exceeds a threshold.
Non-Functional
- No blocking work added to the primary path: mirroring is fire-and-forget, so the only overhead is a non-blocking enqueue.
- Shadow responses are never returned to end users.
- Replay buffer durable across restarts.
- Configurable sampling rate per endpoint.
Data Model
- MirroredRequest: requestId, traceId, method, path, headers (sanitized: drop auth tokens, replace with shadow credentials), bodyRef, capturedAt, shadowTarget.
- ShadowResult: requestId, primaryStatus, primaryBodyHash, primaryLatencyMs, shadowStatus, shadowBodyHash, shadowLatencyMs, diffResult (MATCH, DIVERGED, SHADOW_ERROR), evaluatedAt.
- DivergenceReport: reportId, windowStart, windowEnd, totalMirrored, divergedCount, errorCount, topDivergencePaths, sampleRequestIds.
- SamplingConfig: endpointPattern, sampleRate (0.0-1.0), maxRps, shadowTarget.
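As a rough illustration, the four entities could be written as Python dataclasses. This is only a sketch: the snake_case field names, the concrete types, and the choice to make the shadow fields of ShadowResult optional (so a timed-out replay can still be recorded) are assumptions, not part of the design above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class DiffResult(Enum):
    MATCH = "MATCH"
    DIVERGED = "DIVERGED"
    SHADOW_ERROR = "SHADOW_ERROR"


@dataclass(frozen=True)
class SamplingConfig:
    endpoint_pattern: str
    sample_rate: float  # fraction of requests to mirror, 0.0-1.0
    max_rps: int
    shadow_target: str

    def __post_init__(self):
        # Reject rates outside the documented 0.0-1.0 range.
        if not 0.0 <= self.sample_rate <= 1.0:
            raise ValueError("sample_rate must be within [0.0, 1.0]")


@dataclass(frozen=True)
class MirroredRequest:
    request_id: str
    trace_id: str
    method: str
    path: str
    headers: Dict[str, str]  # expected to be sanitized before construction
    body_ref: str
    captured_at: float  # assumption: epoch seconds
    shadow_target: str


@dataclass(frozen=True)
class ShadowResult:
    request_id: str
    primary_status: int
    primary_body_hash: str
    primary_latency_ms: float
    # Assumption: these are None when the shadow call timed out.
    shadow_status: Optional[int]
    shadow_body_hash: Optional[str]
    shadow_latency_ms: Optional[float]
    diff_result: DiffResult
    evaluated_at: float


@dataclass(frozen=True)
class DivergenceReport:
    report_id: str
    window_start: float
    window_end: float
    total_mirrored: int
    diverged_count: int
    error_count: int
    top_divergence_paths: List[str] = field(default_factory=list)
    sample_request_ids: List[str] = field(default_factory=list)
```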
Mirroring Pipeline
The primary request handler captures the request after authentication and before writing to any external state. It serializes a MirroredRequest, with auth headers stripped and replaced by a shadow service account credential, and publishes it to an async queue: a Kafka topic when the buffer must survive restarts, or an in-process ring buffer when losing queued captures on restart is acceptable. The primary response is returned immediately, without waiting for the shadow call. A sidecar or worker thread consumes the queue and replays the request to the shadow target.
Sanitize headers carefully: strip Authorization, Cookie, X-User-Token, and any header that carries a real user credential. Inject a pre-configured shadow API key. For endpoints that require user context, stub out user ID resolution in the shadow service to use a fixed test identity.
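A minimal sketch of the header sanitization step follows. The deny-list holds the three headers named above; the injected header name X-Shadow-Api-Key is a hypothetical placeholder, and a real deployment would likely extend the deny-list with any service-specific credential headers.

```python
from typing import Dict, Mapping

# Lower-cased names of headers that carry real user credentials.
SENSITIVE_HEADERS = {"authorization", "cookie", "x-user-token"}

# Hypothetical header name for the shadow credential.
SHADOW_KEY_HEADER = "X-Shadow-Api-Key"


def sanitize_headers(headers: Mapping[str, str], shadow_api_key: str) -> Dict[str, str]:
    """Return a copy of headers with user credentials removed and the shadow key added."""
    clean = {
        name: value
        for name, value in headers.items()
        if name.lower() not in SENSITIVE_HEADERS  # header names are case-insensitive
    }
    clean[SHADOW_KEY_HEADER] = shadow_api_key
    return clean
```

The function builds a new dictionary rather than mutating its input, so the headers used by the primary path are left untouched.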
Core Algorithms
Sampling
For each incoming request, hash the request ID or trace ID with a stable hash function, take the result modulo 1000, and mirror the request when that value is less than sampleRate * 1000. This gives a deterministic, stateless sample with 0.1% granularity that is reproducible for debugging. Before enqueuing, apply a token bucket limited to maxRps per endpoint pattern; if the bucket is empty, drop the mirrored copy without affecting the primary response.
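The sampling decision and the per-endpoint token bucket might look like the sketch below. SHA-256 is used as the stable hash (an assumption; any hash that is stable across processes would do), and the bucket capacity is set equal to maxRps, which is also an assumption about burst tolerance.

```python
import hashlib
import time


def should_sample(request_id: str, sample_rate: float) -> bool:
    """Deterministic sampling: the same ID and rate always give the same answer."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 1000
    # round() avoids float artifacts such as 0.57 * 1000 == 569.999...
    return bucket < round(sample_rate * 1000)


class TokenBucket:
    """Refills at max_rps tokens per second, holding at most max_rps tokens."""

    def __init__(self, max_rps: float, now=time.monotonic):
        self.rate = float(max_rps)
        self.capacity = float(max_rps)
        self.tokens = float(max_rps)
        self._now = now  # injectable clock, useful for tests
        self._last = now()

    def try_acquire(self) -> bool:
        current = self._now()
        elapsed = current - self._last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self._last = current
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller drops the mirrored copy


def should_mirror(request_id: str, sample_rate: float, bucket: TokenBucket) -> bool:
    # Check sampling first so unsampled requests never consume tokens.
    return should_sample(request_id, sample_rate) and bucket.try_acquire()
```

Python's built-in hash() is deliberately avoided here because string hashing is randomized per process, which would break reproducibility across restarts.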
Async Replay with Throttling
Workers pull from the replay queue at a rate controlled by a rate limiter configured from SamplingConfig.maxRps. Each worker sends the shadow request with a short timeout (typically 2x primary p99 latency) and records the response. If the shadow service returns a 5xx or times out, record a SHADOW_ERROR result but do not re-enqueue, because shadow traffic is best-effort.
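The replay loop can be sketched with asyncio. The send callable stands in for whatever HTTP client is used and is assumed to return the shadow status code; the dictionary keys, the None sentinel for shutdown, and the simple sleep-based pacing are all illustrative choices rather than part of the design.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

# Assumption: send(request) performs the shadow call and returns its HTTP status.
SendFn = Callable[[Dict[str, Any]], Awaitable[int]]


async def replay_one(send: SendFn, request: Dict[str, Any], timeout_s: float) -> Dict[str, Any]:
    """Replay a single request; errors are recorded, never retried."""
    result = {"requestId": request["requestId"], "shadowStatus": None, "shadowError": False}
    try:
        status = await asyncio.wait_for(send(request), timeout=timeout_s)
    except asyncio.TimeoutError:
        result["shadowError"] = True  # timed out: SHADOW_ERROR, no re-enqueue
        return result
    result["shadowStatus"] = status
    result["shadowError"] = status >= 500  # 5xx also counts as SHADOW_ERROR
    return result


async def replay_worker(
    queue: "asyncio.Queue",
    send: SendFn,
    max_rps: float,
    timeout_s: float,
    results: List[Dict[str, Any]],
) -> None:
    """Drain the queue at no more than max_rps requests per second."""
    interval = 1.0 / max_rps
    while True:
        request = await queue.get()
        if request is None:  # sentinel: stop the worker
            queue.task_done()
            return
        results.append(await replay_one(send, request, timeout_s))
        queue.task_done()
        await asyncio.sleep(interval)  # crude pacing between requests
```

Sleeping a fixed interval after each request keeps a single worker under maxRps; with several workers, a shared limiter would be needed to enforce the limit across all of them.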
Response Diffing
Compare primary and shadow responses at multiple levels. First compare status codes. If both are 2xx, compare response body hashes for exact equality. For endpoints with known non-deterministic fields (timestamps, request IDs, random tokens), apply a field mask: parse the JSON body, delete masked keys, serialize the remainder in a canonical form (for example, with sorted keys), then hash it. A hash mismatch only shows that the bodies differ, so when hashes disagree, run a field-level comparison and track which JSON paths diverge most frequently to help prioritize migration work.
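The masking and hashing steps might be implemented along these lines. The sketch assumes masked keys are matched by name at any nesting depth, treats equal non-2xx status codes as a match without comparing bodies, and classifies any shadow 5xx or missing status as SHADOW_ERROR; each of these is an interpretation rather than something the design specifies.

```python
import hashlib
import json
from typing import Any, Optional, Set


def _strip_masked(node: Any, masked_keys: Set[str]) -> Any:
    """Recursively drop masked keys from dicts, descending into lists."""
    if isinstance(node, dict):
        return {k: _strip_masked(v, masked_keys) for k, v in node.items() if k not in masked_keys}
    if isinstance(node, list):
        return [_strip_masked(v, masked_keys) for v in node]
    return node


def masked_hash(body: bytes, masked_keys: Set[str]) -> str:
    """Hash a JSON body after removing masked keys, using a canonical encoding."""
    doc = _strip_masked(json.loads(body), masked_keys)
    # sort_keys makes the hash independent of key order in the payload.
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_responses(
    primary_status: int,
    primary_body: bytes,
    shadow_status: Optional[int],
    shadow_body: bytes,
    masked_keys: Set[str],
) -> str:
    if shadow_status is None or shadow_status >= 500:
        return "SHADOW_ERROR"
    if primary_status != shadow_status:
        return "DIVERGED"
    if 200 <= primary_status < 300:
        if masked_hash(primary_body, masked_keys) != masked_hash(shadow_body, masked_keys):
            return "DIVERGED"
    return "MATCH"
```

Canonical serialization matters here: without sorted keys, two semantically identical bodies with different key order would hash differently and be reported as divergent.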
Divergence Rate Calculation
Maintain sliding-window counters (total mirrored, diverged, errored) per endpoint pattern. Every five minutes, compute divergenceRate = divergedCount / totalMirrored. If it exceeds the configured threshold (e.g., 1%), emit a DIVERGENCE_ALERT with sample request IDs. Store the DivergenceReport for trend analysis and link it to the shadow deployment version.
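One way to sketch the sliding-window counters is to keep timestamped results in a deque and evict entries older than the window. The class and function names below are hypothetical, results are assumed to arrive in timestamp order, and an empty window is reported as a rate of 0.0 to avoid dividing by zero.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class DivergenceWindow:
    """Per-endpoint sliding window of diff results (default: five minutes)."""

    def __init__(self, window_s: float = 300.0):
        self.window_s = window_s
        self.events: Deque[Tuple[float, str]] = deque()  # (timestamp, diff_result)

    def record(self, ts: float, diff_result: str) -> None:
        self.events.append((ts, diff_result))

    def snapshot(self, now: float) -> Dict[str, float]:
        # Evict everything that has fallen out of the window.
        cutoff = now - self.window_s
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()
        total = len(self.events)
        diverged = sum(1 for _, r in self.events if r == "DIVERGED")
        errored = sum(1 for _, r in self.events if r == "SHADOW_ERROR")
        return {
            "totalMirrored": total,
            "divergedCount": diverged,
            "errorCount": errored,
            "divergenceRate": diverged / total if total else 0.0,
        }


def should_alert(snapshot: Dict[str, float], threshold: float = 0.01) -> bool:
    """True when the divergence rate strictly exceeds the threshold."""
    return snapshot["divergenceRate"] > threshold
```

Storing individual events is simple but uses memory proportional to traffic; at high volume, fixed-size time buckets of counters would give the same rate with bounded memory.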
API Design
- POST /shadow-configs: create or update sampling config for an endpoint pattern.
- GET /shadow-configs: list all active configs with current sample rates and RPS.
- GET /shadow-results/{requestId}: full diff result for a specific request (useful for debugging).
- GET /divergence-reports: time-windowed divergence reports paginated by endpoint.
- POST /replay: manually replay a captured request ID against the current shadow target (for debugging).
Scalability and Observability
- Decouple capture from replay with a durable queue so replay can be paused without dropping captures.
- Scale replay workers independently from the primary service — shadow traffic is async and can lag during traffic spikes.
- Emit shadow_divergence_rate{endpoint} as a Prometheus gauge. Dashboard it alongside the deployment version label so divergence spikes can be correlated with deploys.
- Never shadow endpoints that mutate shared state (payments, emails) unless the shadow service is fully isolated with a separate database and external call stubs.
- Use shadow traffic to validate new database query plans, serialization changes, and library upgrades without risk to production users.