Shadow Traffic System Low-Level Design: Request Mirroring, Response Diffing, and Async Replay

A shadow traffic system mirrors live production requests to a new service version asynchronously, compares responses without affecting users, and alerts engineers when divergences appear. This design covers the mirroring pipeline, async replay architecture, response diffing engine, and divergence alerting.

Requirements

Functional

  • Mirror a configurable percentage of production requests to a shadow service.
  • Execute mirrored requests asynchronously without adding latency to the primary path.
  • Compare shadow and primary responses: status code, body fields, latency.
  • Throttle replay to protect the shadow service from traffic spikes.
  • Alert when divergence rate exceeds a threshold.

Non-Functional

  • Zero latency impact on primary path (fire-and-forget mirroring).
  • Shadow responses are never returned to end users.
  • Replay buffer durable across restarts.
  • Configurable sampling rate per endpoint.

Data Model

  • MirroredRequest: requestId, traceId, method, path, headers (sanitized — drop auth tokens, replace with shadow credentials), bodyRef, capturedAt, shadowTarget.
  • ShadowResult: requestId, primaryStatus, primaryBodyHash, primaryLatencyMs, shadowStatus, shadowBodyHash, shadowLatencyMs, diffResult (MATCH, DIVERGED, SHADOW_ERROR), evaluatedAt.
  • DivergenceReport: reportId, windowStart, windowEnd, totalMirrored, divergedCount, errorCount, topDivergencePaths, sampleRequestIds.
  • SamplingConfig: endpointPattern, sampleRate (0.0-1.0), maxRps, shadowTarget.
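The records above can be sketched as dataclasses. This is a minimal illustration, not a schema definition; field names follow the model above, and the snake_case spellings are an assumption of Python convention.

```python
from dataclasses import dataclass
from enum import Enum

class DiffResult(Enum):
    MATCH = "MATCH"
    DIVERGED = "DIVERGED"
    SHADOW_ERROR = "SHADOW_ERROR"

@dataclass
class MirroredRequest:
    request_id: str
    trace_id: str
    method: str
    path: str
    headers: dict      # sanitized: real credentials already stripped
    body_ref: str      # pointer to the body in blob storage, not inline
    captured_at: float
    shadow_target: str

@dataclass
class SamplingConfig:
    endpoint_pattern: str
    sample_rate: float  # 0.0-1.0
    max_rps: int
    shadow_target: str
```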

Mirroring Pipeline

The primary request handler captures the request after authentication and before writing to any external state. It serializes a MirroredRequest — with auth headers stripped and replaced with a shadow service account credential — and publishes it to an async queue: a durable Kafka topic in production, or an in-process ring buffer when the durability requirement is relaxed. The primary response is returned immediately. A sidecar or worker thread consumes the queue and replays the request to the shadow target.
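The fire-and-forget capture step can be sketched with an in-process bounded queue (standing in for the Kafka producer). The function and field names here are illustrative, not from a real library; the key property is that enqueuing never blocks the primary path.

```python
import queue
import time

# Bounded so a slow consumer causes drops, never backpressure on the primary path.
mirror_queue = queue.Queue(maxsize=10_000)

def capture_for_mirror(request_id, method, path, headers, body_ref, shadow_target):
    """Fire-and-forget: enqueue a mirrored copy without ever blocking the caller.

    Returns True if the copy was enqueued, False if it was dropped.
    """
    record = {
        "requestId": request_id,
        "method": method,
        "path": path,
        "headers": headers,        # assumed already sanitized upstream
        "bodyRef": body_ref,
        "capturedAt": time.time(),
        "shadowTarget": shadow_target,
    }
    try:
        mirror_queue.put_nowait(record)  # non-blocking put
        return True
    except queue.Full:
        return False  # queue full: drop the mirror; the primary response is unaffected
```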

Sanitize headers carefully: strip Authorization, Cookie, X-User-Token, and any header that carries a real user credential. Inject a pre-configured shadow API key. For endpoints that require user context, stub out user ID resolution in the shadow service to use a fixed test identity.
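A minimal sanitizer for the headers named above might look like the following; the deny-list contents and the Bearer-token injection scheme are assumptions to adapt to your auth setup.

```python
# Headers that carry real user credentials and must never reach the shadow service.
SENSITIVE_HEADERS = {"authorization", "cookie", "set-cookie", "x-user-token"}

def sanitize_headers(headers: dict, shadow_api_key: str) -> dict:
    """Strip credential-bearing headers and inject the shadow service account key."""
    clean = {k: v for k, v in headers.items() if k.lower() not in SENSITIVE_HEADERS}
    clean["Authorization"] = f"Bearer {shadow_api_key}"
    return clean
```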

Core Algorithms

Sampling

For each incoming request, hash the request ID or trace ID modulo 1000 and compare against sampleRate * 1000. This produces a deterministic, stateless sample that is reproducible for debugging. Apply a token bucket against maxRps per endpoint pattern before enqueuing — if the bucket is empty, drop the mirrored copy without affecting the primary response.
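Both halves of this decision — deterministic hashing and the per-endpoint token bucket — fit in a few lines. This is a sketch; the choice of SHA-256 and the bucket granularity are assumptions.

```python
import hashlib
import time

def should_sample(trace_id: str, sample_rate: float) -> bool:
    """Deterministic, stateless sampling: the same trace ID always yields the same decision."""
    bucket = int(hashlib.sha256(trace_id.encode()).hexdigest(), 16) % 1000
    return bucket < sample_rate * 1000

class TokenBucket:
    """Caps mirrored requests per endpoint pattern at max_rps; an empty bucket means drop."""
    def __init__(self, max_rps: int):
        self.capacity = max_rps
        self.tokens = float(max_rps)
        self.rate = max_rps          # refill rate: max_rps tokens per second
        self.last = time.monotonic()

    def try_acquire(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Because sampling is keyed on the trace ID, a request that diverged in production can be re-sent with the same ID and it will be sampled again.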

Async Replay with Throttling

Workers pull from the replay queue at a rate controlled by a rate limiter seeded from SamplingConfig.maxRps. Each worker sends the shadow request with a short timeout (typically 2x primary p99 latency) and records the response. If the shadow service returns a 5xx or times out, record a SHADOW_ERROR result but do not re-enqueue — shadow traffic is best-effort.
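A worker loop under these rules might look like the sketch below, assuming a rate limiter object exposing `try_acquire()` and mirrored records shaped like the MirroredRequest fields. The `record_result` callback and the sentinel-based shutdown are illustrative choices, not part of the design above.

```python
import queue
import time
import urllib.request

def replay_worker(mirror_queue, limiter, timeout_s=2.0, record_result=print):
    """Replay mirrored requests against the shadow target, best-effort.

    Shadow failures are recorded as SHADOW_ERROR and never re-enqueued.
    """
    while True:
        req = mirror_queue.get()
        if req is None:              # sentinel: shut the worker down
            break
        if not limiter.try_acquire():
            continue                 # over max_rps: drop the copy, no retry
        start = time.monotonic()
        try:
            resp = urllib.request.urlopen(
                urllib.request.Request(
                    req["shadowTarget"] + req["path"],
                    method=req["method"],
                    headers=req["headers"],
                ),
                timeout=timeout_s,   # typically ~2x the primary's p99 latency
            )
            record_result(req["requestId"], resp.status, time.monotonic() - start)
        except Exception:
            # Timeout, connection failure, or HTTP error: best-effort, do not re-enqueue.
            record_result(req["requestId"], "SHADOW_ERROR", time.monotonic() - start)
```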

Response Diffing

Compare primary and shadow responses at multiple levels. First compare status codes. If both are 2xx, compare response body hashes for exact equality. For endpoints with known non-deterministic fields (timestamps, request IDs, random tokens), apply a field mask: parse the JSON body, delete masked keys, then hash the remainder. Track which JSON paths diverge most frequently to guide migration effort prioritization.
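The comparison order and the field mask can be sketched as follows; the default masked keys are examples, and real bodies would arrive as bytes from the captured responses.

```python
import hashlib
import json

def masked_body_hash(body: bytes, masked_keys=("timestamp", "requestId")) -> str:
    """Hash a JSON body after recursively deleting known non-deterministic fields."""
    def scrub(node):
        if isinstance(node, dict):
            return {k: scrub(v) for k, v in node.items() if k not in masked_keys}
        if isinstance(node, list):
            return [scrub(v) for v in node]
        return node
    canonical = json.dumps(scrub(json.loads(body)), sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def diff(primary_status, primary_body, shadow_status, shadow_body):
    """Status codes first; masked body hashes only when both sides returned 2xx."""
    if shadow_status >= 500:
        return "SHADOW_ERROR"
    if primary_status != shadow_status:
        return "DIVERGED"
    if 200 <= primary_status < 300 and \
            masked_body_hash(primary_body) != masked_body_hash(shadow_body):
        return "DIVERGED"
    return "MATCH"
```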

Divergence Rate Calculation

Maintain sliding-window counters (total mirrored, diverged, errored) per endpoint pattern. Every five minutes, compute divergenceRate = divergedCount / totalMirrored. If it exceeds the configured threshold (e.g., 1%), emit a DIVERGENCE_ALERT with sample request IDs. Store the DivergenceReport for trend analysis and link it to the shadow deployment version.
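A per-endpoint sliding window and the rate computation can be sketched as below; the injectable `now` parameter is an illustrative convenience for testing, and the 300-second default mirrors the five-minute window above.

```python
import time
from collections import deque

class DivergenceWindow:
    """Sliding-window divergence counters for one endpoint pattern."""
    def __init__(self, window_s=300):
        self.window_s = window_s
        self.events = deque()  # (timestamp, diff_result)

    def record(self, diff_result, now=None):
        self.events.append((now if now is not None else time.time(), diff_result))

    def divergence_rate(self, now=None):
        """diverged / total over the window; 0.0 when nothing was mirrored."""
        now = now if now is not None else time.time()
        while self.events and self.events[0][0] < now - self.window_s:
            self.events.popleft()  # expire events outside the window
        total = len(self.events)
        if total == 0:
            return 0.0
        diverged = sum(1 for _, d in self.events if d == "DIVERGED")
        return diverged / total
```

The alerting job would call `divergence_rate()` on its five-minute tick and emit a DIVERGENCE_ALERT whenever the result exceeds the configured threshold.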

API Design

  • POST /shadow-configs — create or update sampling config for an endpoint pattern.
  • GET /shadow-configs — list all active configs with current sample rates and RPS.
  • GET /shadow-results/{requestId} — full diff result for a specific request (useful for debugging).
  • GET /divergence-reports — time-windowed divergence reports paginated by endpoint.
  • POST /replay — manually replay a captured request ID against the current shadow target (for debugging).

Scalability and Observability

  • Decouple capture from replay with a durable queue so replay can be paused without dropping captures.
  • Scale replay workers independently from the primary service — shadow traffic is async and can lag during traffic spikes.
  • Emit shadow_divergence_rate{endpoint} as a Prometheus gauge. Dashboard it alongside the deployment version label so divergence spikes are correlated with deploys.
  • Never shadow endpoints that mutate shared state (payments, emails) unless the shadow service is fully isolated with a separate database and external call stubs.
  • Use shadow traffic to validate new database query plans, serialization changes, and library upgrades without risk to production users.

