Overview
A sidecar proxy runs alongside an application container in the same network namespace, intercepting all inbound and outbound traffic. It provides Layer 7 protocol awareness, circuit breaking, retry logic, header manipulation, and metrics emission without requiring any changes to the application code. Envoy is the dominant implementation, but the design principles apply to any L7 proxy used in this role.
Requirements
Functional Requirements
- Intercept all TCP traffic to and from the application via iptables redirection.
- Parse HTTP/1.1, HTTP/2, and gRPC at Layer 7 to enable path-based and header-based routing.
- Implement circuit breaker with configurable failure threshold and half-open probe behavior.
- Apply configurable retry policies per route with exponential backoff and jitter.
- Manipulate request and response headers (add, remove, rewrite) per routing rule.
- Emit per-request metrics and trace spans to local collection endpoints.
Non-Functional Requirements
- Proxy overhead under 500 microseconds at p99 for typical HTTP requests.
- Zero-downtime hot reload of routing and policy configuration.
- Memory footprint under 50 MB per proxy instance under normal load.
- Support 50,000 concurrent connections per proxy instance.
Data Model
Listener and Filter Chain
The proxy configuration is organized as a tree of listeners, filter chains, and clusters. A listener binds to a port (the redirected inbound port or a per-upstream outbound port) and specifies an ordered filter chain. Each filter in the chain processes the connection or request: the network filter parses the protocol, the HTTP connection manager routes requests to a cluster, and the router filter selects the upstream endpoint. This pipeline model allows inserting cross-cutting filters (authentication, rate limiting, logging) at any point without modifying the routing logic.
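The listener/filter-chain pipeline can be sketched as a list of composable request transforms. This is a minimal illustration, not Envoy's actual API; the `Listener`, `Request`, and filter names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# A hypothetical in-flight request handed down the filter chain.
@dataclass
class Request:
    path: str
    headers: dict
    cluster: Optional[str] = None   # set by the routing filter

# Each filter receives the request and passes it along, possibly mutated.
Filter = Callable[[Request], Request]

def http_connection_manager(route_table: dict) -> Filter:
    """Route a request to a cluster by longest matching path prefix."""
    def run(req: Request) -> Request:
        match = max((p for p in route_table if req.path.startswith(p)),
                    key=len, default=None)
        req.cluster = route_table.get(match)
        return req
    return run

def header_filter(name: str, value: str) -> Filter:
    """A cross-cutting filter inserted without touching the routing logic."""
    def run(req: Request) -> Request:
        req.headers[name] = value
        return req
    return run

@dataclass
class Listener:
    port: int
    chain: List[Filter] = field(default_factory=list)

    def handle(self, req: Request) -> Request:
        for f in self.chain:        # ordered filter chain
            req = f(req)
        return req

listener = Listener(port=15001, chain=[
    header_filter("x-proxy-internal", "true"),
    http_connection_manager({"/api": "backend", "/": "frontend"}),
])
out = listener.handle(Request(path="/api/users", headers={}))
# out.cluster == "backend"
```

Because the chain is just an ordered list, an authentication or rate-limiting filter can be spliced in at any position without the router knowing about it.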
Cluster State
Each upstream service is represented as a cluster containing a list of endpoints with health status, weight, zone, and circuit breaker counters. The circuit breaker state machine per (cluster, endpoint) tracks: consecutive failures, last failure timestamp, state (closed, open, half-open), and next probe time. State transitions are guarded by atomic compare-and-swap operations to handle concurrent requests safely without a global lock.
Core Algorithms
L7 Routing
Incoming requests are matched against a route table using a longest-prefix-match trie on the URL path, followed by header match evaluation in order. Each route entry specifies a destination cluster, header mutations to apply, and per-route timeout and retry policy overrides. Path matching uses a radix trie built at configuration load time, providing O(L) lookup where L is the length of the request path, independent of the number of registered routes. Header matching iterates through the ordered list of match conditions; the first matching route wins.
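The longest-prefix match can be sketched with a trie keyed on path segments (a production radix trie would additionally compress single-child chains; this simplified version keeps one node per segment). The class and route names are illustrative.

```python
from typing import Optional

class RouteTrie:
    """Longest-prefix route matcher over '/'-separated path segments.
    Lookup walks at most one node per segment of the request path,
    so cost is independent of how many routes are registered."""
    def __init__(self):
        self.children = {}
        self.cluster: Optional[str] = None

    def insert(self, prefix: str, cluster: str) -> None:
        node = self
        for seg in (s for s in prefix.split("/") if s):
            node = node.children.setdefault(seg, RouteTrie())
        node.cluster = cluster

    def lookup(self, path: str) -> Optional[str]:
        node, best = self, self.cluster
        for seg in (s for s in path.split("/") if s):
            node = node.children.get(seg)
            if node is None:
                break
            if node.cluster is not None:
                best = node.cluster    # remember the deepest matching prefix
        return best

routes = RouteTrie()
routes.insert("/", "frontend")
routes.insert("/api", "api-gw")
routes.insert("/api/v2", "api-v2")
routes.lookup("/api/v2/users")   # "api-v2" (deepest prefix wins)
routes.lookup("/static/app.js")  # "frontend" (falls back to "/")
```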
Circuit Breaker
The circuit breaker maintains a three-state machine per (cluster, endpoint), matching the counters described in the data model:
- Closed: requests flow normally. Consecutive failures are counted. When failures exceed threshold T within a rolling window of W seconds, transition to Open.
- Open: all requests to this cluster fail immediately with a 503 response without attempting an upstream connection. After a configurable sleep duration (default 10 seconds), transition to Half-Open.
- Half-Open: one probe request is forwarded to the upstream. If it succeeds, transition back to Closed and reset the failure counter. If it fails, return to Open and restart the sleep timer. The probe selection uses atomic test-and-set to ensure only one concurrent probe is in flight.
Retry with Backoff and Jitter
Retries are triggered on configurable conditions: connection failure, 503 response, or a 5xx response to an idempotent method. The backoff duration for attempt i is min(base * 2^i, max_backoff) multiplied by a uniform random jitter factor in [0.5, 1.5]. Jitter prevents retry storms where many clients retry simultaneously after a partial outage. The total retry budget per request is bounded by an absolute timeout so that retries cannot extend tail latency without bound.
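The backoff formula above is small enough to show directly; the `base` and `max_backoff` defaults here are examples, and the RNG is injectable so the jitter can be pinned in tests.

```python
import random

def backoff_delay(attempt: int, base: float = 0.025,
                  max_backoff: float = 1.0,
                  rng=random.random) -> float:
    """Delay in seconds before retry `attempt` (0-based):
    min(base * 2^attempt, max_backoff) scaled by a uniform
    jitter factor in [0.5, 1.5)."""
    capped = min(base * (2 ** attempt), max_backoff)
    jitter = 0.5 + rng()          # uniform in [0.5, 1.5)
    return capped * jitter
```

Because the jitter multiplies the capped delay, the cap still holds to within the jitter factor, and no two clients that failed at the same instant are likely to probe the upstream at the same instant again.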
API Design
- UpdateRouteConfig(route_config): hot-reloads the route table without dropping connections; new requests use the new config immediately.
- UpdateClusterConfig(cluster_config): updates the endpoint list and circuit breaker parameters for a cluster.
- GetAdminStats(): returns current connection counts, cluster health, circuit breaker states, and per-route request rates via the admin HTTP endpoint.
- DrainConnections(timeout_ms): signals the proxy to stop accepting new connections and drain existing ones; used during pod shutdown.
- GetCircuitBreakerStatus(cluster_name): returns the current state, failure count, and next probe time for a cluster's circuit breaker.
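The zero-downtime semantics of UpdateRouteConfig can be sketched as an atomic reference swap: readers take a snapshot of the whole config object, and the writer publishes a complete replacement, so no request ever observes a half-updated table. The holder class below is a hypothetical sketch (in CPython a single attribute read/write is atomic; a C++ proxy would use RCU or an atomic shared pointer).

```python
class RouteConfigHolder:
    """Hot-swap a route table without blocking in-flight requests.
    Readers pin a snapshot reference; writers publish a whole new
    table in one assignment (an RCU-style swap)."""
    def __init__(self, initial: dict):
        self._config = initial

    def snapshot(self) -> dict:
        return self._config           # one reference read; never a partial view

    def update(self, new_config: dict) -> None:
        self._config = new_config     # publish; old snapshots stay valid
```

A request that captured a snapshot before the swap finishes routing against the old table, which is exactly the "existing connections keep their config, new requests see the new config" behavior the API promises.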
Scalability
Connection Handling
The proxy uses an event-driven non-blocking I/O model with a single worker thread per CPU core. Each worker independently accepts connections from the listener and handles all I/O for those connections without cross-thread coordination. Connection count is balanced across workers by the operating system accept queue. This design allows linear throughput scaling with CPU cores without locking overhead on the hot path.
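One common way to let the kernel balance accepted connections across per-core workers is SO_REUSEPORT: each worker binds its own listening socket to the same port, and the kernel hashes incoming connections across them. A minimal sketch (Linux-specific; the function name is illustrative):

```python
import socket

def bind_worker_listener(port: int) -> socket.socket:
    """Each worker opens its own listening socket on the shared port with
    SO_REUSEPORT set; the kernel then spreads new connections across all
    such sockets, so workers never contend on a shared accept lock."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("127.0.0.1", port))
    s.listen(128)
    return s

first = bind_worker_listener(0)       # port 0: kernel picks a free port
port = first.getsockname()[1]
second = bind_worker_listener(port)   # a second worker shares the same port
```

In a real proxy each worker would then run its own epoll/kqueue event loop over the connections it accepted, touching no shared state on the hot path.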
Memory Management
Request and response body buffers use a slab allocator with fixed-size buffer pools (4 KB, 16 KB, 64 KB) to avoid heap fragmentation under sustained load. Buffers are returned to the pool on request completion. For streaming requests exceeding the largest slab size, the proxy streams data through without buffering the full body, capping memory usage per connection at the stream window size negotiated during HTTP/2 flow control setup.
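The slab-pool idea can be sketched as follows: a request is served from the smallest size class that fits, and released buffers are recycled rather than returned to the heap. The class name and the rejection behavior for oversized bodies are illustrative (the real proxy streams those instead of raising).

```python
class BufferPool:
    """Fixed-size slab pool with 4 KB / 16 KB / 64 KB classes.
    acquire() reuses a freed buffer of the matching class when one
    exists, avoiding heap churn and fragmentation under load."""
    SLAB_SIZES = (4096, 16384, 65536)

    def __init__(self):
        self._free = {size: [] for size in self.SLAB_SIZES}

    def acquire(self, needed: int) -> bytearray:
        for size in self.SLAB_SIZES:          # smallest class that fits
            if needed <= size:
                free = self._free[size]
                return free.pop() if free else bytearray(size)
        # In the real design, bodies beyond the largest slab are streamed
        # through without full buffering; this sketch just refuses them.
        raise ValueError("bodies larger than 64 KB are streamed, not pooled")

    def release(self, buf: bytearray) -> None:
        self._free[len(buf)].append(buf)      # return to the matching class
```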
Monitoring
Per-route counters track request count, error count, and latency histogram. Per-cluster gauges track active connections, pending requests, and circuit breaker state. These metrics are exposed on the admin port in Prometheus text format and scraped by the local collector agent every 15 seconds. Alerts include circuit breaker open events (immediate notification to on-call), retry rate exceeding 5% of requests (indicating upstream instability), and p99 latency increase over 50% from the hourly baseline.
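The admin-port exposition can be sketched as a renderer for the Prometheus text format. The metric names and label keys here are illustrative, not Envoy's actual stat names.

```python
def render_prometheus(route_counters: dict, cluster_gauges: dict) -> str:
    """Render per-route counters and per-cluster gauges in the Prometheus
    text exposition format: one `name{label="value"} value` line each."""
    lines = []
    for (route, name), value in sorted(route_counters.items()):
        lines.append(f'proxy_{name}{{route="{route}"}} {value}')
    for (cluster, name), value in sorted(cluster_gauges.items()):
        lines.append(f'proxy_{name}{{cluster="{cluster}"}} {value}')
    return "\n".join(lines) + "\n"

text = render_prometheus(
    {("api", "requests_total"): 42, ("api", "errors_total"): 3},
    {("backend", "active_connections"): 7},
)
```

A local collector scraping this endpoint every 15 seconds sees monotonically increasing counters and point-in-time gauges, which is what the alert rules above (retry rate, breaker-open events, p99 drift) are computed from.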