How does mTLS certificate rotation work without downtime?

The control plane issues a new certificate before the current one expires, typically at 80% of the TTL (for a 24-hour certificate, at 19.2 hours). The sidecar receives the new certificate through the SDS (Secret Discovery Service) xDS API and loads it into memory alongside the old one. Both certificates chain to the same mesh CA, so during the overlap window peers accept connections authenticated with either one. The old certificate is retired once the window passes. Because rotation is continuous and overlapping, no connection is dropped during rotation.
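The timing arithmetic above can be checked with a few lines of Python. This is only a sketch: the function names and the ROTATION_FRACTION constant are illustrative and are not part of any mesh API.

```python
from datetime import datetime, timedelta, timezone

ROTATION_FRACTION = 0.8  # assumed: rotate at 80% of the certificate lifetime


def rotation_time(issued_at: datetime, ttl: timedelta,
                  fraction: float = ROTATION_FRACTION) -> datetime:
    """Return the moment at which a replacement certificate should be requested."""
    return issued_at + ttl * fraction


def overlap_window(issued_at: datetime, ttl: timedelta,
                   fraction: float = ROTATION_FRACTION) -> timedelta:
    """Return how long the old and new certificates are both valid."""
    return (issued_at + ttl) - rotation_time(issued_at, ttl, fraction)


issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
ttl = timedelta(hours=24)
print(rotation_time(issued, ttl))   # 19 h 12 min (19.2 hours) after issuance
print(overlap_window(issued, ttl))  # the remaining 4 h 48 min of overlap
```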

Where does the circuit breaker live in a service mesh — in the sidecar or the application?

In a service mesh, the circuit breaker lives entirely in the calling sidecar. The application makes a normal HTTP call; the sidecar tracks the error rate and, once it exceeds the threshold, opens the circuit and returns 503 to the application without making an upstream request. Circuit breaking therefore behaves the same for every caller of a service, regardless of the programming language or SDK the application uses.
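As a rough illustration of that behaviour, the sketch below models a rolling-window breaker in Python. It is not Envoy's implementation. The recovery_sec and min_requests parameters are assumptions added so the example is complete, and the clock is injectable so the logic can be exercised without waiting.

```python
import time
from collections import deque


class CircuitBreaker:
    """Rolling-window circuit breaker: closed -> open -> half-open -> closed."""

    def __init__(self, threshold_pct=50, window_sec=30, recovery_sec=10,
                 min_requests=5, clock=time.monotonic):
        self.threshold_pct = threshold_pct
        self.window_sec = window_sec
        self.recovery_sec = recovery_sec    # hypothetical recovery timeout
        self.min_requests = min_requests    # hypothetical minimum sample size
        self.clock = clock
        self.samples = deque()              # (timestamp, succeeded) pairs
        self.opened_at = None               # None means the circuit is closed

    def allow_request(self) -> bool:
        """Return False to fast-fail (the sidecar would answer 503)."""
        if self.opened_at is None:
            return True
        # Half-open: once the recovery timeout has passed, let probes through.
        return self.clock() - self.opened_at >= self.recovery_sec

    def record(self, succeeded: bool) -> None:
        """Record the outcome of a request that was allowed through."""
        now = self.clock()
        if self.opened_at is not None:
            if now - self.opened_at < self.recovery_sec:
                return  # late result from before the circuit opened; ignore
            # A probe result decides whether to close or re-open the circuit.
            if succeeded:
                self.opened_at = None
                self.samples.clear()
            else:
                self.opened_at = now
            return
        self.samples.append((now, succeeded))
        # Drop samples that have fallen out of the rolling window.
        while self.samples and now - self.samples[0][0] > self.window_sec:
            self.samples.popleft()
        failures = sum(1 for _, ok in self.samples if not ok)
        if (len(self.samples) >= self.min_requests
                and failures * 100 / len(self.samples) > self.threshold_pct):
            self.opened_at = now
```

A caller would check allow_request() before each upstream call and pass the outcome to record() afterwards.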

How does canary traffic shaping work with weighted routing in a service mesh?

The control plane distributes a VirtualService (Istio) or TrafficSplit (SMI) resource to all sidecars calling the target service. Each sidecar uses weighted random selection to route the configured percentage of requests to the canary version. Header-based overrides allow specific clients (internal QA, beta users identified by a header) to always hit the canary, independent of the weight. Traffic weights can be adjusted incrementally (1%, 5%, 20%, 50%, 100%) as confidence in the canary grows, without redeployment.

When should you use a service mesh sidecar versus a shared library for traffic management?

Use a service mesh when you have multiple programming languages, need uniform policy enforcement without per-team library adoption, or require zero-trust mTLS across all services. The sidecar approach has higher resource overhead (each sidecar uses ~50MB RAM and adds 1-2ms latency per hop) but is language-agnostic and centrally configurable. Use a shared library when you have a single-language codebase, extremely low-latency requirements, or do not need mutual TLS — the library approach avoids the sidecar overhead but requires each team to adopt and upgrade the library.

How does the sidecar proxy implement mTLS without application changes?

The sidecar transparently intercepts all outbound connections and performs the TLS handshake using a SPIFFE X.509 certificate; the application sends plaintext as usual and the sidecar handles encryption.

How are service certificates rotated without downtime?

The control plane issues new certificates with overlapping validity windows; sidecars reload the new certificate before the old one expires, and peers accept connections authenticated with either certificate during the overlap.

How does weighted traffic splitting enable canary deployments?

TrafficSplit configuration assigns weight percentages to service versions; the sidecar uses a weighted random selection per request, routing the configured fraction to the canary version without client awareness.

How does the service mesh collect distributed traces?

Sidecars generate B3 or W3C trace context headers when a request has none and emit span data to a collector (Jaeger/Zipkin); the application needs no tracing library but must forward the incoming trace headers on its outbound calls.

Service Mesh Low-Level Design: Sidecar Proxy, mTLS, Traffic Policies, and Observability


What Is a Service Mesh?

A service mesh is an infrastructure layer that manages service-to-service communication in a microservices architecture. Rather than embedding network concerns (retries, timeouts, circuit breaking, mTLS) in each service's code, the mesh externalizes them into sidecar proxies that intercept all traffic. Istio (using Envoy) and Linkerd are the dominant implementations. At a low level, the mesh consists of a data plane (sidecars on every pod) and a control plane (configuration distribution, certificate issuance, telemetry aggregation).

Sidecar Proxy Injection

Each service pod runs two containers: the application container and the Envoy sidecar proxy. The control plane uses a Kubernetes MutatingAdmissionWebhook to inject the sidecar automatically into any pod in a labeled namespace, so no application code change is required. iptables rules in the pod's network namespace redirect all inbound traffic to the sidecar on port 15006 and all outbound traffic to the sidecar on port 15001, which makes the proxy transparent to the application.
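To make the injection step concrete, here is a minimal sketch of the response a mutating webhook could return. It assumes the Kubernetes admission.k8s.io/v1 AdmissionReview format. The container name, image, and the build_admission_response helper are hypothetical, and the installation of the iptables rules is omitted.

```python
import base64
import json

# Hypothetical sidecar container spec; name and image are assumptions.
SIDECAR_CONTAINER = {
    "name": "mesh-proxy",
    "image": "example.com/mesh/proxy:latest",
    "ports": [{"containerPort": 15001}, {"containerPort": 15006}],
}


def build_admission_response(review: dict) -> dict:
    """Return an AdmissionReview response that appends the sidecar container."""
    request = review["request"]
    pod = request["object"]
    containers = pod["spec"].get("containers", [])

    patch = []
    # Skip pods that already carry the sidecar so injection is idempotent.
    if not any(c.get("name") == SIDECAR_CONTAINER["name"] for c in containers):
        patch.append({
            "op": "add",
            "path": "/spec/containers/-",   # JSON Patch: append to the array
            "value": SIDECAR_CONTAINER,
        })

    response = {"uid": request["uid"], "allowed": True}
    if patch:
        # The API server expects the JSON Patch document base64-encoded.
        response["patchType"] = "JSONPatch"
        response["patch"] = base64.b64encode(json.dumps(patch).encode()).decode()
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```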

The sidecar handles:

Inbound traffic: authenticate the caller via mTLS, apply rate limits, record metrics, forward to localhost application port.
Outbound traffic: resolve destination service endpoint, apply retry/timeout/circuit-breaker policy, negotiate mTLS with destination sidecar, emit trace spans.

Mutual TLS (mTLS)

mTLS provides zero-trust service identity. The control plane's certificate authority (the mesh CA) issues short-lived X.509 certificates (typically 24-hour TTL) to each sidecar. The SPIFFE standard carries the workload identity as a URI in the certificate's Subject Alternative Name; Istio-style meshes use the form spiffe://trust-domain/ns/namespace/sa/service-account. SPIRE is one implementation of that standard.

On each connection:

The calling sidecar opens the connection and the destination sidecar presents its certificate.
The calling sidecar verifies that certificate against the mesh CA root.
The calling sidecar presents its own certificate; the destination verifies it against the same root.
A mutually authenticated TLS session is established; all traffic is encrypted in transit.

Authorization policies (which service is allowed to call which other service on which port/path) are enforced at the destination sidecar using the verified SPIFFE identity from the certificate.
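A destination-side check of this kind might look like the following sketch. The ALLOW_RULES table and the helper names are hypothetical. The SPIFFE ID is assumed to have already been extracted from a certificate that passed verification, and the rule shape (port plus path prefix) is only one possible policy model.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class SpiffeId:
    trust_domain: str
    namespace: str
    service_account: str


def parse_spiffe_id(uri: str) -> SpiffeId:
    """Parse spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>."""
    parsed = urlparse(uri)
    parts = parsed.path.strip("/").split("/")
    if (parsed.scheme != "spiffe" or len(parts) != 4
            or parts[0] != "ns" or parts[2] != "sa"):
        raise ValueError(f"not a recognised SPIFFE ID: {uri!r}")
    return SpiffeId(parsed.netloc, parts[1], parts[3])


# Hypothetical allow-list: (namespace, service account) -> (port, path prefix) pairs.
ALLOW_RULES = {
    ("shop", "checkout"): [(8080, "/payments")],
}


def is_allowed(peer_uri: str, port: int, path: str,
               trust_domain: str = "cluster.local") -> bool:
    """Decide whether the verified caller may reach this port and path."""
    try:
        peer = parse_spiffe_id(peer_uri)
    except ValueError:
        return False
    if peer.trust_domain != trust_domain:
        return False
    rules = ALLOW_RULES.get((peer.namespace, peer.service_account), [])
    return any(port == p and path.startswith(prefix) for p, prefix in rules)
```

Denying by default when no rule matches keeps an unknown caller from reaching the service.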

Control Plane Schema

CREATE TABLE ServicePolicy (
  service_name     VARCHAR(128) PRIMARY KEY,
  retry_attempts   INT NOT NULL DEFAULT 3,
  timeout_ms       INT NOT NULL DEFAULT 5000,
  cb_threshold     INT NOT NULL DEFAULT 50,  -- % error rate to open circuit
  cb_window_sec    INT NOT NULL DEFAULT 30,
  updated_at       TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE CertificateRecord (
  service_name  VARCHAR(128) NOT NULL,
  cert_pem      TEXT NOT NULL,
  private_key   TEXT NOT NULL,  -- encrypted at rest
  expiry        TIMESTAMPTZ NOT NULL,
  issued_at     TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  PRIMARY KEY (service_name, issued_at)
);

CREATE TABLE TrafficSplit (
  id                  BIGSERIAL PRIMARY KEY,
  source_service      VARCHAR(128),  -- NULL = applies to all callers
  destination_service VARCHAR(128) NOT NULL,
  v1_weight           INT NOT NULL DEFAULT 100,
  v2_weight           INT NOT NULL DEFAULT 0,
  header_match        JSONB,  -- optional header-based routing rules
  updated_at          TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE ServiceEndpoint (
  service_name  VARCHAR(128) NOT NULL,
  pod_ip        INET NOT NULL,
  port          INT NOT NULL,
  version       VARCHAR(32),
  healthy       BOOLEAN NOT NULL DEFAULT TRUE,
  PRIMARY KEY (service_name, pod_ip, port)
);

Traffic Policies: Retries, Timeouts, and Circuit Breaking

Traffic policies are configured per destination service in the control plane and pushed to sidecars as Envoy configuration (via xDS APIs — LDS, RDS, CDS, EDS).

Retry policy: max_attempts = 3, retry_on = [connect-failure, retriable-4xx, 503]. Retries use exponential backoff with jitter. Non-idempotent methods (POST, PATCH) are not retried by default.
Timeout: per-request timeout enforced by the calling sidecar. If the upstream does not respond within timeout_ms, the sidecar returns 504 to the caller and records the timeout as a failure for circuit breaker accounting.
Circuit breaker: the sidecar tracks the error rate over a rolling window of cb_window_sec seconds. When the error rate exceeds cb_threshold percent, the circuit opens and subsequent requests fast-fail with 503 without hitting the upstream. After a recovery timeout (recovery_timeout, a setting not shown in the ServicePolicy schema), the circuit half-opens and allows probe requests.
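The retry behaviour in the list above can be sketched as follows. This is an illustration rather than Envoy's retry code: base_ms and cap_ms are assumed values, "full jitter" is one common jitter strategy, and the send callable stands in for the upstream request. The retries parameter counts retries after the first attempt.

```python
import random
import time

RETRIABLE_STATUS = {503}  # mirrors the 503 entry in retry_on above


def backoff_ms(attempt, base_ms=25.0, cap_ms=250.0, rng=random.random):
    """Full-jitter backoff: a random wait between 0 and min(cap, base * 2**attempt)."""
    return rng() * min(cap_ms, base_ms * (2 ** attempt))


def call_with_retries(send, retries=3, sleep=time.sleep):
    """Call send() once, then up to `retries` more times on retriable failures."""
    status = None
    for attempt in range(retries + 1):
        try:
            status = send()
        except ConnectionError:
            status = None                  # connect-failure is retriable
        else:
            if status not in RETRIABLE_STATUS:
                return status
        if attempt < retries:
            sleep(backoff_ms(attempt) / 1000.0)
    # Every attempt failed: surface 503, as the sidecar would to the application.
    return status if status is not None else 503
```

Jitter spreads retries from many callers over time, so a briefly overloaded upstream is not hit by synchronized retry waves.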

Load Balancing

The sidecar uses the endpoint list from the control plane's EDS (Endpoint Discovery Service) to load balance across healthy pods:

Round-robin: default; distributes requests evenly.
Least-request: routes to the upstream with the fewest active requests; better for variable latency services.
Consistent hashing: hashes on a request header (e.g. user_id, session_id) for sticky sessions; the same client keeps hitting the same upstream pod while the endpoint set is stable, which preserves cache locality.
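For the consistent-hashing option, a generic hash ring with virtual nodes gives the flavour. This is a textbook sketch, not the algorithm of any particular proxy. The HashRing class, the vnodes count, and the endpoint strings are illustrative.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring; each endpoint owns `vnodes` points on the ring."""

    def __init__(self, endpoints, vnodes=100):
        self._ring = sorted(
            (self._hash(f"{endpoint}#{i}"), endpoint)
            for endpoint in endpoints
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        # First 8 bytes of SHA-256 give a stable 64-bit position on the ring.
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def pick(self, key: str) -> str:
        """Return the endpoint owning the first ring point after hash(key)."""
        if not self._ring:
            raise ValueError("no endpoints available")
        index = bisect.bisect_right(self._points, self._hash(key)) % len(self._ring)
        return self._ring[index][1]


ring = HashRing(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])
print(ring.pick("user-42"))  # the same key maps to the same endpoint on every call
```

When a pod disappears, only the keys that mapped to it move to other pods; all other keys keep their endpoint.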

Traffic Shaping for Canary Releases

Weighted routing splits traffic between service versions without DNS changes. A TrafficSplit record sets v1_weight = 95 and v2_weight = 5 to send 5% of traffic to the canary. Header-based routing allows QA teams to force all their traffic to v2 via a custom header (X-Canary: true), independent of weight.

Python Control Plane

import datetime
from pathlib import Path

from cryptography import x509
from cryptography.x509.oid import NameOID
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from db import get_db

CERT_TTL_HOURS = 24
# Read the CA material once at import time; read_text() closes the files.
CA_CERT_PEM  = Path('/etc/mesh-ca/ca.crt').read_text()
CA_KEY_PEM   = Path('/etc/mesh-ca/ca.key').read_text()
TRUST_DOMAIN = 'cluster.local'

def issue_certificate(service_name: str, namespace: str, service_account: str) -> dict:
    """Issue a short-lived X.509 certificate for a service sidecar."""
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

    spiffe_uri = f"spiffe://{TRUST_DOMAIN}/ns/{namespace}/sa/{service_account}"
    subject    = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, service_name)])

    ca_cert = x509.load_pem_x509_certificate(CA_CERT_PEM.encode())
    ca_key  = serialization.load_pem_private_key(CA_KEY_PEM.encode(), password=None)

    # Use one timezone-aware timestamp so the certificate and the database
    # row agree on the validity window.
    now    = datetime.datetime.now(datetime.timezone.utc)
    expiry = now + datetime.timedelta(hours=CERT_TTL_HOURS)

    cert = (
        x509.CertificateBuilder()
        .subject_name(subject)
        .issuer_name(ca_cert.subject)
        .public_key(private_key.public_key())
        .serial_number(x509.random_serial_number())
        .not_valid_before(now)
        .not_valid_after(expiry)
        .add_extension(
            x509.SubjectAlternativeName([x509.UniformResourceIdentifier(spiffe_uri)]),
            critical=False
        )
        .sign(ca_key, hashes.SHA256())
    )

    cert_pem = cert.public_bytes(serialization.Encoding.PEM).decode()
    key_pem  = private_key.private_bytes(
        serialization.Encoding.PEM,
        serialization.PrivateFormat.TraditionalOpenSSL,
        serialization.NoEncryption()
    ).decode()

    # NOTE: the schema marks private_key as encrypted at rest, but key_pem is
    # plaintext here; encrypt it before this insert or in the storage layer.
    db = get_db()
    db.execute("""
        INSERT INTO CertificateRecord (service_name, cert_pem, private_key, expiry)
        VALUES (%s, %s, %s, %s)
    """, (service_name, cert_pem, key_pem, expiry))
    db.commit()

    return {'cert_pem': cert_pem, 'private_key_pem': key_pem, 'spiffe_uri': spiffe_uri}

def apply_policy(service_name: str, policy: dict):
    """Upsert traffic policy for a service; control plane pushes to sidecars via xDS."""
    db = get_db()
    db.execute("""
        INSERT INTO ServicePolicy (service_name, retry_attempts, timeout_ms, cb_threshold, cb_window_sec)
        VALUES (%s, %s, %s, %s, %s)
        ON CONFLICT (service_name) DO UPDATE SET
          retry_attempts = EXCLUDED.retry_attempts,
          timeout_ms     = EXCLUDED.timeout_ms,
          cb_threshold   = EXCLUDED.cb_threshold,
          cb_window_sec  = EXCLUDED.cb_window_sec,
          updated_at     = NOW()
    """, (service_name,
          policy.get('retry_attempts', 3),
          policy.get('timeout_ms', 5000),
          policy.get('cb_threshold', 50),
          policy.get('cb_window_sec', 30)))
    db.commit()

def compute_traffic_split(destination_service: str, request_headers: dict) -> str:
    """Determine which service version to route to based on weights and header rules."""
    import random
    db = get_db()
    split = db.execute("""
        SELECT v1_weight, v2_weight, header_match FROM TrafficSplit
        WHERE destination_service = %s
        ORDER BY updated_at DESC LIMIT 1
    """, (destination_service,)).fetchone()

    if not split:
        return 'v1'

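    # Header-based routing takes precedence. HTTP header names are
    # case-insensitive, so compare on lowercased names.
    header_match = split['header_match'] or {}
    normalized = {name.lower(): value for name, value in request_headers.items()}
    for header_name, expected_value in header_match.items():
        if normalized.get(header_name.lower()) == expected_value:
            return 'v2'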

    # Weighted random
    total = split['v1_weight'] + split['v2_weight']
    if total == 0:
        return 'v1'
    return 'v2' if random.randint(1, total) <= split['v2_weight'] else 'v1'

Observability: Distributed Tracing and Metrics

The sidecar creates a span for each hop and reports it to the tracing backend (Jaeger, Zipkin, or Tempo), generating trace headers (B3 or W3C TraceContext) when a request arrives without them. The sidecar cannot correlate an inbound request with the outbound calls it triggers, so the application must forward the received trace headers on its downstream calls; that is the only change required of it. Per-service metrics (request rate, error rate, P50/P99 latency) are exposed by the sidecar in Prometheus format and scraped by Prometheus, requiring zero instrumentation in the application code.
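The header forwarding the application is responsible for can be as small as the sketch below. The forward_trace_headers helper is hypothetical. The header list covers the W3C Trace Context and B3 names plus Envoy's x-request-id, and a real deployment should forward whatever set its tracing backend documents.

```python
from typing import Optional

TRACE_HEADERS = (
    "traceparent", "tracestate",                          # W3C Trace Context
    "b3",                                                 # B3 single-header form
    "x-b3-traceid", "x-b3-spanid", "x-b3-parentspanid",   # B3 multi-header form
    "x-b3-sampled", "x-b3-flags",
    "x-request-id",                                       # Envoy request ID
)


def forward_trace_headers(inbound: dict, outbound: Optional[dict] = None) -> dict:
    """Copy trace context from an inbound request onto an outbound call's headers."""
    result = dict(outbound or {})
    # Header names are case-insensitive, so match on lowercased names.
    lowered = {name.lower(): value for name, value in inbound.items()}
    for name in TRACE_HEADERS:
        if name in lowered:
            result[name] = lowered[name]
    return result


incoming = {"X-B3-TraceId": "abc", "X-B3-SpanId": "def", "Accept": "text/html"}
print(forward_trace_headers(incoming, {"content-type": "application/json"}))
```

Only the trace headers are copied; unrelated inbound headers such as Accept are left behind.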