System Design: Microservices Communication — Sync vs Async, Service Mesh, Circuit Breaker, Retry, Timeout, Backpressure

How microservices communicate with each other determines the reliability, latency, and coupling of the overall system. This guide covers the communication patterns that production microservice architectures use — synchronous HTTP/gRPC, asynchronous messaging, service mesh, and resilience patterns like circuit breakers, retries, and backpressure. Essential knowledge for system design interviews and real-world architecture decisions.

Synchronous Communication: HTTP and gRPC

Synchronous communication: the caller waits for the callee to respond before proceeding.

HTTP REST: the standard for external-facing APIs. JSON payload, human-readable, well-supported by every language and framework. Latency: serialization + network round-trip + deserialization. JSON parsing overhead is non-trivial at high throughput.

gRPC: the standard for internal service-to-service communication at high-performance shops. Protocol Buffers (binary serialization) are 5-10x smaller and faster to serialize than JSON. HTTP/2 transport enables multiplexing (multiple requests on one TCP connection), header compression, and streaming. Code generation from .proto files provides type-safe client and server stubs in any language.

When to use synchronous: the caller needs the response to continue its work (e.g., the checkout service needs the payment service to confirm the charge before confirming the order).

Downside: temporal coupling — if the callee is slow or down, the caller is also slow or fails. All services in the synchronous call chain must be available simultaneously.
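To see why a binary, schema-driven encoding is so much more compact than JSON, compare the same record serialized both ways. This sketch uses Python's stdlib struct module as a stand-in for Protocol Buffers (it is not the protobuf wire format, and the record fields are made up): with a fixed binary layout, field names live in the schema rather than in every payload.

```python
import json
import struct

# A payment-style record, as one service might send to another (fields are illustrative).
record = {"order_id": 1234567, "amount_cents": 4999, "currency": "USD", "retries": 0}

# JSON: human-readable text; field names are repeated in every message.
json_bytes = json.dumps(record).encode("utf-8")

# Fixed binary layout (a stand-in for Protobuf-style encoding): only the
# values are transmitted. Layout: u32 order_id, u32 amount_cents,
# 3-byte currency code, u8 retry count. "!" = network byte order, no padding.
binary_bytes = struct.pack(
    "!II3sB",
    record["order_id"],
    record["amount_cents"],
    record["currency"].encode("ascii"),
    record["retries"],
)

print(len(json_bytes), len(binary_bytes))  # the binary form is several times smaller
```

Real Protocol Buffers adds field tags and varint encoding, but the size ratio is in the same ballpark, which is where the "5-10x smaller" figure comes from.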

Asynchronous Communication: Message Queues and Event Streams

Asynchronous communication: the producer sends a message and does not wait for a response. The consumer processes it later.

Message queues (RabbitMQ, Amazon SQS): point-to-point delivery. A message is consumed by exactly one consumer. Use for task distribution: a queue of image resize jobs consumed by a pool of workers. The queue provides load leveling — producers can spike without overwhelming consumers.

Event streams (Kafka): publish-subscribe. An event is delivered to all subscribed consumer groups. Use for event-driven architectures where multiple services need to react to the same event. Kafka retains events for replay.

When to use async: (1) The caller does not need an immediate response (an order confirmation email can be sent asynchronously). (2) Load leveling — absorb traffic spikes without scaling the backend. (3) Decoupling — services do not need to know about each other. (4) Reliability — messages persist in the queue even if the consumer is temporarily down.

Downside: complexity (message ordering, exactly-once processing, dead letter queues), debugging difficulty (request flows are not visible in a single call chain), and eventual consistency.
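The point-to-point vs publish-subscribe distinction can be made concrete with a toy in-memory sketch (class and group names here are illustrative, not any broker's API): a queue hands each message to exactly one consumer, while a topic fans each event out to every subscriber group.

```python
from collections import defaultdict, deque

class PointToPointQueue:
    """Queue semantics (RabbitMQ/SQS-style): each message reaches ONE consumer."""
    def __init__(self):
        self._messages = deque()

    def send(self, msg):
        self._messages.append(msg)

    def receive(self):
        return self._messages.popleft() if self._messages else None

class Topic:
    """Pub-sub semantics (Kafka-style): each event reaches EVERY subscriber group."""
    def __init__(self):
        self._groups = defaultdict(deque)

    def subscribe(self, group):
        self._groups[group]  # create the group's buffer if it doesn't exist

    def publish(self, event):
        for buffer in self._groups.values():
            buffer.append(event)

    def poll(self, group):
        buffer = self._groups[group]
        return buffer.popleft() if buffer else None

# Task distribution: two workers share one queue; each job is processed once.
jobs = PointToPointQueue()
jobs.send("resize image 1")
jobs.send("resize image 2")
worker_a, worker_b = jobs.receive(), jobs.receive()

# Event fan-out: billing and email both react to the same order event.
orders = Topic()
orders.subscribe("billing")
orders.subscribe("email")
orders.publish("order 42 placed")
```

Real brokers add durability, acknowledgements, and ordering guarantees on top of these semantics, but the routing behavior is the core difference.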

Circuit Breaker Pattern

A circuit breaker prevents cascading failures when a downstream service is unhealthy. Without it: if the payment service is down, the checkout service continues sending requests, exhausting its connection pool and thread pool, causing it to fail too — cascading to all upstream services.

Circuit breaker states:

(1) Closed (normal) — requests flow through. The circuit breaker monitors the error rate.

(2) Open (tripped) — when the error rate exceeds a threshold (e.g., 50% of recent requests failed), the circuit opens. All subsequent requests immediately fail with a fallback response (cached data, default value, or error) without contacting the downstream service. This gives the downstream service time to recover and prevents resource exhaustion on the caller.

(3) Half-open (probe) — after a timeout (e.g., 30 seconds), the circuit allows one request through. If it succeeds, the circuit closes (the service has recovered). If it fails, the circuit remains open for another timeout period.

Implementation: Resilience4j (Java), Polly (.NET), or Istio/Envoy at the service mesh layer. Envoy circuit breaking is transparent to the application — configured via an Istio DestinationRule.
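The three-state machine fits in a few dozen lines. This is a minimal sketch, not Resilience4j or Polly: it trips on a consecutive-failure count rather than a sliding-window error rate (for brevity), and the clock is injectable so the recovery timeout can be tested without sleeping.

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: closed -> open on repeated failures,
    open -> half-open after a recovery timeout, half-open -> closed on success."""

    def __init__(self, failure_threshold=5, recovery_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock            # injectable for testing
        self.state = "closed"
        self.failures = 0
        self.opened_at = None

    def call(self, fn, fallback):
        if self.state == "open":
            if self.clock() - self.opened_at >= self.recovery_timeout:
                self.state = "half-open"   # allow one probe request through
            else:
                return fallback()          # fail fast; don't touch the callee
        try:
            result = fn()
        except Exception:
            self._on_failure()
            return fallback()
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"            # trip (or re-trip after a failed probe)
            self.opened_at = self.clock()

    def _on_success(self):
        self.state = "closed"
        self.failures = 0
```

The key property: once open, `call` returns the fallback immediately, so the caller's threads and connections are not tied up waiting on a dead dependency.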

Retry and Timeout Strategies

Retries handle transient failures (network blips, temporary service unavailability).

Retry strategies:

(1) Exponential backoff — wait 100ms, 200ms, 400ms, 800ms between retries. Prevents a thundering herd (all clients retrying simultaneously).

(2) Jitter — add random jitter to the backoff: wait = base_delay * 2^attempt + random(0, base_delay). Without jitter, synchronized retries create periodic load spikes.

(3) Retry budget — limit retries to, e.g., 20% of total requests. If 1,000 requests are sent per second, at most 200 can be retries. This prevents retry amplification (a 50% failure rate with unlimited retries doubles the load on the failing service).

Timeouts: every synchronous call must have a timeout. Without timeouts, a slow downstream service consumes caller resources indefinitely.

Timeout hierarchy: connection timeout (500ms — how long to wait for a TCP connection), request timeout (2-5 seconds — how long to wait for a response), and overall operation timeout (10-30 seconds — how long the user-facing request has to complete, including all downstream calls). Set timeouts at each layer. The downstream timeouts should sum to less than the overall timeout, so the service can still return a graceful, degraded response before its own deadline expires.
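The backoff formula and the retry budget above can be sketched directly (the class and function names are illustrative, not any library's API):

```python
import random

def backoff_with_jitter(attempt, base_delay=0.1, max_delay=10.0):
    """wait = base_delay * 2^attempt + random(0, base_delay), capped at max_delay.
    attempt=0..3 with base_delay=0.1 gives ~100ms, 200ms, 400ms, 800ms plus jitter."""
    return min(max_delay, base_delay * 2 ** attempt + random.uniform(0, base_delay))

class RetryBudget:
    """Allow retries only while they stay under a fixed fraction of all requests."""
    def __init__(self, ratio=0.2):
        self.ratio = ratio
        self.requests = 0   # total requests sent (including retries)
        self.retries = 0    # retries among them

    def record_request(self):
        self.requests += 1

    def can_retry(self):
        return self.retries < self.ratio * self.requests

    def record_retry(self):
        self.retries += 1
        self.requests += 1  # a retry is also a request on the wire

# Example: the first four backoff delays, before jitter, double each time.
delays = [backoff_with_jitter(a) for a in range(4)]
```

With `ratio=0.2`, a downstream outage can increase load on the failing service by at most 20%, instead of the unbounded amplification that naive retries produce.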

Service Mesh: Istio and Envoy

A service mesh provides communication infrastructure (traffic management, security, observability) without application code changes.

Architecture: a sidecar proxy (Envoy) runs alongside each application container. All inbound and outbound traffic flows through the sidecar. The control plane (Istio, Linkerd) configures the sidecars.

Capabilities:

(1) Traffic management — circuit breaking, retries, timeouts, rate limiting, canary routing (send 5% of traffic to v2), and header-based routing. Configured declaratively via Istio VirtualService and DestinationRule CRDs.

(2) Security — mutual TLS (mTLS) between all services, automatically. The sidecar handles certificate provisioning, rotation, and the TLS handshake. Zero application code changes. Authorization policies control which services can communicate.

(3) Observability — Envoy automatically generates metrics (request rate, error rate, latency) for every service-to-service call. Distributed tracing headers are propagated automatically. Service dependency graphs are generated from traffic data.

Trade-off: a service mesh adds latency (1-3ms per hop through the sidecar) and operational complexity (Istio control plane management, debugging proxy issues).
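As a sketch of what the declarative configuration looks like, here is roughly how a DestinationRule might enable Envoy-level circuit breaking for a payments service. The resource name, host, and thresholds are made up for illustration; consult the Istio reference for the exact fields in your version.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: payments-circuit-breaker        # hypothetical name
spec:
  host: payments.prod.svc.cluster.local # hypothetical service host
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100             # cap concurrent TCP connections
      http:
        http1MaxPendingRequests: 50     # cap queued requests
        maxRequestsPerConnection: 10
    outlierDetection:                   # Envoy's passive health checking
      consecutive5xxErrors: 5           # eject a host after 5 consecutive 5xx responses
      interval: 10s                     # how often hosts are evaluated
      baseEjectionTime: 30s             # how long an ejected host stays out
      maxEjectionPercent: 50            # never eject more than half the hosts
```

The application code never sees this policy: the sidecar enforces connection limits and ejects unhealthy backends on its own, which is what "transparent to the application" means in practice.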

Backpressure and Load Shedding

Backpressure: when a consumer cannot keep up with the producer, it signals the producer to slow down. Without backpressure, the consumer queue grows without bound, leading to out-of-memory failures or extreme latency.

Backpressure mechanisms:

(1) Bounded queues — set a maximum queue size. When the queue is full, the producer is blocked or the message is rejected. The producer either slows down or drops low-priority messages.

(2) Rate limiting — the consumer advertises its processing rate. The producer sends at most that rate.

(3) Reactive streams (Project Reactor, RxJava) — the consumer requests N items at a time. The producer sends at most N items, then waits for the consumer to request more.

Load shedding: when a service is overloaded, deliberately reject low-priority requests to protect high-priority ones. Implementation: categorize requests by priority (health checks > paid users > free users). When CPU exceeds 80% or queue depth exceeds a threshold, reject the lowest-priority requests with 503 Service Unavailable and a Retry-After header. Shed load early (at the load balancer or API gateway) rather than deep in the service stack, where resources are already consumed.
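Both ideas can be sketched with the standard library. The bounded queue shows backpressure mechanism (1): a full buffer rejects the producer's message instead of growing without bound. The `should_shed` helper (a hypothetical function; the priority classes and 80% CPU threshold follow the text above) shows priority-based load shedding.

```python
import queue

# Backpressure via a bounded queue: when the buffer is full, put_nowait
# raises queue.Full and the producer must slow down or drop the message.
jobs = queue.Queue(maxsize=2)
jobs.put_nowait("job-1")
jobs.put_nowait("job-2")
try:
    jobs.put_nowait("job-3")   # buffer full: this is the backpressure signal
    accepted = True
except queue.Full:
    accepted = False

# Load shedding by priority: under overload, reject the lowest-priority
# class first (health checks > paid users > free users, per the text above).
PRIORITY = {"health_check": 0, "paid_user": 1, "free_user": 2}  # lower = more important

def should_shed(request_class, cpu_utilization, cpu_threshold=0.8):
    """Return True if this request should be rejected with 503 + Retry-After."""
    if cpu_utilization < cpu_threshold:
        return False               # not overloaded: serve everything
    # Overloaded: shed only the lowest-priority class.
    return PRIORITY[request_class] == max(PRIORITY.values())
```

A production shedder would shed progressively (more classes as overload deepens) and measure overload from queue depth as well as CPU, but the shape is the same: an explicit, early rejection decision instead of silent queue growth.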
