Microservices decompose an application into independently deployable services. The communication layer between those services is where distributed systems complexity lives: latency, partial failures, message ordering, and schema evolution. Choosing the right communication pattern for each interaction is one of the most consequential low-level design decisions in a microservices architecture.
Synchronous Communication: REST vs gRPC
Synchronous communication means the caller blocks until it receives a response. Two dominant protocols:
- REST over HTTP/1.1: simple, browser-native, human-readable JSON. Mature tooling, easy to debug with curl. Verbose wire format. No formal contract enforcement without OpenAPI.
- gRPC over HTTP/2: strongly typed via Protocol Buffers, binary encoding (smaller payloads, faster serialization), supports bidirectional streaming, lower latency at scale. Requires code generation. Harder to debug without tooling. Preferred for internal service-to-service calls where performance matters.
Both protocols have the same fundamental tradeoff: the caller is blocked waiting for a response. If the downstream service is slow or unavailable, the caller stalls. This makes synchronous communication vulnerable to cascading failures — a slow dependency can exhaust caller thread pools and propagate the outage upstream.
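To make the blocking explicit, here is a minimal Python sketch using the requests library; the endpoint URL and timeout value are illustrative. The timeout bounds how long a slow dependency can hold the caller's thread:

```python
import requests

ORDER_SERVICE_URL = "http://orders.internal/api/orders/42"  # hypothetical endpoint

def fetch_order():
    try:
        # The caller's thread blocks here until the response arrives.
        # The timeout bounds how long a slow downstream can hold the thread;
        # without it, a stalled dependency ties the thread up indefinitely.
        resp = requests.get(ORDER_SERVICE_URL, timeout=2.0)
        resp.raise_for_status()
        return resp.json()
    except requests.Timeout:
        # Fail fast instead of letting blocked threads pile up and
        # propagate the outage upstream.
        return None
```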
Asynchronous Messaging
In asynchronous messaging, the producer publishes a message to a queue or topic and returns immediately without waiting for processing. The consumer processes the message independently. This decouples services in time — the producer and consumer do not need to be running simultaneously. Messaging systems (Kafka, RabbitMQ, SQS) act as a buffer that absorbs traffic bursts: if the consumer is slow, messages accumulate in the queue rather than exerting backpressure on the producer. The tradeoff is increased complexity: eventual consistency, message ordering concerns, and the need for idempotent consumers.
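A minimal producer sketch using the pika client for RabbitMQ (broker address, queue name, and payload are illustrative). The publish returns as soon as the broker accepts the message, whether or not any consumer is running:

```python
import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="order-events", durable=True)

# basic_publish returns as soon as the broker accepts the message; no
# consumer has to be running, or even exist yet.
channel.basic_publish(
    exchange="",
    routing_key="order-events",
    body=json.dumps({"event": "OrderPlaced", "order_id": 42}),
    properties=pika.BasicProperties(delivery_mode=2),  # persist to disk
)
connection.close()
```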
Point-to-Point Queue
In a point-to-point queue, each message is consumed by exactly one consumer. Multiple consumer instances can read from the same queue — the broker distributes messages across them (competing consumers / worker pool pattern). This provides natural load distribution and horizontal scaling: add more worker instances to increase throughput. Use cases: order processing, image resizing, email sending — any task that should be handled by exactly one worker. The consumer acknowledges the message after successful processing; unacknowledged messages are redelivered, making at-least-once delivery the default guarantee.
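A competing-consumer worker sketch, again with pika; resize_image is a hypothetical task function. Running several copies of this script gives the worker pool:

```python
import pika

def resize_image(payload: bytes) -> None:
    ...  # placeholder for the actual resizing work

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="image-resize", durable=True)
# Hand each worker at most one unacknowledged message at a time, so the
# broker spreads work evenly across all running instances of this script.
channel.basic_qos(prefetch_count=1)

def handle(ch, method, properties, body):
    resize_image(body)
    # Ack only after successful processing; if the worker crashes first,
    # the broker redelivers the message to another worker (at-least-once).
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="image-resize", on_message_callback=handle)
channel.start_consuming()
```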
Pub/Sub Topic
In publish/subscribe, a message published to a topic is delivered to all subscribing consumer groups. Each consumer group receives its own copy and processes it independently. This enables event fan-out: a single OrderPlaced event triggers the inventory service, the notification service, and the analytics service simultaneously without the order service knowing about any of them. Kafka implements this as consumer groups with separate committed offsets per group. SNS/SQS fan-out uses SNS topics with SQS queue subscriptions per consumer. Pub/sub is the foundation of event-driven architectures.
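A consumer-group sketch with the kafka-python client; topic, group, and handler names are illustrative. Each service subscribes under its own group_id and so receives its own copy of the stream:

```python
from kafka import KafkaConsumer

def handle_order_placed(value: bytes) -> None:
    ...  # placeholder for the inventory service's reaction

# Subscribing under a distinct group_id gives this service its own copy of
# the stream and its own committed offset; notification-service and
# analytics-service would subscribe with their own group ids.
consumer = KafkaConsumer(
    "order-events",
    group_id="inventory-service",
    bootstrap_servers="localhost:9092",
)
for message in consumer:
    handle_order_placed(message.value)
```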
Request-Reply Over Messaging
Some workflows require a response but can tolerate asynchronous delivery. The request-reply pattern over messaging achieves this: the requester publishes a message with a correlation_id and a reply_to queue address in the headers. The responder processes the request and publishes the result to the reply-to queue with the same correlation_id. The requester matches the response by correlation ID. This preserves request-response semantics while decoupling services through the message broker. It enables the requester to do other work while waiting (non-blocking I/O pattern) and supports timeouts without holding a synchronous connection open.
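A requester-side sketch of this pattern with pika; the pricing-requests queue is hypothetical. The reply queue and correlation ID are exactly the headers described above:

```python
import uuid
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# Exclusive, broker-named queue that receives replies for this requester only.
reply_queue = channel.queue_declare(queue="", exclusive=True).method.queue
corr_id = str(uuid.uuid4())

channel.basic_publish(
    exchange="",
    routing_key="pricing-requests",  # hypothetical responder queue
    properties=pika.BasicProperties(reply_to=reply_queue, correlation_id=corr_id),
    body=b'{"sku": "ABC-123"}',
)

def on_reply(ch, method, properties, body):
    # Match the response to the request by correlation ID; unrelated or
    # late replies are ignored.
    if properties.correlation_id == corr_id:
        print("reply:", body)
        ch.stop_consuming()

channel.basic_consume(queue=reply_queue, on_message_callback=on_reply, auto_ack=True)
channel.start_consuming()
```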
Event-Driven Choreography
In choreography, services react to domain events published by other services — there is no central coordinator. When the order service publishes OrderPlaced, the inventory service consumes it and publishes InventoryReserved, which the shipping service consumes, and so on. Services are maximally decoupled — they only know about the event contract, not about each other. The tradeoff: the overall business process is implicit and distributed across services. Debugging a failed saga requires correlating logs across multiple services. Adding a new step in the flow requires modifying producer or consumer services.
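A choreography sketch, assuming kafka-python and hypothetical topic names: the inventory service reacts to OrderPlaced and emits InventoryReserved without referencing any other service:

```python
import json
from kafka import KafkaConsumer, KafkaProducer

def reserve_stock(order_id: int) -> None:
    ...  # placeholder for the local inventory update

consumer = KafkaConsumer("order-events", group_id="inventory-service",
                         bootstrap_servers="localhost:9092")
producer = KafkaProducer(bootstrap_servers="localhost:9092")

for message in consumer:
    event = json.loads(message.value)
    if event.get("type") == "OrderPlaced":
        reserve_stock(event["order_id"])
        # Publish the next domain event; whoever consumes InventoryReserved
        # (e.g. shipping) is unknown to this service.
        producer.send("inventory-events", json.dumps(
            {"type": "InventoryReserved", "order_id": event["order_id"]}
        ).encode())
```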
Orchestration and Saga Pattern
In orchestration, a central coordinator (saga orchestrator or workflow engine like Temporal, AWS Step Functions) explicitly directs the sequence of service calls. The orchestrator knows the full business process, calls each service, handles failures, and executes compensating transactions on rollback. This makes the distributed transaction flow traceable and auditable — the full state is in one place. The tradeoff: the orchestrator is a single point of coordination (though not a single point of failure if made durable). Orchestration is preferred when the workflow has complex branching, requires compensation logic, or needs strong observability.
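A toy orchestrator sketch in plain Python; the payments/inventory/shipping clients are hypothetical. Real workflow engines add durable state and retries, but the action/compensation structure is the same:

```python
class OrderSaga:
    """Toy orchestrator: each step pairs an action with its compensation."""

    def __init__(self, payments, inventory, shipping):
        self.steps = [
            (payments.charge,   payments.refund),
            (inventory.reserve, inventory.release),
            (shipping.schedule, shipping.cancel),
        ]

    def run(self, order):
        completed = []
        for action, compensate in self.steps:
            try:
                action(order)
                completed.append(compensate)
            except Exception:
                # A step failed: undo every completed step in reverse order
                # (the compensating transactions), then surface the failure.
                for undo in reversed(completed):
                    undo(order)
                raise
```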
Idempotency for Async Processing
Most message brokers guarantee at-least-once delivery by default — a message may be delivered more than once due to network retries, consumer restarts, or rebalancing. Consumers must be idempotent: processing the same message twice must produce the same result as processing it once. Implementation strategies: store a processed event_id in a database with a unique constraint — if the insert succeeds, process; if it fails (duplicate), skip. For financial operations, use idempotency keys. Design database operations as upserts rather than inserts. Idempotency is non-negotiable in any production async messaging system.
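A minimal idempotent-consumer sketch using SQLite's primary-key constraint as the unique-constraint table described above; apply_business_logic is a placeholder:

```python
import sqlite3

db = sqlite3.connect("consumer.db")
db.execute("CREATE TABLE IF NOT EXISTS processed (event_id TEXT PRIMARY KEY)")

def apply_business_logic(payload: dict) -> None:
    ...  # placeholder for the actual state change

def handle_once(event_id: str, payload: dict) -> None:
    try:
        # The primary-key constraint makes the insert fail on a duplicate,
        # turning "have we seen this event?" into one atomic write. If the
        # business logic raises, the transaction rolls back and the event
        # can be safely retried.
        with db:
            db.execute("INSERT INTO processed (event_id) VALUES (?)", (event_id,))
            apply_business_logic(payload)
    except sqlite3.IntegrityError:
        pass  # duplicate delivery: already processed, skip
```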
Service Mesh
A service mesh (Istio, Linkerd, Envoy-based) deploys a sidecar proxy alongside each service instance (Envoy in Istio's case; Linkerd ships its own lightweight Rust proxy). The sidecar intercepts all inbound and outbound network traffic and handles cross-cutting concerns at the infrastructure level — without any application code changes:
- Automatic retries and timeouts with configurable policies per route.
- Circuit breaking — stop sending requests to a failing upstream, preventing cascading failures (a toy version is sketched after this list).
- Mutual TLS (mTLS) — all service-to-service communication is encrypted and mutually authenticated automatically.
- Distributed tracing — propagate trace headers (B3, W3C TraceContext) and export spans to Jaeger or Zipkin.
- Traffic management — canary deployments, A/B testing, traffic mirroring via control plane config.
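These policies live in the sidecar, not in application code; purely to illustrate the circuit-breaking behavior, here is a toy in-process Python version (thresholds are illustrative):

```python
import time

class CircuitBreaker:
    """Toy per-upstream breaker; a mesh applies this policy in the proxy."""

    def __init__(self, max_failures=5, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after  # seconds the circuit stays open
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: let one trial request through
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # stop sending traffic
            raise
        self.failures = 0  # a success closes the circuit
        return result
```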
API Composition and Gateway Aggregation
An API gateway aggregates multiple downstream service calls into a single client-facing response — the scatter-gather pattern (fan out, then fan in). The gateway fans out requests to N services in parallel, waits for all responses, merges results, and returns a single payload. This reduces client-side complexity and round trips (critical on mobile). The gateway must handle partial failures: if one upstream service fails, it can return a partial response, use a cached value, or apply a fallback. GraphQL federation is a structured form of API composition where each service owns a subgraph and the gateway stitches them into a unified schema at query time.
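A scatter-gather sketch using asyncio and httpx; the upstream URLs are hypothetical. One failed upstream degrades its section of the payload instead of failing the whole response:

```python
import asyncio
import httpx

# Hypothetical downstream endpoints aggregated into one client response.
UPSTREAMS = {
    "profile": "http://users.internal/profile/42",
    "orders":  "http://orders.internal/by-user/42",
    "recs":    "http://recs.internal/for-user/42",
}

async def compose_view() -> dict:
    async with httpx.AsyncClient(timeout=1.5) as client:
        # Fan out in parallel; return_exceptions=True keeps one failed
        # upstream from aborting the whole aggregation.
        results = await asyncio.gather(
            *(client.get(url) for url in UPSTREAMS.values()),
            return_exceptions=True,
        )
    merged = {}
    for name, result in zip(UPSTREAMS, results):
        if isinstance(result, Exception) or result.status_code != 200:
            merged[name] = None  # partial failure: fall back or omit
        else:
            merged[name] = result.json()
    return merged

print(asyncio.run(compose_view()))
```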
Microservice API Versioning
As services evolve independently, API versioning prevents breaking consumers. Three main strategies:
- URL versioning: /v1/orders, /v2/orders. Simple, explicit, easy to route at the gateway. Requires clients to update base URLs on major changes.
- Header versioning: Accept: application/vnd.myapi.v2+json. Keeps URLs clean but is less visible and harder to test in a browser.
- Backward-compatible evolution: for minor versions, add optional fields only (never remove or rename required fields). Consumers that ignore unknown fields (tolerant reader pattern) are unaffected; a minimal sketch follows this list. Semantic versioning guides when a breaking change warrants a new major version vs a backward-compatible addition.
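A tolerant-reader sketch in Python; the Order fields are hypothetical. The consumer extracts only the fields it knows, so new optional fields pass through harmlessly:

```python
from dataclasses import dataclass

@dataclass
class Order:
    order_id: str
    total_cents: int

def parse_order(payload: dict) -> Order:
    # Tolerant reader: extract only the fields this consumer understands
    # and ignore the rest, so producers can add optional fields (a minor,
    # backward-compatible change) without breaking us.
    return Order(order_id=payload["order_id"],
                 total_cents=payload["total_cents"])

# A newer payload with extra fields still parses cleanly:
print(parse_order({"order_id": "A1", "total_cents": 995, "gift_wrap": True}))
```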