Low Level Design: Microservices Communication Patterns

Microservices must communicate with each other to fulfill requests. Choosing the right communication pattern — synchronous vs. asynchronous, REST vs. gRPC vs. message queues — determines system latency, coupling, fault tolerance, and scalability. Tight synchronous chains create cascading failures; loose asynchronous messaging adds complexity but buys resilience. Netflix, Uber, and Airbnb have each made deliberate communication pattern choices based on their specific reliability and latency requirements. Understanding these tradeoffs is essential for microservices architecture design interviews.

Synchronous Communication: REST and gRPC

Synchronous communication: the caller waits for the response before continuing. REST over HTTP/1.1: simple, universal, human-readable JSON, works through all proxies and firewalls. Latency overhead: verbose text headers on every request, and one in-flight request per connection (head-of-line blocking), so concurrency requires many TCP connections. HTTP/2 multiplexing reduces both costs significantly. gRPC over HTTP/2: uses Protocol Buffers (binary serialization — 3-10x smaller than JSON, 2-5x faster to parse), HTTP/2 streaming (server-side, client-side, and bidirectional streaming), and strongly-typed contracts (proto files generate client/server code). Preferred for internal service-to-service communication where performance matters. gRPC disadvantages: not human-readable (you need grpcurl or similar tooling for debugging), browser support requires a gRPC-Web proxy, and proto schema evolution requires care (backward-compatibility rules). REST advantages: universal tooling, easier caching (HTTP verbs and status codes map naturally onto cache semantics), easier external API exposure.

// Service-to-service patterns

// 1. Direct REST call (synchronous, tight coupling)
// Service A blocks waiting for Service B response
// If B is slow/down, A hangs or times out

// 2. gRPC with deadline propagation
ctx, cancel := context.WithTimeout(ctx, 500*time.Millisecond)
defer cancel()
resp, err := userServiceClient.GetUser(ctx, &pb.GetUserRequest{UserId: uid})
// Deadline propagates to downstream calls -- bounded latency chain

// 3. Async via message queue (loose coupling)
// Service A publishes event and returns immediately
producer.Publish("order.created", OrderCreatedEvent{OrderID: id, UserID: uid})
// Service B subscribes and processes independently

// 4. Saga choreography (distributed transaction without 2PC)
// Each service publishes events that trigger the next step
// Order service:     publishes OrderCreated
// Payment service:   receives OrderCreated -> charges card -> publishes PaymentCompleted
// Inventory service: receives PaymentCompleted -> reserves items -> publishes ItemsReserved
// Shipping service:  receives ItemsReserved -> creates shipment

// 5. Request-reply over queue (async with correlation ID)
replyQueue := "order-service-reply-" + uuid.New().String()
producer.Publish("get-user-request", GetUserRequest{UserID: uid, ReplyTo: replyQueue})
response := consumer.WaitForMessage(replyQueue, 500*time.Millisecond)

Asynchronous Communication: Event-Driven Architecture

Asynchronous communication: the caller publishes a message and continues without waiting. Benefits: temporal decoupling (publisher and subscriber do not need to be available simultaneously), load leveling (subscriber processes at its own rate, buffer absorbs spikes), resilience (if a subscriber crashes, the message is retained in the queue for retry). Trade-offs: eventual consistency (subscriber may process the event seconds to minutes after publishing), more complex debugging (no synchronous call stack), ordering challenges (messages may arrive out of order with multiple consumers). Event-driven design: services publish events representing state changes (OrderPlaced, PaymentFailed, UserSignedUp); other services subscribe to relevant events and react. This eliminates direct service-to-service dependencies — publisher does not know who consumes its events.
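The fan-out described above can be sketched with a minimal in-memory event bus (all types here are hypothetical illustrations; a real system would use Kafka, RabbitMQ, or NATS, with durable queues and network transport):

```go
package main

import "fmt"

// Event is a minimal event envelope; real systems add IDs, timestamps, schemas.
type Event struct {
	Type    string
	Payload string
}

// Bus fans each published event out to every subscriber of its type.
type Bus struct {
	subscribers map[string][]func(Event)
}

func NewBus() *Bus {
	return &Bus{subscribers: make(map[string][]func(Event))}
}

func (b *Bus) Subscribe(eventType string, handler func(Event)) {
	b.subscribers[eventType] = append(b.subscribers[eventType], handler)
}

// Publish delivers to all subscribers; the publisher never knows who they are.
func (b *Bus) Publish(e Event) {
	for _, h := range b.subscribers[e.Type] {
		h(e)
	}
}

func main() {
	bus := NewBus()
	// Two independent services react to the same OrderPlaced event.
	bus.Subscribe("OrderPlaced", func(e Event) {
		fmt.Println("email service: send confirmation for", e.Payload)
	})
	bus.Subscribe("OrderPlaced", func(e Event) {
		fmt.Println("analytics service: record", e.Payload)
	})
	bus.Publish(Event{Type: "OrderPlaced", Payload: "order-42"})
}
```

Note that removing or adding a subscriber never touches the publisher — that is the decoupling the pattern buys.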

Choosing Between Synchronous and Asynchronous

Use synchronous (REST/gRPC) when: the response is needed before the operation can complete (user login requires immediate auth result), the operation is read-heavy (fetching data), latency is critical, or the client is external (public API). Use asynchronous (queue/event) when: the operation can be completed independently (sending a welcome email after registration), high throughput is needed (order processing pipeline), the operation may be slow (video encoding), or you need fan-out (one event consumed by multiple services). Hybrid patterns: start synchronously (validate and accept the request, return a 202 Accepted with a job ID), process asynchronously (heavy work in background), client polls or receives a webhook when complete. This is the preferred pattern for long-running operations (image processing, ML inference, report generation).

Key Interview Discussion Points

  • Retry and idempotency: asynchronous message processing must be idempotent (processing the same message twice produces the same result) because at-least-once delivery means duplicates are possible; use idempotency keys (message IDs) to deduplicate
  • Timeout and circuit breaker: synchronous service chains require timeouts at every hop, and timeouts stack across sequential calls (three hops with 5s timeouts each yield a 15s worst case, which is unacceptable at the edge); circuit breakers prevent cascading failures by short-circuiting calls to services that are already failing
  • Service discovery: services need to find each other dynamically (not hardcoded IPs); Consul, Kubernetes DNS, and Eureka provide service discovery; client-side discovery (the calling service queries the registry directly) vs. server-side discovery (a load balancer queries the registry on the caller's behalf)
  • Schema evolution: REST APIs version with /v1/ vs /v2/ URLs or Accept headers; gRPC schemas evolve with backward-compatible field additions (never renumber or remove fields); event schemas evolve with schema registries (Confluent Schema Registry for Kafka Avro)
  • Outbox pattern: to reliably publish an event after a database transaction, write the event to an outbox table in the same transaction, then a separate process (a poller or Debezium CDC) reads the outbox and publishes to the message broker — the event is published at least once even if the publisher crashes, and consumers deduplicate by event ID
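The idempotent processing required by at-least-once delivery (first bullet above) can be sketched as a consumer that dedupes by message ID (a minimal in-memory sketch; in production the seen-set lives in Redis or a database, with a TTL):

```go
package main

import (
	"fmt"
	"sync"
)

// Message carries an ID so duplicates can be detected under at-least-once delivery.
type Message struct {
	ID   string
	Body string
}

// Processor skips any message whose ID it has already processed.
type Processor struct {
	mu   sync.Mutex
	seen map[string]bool
}

func NewProcessor() *Processor {
	return &Processor{seen: make(map[string]bool)}
}

// Handle returns true if the message was processed, false if it was a duplicate.
func (p *Processor) Handle(m Message) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.seen[m.ID] {
		return false // at-least-once delivery produced a duplicate: skip it
	}
	p.seen[m.ID] = true
	fmt.Println("processing:", m.Body)
	return true
}

func main() {
	p := NewProcessor()
	m := Message{ID: "msg-1", Body: "charge order 42"}
	p.Handle(m) // first delivery: processed
	p.Handle(m) // redelivery: skipped, the card is not charged twice
}
```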
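The outbox write described in the last bullet can be sketched as follows. The `Tx` interface, table schemas, and `fakeTx` are illustrative assumptions so the sketch runs without a real database; in production this would be a `database/sql` transaction against your relational store:

```go
package main

import (
	"errors"
	"fmt"
)

// Tx abstracts the three operations the pattern needs.
type Tx interface {
	Exec(query string, args ...any) error
	Commit() error
	Rollback() error
}

// CreateOrder writes the business row and the outbox event in ONE transaction:
// either both commit or neither does, closing the dual-write window.
func CreateOrder(tx Tx, orderID, userID string) error {
	if err := tx.Exec(
		"INSERT INTO orders (id, user_id, status) VALUES ($1, $2, 'created')",
		orderID, userID,
	); err != nil {
		tx.Rollback()
		return err
	}
	if err := tx.Exec(
		"INSERT INTO outbox (aggregate_id, event_type, payload, published) VALUES ($1, 'OrderCreated', $2, false)",
		orderID, fmt.Sprintf(`{"order_id":%q,"user_id":%q}`, orderID, userID),
	); err != nil {
		tx.Rollback()
		return err
	}
	// A separate relay (Debezium CDC or a poller) reads unpublished outbox rows,
	// publishes them to the broker, and marks them published.
	return tx.Commit()
}

// fakeTx records executed statements so the sketch is demonstrable in memory.
type fakeTx struct {
	statements []string
	committed  bool
}

func (t *fakeTx) Exec(query string, args ...any) error {
	t.statements = append(t.statements, query)
	return nil
}
func (t *fakeTx) Commit() error { t.committed = true; return nil }
func (t *fakeTx) Rollback() error {
	if t.committed {
		return errors.New("already committed")
	}
	return nil
}

func main() {
	tx := &fakeTx{}
	if err := CreateOrder(tx, "order-42", "user-7"); err != nil {
		panic(err)
	}
	fmt.Println(len(tx.statements), tx.committed) // 2 true
}
```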