System Design Interview: Microservices Architecture Patterns

Monolith vs Microservices

A monolith deploys all application functionality as one unit. Simple to develop initially, but becomes harder to scale as the codebase grows: one component failure can bring down the whole system, teams conflict on shared code, and scaling requires replicating everything. Microservices decompose the system into small, independently deployable services, each owning its own data. Benefits: independent scaling, technology diversity, team autonomy, fault isolation. Costs: distributed systems complexity, network overhead, operational burden (dozens of services to monitor and deploy), and the dreaded “distributed monolith” — microservices that are so tightly coupled they must be deployed together and fail together.

Service Decomposition Strategies

How do you decide what to make a service? (1) Decompose by business capability: align services with business functions — order service, payment service, inventory service, notification service. Each service represents a bounded context in domain-driven design (DDD). (2) Decompose by subdomain: identify the core domain (your competitive advantage — e.g., a recommendation algorithm), supporting domains (necessary but not differentiating — email sending), and generic domains (solved by third parties — payment processing). Build services for the core domain; buy or use SaaS for generic domains. (3) Strangler Fig pattern: migrate a monolith to microservices gradually by routing individual endpoints to new services while the monolith handles the rest. Never do a "big bang" rewrite — incrementally strangle the monolith.
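The Strangler Fig routing layer can be sketched as a small façade that consults a table of migrated path prefixes (all service names and URLs below are illustrative, not from any particular system):

```python
# Minimal strangler-fig façade: route migrated path prefixes to new
# services; everything else still goes to the legacy monolith.
MIGRATED_PREFIXES = {
    "/orders": "http://order-service",      # already extracted
    "/payments": "http://payment-service",  # already extracted
}
MONOLITH = "http://legacy-monolith"

def route(path: str) -> str:
    """Return the upstream base URL that should handle this request path."""
    for prefix, upstream in MIGRATED_PREFIXES.items():
        if path.startswith(prefix):
            return upstream
    return MONOLITH
```

As each capability is extracted, its prefix moves into the table; when the table covers everything, the monolith entry disappears. In production this table usually lives in an API gateway or reverse proxy config rather than application code.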

Saga Pattern: Distributed Transactions

Microservices cannot use database ACID transactions across service boundaries — each service has its own database. The Saga pattern manages multi-step distributed transactions with compensating actions for failures. Two implementations: (1) Choreography saga: services emit events; other services react to those events and emit their own. Decoupled but hard to visualize. (2) Orchestration saga: a central saga orchestrator calls each service in sequence and handles failures by calling compensating actions. Example — Order Placement Saga: (1) Create order (PENDING). (2) Reserve inventory. (3) Process payment. If payment fails: compensate by releasing inventory, then cancel order. The saga orchestrator retries steps on transient failures and executes compensation on permanent failures.


# Orchestration saga state machine. The order is assumed to already
# exist in PENDING state (step 1 of the saga) before execute() runs.
class OrderSaga:
    def execute(self, order):
        # Step 1: reserve inventory
        reservation = inventory_service.reserve(order.items)
        if not reservation.success:
            order_service.cancel(order.id, "INVENTORY_UNAVAILABLE")
            return

        # Step 2: charge payment
        payment = payment_service.charge(order.amount, order.customer_id)
        if not payment.success:
            inventory_service.release(reservation.id)  # compensate
            order_service.cancel(order.id, "PAYMENT_FAILED")
            return

        # Step 3: confirm order
        order_service.confirm(order.id, payment.id)
        notification_service.send_confirmation(order.customer_id)
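The choreography variant replaces the orchestrator with services reacting to each other's events. A minimal sketch using an in-memory event bus (event names and handler functions are illustrative; a real system would use Kafka or similar):

```python
# Choreography saga sketch: each service subscribes to the event that
# triggers it and publishes its own event on completion. No central
# coordinator knows the full flow.
from collections import defaultdict

class EventBus:
    def __init__(self):
        self.handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self.handlers[event_type]:
            handler(payload)

bus = EventBus()
log = []  # records the order in which services react

def reserve_inventory(order):       # inventory service: reacts to OrderPlaced
    log.append("inventory_reserved")
    bus.publish("InventoryReserved", order)

def charge_payment(order):          # payment service: reacts to InventoryReserved
    log.append("payment_charged")
    bus.publish("PaymentCharged", order)

def confirm_order(order):           # order service: reacts to PaymentCharged
    log.append("order_confirmed")

bus.subscribe("OrderPlaced", reserve_inventory)
bus.subscribe("InventoryReserved", charge_payment)
bus.subscribe("PaymentCharged", confirm_order)

bus.publish("OrderPlaced", {"order_id": 1})
# log is now ["inventory_reserved", "payment_charged", "order_confirmed"]
```

Notice that the happy path emerges from three independent subscriptions — no single place documents it, which is exactly why choreography gets hard to trace as sagas grow.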

CQRS: Command Query Responsibility Segregation

CQRS separates the write model (commands: create order, update inventory) from the read model (queries: get order details, list orders for customer). Why: write models are normalized for consistency; read models are denormalized for query efficiency. A write to the command side publishes an event; an event handler updates the read model. Example: writing an order creates an event “OrderPlaced”. An event handler projects this into a denormalized “order_summary” read model optimized for the order list view. Read queries hit the denormalized model — fast, no joins. The read model can be rebuilt from scratch by replaying all historical events. CQRS adds complexity and eventual consistency — only use it where query and write patterns truly diverge.
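The projection step above can be sketched as an event handler that flattens an OrderPlaced event into the denormalized read model (the dict stands in for a real read store such as a flat SQL table; field names are illustrative):

```python
# CQRS projection sketch: the write side emits OrderPlaced events; this
# handler projects each one into a denormalized order_summary read model.
order_summaries = {}  # read model: order_id -> flat summary row

def handle_order_placed(event):
    """Project an OrderPlaced event into the order_summary read model."""
    order_summaries[event["order_id"]] = {
        "customer": event["customer_name"],  # denormalized: no join at read time
        "total": sum(i["price"] * i["qty"] for i in event["items"]),
        "status": "PLACED",
    }

handle_order_placed({
    "order_id": 42,
    "customer_name": "Ada",
    "items": [{"price": 10.0, "qty": 2}, {"price": 5.0, "qty": 1}],
})
```

Rebuilding the read model is then just replaying every historical event through this handler against an empty store.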

Bulkhead Pattern

The bulkhead pattern isolates failure domains so a failure in one part of the system does not cascade. Named after ship bulkheads that compartmentalize flooding. Implementation: separate thread pools for calls to different downstream services. Service A calls services B, C, and D. If service D becomes slow, the thread pool for D fills up — but thread pools for B and C remain available. Without bulkheads, all threads in the shared pool fill up waiting for slow D, and calls to B and C also fail even though B and C are healthy. In practice: configure a separate HTTP client with its own connection pool, timeout, and retry budget for each downstream service. Resilience4j (Java) and Hystrix (its predecessor, now in maintenance mode) provide bulkhead and circuit breaker implementations. A service mesh (Istio) implements bulkheads at the infrastructure level without code changes.
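A minimal bulkhead sketch using one bounded `ThreadPoolExecutor` per downstream service (pool sizes and service names are illustrative; a real client would also set per-service timeouts and retry budgets):

```python
# Bulkhead sketch: a dedicated, bounded thread pool per downstream service.
# If service D stalls, only D's pool saturates; calls to B and C keep
# their own threads.
from concurrent.futures import ThreadPoolExecutor

pools = {
    "service_b": ThreadPoolExecutor(max_workers=10),
    "service_c": ThreadPoolExecutor(max_workers=10),
    "service_d": ThreadPoolExecutor(max_workers=5),  # tighter budget for flaky D
}

def call_downstream(service, fn, *args, timeout=2.0):
    """Run fn in the named service's own pool; raises if the result
    does not arrive within timeout, instead of blocking a shared pool."""
    future = pools[service].submit(fn, *args)
    return future.result(timeout=timeout)
```

The key design choice is the bound: `max_workers` caps how many concurrent calls a single slow dependency can tie up, which is the whole point of the compartment.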

Service Mesh

A service mesh (Istio, Linkerd) adds a sidecar proxy (Envoy, in Istio's case) to each service pod. All inter-service traffic flows through these proxies, which implement: (1) Mutual TLS (mTLS): services authenticate each other with certificates — no service can impersonate another. (2) Load balancing: proxy routes requests across healthy instances with configurable algorithms. (3) Circuit breaking: proxy tracks error rates per upstream and opens circuits automatically. (4) Retries and timeouts: configured in the mesh control plane, not in application code. (5) Distributed tracing: proxy injects trace headers and reports spans. (6) Traffic management: A/B routing, canary deployments, traffic mirroring — all without application changes. Service mesh trades operational complexity (running Istio is non-trivial) for application simplicity.
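To make concrete what the proxy's circuit breaking does, here is a toy breaker written at the application level (thresholds are illustrative; a real mesh also handles half-open probing and per-instance outlier ejection):

```python
# Toy circuit breaker illustrating what a sidecar does per upstream:
# after `threshold` consecutive failures the circuit opens and further
# calls fail fast without touching the unhealthy upstream.
class CircuitBreaker:
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0  # consecutive failure count

    @property
    def open(self):
        return self.failures >= self.threshold

    def call(self, fn, *args):
        if self.open:
            raise RuntimeError("circuit open: failing fast")
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1  # count the failure, then re-raise
            raise
        self.failures = 0  # any success resets the count
        return result
```

The mesh's advantage is that this logic lives in the sidecar, configured once in the control plane, instead of being reimplemented in every service.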

Avoiding the Distributed Monolith

  • Services must own their data: if two services share a database, they are a monolith disguised as microservices
  • Avoid synchronous chains: if A calls B which calls C which calls D synchronously, a D failure brings down A — same as a monolith
  • Prefer events over direct calls for non-critical paths: order placed event vs order-service calling notification-service directly
  • Deploy independently: if you cannot deploy A without also deploying B, they are not truly independent services
  • The right service granularity: start with coarser services and split only when a specific team or scaling need demands it

Interview Tips

  • Decompose by business capability — DDD bounded contexts is the expected framework
  • Saga pattern is the correct answer for distributed transactions (not 2PC)
  • Bulkhead + circuit breaker together prevent cascade failures
  • Service mesh handles cross-cutting concerns at infra level — know Istio/Envoy basics
  • Warning signs of distributed monolith: shared database, synchronous chains, co-deployment

Frequently Asked Questions

What is the Saga pattern and when do you use it?

The Saga pattern manages distributed transactions across multiple microservices that each own their own database — eliminating the need for a two-phase commit (2PC) coordinator. A saga is a sequence of local transactions where each service publishes an event or message on success, triggering the next service. On failure, compensating transactions run in reverse order to undo already-completed steps. Two implementations: (1) Choreography — services react to events; each service listens for the event that triggers it and publishes its own event on completion. Simple but hard to trace and reason about as complexity grows. (2) Orchestration — a central orchestrator (a dedicated service or workflow engine like Temporal) explicitly calls each participant in sequence and handles failures by invoking compensating transactions. Easier to monitor and debug; the orchestrator's state machine documents the business process. Use Sagas for: order fulfillment (reserve inventory, charge payment, schedule shipment), travel booking (reserve flight, hotel, car — each owned by a different service), any cross-service workflow where ACID transactions are not possible.

What is CQRS and what problem does it solve?

CQRS (Command Query Responsibility Segregation) separates the write model (commands that change state) from the read model (queries that return data). In a traditional CRUD service, the same data model serves both reads and writes, creating tension: writes need normalized, consistent data to prevent anomalies; reads need denormalized, pre-joined data for performance. CQRS solves this by maintaining separate models. The write side uses a normalized database optimized for transactional integrity. The write side publishes events (via Kafka or an outbox table) that asynchronously update one or more read-optimized projections — Elasticsearch for full-text search, Redis for leaderboards, a flat SQL table for reporting. Read replicas serve queries without touching the write database. The tradeoff is eventual consistency: reads may lag writes by milliseconds to seconds. CQRS pairs naturally with Event Sourcing (storing the event log as the source of truth rather than current state), and with the Saga pattern for distributed workflows.
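The outbox table mentioned above can be sketched with sqlite3: the business row and the event row commit in one local transaction, so an event is never lost between "row written" and "event published" (schema and names are illustrative; a separate relay process, not shown, would poll the outbox and publish to Kafka):

```python
# Transactional outbox sketch: order insert and event insert share one
# local transaction, avoiding the dual-write problem of writing the DB
# and publishing to the broker separately.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT)")

def place_order(order_id):
    with conn:  # one atomic local transaction: order row + outbox row
        conn.execute("INSERT INTO orders VALUES (?, 'PLACED')", (order_id,))
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"type": "OrderPlaced", "order_id": order_id}),),
        )

place_order(42)
```

If the transaction rolls back, neither row exists; if it commits, the relay is guaranteed to eventually see and publish the event.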

How does a service mesh improve microservices reliability?

A service mesh (Istio, Linkerd) injects a sidecar proxy (such as Envoy) alongside every service pod. All network traffic flows through the sidecar, which provides: (1) Mutual TLS — all service-to-service communication is encrypted and authenticated without any application code change; (2) Observability — the sidecar automatically emits metrics (request rate, error rate, latency percentiles), logs, and distributed traces for every RPC call, giving you the four golden signals across every service pair without instrumenting application code; (3) Traffic management — canary deployments, weighted routing (send 5% of traffic to v2), circuit breaking (stop sending to an unhealthy instance after N consecutive failures), retries with exponential backoff — all configured via Kubernetes CRDs, not code; (4) Rate limiting and fault injection for testing. The cost is added latency (1-2ms per hop for the sidecar proxy), memory overhead (50-100MB per pod for Envoy), and operational complexity of managing the control plane. Use a service mesh when you have 10+ services and need consistent reliability policies without embedding that logic in every service.

