API gateways and service meshes are the networking backbone of modern microservices architectures. They handle cross-cutting concerns — authentication, rate limiting, routing, observability — so individual services don’t have to. Understanding when to use each is a common senior engineering interview topic.
API Gateway: North-South Traffic
An API gateway handles traffic entering the system from external clients (internet → services). It is the single entry point for all external requests.
External Client
│
▼
API Gateway (Kong, AWS API Gateway, nginx, Envoy)
│ Responsibilities:
│ ├── TLS termination
│ ├── Authentication (JWT validation, API keys, OAuth)
│ ├── Rate limiting (per-client, per-endpoint)
│ ├── Request routing (path → service)
│ ├── Request/response transformation
│ ├── Load balancing (to service instances)
│ └── Observability (access logs, metrics, tracing)
│
├── /api/v1/users → User Service
├── /api/v1/orders → Order Service
└── /api/v1/products → Product Service
Authentication at the Gateway
JWT validation flow:
Client → "Authorization: Bearer eyJhbGc..."
Gateway:
1. Parse JWT header → algorithm (RS256/ES256)
2. Fetch public key from JWKS endpoint (cached)
3. Verify signature
4. Check exp, iss, aud claims
5. Extract user_id, tenant_id, scopes
6. Forward to service as X-User-ID, X-Tenant-ID headers
(services trust these headers — no re-validation)
API Key auth:
Client → "X-API-Key: sk_live_xxxxx"
Gateway → hash(key) → lookup in API key store (Redis)
→ return associated tenant_id + rate limits
Rate Limiting at the Gateway
Fixed-window counter (Redis + Lua):
Key: rate_limit:{client_id}:{endpoint}
Per-request Lua script (atomic):
  current = INCR key
  if current == 1:
      EXPIRE key window_seconds   -- start the window on first hit
  if current <= limit:
      allow request
  else:
      return 429 Too Many Requests
Response headers:
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 847
X-RateLimit-Reset: 1713272400 (Unix epoch when window resets)
Retry-After: 15 (seconds until retry allowed)
Distributed rate limiting (multiple gateway instances):
All gateways share Redis cluster for consistent counting
Trade-off: Redis round-trip adds ~1ms per request
Alternative: approximate counting with local buckets + periodic sync
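A true token bucket differs from the fixed-window counter above: tokens refill continuously, so short bursts up to the bucket capacity are allowed while the long-run rate is capped. An in-process sketch (class and parameter names are illustrative; the injectable clock is there for testability):

```python
import time

class TokenBucket:
    """Token bucket: at most `capacity` tokens, refilled at `refill_rate`/sec."""

    def __init__(self, capacity: float, refill_rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity          # start full
        self.clock = clock
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The local-bucket alternative mentioned above would run one of these per gateway instance and periodically reconcile counts through Redis.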
Service Mesh: East-West Traffic
A service mesh handles traffic between services within the cluster. It operates at the infrastructure layer without requiring application code changes.
Service A ─── Envoy sidecar ──► Envoy sidecar ─── Service B
│ │
└────────────────────┘
Control plane
(Istio / Linkerd / Consul)
Sidecar proxy handles:
├── mTLS: automatic certificate rotation, mutual auth
├── Load balancing: round-robin, least-request, zone-aware
├── Circuit breaking: open circuit after N failures
├── Retries: automatic retry with exponential backoff
├── Timeouts: per-route timeout enforcement
├── Observability: metrics, distributed traces (no app changes)
└── Traffic shaping: canary deployment, A/B testing
mTLS: Zero-Trust Service Identity
Without service mesh:
Service A → Service B (no authentication — any pod can call any service)
With mTLS (mutual TLS):
Istio CA issues X.509 certificate to each service (SPIFFE format)
Certificate: "spiffe://cluster.local/ns/default/sa/order-service"
Service A presents cert → Service B verifies A's identity
Service B presents cert → Service A verifies B's identity
Traffic between the two sidecars is encrypted in transit
AuthorizationPolicy (Istio):
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: order-service-policy
spec:
  selector:
    matchLabels:
      app: order-service
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/default/sa/checkout-service"]
    to:
    - operation:
        methods: ["POST"]
        paths: ["/api/orders"]
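The effect of that policy can be illustrated by evaluating a request's peer SPIFFE principal, method, and path against the rule. This is a simplified model of what the Envoy RBAC filter does (real matchers also cover namespaces, IP blocks, headers, and more; the function name is illustrative):

```python
# Simplified model of the AuthorizationPolicy above: a request is allowed
# only when its source principal, method, and path all match some rule.
POLICY_RULES = [
    {
        "principals": ["cluster.local/ns/default/sa/checkout-service"],
        "methods": ["POST"],
        "paths": ["/api/orders"],
    },
]

def is_allowed(principal: str, method: str, path: str) -> bool:
    return any(
        principal in rule["principals"]
        and method in rule["methods"]
        and path in rule["paths"]
        for rule in POLICY_RULES
    )
```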
Circuit Breaker Pattern
States: CLOSED → OPEN → HALF-OPEN
CLOSED (normal): requests pass through; failure rate tracked
If failure rate > threshold (e.g., 50% in 10s):
→ OPEN: fail fast, return error immediately (no actual call)
OPEN: wait for recovery period (e.g., 30s)
→ HALF-OPEN: allow small fraction of requests through
HALF-OPEN: test if dependency has recovered
If requests succeed → CLOSED (resume normal operation)
If requests fail → OPEN again (wait longer)
Service mesh implementation (Istio DestinationRule):
trafficPolicy:
  outlierDetection:
    consecutive5xxErrors: 5    # eject a host after 5 consecutive 5xx errors
    interval: 10s              # evaluation window
    baseEjectionTime: 30s      # how long to eject an unhealthy host
    maxEjectionPercent: 50     # eject at most 50% of hosts
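The CLOSED → OPEN → HALF-OPEN state machine above can be sketched in-process (thresholds, names, and the injectable clock are illustrative; mesh proxies apply this logic per upstream host, as in the DestinationRule):

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: CLOSED -> OPEN -> HALF-OPEN -> CLOSED."""

    def __init__(self, failure_threshold=5, recovery_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn):
        if self.state == "OPEN":
            if self.clock() - self.opened_at >= self.recovery_timeout:
                self.state = "HALF-OPEN"       # recovery period over: probe
            else:
                raise RuntimeError("circuit open: failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "HALF-OPEN" or self.failures >= self.failure_threshold:
                self.state = "OPEN"            # trip (or re-trip) the breaker
                self.opened_at = self.clock()
            raise
        self.failures = 0
        if self.state == "HALF-OPEN":
            self.state = "CLOSED"              # probe succeeded: resume
        return result
```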
API Gateway vs Service Mesh Comparison
| Concern | API Gateway | Service Mesh |
|---|---|---|
| Traffic direction | North-south (external → internal) | East-west (service → service) |
| Authentication | External JWT/API key validation | Internal mTLS identity |
| Rate limiting | Per-client/IP global limits | Service-to-service limits |
| Observability | Access logs, API analytics | Service dependency map, latency breakdown |
| Implementation | Application-aware (paths, headers) | Infrastructure layer (transparent) |
| Typical tools | Kong, AWS API GW, nginx, Traefik | Istio, Linkerd, Consul Connect |
BFF Pattern: Backend for Frontend
Problem: one API must serve multiple clients with different needs
Mobile app: needs lightweight responses, push notifications
Web app: can handle richer data, SSE instead of polling
Partner API: needs different auth, rate limits, data format
Solution: BFF (Backend For Frontend)
Mobile Gateway → Mobile-optimized API → Services
Web Gateway → Web-optimized API → Services
Partner Gateway → Partner API → Services
Benefits:
- Each gateway tailored to client needs (field selection, pagination)
- Independent versioning and deprecation
- Client-specific auth strategies
- Different rate limits per client type
Implementation: GraphQL as BFF (Apollo Federation)
Client sends GraphQL query → BFF resolves to multiple service calls
→ assembles composite response → returns only requested fields
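The resolve-and-assemble step can be sketched as a BFF that fans out to downstream services and returns only the fields the client asked for. The stub service responses and function names here are invented for illustration; a real GraphQL BFF would do this through per-field resolvers over HTTP or gRPC:

```python
# Stub downstream services; a real BFF would make network calls here.
def fetch_user(user_id):
    return {"id": user_id, "name": "Ada", "email": "ada@example.com"}

def fetch_orders(user_id):
    return [{"id": "o1", "total": 42.0}, {"id": "o2", "total": 7.5}]

def profile_bff(user_id, fields):
    """Assemble a composite response, then project only requested fields."""
    composite = dict(fetch_user(user_id))
    composite["orders"] = fetch_orders(user_id)
    return {k: composite[k] for k in fields if k in composite}
```

A mobile client might request `["name"]` while the web app requests `["name", "email", "orders"]`, each paying only for the data it uses.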
Interview Discussion Points
- When do you need a service mesh? When you have 10+ microservices and need consistent observability, mTLS, and traffic management without modifying every service. For simple architectures (< 5 services), the operational complexity of Istio outweighs the benefits — use a shared middleware library instead.
- How to handle API versioning? URL path versioning (/v1/, /v2/) is most explicit. Header-based versioning (Accept: application/vnd.api+json;version=2) is RESTful but harder to route. Keep at most 2 major versions in production simultaneously; deprecate old versions with sunset headers and 12-month migration periods.
- Service mesh overhead: Envoy sidecar adds ~5-10ms per hop and 50-100MB RAM per pod. Justify with the operational savings on observability and security. Ambient mesh (Istio 1.15+) removes per-pod sidecars — uses node-level DaemonSet instead, reducing overhead significantly.