An API gateway is the single entry point for all client requests in a microservices architecture. It handles cross-cutting concerns — authentication, rate limiting, routing, protocol translation, and observability — so individual services do not need to implement them. This guide covers API gateway architecture, features, and production deployment patterns — essential for system design and platform engineering interviews.
Why API Gateways Exist
Without a gateway, clients must: know the address of every microservice (service discovery burden on the client), handle authentication with each service separately, deal with different protocols (some services use REST, others gRPC), and make multiple requests to compose data from multiple services. An API gateway centralizes these concerns:

(1) Single entry point — clients call one domain (api.example.com). The gateway routes to the correct backend service.
(2) Authentication/authorization — verify JWT tokens or API keys once at the gateway. Backend services receive a pre-authenticated request with user context.
(3) Rate limiting — enforce per-client, per-endpoint rate limits at the gateway before requests reach backend services.
(4) Request/response transformation — translate between protocols (REST to gRPC), aggregate responses from multiple services (BFF pattern), and add/remove headers.
(5) Observability — log all requests, collect metrics (request rate, error rate, latency), and propagate tracing headers.
(6) TLS termination — decrypt HTTPS at the gateway, forward plain HTTP to backend services.
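The routing concern above can be sketched as a prefix table. A minimal sketch; the paths and service addresses are illustrative, not from any particular gateway:

```python
# Hypothetical route table: longest-prefix match on the request path.
ROUTES = {
    "/users": "http://user-service:8080",
    "/orders": "http://order-service:8080",
}

def resolve_backend(path: str) -> str:
    """Return the backend base URL for the longest matching route prefix."""
    # Check longer prefixes first so /orders/items could override /orders.
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if path == prefix or path.startswith(prefix + "/"):
            return ROUTES[prefix]
    raise LookupError(f"no route for {path}")
```

A real gateway would forward the request to the resolved backend after attaching the authenticated user context and tracing headers.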
API Gateway Features
Core features:

(1) Request routing — route based on URL path (/users -> user-service, /orders -> order-service), HTTP method, headers, or query parameters.
(2) Load balancing — distribute requests across multiple instances of a backend service. Round-robin, least connections, or weighted.
(3) Authentication — validate JWT tokens (verify signature, check expiration, extract claims), API key lookup, OAuth 2.0 token introspection, or mutual TLS.
(4) Rate limiting — token bucket or sliding window per API key, per user, or per IP. Return 429 with Retry-After header.
(5) Request/response transformation — add headers (X-Request-ID for tracing), remove sensitive headers (internal service headers), transform payloads (rename fields for backward compatibility).
(6) Caching — cache GET responses at the gateway with configurable TTL. Reduces load on backend services for read-heavy endpoints.
(7) Circuit breaking — if a backend service is unhealthy, fail fast at the gateway instead of queuing requests.
(8) Canary routing — send a percentage of traffic to a new version (header-based or percentage-based).
(9) CORS handling — add Cross-Origin Resource Sharing headers at the gateway instead of in each service.
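The token bucket mentioned in the rate-limiting feature can be sketched in a few lines. A single-process sketch only; a production gateway would share counters (e.g. in Redis) across gateway instances:

```python
import time

class TokenBucket:
    """Token-bucket limiter: refills `rate` tokens/sec, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start full: allow an initial burst
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True if the request may proceed; False means respond 429."""
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Per-client limiting would keep one bucket per API key or IP, and a 429 response would include a Retry-After header derived from the refill rate.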
Kong, AWS API Gateway, and Envoy
Kong: open-source API gateway built on Nginx/OpenResty. Plugin architecture: authentication (JWT, OAuth, key-auth), rate limiting, logging, transformations, and custom Lua plugins. Runs as a reverse proxy with PostgreSQL or Cassandra for configuration storage. Kong Konnect: managed cloud version. Best for: self-hosted deployments needing extensibility, multi-cloud environments, and teams wanting open-source control.

AWS API Gateway: fully managed, serverless. Two types: REST API (feature-rich: request validation, WAF integration, caching, usage plans) and HTTP API (simpler, cheaper, lower latency). Integrates natively with Lambda, ECS, and ALB. Pay per request ($1 per million for HTTP API). Best for: AWS-native architectures, Lambda-based APIs, and teams wanting zero infrastructure management.

Envoy: high-performance proxy originally built by Lyft. Not a traditional API gateway but used as one in service mesh architectures (Istio). Supports HTTP/2, gRPC natively, advanced load balancing (zone-aware, outlier detection), and extensive observability (built-in Prometheus metrics, distributed tracing). Best for: Kubernetes environments, service mesh, and gRPC-heavy architectures.

In practice, many architectures use multiple layers: AWS API Gateway for external traffic (public APIs) and Envoy/Istio for internal service-to-service traffic.
Backend for Frontend (BFF) Pattern
The BFF pattern creates a dedicated API gateway per client type, because a mobile app BFF aggregates data differently from a web app BFF. Mobile: needs less data per request (smaller payloads for bandwidth), more aggressive caching, and push notification integration. Web: needs richer data, supports real-time updates via WebSocket, and handles complex UI compositions.

Without BFF: a single generic API forces mobile and web to receive the same payloads and make the same number of requests. Mobile over-fetches data; web under-fetches and makes extra round-trips.

With BFF: the mobile BFF aggregates data from multiple microservices into a single optimized response. The web BFF composes a different, richer response. Each BFF is maintained by the team that builds the corresponding frontend.

Implementation: each BFF is a lightweight API gateway (or a GraphQL layer) that receives one request from the client, fans out to multiple backend services in parallel, aggregates the responses, transforms them into the client-specific format, and returns a single response. This reduces client complexity and round-trips (one request instead of many).
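The fan-out-and-aggregate flow can be sketched with asyncio. The backend calls here are hypothetical stand-ins for real HTTP or gRPC requests to the user and order services:

```python
import asyncio

# Hypothetical backend calls; a real BFF would issue HTTP/gRPC requests.
async def fetch_profile(user_id: str) -> dict:
    return {"name": "Ada", "avatar_url": "https://example.com/a.png"}

async def fetch_orders(user_id: str) -> list:
    return [{"id": 1, "total": 42}]

async def mobile_bff_user_view(user_id: str) -> dict:
    """One client request fans out to two services in parallel, then
    aggregates the results into a small, mobile-optimized payload."""
    profile, orders = await asyncio.gather(
        fetch_profile(user_id), fetch_orders(user_id)
    )
    # Return only what the mobile screen needs (avoids over-fetching).
    return {"name": profile["name"], "order_count": len(orders)}

view = asyncio.run(mobile_bff_user_view("user-123"))
```

A web BFF would run the same fan-out but compose a richer response from the same client-agnostic backend services.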
API Gateway Anti-Patterns
Common mistakes:

(1) Business logic in the gateway — the gateway should handle cross-cutting concerns (auth, rate limiting, routing), not business logic. Putting order validation or pricing calculation in the gateway couples it to business changes and creates a monolithic bottleneck.
(2) Single point of failure — the gateway handles all traffic, so if it goes down, every service is unreachable. Run multiple instances behind a load balancer and auto-scale based on traffic.
(3) Over-transformation — excessive request/response transformation in the gateway adds latency and makes debugging harder (the request the service receives is different from what the client sent). Keep transformations minimal.
(4) Gateway as a silver bullet — some teams route every request through every gateway feature (auth, rate limit, transform, cache, log). Not every endpoint needs every feature. Configure per-route.
(5) Not versioning — the gateway is the natural place for API versioning (/v1/users -> user-service-v1, /v2/users -> user-service-v2). Forgetting versioning leads to breaking changes.
(6) Ignoring gateway latency — every feature adds latency (1-5ms per feature). A gateway with auth + rate limit + transform + logging adds 5-20ms. For latency-sensitive paths, minimize gateway processing.
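The "configure per-route" advice above can be sketched as a simple feature table. The paths, feature names, and default pipeline are illustrative assumptions, not any gateway's actual configuration format:

```python
# Hypothetical per-route feature table: enable only what a route needs.
ROUTE_FEATURES = {
    "/health": [],                                # no gateway processing
    "/public/catalog": ["rate_limit", "log"],     # anonymous but throttled
    "/orders": ["auth", "rate_limit", "log"],     # full pipeline
}

# Unknown routes fall back to the full pipeline: fail closed, not open.
DEFAULT_PIPELINE = ["auth", "rate_limit", "log"]

def pipeline_for(path: str) -> list:
    """Return the ordered middleware chain to run for a route."""
    return ROUTE_FEATURES.get(path, DEFAULT_PIPELINE)
```

With a rough 1-5ms cost per feature, this keeps latency-sensitive routes like /health near zero gateway overhead while protected routes pay the full cost deliberately.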