System Design: API Gateway Architecture — Kong, AWS API Gateway, Envoy, Authentication, Rate Limiting, Routing

An API gateway is the single entry point for all client requests in a microservices architecture. It handles cross-cutting concerns — authentication, rate limiting, routing, protocol translation, and observability — so individual services do not need to implement them. This guide covers API gateway architecture, features, and production deployment patterns — essential for system design and platform engineering interviews.

Why API Gateways Exist

Without a gateway, every client must know the address of each microservice (pushing service discovery onto the client), authenticate with each service separately, cope with different protocols (some services speak REST, others gRPC), and issue several requests to compose data from multiple services. An API gateway centralizes these concerns:
(1) Single entry point: clients call one domain (api.example.com), and the gateway routes each request to the correct backend service.
(2) Authentication/authorization: verify JWTs or API keys once at the gateway. Backend services receive a pre-authenticated request carrying user context.
(3) Rate limiting: enforce per-client, per-endpoint rate limits at the gateway before requests reach backend services.
(4) Request/response transformation: translate between protocols (REST to gRPC), aggregate responses from multiple services (BFF pattern), and add or remove headers.
(5) Observability: log all requests, collect metrics (request rate, error rate, latency), and propagate tracing headers.
(6) TLS termination: decrypt HTTPS at the gateway and forward plain HTTP to backend services on a trusted internal network, or re-encrypt when the internal network is not trusted.
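The single-entry-point idea can be illustrated with a small routing table. This is a minimal sketch, not how any particular gateway implements routing: the `ROUTES` table, the internal hostnames, and the `resolve_upstream` helper are all hypothetical names chosen for illustration, and longest-prefix matching is one reasonable assumption about how overlapping routes could be resolved.

```python
from typing import Optional

# Hypothetical route table: public path prefix -> internal upstream address.
ROUTES = {
    "/users": "http://user-service.internal:8080",
    "/orders": "http://order-service.internal:8080",
}


def resolve_upstream(path: str, routes: dict[str, str] = ROUTES) -> Optional[str]:
    """Return the upstream for the longest matching path prefix, or None."""
    best = None
    for prefix in routes:
        # Match whole path segments only, so "/users" does not match "/usersettings".
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    return routes[best] if best is not None else None
```

A request for `/users/42` would be forwarded to the user service, while an unknown path returns `None`, which a gateway would typically turn into a 404.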

API Gateway Features

Core features:
(1) Request routing: route based on URL path (/users -> user-service, /orders -> order-service), HTTP method, headers, or query parameters.
(2) Load balancing: distribute requests across multiple instances of a backend service using round-robin, least-connections, or weighted strategies.
(3) Authentication: validate JWTs (verify the signature, check expiration, extract claims), look up API keys, call OAuth 2.0 token introspection, or require mutual TLS.
(4) Rate limiting: apply a token bucket or sliding window per API key, per user, or per IP. Reject excess requests with 429 Too Many Requests and a Retry-After header.
(5) Request/response transformation: add headers (X-Request-ID for tracing), remove sensitive headers (internal service headers), and transform payloads (rename fields for backward compatibility).
(6) Caching: cache GET responses at the gateway with a configurable TTL. This reduces load on backend services for read-heavy endpoints.
(7) Circuit breaking: if a backend service is unhealthy, fail fast at the gateway instead of queuing requests.
(8) Canary routing: send a share of traffic to a new version, selected by header or by percentage.
(9) CORS handling: add Cross-Origin Resource Sharing headers at the gateway instead of in each service.
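The token bucket mentioned under rate limiting can be sketched as follows. This is an in-memory, single-process illustration under stated assumptions: the `RateLimiter` class and its parameters are hypothetical, the caller passes the current time explicitly to keep the example deterministic, and a production gateway would usually keep the counters in a shared store so that all gateway instances see the same state.

```python
import math
from dataclasses import dataclass


@dataclass
class Bucket:
    tokens: float      # tokens currently available
    updated_at: float  # time of the last refill, in seconds


class RateLimiter:
    """Per-client token bucket (hypothetical helper, in-memory only)."""

    def __init__(self, capacity: float, refill_rate: float) -> None:
        self.capacity = capacity        # maximum burst size
        self.refill_rate = refill_rate  # tokens added per second
        self._buckets: dict[str, Bucket] = {}

    def check(self, client_id: str, now: float) -> tuple[bool, dict[str, str]]:
        """Return (allowed, headers). When not allowed, respond with 429 and the headers."""
        bucket = self._buckets.get(client_id)
        if bucket is None:
            # New clients start with a full bucket.
            bucket = Bucket(tokens=self.capacity, updated_at=now)
            self._buckets[client_id] = bucket

        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - bucket.updated_at)
        bucket.tokens = min(self.capacity, bucket.tokens + elapsed * self.refill_rate)
        bucket.updated_at = now

        if bucket.tokens >= 1.0:
            bucket.tokens -= 1.0
            return True, {}

        # Seconds until one full token is available, rounded up for Retry-After.
        wait = (1.0 - bucket.tokens) / self.refill_rate
        return False, {"Retry-After": str(math.ceil(wait))}
```

In use, the gateway would call `check(api_key, time.monotonic())` before proxying and short-circuit with a 429 response whenever the first element is `False`.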

Kong, AWS API Gateway, and Envoy

Kong: open-source API gateway built on Nginx/OpenResty. Its plugin architecture covers authentication (JWT, OAuth, key-auth), rate limiting, logging, transformations, and custom Lua plugins. It runs as a reverse proxy with PostgreSQL for configuration storage (older releases also supported Cassandra) or in DB-less mode with declarative configuration. Kong Konnect is the managed cloud offering. Best for: self-hosted deployments needing extensibility, multi-cloud environments, and teams wanting open-source control.

AWS API Gateway: fully managed and serverless. It offers two types for HTTP workloads: REST API (feature-rich: request validation, WAF integration, caching, usage plans) and HTTP API (simpler, cheaper, lower latency). A separate WebSocket API type also exists. It integrates natively with Lambda and can front ECS services and ALBs. Pricing is per request (roughly $1 per million for HTTP API, varying by region and volume tier). Best for: AWS-native architectures, Lambda-based APIs, and teams wanting zero infrastructure management.

Envoy: high-performance proxy originally built at Lyft. It is not a traditional API gateway but is used as one in service mesh architectures (Istio). It supports HTTP/2 and gRPC natively, advanced load balancing (zone-aware routing, outlier detection), and extensive observability (Prometheus-format metrics, distributed tracing). Best for: Kubernetes environments, service mesh, and gRPC-heavy architectures.

In practice, many architectures use multiple layers: AWS API Gateway for external traffic (public APIs) and Envoy/Istio for internal service-to-service traffic.

Backend for Frontend (BFF) Pattern

The BFF pattern creates a dedicated API gateway per client type. A mobile app BFF aggregates data differently from a web app BFF. Mobile needs less data per request (smaller payloads to save bandwidth), more aggressive caching, and push notification integration. Web needs richer data, supports real-time updates via WebSocket, and handles complex UI compositions.

Without BFF: a single generic API forces mobile and web to receive the same payloads and make the same number of requests. Mobile over-fetches data; web under-fetches and makes extra round-trips.

With BFF: the mobile BFF aggregates data from multiple microservices into a single optimized response. The web BFF composes a different, richer response. Each BFF is maintained by the team that builds the corresponding frontend.

Implementation: each BFF is a lightweight API gateway (or a GraphQL layer) that receives one request from the client, fans out to multiple backend services in parallel, aggregates the responses, transforms them into the client-specific format, and returns a single response. This reduces client complexity and round-trips (one request instead of many).
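The fan-out and aggregate steps can be sketched with asyncio. Everything here is illustrative: `fetch_user`, `fetch_orders`, and `fetch_recommendations` are hypothetical stand-ins that return canned data where a real BFF would make HTTP or gRPC calls, and the response shapes are assumptions rather than a prescribed format.

```python
import asyncio


# Hypothetical backend calls; real code would issue network requests here.
async def fetch_user(user_id: str) -> dict:
    await asyncio.sleep(0.01)
    return {"id": user_id, "name": "Ada", "email": "ada@example.com"}


async def fetch_orders(user_id: str) -> list[dict]:
    await asyncio.sleep(0.01)
    return [
        {"id": "o1", "total": 40.0, "items": ["sku-9"]},
        {"id": "o2", "total": 15.5, "items": ["sku-3", "sku-4"]},
        {"id": "o3", "total": 99.0, "items": ["sku-1"]},
    ]


async def fetch_recommendations(user_id: str) -> list[str]:
    await asyncio.sleep(0.01)
    return ["sku-1", "sku-2", "sku-3"]


async def mobile_home(user_id: str) -> dict:
    """Mobile BFF: one request in, parallel fan-out, trimmed payload out."""
    user, orders, recs = await asyncio.gather(
        fetch_user(user_id), fetch_orders(user_id), fetch_recommendations(user_id)
    )
    return {
        "name": user["name"],
        # Keep only the fields and the number of items a small screen needs.
        "recent_orders": [{"id": o["id"], "total": o["total"]} for o in orders[:2]],
        "recommendations": recs[:2],
    }


async def web_home(user_id: str) -> dict:
    """Web BFF: same fan-out, but returns the richer, untrimmed data."""
    user, orders, recs = await asyncio.gather(
        fetch_user(user_id), fetch_orders(user_id), fetch_recommendations(user_id)
    )
    return {"user": user, "orders": orders, "recommendations": recs}
```

Because the three backend calls run concurrently under `asyncio.gather`, the response time is bounded by the slowest call rather than the sum of all three. Calling `asyncio.run(mobile_home("u1"))` exercises the mobile path.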

API Gateway Anti-Patterns

Common mistakes:
(1) Business logic in the gateway: the gateway should handle cross-cutting concerns (auth, rate limiting, routing), not business logic. Putting order validation or pricing calculation in the gateway couples it to business changes and creates a monolithic bottleneck.
(2) Single point of failure: the gateway handles all traffic, so if it goes down, every service is unreachable. Run multiple instances behind a load balancer and auto-scale based on traffic.
(3) Over-transformation: excessive request/response transformation in the gateway adds latency and makes debugging harder (the request the service receives differs from what the client sent). Keep transformations minimal.
(4) Gateway as a silver bullet: some teams route every request through every gateway feature (auth, rate limit, transform, cache, log). Not every endpoint needs every feature. Configure features per route.
(5) Not versioning: the gateway is the natural place for API versioning (/v1/users -> user-service-v1, /v2/users -> user-service-v2). Without versioning, any incompatible change breaks existing clients.
(6) Ignoring gateway latency: every feature adds latency (roughly 1-5ms per feature). A gateway with auth + rate limit + transform + logging therefore adds roughly 4-20ms. For latency-sensitive paths, minimize gateway processing.
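Per-route configuration and the latency rule of thumb can be combined into a rough budget check. This is a back-of-the-envelope sketch: the `ROUTE_FEATURES` table and `overhead_range_ms` helper are hypothetical, and the 1-5ms per-feature range is the estimate quoted above, not a measured figure.

```python
# Assumed per-feature overhead range in milliseconds (rule of thumb, not measured).
PER_FEATURE_MS = (1, 5)

# Hypothetical per-route feature configuration: enable only what each route needs.
ROUTE_FEATURES = {
    "/v1/users": ["auth", "rate_limit", "transform", "logging"],
    "/v2/users": ["auth", "rate_limit", "logging"],
    "/health": [],
}


def overhead_range_ms(route: str) -> tuple[int, int]:
    """Estimate the (min, max) latency the gateway adds on a route."""
    feature_count = len(ROUTE_FEATURES[route])
    low, high = PER_FEATURE_MS
    return feature_count * low, feature_count * high
```

A table like this makes the cost of each enabled feature visible, which helps when deciding which features a latency-sensitive route can afford.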
