An API gateway is the single entry point for all client requests in a microservices architecture. It sits in front of backend services and handles cross-cutting concerns: authentication, rate limiting, request routing, protocol translation, response caching, and observability. Kong, AWS API Gateway, and Nginx are common implementations. The gateway decouples clients from the internal service topology — clients talk to one endpoint regardless of how many services are behind it.
Core Responsibilities
Request routing: match incoming paths (/users/*, /orders/*) to backend services. Route based on HTTP method, path, headers, or query parameters. Support path rewriting (external /api/v2/users → internal /users).

Authentication and authorization: validate JWT tokens or API keys centrally; backend services trust the gateway's assertion and don't re-verify. The gateway extracts claims (user_id, roles) and adds them as request headers for backend services.

Rate limiting: per-API-key, per-user, or per-IP request quotas, enforced at the gateway so a single policy protects all backend services.

SSL termination: the gateway handles TLS; internal service-to-service communication uses plain HTTP. This simplifies certificate management: one certificate at the gateway rather than one per service.

Request/response transformation: translate between protocols (REST to gRPC), add or remove headers, and aggregate multiple service responses into one.
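Prefix-based routing with path rewriting can be sketched in a few lines. This is a minimal illustration, not any particular gateway's implementation; the route table and service names are hypothetical.

```python
# Minimal sketch of prefix-based route matching with path rewriting.
# Route entries and service names are hypothetical examples.
ROUTES = [
    # (external prefix, backend service, internal prefix)
    ("/api/v2/users", "user-service", "/users"),
    ("/api/v2/orders", "order-service", "/orders"),
]

def route(path: str):
    """Return (service, rewritten_path) for the first matching prefix, or None."""
    for prefix, service, internal in ROUTES:
        if path == prefix or path.startswith(prefix + "/"):
            # Rewrite the external prefix to the internal one.
            return service, internal + path[len(prefix):]
    return None  # no route matched -> gateway returns 404
```

A real gateway would also dispatch on method, headers, and query parameters, but the first-match-wins prefix scan is the core of the idea.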
Rate Limiting Implementation
The gateway enforces rate limits across a distributed cluster, so multiple gateway instances must share rate-limit state, typically via a Redis-based token bucket or sliding-window algorithm where each API key has a counter in Redis.

Sliding window log: ZADD ratelimit:{key} {now} {request_id}; ZREMRANGEBYSCORE ratelimit:{key} 0 {now-window}; count = ZCARD ratelimit:{key}; if count >= limit, reject. Run all of this in a Lua script for atomicity.

Token bucket: a Lua script checks and decrements available tokens; tokens replenish at a fixed rate. Redis can handle rate-limit checks at roughly 100K/second per instance.

For very high throughput: use a local in-process token bucket (counters in gateway process memory) with periodic sync to Redis. This allows slight over-counting at window boundaries but avoids a Redis round trip per request.

Circuit breaker at the gateway: if a backend service returns 5xx at a high rate, stop sending requests for 30 seconds (open the circuit) and return a cached response or a 503 to clients. This protects the backend from being overwhelmed while it recovers.
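The sliding-window-log steps above can be sketched in pure Python. This in-memory version mirrors the ZADD / ZREMRANGEBYSCORE / ZCARD sequence; in production the state lives in Redis and the whole check runs inside a Lua script so it is atomic across gateway instances.

```python
import time
from collections import defaultdict

# In-memory sliding window log mirroring the Redis sorted-set steps.
# In production this state is shared in Redis and executed atomically
# via a Lua script; this sketch is single-process only.
class SlidingWindowLimiter:
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.log = defaultdict(list)  # key -> timestamps of accepted requests

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        # Drop entries older than the window (ZREMRANGEBYSCORE equivalent).
        self.log[key] = [t for t in self.log[key] if t > now - self.window]
        if len(self.log[key]) >= self.limit:  # ZCARD >= limit -> reject
            return False
        self.log[key].append(now)             # ZADD the accepted request
        return True
```

The `now` parameter is exposed only to make the window behavior easy to test; a real limiter would always use the clock.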
Authentication at the Gateway
JWT validation at the gateway: verify the signature (HMAC or RSA) and check claims (exp, iss, aud). If valid, extract user_id, tenant_id, and roles and forward them as headers (X-User-ID, X-Roles). Backend services trust these headers without re-verifying the token; they assume the gateway is a trusted internal component, which should be enforced with mTLS or a network policy so that only the gateway can call backend services.

API key validation: hash the incoming key (SHA-256) and look up the hash in Redis or a fast database to find the associated user and plan. The raw key is never stored, only its hash. Key rotation: issue new keys, set a grace period during which old keys still work, then revoke them.

OAuth2 token introspection: the gateway calls the OAuth2 server's /introspect endpoint to validate opaque tokens. Cache introspection results by token for the token's remaining TTL to avoid calling the OAuth server on every request.
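The HMAC path of the JWT check can be shown with only the standard library. This is a teaching sketch of HS256 verification plus the exp/iss claim checks; a production gateway would use a vetted JWT library and also validate aud, nbf, and handle key rotation.

```python
import base64
import hashlib
import hmac
import json
import time

# Minimal HS256 JWT verification sketch (stdlib only). A real gateway
# should use a maintained library rather than hand-rolling this.
def b64url_decode(part):
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))

def verify_jwt(token, secret, issuer):
    header_b64, payload_b64, sig_b64 = token.split(".")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        return None                       # bad signature
    claims = json.loads(b64url_decode(payload_b64))
    if claims.get("iss") != issuer or claims.get("exp", 0) < time.time():
        return None                       # wrong issuer or expired
    return claims  # gateway would forward e.g. claims["user_id"] as X-User-ID
```

Note the constant-time comparison (`hmac.compare_digest`): a naive `==` on signatures can leak timing information.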
Request Aggregation (BFF Pattern)
A mobile app loading a dashboard needs data from five services: user profile, notifications, feed, ads, recommendations. Without a gateway, that is five round trips from the mobile client, each paying mobile latency (100-300 ms). With Backend for Frontend (BFF), the gateway (or a thin BFF service) fans out to all five services in parallel, merges the responses, and returns a single response: one mobile round trip.

Implementation: define an aggregation rule (endpoint → list of backend calls). The gateway makes parallel HTTP/gRPC calls to each service, waits for all of them (with a timeout), and merges the results. Handle partial failure: if one service fails or times out, return its data as null in the aggregated response rather than failing the whole request.

Response caching: cache aggregated responses at the gateway for fast repeated loads (keyed by user_id + endpoint, with a TTL based on data-freshness requirements).
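The fan-out-with-partial-failure behavior can be sketched with asyncio. The fetchers below are stand-ins for real HTTP/gRPC clients (their names and return values are invented for illustration); the point is the shared timeout and the null-on-failure merge.

```python
import asyncio

# Sketch of BFF fan-out: call every backend in parallel with a shared
# timeout, and return None (serialized as null) for any call that
# fails or times out, instead of failing the whole aggregate.
async def aggregate(calls, timeout=0.5):
    async def guarded(coro):
        try:
            return await asyncio.wait_for(coro, timeout)
        except Exception:
            return None                 # partial failure -> null field
    results = await asyncio.gather(*(guarded(c) for c in calls.values()))
    return dict(zip(calls.keys(), results))

# Hypothetical stand-ins for real service clients.
async def fetch_profile():
    return {"name": "Ada"}

async def fetch_feed():
    raise ConnectionError("feed unavailable")
```

Usage: `asyncio.run(aggregate({"profile": fetch_profile(), "feed": fetch_feed()}))` returns the profile data alongside a null feed, which the client can render as a degraded dashboard.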
Observability and Load Balancing
The gateway is the best place to collect unified observability data: every request passing through is logged with latency, status code, route, and API key.

Distributed tracing: the gateway injects a trace ID header (X-Trace-ID) if one is not present; backend services propagate it, and all traces are correlated by trace ID in the observability system.

Load balancing: the gateway maintains a list of healthy instances for each backend service via service discovery (Consul, the Kubernetes service registry). Algorithms: round-robin (equal distribution), least-connections (send to the instance with the fewest active requests, good for variable-duration requests), and consistent hashing by session ID (sticky sessions for stateful services).

Health checks: the gateway periodically probes each backend instance's /health endpoint and removes unhealthy instances from the load balancer pool within seconds.

Canary deployments: route 5% of traffic to the new service version by weight, observe error rates, and gradually increase the share.
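Two of the balancing policies above can be sketched side by side. The instance names are hypothetical, and in a real gateway the health-check loop would add and remove entries from the pool as probes pass or fail.

```python
import itertools

# Sketch of two load-balancing policies over a pool of healthy instances.
# Instance names are hypothetical; health checks would mutate `instances`.
class Balancer:
    def __init__(self, instances):
        self.instances = list(instances)
        self.active = {i: 0 for i in self.instances}  # in-flight requests
        self._rr = itertools.cycle(self.instances)

    def round_robin(self):
        # Equal distribution regardless of load.
        return next(self._rr)

    def least_connections(self):
        # Favor the instance with the fewest in-flight requests;
        # better when request durations vary widely.
        return min(self.instances, key=lambda i: self.active[i])
```

The gateway would increment `active[i]` when it proxies a request to instance `i` and decrement it on response, which is what makes least-connections track real load.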