An API gateway is the single entry point for all client requests in a microservices architecture. It sits in front of backend services and handles cross-cutting concerns: authentication, rate limiting, request routing, protocol translation, response caching, and observability. Kong, AWS API Gateway, and Nginx are common implementations. The gateway decouples clients from the internal service topology — clients talk to one endpoint regardless of how many services are behind it.
Core Responsibilities
Request routing: match incoming paths (/users/*, /orders/*) to backend services. Route based on HTTP method, path, headers, or query parameters. Support path rewriting (external /api/v2/users → internal /users).

Authentication and authorization: validate JWT tokens or API keys centrally; backend services trust the gateway's assertion and don't re-verify. The gateway extracts claims (user_id, roles) and adds them as request headers for backend services.

Rate limiting: per-API-key, per-user, or per-IP request quotas, enforced at the gateway so a single policy protects all backend services.

SSL termination: the gateway handles TLS; internal service-to-service communication uses plain HTTP. This simplifies certificate management: one certificate at the gateway rather than one per service.

Request/response transformation: translate between protocols (REST to gRPC), add or remove headers, and aggregate multiple service responses into one.
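Prefix-based routing with path rewriting can be sketched in a few lines. This is a minimal illustration, not any particular gateway's implementation; the route table and service names are hypothetical.

```python
# Minimal sketch of prefix-based route matching with path rewriting.
# Route entries and service names are hypothetical examples.
ROUTES = [
    # (external prefix, backend service, internal prefix)
    ("/api/v2/users", "user-service", "/users"),
    ("/api/v2/orders", "order-service", "/orders"),
]

def route(path: str):
    """Return (service, rewritten_path) for the first matching prefix, or None."""
    for prefix, service, internal in ROUTES:
        if path == prefix or path.startswith(prefix + "/"):
            # Rewrite the external prefix to the internal one.
            return service, internal + path[len(prefix):]
    return None  # no route matched -> gateway returns 404
```

A real gateway would also dispatch on method, headers, and query parameters, but the first-match-wins prefix scan is the core of the idea.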
Rate Limiting Implementation
The gateway enforces rate limits across a distributed cluster, so multiple gateway instances must share rate-limit state, typically via a Redis-based token bucket or sliding-window algorithm where each API key has a counter in Redis.

Sliding window log: ZADD ratelimit:{key} {now} {request_id}; ZREMRANGEBYSCORE ratelimit:{key} 0 {now-window}; count = ZCARD ratelimit:{key}; if count >= limit, reject. Run all of this in a Lua script for atomicity.

Token bucket: a Lua script checks and decrements available tokens; tokens replenish at a fixed rate. Redis can handle rate-limit checks at roughly 100K/second per instance.

For very high throughput: use a local in-process token bucket (counters in gateway process memory) with periodic sync to Redis. This allows slight over-counting at window boundaries but avoids a Redis round trip per request.

Circuit breaker at the gateway: if a backend service returns 5xx at a high rate, stop sending requests for 30 seconds (open the circuit) and return a cached response or a 503 to clients. This protects the backend from being overwhelmed while it recovers.
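The sliding-window-log steps above can be sketched in pure Python. This in-memory version mirrors the ZADD / ZREMRANGEBYSCORE / ZCARD sequence; in production the state lives in Redis and the whole check runs inside a Lua script so it is atomic across gateway instances.

```python
import time
from collections import defaultdict

# In-memory sliding window log mirroring the Redis sorted-set steps.
# In production this state is shared in Redis and executed atomically
# via a Lua script; this sketch is single-process only.
class SlidingWindowLimiter:
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.log = defaultdict(list)  # key -> timestamps of accepted requests

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        # Drop entries older than the window (ZREMRANGEBYSCORE equivalent).
        self.log[key] = [t for t in self.log[key] if t > now - self.window]
        if len(self.log[key]) >= self.limit:  # ZCARD >= limit -> reject
            return False
        self.log[key].append(now)             # ZADD the accepted request
        return True
```

The `now` parameter is exposed only to make the window behavior easy to test; a real limiter would always use the clock.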
Authentication at the Gateway
JWT validation at the gateway: verify the signature (HMAC or RSA) and check claims (exp, iss, aud). If valid, extract user_id, tenant_id, and roles and forward them as headers (X-User-ID, X-Roles). Backend services trust these headers without re-verifying the token; they assume the gateway is a trusted internal component, which should be enforced with mTLS or a network policy so that only the gateway can call backend services.

API key validation: hash the incoming key (SHA-256) and look up the hash in Redis or a fast database to find the associated user and plan. The raw key is never stored, only its hash. Key rotation: issue new keys, set a grace period during which old keys still work, then revoke them.

OAuth2 token introspection: the gateway calls the OAuth2 server's /introspect endpoint to validate opaque tokens. Cache introspection results by token for the token's remaining TTL to avoid calling the OAuth server on every request.
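The HMAC path of the JWT check can be shown with only the standard library. This is a teaching sketch of HS256 verification plus the exp/iss claim checks; a production gateway would use a vetted JWT library and also validate aud, nbf, and handle key rotation.

```python
import base64
import hashlib
import hmac
import json
import time

# Minimal HS256 JWT verification sketch (stdlib only). A real gateway
# should use a maintained library rather than hand-rolling this.
def b64url_decode(part):
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))

def verify_jwt(token, secret, issuer):
    header_b64, payload_b64, sig_b64 = token.split(".")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        return None                       # bad signature
    claims = json.loads(b64url_decode(payload_b64))
    if claims.get("iss") != issuer or claims.get("exp", 0) < time.time():
        return None                       # wrong issuer or expired
    return claims  # gateway would forward e.g. claims["user_id"] as X-User-ID
```

Note the constant-time comparison (`hmac.compare_digest`): a naive `==` on signatures can leak timing information.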
Request Aggregation (BFF Pattern)
A mobile app loading a dashboard needs data from five services: user profile, notifications, feed, ads, recommendations. Without a gateway, that is five round trips from the mobile client, each paying mobile latency (100-300 ms). With Backend for Frontend (BFF), the gateway (or a thin BFF service) fans out to all five services in parallel, merges the responses, and returns a single response: one mobile round trip.

Implementation: define an aggregation rule (endpoint → list of backend calls). The gateway makes parallel HTTP/gRPC calls to each service, waits for all of them (with a timeout), and merges the results. Handle partial failure: if one service fails or times out, return its data as null in the aggregated response rather than failing the whole request.

Response caching: cache aggregated responses at the gateway for fast repeated loads (keyed by user_id + endpoint, with a TTL based on data-freshness requirements).
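The fan-out-with-partial-failure behavior can be sketched with asyncio. The fetchers below are stand-ins for real HTTP/gRPC clients (their names and return values are invented for illustration); the point is the shared timeout and the null-on-failure merge.

```python
import asyncio

# Sketch of BFF fan-out: call every backend in parallel with a shared
# timeout, and return None (serialized as null) for any call that
# fails or times out, instead of failing the whole aggregate.
async def aggregate(calls, timeout=0.5):
    async def guarded(coro):
        try:
            return await asyncio.wait_for(coro, timeout)
        except Exception:
            return None                 # partial failure -> null field
    results = await asyncio.gather(*(guarded(c) for c in calls.values()))
    return dict(zip(calls.keys(), results))

# Hypothetical stand-ins for real service clients.
async def fetch_profile():
    return {"name": "Ada"}

async def fetch_feed():
    raise ConnectionError("feed unavailable")
```

Usage: `asyncio.run(aggregate({"profile": fetch_profile(), "feed": fetch_feed()}))` returns the profile data alongside a null feed, which the client can render as a degraded dashboard.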
Observability and Load Balancing
The gateway is the best place to collect unified observability data: every request passing through is logged with latency, status code, route, and API key.

Distributed tracing: the gateway injects a trace ID header (X-Trace-ID) if one is not present; backend services propagate it, and all traces are correlated by trace ID in the observability system.

Load balancing: the gateway maintains a list of healthy instances for each backend service via service discovery (Consul, the Kubernetes service registry). Algorithms: round-robin (equal distribution), least-connections (send to the instance with the fewest active requests, good for variable-duration requests), and consistent hashing by session ID (sticky sessions for stateful services).

Health checks: the gateway periodically probes each backend instance's /health endpoint and removes unhealthy instances from the load balancer pool within seconds.

Canary deployments: route 5% of traffic to the new service version by weight, observe error rates, and gradually increase the share.
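Two of the balancing policies above can be sketched side by side. The instance names are hypothetical, and in a real gateway the health-check loop would add and remove entries from the pool as probes pass or fail.

```python
import itertools

# Sketch of two load-balancing policies over a pool of healthy instances.
# Instance names are hypothetical; health checks would mutate `instances`.
class Balancer:
    def __init__(self, instances):
        self.instances = list(instances)
        self.active = {i: 0 for i in self.instances}  # in-flight requests
        self._rr = itertools.cycle(self.instances)

    def round_robin(self):
        # Equal distribution regardless of load.
        return next(self._rr)

    def least_connections(self):
        # Favor the instance with the fewest in-flight requests;
        # better when request durations vary widely.
        return min(self.instances, key=lambda i: self.active[i])
```

The gateway would increment `active[i]` when it proxies a request to instance `i` and decrement it on response, which is what makes least-connections track real load.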