What is an API Marketplace?
An API marketplace (like Stripe, Twilio, RapidAPI, or AWS API Gateway) lets external developers discover, subscribe to, and consume APIs, with metered billing, rate limiting, and analytics.

Core components:
- Developer Portal: API discovery, documentation, interactive sandbox, API key management.
- API Gateway: authenticates requests, enforces rate limits, routes to backend services, collects usage metrics.
- Billing Engine: meters API calls, applies pricing tiers, generates invoices.
- Analytics: per-API, per-consumer usage dashboards.

The key system design challenges: rate limiting at scale (millions of requests per second), accurate metered billing (no missed or double-counted calls), and low-latency gateway processing (< 5 ms of overhead per request).
API Key Authentication and Routing
import hashlib
import json
from datetime import datetime, timezone

class APIGateway:
    def handle_request(self, request: Request) -> Response:
        # 1. Extract and validate API key
        api_key = request.headers.get("X-API-Key")
        if not api_key:
            return Response(401, "Missing API key")

        # 2. Look up the key in Redis (cached from the DB).
        # Key: apikey:{sha256(api_key)} -> JSON {consumer_id, plan_id, is_active}.
        # Use a stable cryptographic hash: Python's built-in hash() is
        # randomized per process and unsuitable as a Redis key.
        key_hash = hashlib.sha256(api_key.encode()).hexdigest()
        raw = self.redis.get(f"apikey:{key_hash}")
        key_data = json.loads(raw) if raw else None
        if not key_data or not key_data["is_active"]:
            return Response(401, "Invalid or inactive API key")

        # 3. Rate limit check (see below)
        if not self.rate_limiter.allow(key_data["consumer_id"],
                                       key_data["plan_id"]):
            return Response(429, "Rate limit exceeded")

        # 4. Route to backend
        backend = self.router.get_backend(request.path)
        response = backend.forward(request)

        # 5. Async usage logging (fire and forget)
        self.usage_logger.log_async(UsageEvent(
            consumer_id=key_data["consumer_id"],
            api_id=backend.api_id,
            plan_id=key_data["plan_id"],
            endpoint=request.path,
            method=request.method,
            status_code=response.status_code,
            latency_ms=response.latency_ms,
            timestamp=datetime.now(timezone.utc),
        ))
        return response
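The lookup above assumes API keys are stored and cached by hash rather than in plaintext. As a minimal sketch under that assumption (the function names and the `sk_live_` prefix are illustrative, not part of the original design): generate keys with a CSPRNG and persist only a SHA-256 digest, so a leaked key table cannot be replayed.

```python
import hashlib
import secrets

def generate_api_key() -> tuple[str, str]:
    """Return (plaintext_key, key_hash). Only the hash is persisted."""
    plaintext = "sk_live_" + secrets.token_urlsafe(32)  # shown to the developer once
    key_hash = hashlib.sha256(plaintext.encode()).hexdigest()
    return plaintext, key_hash

def redis_cache_key(api_key: str) -> str:
    """Stable cache key for gateway lookups (never Python's hash())."""
    return "apikey:" + hashlib.sha256(api_key.encode()).hexdigest()

plaintext, stored_hash = generate_api_key()
assert redis_cache_key(plaintext) == "apikey:" + stored_hash
```

On login-less machine-to-machine auth like this, the plaintext key exists only in the developer's hands and in the request header; the gateway and portal compare digests.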
Rate Limiting: Token Bucket with Redis
Token bucket algorithm in Redis Lua (atomic, no race conditions): each consumer has a bucket with a capacity and refill rate (defined by their plan). The Lua script runs atomically on Redis — no other commands can interleave.
-- Redis Lua script: token_bucket.lua
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])  -- tokens per second
local now = tonumber(ARGV[3])          -- current time in seconds, passed in by the caller
local requested = tonumber(ARGV[4])

local bucket = redis.call("HMGET", key, "tokens", "last_refill")
local tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or now

-- Refill tokens based on elapsed time, capped at capacity
local elapsed = math.max(0, now - last_refill)
tokens = math.min(capacity, tokens + elapsed * refill_rate)

if tokens >= requested then
  redis.call("HMSET", key, "tokens", tokens - requested, "last_refill", now)
  redis.call("EXPIRE", key, 86400)
  return 1  -- allowed
else
  redis.call("HMSET", key, "tokens", tokens, "last_refill", now)
  redis.call("EXPIRE", key, 86400)
  return 0  -- denied
end
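For intuition, the same refill-and-consume logic can be mirrored in plain Python. This is a single-process sketch only: in production the atomicity comes from running the Lua script on Redis, not from this class.

```python
class TokenBucket:
    """In-memory mirror of the Lua script's refill-and-consume logic (illustrative only)."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens per second
        self.tokens = capacity          # bucket starts full, like the Lua default
        self.last_refill = 0.0

    def allow(self, now: float, requested: float = 1.0) -> bool:
        # Refill based on elapsed time, capped at capacity
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= requested:
            self.tokens -= requested
            return True
        return False

bucket = TokenBucket(capacity=5, refill_rate=1.0)  # 5-token burst, 1 token/sec
results = [bucket.allow(now=0.0) for _ in range(6)]
# results == [True, True, True, True, True, False]: the burst drains the bucket
later = bucket.allow(now=1.0)  # one token has refilled after a second -> True
```

Note how the burst behavior described in the FAQ falls out of the math: a full bucket absorbs `capacity` back-to-back requests, after which throughput is bounded by `refill_rate`.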
Metered Billing Pipeline
Every API call is a billing event. Pipeline:
- The API gateway logs usage events to Kafka (fire-and-forget, < 1 ms).
- A Kafka consumer aggregates events in 1-minute micro-batches and writes to a usage_events table in ClickHouse (or BigQuery).
- Monthly invoice generation: at billing cycle end, query total calls per (consumer, api, tier) from ClickHouse.
- Apply pricing rules: first N calls free, next M calls at $0.001 each, calls over that at $0.0005 each (tiered pricing).
- Generate an Invoice record in Postgres and charge via Stripe.

For high-volume APIs: pre-aggregate usage counters in Redis (INCR usage:{consumer_id}:{api_id}:{day}) for real-time usage dashboards. Reconcile the Redis counters against ClickHouse aggregates at billing time to catch any Kafka lag discrepancies.
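The tiered pricing rule above ("first N free, next M at $0.001, the rest at $0.0005") can be sketched with concrete breakpoints — here 1M free, the next 9M at $0.001, and everything above 10M at $0.0005, numbers chosen for illustration. Using Decimal avoids float rounding errors on invoices.

```python
from decimal import Decimal

FREE_TIER = 1_000_000        # first 1M calls/month free
MID_TIER_CAP = 10_000_000    # calls 1M..10M billed at MID_RATE
MID_RATE = Decimal("0.001")  # dollars per call in the middle tier
HIGH_RATE = Decimal("0.0005")

def monthly_charge(total_calls: int) -> Decimal:
    """Apply tiered pricing to a month's total call count."""
    mid_calls = max(0, min(total_calls, MID_TIER_CAP) - FREE_TIER)
    high_calls = max(0, total_calls - MID_TIER_CAP)
    return mid_calls * MID_RATE + high_calls * HIGH_RATE

# 12M calls: 9M * $0.001 + 2M * $0.0005 = $9,000 + $1,000 = $10,000
assert monthly_charge(12_000_000) == Decimal("10000")
```

Each tier is computed independently from the total, so the function is a pure map from the reconciled call count to a charge: the same input always produces the same invoice line, which matters for billing-dispute audits.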
API Analytics and Developer Dashboard
Real-time metrics per consumer per API: requests per minute, error rate (4xx/5xx), p50/p99 latency, top endpoints, geographic distribution.

Architecture:
- ClickHouse stores raw usage events with sub-second ingestion lag (columnar, optimized for analytical queries).
- Pre-aggregated materialized views serve common queries: requests_per_hour, error_rate_by_endpoint, latency_percentiles_daily.
- The developer dashboard queries the pre-aggregated tables for fast response (< 100 ms).

SLA monitoring: per-API uptime and latency guarantees are computed from the same usage events; alert when an SLA breaches its threshold. API deprecation: track usage of deprecated API versions per consumer, and notify high-usage consumers before the sunset date.
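As a toy illustration of the dashboard metrics (in production these come from ClickHouse aggregates, not in-process Python), here is how error rate and nearest-rank latency percentiles might be computed over a batch of usage events:

```python
import math

def percentile(sorted_vals: list, p: float):
    """Nearest-rank percentile over a pre-sorted, non-empty list."""
    idx = max(0, math.ceil(p / 100 * len(sorted_vals)) - 1)
    return sorted_vals[idx]

def summarize(events: list) -> dict:
    """events: list of (status_code, latency_ms) tuples from the gateway."""
    latencies = sorted(lat for _, lat in events)
    errors = sum(1 for status, _ in events if status >= 400)
    return {
        "error_rate": errors / len(events),
        "p50_ms": percentile(latencies, 50),
        "p99_ms": percentile(latencies, 99),
    }

events = [(200, 10), (200, 12), (500, 300), (200, 11)]
stats = summarize(events)
# error_rate = 0.25, p50_ms = 11, p99_ms = 300
```

This also shows why p99 is the SLA metric of choice: a single slow error (300 ms) barely moves the median but dominates the tail.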
Frequently Asked Questions

Q: How does token bucket rate limiting differ from fixed window rate limiting?
A: Fixed window counts requests in discrete time windows (e.g., 100 requests per minute) and resets the counter at the start of each window. Problem: a burst of 100 requests at 12:00:59 and another 100 at 12:01:01 both pass the limit, yet 200 requests arrive within 2 seconds (the boundary effect). Token bucket: a bucket accumulates tokens at a fixed rate (e.g., 100 tokens per minute, about 1.67 tokens/second); each request consumes one token, and tokens cap at the bucket capacity. If no requests arrive for a while, tokens accumulate up to capacity, allowing a burst; the burst then drains the bucket, and requests are rate-limited until tokens refill. Token bucket accurately models "N requests per time period" without boundary effects, and allowing short bursts (up to capacity) is more user-friendly than fixed window's abrupt resets.

Q: Why is a Redis Lua script used for rate limiting instead of a regular Redis transaction?
A: A rate limit check-and-update involves multiple Redis commands: GET the current token count, compute the new count, SET the updated value. Between the GET and SET, another request from the same consumer could read the same (pre-decrement) token count, causing both requests to pass even if only one token remains. Redis MULTI/EXEC with WATCH (optimistic concurrency) aborts if the key changes between WATCH and EXEC, forcing the client to retry; under high concurrency this can loop many times. A Lua script runs atomically on the Redis server: all commands in the script execute as a single unit, with no other command interleaving. The script reads, computes, and writes in one atomic operation, eliminating the race condition without client-side retry loops.

Q: How do you implement tiered pricing for API calls accurately at scale?
A: Tiered pricing example: first 1M calls/month free, next 9M at $0.001 each, above 10M at $0.0005 each. Accurate billing requires counting every call and applying tier breakpoints. At scale (billions of calls), counting in real time in a relational DB is too slow. Architecture: (1) the API gateway increments a Redis counter (INCR usage:{consumer}:{api}:{month}) on each call, which is O(1) and sub-millisecond; (2) usage events are also sent to Kafka for durability (Redis could lose data on crash); (3) a monthly batch job reads the authoritative totals from the Kafka-fed aggregates (ClickHouse or BigQuery) for billing accuracy: Redis for real-time display, the event log for billing; (4) apply tiered pricing: if total_calls <= 1M, charge $0; if 1M < calls <= 10M, charge (calls - 1M) * $0.001; if calls > 10M, charge 9M * $0.001 + (calls - 10M) * $0.0005.

Q: How do you handle API key rotation without downtime?
A: API key rotation flow: (1) the developer generates a new API key in the portal; (2) old and new keys are both active simultaneously (grace period: 24-48 hours); (3) the developer updates their application to use the new key; (4) the developer revokes the old key. The system must support multiple active keys per consumer during the grace period. Database: a consumer_api_keys table (consumer_id, key_hash, status, created_at, revoked_at) that allows multiple ACTIVE keys per consumer; on revocation, set status=REVOKED and update revoked_at. Redis cache: cache key lookups with a short TTL (60 seconds); on revocation, explicitly invalidate the cache entry (DEL apikey:{hash}). This ensures revoked keys stop working within 60 seconds without waiting for cache expiry.

Q: What is an API gateway and how does it differ from a reverse proxy?
A: A reverse proxy (Nginx, HAProxy) routes HTTP requests to backend servers: load balancing, SSL termination, connection pooling. It is protocol-level and does not understand API concepts. An API gateway sits on top of a reverse proxy layer and adds API-specific functionality: authentication (API key, OAuth JWT validation), per-consumer rate limiting, request/response transformation (field filtering, protocol translation), API versioning (routing /v1 and /v2 to different backends), usage metering, developer portal integration, circuit breaking, and API analytics. In practice, API gateways (Kong, AWS API Gateway, Apigee) often use Nginx or Envoy as the underlying HTTP layer and add the API-specific logic on top. The key distinction: a reverse proxy routes; an API gateway enforces API policies.
See also: Stripe Interview Prep
See also: Cloudflare Interview Prep
See also: Shopify Interview Prep