Why Rate Limiting?
Rate limiting protects services from abuse, ensures fair usage, and prevents cascading failures from traffic spikes. Applied at: API gateways (per-user or per-IP limits), payment endpoints (fraud prevention), search endpoints (prevent scraping), notification services (prevent spam).
Four Rate Limiting Algorithms
1. Token Bucket
A bucket holds up to capacity tokens. Tokens are added at a fixed rate (e.g., 10/second). Each request consumes 1 token. If the bucket is empty, the request is rejected. Allows bursts up to the capacity. Most common algorithm — used by AWS, Stripe.
class TokenBucket:
def __init__(self, capacity, refill_rate):
self.capacity = capacity
self.tokens = capacity
self.refill_rate = refill_rate # tokens per second
self.last_refill = time.time()
def allow(self):
now = time.time()
elapsed = now - self.last_refill
self.tokens = min(self.capacity,
self.tokens + elapsed * self.refill_rate)
self.last_refill = now
if self.tokens >= 1:
self.tokens -= 1
return True
return False
2. Leaky Bucket
Requests enter a queue (the bucket) and are processed at a fixed output rate. Excess requests are dropped if the queue is full. Provides a smooth constant output rate — useful for traffic shaping. Implemented as a FIFO queue with a fixed drain rate.
3. Fixed Window Counter
Count requests per fixed time window (e.g., 100 requests per minute). Simple: INCR counter in Redis with TTL=60s. Problem: boundary burst — 100 requests at 0:59 and 100 at 1:01 = 200 requests in 2 seconds.
def allow_fixed_window(user_id, limit=100, window=60):
key = f"rate:{user_id}:{int(time.time() // window)}"
count = redis.incr(key)
if count == 1:
redis.expire(key, window)
return count <= limit
4. Sliding Window Log
Keep a log of timestamps of recent requests per user. On each request: remove timestamps older than 1 minute, count remaining — if < limit, allow and add timestamp. Accurate but memory-intensive (stores every request timestamp). Use Redis sorted set: ZADD with score=timestamp, ZREMRANGEBYSCORE to expire old entries.
5. Sliding Window Counter (Recommended)
Approximate sliding window using two fixed window counters. For a 60s window: current_window_count + previous_window_count * (overlap_fraction).
def allow_sliding_window(user_id, limit=100, window=60):
now = time.time()
current_window = int(now // window)
prev_window = current_window - 1
overlap = now % window / window # fraction of current window elapsed
curr_key = f"rate:{user_id}:{current_window}"
prev_key = f"rate:{user_id}:{prev_window}"
curr_count = int(redis.get(curr_key) or 0)
prev_count = int(redis.get(prev_key) or 0)
# Weighted estimate: prev window contributes proportionally
estimated = prev_count * (1 - overlap) + curr_count
if estimated >= limit:
return False
redis.incr(curr_key)
redis.expire(curr_key, window * 2)
return True
Distributed Rate Limiting with Redis
For a distributed system (multiple API servers), rate limits must be enforced globally — not per-server. Use Redis as the central counter store. Atomic operations via Lua script prevent race conditions:
-- Lua script for atomic token bucket check:
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local data = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(data[1]) or capacity
local last_refill = tonumber(data[2]) or now
local elapsed = now - last_refill
tokens = math.min(capacity, tokens + elapsed * refill_rate)
if tokens >= 1 then
tokens = tokens - 1
redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
redis.call('EXPIRE', key, 3600)
return 1 -- allowed
else
redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
return 0 -- rejected
end
Rate Limit Headers
Return in every response: X-RateLimit-Limit (total allowed), X-RateLimit-Remaining (remaining in window), X-RateLimit-Reset (Unix timestamp when window resets), Retry-After (seconds until next allowed request, on 429). Status 429 Too Many Requests on rejection.
Key Design Decisions
- Token bucket for API rate limits — allows bursts, easy to understand
- Sliding window counter for accuracy without memory overhead of log
- Redis Lua script for atomicity — prevents race conditions in distributed environment
- Per-user + per-IP limits — per-IP catches unauthenticated abuse, per-user catches authenticated abuse
Stripe system design covers API rate limiting and throttling. See common questions for Stripe interview: rate limiting and API design.
Uber system design covers rate limiting for high-traffic APIs. Review patterns for Uber interview: rate limiting and traffic shaping.
Atlassian system design covers API gateways and rate limiting. See design patterns for Atlassian interview: rate limiting and API gateway design.