shouldAllowRequest(), which returns true if the incoming requests have arrived at a rate less than or equal to the specified rate, and false otherwise.
1. What if you don’t have to worry about bursts?
2. What if you want to handle bursts?
2026 Update: Rate Limiter Algorithms — Token Bucket to Sliding Window
Rate limiting is a core backend/system design topic. There are 4 main algorithms, each with different trade-offs:
import time
import collections
import threading
# Algorithm 1: Token Bucket — smooth bursts, simple to implement
class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        """
        capacity: max tokens (max burst size)
        refill_rate: tokens added per second
        """
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.refill_rate = refill_rate
        self.last_refill = time.time()
        self.lock = threading.Lock()

    def allow_request(self, tokens_needed: int = 1) -> bool:
        with self.lock:
            now = time.time()
            elapsed = now - self.last_refill
            # Lazy refill: credit tokens for the time elapsed since the last call
            self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
            self.last_refill = now
            if self.tokens >= tokens_needed:
                self.tokens -= tokens_needed
                return True
            return False
# Algorithm 2: Fixed Window Counter — simple but has boundary spike problem
class FixedWindowCounter:
    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.count = 0
        self.window_start = time.time()
        self.lock = threading.Lock()

    def allow_request(self) -> bool:
        with self.lock:
            now = time.time()
            # Reset the counter when the current window expires
            if now - self.window_start >= self.window:
                self.count = 0
                self.window_start = now
            if self.count < self.limit:
                self.count += 1
                return True
            return False

# Algorithm 3: Sliding Window Log — exact, but stores one timestamp per request
class SlidingWindowLog:
    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.log = collections.deque()  # timestamps of accepted requests
        self.lock = threading.Lock()

    def allow_request(self) -> bool:
        with self.lock:
            now = time.time()
            cutoff = now - self.window
            # Evict timestamps that have fallen out of the window
            while self.log and self.log[0] < cutoff:
                self.log.popleft()
            if len(self.log) < self.limit:
                self.log.append(now)
                return True
            return False

# Algorithm 4: Sliding Window Counter — weighted blend of two adjacent windows
class SlidingWindowCounter:
    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.curr_count = 0
        self.prev_count = 0
        self.window_start = time.time()
        self.lock = threading.Lock()

    def allow_request(self) -> bool:
        with self.lock:
            now = time.time()
            elapsed = now - self.window_start
            if elapsed >= self.window:
                # Roll the window (simplification: assumes at most one
                # window boundary was crossed since the last request)
                self.prev_count = self.curr_count
                self.curr_count = 0
                self.window_start = now
                elapsed = 0
            # Weighted estimate of requests in the sliding window
            weight = 1 - elapsed / self.window
            estimated = self.prev_count * weight + self.curr_count
            if estimated < self.limit:
                self.curr_count += 1
                return True
            return False
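A quick smoke test of the semantics above (the capacity and refill rate are arbitrary illustration values):

# Burst of up to 5 requests, then a sustained 2 requests/second
bucket = TokenBucket(capacity=5, refill_rate=2.0)
print([bucket.allow_request() for _ in range(6)])  # first five True, sixth False
time.sleep(1)                  # ~2 tokens refill while we wait
print(bucket.allow_request())  # True again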
Trade-off table:
| Algorithm | Memory per key | Accuracy | Burst handling |
|---|---|---|---|
| Token Bucket | O(1) | High | Allows burst up to capacity |
| Fixed Window | O(1) | Low (boundary spikes) | 2× limit possible at window edges |
| Sliding Window Log | O(limit) | Perfect | No bursts |
| Sliding Window Counter | O(1) | Good approximation | Smoothed |
Distributed rate limiting: Redis plus Lua scripts gives atomic check-and-increment across app servers. Redis’s INCR + EXPIRE implements a fixed window; a sorted set of timestamps implements a sliding window log. Nginx’s limit_req module uses the leaky bucket, and Cloudflare has described running a sliding window counter approximation at its edge.
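As a concrete sketch of the Redis approach, here is the fixed-window variant using redis-py, with the Lua script doing the check-and-increment atomically. The key scheme, default limits, and function name are illustrative assumptions, not a production design:

import time
import redis  # assumes the redis-py client is installed

# INCR the per-user counter and set its TTL on first use, atomically
FIXED_WINDOW_LUA = """
local current = redis.call('INCR', KEYS[1])
if current == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return current
"""

def allow_request_redis(r: redis.Redis, user_id: str,
                        limit: int = 100, window_seconds: int = 3600) -> bool:
    # One counter per user per window bucket (key scheme is illustrative)
    bucket = int(time.time() // window_seconds)
    key = f"ratelimit:{user_id}:{bucket}"
    count = r.eval(FIXED_WINDOW_LUA, 1, key, window_seconds)
    return count <= limit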
💡Strategies for Solving This Problem
System Design Meets Algorithms
Got this at Stripe in 2024. It's part system design, part implementation. They want to see if you understand rate limiting patterns AND can code them.
The Problem
Limit how many requests a user can make in a time window. For example: max 100 requests per hour per user.
Approach 1: Fixed Window Counter
Count requests in fixed time buckets (like hourly buckets starting at :00).
Pros: Simple, memory efficient
Cons: Burst problem at window edges. A user could make 100 requests at 10:59 and 100 more at 11:00: 200 requests land within two minutes while each window stays under its limit.
Approach 2: Sliding Window Log
Store timestamp of each request. For new request, count how many timestamps fall within the last window.
Pros: Accurate, no burst problem
Cons: Memory intensive, O(n) per user where n is the limit
Approach 3: Sliding Window Counter
Hybrid approach. Use buckets, but calculate a weighted average based on how far into the current window you are (worked example below).
Pros: Good accuracy with low memory
Cons: More complex logic
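Worked example with illustrative numbers: limit 100, 80 requests in the previous window, 20 so far in the current one, and 30% of the current window elapsed. The weight on the previous window is 1 - 0.3 = 0.7, so the estimate is 80 × 0.7 + 20 = 76, which is under the limit, and the request is allowed.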
Approach 4: Token Bucket
Give the user tokens at a fixed rate. Each request consumes a token. Allows short bursts but enforces the average rate.
Pros: Flexible, handles bursts smoothly
Cons: Needs refill bookkeeping. A naive design runs a background process to add tokens, though lazy refill on each request (as in the implementation above) avoids that.
What I Used at Stripe
Started with fixed window (simplest). Interviewer asked about burst problem. Discussed sliding window log and token bucket. Implemented token bucket because Stripe needs to handle payment bursts during sales events.
Key question they asked: "How would this scale to millions of users?" Discussed Redis, sharding by user ID, and what happens during cache failures (fail open vs fail closed).
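On the fail-open vs fail-closed point, a minimal sketch (the wrapper, its policy flag, and the allow_request_redis helper from the earlier sketch are my illustrations, not Stripe's design):

import redis

def allow_request_safe(r: redis.Redis, user_id: str, fail_open: bool = True) -> bool:
    try:
        return allow_request_redis(r, user_id)
    except redis.ConnectionError:
        # Fail open: keep serving traffic unthrottled while the limiter is down.
        # Fail closed: reject everything, protecting the backend at the cost of
        # availability. The right choice depends on what the limiter is guarding.
        return fail_open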