Implement a rate limiter

Implement shouldAllowRequest(), which returns true if incoming requests have arrived at a rate less than or equal to the specified rate, and false otherwise.
1. What if you don’t have to worry about bursts?
2. What if you want to handle bursts?

2026 Update: Rate Limiter Algorithms — Token Bucket to Sliding Window

Rate limiting is a core backend/system design topic. There are four main algorithms, each with different trade-offs:

import time
import collections
import threading

# Algorithm 1: Token Bucket — smooth bursts, simple to implement
class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        """
        capacity: max tokens (max burst size)
        refill_rate: tokens added per second
        """
        self.capacity = capacity
        self.tokens = capacity
        self.refill_rate = refill_rate
        self.last_refill = time.time()
        self.lock = threading.Lock()

    def allow_request(self, tokens_needed=1) -> bool:
        with self.lock:
            now = time.time()
            elapsed = now - self.last_refill
            self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
            self.last_refill = now

            if self.tokens >= tokens_needed:
                self.tokens -= tokens_needed
                return True
            return False

# Algorithm 2: Fixed Window Counter — simple but has boundary spike problem
class FixedWindowCounter:
    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.count = 0
        self.window_start = time.time()
        self.lock = threading.Lock()

    def allow_request(self) -> bool:
        with self.lock:
            now = time.time()
            if now - self.window_start >= self.window:
                self.count = 0
                self.window_start = now
            if self.count < self.limit:
                self.count += 1
                return True
            return False

# Algorithm 3: Sliding Window Log — exact, but stores every request timestamp
class SlidingWindowLog:
    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.log = collections.deque()  # timestamps of accepted requests
        self.lock = threading.Lock()

    def allow_request(self) -> bool:
        with self.lock:
            now = time.time()
            cutoff = now - self.window
            while self.log and self.log[0] < cutoff:
                self.log.popleft()
            if len(self.log) < self.limit:
                self.log.append(now)
                return True
            return False

# Algorithm 4: Sliding Window Counter — weighted approximation, O(1) memory
class SlidingWindowCounter:
    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.prev_count = 0
        self.curr_count = 0
        self.window_start = time.time()
        self.lock = threading.Lock()

    def allow_request(self) -> bool:
        with self.lock:
            now = time.time()
            elapsed = now - self.window_start
            if elapsed >= self.window:
                # If more than a full window has passed, the previous window was empty
                self.prev_count = self.curr_count if elapsed < 2 * self.window else 0
                self.curr_count = 0
                self.window_start = now
                elapsed = 0
            # Weighted estimate of requests in sliding window
            weight = 1 - elapsed / self.window
            estimated = self.prev_count * weight + self.curr_count
            if estimated < self.limit:
                self.curr_count += 1
                return True
            return False

Trade-off table:

Algorithm | Memory | Accuracy | Burst handling
Token Bucket | O(1) | High | Allows burst up to capacity
Fixed Window | O(1) | Low (boundary spikes) | 2× limit possible at window edges
Sliding Window Log | O(limit) | Perfect | No bursts
Sliding Window Counter | O(1) | Good approximation | Smoothed

Distributed rate limiting: Redis + Lua scripts for atomic check-and-increment. Redis’s INCR + EXPIRE implements fixed window; sorted sets implement sliding window log. Nginx rate limiting uses leaky bucket. Cloudflare uses token bucket at the edge.
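The atomic check-and-increment that Redis provides via INCR + EXPIRE can be sketched without a live server. In this illustrative snippet, FakeRedis is a hypothetical in-memory stand-in (not a real client), and a lock stands in for the atomicity a real deployment would get from a Lua script:

```python
import threading
import time

class FakeRedis:
    """In-memory stand-in for Redis, for illustration only.
    A lock plays the role of Lua-script atomicity on a real server."""
    def __init__(self):
        self.store = {}  # key -> (count, expires_at)
        self.lock = threading.Lock()

    def incr_with_ttl(self, key, ttl):
        # Equivalent to doing INCR and EXPIRE atomically in one Lua script
        with self.lock:
            now = time.time()
            count, expires_at = self.store.get(key, (0, now + ttl))
            if now >= expires_at:          # window expired: start a fresh one
                count, expires_at = 0, now + ttl
            count += 1
            self.store[key] = (count, expires_at)
            return count

def allow_request(redis, user_id, limit, window_seconds):
    """Fixed-window check keyed by user: allowed while the current
    window's counter is at or under the limit."""
    key = f"rate:{user_id}"
    return redis.incr_with_ttl(key, window_seconds) <= limit

r = FakeRedis()
decisions = [allow_request(r, "alice", limit=3, window_seconds=60) for _ in range(5)]
print(decisions)  # → [True, True, True, False, False]
```

Sharding these keys by user ID across Redis nodes is what makes this scale horizontally.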

💡 Strategies for Solving This Problem

System Design Meets Algorithms

Got this at Stripe in 2024. It's part system design, part implementation problem. They want to see if you understand rate limiting patterns AND can code them.

The Problem

Limit how many requests a user can make in a time window. For example: max 100 requests per hour per user.

Approach 1: Fixed Window Counter

Count requests in fixed time buckets (like hourly buckets starting at :00).

Pros: Simple, memory efficient
Cons: Burst problem at window edges. User could make 100 requests at 10:59 and 100 more at 11:00.
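The boundary spike can be shown deterministically with a fixed-window counter that takes an injectable clock (an illustrative variant, not the implementation above): 100 requests just before the hour boundary and 100 just after are all allowed, so 200 requests land within one second despite a "100 per hour" limit.

```python
class FixedWindow:
    """Fixed-window counter with an injectable clock, so the
    window-edge burst can be demonstrated without waiting an hour."""
    def __init__(self, limit, window_seconds, clock):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.count = 0
        self.window_start = clock()

    def allow_request(self):
        now = self.clock()
        if now - self.window_start >= self.window:
            self.count = 0          # new window: counter resets
            self.window_start = now
        if self.count < self.limit:
            self.count += 1
            return True
        return False

t = [0.0]  # fake clock we can move by hand
rl = FixedWindow(limit=100, window_seconds=3600, clock=lambda: t[0])

t[0] = 3599.5  # just before the window boundary
burst1 = sum(rl.allow_request() for _ in range(100))
t[0] = 3600.5  # just after the boundary: counter resets
burst2 = sum(rl.allow_request() for _ in range(100))
print(burst1 + burst2)  # → 200 allowed within one second of wall time
```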

Approach 2: Sliding Window Log

Store timestamp of each request. For new request, count how many timestamps fall within the last window.

Pros: Accurate, no burst problem
Cons: Memory intensive (O(n) per user where n is limit)

Approach 3: Sliding Window Counter

Hybrid approach. Use buckets but calculate weighted average based on time position in current window.

Pros: Good accuracy with low memory
Cons: More complex logic
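The weighted estimate is easy to check with concrete (made-up) numbers: suppose the previous window saw 84 requests, the current window has 36 so far, and we are 25% of the way into the current window. The sliding window then still overlaps 75% of the previous window:

```python
window = 60.0     # window length in seconds
prev_count = 84   # requests in the previous full window (example numbers)
curr_count = 36   # requests so far in the current window
elapsed = 15.0    # 25% into the current window

# Count 75% of the previous window's requests plus all current ones
weight = 1 - elapsed / window          # 0.75
estimated = prev_count * weight + curr_count
print(estimated)  # → 99.0 (under a limit of 100, so the request is allowed)
```

The approximation assumes requests were spread evenly across the previous window, which is why accuracy is "good" rather than exact.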

Approach 4: Token Bucket

Give user tokens at fixed rate. Each request consumes a token. Allows short bursts but enforces average rate.

Pros: Flexible, handles bursts smoothly
Cons: Needs refill logic, though this can be computed lazily on each request (as in the implementation above) rather than by a background process

What I Used at Stripe

Started with fixed window (simplest). Interviewer asked about burst problem. Discussed sliding window log and token bucket. Implemented token bucket because Stripe needs to handle payment bursts during sales events.

Key question they asked: "How would this scale to millions of users?" Discussed Redis, sharding by user ID, and what happens during cache failures (fail open vs fail closed).
