Fair Server Scheduling: Round Robin, Weighted Fair Queueing, and Modern Approaches

“Design a server that processes a fair number of functions per client” is a classic systems-design and concurrency interview question. It maps to a real production concern: under bursty traffic, how do you ensure no single client (or job type) starves the others? This guide covers the standard scheduling algorithms — round robin, weighted fair queueing, deficit round robin, and modern variants — with working code, the trade-offs between them, and the practical considerations interviewers ask about.

The Problem

A server receives requests from many clients. Some clients submit many requests; others submit few. You want to:

  • Process all requests without starving any client
  • Bound the maximum delay any client experiences
  • Optionally, give some clients more capacity than others (priority / weighting)
  • Handle dynamic changes (clients arrive and leave)

Approach 1: Round Robin

Maintain a queue of clients with pending work. Process one item from each client in turn; cycle back to the start.

from collections import deque, defaultdict
from threading import Lock

class RoundRobinScheduler:
    def __init__(self):
        self.queues = defaultdict(deque)  # client_id -> queue of jobs
        self.client_order = deque()        # order to visit clients
        self.lock = Lock()

    def submit(self, client_id, job):
        with self.lock:
            if not self.queues[client_id]:
                self.client_order.append(client_id)
            self.queues[client_id].append(job)

    def next_job(self):
        with self.lock:
            while self.client_order:
                client_id = self.client_order.popleft()
                if self.queues[client_id]:
                    job = self.queues[client_id].popleft()
                    if self.queues[client_id]:
                        self.client_order.append(client_id)  # reschedule
                    return client_id, job
            return None, None

Properties: Each client gets equal share. Bounded delay (each round visits everyone). Simple to implement.

Limitation: Treats all clients equally. If client A submits 100 jobs and client B submits 1, both get equal scheduling priority, which means client A waits unfairly long for its jobs to finish.
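
A short trace of the round-robin behavior, using illustrative job names (a stripped-down simulation of the scheduler above, not the class itself):

```python
from collections import deque

# Client A has a backlog of 3 jobs; client B submits 1. Round robin
# interleaves them, so B is served after a single A job rather than
# waiting behind A's whole backlog.
queues = {"A": deque(["a1", "a2", "a3"]), "B": deque(["b1"])}
order = deque(["A", "B"])
served = []
while order:
    c = order.popleft()
    served.append(queues[c].popleft())
    if queues[c]:
        order.append(c)  # still has work: go to the back of the line
print(served)  # ['a1', 'b1', 'a2', 'a3']
```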

Approach 2: Weighted Round Robin

Each client has a weight; processing visits each client a number of times proportional to its weight per round.

class WeightedRoundRobin:
    def __init__(self):
        self.queues = defaultdict(deque)
        self.weights = {}      # client_id -> weight
        self.credits = {}      # client_id -> remaining credits this round
        self.client_order = deque()
        self.lock = Lock()

    def submit(self, client_id, job, weight=1):
        with self.lock:
            if client_id not in self.weights:
                # Weight is fixed at the client's first submission;
                # later weight arguments are ignored
                self.weights[client_id] = weight
                self.credits[client_id] = weight
                self.client_order.append(client_id)
            self.queues[client_id].append(job)

    def next_job(self):
        with self.lock:
            # If no client has pending work, stop instead of spinning
            if not any(self.queues[c] for c in self.client_order):
                return None, None
            while True:
                for _ in range(len(self.client_order)):
                    client_id = self.client_order[0]
                    if self.queues[client_id] and self.credits[client_id] > 0:
                        job = self.queues[client_id].popleft()
                        self.credits[client_id] -= 1
                        return client_id, job
                    self.client_order.append(self.client_order.popleft())
                # All credits exhausted this round; refill and loop.
                # (Recursing here would deadlock: threading.Lock is not reentrant.)
                for c in self.credits:
                    self.credits[c] = self.weights[c]

Properties: Clients with higher weight get proportionally more processing per round. Useful when you want to give paid users priority over free users, or production traffic priority over background jobs.
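
A quick simulation of the weight effect, assuming a paid client at weight 2 and a free client at weight 1 (a simplified per-round loop, not the class above):

```python
from collections import deque

# Per round, each client is served up to `weight` jobs while it has work.
queues = {"paid": deque(range(4)), "free": deque(range(4))}
weights = {"paid": 2, "free": 1}
served = []
while any(queues.values()):
    for c, w in weights.items():
        for _ in range(w):
            if queues[c]:
                served.append((c, queues[c].popleft()))
# Each full round serves paid twice and free once: paid drains first,
# then the remaining free jobs are served one per round.
print(served)
```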

Approach 3: Deficit Round Robin (DRR)

Improvement on weighted RR: gives each client a “deficit” counter that accumulates over rounds. Clients can carry unused quantum to the next round, smoothing throughput.

class DeficitRoundRobin:
    def __init__(self, default_quantum: int = 1):
        self.queues = defaultdict(deque)
        self.deficits = {}          # client_id -> deficit counter
        self.quantum = {}            # client_id -> quantum per round
        self.client_order = deque()
        self.default_quantum = default_quantum
        self.lock = Lock()

    def submit(self, client_id, job, quantum=None):
        with self.lock:
            if client_id not in self.quantum:
                self.quantum[client_id] = quantum or self.default_quantum
                self.deficits[client_id] = 0
            if not self.queues[client_id]:
                # (Re-)activate: a client whose queue drained was dropped
                # from client_order and must be re-added on new work
                self.client_order.append(client_id)
            self.queues[client_id].append(job)

    def next_job(self):
        with self.lock:
            while True:
                if not self.client_order:
                    return None, None
                client_id = self.client_order.popleft()
                if not self.queues[client_id]:
                    self.deficits[client_id] = 0  # standard DRR: drop unused deficit
                    continue
                self.deficits[client_id] += self.quantum[client_id]
                # Each "service unit" costs 1; with integer quanta the client
                # can always afford at least one job per visit
                if self.deficits[client_id] >= 1:
                    job = self.queues[client_id].popleft()
                    self.deficits[client_id] -= 1
                    if self.queues[client_id]:
                        self.client_order.append(client_id)
                    else:
                        self.deficits[client_id] = 0  # queue drained; reset
                    return client_id, job
                self.client_order.append(client_id)

Properties: Smooth throughput; bounded fairness. Used in network packet schedulers (Linux’s sch_drr) and modern scheduler designs.
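
The version above charges one deficit unit per job; packet schedulers like sch_drr charge by packet size, which is where the deficit counter earns its keep. A minimal size-aware sketch (the function name, byte quantum, and job sizes are illustrative):

```python
from collections import deque

def drr_schedule(jobs_by_client, quantum=100):
    """jobs_by_client: client_id -> deque of (job, size). Returns service order."""
    order = deque(c for c, q in jobs_by_client.items() if q)
    deficits = {c: 0 for c in order}
    served = []
    while order:
        c = order.popleft()
        deficits[c] += quantum
        # Serve jobs while the head job fits in the accumulated deficit;
        # a large job simply waits until enough rounds have accrued
        while jobs_by_client[c] and jobs_by_client[c][0][1] <= deficits[c]:
            job, size = jobs_by_client[c].popleft()
            deficits[c] -= size
            served.append((c, job))
        if jobs_by_client[c]:
            order.append(c)       # more work: carry the leftover deficit
        else:
            deficits[c] = 0       # standard DRR: reset when the queue empties
    return served
```

With quantum 100, a client whose head job is 150 "bytes" skips a round, accumulates a 200 deficit, and then sends; a client sending 100-byte jobs sends one per round. Over time each client gets roughly `quantum` units of service per round regardless of job sizes.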

Approach 4: Token Bucket Per Client

Each client has its own token bucket; submissions consume tokens. The server processes at full capacity but rate-limits per client. This is the typical approach for production API servers — see our Rate Limiter guide for the full implementation.
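
The per-client bucket can be sketched as follows; the class name, default rate, and capacity here are illustrative, not taken from the rate-limiter guide:

```python
import time
from collections import defaultdict

class PerClientTokenBucket:
    def __init__(self, rate=10.0, capacity=10.0):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.tokens = defaultdict(lambda: capacity)   # start full
        self.last = defaultdict(time.monotonic)       # last refill per client

    def allow(self, client_id, cost=1.0):
        now = time.monotonic()
        elapsed = now - self.last[client_id]
        self.last[client_id] = now
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens[client_id] = min(self.capacity,
                                     self.tokens[client_id] + elapsed * self.rate)
        if self.tokens[client_id] >= cost:
            self.tokens[client_id] -= cost
            return True
        return False
```

Submissions that return `False` are rejected (or queued upstream); the server's workers then drain the admitted jobs at full speed without any per-client bookkeeping in the hot path.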

Approach 5: Priority Queue with Aging

Use a priority queue ordered by (priority, submission_time). Older items get a priority boost over time to prevent starvation.

import heapq
import time

class PriorityScheduler:
    def __init__(self, aging_rate=0.1):
        # Heap entries: (effective_priority, submit_time, base_priority, client_id, job)
        self.heap = []
        self.aging_rate = aging_rate  # priority boost per second of waiting
        self.lock = Lock()

    def submit(self, client_id, job, priority=0):
        with self.lock:
            # Lower value = higher importance; submission time breaks ties,
            # giving FIFO order within a priority level
            heapq.heappush(self.heap, (priority, time.time(), priority, client_id, job))

    def age(self):
        # Call periodically (e.g. from a timer thread): recompute effective
        # priority from wait time so long-waiting low-priority jobs
        # eventually outrank fresh high-priority ones
        now = time.time()
        with self.lock:
            self.heap = [(base - self.aging_rate * (now - t), t, base, cid, job)
                         for _, t, base, cid, job in self.heap]
            heapq.heapify(self.heap)

    def next_job(self):
        with self.lock:
            if not self.heap:
                return None, None
            _, _, _, client_id, job = heapq.heappop(self.heap)
            return client_id, job

Trade-offs: Priority + aging gives flexible prioritization. Use when “fairness” needs to be relative rather than strict round-robin.

Common Production Considerations

Hot keys / power users

One client with 1000× the traffic of others can dominate the queue. Solutions: per-client rate limits; sharding; fanout limiting. Round-robin alone doesn’t handle hot keys well — the client’s queue grows without bound.

Latency vs throughput trade-off

Round-robin minimizes worst-case latency per client. Priority queues optimize for high-priority clients. Weighted approaches let you pick the trade-off explicitly.

Job duration variance

If some jobs take 1 ms and others take 10 s, RR alone doesn't account for service time. Use preemptive time-slicing with per-job timeouts, or shortest-job-first scheduling.

Distributed scheduling

For multi-server systems, a single scheduler is a single point of failure. Modern approaches use consistent hashing (each client maps to a specific server), distributed queues (Kafka, RabbitMQ), or per-shard scheduling with global rate limits.

Backpressure

When all queues are saturated, refuse new submissions or redirect upstream. Let the load shed gracefully rather than building unbounded queues.
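
A minimal sketch of shedding at submission time, assuming a fixed per-client queue limit (the class name and limit are illustrative):

```python
from collections import deque

class BoundedQueues:
    def __init__(self, max_per_client=1000):
        self.max_per_client = max_per_client
        self.queues = {}

    def submit(self, client_id, job):
        q = self.queues.setdefault(client_id, deque())
        if len(q) >= self.max_per_client:
            return False   # shed: caller surfaces e.g. HTTP 429 upstream
        q.append(job)
        return True
```

Rejecting at the edge keeps memory bounded and pushes the retry decision to the client, which is usually better positioned to back off than the server is to buffer.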

Frequently Asked Questions

What’s the expected interview answer?

Round robin as the baseline. Discuss its limitations (no priority, doesn’t handle different job sizes well). Mention weighted RR or DRR for prioritization, token buckets for rate limiting, priority queue with aging for flexible scheduling. Strong candidates draw the system, identify the bottlenecks, and propose monitoring.

How does this differ from rate limiting?

Rate limiting rejects or delays excess requests; scheduling decides which queued requests to process next. They’re complementary: rate limiting controls input; scheduling controls processing order. Real systems use both. See our rate limiter guide for the rate-limiting side.

What’s the right choice for a real production API?

Token bucket per client (rate limiting) plus FIFO processing within each queue (or weighted by priority for premium clients). Scheduling complexity beyond this is rarely necessary unless you have specific fairness or QoS requirements. Most large-scale APIs (Stripe, Cloudflare, AWS) use variants of this pattern.

How do I prevent starvation?

For round robin: it’s automatic — every client gets a turn each round. For priority queue: add aging (priority boost over time). For token bucket: ensure refill rate is non-zero so all clients eventually make progress. For multi-server: ensure routing fairness so no single client’s traffic clusters on one node.

What if jobs have very different durations?

RR with FIFO per queue doesn’t handle this — a 10s job blocks the queue for 10s. Solutions: time-slice processing (Linux SCHED_RR-style), separate queues for fast vs slow jobs, or asynchronous worker pools that pull from all queues. Strong candidates identify the issue and propose explicit handling.

See also: Producer-Consumer Problem · Implement a Rate Limiter · Execution and Slippage

💡 Strategies for Solving This Problem

Load Balancing and Request Handling

Got this at Meta in 2024. It's about designing scalable systems that handle variable load. Tests understanding of queuing, load balancing, and priority systems.

The Problem

Design a server that processes thousands of function requests per second. Challenges:

  • Unknown request rate
  • Different priorities
  • Need fairness
  • Can't block

Approach 1: Simple Queue

Single FIFO queue. Process requests as they come. Simple but:

  • No priorities
  • High-priority requests wait behind low-priority
  • Doesn't scale well

Approach 2: Priority Queue

Use heap or priority queue. High-priority requests first. Better, but:

  • Starvation: low-priority may never execute
  • Not truly "fair"

Approach 3: Multiple Queues + Round Robin

Separate queue per priority level. Use weighted round-robin to process:

  • 60% time on high priority
  • 30% on medium
  • 10% on low

Prevents starvation while maintaining priorities.
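
The 60/30/10 split can be sketched as one scheduling cycle; the slot counts are illustrative, and unused slots are simply skipped here rather than redistributed:

```python
from collections import deque

def schedule_cycle(high, medium, low):
    """Serve up to 6 high, 3 medium, and 1 low job per 10-slot cycle."""
    served = []
    for q, slots in ((high, 6), (medium, 3), (low, 1)):
        for _ in range(slots):
            if q:
                served.append(q.popleft())
    return served
```

Because low priority gets a guaranteed slot every cycle, it makes progress even under sustained high-priority load, which is exactly the starvation guarantee a plain priority queue lacks.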

Approach 4: Thread Pool with Work Stealing

Multiple worker threads. Each has local queue. If idle, "steal" work from others. Good for:

  • Load balancing
  • Utilizing all cores
  • Handling variable load
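
A minimal single-threaded sketch of the stealing logic (names are illustrative; a real pool guards each deque with a lock or uses lock-free deques):

```python
import random
from collections import deque

class WorkStealingPool:
    def __init__(self, n_workers):
        self.queues = [deque() for _ in range(n_workers)]

    def submit(self, worker, job):
        self.queues[worker].append(job)

    def take(self, worker):
        # Owner takes from the front of its own deque
        if self.queues[worker]:
            return self.queues[worker].popleft()
        # Idle: steal from the *back* of a random non-empty victim,
        # which reduces contention with that deque's owner
        victims = [q for i, q in enumerate(self.queues) if i != worker and q]
        if victims:
            return random.choice(victims).pop()
        return None
```

Stealing from the opposite end of the victim's deque is the standard trick: the owner and the thief touch different ends, so they rarely contend for the same item.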

Key Components

Rate limiting: Prevent overwhelming server

Back pressure: Reject requests when overloaded

Monitoring: Track queue sizes, latencies

Autoscaling: Add workers when load increases

At Meta

They wanted to see if I understood the tradeoffs. Started with simple queue, they asked about priorities. Then about fairness and starvation. Finally discussed scaling and monitoring.