Fair Server Scheduling: Round Robin, Weighted Fair Queueing, and Modern Approaches
“Design a server that processes a fair number of functions per client” is a classic systems-design and concurrency interview question. It maps to a real production concern: under bursty traffic, how do you ensure no single client (or job type) starves the others? This guide covers the standard scheduling algorithms — round robin, weighted fair queueing, deficit round robin, and modern variants — with working code, the trade-offs between them, and the practical considerations interviewers ask about.
The Problem
A server receives requests from many clients. Some clients submit many requests; others submit few. You want to:
- Process all requests without starving any client
- Bound the maximum delay any client experiences
- Optionally, give some clients more capacity than others (priority / weighting)
- Handle dynamic changes (clients arrive and leave)
Approach 1: Round Robin
Maintain a queue of clients with pending work. Process one item from each client in turn; cycle back to the start.
from collections import deque, defaultdict
from threading import Lock

class RoundRobinScheduler:
    def __init__(self):
        self.queues = defaultdict(deque)   # client_id -> queue of jobs
        self.client_order = deque()        # order to visit clients
        self.lock = Lock()

    def submit(self, client_id, job):
        with self.lock:
            if not self.queues[client_id]:
                self.client_order.append(client_id)  # first pending job: join the rotation
            self.queues[client_id].append(job)

    def next_job(self):
        with self.lock:
            while self.client_order:
                client_id = self.client_order.popleft()
                if self.queues[client_id]:
                    job = self.queues[client_id].popleft()
                    if self.queues[client_id]:
                        self.client_order.append(client_id)  # reschedule
                    return client_id, job
            return None, None
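A quick usage sketch (client names are illustrative): even though client "a" has three times the backlog, client "b" is served on the first full turn.

rr = RoundRobinScheduler()
for i in range(3):
    rr.submit("a", f"a{i}")
rr.submit("b", "b0")

print([rr.next_job() for _ in range(4)])
# [('a', 'a0'), ('b', 'b0'), ('a', 'a1'), ('a', 'a2')]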
Properties: Each client gets an equal share, delay is bounded (each round visits every active client), and it is simple to implement.
Limitation: It treats all clients identically. If client A submits 100 jobs and client B submits 1, each still gets one slot per round, so A's backlog drains slowly and there is no way to give any client (say, a premium tier) a larger share. The weighted variants below address this.
Approach 2: Weighted Round Robin
Each client has a weight; processing visits each client a number of times proportional to its weight per round.
class WeightedRoundRobin:
    def __init__(self):
        self.queues = defaultdict(deque)
        self.weights = {}   # client_id -> weight
        self.credits = {}   # client_id -> remaining credits this round
        self.client_order = deque()
        self.lock = Lock()

    def submit(self, client_id, job, weight=1):
        with self.lock:
            if client_id not in self.weights:
                self.weights[client_id] = weight
                self.credits[client_id] = weight
                self.client_order.append(client_id)
            self.queues[client_id].append(job)

    def next_job(self):
        with self.lock:
            if not any(self.queues[c] for c in self.client_order):
                return None, None   # nothing pending anywhere
            while True:
                for _ in range(len(self.client_order)):
                    client_id = self.client_order[0]
                    if self.queues[client_id] and self.credits[client_id] > 0:
                        job = self.queues[client_id].popleft()
                        self.credits[client_id] -= 1
                        return client_id, job
                    self.client_order.append(self.client_order.popleft())
                # Full rotation without serving anyone: credits exhausted.
                # Reset for the next round and retry.
                for c in self.credits:
                    self.credits[c] = self.weights[c]
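A usage sketch, assuming a paid tier weighted 3:1 over a free tier (names and weights are illustrative):

from collections import Counter

wrr = WeightedRoundRobin()
for i in range(6):
    wrr.submit("paid", f"p{i}", weight=3)
    wrr.submit("free", f"f{i}", weight=1)

served = [wrr.next_job()[0] for _ in range(8)]
print(Counter(served))   # Counter({'paid': 6, 'free': 2}): a 3:1 split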
Properties: Clients with higher weight get proportionally more processing per round. Useful when you want to give paid users priority over free users, or production traffic priority over background jobs.
Approach 3: Deficit Round Robin (DRR)
An improvement on weighted RR: each client gets a "deficit" counter that accumulates a quantum every round, and unused quantum carries over to the next round, smoothing throughput.
class DeficitRoundRobin:
    def __init__(self, default_quantum: int = 1):
        self.queues = defaultdict(deque)
        self.deficits = {}   # client_id -> deficit counter
        self.quantum = {}    # client_id -> quantum added per round
        self.client_order = deque()
        self.default_quantum = default_quantum
        self.lock = Lock()

    def submit(self, client_id, job, quantum=None):
        with self.lock:
            if client_id not in self.quantum:
                self.quantum[client_id] = quantum or self.default_quantum
                self.deficits[client_id] = 0
            if not self.queues[client_id]:
                self.client_order.append(client_id)  # (re)join the rotation
            self.queues[client_id].append(job)

    def next_job(self):
        with self.lock:
            while self.client_order:
                client_id = self.client_order.popleft()
                if not self.queues[client_id]:
                    self.deficits[client_id] = 0  # idle clients don't bank quantum
                    continue
                self.deficits[client_id] += self.quantum[client_id]
                # Each "service unit" costs 1; serve if the deficit covers it
                if self.deficits[client_id] >= 1:
                    job = self.queues[client_id].popleft()
                    self.deficits[client_id] -= 1
                    if self.queues[client_id]:
                        self.client_order.append(client_id)
                    else:
                        self.deficits[client_id] = 0  # queue drained: reset
                    return client_id, job
                self.client_order.append(client_id)
            return None, None
Properties: Smooth throughput; bounded fairness. Used in network packet schedulers (Linux’s sch_drr) and modern scheduler designs.
Approach 4: Token Bucket Per Client
Each client has its own token bucket; submissions consume tokens. The server processes at full capacity but rate-limits per client. This is the typical approach for production API servers — see our Rate Limiter guide for the full implementation.
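A minimal sketch of the idea, not the full implementation from the rate limiter guide; the rate and burst defaults are illustrative. Each submission must pass its client's bucket before entering a shared FIFO:

import time
from collections import defaultdict, deque
from threading import Lock

class TokenBucketGate:
    def __init__(self, rate=10.0, burst=20):
        self.rate = rate                          # tokens refilled per second
        self.burst = burst                        # bucket capacity
        self.tokens = defaultdict(lambda: float(burst))  # client_id -> tokens
        self.last = {}                            # client_id -> last refill time
        self.ready = deque()                      # shared FIFO of admitted jobs
        self.lock = Lock()

    def submit(self, client_id, job):
        with self.lock:
            now = time.monotonic()
            elapsed = now - self.last.get(client_id, now)
            self.last[client_id] = now
            # Refill in proportion to elapsed time, capped at the burst size
            self.tokens[client_id] = min(self.burst,
                                         self.tokens[client_id] + elapsed * self.rate)
            if self.tokens[client_id] < 1:
                return False                      # over rate: reject (or delay)
            self.tokens[client_id] -= 1
            self.ready.append((client_id, job))
            return True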
Approach 5: Priority Queue with Aging
Use a priority queue whose sort key blends priority with submission time. Older items effectively gain priority the longer they wait, which prevents starvation.
import heapq
import time

class PriorityScheduler:
    def __init__(self, aging_interval: float = 10.0):
        # A waiting job gains one effective priority level
        # per aging_interval seconds
        self.aging_interval = aging_interval
        self.heap = []
        self.lock = Lock()

    def submit(self, client_id, job, priority=0):
        with self.lock:
            # Lower key = served sooner. Folding submission time into the
            # key implements linear aging: comparing p1 + t1/T against
            # p2 + t2/T is equivalent to comparing the aged priorities
            # p - (now - t)/T, since "now" cancels out. Waiting jobs
            # therefore overtake higher-priority newcomers eventually.
            now = time.time()
            key = priority + now / self.aging_interval
            heapq.heappush(self.heap, (key, now, client_id, job))

    def next_job(self):
        with self.lock:
            if not self.heap:
                return None, None
            _, _, client_id, job = heapq.heappop(self.heap)
            return client_id, job
Trade-offs: Priority plus aging gives flexible prioritization. Use it when fairness should be weighted by importance rather than enforced as strict per-client rotation.
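To see the aging in action (values are illustrative): with a 0.1-second aging interval, a low-priority job that has waited 0.6 seconds has aged about six priority levels, so it outranks a freshly submitted high-priority job.

sched = PriorityScheduler(aging_interval=0.1)
sched.submit("bg", "cleanup", priority=5)    # low importance
time.sleep(0.6)                              # waits ~6 aging intervals
sched.submit("web", "request", priority=0)   # high importance, but fresh
print(sched.next_job())                      # ('bg', 'cleanup')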
Common Production Considerations
Hot keys / power users
One client with 1000× the traffic of others can dominate the queue. Solutions: per-client rate limits; sharding; fanout limiting. Round-robin alone doesn’t handle hot keys well — the client’s queue grows without bound.
Latency vs throughput trade-off
Round-robin minimizes worst-case latency per client. Priority queues optimize for high-priority clients. Weighted approaches let you pick the trade-off explicitly.
Job duration variance
If some jobs take 1 ms and others take 10 s, RR alone doesn't account for service time. Use time-slicing (in the spirit of Linux's SCHED_RR) plus per-job timeouts, shortest-job-first scheduling, or a cost-aware scheduler like the DRR variant sketched below.
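Charging the DRR deficit per unit of estimated work, rather than one unit per job, is how packet schedulers handle this (they charge by bytes). A sketch extending the DeficitRoundRobin class from earlier; the (job, cost) tuple and millisecond cost units are assumptions for illustration, and quanta should be on the same scale as the costs.

class CostAwareDRR(DeficitRoundRobin):
    def submit(self, client_id, job, cost=1.0, quantum=None):
        # cost: estimated work, e.g. expected milliseconds of service time
        super().submit(client_id, (job, cost), quantum=quantum)

    def next_job(self):
        with self.lock:
            while self.client_order:
                client_id = self.client_order.popleft()
                if not self.queues[client_id]:
                    self.deficits[client_id] = 0
                    continue
                self.deficits[client_id] += self.quantum[client_id]
                job, cost = self.queues[client_id][0]      # peek at the head job
                if self.deficits[client_id] >= cost:       # deficit covers its cost
                    self.queues[client_id].popleft()
                    self.deficits[client_id] -= cost
                    if self.queues[client_id]:
                        self.client_order.append(client_id)
                    else:
                        self.deficits[client_id] = 0       # queue drained: reset
                    return client_id, job
                self.client_order.append(client_id)        # bank deficit for later
            return None, None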
Distributed scheduling
For multi-server systems, a single scheduler is a single point of failure. Modern approaches use consistent hashing (each client maps to a specific server), distributed queues (Kafka, RabbitMQ), or per-shard scheduling with global rate limits.
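For the consistent-hashing option, a minimal sketch (node names, the virtual-node count, and MD5 as the hash are all illustrative choices): each client hashes to a fixed position on a ring, so its requests always land on the same scheduler node, and adding or removing a node only remaps the clients nearest to it.

import bisect
import hashlib

class ConsistentHashRouter:
    def __init__(self, nodes, vnodes=100):
        # Place vnodes points per node on the ring for smoother balance
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes for i in range(vnodes)
        )
        self.keys = [k for k, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, client_id):
        # First ring position at or after the client's hash, wrapping around
        i = bisect.bisect(self.keys, self._hash(client_id)) % len(self.keys)
        return self.ring[i][1]

router = ConsistentHashRouter(["sched-a", "sched-b", "sched-c"])
print(router.node_for("client-42"))   # same client, same node, every time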
Backpressure
When all queues are saturated, refuse new submissions or redirect upstream. Let the load shed gracefully rather than building unbounded queues.
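A sketch of that on the RoundRobinScheduler from earlier; the max_pending cap is an assumed knob. submit now reports rejection so callers can back off instead of letting queues grow without bound.

class BoundedScheduler(RoundRobinScheduler):
    def __init__(self, max_pending=1000):
        super().__init__()
        self.max_pending = max_pending    # per-client queue cap (assumed knob)

    def submit(self, client_id, job):
        with self.lock:
            if len(self.queues[client_id]) >= self.max_pending:
                return False              # shed load: caller backs off or retries
            if not self.queues[client_id]:
                self.client_order.append(client_id)
            self.queues[client_id].append(job)
            return True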
Frequently Asked Questions
What’s the expected interview answer?
Round robin as the baseline. Discuss its limitations (no priority, doesn’t handle different job sizes well). Mention weighted RR or DRR for prioritization, token buckets for rate limiting, priority queue with aging for flexible scheduling. Strong candidates draw the system, identify the bottlenecks, and propose monitoring.
How does this differ from rate limiting?
Rate limiting rejects or delays excess requests; scheduling decides which queued requests to process next. They’re complementary: rate limiting controls input; scheduling controls processing order. Real systems use both. See our rate limiter guide for the rate-limiting side.
What’s the right choice for a real production API?
Token bucket per client (rate limiting) plus FIFO processing within each queue (or weighted by priority for premium clients). Scheduling complexity beyond this is rarely necessary unless you have specific fairness or QoS requirements. Most large-scale APIs (Stripe, Cloudflare, AWS) use variants of this pattern.
How do I prevent starvation?
For round robin: it’s automatic — every client gets a turn each round. For priority queue: add aging (priority boost over time). For token bucket: ensure refill rate is non-zero so all clients eventually make progress. For multi-server: ensure routing fairness so no single client’s traffic clusters on one node.
What if jobs have very different durations?
RR with FIFO per queue doesn’t handle this — a 10s job blocks the queue for 10s. Solutions: time-slice processing (Linux SCHED_RR-style), separate queues for fast vs slow jobs, or asynchronous worker pools that pull from all queues. Strong candidates identify the issue and propose explicit handling.
See also: Producer-Consumer Problem • Implement a Rate Limiter • Execution and Slippage
💡 Strategies for Solving This Problem
Load Balancing and Request Handling
Got this at Meta in 2024. It's about designing scalable systems that handle variable load. Tests understanding of queuing, load balancing, and priority systems.
The Problem
Design a server that processes thousands of function requests per second. Challenges:
- Unknown request rate
- Different priorities
- Need fairness
- Can't block
Approach 1: Simple Queue
Single FIFO queue. Process requests as they come. Simple but:
- No priorities
- High-priority requests wait behind low-priority
- Doesn't scale well
Approach 2: Priority Queue
Use heap or priority queue. High-priority requests first. Better, but:
- Starvation: low-priority may never execute
- Not truly "fair"
Approach 3: Multiple Queues + Round Robin
Separate queue per priority level. Use weighted round-robin to process:
- 60% time on high priority
- 30% on medium
- 10% on low
Prevents starvation while maintaining priorities.
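A minimal sketch of that split (queue names and the exact 6/3/1 pattern are illustrative):

import itertools
from collections import deque

queues = {"high": deque(), "medium": deque(), "low": deque()}
schedule = itertools.cycle(["high"] * 6 + ["medium"] * 3 + ["low"])

def next_request():
    # One full pass over the 10-slot pattern; skip empty queues so
    # unused slots go to whoever has work instead of being wasted
    for _ in range(10):
        q = queues[next(schedule)]
        if q:
            return q.popleft()
    return None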
Approach 4: Thread Pool with Work Stealing
Multiple worker threads, each with a local queue. If idle, a worker "steals" work from another's queue (see the sketch after this list). Good for:
- Load balancing
- Utilizing all cores
- Handling variable load
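A minimal work-stealing sketch (one shared lock keeps it short; a real pool uses per-queue locks or lock-free deques): each worker pops its own queue FIFO and, when idle, steals from the opposite end of a random nonempty victim.

import random
from collections import deque
from threading import Lock

class WorkStealingPool:
    def __init__(self, n_workers=4):
        self.queues = [deque() for _ in range(n_workers)]
        self.lock = Lock()

    def submit(self, worker_id, task):
        with self.lock:
            self.queues[worker_id].append(task)

    def take(self, worker_id):
        with self.lock:
            own = self.queues[worker_id]
            if own:
                return own.popleft()                 # own work: oldest first
            victims = [q for i, q in enumerate(self.queues)
                       if i != worker_id and q]
            if victims:
                return random.choice(victims).pop()  # steal newest from a victim
            return None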
Key Components
Rate limiting: Prevent any one client from overwhelming the server
Backpressure: Reject requests when overloaded
Monitoring: Track queue sizes, latencies
Autoscaling: Add workers when load increases
At Meta
They wanted to see if I understood the tradeoffs. Started with simple queue, they asked about priorities. Then about fairness and starvation. Finally discussed scaling and monitoring.