Why use AIMD for producer rate control instead of a fixed rate?

A fixed rate cannot adapt to changing consumer capacity (e.g., GC pauses, downstream slowdowns). AIMD mimics TCP congestion control: it probes for available capacity by increasing the rate additively while the consumer is healthy, and backs off sharply (halving) when congestion is detected. In steady state the rate oscillates in a narrow band around the maximum sustainable throughput, without requiring manual tuning.

What are the tradeoffs between DROP_OLDEST, DROP_NEWEST, and BLOCK overflow strategies?

DROP_OLDEST prioritizes recency: old data is discarded and new data is accepted, which suits sensor streams where stale readings are useless. DROP_NEWEST rejects incoming data and signals the producer, preserving the order of queued items but requiring producers to handle the error. BLOCK propagates backpressure synchronously, preventing data loss but risking producer thread starvation if the consumer stays slow for a long time. A fourth option, SPILL_TO_DISK, avoids both loss and blocking at the cost of disk I/O latency.

How should load shedding thresholds be chosen?

Start with 80% queue depth as the trigger for shedding low-priority work and 95% for shedding everything non-critical. These thresholds give the system time to recover before hitting 100% capacity. Tune based on observed fill rates: if the queue fills from 80% to 100% in under 10 seconds, lower the first threshold so shedding starts earlier.
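The tuning rule above is simple arithmetic, and a small sketch can make it concrete. The function names and the idea of a target "headroom" in seconds are illustrative assumptions, not part of any library; the net fill rate is taken to be the enqueue rate minus the dequeue rate, measured in items per second.

```python
def seconds_to_full(capacity: int, fill_ratio: float, net_fill_rate: float) -> float:
    """Seconds until the queue reaches 100%, given the net fill rate in items/s.

    Returns infinity when the queue is draining or steady.
    """
    if net_fill_rate <= 0:
        return float("inf")
    return capacity * (1.0 - fill_ratio) / net_fill_rate


def threshold_for_headroom(capacity: int, net_fill_rate: float, headroom_s: float) -> float:
    """Highest fill ratio that still leaves `headroom_s` seconds before 100%."""
    ratio = 1.0 - (net_fill_rate * headroom_s) / capacity
    return max(0.0, min(1.0, ratio))  # clamp to a valid ratio


# A 10,000-item queue growing by a net 400 items/s has about 5 s between 80% and 100%.
remaining = seconds_to_full(10_000, 0.80, 400.0)
# Asking for 10 s of headroom at that fill rate moves the first threshold down to about 60%.
first_threshold = threshold_for_headroom(10_000, 400.0, 10.0)
```

The observed peak net fill rate, not the average, is the safer input here, since the thresholds exist to absorb bursts.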

How does AIMD control producer send rates in response to backpressure?

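On a congestion signal, the producer multiplicatively decreases its send rate (e.g., halves it); when lag falls back below the low watermark, the rate increases additively, so the producer probes for capacity gradually instead of jumping straight back to the rate that caused the congestion.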

What is the difference between DROP_OLDEST and DROP_NEWEST overflow strategies?

DROP_OLDEST evicts the front of the queue (oldest work) to admit new items, preserving freshness; DROP_NEWEST rejects incoming items, protecting work already in the queue.

How does load shedding differ from backpressure?

Backpressure slows producers to match consumer capacity; load shedding actively drops low-priority work when the system is overloaded, prioritizing critical tasks over throughput.

How is consumer lag measured to trigger backpressure signals?

For Kafka consumers, lag is (latest_offset - committed_offset) per partition; for internal queues, it is queue depth divided by average dequeue rate, expressed as seconds of pending work.

Backpressure Mechanism Low-Level Design: Producer Throttling, Buffer Management, and Overflow Strategies

How do you monitor consumer lag in a Kafka-based backpressure system?

Use the Kafka AdminClient to read consumer group offsets and compare them to the latest partition offsets. Export the difference as a Prometheus gauge per (consumer_group, topic, partition). Alert on sustained lag above the high watermark for 2+ minutes. For non-Kafka queues, track queue depth in a metrics table sampled every few seconds.


Backpressure is the mechanism by which an overwhelmed consumer signals upstream producers to slow down. Without it, producers fill unbounded buffers until the process runs out of memory or messages are silently dropped with no visibility. Proper backpressure design makes the system self-regulating: throughput degrades gracefully under load rather than collapsing.

Backpressure Signals

The first problem is measurement. A consumer cannot apply backpressure without a reliable signal of its own saturation. Three primary signals are used:

Consumer lag — for Kafka, the difference between the latest offset and the committed offset. A growing lag means the consumer cannot keep up.
Queue depth — for in-process or database-backed queues, the number of pending items. Depth approaching capacity is the most direct signal.
Processing latency — the time from message receipt to processing completion. Rising latency indicates the consumer is slowing down even if the queue has not yet filled.
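The first two signals reduce to short calculations. The sketch below assumes the offsets have already been fetched into plain dictionaries keyed by partition number; the function names are hypothetical, and treating a partition with no committed offset as offset 0 is an assumption made for illustration.

```python
from __future__ import annotations


def kafka_lag(latest_offsets: dict[int, int],
              committed_offsets: dict[int, int]) -> dict[int, int]:
    """Per-partition lag in messages: latest offset minus committed offset."""
    return {
        # Assumption: a partition with no committed offset counts from 0.
        partition: max(0, latest - committed_offsets.get(partition, 0))
        for partition, latest in latest_offsets.items()
    }


def lag_seconds(queue_depth: int, dequeue_rate: float) -> float:
    """Seconds of pending work: queue depth divided by average dequeue rate."""
    if dequeue_rate <= 0:
        # A stalled consumer with pending items has unbounded lag.
        return float("inf") if queue_depth > 0 else 0.0
    return queue_depth / dequeue_rate


per_partition = kafka_lag({0: 120, 1: 50}, {0: 100, 1: 50})
pending = lag_seconds(queue_depth=500, dequeue_rate=100.0)
```

Expressing lag in seconds rather than messages makes it comparable across queues with different throughput, which is why the watermarks in the schema further down are stored in seconds.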

Producer Throttling with AIMD

AIMD (Additive Increase, Multiplicative Decrease) is the same congestion control algorithm used by TCP. Producers apply it to their send rate:

Additive increase — when consumer lag is below the low watermark, increase the send rate by a fixed step (e.g., +10 msg/s per interval).
Multiplicative decrease — when lag exceeds the high watermark, halve the send rate immediately.

The producer periodically reads the lag metric from a shared metrics store (Redis or Prometheus) and adjusts its rate limiter accordingly. This creates a control loop that converges on a sustainable throughput.
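One way to picture the "rate limiter" half of this loop is a token bucket whose refill rate the AIMD controller changes at runtime. This is a minimal sketch under that assumption; `TokenBucket`, `aimd_step`, and their parameters are hypothetical names, and the injectable clock exists only so the behavior can be checked deterministically.

```python
import time


class TokenBucket:
    """Rate limiter whose refill rate can be changed while running."""

    def __init__(self, rate: float, burst: float, clock=time.monotonic):
        self.rate = rate        # tokens (messages) added per second
        self.burst = burst      # maximum tokens that can accumulate
        self._tokens = burst
        self._clock = clock
        self._last = clock()

    def _refill(self) -> None:
        now = self._clock()
        earned = (now - self._last) * self.rate
        self._tokens = min(self.burst, self._tokens + earned)
        self._last = now

    def set_rate(self, rate: float) -> None:
        self._refill()          # credit tokens earned at the old rate first
        self.rate = rate

    def try_acquire(self) -> bool:
        """Take one token if available; the caller sends only on True."""
        self._refill()
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


def aimd_step(rate: float, lag: float, low: float, high: float,
              step: float = 10.0, floor: float = 1.0) -> float:
    """One AIMD update: halve above the high watermark, add a step below the low one."""
    if lag > high:
        return max(floor, rate / 2)
    if lag < low:
        return rate + step
    return rate                 # between the watermarks: hold the current rate
```

A control loop would call `bucket.set_rate(aimd_step(bucket.rate, lag, low, high))` once per polling interval, while the send path calls `try_acquire()` before each message. Holding the rate steady between the two watermarks is a design choice in this sketch that keeps the controller from reacting to small lag fluctuations.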

Bounded In-Memory Queue

Every internal queue must have a maximum size. An unbounded queue is a memory leak waiting to happen under sustained overload. The capacity is set based on expected message size and available heap. When the queue reaches capacity, one of four overflow strategies applies.

Overflow Strategies

DROP_OLDEST — evict the head of the queue (oldest item) to make room for the new item. Appropriate when recency matters more than completeness (e.g., sensor readings where a stale value is useless).
DROP_NEWEST — reject the incoming item. The producer receives an error and can retry or discard. Appropriate when ordered processing matters and dropping old items would break sequence.
BLOCK — the producer's put() call blocks until space is available. This propagates backpressure synchronously up the call stack. Works well when producers are async and can tolerate waiting without holding other resources.
SPILL_TO_DISK — overflow items are written to a local disk-backed queue (e.g., a memory-mapped file or SQLite). Disk is slower but effectively unlimited. Used when neither dropping nor blocking is acceptable.
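The SPILL_TO_DISK option can be sketched with Python's built-in sqlite3 module. `DiskSpillQueue` is a hypothetical class, the table layout is an assumption, and items are assumed to be JSON-serializable; a production version would also need batching, error handling, and a bound on disk usage.

```python
import json
import sqlite3


class DiskSpillQueue:
    """FIFO overflow store backed by a SQLite database file."""

    def __init__(self, path: str):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS spill ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )
        self._db.commit()

    def put(self, item) -> None:
        """Append one JSON-serializable item to the tail of the spill store."""
        self._db.execute("INSERT INTO spill (payload) VALUES (?)", (json.dumps(item),))
        self._db.commit()

    def get(self):
        """Remove and return the oldest spilled item, or None if the store is empty."""
        row = self._db.execute(
            "SELECT id, payload FROM spill ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        self._db.execute("DELETE FROM spill WHERE id = ?", (row[0],))
        self._db.commit()
        return json.loads(row[1])

    def __len__(self) -> int:
        return self._db.execute("SELECT COUNT(*) FROM spill").fetchone()[0]


spill = DiskSpillQueue(":memory:")  # use a real file path for durable storage
spill.put({"id": 1})
spill.put({"id": 2})
oldest = spill.get()
```

To keep overall FIFO order, new items should keep going to disk for as long as the spill store is non-empty, and the consumer should refill the in-memory queue from disk as space frees up. These calls are blocking, so in an asyncio program they would need to run off the event loop.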

Load Shedding

When the queue depth exceeds defined thresholds, the system proactively sheds work:

80% full — reject incoming LOW and BACKGROUND priority items. NORMAL and above are still accepted.
95% full — reject all non-CRITICAL items. Return HTTP 503 with Retry-After header.
100% full — apply configured overflow strategy.

Load shedding is preferable to blocking because it keeps latency predictable for high-priority work. Low-priority senders receive clear feedback (503) and can implement their own retry logic.

SQL Schema

CREATE TABLE backpressure_config (
    service_name    VARCHAR(100) PRIMARY KEY,
    strategy        VARCHAR(20) NOT NULL DEFAULT 'BLOCK',
    -- DROP_OLDEST | DROP_NEWEST | BLOCK | SPILL_TO_DISK
    max_queue_depth INT NOT NULL DEFAULT 10000,
    spill_threshold INT NOT NULL DEFAULT 8000,
    shed_threshold  INT NOT NULL DEFAULT 9500,
    low_watermark_lag_seconds  INT NOT NULL DEFAULT 5,
    high_watermark_lag_seconds INT NOT NULL DEFAULT 30,
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE backpressure_metric (
    id           BIGSERIAL PRIMARY KEY,
    service_name VARCHAR(100) NOT NULL,
    queue_depth  INT NOT NULL,
    lag_seconds  NUMERIC(10,3),
    drop_count   INT NOT NULL DEFAULT 0,
    send_rate    NUMERIC(10,2),
    sampled_at   TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX ON backpressure_metric (service_name, sampled_at DESC);

Python Implementation

import asyncio
from dataclasses import dataclass, field
from enum import Enum

class OverflowStrategy(Enum):
    DROP_OLDEST = "DROP_OLDEST"
    DROP_NEWEST = "DROP_NEWEST"
    BLOCK       = "BLOCK"
    SPILL_TO_DISK = "SPILL_TO_DISK"

@dataclass
class BoundedQueue:
    maxsize: int
    strategy: OverflowStrategy = OverflowStrategy.BLOCK
    _queue: asyncio.Queue = field(init=False)
    drop_count: int = field(init=False, default=0)

    def __post_init__(self):
        self._queue = asyncio.Queue(maxsize=self.maxsize)

    @property
    def depth(self) -> int:
        return self._queue.qsize()

    @property
    def fill_ratio(self) -> float:
        return self.depth / self.maxsize

    async def put(self, item):
        if self._queue.full():
            if self.strategy == OverflowStrategy.BLOCK:
                await self._queue.put(item)  # blocks until space available
            elif self.strategy == OverflowStrategy.DROP_NEWEST:
                self.drop_count += 1
                # Reject the incoming item so the producer sees the error
                # and can retry or discard it.
                raise asyncio.QueueFull
            elif self.strategy == OverflowStrategy.DROP_OLDEST:
                try:
                    self._queue.get_nowait()  # evict oldest
                except asyncio.QueueEmpty:
                    pass
                self.drop_count += 1
                await self._queue.put(item)
            elif self.strategy == OverflowStrategy.SPILL_TO_DISK:
                # spill_to_disk() is an external coroutine not shown in this listing.
                await spill_to_disk(item)
        else:
            await self._queue.put(item)

    async def get(self):
        return await self._queue.get()


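async def apply_backpressure(queue: BoundedQueue, producer, redis_client,
                              service_name: str, poll_interval: float = 1.0):
    """AIMD rate controller; adjusts producer send rate based on consumer lag."""
    send_rate = 100.0  # initial msg/s
    step_increase = 10.0  # additive step, msg/s per interval
    while True:
        # Lag in seconds, published to the shared metrics store by the consumer.
        lag = float(await redis_client.get(f"lag:{service_name}") or 0)
        # get_config() is an external helper not shown in this listing; it is
        # expected to return the service's backpressure_config row as a dict.
        config = get_config(service_name)
        if lag > config["high_watermark_lag_seconds"]:
            send_rate = max(1.0, send_rate / 2)  # multiplicative decrease
        elif lag < config["low_watermark_lag_seconds"]:
            send_rate += step_increase  # additive increase
        # set_rate() is an assumed producer method (hypothetical name).
        producer.set_rate(send_rate)
        await asyncio.sleep(poll_interval)


# should_accept is an assumed name for the load-shedding check.
def should_accept(queue: BoundedQueue, priority: str) -> bool:
    """Return True if the item should be accepted, False if shed."""
    ratio = queue.fill_ratio
    if ratio >= 0.95 and priority != "CRITICAL":
        return False   # shed everything non-critical
    if ratio >= 0.80 and priority in ("LOW", "BACKGROUND"):
        return False   # shed low-priority work
    return True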

Monitoring Consumer Lag

For Kafka, the consumer lag is available via the AdminClient or the kafka-consumer-groups CLI. Export it as a Prometheus gauge per (consumer_group, topic, partition). Alert when lag exceeds the high watermark for more than 2 minutes — sustained lag means the AIMD loop is not converging and manual intervention may be needed (scale consumers or reduce producer rate).
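The "sustained for 2 minutes" condition is normally expressed in the alerting system itself, but the logic can be sketched in plain Python. `SustainedLagAlert` is a hypothetical helper; timestamps are passed in by the caller, in seconds, so the behavior does not depend on the wall clock.

```python
from typing import Optional


class SustainedLagAlert:
    """Fires only when lag stays above the high watermark for `hold_seconds`."""

    def __init__(self, high_watermark: float, hold_seconds: float = 120.0):
        self.high_watermark = high_watermark
        self.hold_seconds = hold_seconds
        self._breach_started: Optional[float] = None

    def observe(self, lag: float, now: float) -> bool:
        """Record one lag sample taken at time `now`; return True if the alert should fire."""
        if lag <= self.high_watermark:
            self._breach_started = None   # any healthy sample resets the timer
            return False
        if self._breach_started is None:
            self._breach_started = now    # first sample of a new breach
        return now - self._breach_started >= self.hold_seconds


alert = SustainedLagAlert(high_watermark=30.0)
samples = [(40.0, 0.0), (45.0, 60.0), (50.0, 120.0), (10.0, 130.0)]
fired = [alert.observe(lag, now) for lag, now in samples]
```

Resetting on a single healthy sample keeps short spikes from paging anyone, at the cost of missing lag that hovers right around the watermark; a stricter variant could require several consecutive healthy samples before resetting.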