Backpressure Mechanism Low-Level Design: Producer Throttling, Buffer Management, and Overflow Strategies

Backpressure is the mechanism by which an overwhelmed consumer signals upstream producers to slow down. Without it, producers fill unbounded buffers until the process runs out of memory or messages are silently dropped with no visibility. Proper backpressure design makes the system self-regulating: throughput degrades gracefully under load rather than collapsing.

Backpressure Signals

The first problem is measurement. A consumer cannot apply backpressure without a reliable signal of its own saturation. Three primary signals are used:

  • Consumer lag — for Kafka, the difference between the latest offset and the committed offset. A growing lag means the consumer cannot keep up.
  • Queue depth — for in-process or database-backed queues, the number of pending items. Depth approaching capacity is the most direct signal.
  • Processing latency — the time from message receipt to processing completion. Rising latency indicates the consumer is slowing down even if the queue has not yet filled.
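The three signals above can be gathered into one saturation snapshot. The following is a minimal sketch rather than a definitive implementation: the names `consumer_lag` and `SaturationSnapshot`, and the thresholds passed to `is_saturated`, are assumptions chosen for illustration.

```python
from dataclasses import dataclass


def consumer_lag(latest_offset: int, committed_offset: int) -> int:
    """Kafka-style lag: messages written but not yet committed by the consumer."""
    return max(0, latest_offset - committed_offset)


@dataclass
class SaturationSnapshot:
    lag: int                # messages the consumer is behind
    queue_depth: int        # pending items in the queue
    queue_capacity: int     # maximum queue size
    latency_seconds: float  # receipt-to-completion time of a recent message

    @property
    def fill_ratio(self) -> float:
        return self.queue_depth / self.queue_capacity

    def is_saturated(self, max_lag: int, max_fill: float, max_latency: float) -> bool:
        """True if any one of the three signals crosses its (assumed) threshold."""
        return (
            self.lag > max_lag
            or self.fill_ratio >= max_fill
            or self.latency_seconds > max_latency
        )


snap = SaturationSnapshot(
    lag=consumer_lag(latest_offset=1_500, committed_offset=1_200),
    queue_depth=9_000,
    queue_capacity=10_000,
    latency_seconds=0.4,
)
print(snap.lag, snap.fill_ratio, snap.is_saturated(1_000, 0.8, 1.0))
```

Here the lag and latency are within bounds, but the queue is 90% full, so the snapshot reports saturation on queue depth alone.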

Producer Throttling with AIMD

AIMD (Additive Increase, Multiplicative Decrease) is the congestion-control scheme that TCP uses during congestion avoidance. Producers apply the same idea to their send rate:

  • Additive increase — when consumer lag is below the low watermark, increase the send rate by a fixed step (e.g., +10 msg/s per interval).
  • Multiplicative decrease — when lag exceeds the high watermark, halve the send rate immediately.

The producer periodically reads the lag metric from a shared metrics store (Redis or Prometheus) and adjusts its rate limiter accordingly. This creates a control loop that converges on a sustainable throughput.
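A single iteration of that control loop can be written as a pure function, which makes the rule easy to test in isolation. This is a sketch under stated assumptions: `aimd_step`, the `min_rate` floor, and the `max_rate` ceiling are illustrative names and values, not part of any library.

```python
def aimd_step(rate: float, lag: float, low_watermark: float, high_watermark: float,
              step: float = 10.0, min_rate: float = 1.0,
              max_rate: float = 1000.0) -> float:
    """Return the next send rate (msg/s) given the current lag reading."""
    if lag > high_watermark:
        # Multiplicative decrease: halve the rate, but never drop below the floor.
        return max(min_rate, rate / 2)
    if lag < low_watermark:
        # Additive increase: probe for spare capacity in fixed steps.
        return min(max_rate, rate + step)
    # Between the watermarks: hold the current rate.
    return rate


rate = 100.0
history = []
for lag in [2, 2, 40, 40, 10, 2]:
    rate = aimd_step(rate, lag, low_watermark=5, high_watermark=30)
    history.append(rate)
print(history)
```

The gap between the two watermarks acts as a dead band: while lag sits inside it, the rate is left alone, which avoids reacting to every small fluctuation.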

Bounded In-Memory Queue

Every internal queue must have a maximum size. An unbounded queue is a memory leak waiting to happen under sustained overload. The capacity is set based on expected message size and available heap. When the queue reaches capacity, one of four overflow strategies applies.
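As a rough illustration of sizing the queue from message size and heap, the arithmetic might look like the sketch below. The function name `queue_capacity` and the `overhead_factor` (a guess at per-object bookkeeping cost) are assumptions; real numbers should come from measuring the service.

```python
def queue_capacity(heap_budget_bytes: int, avg_message_bytes: int,
                   overhead_factor: float = 1.5) -> int:
    """Estimate how many messages fit in the memory budget reserved for the queue."""
    if avg_message_bytes <= 0:
        raise ValueError("avg_message_bytes must be positive")
    # Inflate the average size to account for object overhead (assumed factor).
    per_item = avg_message_bytes * overhead_factor
    return max(1, int(heap_budget_bytes // per_item))


# Example: 256 MiB reserved for the queue, 2 KiB average message.
print(queue_capacity(256 * 1024 * 1024, 2048))
```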

Overflow Strategies

  • DROP_OLDEST — evict the head of the queue (oldest item) to make room for the new item. Appropriate when recency matters more than completeness (e.g., sensor readings where a stale value is useless).
  • DROP_NEWEST — reject the incoming item. The producer receives an error and can retry or discard. Appropriate when ordered processing matters and dropping old items would break sequence.
  • BLOCK — the producer's put() call blocks until space is available. This propagates backpressure synchronously up the call stack. It works well when producers are async and can tolerate waiting without holding other resources.
  • SPILL_TO_DISK — overflow items are written to a local disk-backed queue (e.g., a memory-mapped file or SQLite). Disk is slower than memory but offers far more capacity. Used when neither dropping nor blocking is acceptable.
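To make the SPILL_TO_DISK option concrete, here is a minimal sketch of a FIFO overflow store built on the standard-library `sqlite3` module. The class name `DiskSpillQueue`, the table layout, and the choice of JSON for serialization are all assumptions for illustration; items must be JSON-serializable in this version.

```python
import json
import sqlite3


class DiskSpillQueue:
    """FIFO overflow store backed by SQLite (illustrative sketch)."""

    def __init__(self, path: str = ":memory:"):
        # Pass a file path for real spilling; ":memory:" keeps the demo self-contained.
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS spill ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )
        self._db.commit()

    def push(self, item) -> None:
        self._db.execute("INSERT INTO spill (payload) VALUES (?)", (json.dumps(item),))
        self._db.commit()

    def pop(self):
        """Remove and return the oldest spilled item, or None if nothing is spilled."""
        row = self._db.execute(
            "SELECT id, payload FROM spill ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        self._db.execute("DELETE FROM spill WHERE id = ?", (row[0],))
        self._db.commit()
        return json.loads(row[1])

    def __len__(self) -> int:
        return self._db.execute("SELECT COUNT(*) FROM spill").fetchone()[0]


spill = DiskSpillQueue()
spill.push({"n": 1})
spill.push({"n": 2})
print(len(spill), spill.pop(), len(spill))
```

The `sqlite3` calls are blocking, so in an asyncio service they would typically be run off the event loop, for example with `asyncio.to_thread`. A consumer would drain the spill store back into the in-memory queue once depth falls.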

Load Shedding

When the queue depth exceeds defined thresholds, the system proactively sheds work:

  • 80% full — reject incoming LOW and BACKGROUND priority items. NORMAL and above are still accepted.
  • 95% full — reject all non-CRITICAL items. Return HTTP 503 with Retry-After header.
  • 100% full — apply configured overflow strategy.

Load shedding is preferable to blocking because it keeps latency predictable for high-priority work. Low-priority senders receive clear feedback (503) and can implement their own retry logic.
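The thresholds above can be mapped to an HTTP-style admission decision as in the sketch below. Several details are assumptions: the `Priority` ordering (including a `HIGH` level, which the text only implies with "NORMAL and above"), the function name `admit`, the 202 status for accepted work, and the default `Retry-After` value.

```python
from enum import IntEnum


class Priority(IntEnum):
    BACKGROUND = 0
    LOW = 1
    NORMAL = 2
    HIGH = 3       # assumed level between NORMAL and CRITICAL
    CRITICAL = 4


def admit(fill_ratio: float, priority: Priority, retry_after_seconds: int = 5):
    """Return (status_code, headers) for an incoming item at the given fill ratio."""
    if fill_ratio >= 0.95:
        shed = priority < Priority.CRITICAL   # only CRITICAL survives
    elif fill_ratio >= 0.80:
        shed = priority <= Priority.LOW       # LOW and BACKGROUND are shed
    else:
        shed = False
    if shed:
        # 503 plus Retry-After tells the sender when to try again.
        return 503, {"Retry-After": str(retry_after_seconds)}
    return 202, {}


print(admit(0.85, Priority.LOW))
print(admit(0.85, Priority.NORMAL))
print(admit(0.96, Priority.HIGH))
print(admit(0.96, Priority.CRITICAL))
```

This function only covers the 80% and 95% tiers; at 100% the queue's configured overflow strategy still decides what happens to an admitted item.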

SQL Schema

CREATE TABLE backpressure_config (
    service_name    VARCHAR(100) PRIMARY KEY,
    strategy        VARCHAR(20) NOT NULL DEFAULT 'BLOCK',
    -- DROP_OLDEST | DROP_NEWEST | BLOCK | SPILL_TO_DISK
    max_queue_depth INT NOT NULL DEFAULT 10000,
    spill_threshold INT NOT NULL DEFAULT 8000,
    shed_threshold  INT NOT NULL DEFAULT 9500,
    low_watermark_lag_seconds  INT NOT NULL DEFAULT 5,
    high_watermark_lag_seconds INT NOT NULL DEFAULT 30,
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE backpressure_metric (
    id           BIGSERIAL PRIMARY KEY,
    service_name VARCHAR(100) NOT NULL,
    queue_depth  INT NOT NULL,
    lag_seconds  NUMERIC(10,3),
    drop_count   INT NOT NULL DEFAULT 0,
    send_rate    NUMERIC(10,2),
    sampled_at   TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX ON backpressure_metric (service_name, sampled_at DESC);

Python Implementation

import asyncio
from dataclasses import dataclass, field
from enum import Enum

class OverflowStrategy(Enum):
    DROP_OLDEST = "DROP_OLDEST"
    DROP_NEWEST = "DROP_NEWEST"
    BLOCK       = "BLOCK"
    SPILL_TO_DISK = "SPILL_TO_DISK"

@dataclass
class BoundedQueue:
    maxsize: int
    strategy: OverflowStrategy = OverflowStrategy.BLOCK
    _queue: asyncio.Queue = field(init=False)
    drop_count: int = field(init=False, default=0)

    def __post_init__(self):
        self._queue = asyncio.Queue(maxsize=self.maxsize)

    @property
    def depth(self) -> int:
        return self._queue.qsize()

    @property
    def fill_ratio(self) -> float:
        return self.depth / self.maxsize

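    async def put(self, item):
        """Enqueue item, applying the configured overflow strategy when full."""
        if self._queue.full():
            if self.strategy == OverflowStrategy.BLOCK:
                await self._queue.put(item)  # suspends the caller until space is available
            elif self.strategy == OverflowStrategy.DROP_NEWEST:
                self.drop_count += 1
                # Reject the incoming item; the producer can retry or discard it.
                raise asyncio.QueueFull
            elif self.strategy == OverflowStrategy.DROP_OLDEST:
                try:
                    self._queue.get_nowait()  # evict oldest
                except asyncio.QueueEmpty:
                    pass
                self.drop_count += 1
                self._queue.put_nowait(item)  # a slot was just freed, so no wait is needed
            elif self.strategy == OverflowStrategy.SPILL_TO_DISK:
                # spill_to_disk is a helper that is not shown in this listing;
                # it is assumed to persist the item to a disk-backed queue.
                await spill_to_disk(item)
        else:
            await self._queue.put(item)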

    async def get(self):
        return await self._queue.get()


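async def apply_backpressure(queue: BoundedQueue, producer, redis_client,
                              service_name: str, poll_interval: float = 1.0):
    """AIMD rate controller; adjusts producer send rate based on consumer lag."""
    send_rate = 100.0  # initial msg/s
    step_increase = 10.0
    while True:
        # Lag is read from the shared metrics store under "lag:<service_name>".
        lag = float(await redis_client.get(f"lag:{service_name}") or 0)
        # get_config is not shown in this listing; it is assumed to return the
        # service's backpressure_config row as a dict.
        config = get_config(service_name)
        if lag > config["high_watermark_lag_seconds"]:
            send_rate = max(1.0, send_rate / 2)  # multiplicative decrease
        elif lag < config["low_watermark_lag_seconds"]:
            send_rate += step_increase  # additive increase
        # set_rate is a hypothetical hook into the producer's rate limiter.
        producer.set_rate(send_rate)
        await asyncio.sleep(poll_interval)


# Admission check for load shedding (the function name is assumed).
def should_accept(queue: BoundedQueue, priority: str) -> bool:
    """Return True if the item should be accepted, False if shed."""
    ratio = queue.fill_ratio
    if ratio >= 0.95 and priority != "CRITICAL":
        return False   # shed everything non-critical
    if ratio >= 0.80 and priority in ("LOW", "BACKGROUND"):
        return False   # shed low-priority work
    return True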

Monitoring Consumer Lag

For Kafka, the consumer lag is available via the AdminClient or the kafka-consumer-groups CLI. Export it as a Prometheus gauge per (consumer_group, topic, partition). Alert when lag exceeds the high watermark for more than 2 minutes — sustained lag means the AIMD loop is not converging and manual intervention may be needed (scale consumers or reduce producer rate).
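The "above the high watermark for more than 2 minutes" condition can be sketched as a small stateful check. The class name `SustainedLagAlert` and its `observe` method are hypothetical; timestamps are passed in explicitly so the logic does not depend on the wall clock.

```python
from typing import Optional


class SustainedLagAlert:
    """Fires only when lag has stayed above the watermark for longer than hold_seconds."""

    def __init__(self, high_watermark: float, hold_seconds: float = 120.0):
        self.high_watermark = high_watermark
        self.hold_seconds = hold_seconds
        self._breach_started: Optional[float] = None  # time the current breach began

    def observe(self, lag: float, now: float) -> bool:
        """Record one lag sample taken at time `now` (seconds); return True to alert."""
        if lag <= self.high_watermark:
            self._breach_started = None  # recovered: reset the timer
            return False
        if self._breach_started is None:
            self._breach_started = now
        return now - self._breach_started > self.hold_seconds


alert = SustainedLagAlert(high_watermark=30)
samples = [(40, 0), (45, 60), (50, 121), (10, 130), (40, 140)]
print([alert.observe(lag, t) for lag, t in samples])
```

A brief spike resets as soon as lag recovers, so only a continuous breach triggers the alert. In a Prometheus setup the same effect is usually obtained with the `for` clause of an alerting rule rather than with application code.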
