What Is a Distributed Semaphore?
A distributed semaphore limits the number of concurrent processes that can access a shared resource across multiple machines. Where a distributed lock is binary (one holder at a time), a semaphore allows up to N concurrent holders. Typical use cases include limiting concurrent database connections from a pool of app servers, throttling calls to a downstream API, controlling parallel job execution in a task queue, and capping concurrent file uploads to object storage.
Data Model
Redis provides the cleanest implementation using a sorted set (ZSET) where each active permit holder is a member:
Key: semaphore:<resource>
Type: Sorted Set
Member: <owner_token> (UUID per permit acquisition)
Score: <expiry_timestamp> (Unix ms, used for TTL eviction)
A SQL-backed alternative:
CREATE TABLE semaphore_permits (
    resource     VARCHAR(255) NOT NULL,
    owner_token  VARCHAR(64)  NOT NULL,
    acquired_at  TIMESTAMP    NOT NULL DEFAULT NOW(),
    expires_at   TIMESTAMP    NOT NULL,
    PRIMARY KEY (resource, owner_token)
);

CREATE TABLE semaphore_config (
    resource     VARCHAR(255) PRIMARY KEY,
    max_permits  INT NOT NULL,
    status       ENUM('active', 'draining') NOT NULL DEFAULT 'active'
);
Acquisition checks COUNT(*) < max_permits inside a transaction with a SELECT FOR UPDATE on the config row to serialize concurrent requests.
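As a sketch, the acquire transaction described above might look like this (MySQL-flavored to match the ENUM in the schema; the resource name 'db-pool', the token, and the 30-second TTL are placeholder values):

```sql
BEGIN;

-- Lock the config row to serialize concurrent acquires on this resource.
SELECT max_permits FROM semaphore_config
WHERE resource = 'db-pool' FOR UPDATE;

-- Reclaim permits from crashed clients before counting.
DELETE FROM semaphore_permits
WHERE resource = 'db-pool' AND expires_at < NOW();

SELECT COUNT(*) FROM semaphore_permits WHERE resource = 'db-pool';

-- Application code compares the count to max_permits; only if
-- count < max_permits does it insert a permit row:
INSERT INTO semaphore_permits (resource, owner_token, expires_at)
VALUES ('db-pool', 'a1b2c3d4', NOW() + INTERVAL 30 SECOND);

COMMIT;
```

A failed check simply rolls back and retries later; the row lock on semaphore_config guarantees two clients cannot both pass the count check for the same resource.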
Core Algorithm
The Redis implementation uses a Lua script to ensure atomicity of the acquire step:
-- KEYS[1] = semaphore key, ARGV[1] = max_permits,
-- ARGV[2] = owner_token, ARGV[3] = expiry_ts_ms, ARGV[4] = now_ms
local count = redis.call('ZCOUNT', KEYS[1], ARGV[4], '+inf')
if count < tonumber(ARGV[1]) then
    redis.call('ZREMRANGEBYSCORE', KEYS[1], '-inf', ARGV[4] - 1)
    redis.call('ZADD', KEYS[1], ARGV[3], ARGV[2])
    return 1
end
return 0
Workflow:
- Evict Expired Permits: ZREMRANGEBYSCORE removes members whose score (expiry) is in the past, reclaiming slots from crashed clients.
- Count Active Permits: ZCOUNT with score range [now, +inf] counts non-expired holders.
- Acquire: If count < max_permits, add the owner token with its expiry score via ZADD.
- Release: ZREM semaphore:<resource> <owner_token> explicitly removes the permit.
- Renew: A watchdog thread calls ZADD semaphore:<resource> XX <new_expiry> <owner_token> to extend the lease for long-running operations (XX updates the score only if the member still exists).
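To make the workflow concrete, here is a minimal in-process model of the sorted-set algorithm in Python. A plain dict stands in for the ZSET (member → expiry score), so no Redis is needed; the class name and method signatures are illustrative, not a real library API:

```python
import time
import uuid

class InProcessSemaphore:
    """Models the Redis ZSET semaphore: owner_token -> expiry score."""

    def __init__(self, max_permits: int):
        self.max_permits = max_permits
        self.permits = {}  # the "ZSET": token -> expiry timestamp

    def _evict_expired(self, now: float) -> None:
        # ZREMRANGEBYSCORE: drop permits whose lease has lapsed.
        self.permits = {t: exp for t, exp in self.permits.items() if exp >= now}

    def acquire(self, ttl: float = 30.0):
        now = time.monotonic()
        self._evict_expired(now)
        # ZCOUNT [now, +inf]: count live holders.
        if len(self.permits) < self.max_permits:
            token = uuid.uuid4().hex
            self.permits[token] = now + ttl  # ZADD
            return token
        return None  # semaphore exhausted

    def release(self, token: str) -> bool:
        # ZREM: explicit release.
        return self.permits.pop(token, None) is not None

    def renew(self, token: str, ttl: float = 30.0) -> bool:
        # ZADD XX: extend the lease only if the permit still exists.
        if token in self.permits:
            self.permits[token] = time.monotonic() + ttl
            return True
        return False

sem = InProcessSemaphore(max_permits=2)
a = sem.acquire()
b = sem.acquire()
c = sem.acquire()   # capacity exhausted, returns None
sem.release(a)
d = sem.acquire()   # a slot is free again
```

The model evicts before counting rather than counting a score range, but the admitted set is identical because ZCOUNT already ignores expired scores.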
Failure Modes
- Permit Leak (Client Crash Without Release): If a client crashes before calling release, its permit slot is held until TTL expiry. Choose TTL as the maximum tolerable stall time, not the expected operation duration. Use the watchdog renewal pattern to keep TTL short while still supporting long operations.
- Thundering Herd on Release: When one permit is released, many waiting clients wake up simultaneously and race to acquire it. Add per-client random jitter (50–200 ms) on retry after a failed acquire attempt.
- Redis Failover Gap: During a Redis primary failover, the semaphore state may be temporarily unavailable or, in asynchronous replication scenarios, partially lost. Use Redis Cluster with persistence (appendfsync always) or accept a brief window of over-admission after failover.
- Clock Skew Between Clients: Expiry scores use wall-clock time. If client clocks drift significantly, some clients will see permits as expired that others see as valid. Use NTP with tight bounds, or derive expiry from the Redis server's own clock via the TIME command inside the Lua script (on Redis versions before 7, call redis.replicate_commands() first so the nondeterministic TIME result replicates safely).
- Max Permits Misconfiguration: Raising max_permits under load can cause a spike in downstream resource consumption. Implement a rate-of-change limit and test with gradual ramp-up.
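The jitter mitigation for the thundering-herd case can be sketched as a retry wrapper. This is pure Python with no Redis dependency: try_acquire stands in for any non-blocking acquire call that returns a permit token or None, and the 50–200 ms band matches the guidance above:

```python
import random
import time

def acquire_with_jitter(try_acquire, timeout: float = 5.0):
    """Retry a non-blocking acquire with randomized backoff.

    try_acquire: callable returning a permit token, or None when full.
    Returns the token, or None once the timeout elapses (a natural
    backpressure signal to surface to the caller).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        token = try_acquire()
        if token is not None:
            return token
        # Random jitter desynchronizes the waiters that all woke on a release.
        time.sleep(random.uniform(0.05, 0.2))
    return None

# Example: a stand-in acquire that succeeds on the third attempt.
attempts = {"n": 0}
def fake_acquire():
    attempts["n"] += 1
    return "tok" if attempts["n"] >= 3 else None

result = acquire_with_jitter(fake_acquire, timeout=2.0)
```

Because each waiter sleeps a different random interval, releases are no longer followed by a synchronized stampede against the acquire script.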
Scalability Considerations
- Namespace Sharding: Shard semaphore keys across Redis nodes by resource name hash. Each semaphore lives entirely on one node (all operations on a single key are atomic in Redis).
- Queue-Based Fairness: A pure spin-retry loop is unfair under high contention. Add a waiting queue (a Redis List) where clients enqueue their token; the holder signals the head of the queue on release, providing FIFO fairness.
- Hierarchical Semaphores: For global rate limits across regions, use a two-level design: each region has a local semaphore capped at regional_quota, and a global semaphore (lower-frequency, higher-latency) governs cross-region totals. Local operations stay fast; global enforcement is approximate.
- Observability: Instrument permit utilization (active/max), wait time histogram, and lease renewal rate. A utilization consistently above 80% is a signal to increase capacity or reduce critical section duration.
- Backpressure Integration: Surface semaphore exhaustion as a 429 or circuit-breaker signal to callers rather than blocking indefinitely. Set a client-side acquire timeout and propagate it up the stack.
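The namespace-sharding point above reduces to a stable hash of the resource name. A minimal sketch in Python, where crc32 is one reasonable stable hash and the node list is hypothetical:

```python
import zlib

# Hypothetical shard set; in practice this comes from cluster configuration.
NODES = ["redis-0:6379", "redis-1:6379", "redis-2:6379"]

def shard_for(resource: str, nodes=NODES) -> str:
    """Map a semaphore resource to one Redis node.

    Every key for a given resource lands on the same node, so all
    ZSET operations for that semaphore stay single-node and atomic.
    """
    h = zlib.crc32(resource.encode("utf-8"))
    return nodes[h % len(nodes)]

def semaphore_key(resource: str) -> str:
    return f"semaphore:{resource}"

# The same resource always maps to the same node:
n1 = shard_for("db-pool")
n2 = shard_for("db-pool")
```

Note that modulo hashing reshuffles most resources when the node count changes; consistent hashing avoids that at the cost of a slightly more involved ring lookup.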
Summary
A distributed semaphore is the right tool when you need bounded concurrency rather than exclusive access. The Redis sorted-set pattern with Lua scripting is battle-tested and operationally simple. The key engineering decisions are TTL length, watchdog renewal strategy, retry fairness, and what to do when the semaphore is exhausted. Pair every semaphore with metrics on permit utilization and acquisition latency — these are leading indicators of capacity problems before they become outages.