Why Distributed Locks?
In a distributed system, multiple servers run the same code concurrently. A distributed lock ensures only one server executes a critical section at a time — preventing double-processing, race conditions on shared resources, and duplicate task execution. Use cases: flash sale inventory reservation, cron job deduplication, preventing double-payment, leader election.
Redis-Based Lock (Single Node)
The fundamental command: SET key token NX PX ttl_ms. NX = set only if Not eXists. PX = TTL in milliseconds. Returns OK if lock acquired, nil if already held.
import uuid

def acquire_lock(r, lock_name, ttl_ms=5000):
    token = str(uuid.uuid4())  # unique per lock holder
    acquired = r.set(lock_name, token, nx=True, px=ttl_ms)  # SET key token NX PX ttl_ms
    return token if acquired else None
Always use a unique token (UUID) per acquisition — this is critical for safe release.
Safe Lock Release with Lua Script
Release must be atomic: check that the stored token matches before deleting. Without this, a timed-out lock holder can delete another holder’s lock.
# Lua script runs atomically in Redis
RELEASE_LUA = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
else
    return 0
end
"""

def release_lock(r, lock_name, token):
    # EVAL runs the script atomically: 1 key (the lock), the token as its argument.
    # Returns 1 if the lock was deleted, 0 if the token did not match.
    return r.eval(RELEASE_LUA, 1, lock_name, token)
Scenario without this protection: Server A acquires lock, pauses past TTL, lock expires, Server B acquires lock, Server A resumes and deletes — it has now deleted B’s lock. The atomic check-and-delete prevents this.
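The scenario can be traced with a plain dict standing in for Redis. This is only a sketch: the in-memory check below is safe solely because it runs single-threaded; against real Redis, the check and the delete must execute as a single Lua script.

```python
# Plain-dict stand-in for Redis, tracing the timed-out-holder scenario.
store = {}

def safe_release(key, token):
    if store.get(key) == token:   # check-and-delete (atomic in the Lua version)
        del store[key]
        return 1
    return 0

store["lock"] = "A1"              # Server A acquires (token A1)
del store["lock"]                 # A pauses past TTL; the lock expires
store["lock"] = "B1"              # Server B acquires (token B1)

safe_release("lock", "A1")        # A resumes: token mismatch, release returns 0
assert store["lock"] == "B1"      # B's lock survives
```

A blind DEL in place of the token check would have deleted B's lock at the last step.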
The TTL Problem and Fencing Tokens
Even with unique tokens, there’s a subtle issue: Server A holds the lock with TTL=5s. A pauses for 6 seconds (GC, VM migration). Lock expires. B acquires it. A resumes — both A and B are now in the critical section simultaneously. Solution: fencing tokens. Each lock acquisition returns a monotonically increasing integer (the lock version). Pass this token to downstream systems. Downstream systems reject requests with a token lower than the latest accepted token. This requires the downstream resource to track the max token seen. Used in distributed databases (HBase epoch fencing, Kubernetes leader election).
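The downstream side of fencing can be sketched as follows. The class and method names here are illustrative, not a real API; the one requirement is that the resource persists the highest token it has accepted and rejects anything lower.

```python
class FencedResource:
    """Downstream resource (e.g. a DB shim) that rejects stale fencing tokens."""

    def __init__(self):
        self.max_token_seen = 0
        self.data = {}

    def write(self, key, value, fencing_token):
        # Reject any request carrying a token lower than the highest seen so far:
        # it must come from a holder whose lock has since been reacquired.
        if fencing_token < self.max_token_seen:
            return False
        self.max_token_seen = fencing_token
        self.data[key] = value
        return True

resource = FencedResource()
resource.write("row", "from B", fencing_token=6)  # B holds the lock (token 6)
resource.write("row", "from A", fencing_token=5)  # A resumes with stale token 5: rejected
```

Equal tokens are accepted so the current holder can retry its own writes; only strictly lower tokens are fenced off.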
Redlock (Multi-Node Redis)
For higher availability, acquire the lock on N = 5 independent Redis nodes. The lock is acquired if (a) a majority (N/2 + 1 = 3) of nodes return OK, and (b) the total acquisition time is less than the TTL. Effective validity time = original TTL − acquisition time − a small clock-drift allowance. Release: send the token-checked release to all 5 nodes, including those where acquisition appeared to fail. Controversy: Redlock does not guarantee safety under process pauses or clock drift — Martin Kleppmann’s 2016 analysis showed scenarios where two clients simultaneously believe they hold the lock. For most workloads, single-node Redis is sufficient. For critical financial operations, use a CP system.
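The quorum-and-validity rule can be sketched in a few lines. FakeNode is a deliberately simplified in-memory stand-in for one Redis node (it ignores TTLs), and redlock_acquire is an illustrative function, not the real Redlock client:

```python
import time

class FakeNode:
    """In-memory stand-in for one Redis node (illustration only; ignores TTL)."""
    def __init__(self):
        self.store = {}
    def set(self, key, value, nx=False, px=None):
        if nx and key in self.store:
            return None                  # SET NX fails: key already exists
        self.store[key] = value
        return True
    def delete(self, key):
        self.store.pop(key, None)

def redlock_acquire(nodes, lock_name, token, ttl_ms, clock_drift_ms=2):
    """Sketch of the Redlock acquisition rule over N independent nodes."""
    start = time.monotonic()
    acquired = sum(1 for node in nodes
                   if node.set(lock_name, token, nx=True, px=ttl_ms))
    elapsed_ms = (time.monotonic() - start) * 1000
    quorum = len(nodes) // 2 + 1                     # e.g. 3 of 5
    validity_ms = ttl_ms - elapsed_ms - clock_drift_ms
    if acquired >= quorum and validity_ms > 0:
        return validity_ms   # how long the caller may assume it holds the lock
    for node in nodes:       # failed: release everywhere, even nodes that said no
        node.delete(lock_name)   # real code would use the token-checking Lua release
    return None

nodes = [FakeNode() for _ in range(5)]
validity = redlock_acquire(nodes, "order:42", "tok-1", ttl_ms=5000)
```

A second caller with a different token would reach quorum on zero nodes and get None back.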
ZooKeeper and etcd Locks
ZooKeeper: create ephemeral sequential znodes under /locks/. The holder with the lowest sequence number owns the lock. Others watch the node immediately below them (not the root, to avoid a thundering herd). When the holder disconnects, its ephemeral node is auto-deleted, waking the next waiter. etcd: PUT /lock/{caller_id} with a lease TTL; the key auto-expires when the lease elapses; use a watch to be notified when the key is deleted. Both are CP systems — they sacrifice availability for consistency and remain correct under network partitions. Note that even a ZooKeeper holder can pause past its session timeout and act on a stale lock, so pair with fencing tokens for full safety. Use for: leader election, critical coordination in distributed databases.
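The lowest-sequence-number rule is simple enough to sketch directly. Given this client's znode name and the full child list of /locks/, decide whether it holds the lock and, if not, which single node to watch (zk_lock_state is an illustrative helper, not a ZooKeeper API):

```python
def zk_lock_state(my_znode, children):
    """Return (holds_lock, znode_to_watch) for one waiter in the lock queue.
    The sequence number is the numeric suffix ZooKeeper appends to each
    sequential znode, e.g. 'lock-0000000007'."""
    ordered = sorted(children, key=lambda name: int(name.rsplit("-", 1)[1]))
    idx = ordered.index(my_znode)
    if idx == 0:
        return True, None             # lowest sequence number: we hold the lock
    return False, ordered[idx - 1]    # watch only the node just below us

# Three waiters queued under /locks/ (listing order is not sequence order):
children = ["lock-0000000009", "lock-0000000007", "lock-0000000008"]
zk_lock_state("lock-0000000007", children)   # → (True, None)
zk_lock_state("lock-0000000009", children)   # → (False, "lock-0000000008")
```

Watching only the immediate predecessor means each node deletion wakes exactly one waiter instead of all of them.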
Comparison
- Redis SET NX: simple and fast, but safety can be violated by a process pause past the TTL. Good for deduplication and low-stakes critical sections.
- Redlock: more available than single-node Redis, but controversial safety guarantees. Avoid for financial transactions.
- ZooKeeper / etcd: strong consistency, correct under failures. Use for leader election and high-stakes coordination. Slower due to consensus overhead.
- DB advisory locks: SELECT GET_LOCK(name, timeout) in MySQL (pg_advisory_lock in PostgreSQL) — simple, leverages the existing DB, but the database is a single point of failure.