Why Distributed Locks?
In a distributed system, multiple servers run the same code concurrently. A distributed lock ensures only one server executes a critical section at a time — preventing double-processing, race conditions on shared resources, and duplicate task execution. Use cases: flash sale inventory reservation, cron job deduplication, preventing double-payment, leader election.
Redis-Based Lock (Single Node)
The fundamental command: SET key token NX PX ttl_ms. NX = set only if Not eXists. PX = TTL in milliseconds. Returns OK if the lock was acquired and nil if the key already exists (another holder has the lock); redis-py surfaces these as True and None.
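```python
import uuid

def acquire_lock(r, lock_name, ttl_ms=5000):
    # r is a redis.Redis client; returns the owner token on success, None otherwise
    token = str(uuid.uuid4())  # unique per lock holder
    acquired = r.set(lock_name, token, nx=True, px=ttl_ms)  # SET ... NX PX ttl_ms
    return token if acquired else None
```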
Always use a unique token (UUID) per acquisition — this is critical for safe release.
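To see the SET NX PX semantics without a running Redis server, here is a minimal in-memory sketch. `FakeLockStore` and `ManualClock` are hypothetical helpers written only for illustration: the store mimics just the `nx`/`px` behaviour of redis-py's `set` that `acquire_lock` relies on, and the manual clock lets us jump past the TTL instead of sleeping.

```python
import time
import uuid


class FakeLockStore:
    """Hypothetical in-memory stand-in for redis-py's set(..., nx=True, px=...)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable clock (seconds) so expiry can be simulated
        self._data = {}          # key -> (value, expires_at or None)

    def _live_entry(self, key):
        entry = self._data.get(key)
        if entry is not None and entry[1] is not None and entry[1] <= self._clock():
            del self._data[key]  # lazily drop a key whose TTL has elapsed
            return None
        return entry

    def set(self, key, value, nx=False, px=None):
        if nx and self._live_entry(key) is not None:
            return None          # NX: key already exists, leave it untouched
        expires_at = self._clock() + px / 1000 if px is not None else None
        self._data[key] = (value, expires_at)
        return True

    def get(self, key):
        entry = self._live_entry(key)
        return entry[0] if entry is not None else None


def acquire_lock(r, lock_name, ttl_ms=5000):
    token = str(uuid.uuid4())    # unique per lock holder
    acquired = r.set(lock_name, token, nx=True, px=ttl_ms)
    return token if acquired else None


class ManualClock:
    """Hypothetical clock whose time only moves when we set it."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = ManualClock()
store = FakeLockStore(clock=clock)
first_token = acquire_lock(store, "job:nightly-report", ttl_ms=5000)   # acquired
second_token = acquire_lock(store, "job:nightly-report", ttl_ms=5000)  # None: still held
clock.now = 6.0              # 6 s later, past the 5 s TTL
third_token = acquire_lock(store, "job:nightly-report", ttl_ms=5000)   # acquired again
```

The second caller gets None while the key is live; once the TTL elapses, a new caller acquires the lock with a fresh token, which is exactly why release must check the token.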
Safe Lock Release with Lua Script
Release must be atomic: check that the stored token matches before deleting. Without this, a timed-out lock holder can delete another holder’s lock.
```python
# Lua script runs atomically in Redis
RELEASE_LUA = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
else
    return 0
end
"""

def release_lock(r, lock_name, token):
    # EVAL script numkeys key arg: 1 if our lock was deleted, 0 if the token did not match
    return r.eval(RELEASE_LUA, 1, lock_name, token) == 1
```
Scenario without this protection: Server A acquires lock, pauses past TTL, lock expires, Server B acquires lock, Server A resumes and deletes — it has now deleted B’s lock. The atomic check-and-delete prevents this.
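The scenario above can be replayed in a few lines. This is a sketch under stated assumptions: `FakeStore` is a hypothetical in-memory store, `expire_now` simulates the TTL elapsing while A is paused, and a `threading.Lock` stands in for Redis executing the Lua script as one indivisible step.

```python
import threading
import uuid


class FakeStore:
    """Hypothetical key-value store; the mutex plays the role of Redis's atomic script execution."""

    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()

    def set_nx(self, key, value):
        with self._mutex:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def expire_now(self, key):
        # Simulate the TTL running out while the holder is paused
        with self._mutex:
            self._data.pop(key, None)

    def get(self, key):
        with self._mutex:
            return self._data.get(key)

    def delete(self, key):
        # Unconditional DEL: what a naive release does
        with self._mutex:
            return 1 if self._data.pop(key, None) is not None else 0

    def compare_and_delete(self, key, token):
        # Same logic as RELEASE_LUA, executed as one step
        with self._mutex:
            if self._data.get(key) == token:
                del self._data[key]
                return 1
            return 0


def run_scenario(safe_release):
    store = FakeStore()
    token_a, token_b = str(uuid.uuid4()), str(uuid.uuid4())
    store.set_nx("lock", token_a)        # Server A acquires
    store.expire_now("lock")             # A pauses past its TTL; lock expires
    store.set_nx("lock", token_b)        # Server B acquires
    if safe_release:
        store.compare_and_delete("lock", token_a)  # A resumes: token mismatch, no-op
    else:
        store.delete("lock")                       # A resumes: deletes B's lock
    return store.get("lock") == token_b  # does B still hold its lock?


naive_ok = run_scenario(safe_release=False)  # False: B's lock was deleted
safe_ok = run_scenario(safe_release=True)    # True: B's lock survives
```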
The TTL Problem and Fencing Tokens
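Even with unique tokens, a subtle issue remains. Server A holds the lock with TTL = 5 s, then pauses for 6 seconds (a GC pause or VM migration). The lock expires and B acquires it. When A resumes it still believes it holds the lock, so A and B are now in the critical section simultaneously. The token check on release does not help here, because the damage happens before A ever tries to release.

Solution: fencing tokens. Each lock acquisition returns a monotonically increasing integer (the lock version), and the holder passes it with every request to downstream systems. The downstream resource tracks the highest token it has accepted and rejects any request carrying a lower one, so A's late writes (older token) are refused once B's writes (newer token) have landed. This requires the downstream resource to participate, and a plain Redis SET NX lock with a random UUID does not produce such an ordered number. The same epoch-fencing idea appears in systems such as HBase and in Kubernetes leader election.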
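A minimal sketch of fencing, assuming a hypothetical `FencingLockService` that only models token numbering (leases and expiry are omitted) and a hypothetical `FencedStorage` standing in for the downstream resource that tracks the maximum token it has seen.

```python
import threading


class FencingLockService:
    """Hypothetical lock service: every successful acquisition returns a larger integer."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._counter = 0

    def acquire(self):
        # Lease/TTL handling omitted; only the monotonically increasing token is modelled
        with self._mutex:
            self._counter += 1
            return self._counter


class FencedStorage:
    """Hypothetical downstream resource that remembers the highest token it accepted."""

    def __init__(self):
        self._mutex = threading.Lock()
        self.max_token_seen = 0
        self.value = None

    def write(self, token, value):
        with self._mutex:
            if token < self.max_token_seen:
                return False        # stale holder: reject
            self.max_token_seen = token
            self.value = value
            return True


locks = FencingLockService()
storage = FencedStorage()

token_a = locks.acquire()                 # A gets token 1, then pauses (GC)
token_b = locks.acquire()                 # A's lock expired; B gets token 2
b_ok = storage.write(token_b, "from B")   # accepted, max_token_seen becomes 2
a_ok = storage.write(token_a, "from A")   # A resumes with token 1: rejected
```

The storage check is what makes the scheme safe: the lock service alone cannot stop a paused client, but the resource can refuse it.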
Redlock (Multi-Node Redis)
For higher availability, acquire the lock on N = 5 independent Redis nodes (no replication between them), using the same key and token on each. The lock counts as acquired only if (a) a majority, N/2 + 1 rounded down (3 of 5), return OK, and (b) the total time spent acquiring is less than the TTL. The time left for the critical section is then original_ttl - acquisition_time. To release, run the token-checked release script on all 5 nodes, including any that did not answer OK. Controversy: Redlock does not guarantee safety under process pauses or clock drift. Martin Kleppmann's 2016 analysis showed scenarios in which two clients simultaneously believe they hold the lock, and Redlock produces no fencing token to catch this. For most workloads, single-node Redis is sufficient; for critical financial operations, use a CP system.
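The two acquisition conditions can be written as a pure function. `redlock_outcome` is a hypothetical helper: it does not talk to Redis, it only applies the quorum and timing rules to per-node results you would have gathered. The optional `drift_ms` allowance is an assumption modelled on the clock-drift margin the Redis documentation subtracts; it defaults to 0 to match the formula above.

```python
def redlock_outcome(node_results, ttl_ms, elapsed_ms, drift_ms=0):
    """Decide a Redlock-style acquisition.

    node_results: list of booleans, True where a node returned OK to SET NX PX.
    Returns (acquired, validity_ms).
    """
    n = len(node_results)
    quorum = n // 2 + 1                           # majority, e.g. 3 of 5
    ok_count = sum(1 for ok in node_results if ok)
    validity_ms = ttl_ms - elapsed_ms - drift_ms  # usable time left on the lock
    acquired = ok_count >= quorum and validity_ms > 0
    return acquired, (validity_ms if acquired else 0)


majority = redlock_outcome([True, True, True, False, False], ttl_ms=10_000, elapsed_ms=120)
minority = redlock_outcome([True, True, False, False, False], ttl_ms=10_000, elapsed_ms=120)
too_slow = redlock_outcome([True] * 5, ttl_ms=10_000, elapsed_ms=10_500)
# majority -> (True, 9880); minority and too_slow -> (False, 0)
```

When the result is not acquired, a real client should still send the release script to every node, since some of them may have stored its token.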
ZooKeeper and etcd Locks
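ZooKeeper: each client creates an ephemeral sequential znode under /locks/. The client whose znode has the lowest sequence number holds the lock. Every other client watches only the znode immediately preceding its own in sequence order, not the lock holder or the whole /locks/ directory, so a release wakes exactly one waiter instead of causing a thundering herd. When the holder releases the lock or its session expires, its ephemeral znode is deleted automatically and the next waiter is notified.

etcd: PUT /lock/{caller_id} attached to a lease with a TTL; the key is deleted automatically when the lease expires unless the holder keeps it alive. Much like the ZooKeeper recipe, the caller whose key has the lowest create revision holds the lock, and waiters use a watch to be notified when the key ahead of them is deleted.

Both are CP systems: they sacrifice availability for consistency, so the lock state itself stays correct under network partitions, unlike an asynchronously replicated Redis. A client that pauses past its session or lease timeout can still lose the lock without noticing, so high-stakes uses should still pass a fencing token downstream; ZooKeeper's zxid and etcd's key revision increase monotonically and serve this purpose well. Use for: leader election, critical coordination in distributed databases.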
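The decision each ZooKeeper client makes after listing /locks/ can be sketched as a pure function. `lock_decision` is hypothetical and contains no ZooKeeper client calls; the "lock-" name prefix is an assumption, while the 10-digit zero-padded suffix is how ZooKeeper formats sequential node names.

```python
def lock_decision(children, my_node):
    """Apply the ZooKeeper lock recipe to the children of /locks.

    children: znode names such as "lock-0000000007".
    Returns (holds_lock, node_to_watch).
    """
    def seq(name):
        return int(name.rsplit("-", 1)[1])   # numeric sequence suffix

    ordered = sorted(children, key=seq)
    position = ordered.index(my_node)
    if position == 0:
        return True, None                    # lowest sequence number: we own the lock
    return False, ordered[position - 1]      # watch only our immediate predecessor


children = ["lock-0000000012", "lock-0000000010", "lock-0000000011"]
holder = lock_decision(children, "lock-0000000010")  # (True, None)
waiter = lock_decision(children, "lock-0000000012")  # (False, "lock-0000000011")
```

Watching only the predecessor means each deletion notifies a single client, which then re-lists the children and re-runs this check.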
Comparison
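- Redis SET NX: simple and fast, with best-effort safety only (a paused holder can outlive its TTL, and with asynchronous replication a failover can lose the lock). Good for deduplication and low-stakes critical sections.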
- Redlock: more available than single-node Redis, but controversial safety guarantees. Avoid for financial transactions.
- ZooKeeper / etcd: strong consistency, correct under failures. Use for leader election and high-stakes coordination. Slower due to consensus overhead.
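- DB advisory locks: SELECT GET_LOCK('lock_name', timeout_seconds) in MySQL, released with RELEASE_LOCK('lock_name'). Simple and leverages the existing database, but that database is a single point of failure.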