Distributed locks appear deceptively simple — acquire a lock, do some work, release it — but implementing them correctly in the presence of network partitions, clock skew, and process pauses requires careful reasoning about failure modes. This guide covers the spectrum from naive Redis locks to ZooKeeper-based solutions, explaining exactly what each approach guarantees and where it can go wrong.
Why Distributed Locks Are Hard
In a single-process system a mutex works because the CPU provides atomic compare-and-swap and the lock state lives in shared memory. In a distributed system neither property holds. Three failure modes make distributed locking fundamentally harder than its local counterpart.
Clock skew means that two nodes cannot agree on the current time. If a lock has a TTL of 10 seconds and node A's clock runs 5 seconds fast, A may believe its lock has expired while the lock service still considers it held, or vice versa.

Process pauses are equally dangerous: a JVM garbage-collection pause, a VM live migration, or a kernel scheduling stall can suspend a lock holder for seconds or even minutes. During the pause the lock's TTL expires and another node acquires it; when the paused process resumes, it believes it still holds the lock and proceeds to write shared state concurrently with the new holder, corrupting it.

Network partitions can cause a node to lose its connection to the lock service, leaving it unable to determine whether it still holds the lock. These are not theoretical edge cases; they occur multiple times per day in large production fleets.
Fencing Tokens
The fencing token pattern, popularized by Martin Kleppmann in his analysis of distributed locking, is the only mechanism discussed here that remains safe in the presence of process pauses. Every time a client acquires the lock, the lock service returns a fencing token: a monotonically increasing integer. The client passes this token with every write to the storage service it is protecting. The storage service tracks the highest token it has seen and rejects any write carrying a lower token.
Consider the GC pause scenario: client A acquires the lock and receives token 33. A is paused for 30 seconds. The lock expires, client B acquires it and receives token 34. B writes to storage with token 34, which the storage service accepts. A resumes and tries to write with token 33 — the storage service sees that 34 has already been processed and rejects A’s write. Fencing tokens push safety enforcement to the resource being protected rather than relying on the lock service and the client staying synchronized. This is architecturally significant: no matter how buggy or delayed the lock client is, the storage service’s monotonic check prevents double-writes. The lock service needs only to guarantee that tokens are strictly increasing, which it can do with a simple atomic counter.
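The storage-side check is small enough to sketch directly. The class below is a hypothetical in-memory stand-in for the protected storage service (the name `FencedStore` and its methods are illustrative, not from any library); a real service would persist the highest token alongside the data so the check survives restarts.

```python
import threading

class FencedStore:
    """Illustrative storage service that enforces fencing tokens: it tracks
    the highest token seen and rejects writes carrying a lower one."""

    def __init__(self):
        self._mutex = threading.Lock()   # protects the store's own state
        self._highest_token = -1
        self._data = {}

    def write(self, token: int, key: str, value: str) -> bool:
        """Accept the write only if token is >= the highest token seen.
        Equal tokens are allowed: the same holder may write repeatedly."""
        with self._mutex:
            if token < self._highest_token:
                return False             # stale holder: a newer holder already wrote
            self._highest_token = token
            self._data[key] = value
            return True
```

Replaying the scenario above: a write with token 34 succeeds, after which a resumed client's write with token 33 is rejected, regardless of how long that client was paused.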
Redis Single-Instance Lock
The canonical Redis lock uses SET with the NX (only set if not exists) and PX (TTL in milliseconds) options in a single atomic command: SET lock_key client_uuid NX PX 30000. The NX flag ensures only one client can set the key. The value is a UUID unique to the lock holder, which is critical for safe release: an unconditional DEL could release another client's lock, because if the original holder's TTL expires and a second client acquires, a late DEL from the first client would remove the second client's key. Making release conditional on the UUID prevents this.
Safe release requires an atomic check-and-delete, which Redis implements with a Lua script: if redis.call("get", KEYS[1]) == ARGV[1] then return redis.call("del", KEYS[1]) else return 0 end. Redis executes Lua scripts atomically, so no other command can interpose between the GET and the DEL. The PX TTL acts as a safety net: if the lock holder crashes without releasing, the key expires automatically and other clients can acquire. The practical limit of this approach is that it provides no safety guarantee if the Redis node itself fails — a replica-promoted master may not have received the latest write, and a key that was just set may appear absent on the new master.
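The acquire and release steps above can be sketched as two small functions. This assumes a redis-py-style client (one exposing `set(key, value, nx=True, px=...)` and `eval(script, numkeys, *args)`); the function names `acquire_lock`/`release_lock` are illustrative.

```python
import uuid
from typing import Optional

# The Lua check-and-delete from the text: delete only if the stored UUID is ours.
RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
"""

def acquire_lock(client, key: str, ttl_ms: int = 30000) -> Optional[str]:
    """SET key uuid NX PX ttl in one atomic command.
    Returns the holder's UUID on success, None if the lock is taken."""
    token = str(uuid.uuid4())                       # identity for safe release
    if client.set(key, token, nx=True, px=ttl_ms):
        return token
    return None

def release_lock(client, key: str, token: str) -> bool:
    """Atomic check-and-delete via the Lua script; True if our key was deleted."""
    return client.eval(RELEASE_SCRIPT, 1, key, token) == 1
```

A second `acquire_lock` call before release fails (NX sees the key), and `release_lock` with the wrong UUID is a no-op, which is exactly the safety property the Lua script exists to provide.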
Redlock Algorithm
Redlock is Redis’s answer to single-node failure. To acquire the lock the client attempts to set the key on N independent Redis instances (typically 5) using the same NX PX command as the single-instance case. The algorithm records the time before and after the acquisition attempts. The lock is considered acquired only if the client succeeded on at least N/2+1 instances (a quorum) and the time elapsed during acquisition is less than the lock’s TTL minus a safety margin. The validity time — the window during which the client can safely use the lock — is TTL minus the elapsed acquisition time.
Release requires sending the Lua delete script to all N instances, regardless of whether acquisition succeeded on each; this prevents a delayed SET from arriving after a failed acquisition and creating a ghost lock. Martin Kleppmann's 2016 critique of Redlock argues that it still fails in the GC pause scenario: after acquiring the lock, a process pause can outlast the validity window, and the algorithm provides no fencing token mechanism to detect this. Redis's author, Salvatore Sanfilippo, disputes whether this failure mode is practically significant given typical TTL values. The practical consensus is: use Redlock for efficiency (preventing duplicate work) but not for correctness (preventing concurrent conflicting writes to storage that lacks its own concurrency control).
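The quorum and validity arithmetic can be made concrete. This is a minimal sketch, not a production Redlock: it assumes redis-py-style clients, and the `drift_ms` safety margin and function names are illustrative choices.

```python
import time
import uuid

# Same Lua check-and-delete as the single-instance lock.
RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
"""

def _try_set(client, key, token, ttl_ms):
    """SET NX PX on one instance; treat an unreachable instance as a failure."""
    try:
        return bool(client.set(key, token, nx=True, px=ttl_ms))
    except Exception:
        return False

def redlock_release(clients, key, token):
    """Send the delete script to ALL instances, even where SET seemed to fail,
    so a delayed SET cannot leave a ghost lock behind."""
    for c in clients:
        try:
            c.eval(RELEASE_SCRIPT, 1, key, token)
        except Exception:
            pass  # unreachable instance: its key will expire via TTL anyway

def redlock_acquire(clients, key, ttl_ms=30000, drift_ms=100):
    """Returns (token, validity_ms) on quorum success, else (None, 0.0)."""
    token = str(uuid.uuid4())
    start = time.monotonic()
    acquired = sum(_try_set(c, key, token, ttl_ms) for c in clients)
    elapsed_ms = (time.monotonic() - start) * 1000.0
    validity_ms = ttl_ms - elapsed_ms - drift_ms     # usable window, minus margin
    if acquired >= len(clients) // 2 + 1 and validity_ms > 0:
        return token, validity_ms
    redlock_release(clients, key, token)             # failed: clean up everywhere
    return None, 0.0
```

Note that the caller must finish its work within `validity_ms` and, per the critique above, still has no fencing token to protect against a pause that outlasts that window.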
ZooKeeper Ephemeral Nodes
ZooKeeper’s lock recipe uses ephemeral sequential nodes to implement both mutual exclusion and fair ordering. To acquire a lock, a client creates an ephemeral sequential znode under a lock path — for example /locks/mylock/lock-0000000042. ZooKeeper assigns a monotonically increasing sequence number to each node. The client then lists all children of /locks/mylock and checks whether its node has the lowest sequence number. If it does, the client holds the lock. If not, the client sets a watch on the node with the next-lower sequence number and waits for a notification.
The ephemeral node is the key safety mechanism: when a client’s session expires — either because the client crashed, was paused long enough to miss heartbeats, or lost network connectivity — ZooKeeper automatically deletes all ephemeral nodes created by that session. This triggers watches on the next waiter, which then acquires the lock. There is no manual TTL to tune, no renewal heartbeat to implement, and no UUID to track: session expiry handles it all. Watching only the predecessor rather than all nodes prevents the thundering herd problem — when the lock is released, only one client (the next in queue) is notified, not all waiters. ZooKeeper’s consensus protocol (ZAB) ensures that all writes are linearizable, so sequence numbers and the lock state are globally consistent without relying on clock synchronization.
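In practice one would use a client library's built-in lock recipe (kazoo's `Lock`, for example) rather than hand-rolling this, but the decision step of the recipe is a pure function worth seeing. The sketch below, with the illustrative name `lock_status`, takes the children of the lock path and the client's own node name, and decides whether the client holds the lock or which predecessor to watch. Node names follow ZooKeeper's sequential-suffix convention, e.g. lock-0000000042.

```python
def lock_status(children: list, my_node: str):
    """Given the children of the lock path and our own znode name, return
    ("held", None) if we have the lowest sequence number, otherwise
    ("waiting", predecessor) where predecessor is the single node to watch."""
    def seq(name: str) -> int:
        return int(name.rsplit("-", 1)[1])   # parse the 10-digit sequence suffix
    ordered = sorted(children, key=seq)      # queue order = creation order
    idx = ordered.index(my_node)
    if idx == 0:
        return ("held", None)
    return ("waiting", ordered[idx - 1])     # watch only the predecessor,
                                             # avoiding the thundering herd
```

When the predecessor's znode is deleted (released or session-expired), the watch fires and the client re-runs this check, at which point it either holds the lock or watches a new predecessor.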
Lock TTL and Renewal
Any lock with a TTL-based expiry — which includes Redis locks and ZooKeeper session timeouts — must address the renewal problem. If the lock TTL is shorter than the time needed to complete the protected work, the lock expires before the work is done, defeating its purpose. Setting a very long TTL is dangerous because a crashed holder blocks other clients for the full duration.
The solution is active lease renewal: a background thread (or goroutine, or async task) periodically extends the lock’s TTL before it expires. For a Redis lock with a 30-second TTL, the renewal thread fires every 10 seconds and resets the TTL to 30 seconds using a PEXPIRE command, conditioned on the lock still being held by this client (checked via a Lua script that verifies the UUID). For ZooKeeper, session renewal happens automatically through the ZooKeeper client library’s heartbeat thread — the application does not need to implement it explicitly. The renewal thread must be monitored: if it fails (due to a network partition or GC starvation), the client should detect the missed renewal and abort the protected operation rather than continuing under a potentially expired lock. Redisson (the Java Redis client) implements this watchdog pattern, automatically extending lock TTL until the lock is explicitly released.
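A minimal watchdog for the Redis lock above might look like the following sketch. It assumes a redis-py-style client; the class name `LockWatchdog`, the renew-at-one-third-of-TTL interval, and the `lost` event are illustrative choices, loosely modeled on the Redisson pattern rather than taken from it.

```python
import threading

# Lua renew: extend the TTL only if our UUID still owns the key.
RENEW_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("pexpire", KEYS[1], ARGV[2])
else
    return 0
end
"""

class LockWatchdog:
    """Background thread that renews a held Redis lock until stopped.
    If a renewal fails, it sets .lost so the caller can abort its work."""

    def __init__(self, client, key, token, ttl_ms=30000):
        self.client, self.key, self.token, self.ttl_ms = client, key, token, ttl_ms
        self.lost = threading.Event()       # signalled when the lock is gone
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        interval_s = self.ttl_ms / 3000.0   # renew at one third of the TTL
        while not self._stop.wait(interval_s):
            try:
                ok = self.client.eval(RENEW_SCRIPT, 1, self.key,
                                      self.token, str(self.ttl_ms))
            except Exception:
                ok = 0                      # partition or error counts as a miss
            if ok != 1:
                self.lost.set()             # expired or stolen: caller must abort
                return

    def stop(self):
        self._stop.set()
        self._thread.join()
```

The protected work should periodically check `watchdog.lost.is_set()` and abort if it fires, rather than finishing under a lock it may no longer hold.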
Deadlock Prevention
Distributed deadlocks occur when two or more clients each hold a lock that the other needs. In systems where a single operation must acquire multiple locks — for example, transferring funds between two accounts requires locking both accounts — deadlock is a real risk.
Lock ordering is the simplest prevention strategy: always acquire multiple locks in a globally consistent order (for example, sorted by resource ID). If all clients follow this ordering, circular wait — the necessary condition for deadlock — cannot occur. For cases where ordering is impractical, timeout with exponential backoff provides a practical alternative: if acquiring the second lock fails within a deadline, the client releases the first lock, waits a random exponential backoff interval, and retries the full acquisition sequence from scratch. The tryLock-with-deadline pattern makes this explicit: tryLock(lockA, 100ms) and if successful tryLock(lockB, 100ms), aborting and releasing on any timeout. The backoff must include jitter (randomization) to prevent synchronized retry storms where all blocked clients retry simultaneously, causing a new round of contention.
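Both strategies combine naturally. The sketch below (function names `acquire_all`/`release_all` are illustrative) works over any lock object exposing an `acquire(timeout=...)`/`release()` shape, such as Python's `threading.Lock`, standing in for a distributed lock client: sort by resource ID for global ordering, release everything on any timeout, and retry with jittered exponential backoff.

```python
import random
import time

def acquire_all(locks_by_id: dict, timeout: float = 0.1,
                max_retries: int = 5) -> bool:
    """Acquire every lock in globally sorted ID order; on any timeout,
    release what is held and retry after jittered exponential backoff."""
    ordered = [locks_by_id[i] for i in sorted(locks_by_id)]  # global order
    delay = 0.05
    for _ in range(max_retries):
        held = []
        for lock in ordered:
            if lock.acquire(timeout=timeout):     # tryLock with deadline
                held.append(lock)
            else:
                for h in reversed(held):          # abort: undo partial acquisition
                    h.release()
                break
        else:
            return True                           # acquired the full set
        time.sleep(delay * random.random())       # jitter prevents retry storms
        delay *= 2                                # exponential backoff
    return False

def release_all(locks_by_id: dict):
    for i in sorted(locks_by_id, reverse=True):   # release in reverse order
        locks_by_id[i].release()
```

Because every client sorts the same way, a funds transfer locking accounts "acct-17" and "acct-42" always takes "acct-17" first, so no circular wait can form.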
Fair Lock Semantics
An unfair lock grants the lock to any waiting client when it is released. Under high contention this causes starvation: some clients acquire the lock repeatedly while others wait indefinitely. A fair lock guarantees FIFO ordering — the client that has been waiting longest is always next to acquire.
ZooKeeper’s sequential node recipe is inherently fair: clients acquire in the order their sequential nodes were created. Redis does not natively support fair locking, but it can be emulated with a list-based queue: clients enqueue themselves by pushing a UUID to a Redis list (RPUSH), then spin-wait polling the head of the list (LINDEX 0) and comparing it to their UUID. Only the client at the head of the list may proceed; others wait with exponential backoff. The tradeoff of fairness is throughput: an unfair (thundering herd) lock can achieve higher throughput when contention is sporadic because the first client to retry after a release succeeds immediately without waiting for a queue traversal. Fair locks sacrifice peak throughput for predictable latency distribution, which is preferable in systems with SLA guarantees or where starvation would cause correctness problems (such as lease-based leadership election where a perpetually starved candidate can never become leader).
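The list-based emulation can be sketched as follows, assuming a redis-py-style client exposing `rpush`, `lindex`, `lrem`, and `lpop`. The names `fair_acquire`/`fair_release`, the poll interval, and the timeout are illustrative; a production version would replace the fixed poll with backoff and make the release check-and-pop atomic via a Lua script.

```python
import time
import uuid

def fair_acquire(client, queue_key, poll_s=0.05, timeout_s=10.0):
    """Enqueue a UUID with RPUSH, then poll LINDEX 0 until we reach the
    head of the queue (FIFO). Returns the UUID, or None on timeout."""
    me = str(uuid.uuid4()).encode()
    client.rpush(queue_key, me)                    # join the back of the queue
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if client.lindex(queue_key, 0) == me:      # head of list holds the lock
            return me
        time.sleep(poll_s)
    client.lrem(queue_key, 1, me)                  # timed out: leave the queue
    return None

def fair_release(client, queue_key, me):
    """Pop ourselves off the head so the next waiter becomes the new head.
    (Non-atomic here; a Lua script would fuse the check and the pop.)"""
    if client.lindex(queue_key, 0) == me:          # only the head may release
        client.lpop(queue_key)
```

Waiters behind the head keep polling but are never notified en masse, and the queue order gives exactly the FIFO guarantee described above.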