Why Distributed Locks?
A mutex works within a single process. A database row lock works within a single database. When multiple application servers must coordinate access to a shared resource (run a cron job on exactly one server, prevent two checkouts from reserving the same inventory), you need a distributed lock — one that works across servers and data centers.
Redis-based Locking (Simple)
SET lock:{resource} {unique_id} NX PX {ttl_ms}. NX: set only if not exists (atomic acquisition). PX {ttl_ms}: expire automatically if the holder crashes (prevents deadlock). The unique_id (UUID) identifies the lock owner — critical for safe release.
Safe release (Lua script for atomicity): if the current value matches the owner UUID, delete the key. Without this check: a slow process might release a lock acquired by another process (its own lock expired, another acquired it, then the slow process deletes the new lock). The Lua script is atomic on Redis (single-threaded).
```lua
-- Lua script for safe release: KEYS[1] = lock key, ARGV[1] = owner UUID
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
```
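The acquire/release semantics above can be sketched without a live server. The `FakeRedis` class below is a hypothetical in-memory stand-in for the two operations the lock needs: `SET key value NX PX ttl` and the Lua script's check-and-delete. A real client (e.g., redis-py) would issue these against an actual Redis instance; this only demonstrates the ownership logic:

```python
import time
import uuid

class FakeRedis:
    """Hypothetical in-memory stand-in for the two Redis operations the lock uses."""
    def __init__(self):
        self._store = {}  # key -> (value, expiry deadline)

    def set_nx_px(self, key, value, ttl_ms):
        """SET key value NX PX ttl_ms: succeed only if key is absent (or expired)."""
        entry = self._store.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return False  # someone else holds a live lock
        self._store[key] = (value, time.monotonic() + ttl_ms / 1000.0)
        return True

    def delete_if_owner(self, key, owner):
        """Semantics of the Lua script: delete only if the stored value matches."""
        entry = self._store.get(key)
        if entry is not None and entry[1] > time.monotonic() and entry[0] == owner:
            del self._store[key]
            return 1
        return 0

r = FakeRedis()
owner_a, owner_b = str(uuid.uuid4()), str(uuid.uuid4())

assert r.set_nx_px("lock:inventory", owner_a, 30_000)       # A acquires
assert not r.set_nx_px("lock:inventory", owner_b, 30_000)   # B is blocked
assert r.delete_if_owner("lock:inventory", owner_b) == 0    # B cannot release A's lock
assert r.delete_if_owner("lock:inventory", owner_a) == 1    # A releases safely
```

The owner check in `delete_if_owner` is exactly the scenario the Lua script guards against: without it, B's release call would delete whichever lock happens to exist, including one B never acquired.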
Redlock Algorithm
The simple Redis lock fails if the Redis master crashes before replicating the lock key: the newly promoted master has no record of the lock, so a second client can acquire it. Redlock uses N independent Redis instances (N=5 is typical). To acquire: try to SET the lock on all N instances. If a majority (floor(N/2)+1 = 3 for N=5) succeed within a timeout, the lock is acquired. The effective lock TTL is reduced by the time spent acquiring. To release: release on all N instances. Properties: tolerates up to floor((N-1)/2) = 2 instance failures. The clock drift assumption is the main criticism: Martin Kleppmann argued that process pauses and unreliable clocks make Redlock unsafe in theory; in practice it works for most use cases with careful TTL settings.
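The quorum arithmetic can be sketched as follows. `LockInstance` is a hypothetical stand-in for one Redis node (the real algorithm issues SET NX PX over the network); the drift allowance of 1% of the TTL plus 2 ms follows the published Redlock description, and `redlock_acquire` is an illustrative name, not a library API:

```python
import time
import uuid

class LockInstance:
    """Hypothetical stand-in for one Redis instance (SET NX PX + delete-if-owner)."""
    def __init__(self):
        self.store = {}
    def try_set(self, key, value, ttl_ms):
        if key in self.store and self.store[key][1] > time.monotonic():
            return False
        self.store[key] = (value, time.monotonic() + ttl_ms / 1000)
        return True
    def del_if_owner(self, key, owner):
        if key in self.store and self.store[key][0] == owner:
            del self.store[key]

def redlock_acquire(instances, resource, ttl_ms):
    owner = str(uuid.uuid4())
    quorum = len(instances) // 2 + 1   # floor(N/2) + 1 = 3 when N = 5
    start = time.monotonic()
    won = sum(inst.try_set(resource, owner, ttl_ms) for inst in instances)
    elapsed_ms = (time.monotonic() - start) * 1000
    drift_ms = ttl_ms * 0.01 + 2       # clock-drift allowance from the Redlock spec
    validity_ms = ttl_ms - elapsed_ms - drift_ms
    if won >= quorum and validity_ms > 0:
        return owner, validity_ms
    for inst in instances:             # failed: undo any partial acquisition everywhere
        inst.del_if_owner(resource, owner)
    return None, 0.0

instances = [LockInstance() for _ in range(5)]
owner, validity = redlock_acquire(instances, "lock:job", 10_000)
assert owner is not None and validity > 0
# A second caller cannot reach quorum while the first holds a majority.
other, _ = redlock_acquire(instances, "lock:job", 10_000)
assert other is None
```

Note the cleanup loop on failure runs against all N instances, not just the ones that reported success: a SET may have succeeded on a node whose reply was lost.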
ZooKeeper / etcd-based Locking
ZooKeeper uses ephemeral sequential nodes. To acquire: create a node /locks/mylock/guid- (sequential). List all children; if your node has the lowest sequence number, you have the lock. If not, watch the node with the next-lower sequence number — when it is deleted, recheck. ZooKeeper guarantees linearizability and handles sessions (ephemeral nodes are deleted when the session expires, releasing the lock automatically on crash). etcd uses leases (TTL-based) and compare-and-swap operations for similar semantics. More heavyweight than Redis but stronger consistency guarantees.
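The decision step of the ZooKeeper recipe (lowest sequence number wins; otherwise watch only your immediate predecessor) can be sketched in isolation. The node names and the `lock_decision` helper are illustrative; a real implementation would use a client library such as Kazoo, and ZooKeeper itself assigns the sequence suffixes:

```python
def lock_decision(my_node, children):
    """Given my znode name and all children of the lock path, decide whether
    I hold the lock or which predecessor node to watch.
    Names look like 'lock-0000000003'; ZooKeeper assigns the numeric suffix."""
    ordered = sorted(children, key=lambda n: int(n.rsplit("-", 1)[1]))
    idx = ordered.index(my_node)
    if idx == 0:
        return "HELD", None                 # lowest sequence number holds the lock
    return "WAIT", ordered[idx - 1]         # watch only the next-lower node

children = ["lock-0000000003", "lock-0000000001", "lock-0000000002"]
assert lock_decision("lock-0000000001", children) == ("HELD", None)
assert lock_decision("lock-0000000003", children) == ("WAIT", "lock-0000000002")
```

Watching only the immediate predecessor (rather than the lowest node) avoids a "herd effect": when a lock is released, exactly one waiter is notified instead of all of them.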
Database-based Locking
INSERT INTO distributed_locks (resource, owner, expires_at) VALUES (X, Y, NOW() + INTERVAL '30 seconds'). A unique constraint on resource prevents duplicate locks; on a constraint violation, another holder has the lock. Heartbeat: the holder updates expires_at every 10 seconds so the lock does not expire while it is still active. To release: DELETE WHERE resource = X AND owner = Y. Expired locks are cleaned up with DELETE WHERE expires_at < NOW(), run periodically or on the next acquisition attempt. Simpler than Redis, with no extra infrastructure, but higher latency: a database round trip versus an in-memory Redis call. Good for: applications already running on a database, infrequent lock acquisitions.
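The full acquire/heartbeat/release cycle can be demonstrated against SQLite (a stand-in here for whatever relational database the application already uses; the table name matches the text, the function names are illustrative). The primary key on `resource` plays the role of the unique constraint:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE distributed_locks (
    resource   TEXT PRIMARY KEY,     -- unique constraint prevents duplicate locks
    owner      TEXT NOT NULL,
    expires_at REAL NOT NULL)""")

def acquire(resource, owner, ttl_s=30):
    # Opportunistic cleanup of expired locks before trying to insert.
    conn.execute("DELETE FROM distributed_locks WHERE expires_at < ?", (time.time(),))
    try:
        conn.execute("INSERT INTO distributed_locks VALUES (?, ?, ?)",
                     (resource, owner, time.time() + ttl_s))
        return True
    except sqlite3.IntegrityError:   # constraint violation: another holder has it
        return False

def heartbeat(resource, owner, ttl_s=30):
    # Extend the expiry while the holder is still working.
    conn.execute("UPDATE distributed_locks SET expires_at = ? "
                 "WHERE resource = ? AND owner = ?",
                 (time.time() + ttl_s, resource, owner))

def release(resource, owner):
    # Owner check in the WHERE clause: can only delete your own lock.
    conn.execute("DELETE FROM distributed_locks WHERE resource = ? AND owner = ?",
                 (resource, owner))

assert acquire("nightly-report", "server-a")
assert not acquire("nightly-report", "server-b")   # constraint blocks the duplicate
release("nightly-report", "server-b")              # wrong owner: no-op
assert not acquire("nightly-report", "server-b")   # server-a still holds it
release("nightly-report", "server-a")
assert acquire("nightly-report", "server-b")       # now free to acquire
```

The `AND owner = ?` in `release` mirrors the Lua ownership check in the Redis version: a holder whose lock already expired and was re-acquired cannot delete the new holder's row.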
Fencing Tokens
Even with a perfect distributed lock, a process can hold a lock, pause (GC, network partition), have its lock expire, another process acquires the lock — both believe they hold the lock. Solution: fencing tokens. On lock acquisition, return a monotonically increasing token. When writing to the protected resource, include the token. The resource rejects writes with tokens older than the last accepted token. Only one process (the one with the highest token) can successfully write, regardless of clock drift or pauses.
When to Use Each
| Mechanism | Latency | Reliability | Complexity | Use When |
|---|---|---|---|---|
| Simple Redis SET NX | Very Low | Medium | Low | Single Redis, tolerate rare failures |
| Redlock | Low | High | Medium | Multi-datacenter, high reliability needed |
| ZooKeeper / etcd | Medium | Very High | High | Critical coordination, strong consistency |
| Database lock | Medium | High | Low | Simple use case, DB already available |
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How do you implement a distributed lock with Redis?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Use SET lock:{resource} {unique_owner_id} NX PX {ttl_ms}. NX ensures the SET only succeeds if the key does not exist (atomic acquisition). PX sets a TTL so the lock auto-expires if the holder crashes (prevents deadlock). The unique_owner_id (UUID) is critical for safe release: use a Lua script to atomically check ownership before deleting. Without the check, a slow holder could release a lock that has expired and been re-acquired by another process. Typical TTL: 30 seconds for short operations, longer for batch jobs (with heartbeat renewal). Heartbeat renewal: extend the TTL every 10 seconds while the holder is still active using PEXPIRE with a new TTL."
      }
    },
    {
      "@type": "Question",
      "name": "What is the Redlock algorithm and when should you use it?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Redlock uses N independent Redis instances (N=5 recommended). To acquire a lock: try SET NX PX on all N instances with the same lock name and owner ID. If a majority (3 of 5) succeed within a small timeout (10ms), the lock is acquired. The effective TTL is reduced by acquisition time. To release: run the Lua delete-if-owner script on all N instances. Redlock tolerates (N-1)/2 instance failures. Use Redlock when: you need high availability and cannot accept a single Redis SPOF. Do not use when: strong safety guarantees are critical and clock drift is a concern (use ZooKeeper or etcd instead). For most applications where lock loss probability is acceptable (e.g., job scheduling, not financial transactions), simple Redis SET NX is sufficient."
      }
    },
    {
      "@type": "Question",
      "name": "What is a fencing token and why is it needed?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Even with a perfect distributed lock, a process can experience: GC pause, network partition, or slow disk — during which its lock expires and another process acquires the same lock. Now two processes believe they hold the lock simultaneously. A fencing token solves this: the lock server returns a monotonically increasing token on each acquisition (implemented with Redis INCR or ZooKeeper zxid). When writing to the protected resource, the holder includes its token. The resource rejects any write with a token older than the last accepted token. This means even if an old lock holder resumes after a pause, its writes are rejected because a newer token has been accepted. Fencing tokens make distributed locking safe even under arbitrary delays."
      }
    },
    {
      "@type": "Question",
      "name": "How does ZooKeeper implement distributed locking?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "ZooKeeper uses ephemeral sequential znodes. To acquire a lock on /locks/myresource: create a node /locks/myresource/lock- with the EPHEMERAL_SEQUENTIAL flag. ZooKeeper assigns a sequence number (e.g., lock-0000000003). List all children of /locks/myresource. If your node has the lowest sequence number, you have the lock. If not: watch the node with the next-lower sequence number. When that node is deleted (its holder released or crashed), you are notified and re-check. Ephemeral nodes: ZooKeeper automatically deletes ephemeral nodes when the client session expires (client crash = lock released). This provides reliable crash detection without TTL expiry uncertainty."
      }
    },
    {
      "@type": "Question",
      "name": "How do you prevent deadlock in distributed locking?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Deadlock occurs when two processes each hold a lock the other needs: A holds lock1, needs lock2; B holds lock2, needs lock1. Prevention strategies: (1) Lock ordering: always acquire locks in the same order (alphabetical by resource name). If everyone acquires lock1 before lock2, deadlock is impossible. (2) TTL-based expiry: all distributed locks have a TTL. Deadlock is bounded — the locks will expire within the TTL window. (3) Timeout with backoff: if acquisition times out, release all held locks and retry with exponential backoff and jitter. (4) Single lock for the entire operation: if possible, acquire one lock covering all resources rather than multiple fine-grained locks. TTL expiry is the most common approach in production — deadlock is simply bounded rather than prevented."
      }
    }
  ]
}
Asked at: Netflix Interview Guide
Asked at: Cloudflare Interview Guide
Asked at: Databricks Interview Guide
Asked at: Atlassian Interview Guide
Asked at: Coinbase Interview Guide