System Design: Distributed Lock Service — ZooKeeper, etcd, Redis Redlock, Fencing Tokens, Consensus, Leader Election

Distributed locks coordinate access to shared resources across multiple processes or machines. Unlike local mutexes, distributed locks must handle network partitions, process crashes, and clock skew. This guide covers the architecture of distributed lock services like ZooKeeper, etcd, and Redis-based locking, including the subtle correctness issues that make this one of the hardest problems in distributed systems.

Why Distributed Locks Are Needed

Use cases requiring distributed locks: (1) Leader election — only one instance of a service should perform a specific task (cron job, queue consumer, migration runner). The lock holder is the leader; if it crashes, another instance acquires the lock and becomes leader. (2) Resource protection — only one process should write to a specific database row, file, or external resource at a time. Example: two payment processors should not charge the same order simultaneously. (3) Rate limiting with distributed state — a global rate limiter needs atomic increment operations across multiple API gateway instances. (4) Workflow coordination — distributed workflow engines use locks to ensure exactly-once execution of workflow steps.

Redis-Based Distributed Locks

The simplest distributed lock uses a single Redis instance. Acquire: SET lock_key unique_value NX EX 30. NX ensures only one client succeeds (atomic set-if-not-exists). EX 30 sets a 30-second TTL to prevent deadlocks if the holder crashes. Release: use a Lua script to atomically check the value and delete: if redis.call("get", KEYS[1]) == ARGV[1] then return redis.call("del", KEYS[1]) else return 0 end. The unique_value (a UUID) prevents a client from releasing another client's lock. Problem: if the Redis instance fails, the lock is lost. A client that acquired the lock may continue operating while another client acquires a new lock on the replacement Redis; two clients now hold the "same" lock simultaneously.
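The acquire/release semantics above can be sketched in Python. This is a minimal illustration over an in-memory dict standing in for Redis (the class name, key names, and the omission of TTL handling are all simplifying assumptions); a real implementation would issue SET with NX/EX via a Redis client and run the release check as a Lua script so it is atomic:

```python
import uuid

class SingleRedisLock:
    """Sketch of a single-instance Redis lock over an in-memory dict.
    Illustrative only: a real client would call SET key value NX EX 30
    and run the compare-and-delete as a server-side Lua script."""
    def __init__(self):
        self.store = {}  # key -> unique holder value (TTL omitted for brevity)

    def acquire(self, key, value):
        # SET key value NX: succeed only if the key does not already exist.
        if key in self.store:
            return False
        self.store[key] = value
        return True

    def release(self, key, value):
        # Mirrors the Lua script: delete only if we still hold the lock.
        if self.store.get(key) == value:
            del self.store[key]
            return 1
        return 0

lock = SingleRedisLock()
a, b = str(uuid.uuid4()), str(uuid.uuid4())
got_a = lock.acquire("lock:order:42", a)        # A acquires first
got_b = lock.acquire("lock:order:42", b)        # B is blocked
bad_release = lock.release("lock:order:42", b)  # B cannot free A's lock
good_release = lock.release("lock:order:42", a) # A releases cleanly
```

The unique value per client is what makes release safe: without it, a client whose lock already expired could delete a lock now held by someone else.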

Redlock: Multi-Node Redis Locking

Martin Kleppmann and Salvatore Sanfilippo (antirez) famously debated the correctness of Redlock, Redis's multi-node locking algorithm. The Redlock algorithm: deploy 5 independent Redis instances (not replicas). To acquire a lock: (1) Record the current time T1. (2) Attempt to acquire the lock on all 5 instances sequentially, with a short timeout (5-50ms per instance). (3) If the lock was acquired on at least 3 of 5 instances (a majority), and the total time elapsed (T2 - T1) is less than the lock TTL, the lock is acquired. (4) Otherwise, release the lock on all instances. Kleppmann's argument against Redlock: if the lock-holding process pauses (GC pause, page fault) for longer than the TTL, the lock expires and another client acquires it. The original holder resumes and believes it still holds the lock, so two clients now operate on the shared resource. This problem is fundamental to any TTL-based lock: the lock can expire before the holder finishes its work.
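The quorum-and-timing check in steps (3)-(4) reduces to a small predicate. A hedged sketch (the function name and the 2ms clock-drift allowance are assumptions; Redlock as described also subtracts a drift factor from the remaining validity time):

```python
def redlock_acquired(grants, total_instances, elapsed_ms, ttl_ms, drift_ms=2):
    """Decide whether a Redlock acquisition round succeeded.
    grants: how many of the independent instances granted the lock.
    The lock counts as held only if a majority granted it AND some
    validity time remains after subtracting the time spent acquiring
    and a small clock-drift allowance (drift_ms is an assumed value)."""
    majority = total_instances // 2 + 1
    validity_ms = ttl_ms - elapsed_ms - drift_ms
    return grants >= majority and validity_ms > 0

ok = redlock_acquired(3, 5, elapsed_ms=40, ttl_ms=10_000)           # majority, fast
too_few = redlock_acquired(2, 5, elapsed_ms=40, ttl_ms=10_000)      # no majority
too_slow = redlock_acquired(5, 5, elapsed_ms=9_999, ttl_ms=10_000)  # TTL nearly spent
```

Note that this predicate only governs acquisition; it does nothing about the pause-after-acquire problem Kleppmann describes, which no amount of quorum logic can fix.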

Fencing Tokens for Correctness

Fencing tokens solve the TTL expiration problem. When the lock service grants a lock, it also returns a monotonically increasing fencing token (an integer). The lock holder includes this token in every request to the shared resource. The resource (database, storage service) rejects any request with a token lower than the highest token it has seen. Example: client A acquires the lock with token 33, then pauses. The lock expires. Client B acquires the lock with token 34 and writes to the database with token 34. Client A resumes and tries to write with token 33, which the database rejects because it has already seen token 34. This requires the resource to participate in the protocol by tracking and comparing tokens. ZooKeeper's ephemeral sequential nodes naturally provide monotonically increasing values that serve as fencing tokens. Redis does not natively provide fencing tokens; you must implement them separately.
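The resource-side check is the essential part of the protocol. A minimal sketch of a fenced store (the class and method names are illustrative; a real database would enforce this per-record or per-table, typically in a transaction):

```python
class FencedStore:
    """Resource-side fencing: track the highest token seen and reject
    any write carrying a lower token. Illustrative in-memory model."""
    def __init__(self):
        self.max_token = 0
        self.data = {}

    def write(self, token, key, value):
        # Reject stale lock holders: token must not be lower than the
        # highest token this resource has already accepted.
        if token < self.max_token:
            return False
        self.max_token = token
        self.data[key] = value
        return True

store = FencedStore()
a_writes = store.write(33, "balance", 100)      # client A, token 33
b_writes = store.write(34, "balance", 200)      # client B, token 34
stale_a = store.write(33, "balance", 999)       # A resumes after pause: rejected
```

This mirrors the example in the text: once token 34 has been seen, the paused holder of token 33 can no longer corrupt the resource, regardless of what it believes about its lock.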

ZooKeeper-Based Distributed Locks

ZooKeeper provides stronger guarantees than Redis for distributed locking. Lock acquisition: (1) Create an ephemeral sequential node under a lock path: /locks/resource_name/lock-. ZooKeeper appends a sequence number: /locks/resource_name/lock-0000000001. (2) Get all children of /locks/resource_name/. (3) If the created node has the lowest sequence number, the lock is acquired. (4) If not, set a watch on the node with the next-lower sequence number and wait for it to be deleted. Ephemeral nodes are automatically deleted when the client session ends (client crashes or disconnects). This prevents deadlocks without relying on TTLs. The sequential ordering provides fairness (FIFO) and prevents the thundering herd problem — only the next waiter is notified when the lock is released, not all waiters. etcd provides similar functionality with its lease mechanism and compare-and-swap operations.
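Steps (2)-(4) of the acquisition protocol are a pure decision over the sorted children list. A sketch of that step (the function name is an assumption; node names follow ZooKeeper's lock-&lt;sequence&gt; convention from the text):

```python
def zk_lock_step(my_node, children):
    """Given our ephemeral sequential node and the current children of
    the lock path, decide whether we hold the lock or which node to
    watch. Returns ("acquired", None) or ("waiting", predecessor)."""
    # Sort by the numeric suffix ZooKeeper appended to each node name.
    ordered = sorted(children, key=lambda n: int(n.rsplit("-", 1)[1]))
    idx = ordered.index(my_node)
    if idx == 0:
        return ("acquired", None)
    # Watch only the immediate predecessor: this is what avoids the
    # thundering herd, since only the next waiter is notified on release.
    return ("waiting", ordered[idx - 1])

children = ["lock-0000000003", "lock-0000000001", "lock-0000000002"]
first = zk_lock_step("lock-0000000001", children)
third = zk_lock_step("lock-0000000003", children)
```

The real client would re-run this check whenever its watch fires, because the predecessor may have crashed (ephemeral node deleted) rather than released the lock in order.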

etcd Locking with Leases

etcd uses leases (similar to TTLs but with explicit renewal) for lock management. Lock acquisition: (1) Create a lease with a TTL (e.g., 15 seconds): etcdctl lease grant 15. (2) Put a key with the lease attached: etcdctl put /locks/resource lock_holder --lease=LEASE_ID. (3) The put succeeds only if no other key exists at that path (using a transaction with a compare operation). (4) The holder must periodically renew the lease (keep-alive) before it expires: etcdctl lease keep-alive LEASE_ID. If the holder crashes and stops renewing, the lease expires after 15 seconds and the key is deleted, so another client can acquire the lock. etcd's advantages over ZooKeeper: etcd uses the Raft consensus protocol (simpler than ZooKeeper's ZAB), has a gRPC API (better for modern microservices), and provides linearizable reads by default.
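The lease lifecycle (grant, keep-alive, expiry) can be modeled with timestamps. A toy sketch, not etcd's actual implementation (the class and the use of plain float seconds are assumptions):

```python
class Lease:
    """Toy model of an etcd lease: granted with a TTL, kept alive by
    explicit renewal; keys attached to an expired lease are deleted."""
    def __init__(self, ttl_s, now):
        self.ttl_s = ttl_s
        self.expires_at = now + ttl_s

    def keep_alive(self, now):
        # Renewal resets the expiry a full TTL into the future.
        self.expires_at = now + self.ttl_s

    def expired(self, now):
        return now >= self.expires_at

lease = Lease(ttl_s=15, now=0.0)
lease.keep_alive(now=10.0)              # holder renews before expiry
alive_at_20 = not lease.expired(now=20.0)  # valid until 10 + 15 = 25
dead_at_26 = lease.expired(now=26.0)       # holder stopped renewing
```

The key difference from a bare TTL is visible here: a healthy holder can keep the lock indefinitely by renewing, while a crashed holder loses it within one TTL.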

Leader Election Pattern

Leader election is the most common use case for distributed locks. Pattern: all instances of a service attempt to acquire the same lock. The instance that succeeds becomes the leader and performs the exclusive work (processing a queue, running scheduled jobs, coordinating other workers). Non-leaders either idle or perform non-exclusive work. When the leader crashes, its lock is released (ephemeral node deleted or lease expired), and another instance acquires the lock and becomes the new leader. Implementation with etcd: use the etcd concurrency package (clientv3/concurrency in Go). The leader campaigns for election, and the package handles lease creation, renewal, and leader change notifications. Kubernetes relies on this pattern for the controller-manager and scheduler: leader election (via lock objects persisted through the API server, which is backed by etcd) ensures only one instance actively runs at a time in an HA deployment.
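The pattern reduces to "everyone tries the same lock; the winner leads". A hedged sketch with an in-memory lock standing in for the coordination service (the function names and single shared dict are illustrative assumptions):

```python
lock_holder = {}  # stand-in for the coordination service's lock key

def try_acquire(instance):
    """Single shared lock: only the first caller succeeds. In practice
    this would be an ephemeral ZooKeeper node or an etcd lease+txn."""
    if "leader" in lock_holder:
        return False
    lock_holder["leader"] = instance
    return True

def elect(instances):
    """All instances campaign; the one whose acquire succeeds leads.
    Returns None if the lock is already held by someone else."""
    for inst in instances:
        if try_acquire(inst):
            return inst
    return None

first_leader = elect(["a", "b", "c"])   # "a" wins the initial campaign
blocked = elect(["b", "c"])             # lock held: no new leader
del lock_holder["leader"]               # "a" crashes; ephemeral lock freed
second_leader = elect(["b", "c"])       # "b" takes over
```

In a real deployment the losing instances do not poll in a loop like this; they block on a watch or notification from the lock service and campaign again when the lock frees.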

Choosing the Right Distributed Lock

Decision matrix: (1) If you need best-effort locking for efficiency (prevent duplicate work but tolerate occasional overlap), use a single Redis instance with SET NX EX. Simple, fast, and good enough for most use cases. (2) If you need correctness guarantees (financial transactions, inventory management), use ZooKeeper or etcd with fencing tokens. The consensus-based coordination service provides stronger guarantees. (3) If you already run Kubernetes, etcd is available and well-understood — use it rather than adding ZooKeeper. (4) If latency is critical (sub-millisecond lock acquisition), Redis is faster than ZooKeeper/etcd because it skips consensus. Accept the weaker guarantees. (5) Avoid Redlock in production for correctness-critical paths — the debate around its safety is unresolved, and the operational complexity of 5 independent Redis instances is high. Use it only when a single Redis instance is insufficient for availability but correctness is not critical.
