Low Level Design: Distributed Lock Manager

A distributed lock manager (DLM) coordinates exclusive access to shared resources across multiple processes or services running on different machines. Unlike single-process mutexes (which work within one process), distributed locks must handle network partitions, clock skew, process crashes, and lock expiry. Distributed locking is used for leader election, preventing duplicate job execution, coordinating database migrations, and ensuring mutual exclusion in distributed cron jobs. Redis-based locks (Redlock) and ZooKeeper/etcd ephemeral nodes are the two dominant implementations.

Redis-Based Locking: SET NX PX

The simplest Redis distributed lock: SET lock_key unique_value NX PX 30000. NX (only set if not exists) ensures mutual exclusion. PX 30000 sets a 30-second expiry so the lock is automatically released if the holder crashes. The unique_value (UUID) ensures only the lock holder can release it — prevents a slow process from releasing a lock that expired and was re-acquired by another process. Release: Lua script checks that the value matches before deleting. This provides mutual exclusion for the duration of the lock, assuming Redis does not fail.

// Redis-based distributed lock
func (r *RedisLock) Acquire(ctx context.Context, key string, ttl time.Duration) (bool, error) {
    token := uuid.New().String()
    result, err := r.client.SetNX(ctx, "lock:"+key, token, ttl).Result()
    if err != nil {
        return false, err  // Redis error
    }
    if !result {
        return false, nil  // lock already held by another client
    }
    r.token = token
    return true, nil
}

// Atomic release via Lua script (check-then-delete)
var releaseScript = redis.NewScript(`
    if redis.call("GET", KEYS[1]) == ARGV[1] then
        return redis.call("DEL", KEYS[1])
    else
        return 0  -- lock expired and re-acquired by someone else
    end
`)

func (r *RedisLock) Release(ctx context.Context, key string) error {
    return releaseScript.Run(ctx, r.client, []string{"lock:"+key}, r.token).Err()
}

Redlock: Multi-Node Consensus

A single Redis instance as a lock manager has a failure mode: if Redis fails, all locks are lost (or held indefinitely). Redlock acquires the lock on N independent Redis nodes (typically 5). The lock is considered acquired if it is obtained on a majority (N/2 + 1, i.e. 3 of 5) within a validity window — the TTL minus the time spent acquiring. Release is sent to all nodes, whether or not the acquire on each succeeded. The lock remains valid as long as a majority of nodes are up, so a minority of Redis failures is tolerated. Controversy: Martin Kleppmann argued Redlock is unsafe under certain conditions (GC pauses, clock jumps). For safety-critical locking, use fencing tokens — a monotonically increasing lock sequence number included in downstream requests — to detect and reject stale lock holders.

etcd/ZooKeeper: Consensus-Based Locking

etcd and ZooKeeper use Raft/ZAB consensus protocols — all writes go through a leader and are replicated to a quorum. Distributed locks use ephemeral nodes (ZooKeeper) or leases (etcd): create an ephemeral key/node with a TTL. The node is automatically deleted when the client session ends (heartbeat timeout). Only one client can create the same key (atomicity via consensus). This provides stronger guarantees than Redis-based locking because consensus ensures the write is truly durable before acknowledging. Kubernetes uses etcd leases for leader election (controller-manager, scheduler leader election).

Key Interview Discussion Points

  • Lock duration vs. TTL: TTL must be longer than the maximum expected critical section duration plus clock skew margin; too short and the lock expires while still held
  • Fencing tokens: the lock manager returns a monotonically increasing token with each lock grant; downstream services reject requests with a token lower than the last seen — prevents stale lock holder from corrupting state after lock expiry
  • Lock-free alternatives: when possible, use optimistic concurrency control (database CAS, version numbers) instead of distributed locks — locks are a single point of contention
  • Deadlock prevention: always acquire multiple locks in a consistent global order; set a maximum lock wait timeout; use tryLock with backoff instead of blocking lock()
  • Leader election: distributed locks power leader election — the process holding the lock is the leader; on lock expiry or process crash, another process acquires the lock and becomes leader