Introduction
Consensus algorithms allow a cluster of nodes to agree on a sequence of values despite node failures. They are the foundation for distributed databases (etcd, CockroachDB, TiKV), coordination services (ZooKeeper), and replicated state machines. Without consensus, a distributed system cannot safely tolerate failures while maintaining consistency.
Raft Overview
Raft was designed to be more understandable than Paxos. It decomposes consensus into three relatively independent sub-problems: leader election, log replication, and safety. Each server in a Raft cluster is in one of three states: Leader, Follower, or Candidate. Time is divided into terms — monotonically increasing integers — each beginning with an election. At most one leader exists per term.
Raft Leader Election
All nodes start as Followers. If a Follower receives no heartbeat within its election timeout (randomized, typically in the 150–300 ms range), it becomes a Candidate. The Candidate increments its current term, votes for itself, and sends RequestVote RPCs to all other nodes. A node grants its vote if the candidate's term is at least its own, it has not already voted for a different candidate in that term, and the candidate's log is at least as up-to-date as its own. A Candidate that receives votes from a majority (⌊N/2⌋ + 1 of N nodes) transitions to Leader and begins sending heartbeats to reset the election timers of all Followers.
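The vote-granting rule above can be sketched as a small handler. This is an illustrative sketch, not code from any real implementation; the class, field, and method names (Node, handle_request_vote, voted_for) are hypothetical.

```python
import random

class Node:
    def __init__(self):
        self.current_term = 0
        self.voted_for = None          # candidate voted for in current_term
        self.log = []                  # list of (term, command) entries
        # Randomized election timeout (ms) staggers candidate starts.
        self.election_timeout = random.uniform(150, 300)

    def handle_request_vote(self, cand_term, cand_id, cand_last_index, cand_last_term):
        # Reject candidates with a stale term outright.
        if cand_term < self.current_term:
            return False
        # A higher term starts a new term and clears our previous vote.
        if cand_term > self.current_term:
            self.current_term = cand_term
            self.voted_for = None
        # "At least as up-to-date": compare last entry's term, then index.
        my_last_term = self.log[-1][0] if self.log else 0
        my_last_index = len(self.log)
        up_to_date = (cand_last_term, cand_last_index) >= (my_last_term, my_last_index)
        if self.voted_for in (None, cand_id) and up_to_date:
            self.voted_for = cand_id
            return True
        return False
```

Note that clearing voted_for when a higher term arrives is what enforces "one vote per term": the node may vote again only because the term changed.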
Raft Log Replication
The Leader receives a client request, appends the entry to its own log, and sends AppendEntries RPCs to all Followers in parallel. Once a majority acknowledges the entry, the Leader commits it, applies it to its state machine, and responds to the client. Followers apply committed entries in order. The log matching property guarantees that if two logs contain an entry with the same index and term, then all preceding entries in both logs are identical — enforced by AppendEntries consistency checks.
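The commit rule can be illustrated with a small helper that derives the commit index from follower acknowledgments (the matchIndex values in the Raft paper's terminology). The function name and argument layout are hypothetical, and this sketch ignores the paper's additional restriction that only entries from the leader's current term are committed by counting.

```python
def commit_index(match_indexes, leader_last_index):
    # Tally the leader's own log alongside follower acknowledgments.
    acked = sorted(match_indexes + [leader_last_index], reverse=True)
    # Position len//2 in the descending list is the highest index that
    # a strict majority of servers has replicated.
    majority_pos = len(acked) // 2
    return acked[majority_pos]
```

For a 5-node cluster where the leader is at index 5 and followers acknowledge indexes 5, 5, 3, 2, three servers hold index 5, so the commit index advances to 5.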
Raft Safety
Raft provides several safety guarantees. Election safety: at most one leader can be elected per term. Log matching: AppendEntries includes a consistency check against the previous log entry; Followers reject requests that do not match. Leader completeness: a candidate cannot win an election unless its log contains all committed entries — RequestVote is rejected if the candidate’s log is less up-to-date. State machine safety: if any server applies a log entry at index i, no other server will ever apply a different entry at that same index.
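The log matching check can be sketched as the follower-side logic of AppendEntries. This is a simplified illustration with hypothetical names; indexes are 1-based as in the Raft paper, and entries are (term, command) pairs.

```python
def append_entries(log, prev_index, prev_term, entries):
    # Consistency check: our log must contain an entry at prev_index
    # whose term matches prev_term (prev_index 0 means "start of log").
    if prev_index > 0:
        if len(log) < prev_index or log[prev_index - 1][0] != prev_term:
            return log, False   # reject; leader will retry with earlier index
    # Truncate any conflicting suffix, then append the leader's entries.
    log = log[:prev_index] + entries
    return log, True
```

On rejection, the leader decrements its notion of the follower's next index and retries, walking back until the logs agree, which is how diverged follower logs are overwritten to match the leader's.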
Paxos Overview
Paxos, devised by Leslie Lamport (his paper "The Part-Time Parliament" circulated from 1989 and was published in 1998), was the first consensus algorithm with a rigorous correctness proof. Single-decree Paxos reaches agreement on a single value. It defines three roles: Proposer, Acceptor, and Learner. The protocol proceeds in two phases: Prepare and Accept. Multi-Paxos extends single-decree Paxos to a replicated log by electing a stable leader and skipping the Prepare phase for subsequent rounds, reducing message latency to one round-trip in the common case.
Paxos Phases
Phase 1 — Prepare: the Proposer selects a proposal number n and sends Prepare(n) to a majority of Acceptors. Each Acceptor responds with a Promise not to accept any proposal numbered less than n, along with the highest-numbered proposal it has already accepted (if any). Phase 2 — Accept: the Proposer sends Accept(n, v), where v is the value of the highest-numbered accepted proposal reported in Phase 1, or the Proposer's own value if none was reported. An Acceptor accepts the proposal unless it has already promised a higher-numbered proposal, i.e., responded to a Prepare(m) with m > n. A Learner learns the chosen value once a majority of Acceptors have accepted the same proposal number.
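The acceptor's side of both phases fits in a few lines of state. This is a minimal single-decree sketch with illustrative names (Acceptor, promised, accepted); a real implementation would also persist this state before replying.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0          # highest proposal number promised so far
        self.accepted = None       # (number, value) of last accepted proposal

    def prepare(self, n):
        # Promise to ignore proposals numbered below n, and report any
        # previously accepted proposal so the proposer adopts its value.
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, None

    def accept(self, n, v):
        # Accept unless a strictly higher number has been promised.
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, v)
            return True
        return False
```

Returning the previously accepted proposal from prepare() is the crux of safety: it forces a later proposer to re-propose a possibly chosen value rather than its own.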
Practical Considerations
Raft is often preferred in new systems due to its clarity and the availability of well-tested implementations. Multi-Raft enables horizontal scalability by assigning a separate Raft group to each data partition. The pre-vote extension prevents disruptive elections caused by partitioned nodes that increment their term before rejoining. Leader leases allow low-latency reads without a full round-trip to followers. Joint Consensus handles cluster membership changes — adding or removing nodes — safely by requiring agreement from both the old and new configurations during the transition.
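The joint-consensus quorum rule can be shown with a small helper: during the transition, an entry (or an election) needs majorities in both the old and the new configuration. The function and argument names here are hypothetical.

```python
def joint_quorum(acks, old_config, new_config):
    # acks: set of node ids that acknowledged the entry (or granted a vote).
    def majority(config):
        votes = sum(1 for node in config if node in acks)
        return votes > len(config) // 2
    # During joint consensus, BOTH configurations must reach a majority.
    return majority(old_config) and majority(new_config)
```

This dual requirement is what prevents a window in which two disjoint majorities (one per configuration) could each commit conflicting entries.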
Frequently Asked Questions: Consensus Algorithms (Raft and Paxos)
How does Raft’s leader election randomized timeout mechanism work?
Each Raft follower picks a randomized election timeout, typically between 150 ms and 300 ms. When a follower receives no heartbeat from the leader within that window, it increments its term, transitions to the candidate state, and broadcasts RequestVote RPCs. Randomization staggers election starts across nodes, sharply reducing the chance that two candidates start elections at the same time and split the vote, and allowing one candidate to win a majority before others time out.
How does Raft log replication use majority acknowledgment to ensure durability?
The leader appends a new entry to its local log and sends AppendEntries RPCs to all followers in parallel. Once a majority (quorum) of nodes — including the leader — have written the entry to their logs and responded with success, the leader marks the entry as committed and applies it to the state machine. It then notifies followers of the new commit index on the next heartbeat so they can apply the entry locally. This majority requirement guarantees that any future elected leader will contain every committed entry.
How does Raft compare to Paxos in terms of understandability?
Raft was explicitly designed for understandability, decomposing consensus into three relatively independent sub-problems: leader election, log replication, and safety. It enforces a strong leader model where all writes flow through a single leader, simplifying reasoning about data flow. Paxos (and especially Multi-Paxos) is considered significantly harder to understand and implement correctly because the original paper leaves many practical details unspecified — such as leader election and log management — leading to wide implementation variation and subtle bugs in practice.
What is multi-Raft and how does it enable scalable partitioned systems?
Multi-Raft runs multiple independent Raft consensus groups (regions) across the same cluster of nodes, each group owning a contiguous key range or partition of data. Each node typically participates as a leader in some groups and a follower in others, distributing leader load evenly. This avoids the single-leader bottleneck of a single Raft group and allows horizontal scaling of both throughput and storage. Systems like TiKV and CockroachDB use multi-Raft to manage thousands of regions across large clusters.
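Routing a key to its owning Raft group typically reduces to a sorted-range lookup over region boundaries. The sketch below is illustrative of the TiKV/CockroachDB style of region map, not their actual code; the names and layout are hypothetical.

```python
import bisect

def find_group(range_starts, group_ids, key):
    # range_starts is a sorted list of each region's inclusive start key;
    # the group owning `key` is the one with the rightmost start <= key.
    i = bisect.bisect_right(range_starts, key) - 1
    return group_ids[i]
```

A real region map also tracks each group's current leader so the client (or a routing layer) can send the request straight to it.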
What is a leader lease and how does it enable low-latency linearizable reads?
A leader lease is a time-bounded guarantee that a Raft leader holds exclusive leadership for a fixed interval (e.g., the election timeout duration). After winning election and confirming its lease is valid, the leader can serve read requests directly from its local state without issuing a round-trip ReadIndex or log entry — eliminating a network round trip. The lease relies on bounded clock drift between nodes; if clocks skew beyond the lease duration, the guarantee breaks, so this optimization requires careful clock discipline or a hardware clock assumption.
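The clock-drift caveat can be made concrete: before serving a local read, the leader checks that its lease, shrunk by the worst-case drift bound, is still in force. This is a hypothetical sketch (class and method names are illustrative), with the clock injectable for testing.

```python
import time

class LeaderLease:
    def __init__(self, duration_s, max_drift_s, clock=time.monotonic):
        self.duration = duration_s      # lease length granted by the quorum
        self.max_drift = max_drift_s    # assumed worst-case clock skew
        self.clock = clock
        self.acquired_at = None

    def renew(self):
        # Called after a quorum of heartbeat responses confirms leadership.
        self.acquired_at = self.clock()

    def can_serve_local_read(self):
        # Serve reads locally only while the lease, shrunk by the drift
        # margin, is still valid; otherwise fall back to ReadIndex.
        if self.acquired_at is None:
            return False
        return self.clock() - self.acquired_at < self.duration - self.max_drift
```

Shrinking the window by max_drift is the safety margin: if real skew ever exceeds that bound, two nodes could both believe they hold a valid lease, which is exactly the failure mode the text warns about.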