Ensure exactly one node is in charge in a distributed system. Consensus algorithms like Raft elect a leader, detect failures, and coordinate failover: the backbone of distributed coordination.
Medium-Hard · Medium Frequency

In distributed systems, sometimes exactly one node must be responsible for a task: accepting writes, running a batch job, or coordinating others. Leader election algorithms ensure one node is chosen, and a new leader is elected if the current one fails.
Nodes exist in one of three states: Follower, Candidate, or Leader. The leader sends periodic heartbeats. If followers don't hear from the leader within an election timeout, they transition to candidate and start an election.
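The transitions above can be sketched as a tiny state machine. This is a simplified illustration only: real Raft nodes also track a log, persist their vote, and exchange RPCs.

```python
import enum

class State(enum.Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"

class Node:
    """Simplified sketch of Raft's role transitions."""
    def __init__(self):
        self.state = State.FOLLOWER
        self.term = 0

    def on_election_timeout(self):
        # No heartbeat within the election timeout: increment the term,
        # become a candidate, and (in real Raft) request votes.
        if self.state != State.LEADER:
            self.term += 1
            self.state = State.CANDIDATE

    def on_heartbeat(self, leader_term):
        # A heartbeat from a leader with an equal or higher term
        # forces the node back to follower.
        if leader_term >= self.term:
            self.term = leader_term
            self.state = State.FOLLOWER
```

Note that a heartbeat with a *stale* term is ignored, which is what lets a partitioned ex-leader be safely demoted when it reconnects.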
Candidates send RequestVote RPCs to other nodes and become leader on winning a majority. Raft is a formal, proven algorithm, used internally by etcd and Consul (ZooKeeper uses ZAB, a similar consensus protocol). It guarantees safety (at most one leader per term) even during network partitions.
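The "at most one leader per term" guarantee rests on a simple vote-granting rule: a node grants at most one vote per term and rejects candidates with stale terms. A minimal sketch (real Raft additionally rejects candidates whose log is less up-to-date, which is omitted here):

```python
def handle_request_vote(node, candidate_term, candidate_id):
    """Simplified follower-side vote handler: one vote per term,
    never for a stale candidate. `node` is a plain dict for brevity."""
    if candidate_term < node["term"]:
        return False                      # stale candidate: reject
    if candidate_term > node["term"]:
        node["term"] = candidate_term     # newer term: reset our vote
        node["voted_for"] = None
    if node["voted_for"] in (None, candidate_id):
        node["voted_for"] = candidate_id  # at most one vote per term
        return True
    return False
```

Because each node hands out at most one vote per term, two candidates can never both collect a majority in the same term.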
Create an ephemeral lock. The node holding the lock is leader. If it dies, the lock expires, and another node takes over. Simpler to use as a client than implementing Raft yourself.
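The ephemeral-lock pattern can be shown with an in-memory stand-in. In a real deployment the lock would be an etcd lease or a ZooKeeper ephemeral znode expired server-side; `LeaseLock` below is an illustrative simulation, not a real client:

```python
import time

class LeaseLock:
    """In-memory stand-in for an ephemeral lock with a TTL."""
    def __init__(self, ttl):
        self.ttl = ttl
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, node_id, now=None):
        now = time.monotonic() if now is None else now
        if self.holder is None or now >= self.expires_at:
            self.holder = node_id          # lock free or expired: take it
            self.expires_at = now + self.ttl
        return self.holder == node_id      # leader iff we hold the lock

lock = LeaseLock(ttl=5.0)
assert lock.try_acquire("node-1", now=0.0)      # node-1 becomes leader
assert not lock.try_acquire("node-2", now=1.0)  # lease still held
assert lock.try_acquire("node-2", now=6.0)      # lease expired: failover
```

In practice the current leader must also keep renewing the lease (heartbeating) before it expires, or it silently loses leadership.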
The most dangerous failure mode is split brain: a network partition causes two nodes to think they're leader. Raft prevents this by requiring a majority quorum: in a 3-node cluster, you need 2/3 votes. During a partition, only the partition with the majority can elect a leader.
3 nodes vs 5 nodes: a 3-node cluster tolerates 1 failure (2/3 majority). A 5-node cluster tolerates 2 failures (3/5 majority), but every write needs 3 ACKs instead of 2, so latency is higher. Never use even numbers: a 4-node cluster tolerates only 1 failure (same as 3 nodes) with higher write latency.
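The sizing rules above reduce to two one-line formulas:

```python
def quorum(n):
    """Majority needed to elect a leader or commit a write."""
    return n // 2 + 1

def fault_tolerance(n):
    """Node failures survivable while a quorum can still form."""
    return n - quorum(n)

# 3 nodes: quorum 2, tolerates 1 failure
# 4 nodes: quorum 3, tolerates 1 failure (no gain over 3, slower writes)
# 5 nodes: quorum 3, tolerates 2 failures
```

The even-number penalty falls straight out of the integer division: going from 3 to 4 nodes raises the quorum without raising the fault tolerance.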
Election timeout tuning: Too short = false elections during GC pauses or network blips. Too long = slow failover. The timeout must be randomized (300-500ms) to prevent two candidates from always splitting votes.
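A sketch of the randomization, assuming a 150 ms heartbeat interval (the constant is illustrative; real systems make both values configurable):

```python
import random

HEARTBEAT_MS = 150  # illustrative; etcd defaults differ per deployment

def election_timeout_ms(rng=random):
    # Draw uniformly from [300, 500) ms so two candidates rarely
    # time out simultaneously and split the vote round after round.
    return rng.uniform(2 * HEARTBEAT_MS, 2 * HEARTBEAT_MS + 200)
```

Each node draws a fresh timeout after every heartbeat, so even if one election round splits the vote, the next round almost certainly won't.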
Fencing tokens: Even with leader election, a slow leader might not realize it's been replaced. Use monotonically increasing fencing tokens so downstream services can reject stale leader's writes.
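A sketch of how a downstream service enforces fencing (class and method names here are illustrative, not a real API): each new leader receives a strictly higher token, and the store rejects any write carrying a token older than the highest one seen.

```python
class FencedStore:
    """Downstream store that rejects writes from stale leaders."""
    def __init__(self):
        self.highest_token = 0
        self.data = {}

    def write(self, token, key, value):
        if token < self.highest_token:
            return False               # stale leader: reject the write
        self.highest_token = token
        self.data[key] = value
        return True
```

Once a write with token 2 arrives, a paused ex-leader waking up with token 1 can no longer corrupt the data, even if it still believes it is the leader.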
Embedded vs external coordination: Implement Raft in your service (complex, full control) or use etcd/ZooKeeper as a coordination service (simpler, extra dependency). Most teams should use existing coordination services.
Leader election appears whenever you need a single-writer guarantee or exactly-once task execution in a distributed system.
Interview signal: The interviewer wants to see you understand how distributed systems achieve single-writer guarantees and handle leader failures without split brain.
Real-world example: Kubernetes stores cluster state in etcd. When you run `kubectl apply`, the write goes through Raft consensus. If the etcd leader dies, Raft elects a new one in ~1 second.

| Metric | Value |
|---|---|
| Raft heartbeat interval | 100–150 ms |
| Election timeout (randomized) | 300–500 ms (10× heartbeat) |
| Typical election completion | ~300–500 ms |
| Worst case (split votes) | ~2 seconds |
| etcd write throughput | ~10K writes/sec |
| etcd read throughput | ~50K reads/sec |
| 3-node fault tolerance | 1 node failure |
| 5-node fault tolerance | 2 node failures |