Consensus: Agreeing on Truth
In the previous chapter, we elected a Leader. Now, that Leader has a job: Replicate Data.
In a distributed system, how do we get multiple nodes to agree on a sequence of values?
- Problem: Nodes crash. Packets vanish. The network partitions.
- Goal: All honest nodes agree on the same log of commands (Safety) and eventually decide (Liveness).
1. The Trinity: Paxos vs Raft vs ZAB
While Paxos is the academic holy grail, Raft is the engineer’s choice.
| Feature | Paxos (1989) | Raft (2014) | ZAB (ZooKeeper) |
|---|---|---|---|
| Philosophy | Theoretical purity. No specific leader required. | Understandability. Strong Leader is mandatory. | Primary-Backup ordering (FIFO). |
| Complexity | Extreme. (Multi-Paxos is hard to implement). | Moderate. Decomposed into Election & Log Replication. | High. |
| Use Case | Google Spanner, Cassandra (LWT). | Kubernetes (Etcd), Consul, CockroachDB. | Kafka (Old Controller), Hadoop. |
[!TIP] Interview Strategy: Explain Raft. It has a linear narrative:
Leader Election → Log Replication → Commit. Paxos is a mesh of proposers and acceptors that is easy to get wrong.
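If it helps to make that narrative concrete, here is a minimal Go sketch of the state a Raft node carries through those phases; the type and field names are illustrative, not taken from any particular implementation:

```go
package raft

// Role is where a node sits in the election/replication narrative.
type Role int

const (
	Follower  Role = iota // default state; accepts entries and heartbeats from the Leader
	Candidate             // started an election after an election timeout
	Leader                // won a majority of votes; now the only node that orders the log
)

// LogEntry is one client command plus the term in which the Leader received it.
type LogEntry struct {
	Term    int
	Command string // e.g. "SET X=5"
}

// Node holds the core per-node state the Raft walkthrough starts from.
type Node struct {
	ID          int
	Role        Role
	CurrentTerm int        // latest term this node has seen
	VotedFor    int        // candidate voted for in CurrentTerm (-1 = none)
	Log         []LogEntry // the replicated log, ordered by the Leader
	CommitIndex int        // highest log index known to be committed
}
```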
2. Raft Log Replication
Once a Leader is elected, it handles all Client requests. The goal: Replicate SET X=5 to a majority of followers.
The Flow Diagram
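The flow is simple to state: the Leader appends the entry locally, fans it out to the Followers, counts acknowledgements, and commits once a majority of the cluster has the entry. Below is a hedged Go sketch of that loop, reusing the Node and LogEntry types from the sketch above; the synchronous fan-out and the AppendEntries signature are simplifications, since real Raft sends AppendEntries RPCs with prevLogIndex/prevLogTerm consistency checks and retries each Follower independently:

```go
// Follower is the slice of peer behaviour the Leader needs for this sketch:
// accept one entry and acknowledge it.
type Follower interface {
	AppendEntries(term int, entry LogEntry) bool
}

// Replicate appends a command to the Leader's log, fans it out, and commits
// once a majority of the cluster (Followers plus the Leader) has acknowledged.
func (n *Node) Replicate(command string, followers []Follower) bool {
	entry := LogEntry{Term: n.CurrentTerm, Command: command}
	n.Log = append(n.Log, entry) // 1. Leader appends locally (still uncommitted)

	acks := 1 // the Leader counts itself
	for _, f := range followers {
		if f.AppendEntries(n.CurrentTerm, entry) { // 2. fan out to Followers
			acks++
		}
	}

	clusterSize := len(followers) + 1
	if acks > clusterSize/2 { // 3. quorum reached: commit and answer the client
		n.CommitIndex = len(n.Log) - 1
		return true
	}
	return false // no quorum: the entry stays uncommitted; the Leader keeps retrying
}
```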
3. Interactive Demo: The Replication Loop
Cyberpunk Mode: Visualize the commit flow.
- Mission: Replicate SET X=99 to the cluster.
- Obstacle: Network Partition.
- Visuals: Watch the logs turn from Yellow (Uncommitted) to Green (Committed).
[!TIP] Try it yourself:
- Click “🚀 Send SET X=99”. Watch the Leader replicate to Followers, get ACKs, and Commit.
- Click “✂️ Network” to cut the connection.
- Try sending again. Watch the Leader retry endlessly because it can’t reach a Quorum.
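The "retry endlessly" behaviour in the demo falls out of a single check: an entry commits only when the number of acknowledgements, including the Leader's own, exceeds half the cluster. A minimal sketch of that check (the cluster sizes in the comments are just examples):

```go
// hasQuorum reports whether acks (including the Leader itself) form a majority.
// In a 5-node cluster, 3 ACKs commit the entry; a partition that leaves the
// Leader with only 2 reachable nodes can never satisfy this, so the entry
// stays uncommitted (Yellow) no matter how often the Leader retries.
func hasQuorum(acks, clusterSize int) bool {
	return acks > clusterSize/2
}
```

hasQuorum(3, 5) is true; hasQuorum(2, 5) is false, which is exactly the partitioned state the demo puts you in.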
4. Crash Fault vs Byzantine Fault
Standard Consensus (Raft/Paxos/ZAB) assumes CFT (Crash Fault Tolerance).
- Nodes might fail, pause, or lose disks.
- But nodes do NOT lie.
The Byzantine Generals Problem
What if a node is compromised and sends:
- “I vote for A” to General 1.
- “I vote for B” to General 2.
This is a Byzantine Fault. Raft cannot handle this. You need BFT (Byzantine Fault Tolerance) algorithms like PBFT or Proof of Work (Bitcoin).
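To see why a plain majority vote breaks under equivocation, here is a small illustrative Go sketch; the generals, votes, and tallying are invented for the example and are not any real BFT protocol:

```go
package main

import "fmt"

// tally returns the value with the most reported votes.
func tally(votesSeen map[int]string) string {
	counts := map[string]int{}
	best := ""
	for _, v := range votesSeen {
		counts[v]++
		if counts[v] > counts[best] {
			best = v
		}
	}
	return best
}

func main() {
	// Node 3 is Byzantine: it reports "A" to General 1 but "B" to General 2.
	seenByGeneral1 := map[int]string{1: "A", 2: "B", 3: "A"}
	seenByGeneral2 := map[int]string{1: "A", 2: "B", 3: "B"}

	fmt.Println(tally(seenByGeneral1)) // General 1 decides "A"
	fmt.Println(tally(seenByGeneral2)) // General 2 decides "B": agreement (safety) is broken
}
```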
| Type | Tolerance | Failure Example | Algorithms |
|---|---|---|---|
| CFT | (N-1)/2 failures | Power outage, GC pause, disk crash | Raft, Paxos |
| BFT | (N-1)/3 failures | Hacked node, software bug, malicious actor | PBFT, Tendermint |
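The two tolerance figures come from the standard quorum arithmetic: CFT needs N ≥ 2f+1 nodes to tolerate f crashes, while BFT needs N ≥ 3f+1 to tolerate f traitors. Inverted as helper functions (a sketch, not tied to any library):

```go
// maxCrashFaults inverts N >= 2f+1: a 5-node Raft/Paxos cluster survives 2 crashes.
func maxCrashFaults(n int) int { return (n - 1) / 2 }

// maxByzantineFaults inverts N >= 3f+1: the same 5 nodes running PBFT survive
// only 1 compromised member; tolerating 2 traitors already takes 7 nodes.
func maxByzantineFaults(n int) int { return (n - 1) / 3 }
```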
Summary
- Leader orders the log.
- Followers replicate the log.
- Commit happens only when a Majority acknowledges.
- Linearizability is guaranteed by reading from the Leader, which first confirms with a quorum that it is still the Leader.
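That last point is essentially the ReadIndex idea from the Raft paper: before answering a read, the Leader snapshots its commit index and confirms with a quorum heartbeat that it has not been deposed, so a stale Leader cannot serve stale data. A hedged sketch, continuing the types from the replication sketch above (the heartbeat callback is an assumption made for brevity):

```go
// ConfirmLeadershipForRead returns the index the Leader must have applied
// before serving the read, and whether a quorum still recognises its leadership.
func (n *Node) ConfirmLeadershipForRead(followers []Follower, stillFollowsMe func(Follower) bool) (readIndex int, ok bool) {
	readIndex = n.CommitIndex // the read must reflect at least everything committed so far

	acks := 1 // the Leader counts itself
	for _, f := range followers {
		if stillFollowsMe(f) { // heartbeat round: does this Follower still accept us as Leader?
			acks++
		}
	}
	ok = acks > (len(followers)+1)/2 // quorum confirmed: safe to serve a linearizable read
	return readIndex, ok
}
```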