Consensus: Raft / Paxos — Agreement Under Failure
A protocol that lets a cluster of nodes agree on a single value (or an ordered log of entries) even when some nodes fail or the network partitions.
When to use
- Leader election in distributed systems (etcd, ZooKeeper)
- Replicated log for consistent state across nodes (Kafka KRaft, CockroachDB)
Tradeoffs
- Requires a majority quorum (⌊N/2⌋ + 1 nodes) — the minority side of a network partition becomes unavailable for writes
- Latency penalty for every write (must wait for quorum acknowledgment)
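The quorum arithmetic behind both tradeoffs is worth making concrete. A minimal sketch (the `quorum` helper is illustrative, not from any particular library):

```python
def quorum(n: int) -> int:
    """Majority quorum for a cluster of n nodes: floor(n/2) + 1."""
    return n // 2 + 1

# A 5-node cluster needs 3 acks per write and tolerates 2 failures;
# a minority partition of 2 nodes can never reach quorum.
for size in (3, 5, 7):
    print(size, quorum(size), size - quorum(size))
```

This is why clusters are sized with odd numbers: going from 3 to 4 nodes raises the quorum from 2 to 3 without tolerating any additional failures.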
Go

```go
type NodeState int

const (
	Follower  NodeState = iota // default, receives log entries from leader
	Candidate                  // seeking votes after election timeout
	Leader                     // sends heartbeats, accepts writes
)

// LogEntry pairs a command with the term in which it was received.
type LogEntry struct {
	Term    int
	Command string
}

type RaftNode struct {
	id          string
	state       NodeState
	currentTerm int
	votedFor    *string
	log         []LogEntry
}

// Transition: Follower → Candidate on election timeout
func (n *RaftNode) startElection() {
	n.state = Candidate
	n.currentTerm++
	n.votedFor = &n.id
	// broadcast RequestVote RPCs to all peers
}
```
Python

```python
from enum import Enum, auto

class NodeState(Enum):
    FOLLOWER = auto()   # default, receives log entries from leader
    CANDIDATE = auto()  # seeking votes after election timeout
    LEADER = auto()     # sends heartbeats, accepts writes

class RaftNode:
    def __init__(self, node_id: str):
        self.state = NodeState.FOLLOWER
        self.current_term = 0
        self.voted_for: str | None = None
        self.log: list = []
        self.id = node_id

    def start_election(self) -> None:
        # Transition: FOLLOWER → CANDIDATE on election timeout
        self.state = NodeState.CANDIDATE
        self.current_term += 1
        self.voted_for = self.id
        # broadcast RequestVote RPCs to all peers
```
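To close the loop on an election: a candidate that gathers a majority of votes (its own vote included) becomes leader. A self-contained sketch — the `tally_votes` helper is illustrative, not part of the snippet above:

```python
from enum import Enum, auto

class NodeState(Enum):
    FOLLOWER = auto()
    CANDIDATE = auto()
    LEADER = auto()

class RaftNode:
    def __init__(self, node_id: str):
        self.state = NodeState.FOLLOWER
        self.current_term = 0
        self.voted_for: str | None = None
        self.id = node_id

    def start_election(self) -> None:
        # FOLLOWER → CANDIDATE: bump term, vote for self
        self.state = NodeState.CANDIDATE
        self.current_term += 1
        self.voted_for = self.id

    def tally_votes(self, votes_granted: int, cluster_size: int) -> None:
        # CANDIDATE → LEADER on a majority quorum (own vote counted)
        if votes_granted >= cluster_size // 2 + 1:
            self.state = NodeState.LEADER

n = RaftNode("node-a")
n.start_election()                               # term 1, voted for self
n.tally_votes(votes_granted=3, cluster_size=5)   # 3 of 5 is a majority
print(n.state)                                   # NodeState.LEADER
```

A real implementation also steps back down to follower on seeing a higher term, and randomizes election timeouts so candidates rarely split the vote.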
Gotcha: Raft is Paxos made understandable — same problem, simpler mental model. etcd uses Raft; Kafka replaced ZooKeeper with KRaft (also Raft). If you're building on these systems, you're already relying on consensus.