Replication maintains copies of your database on multiple servers. It's the primary mechanism for high availability (surviving server failures) and read scalability (distributing read load).
The most common replication pattern. One node (the leader/primary) accepts all writes. Changes are replicated to one or more follower/replica nodes. Reads can go to any replica.
Writer → [Primary DB]
↓ replication
[Replica 1] ← Readers
[Replica 2] ← Readers
[Replica 3] ← Readers
Replication Lag: Replicas may be slightly behind the primary. This lag is usually milliseconds but can grow under heavy write load.
The primary waits for at least one replica to acknowledge each write before confirming to the client.
Pros: No data loss on primary failure (replica has all data) Cons: Higher write latency; if a replica is slow, the primary slows down
The primary writes locally and acknowledges to the client immediately. Replication happens in the background.
Pros: Lower write latency; replicas can lag without affecting write performance Cons: If primary fails before replication, the replica is missing recent writes (data loss)
Most systems use asynchronous replication and accept a small risk of data loss for better performance.
Multiple nodes accept writes. Each node replicates to the others.
Use cases:
Key challenge: Write conflicts — two leaders can accept conflicting writes to the same record simultaneously. Requires a conflict resolution strategy:
All nodes accept writes. Used by Cassandra, DynamoDB, and Riak.
Writes go to multiple nodes (quorum write: W nodes must acknowledge). Reads query multiple nodes (quorum read: R nodes must respond). With W + R > N (total nodes), you're guaranteed at least one node has the latest data.
Typical settings with N=3:
When the primary fails, a replica must be promoted to primary. This is called failover.
Manual failover: An operator manually promotes a replica. Safer but slower. Automatic failover: The system detects failure and promotes automatically. Faster but can cause split-brain.
Split-brain: Two nodes both think they're the primary. Writes go to both, creating a conflict. Use an odd number of nodes with leader election (Raft, Paxos, etcd) to prevent this.
A common consistency challenge: you write data, then immediately read it — but get the old value because you hit a replica that hasn't caught up.
Solutions: