Distributed SystemsCAP Theorem

CAP Theorem states that a distributed system can guarantee at most two of three properties: Consistency, Availability, and Partition tolerance. In practice, since network partitions are unavoidable, you're always choosing between consistency and availability.

The Three Properties

Consistency (C)

Every read receives the most recent write or an error. After a write completes, all subsequent reads see that write — regardless of which node they hit.

Example: You transfer $100. After the transfer completes, any query to any node shows the updated balance.

Availability (A)

Every request receives a response (not an error), even if it's not the most recent data. The system is always up and responding.

Example: Even if some nodes are unreachable, the system still serves reads — potentially with slightly stale data.

Partition Tolerance (P)

The system continues to operate despite network partitions (nodes can't communicate with each other).

Example: If the network link between two data centers breaks, each data center continues to serve requests independently.

The Catch: P is Non-Optional

In any distributed system (multiple machines communicating over a network), network partitions will happen. You cannot build a distributed system without P.

Therefore, the real choice is: CP or AP?

  • CP systems: When a partition occurs, refuse requests to guarantee consistency (accept unavailability)
  • AP systems: When a partition occurs, continue serving requests but accept potentially stale data (sacrifice consistency)

CP Examples

  • HBase, Zookeeper, etcd: Prefer consistency. During a partition, some nodes will refuse to serve requests
  • Traditional SQL databases with synchronous replication
  • Spanner: Achieves strong consistency globally (though technically uses "external consistency")

Best for: Financial transactions, inventory management, any system where stale data could cause real-world harm

AP Examples

  • Cassandra, DynamoDB, CouchDB: Prefer availability. During a partition, nodes serve their best-available data
  • DNS: Propagates updates eventually, but stays available during partitions

Best for: Social media feeds, shopping carts, recommendation engines, any system where a slightly stale read is acceptable

PACELC Extension

CAP only describes behavior during partitions. PACELC extends it:

"If there's a Partition (P), choose between Availability (A) and Consistency (C). Else (E), choose between Latency (L) and Consistency (C)."

This captures a key real-world trade-off: even without partitions, achieving strong consistency requires coordination (slower latency) vs. accepting eventual consistency (faster).

Consistency Models

Consistency isn't binary. There's a spectrum:

| Model | Guarantee | Examples | |---|---|---| | Strong/Linearizable | Reads always see the latest write | etcd, Spanner | | Sequential | Operations appear in some global order | Zookeeper | | Causal | Causally related ops are ordered | Cosmos DB | | Eventual | Replicas will converge — eventually | Cassandra, DynamoDB |

Interview Tips

  • Know the theorem but go deeper: mention PACELC and the consistency spectrum
  • When asked about database choice, frame it as a CP vs. AP trade-off based on use case
  • Key phrase: "We need strong consistency for payments (CP), so I'd use PostgreSQL with synchronous replication, but for the user activity feed (AP), eventual consistency is fine, so Cassandra works well"
  • Don't say "I'd choose CA" — that's only possible in a single-node system (no distribution, no partition tolerance needed)