Posting this one as a deliberate "what if", because I keep seeing people reach for it.
The PUT budget is 200ms. Quorum acks across 3 replicas spanning AZs eat into that. So the tempting move: buffer writes in a queue and ack immediately, drain to the shards in the background. p99 drops, everyone's happy. That's the queue in the diagram, and the db is set to async ack to match.
Except it violates the durability contract. RPO 0 means no acked write is ever lost. With a write buffered in a queue (or a primary that acks before replicating), a crash between ack and durable-commit loses a write the client believes is committed. RF 3 doesn't save you — replication factor is about copies, ack policy is about when you promise.
So: keep RF 3, but the ack policy should be quorum, not async, and the write path should be synchronous, with no queue sitting between the client's ack and the durable commit. I left it wrong here to make the failure mode concrete. Don't ship it.
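For anyone who wants to poke at the failure mode, here is a toy sketch of it, not a real datastore. The `Cluster` class, the `write_ack` values, and the key/value used are all made up for illustration. The only thing it models is the gap between "acked" and "durable": async mode acks while the write lives only in an in-memory buffer, quorum mode commits to 2 of 3 replicas before acking.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One replica; `durable` holds writes that survive a crash."""
    durable: dict = field(default_factory=dict)


@dataclass
class Cluster:
    """Toy RF 3 cluster. `write_ack` is 'async' or 'quorum' (hypothetical names)."""
    write_ack: str
    replicas: list = field(default_factory=lambda: [Replica() for _ in range(3)])
    buffer: list = field(default_factory=list)  # in-memory queue, lost on crash

    def put(self, key, value):
        """Return True once the write is acknowledged to the client."""
        if self.write_ack == "async":
            # Ack first, replicate later: the write exists only in memory.
            self.buffer.append((key, value))
            return True
        # Quorum: commit on a majority of replicas before acking.
        quorum = len(self.replicas) // 2 + 1
        for replica in self.replicas[:quorum]:
            replica.durable[key] = value
        return True

    def drain(self):
        """Background drain of the buffer to every replica."""
        for key, value in self.buffer:
            for replica in self.replicas:
                replica.durable[key] = value
        self.buffer.clear()

    def crash(self):
        """Process crash: everything not yet durable is gone."""
        self.buffer.clear()

    def read(self, key):
        """Simplified read: return the value from any replica that has it."""
        for replica in self.replicas:
            if key in replica.durable:
                return replica.durable[key]
        return None


def acked_write_survives(write_ack):
    """Ack a write, crash before any background drain, then read it back."""
    cluster = Cluster(write_ack=write_ack)
    acked = cluster.put("order-1", "paid")
    cluster.crash()
    return acked and cluster.read("order-1") == "paid"


print(acked_write_survives("async"))   # False
print(acked_write_survives("quorum"))  # True
```

Both clusters have three replicas, so RF is identical in the two runs. The only thing that changes the outcome is when the ack is sent relative to the durable commit.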
Good that you flagged it yourself. The one-liner I use: replicationFactor is durability *potential*, writeAck is durability *guarantee*. The queue throws the guarantee away.
The 'replicationFactor is potential, writeAck is the guarantee' framing should be on a poster.
Posting the anti-pattern with its failure mode spelled out is more useful than a perfect answer. RPO 0 ≠ async ack.
Honestly more useful than a perfect answer — the async-ack trap is exactly the thing interviewers probe. Saving this.