Async acks to protect PUT p99 (and why it's wrong)

Posting this one as a deliberate "what if", because I keep seeing people reach for it.

The PUT budget is 200ms. Quorum acks across 3 replicas spanning AZs eat into that. So the tempting move: buffer writes in a queue and ack immediately, drain to the shards in the background. p99 drops, everyone's happy. That's the queue in the diagram, and the db is set to async ack to match.

Except it violates the durability contract. RPO 0 means no acked write is ever lost. With a write buffered in a queue (or a primary that acks before replicating), a crash between the ack and the durable commit loses a write the client believes is committed. RF 3 doesn't save you: replication factor is about how many copies exist, ack policy is about when you promise.
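To make the crash window concrete, here is a toy simulation. It is a sketch under loud assumptions, not how any real store works: `Cluster`, `Replica`, `drain`, `crash`, and the `"async"` / `"quorum"` setting values are all hypothetical names, the buffer is modeled as purely in-memory, and replication is instantaneous.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    # Writes durably stored on this replica.
    committed: set = field(default_factory=set)


@dataclass
class Cluster:
    """Toy model: 3 replicas plus a volatile buffer that dies with the process."""
    write_ack: str  # "async" or "quorum" (hypothetical value names)
    replicas: list = field(default_factory=lambda: [Replica() for _ in range(3)])
    buffer: list = field(default_factory=list)  # in-memory queue in front of the shards
    acked: set = field(default_factory=set)     # writes the client was told are committed

    def put(self, key):
        if self.write_ack == "async":
            # Ack as soon as the write is buffered; nothing is durable yet.
            self.buffer.append(key)
        elif self.write_ack == "quorum":
            # Commit on a majority of replicas before acking.
            quorum = len(self.replicas) // 2 + 1
            for replica in self.replicas[:quorum]:
                replica.committed.add(key)
        else:
            raise ValueError(f"unknown write_ack: {self.write_ack!r}")
        self.acked.add(key)

    def drain(self):
        # Background drain: push buffered writes to every replica.
        while self.buffer:
            key = self.buffer.pop(0)
            for replica in self.replicas:
                replica.committed.add(key)

    def crash(self):
        # The process dies; anything still in the buffer is gone.
        self.buffer.clear()

    def lost_acked_writes(self):
        durable = set().union(*(r.committed for r in self.replicas))
        return self.acked - durable


def run(write_ack):
    cluster = Cluster(write_ack=write_ack)
    cluster.put("a")
    cluster.drain()   # "a" is on the replicas in both modes
    cluster.put("b")  # acked to the client...
    cluster.crash()   # ...and the process dies before the next drain
    return cluster.lost_acked_writes()


print(run("async"))   # {'b'}: acked, never committed
print(run("quorum"))  # set(): every acked write is on a majority
```

Both runs use 3 replicas. The only thing that differs is when the ack is sent, and that alone decides whether "b" survives.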

So: keep RF 3, but the ack policy should be quorum, not async, and the write path should be synchronous. I left it wrong here to make the failure mode concrete. Don't ship it.
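If you want a guard against this in review, a check along these lines could flag it. This is only a sketch: the config is assumed to be a plain dict, `bufferedWrites` is a made-up key standing in for "there is a queue in front of the shards", and `"quorum"` / `"async"` are assumed value names.

```python
def rpo_zero_violations(config):
    """Return the reasons this config cannot promise RPO 0 (empty list if none found)."""
    problems = []
    ack = config.get("writeAck")
    if ack != "quorum":
        problems.append(f"writeAck={ack!r} acks before a majority has the write")
    if config.get("bufferedWrites", False):  # hypothetical key for the queue
        problems.append("write buffer acks before the shards have committed")
    return problems


as_posted = {"replicationFactor": 3, "writeAck": "async", "bufferedWrites": True}
fixed = {"replicationFactor": 3, "writeAck": "quorum", "bufferedWrites": False}

for name, cfg in [("as posted", as_posted), ("fixed", fixed)]:
    print(name, rpo_zero_violations(cfg))
```

Note that `replicationFactor` is never inspected. It is 3 in both configs, and only the ack policy and the buffer change the verdict.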

5 Comments

CAP Theorem @cap_theorem, Jul 1, 2026

Good that you flagged it yourself. The one-liner I use: replicationFactor is durability *potential*, writeAck is durability *guarantee*. The queue throws the guarantee away.

Tail Latency @tail_latency, Jul 1, 2026

The 'replicationFactor is potential, writeAck is the guarantee' framing should be on a poster.

Lena Fischer @lena_fischer, Jul 1, 2026

Posting the anti-pattern with its failure mode spelled out is more useful than a perfect answer. RPO 0 ≠ async ack.

Diego Ramirez @diego_ramirez, Jun 30, 2026

Honestly more useful than a perfect answer — the async-ack trap is exactly the thing interviewers probe. Saving this.

CAP Theorem @cap_theorem, Jun 30, 2026

Good that you flagged it yourself. The one-liner I use: replicationFactor is durability *potential*, writeAck is durability *guarantee*. Async ack throws the guarantee away.