LeetDesign

Writes set the shard count. Reads ride along free

Amara Diallo (@amara_diallo)

Ask what breaks first. On this problem it's always the write path, so start there and let everything else fall out.

The send stream is the design: 20,000 messages a second at the New Year's peak, every one quorum-acked because "delivered ✓" must mean delivered. The per-shard write ceiling is 3,200/s, so ten xlarge shards give 32k/s: 63% utilization at the worst moment of the year, which is exactly as hot as I'll let anything run when every conversation depends on it. Quorum costs 2.5x on the db's service time (4ms becomes 10ms), and the full send path lands under 20ms against a 100ms budget. The tax is real and affordable; what would not be affordable is explaining why async ack ate someone's messages.
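
The write-path arithmetic above fits in a few lines. A minimal Python sketch using the post's figures (20k/s peak, 3,200/s per-shard ceiling, 4ms base db time, 2.5x quorum multiplier); the function names are hypothetical, and the 65% utilization cap passed to `min_shards` is an illustrative assumption, not a number from the design.

```python
import math

# Figures from the post.
PEAK_WRITES_PER_S = 20_000
SHARD_WRITE_CEILING_PER_S = 3_200
BASE_DB_SERVICE_MS = 4.0
QUORUM_MULTIPLIER = 2.5


def write_utilization(peak: float, ceiling: float, shards: int) -> float:
    """Fraction of the fleet's total write ceiling used at peak."""
    return peak / (ceiling * shards)


def min_shards(peak: float, ceiling: float, max_util: float) -> int:
    """Smallest shard count that keeps peak writes at or under max_util.

    max_util is a hypothetical cap chosen for illustration.
    """
    return math.ceil(peak / (ceiling * max_util))


def quorum_service_ms(base_ms: float, multiplier: float) -> float:
    """DB service time once every write waits for a quorum ack."""
    return base_ms * multiplier


if __name__ == "__main__":
    util = write_utilization(PEAK_WRITES_PER_S, SHARD_WRITE_CEILING_PER_S, 10)
    print(f"peak write utilization at 10 shards: {util:.1%}")
    print("shards needed under a 65% cap:",
          min_shards(PEAK_WRITES_PER_S, SHARD_WRITE_CEILING_PER_S, 0.65))
    print("quorum db time (ms):",
          quorum_service_ms(BASE_DB_SERVICE_MS, QUORUM_MULTIPLIER))
```

With ten shards the sketch reports 62.5%, which the post rounds to 63%. Tightening the hypothetical cap to 60% would push `min_shards` to 11.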

The read side is the gift. Ten shards at RF 3 means 30 replicas, together serving 480k reads/s, and the 37k/s of history and inbox traffic uses 8% of that. I didn't size for reads at all; the durability contract accidentally built me a read tier. That's the recurring lesson of write-heavy systems: sharding for the write floor buys the read ceiling as a side effect.
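
The read-side claim checks out the same way. A sketch assuming, as the post implies, that every replica serves reads; the 16k reads/s per replica is derived here by dividing the post's 480k/s by 30 replicas, not a figure stated in the design. Function names are hypothetical.

```python
# Figures from the post.
SHARDS = 10
REPLICATION_FACTOR = 3
FLEET_READ_CAPACITY_PER_S = 480_000
READ_DEMAND_PER_S = 37_000  # history + inbox traffic


def replica_count(shards: int, rf: int) -> int:
    """Every shard keeps rf copies; each copy is assumed to answer reads."""
    return shards * rf


def read_utilization(demand: float, capacity: float) -> float:
    """Fraction of fleet read capacity consumed by demand."""
    return demand / capacity


if __name__ == "__main__":
    replicas = replica_count(SHARDS, REPLICATION_FACTOR)
    per_replica = FLEET_READ_CAPACITY_PER_S / replicas  # derived, not given
    util = read_utilization(READ_DEMAND_PER_S, FLEET_READ_CAPACITY_PER_S)
    print(f"{replicas} replicas, {per_replica:,.0f} reads/s each, "
          f"{util:.1%} of read capacity used")
```

Neither input to `replica_count` was chosen for reads: the shard count comes from the write sizing and the replication factor from the durability contract.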

No cache. It would absorb 30% of one request type, on a read tier already running at 8%. Ten app servers front the combined 57k/s (20k writes plus 37k reads), for a total of $37.04/hr, and if that looks expensive, remember that the thing it's storing is every message everyone you know sent this year.
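
A rough bound on what the cache could buy. The sketch generously applies the 30% hit rate to all 37k reads/s, an upper bound, since the post says it would only absorb one request type; it also assumes the ten app servers share the combined 57k/s evenly, which the post does not state. Function names are hypothetical.

```python
# Figures from the post.
READ_DEMAND_PER_S = 37_000
WRITE_DEMAND_PER_S = 20_000
FLEET_READ_CAPACITY_PER_S = 480_000
APP_SERVERS = 10
# Assumption: 30% applied to ALL reads, an upper bound on the cache's effect.
CACHE_HIT_FRACTION = 0.30


def read_util_with_cache(demand: float, capacity: float, hit_fraction: float) -> float:
    """Read-tier utilization after a cache absorbs hit_fraction of reads."""
    return demand * (1 - hit_fraction) / capacity


def per_app_load(total_rps: float, apps: int) -> float:
    """Requests/s per app server, assuming (hypothetically) an even spread."""
    return total_rps / apps


if __name__ == "__main__":
    before = read_util_with_cache(READ_DEMAND_PER_S, FLEET_READ_CAPACITY_PER_S, 0.0)
    after = read_util_with_cache(READ_DEMAND_PER_S, FLEET_READ_CAPACITY_PER_S,
                                 CACHE_HIT_FRACTION)
    print(f"read tier: {before:.1%} without cache, {after:.1%} with it at best")
    total = READ_DEMAND_PER_S + WRITE_DEMAND_PER_S
    print(f"{total:,} req/s across {APP_SERVERS} apps: "
          f"{per_app_load(total, APP_SERVERS):,.0f} each")
```

Even in that best case the read tier moves from about 7.7% to about 5.4%, and the shard count does not change, because writes set it.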

4 Comments


  • James Park (@james_park)

    this is the answer that passes the interview. the 'writes set shards, reads ride free' line is the transferable insight, steal it for every write-heavy problem

  • GC Pause (@gc_pause)

    63% write utilization is fine until a shard hits a compaction stall at 23:59:59 on dec 31. the p99 has teeth exactly when the workload spikes. would want the shard-level slow-node story told, even though the fleet-level math is right

  • Checksum Chuck (@checksum_chuck)

    30 replicas serving reads means 30 places a bit can rot. what verifies that the copy answering a history fetch matches what was quorum-acked? not modeled here, i know, but in the real one: read-repair or i don't sleep

  • Quorum Queen (@quorum_queen)

    the paragraph about the quorum tax is the correct shape: name the cost, pay it, move on. designs that hide it behind async are borrowing against an incident