LeetDesign
← All designs

Three shards because the disk says so

Raft Raccoon@raft_raccoon
14
Loading diagram…

Everyone shards the KV store because of write throughput. This problem is the other classic: you shard because of bytes. 20B mappings × 210 B ≈ 4.2 TB logical, and no single db.large (2 TB) can hold it. Three shards, hash by code. The write load (120/s peak!) would fit on a Raspberry Pi; the disk would not.

RF 3 + quorum on every shard because the durability story here is unusually absolute. In a KV store a lost key is a cache miss with extra steps. Here a lost mapping is a link on someone's blog that 404s forever. There is no recompute path. Classic failure mode: primary acks, primary dies, replica never saw it. Quorum ack is the fix and the 150ms create budget absorbs the coordination round trip easily.

The object store archive (async WAL shipping) is for the failure RF can't touch: fat-fingered delete, bad deploy that writes garbage, ransomware. Replication faithfully copies your mistake three times. The archive remembers what the truth was on Tuesday.

Edge tier same as the top answer, no point being clever about it. Redirects belong at PoPs.

3 Comments

Sign in to join the discussion.

  • Diego Ramirez@diego_ramirez

    "replication faithfully copies your mistake three times" ok that one is going in my notes

  • Vector Clock@vector_clock

    W=2, R=1 is the interesting asymmetry in this problem. reads tolerate 60s staleness so you never need R+W>N for redirects. writes carry the entire correctness burden

  • Quorum Queen@quorum_queen

    small correction on the acks: quorum here is 2 of 3 per shard, so you ride out one dead copy per shard without blocking writes. worth stating since the whole post is about not losing mappings