1.2 petabytes served for two dollars an hour

Kenji Watanabe@kenji_watanabe

The at-rest math does the designing here; I just wrote it down.

3 billion images at ~400 KB median is 1.2 PB. I want to dwell on that number because it disqualifies almost everything. A database tier would need three hundred xlarge disks before it stored a single index. A cache tier is funnier: the hot 0.1% of the library alone is 1.2 TB, and you'd need ~1.5 TB of RAM to hold enough of it to matter. The biggest cache pair in the catalog is 128 GB. You cannot buy your way out of this with memory.
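The sizing above can be checked in a few lines. This is a minimal sketch, assuming decimal units (1 KB = 1,000 bytes) since the round numbers only line up that way; the function names are made up for illustration and are not part of any sizing tool.

```python
# Decimal units are an assumption; they match the post's round figures.
KB, TB, PB = 10**3, 10**12, 10**15


def library_bytes(image_count: int, median_bytes: int) -> int:
    """Total bytes at rest, treating the median size as the average."""
    return image_count * median_bytes


def hot_set_bytes(total_bytes: int, hot_fraction: float) -> float:
    """Bytes in the hottest fraction of the library."""
    return total_bytes * hot_fraction


total = library_bytes(3_000_000_000, 400 * KB)
hot = hot_set_bytes(total, 0.001)  # the hot 0.1%

print(total / PB)          # 1.2
print(round(hot / TB, 3))  # 1.2
```

Using the median as the average is a simplification; a long tail of large images would push the real total higher.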

What DOES hold it: an object store (bottomless, durable, three internal copies) with a CDN in front. The edge caches the 92% cacheable share by URL, so there is no working-set-in-one-box problem, and the origin sees 4,150/s at the viral peak instead of 50,000. Behind that: two small LBs, three app.larges at 69%, two store replicas at 42%. Every component survives node loss and AZ loss at steady load.
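As a rough cross-check on the origin number, here is a one-line miss-rate model. It assumes every cacheable request is an edge hit and nothing else reaches the origin; `origin_rps` is a hypothetical helper, not something the engine exposes.

```python
def origin_rps(peak_rps: float, edge_hit_ratio: float) -> float:
    """Requests per second that miss the edge and reach the origin."""
    return peak_rps * (1.0 - edge_hit_ratio)


print(round(origin_rps(50_000, 0.92)))  # 4000
```

That lands at 4,000/s, slightly under the 4,150/s quoted above. The difference is presumably traffic this simple model leaves out, which is an assumption on my part rather than something the post itemizes.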

$2.07/hr. Serving a petabyte library costs less than the pastebin thread's cathedral. Bytes at rest are cheap; it's bytes in flight you have to be clever about.

4 Comments

  • Kenji Watanabe@kenji_watanabe

    @nina_rossi stale beats broken for immutable content, always. the 5 minute staleness budget exists for exactly that grace

  • Nina Rossi@nina_rossi

    what does a user see during a store replica failover, stale image or broken image? if it's broken i'd want the cdn serving expired entries over erroring. worth a line in the design imo

  • Cache Invalidator@cache_invalidator

    for once the no-redis answer is the right one. i checked the arithmetic looking for a gotcha: 256GB of cache.large covers 17% of the hot set's RAM requirement. seventeen. there's no version of that tier that isn't decoration

  • Egress Fee@egress_fee

    bytes in flight is where the money is, correct. 180KB x 50k/s is 9GB/s at peak. in the real world the cdn egress line item would dwarf your $2.07 compute, enjoy the engine not modeling it while it lasts