LeetDesign

Cache the hot 0.1%, scale app wide and shallow

p99 Andy@p99_andy

The access pattern is Zipfian: 0.1% of keys account for 60% of traffic. So instead of a general cache tier, I push the hot set all the way to the edge.

A CDN/edge tier serves the top 0.1% of keys with a short (5s) TTL, which absorbs most of the read volume (roughly the 60% hot share) before it reaches a data center. Only the long tail and edge misses go through the app tier to the shards, and that residual load is small enough that db.large ×2 covers it without a mid-tier cache.
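
To make the short-TTL behavior concrete, here is a minimal sketch of one edge PoP as an in-memory TTL cache. It is an illustration under assumptions, not how any particular CDN works: `TTLCache`, `fetch_origin`, and the injectable `clock` are hypothetical names, and the sketch caches whatever it fetches rather than deciding which keys belong to the hot 0.1%.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Hypothetical stand-in for one edge PoP: fixed TTL per entry."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str, fetch_origin: Callable[[str], str]) -> str:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            # Fresh entry: served at the edge, the origin never sees this request.
            self.hits += 1
            return entry[1]
        # Missing or expired: go to the origin (app tier, then shard) and re-cache.
        self.misses += 1
        value = fetch_origin(key)
        self._store[key] = (now + self.ttl, value)
        return value
```

With a 5s TTL, a hot key costs the origin at most about one fetch per 5 seconds per PoP in steady state, regardless of how many reads the PoP serves. The sketch does nothing about concurrent misses on the same key, which is the stampede risk called out further down.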

The app tier is wide and shallow (app.large ×8, not xlarge ×4): same aggregate QPS, but a smaller blast radius per node. Losing one node drops 1/8 of capacity, not 1/4.
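
The blast-radius arithmetic can be sketched in a few lines. The per-node QPS figures are made-up placeholders (the post only says the two layouts have the same aggregate QPS), and the function names are hypothetical.

```python
def fraction_lost(nodes: int, lost: int = 1) -> float:
    """Share of tier capacity that disappears when `lost` equal nodes fail."""
    return lost / nodes


def capacity_after_loss(nodes: int, per_node_qps: float, lost: int = 1) -> float:
    """Remaining aggregate QPS after `lost` nodes fail."""
    return (nodes - lost) * per_node_qps


def max_safe_utilization(nodes: int, lost: int = 1) -> float:
    """Highest steady-state utilization that still fits on the survivors."""
    return (nodes - lost) / nodes


# Assumed numbers: 1000 QPS per large node, 2000 QPS per xlarge node,
# so both layouts total 8000 QPS.
wide = capacity_after_loss(8, 1000.0)     # 8 large nodes, one down
narrow = capacity_after_loss(4, 2000.0)   # 4 xlarge nodes, one down
```

Under those assumed numbers the wide layout keeps 7000 QPS after one failure and the narrow one keeps 6000. Put differently, the 8-node tier can run at up to 87.5% utilization and still survive a single node loss, while the 4-node tier has to stay under 75%.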

Risk I'll call out: a single hot key on a cold edge PoP can still stampede one shard. If that showed up in prod I'd add request coalescing at the app tier.
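
For reference, request coalescing (sometimes called single-flight) could look roughly like the sketch below: concurrent requests for the same key share one in-flight fetch instead of each hitting the shard. This is a thread-based illustration under assumptions, not the design's actual implementation; `Coalescer`, `do`, and `fetch` are hypothetical names, and it only coalesces within one app process, so 8 app nodes could still send up to 8 concurrent fetches for the same key.

```python
import threading
from typing import Any, Callable, Dict, Optional


class _Call:
    """State shared by every caller waiting on the same in-flight fetch."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.value: Any = None
        self.error: Optional[BaseException] = None


class Coalescer:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._inflight: Dict[str, _Call] = {}

    def do(self, key: str, fetch: Callable[[str], Any]) -> Any:
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                # First caller for this key becomes the leader and owns the fetch.
                call = _Call()
                self._inflight[key] = call
        if leader:
            try:
                call.value = fetch(key)
            except Exception as exc:
                call.error = exc  # followers see the same failure
            finally:
                with self._lock:
                    del self._inflight[key]
                call.done.set()  # wake every follower
        else:
            # Followers block until the leader finishes; they never call fetch.
            call.done.wait()
        if call.error is not None:
            raise call.error
        return call.value
```

Nothing is cached here: once the leader returns, the next request for that key triggers a fresh fetch. That keeps the sketch orthogonal to the edge TTL, since it only flattens bursts of simultaneous misses.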

5 Comments


  • Shard Lord@shard_lord

    Pushing the hot set to the edge is clever, but watch cold-PoP misses all landing on 2 shards at once. A thin origin cache would smooth that.

  • Bloom Filter@bloom_filter

    The single-hot-key-melts-a-shard point is the one most designs miss. Good that it's called out explicitly.

  • Oliver Wright@oliver_wright

    Wide-and-shallow app tier for blast radius — agree. Request coalescing on the hot key is the real fix though.

  • Shard Lord@shard_lord

    8 app nodes is a lot of connections into 2 shards. Watch the connection fan-in, you may want a pooler before the db.

  • Latency Larry@latency_larry

    Wide-and-shallow on the app tier is the right instinct for p99 — fewer customers behind each failure. Good shout on coalescing the hot key too.