Cache the hot 0.1%, scale the app tier wide and shallow

The access pattern is Zipfian: 0.1% of keys account for 60% of traffic. So instead of a general-purpose cache tier, I push the hot set all the way to the edge.

A CDN/edge tier serves the top 0.1% of keys with a short (5s) TTL. That alone absorbs most of the read volume before it reaches a data center. Only misses (the long tail, plus hot-key refills when a TTL expires) go through the app tier to the shards, and that load is small enough that db.large ×2 covers it without a mid-tier cache.
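A rough sketch of the arithmetic behind that claim. Only the 60% hot share and the 5s TTL come from the design above. The total QPS, hot-key count, and PoP count are made-up inputs, and the refill bound assumes each PoP collapses concurrent misses for a key into one origin fetch, which not every CDN does by default.

```python
def origin_qps(total_qps, hot_share, hot_keys, pops, ttl_s):
    """Rough upper bound on read QPS that still reaches the origin.

    Assumes the edge caches only the hot set, and that each PoP sends
    at most one refill per hot key per TTL window (misses coalesced).
    """
    tail_qps = total_qps * (1.0 - hot_share)  # long tail always misses the edge
    hot_qps = total_qps * hot_share           # traffic aimed at the hot set
    # One refill per (hot key, PoP) pair per TTL, but never more than the
    # hot traffic that actually exists.
    refill_qps = min(hot_qps, hot_keys * pops / ttl_s)
    return tail_qps + refill_qps


# Hypothetical inputs: 100k reads/s, 1,000 hot keys, 20 PoPs.
estimate = origin_qps(total_qps=100_000, hot_share=0.6,
                      hot_keys=1_000, pops=20, ttl_s=5.0)
```

With those inputs the estimate is roughly 44,000 QPS at the origin: 40,000 of tail traffic plus 4,000 of hot-key refills. The refill term grows with the number of PoPs and shrinks with a longer TTL, so those are the two knobs to watch.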

The app tier is wide and shallow (app.large ×8, not xlarge ×4): same aggregate QPS, but each node is a smaller blast radius. Losing one node drops 1/8 of capacity, not 1/4.
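The blast-radius comparison can be written out as a small calculation. This is an illustrative sketch that assumes nodes are identical and load is spread evenly. The function names are mine, not part of the design.

```python
from fractions import Fraction


def capacity_lost(nodes, failed=1):
    """Fraction of aggregate capacity lost when `failed` nodes go down."""
    return Fraction(failed, nodes)


def max_safe_utilization(nodes, failed=1):
    """Highest steady-state utilization at which the surviving nodes
    can still carry the full load after `failed` nodes are lost."""
    return Fraction(nodes - failed, nodes)


wide = (capacity_lost(8), max_safe_utilization(8))    # 8 smaller nodes
narrow = (capacity_lost(4), max_safe_utilization(4))  # 4 larger nodes
```

Under those assumptions the 8-node layout loses 1/8 of capacity per failure and can run up to 7/8 utilized while still surviving one node loss. The 4-node layout loses 1/4 and has to stay at or below 3/4.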

Risk I'll call out: a single hot key on a cold edge PoP can still stampede one shard. If that showed up in prod I'd add request coalescing at the app tier.
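For reference, a minimal sketch of what request coalescing (often called single-flight) could look like inside one app node. The class and method names are hypothetical, and this only collapses concurrent requests within a single process. With 8 app nodes, a shard could still see up to 8 simultaneous loads for the same key.

```python
import threading


class _Call:
    """State shared by the leader and the followers of one in-flight load."""

    def __init__(self):
        self.done = threading.Event()
        self.result = None
        self.error = None


class SingleFlight:
    """Coalesce concurrent loads of the same key into one backend call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> _Call for loads currently running

    def do(self, key, loader):
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._inflight[key] = call

        if leader:
            # Only the first caller for this key hits the backend.
            try:
                call.result = loader(key)
            except Exception as exc:
                call.error = exc
            finally:
                with self._lock:
                    del self._inflight[key]
                call.done.set()  # wake every follower
        else:
            # Followers block until the leader's load finishes.
            call.done.wait()

        if call.error is not None:
            raise call.error
        return call.result
```

A handler would call something like `flights.do(key, load_from_shard)`, where `load_from_shard` stands in for whatever function reads from the shard. This is not a cache: once a load completes, the next request for that key starts a new one, so it works alongside the edge TTL and does not replace it.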

5 Comments

Shard Lord @shard_lord, Jul 1, 2026

Pushing the hot set to the edge is clever, but watch cold-PoP misses all landing on 2 shards at once. A thin origin cache would smooth that.

Bloom Filter @bloom_filter, Jul 1, 2026

The single-hot-key-melts-a-shard point is the one most designs miss. Good that it's called out explicitly.

Oliver Wright @oliver_wright, Jul 1, 2026

Wide-and-shallow app tier for blast radius — agree. Request coalescing on the hot key is the real fix though.

Shard Lord @shard_lord, Jun 30, 2026

8 app nodes is a lot of connections into 2 shards. Watch the connection fan-in; you may want a pooler in front of the db.

Latency Larry @latency_larry, Jun 30, 2026

Wide-and-shallow on the app tier is the right instinct for p99 — fewer customers behind each failure. Good shout on coalescing the hot key too.