The access pattern is Zipfian: 0.1% of keys account for 60% of traffic. So instead of a general cache tier, I push the hot set all the way to the edge.
A CDN/edge tier serves the top 0.1% with a short (5s) TTL, which keeps most of that 60% of reads from ever reaching a data center. Only misses (the long tail, plus TTL refills for hot keys) go through the app tier to the shards, and that volume is small enough that db.large ×2 covers it without a mid-tier cache.
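To sanity-check the "tail is small enough" claim, here is a back-of-envelope sketch of the read load that still reaches the origin. The function name and every number in the usage note are hypothetical; only the 60% hot share and the 5s TTL come from the design above. It assumes each PoP refills a given hot key at most once per TTL (i.e. the edge collapses concurrent misses) and that tail keys are never served from the edge.

```python
def origin_read_qps(total_qps: float, hot_share: float,
                    hot_keys: int, pops: int, ttl_s: float) -> float:
    """Estimate read QPS that still reaches the app tier and shards.

    Assumptions (not from the original post):
      - tail keys always miss the edge and go to origin;
      - each (hot key, PoP) pair refills at most once per TTL.
    """
    hot_qps = total_qps * hot_share
    tail_qps = total_qps - hot_qps
    # Upper bound on refills: one per hot key per PoP per TTL window,
    # and never more than the hot traffic itself.
    refill_qps = min(hot_qps, hot_keys * pops / ttl_s)
    return tail_qps + refill_qps
```

For example, `origin_read_qps(100_000, 0.60, hot_keys=1_000, pops=20, ttl_s=5.0)` plugs in made-up values for total QPS, key count, and PoP count. The point of the sketch is that the refill term grows with the number of PoPs and shrinks with the TTL, so those two knobs decide how much of the 60% the edge really absorbs.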
The app tier is wide and shallow (app.large ×8, not xlarge ×4): same aggregate QPS, but a smaller blast radius per node. Losing one node drops 1/8 of capacity, not 1/4.
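The blast-radius arithmetic can be written out as a small sketch. The helper name is hypothetical, and it assumes traffic is spread evenly across nodes and that the fleet has the same total capacity in both layouts.

```python
from fractions import Fraction
from typing import Tuple


def node_loss_impact(nodes: int, failed: int = 1) -> Tuple[Fraction, Fraction]:
    """Return (fraction of capacity lost, load multiplier on each survivor).

    Assumes evenly balanced traffic across identical nodes.
    """
    if not 0 <= failed < nodes:
        raise ValueError("failed must be >= 0 and less than nodes")
    lost = Fraction(failed, nodes)
    # Survivors split the same total load among fewer nodes.
    surge = Fraction(nodes, nodes - failed)
    return lost, surge
```

With eight nodes, one failure costs 1/8 of capacity and each survivor takes 8/7 of its previous load (about 14% more). With four nodes it costs 1/4 and each survivor takes 4/3 (about 33% more), so the wide layout also needs less per-node headroom.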
Risk I'll call out: a single hot key on a cold edge PoP can still stampede one shard. If that showed up in prod I'd add request coalescing at the app tier.
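For reference, a minimal sketch of what request coalescing at the app tier could look like: concurrent requests for the same key share one backend fetch instead of each hitting the shard. The `Coalescer` class and its `do` method are hypothetical names, not from any particular library, and this version only coalesces within a single process using threads.

```python
import threading
from typing import Any, Callable, Dict


class _Call:
    """State for one in-flight fetch shared by all callers of a key."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: Any = None
        self.error: Exception | None = None
        self.waiters = 0  # followers that joined this fetch


class Coalescer:
    """Collapse concurrent fetches for the same key into one call."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._inflight: Dict[str, _Call] = {}

    def do(self, key: str, fetch: Callable[[], Any]) -> Any:
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._inflight[key] = call
            else:
                call.waiters += 1

        if leader:
            try:
                call.result = fetch()  # the only trip to the shard
            except Exception as exc:
                call.error = exc
            finally:
                # Remove the entry before waking followers so that
                # later requests start a fresh fetch.
                with self._lock:
                    del self._inflight[key]
                call.done.set()
        else:
            call.done.wait()

        if call.error is not None:
            raise call.error
        return call.result
```

A handler would call something like `coalescer.do(key, lambda: read_from_shard(key))`, where `read_from_shard` stands in for whatever the real shard read is. Note that this does not cache anything: once the fetch completes, the next request for that key goes to the shard again, so it bounds concurrency per key rather than total request rate.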
Pushing the hot set to the edge is clever, but watch cold-PoP misses all landing on 2 shards at once. A thin origin cache would smooth that.
The single-hot-key-melts-a-shard point is the one most designs miss. Good that it's called out explicitly.
Wide-and-shallow app tier for blast radius — agree. Request coalescing on the hot key is the real fix though.
8 app nodes is a lot of connections into 2 shards. Watch the connection fan-in, you may want a pooler before the db.
Wide-and-shallow on the app tier is the right instinct for p99 — fewer customers behind each failure. Good shout on coalescing the hot key too.