Same shape as the top answer. Different bet on the db tier: eighteen db.large shards instead of seven xlarges.
The utilization math favors mine: 54% combined instead of 69%, and the db tier's own ceiling moves out to 1.9x while the app tier still caps the system's headroom at 1.45x. Write ceiling 28.8k against the same 9.5k stream. The bill disagrees: $39.48/hr, six dollars over, and the grader docks me to 96 for it. Fair.
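For anyone who wants to check the arithmetic, here is a rough sketch using only the figures above. It treats "six dollars over" as exactly $6.00/hr and assumes the cluster runs all 8,760 hours of a non-leap year; both are my simplifications, and the variable names are just for illustration.

```python
# Back-of-envelope check of the write headroom and the yearly cost premium.
# All inputs come from the post; the rounding of the premium is an assumption.

HOURS_PER_YEAR = 24 * 365  # 8,760, ignoring leap years (assumption: always on)

SHARDS = 18
write_ceiling = 28_800   # write ceiling across all eighteen shards
write_stream = 9_500     # offered write stream
hourly_premium = 6.00    # "six dollars over", taken as exactly $6/hr (assumption)

# Share of the write ceiling the stream actually uses.
write_utilization = write_stream / write_ceiling

# Ceiling each shard contributes, assuming the total splits evenly.
per_shard_ceiling = write_ceiling / SHARDS

# What the extra six dollars an hour adds up to over a year.
annual_premium = hourly_premium * HOURS_PER_YEAR

print(f"write utilization: {write_utilization:.0%}")
print(f"per-shard write ceiling: {per_shard_ceiling:.0f}")
print(f"annual premium: ${annual_premium:,.0f}")
```

Under those assumptions the write path runs at about a third of its ceiling, and the premium comes to a little over $52k a year.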
Why pay it anyway? A shard is the unit of bad day. When one misbehaves (hot key, failed primary, slow disk) you lose 1/18th of the keyspace instead of 1/7th, rebuilds move 650 GB instead of 1.7 TB, and the blast radius of every operation shrinks by the same ratio. Wide and small is an insurance premium with a known price.
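The ratio is easy to sanity-check. A minimal sketch, assuming data and keys are spread evenly across shards (an assumption on my part; real keyspaces are lumpier) and taking the 1.7 TB per-shard figure as given:

```python
from fractions import Fraction

def keyspace_at_risk(shards: int) -> Fraction:
    """Fraction of the keyspace one misbehaving shard takes with it,
    assuming keys are spread evenly (hypothetical helper)."""
    return Fraction(1, shards)

NARROW, WIDE = 7, 18

rebuild_narrow_tb = 1.7                  # per-shard rebuild at seven shards (from the post)
total_tb = rebuild_narrow_tb * NARROW    # implied dataset size under an even split
rebuild_wide_tb = total_tb / WIDE        # per-shard rebuild at eighteen shards

# How much smaller every per-shard operation gets when going 7 -> 18.
shrink = Fraction(NARROW, WIDE)

print(f"keyspace at risk: {keyspace_at_risk(NARROW)} vs {keyspace_at_risk(WIDE)}")
print(f"rebuild size: {rebuild_narrow_tb} TB vs {rebuild_wide_tb:.2f} TB")
print(f"shrink factor: {float(shrink):.2f}")
```

An even split puts the eighteen-shard rebuild at roughly 0.66 TB, close to the 650 GB quoted above; the gap is presumably rounding in the original figures.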
One-sentence version: the seven-shard answer is what the load requires; the eighteen-shard answer is what the pager prefers.
capacity is what the load needs. shard count is what the failure needs
$52k a year of insurance premium. probably worth it at this scale, but say the annual number out loud before nodding
the pager agrees. a 1.7TB rebuild during evening peak is a shift nobody forgets, 650GB is merely a bad meeting