Two numbers size this whole system, and neither of them is storage.
The write ceiling. 6,000 hits/s at the daily peak, all of it durable, none of it cacheable. A db.xlarge takes 3,200 writes/s per shard, so three shards gives 9,600 against 6,000: 63% utilization at peak, headroom 1.6x before anything breaks. The 20 GB of counter data would fit in my phone; I am buying these instances for write throughput and nothing else.
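Here is that arithmetic as a quick sketch. It assumes write capacity scales linearly with shard count, which only holds if the sharding key spreads load evenly. The function names are mine, not anything from a real tool.

```python
import math

# Figures from the post; everything else here is illustrative.
PEAK_WRITES_PER_S = 6_000   # daily peak, all durable writes
WRITES_PER_SHARD = 3_200    # db.xlarge write ceiling per shard
SHARDS = 3


def write_ceiling(shards: int, per_shard: int = WRITES_PER_SHARD) -> int:
    """Aggregate write ceiling, assuming perfectly even load across shards."""
    return shards * per_shard


def utilization(load: float, shards: int = SHARDS) -> float:
    """Fraction of the aggregate write ceiling consumed by `load` writes/s."""
    return load / write_ceiling(shards)


def shards_needed(load: float, per_shard: int = WRITES_PER_SHARD) -> int:
    """Smallest shard count whose ceiling covers `load`, with zero headroom."""
    return math.ceil(load / per_shard)


if __name__ == "__main__":
    ceiling = write_ceiling(SHARDS)
    print(f"ceiling: {ceiling} writes/s")
    print(f"utilization at peak: {utilization(PEAK_WRITES_PER_S):.1%}")
    print(f"headroom: {ceiling / PEAK_WRITES_PER_S:.2f}x")
    print(f"bare-minimum shards at peak: {shards_needed(PEAK_WRITES_PER_S)}")
```

Two shards would technically cover the peak at 6,400 writes/s, but at about 94% utilization; the third shard is what buys the 1.6x.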
The RPO. The contract says losing 60 seconds of counts in a crash is acceptable. Read that again, because it's worth $3/hr: it means async replication at RF 2 fully satisfies durability, as long as replica lag stays under that minute. No quorum round on the hot path, so the db answers in 4 ms instead of 10 ms.
App tier is two xlarges at 38% peak. Reads are 120/s of dashboard traffic against 96k of replica read capacity, a rounding error, which is why there is no cache anywhere in this diagram. There is nothing to cache. Total: $7.42/hr, and roughly 80% of it is the ingest tier, which is what a write-heavy system's bill is supposed to look like.
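The same back-of-envelope check for reads and the bill, as a rough sketch. The 80% share is the approximate figure above, so the dollar split is approximate too, and the helper names are hypothetical.

```python
# Figures from the post; the 80% ingest share is approximate.
READS_PER_S = 120
REPLICA_READ_CAPACITY = 96_000
TOTAL_PER_HR = 7.42
INGEST_SHARE = 0.80


def read_utilization(reads: float = READS_PER_S,
                     capacity: float = REPLICA_READ_CAPACITY) -> float:
    """Fraction of replica read capacity used by dashboard traffic."""
    return reads / capacity


def split_bill(total: float = TOTAL_PER_HR,
               ingest_share: float = INGEST_SHARE) -> tuple[float, float]:
    """Return (ingest, everything_else) in $/hr, rounded to cents."""
    ingest = total * ingest_share
    return round(ingest, 2), round(total - ingest, 2)


if __name__ == "__main__":
    print(f"read utilization: {read_utilization():.3%}")
    ingest, rest = split_bill()
    print(f"ingest: ${ingest}/hr, everything else: ${rest}/hr")
```

Reads come out at about an eighth of a percent of capacity, which is the whole case against a cache.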
@aisha_khan priced in. shard 4 and 5 are a config change and $4/hr away, and the sharding key already spreads uniformly. i buy capacity when the graph says so, not before
have you measured what happens at 2x peak growth though. 12k/s puts you at 125% of the write ceiling, this design has one good year in it. fine if that's priced in, is it?
this is what shipping looks like. no cache to babysit, no quorum tax on 6k/s of hot path, and the bill is one seat of a SaaS plan
durability is a dial, not a virtue. she set it where the contract points
for once i can't object. the contract explicitly prices a minute of loss as acceptable, so async RF 2 is correct, not lazy. rigor includes reading the requirements