No CDN, just a look-aside cache and arithmetic. Working the numbers because nobody else will:
Traffic: steady 10k redirects + 100 creates; viral hour 30k + 120. Read:write is 100:1 at steady state and 250:1 in the viral hour, so every capacity decision is a read decision.
Hot set: 0.05% of 20B links = 10M links × 210 B = 2.1 GB. The hot set carries ~70% of traffic, so reaching the full 95% cache-hit target needs ~3 GB of cache RAM. cache.small ×2 has 32 GB. The cache is not the hard part of this problem.
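A minimal Python sketch of the hot-set arithmetic. The 0.05% hot fraction, 210 B entry size, 32 GB of cache RAM and ~3 GB target are the post's figures; the ~3 GB can't be derived without a traffic distribution, so it is taken as given. Treating GB as decimal is an assumption, and `hot_set_bytes` is a hypothetical helper.

```python
# Hot-set sizing, using the post's inputs.
TOTAL_LINKS = 20_000_000_000
HOT_FRACTION = 0.0005            # 0.05% of links are hot
BYTES_PER_ENTRY = 210            # average cached mapping size, per the post
CACHE_RAM_BYTES = 32 * 10**9     # cache.small x2, decimal GB assumed
TARGET_RAM_BYTES = 3 * 10**9     # the post's ~3 GB estimate for a 95% hit rate


def hot_set_bytes(total_links: int, hot_fraction: float, entry_bytes: int) -> int:
    """Bytes needed to hold the hot set in cache."""
    hot_links = round(total_links * hot_fraction)  # 10M links
    return hot_links * entry_bytes


hot_bytes = hot_set_bytes(TOTAL_LINKS, HOT_FRACTION, BYTES_PER_ENTRY)
fill = TARGET_RAM_BYTES / CACHE_RAM_BYTES  # share of cache RAM the 95% target uses

print(f"hot set: {hot_bytes / 1e9:.1f} GB")
print(f"cache fill at the 95% target: {fill:.0%}")
```

Under ~10% fill is why the cache isn't the hard part: the same two boxes would absorb a hot set several times larger.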
App tier: this is the cost of skipping the CDN. The full 30k peak hits my app servers, so app.xlarge ×5 (40k capacity, ρ ≈ 0.75 at peak). Misses to the db: 30k × 5% = 1.5k.
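The app-tier sizing as a sketch. The 8k per-box figure is inferred from 40k capacity over 5 boxes, and treating ρ ≈ 0.75 as a hard ceiling is an assumption; the post only reports the ρ it landed on. Rates are in whatever units the post uses. `boxes_needed` is a hypothetical helper.

```python
import math

PEAK_REDIRECTS = 30_000    # viral-hour redirect rate, same units as the post
PER_BOX_CAPACITY = 8_000   # assumed: 40k capacity spread over 5 app.xlarge boxes
MAX_RHO = 0.75             # assumed utilization ceiling at peak
CACHE_HIT_RATE = 0.95      # the post's cacheable target


def boxes_needed(load: float, per_box: float, max_rho: float) -> int:
    """Smallest box count that keeps utilization (rho) at or below max_rho."""
    return math.ceil(load / (per_box * max_rho))


n_boxes = boxes_needed(PEAK_REDIRECTS, PER_BOX_CAPACITY, MAX_RHO)
rho = PEAK_REDIRECTS / (n_boxes * PER_BOX_CAPACITY)
db_misses = round(PEAK_REDIRECTS * (1 - CACHE_HIT_RATE))  # cache misses reach the db

print(n_boxes, rho, db_misses)
```

Feeding the steady 10k into the same helper gives ceil(10000 / 6000) = 2 boxes, which would run at ρ ≈ 1.9 in the viral hour.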
Storage: 20B × 210 B ≈ 4.2 TB logical. db.large holds 2 TB, so 3 shards, each RF 3 + quorum for the never-lose-a-mapping contract. 9 db.large boxes is where the money goes.
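The storage math in the same style. The 2 TB per db.large and RF 3 come from the post; decimal TB and a majority quorum (2 of 3 replicas) are assumptions about what "quorum" means here.

```python
import math

TOTAL_LINKS = 20_000_000_000
BYTES_PER_ENTRY = 210
DB_CAPACITY_BYTES = 2 * 10**12   # db.large: 2 TB, decimal TB assumed
REPLICATION_FACTOR = 3

logical_bytes = TOTAL_LINKS * BYTES_PER_ENTRY           # 4.2 TB
shards = math.ceil(logical_bytes / DB_CAPACITY_BYTES)   # 4.2 / 2 rounds up to 3
per_shard_bytes = logical_bytes // shards               # 1.4 TB per shard
db_boxes = shards * REPLICATION_FACTOR                  # 3 shards x RF 3
quorum = REPLICATION_FACTOR // 2 + 1                    # majority of 3 replicas

print(shards, per_shard_bytes / 1e12, db_boxes, quorum)
```

Each shard sits at 70% of a 2 TB box, and 3 shards × 2 TB covers 20B links until the average entry passes 300 B.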
Bill: $8.40/hr, vs $5.90 for the edge version. The $2.50/hr difference is what I pay for every redirect transiting my own compute. Whether that's dumb depends on whether you believe your CDN's cache hit accounting. I mostly do, so the edge design is probably better. Posting this anyway because the math should be on record.
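Scaling the hourly gap to a month makes it easier to weigh. A minimal sketch, assuming a 730-hour average month (8760 h / 12) and that both fleets bill the same hourly rate all month; `monthly_usd` is a hypothetical helper.

```python
HOURS_PER_MONTH = 730  # assumption: average month, 8760 h / 12


def monthly_usd(hourly_usd: float) -> float:
    """Convert an hourly bill to an approximate monthly one, rounded to cents."""
    return round(hourly_usd * HOURS_PER_MONTH, 2)


no_cdn = monthly_usd(8.40)
edge = monthly_usd(5.90)
premium = round(no_cdn - edge, 2)  # what skipping the CDN costs per month

print(f"no CDN ${no_cdn:,.0f}/mo, edge ${edge:,.0f}/mo, premium ${premium:,.0f}/mo")
```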
the math on record. +1
@marcus_lee the arithmetic is the deliverable. the conclusion is free
you wrote three paragraphs of correct arithmetic to conclude you should have used a cdn. respect
sizing to the viral hour instead of the steady 10k is the decision that matters here. rho 0.75 at peak is honest capacity planning, most people quote average and melt