LeetDesign
Put a queue in front, because clients retry

Idempotency Key@idempotency_key

Same shard math as ingrid's (3x xlarge, RF 2 async, the contract literally permits it) with one addition: a durable queue between the apps and the db.

Here's the retry story nobody models. A tracking pixel fires, the response is slow, the SDK retries. What happens when the client sends it twice? On this problem, honestly: one phantom view, nobody cares, and the contract even says counts merge last-writer-wins. So why the queue? Because the DELIVERY side is where the pain lives. When a shard hiccups for 30 seconds, the queue holds the 180k hits that arrive in that window (about 6k/s) and drains them once the shard is back. The alternative is 30 seconds of 502s teaching every SDK in the world to retry-storm me at exactly the same moment.
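To put rough numbers on that stall: 180k hits over 30 seconds implies about 6,000 hits/s reaching the queue. The sketch below works that out and estimates how long the backlog takes to drain once the shard is back. The 1.5x apply-rate headroom and the names `drain_seconds` and `assumed_apply_rate` are assumptions for illustration, not figures from this design.

```python
# Back-of-the-envelope buffer math for a shard stall.
# Only the 30 s stall and the 180k backlog come from the post; the rest is assumed.

STALL_SECONDS = 30
BACKLOG_HITS = 180_000

# Ingest rate implied by the post's numbers.
ingest_rate = BACKLOG_HITS / STALL_SECONDS  # hits per second


def drain_seconds(backlog: int, ingest_per_s: float, apply_per_s: float) -> float:
    """Time to empty the backlog after recovery while new hits keep arriving."""
    headroom = apply_per_s - ingest_per_s
    if headroom <= 0:
        raise ValueError("no headroom: the backlog never drains")
    return backlog / headroom


# Hypothetical: the recovered shard can apply 1.5x the steady ingest rate.
assumed_apply_rate = 1.5 * ingest_rate

print(f"implied ingest rate: {ingest_rate:.0f} hits/s")
print(
    "drain time at 1.5x headroom: "
    f"{drain_seconds(BACKLOG_HITS, ingest_rate, assumed_apply_rate):.0f} s"
)
```

The point of the exercise: the backlog only drains if the apply side has spare capacity above steady-state ingest, so the queue buys time but not throughput.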

Costs two extra milliseconds on the ack path and $0.20/hr. The db never meets a burst it didn't agree to. Cheap insurance on a system whose whole job is absorbing a firehose.
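A minimal sketch of what decoupling the ack from the apply looks like, assuming an in-process `queue.Queue` standing in for the durable queue (it is not durable) and a plain list standing in for the db. The names `ack`, `apply_worker`, and `shard_healthy` and the 202 status are hypothetical choices for the example, not part of this design's contract.

```python
import queue
import threading

hits = queue.Queue()               # stand-in for the durable queue (in-memory only)
applied = []                       # stand-in for the db
shard_healthy = threading.Event()  # stays cleared while the shard is stalled


def ack(hit_id):
    """Ack path: enqueue and return 202 without touching the db."""
    hits.put(hit_id)
    return 202


def apply_worker():
    """Apply path: drain the queue, writing only while the shard is healthy."""
    while True:
        hit_id = hits.get()
        if hit_id is None:       # sentinel: stop the worker
            break
        shard_healthy.wait()     # blocks during a stall; hits pile up in the queue
        applied.append(hit_id)


worker = threading.Thread(target=apply_worker)
worker.start()

# The shard is stalled (event not set), yet every client still gets a 202.
statuses = [ack(f"hit-{i}") for i in range(1000)]
applied_during_stall = len(applied)

shard_healthy.set()  # the shard recovers and the backlog drains in order
hits.put(None)
worker.join()

print(applied_during_stall, len(applied))
```

Clients never see the stall, so they have no reason to retry in lockstep; the db sees the same hits later, at whatever pace the worker applies them.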

3 Comments

  • Yuki Tanaka@yuki_tanaka

    note the queue is on both paths here, so dashboard reads also pay the hop. at 120/s it is irrelevant, but worth knowing why it appears in the read trace.

  • Jitter Bug@jitter_bug

    would still add jitter to the sdk backoff but yes, decoupling ack from apply is the move. the 2ms is nothing against a 120ms budget

  • Backpressure Bo@backpressure_bo

    the retry-storm point is the real one. the queue isn't protecting the db from traffic, it's protecting it from correlated retries. those are different animals