A single day of LLM usage billed more than a full month of servers after a retry storm hit production traffic. The incident surfaced on Hacker News with 14 points and 15 comments.
What Triggered the Retry Storm
An upstream provider returned intermittent 5xx errors. Client code retried every failed request with no backoff cap or circuit breaker. Each original prompt generated 8–12 additional calls within the same minute.
The loop continued until the provider stabilized, creating a 9x traffic spike that lasted 14 hours.
Cost Impact Numbers
The account recorded $4,180 in a single day versus a prior monthly average of $3,200. Token volume rose from 42 million to 381 million. The provider applied standard per-token pricing with no volume discount during the spike.
HN commenters noted similar ratios on other accounts: one reported a 7.4x daily multiplier after a 90-minute outage window.
How Retry Storms Form in LLM APIs
LLM clients typically wrap calls in simple retry loops. When error rates exceed 2%, exponential backoff without jitter or a maximum retry count turns small failures into sustained load.
The source thread identified three common triggers: missing max_retries parameters, shared connection pools across workers, and lack of per-key rate limits on the client side.
Mitigation Techniques
Teams can cap retries at 2–3 attempts and add jittered backoff capped at 30 seconds. Circuit breakers that open after 5 consecutive failures stop further calls for 60 seconds.
Logging request IDs alongside retry counts lets teams replay only the original failed prompts instead of the full storm.
Comparison to Standard Billing Issues
| Issue Type | Typical Multiplier | Duration | Detectable in Logs |
|---|---|---|---|
| Retry storm | 7–12x | Hours | High retry counts |
| Prompt bloat | 1.5–3x | Weeks | Token growth |
| Model switch | 2–4x | Permanent | Model name change |
| Rate limit abuse | 3–5x | Minutes | 429 responses |
Retry storms stand out because they produce sudden, reversible spikes rather than gradual drift.
Who This Affects Most
Production services calling paid LLM endpoints without observability layers see the largest impact. Teams running 10+ parallel workers or using default SDK retry settings are most exposed. Projects with fixed monthly budgets or server-only cost models should audit client retry logic first.
Startups still on pay-as-you-go plans can absorb one incident; larger deployments need automated spend alerts tied to retry metrics.
Bottom Line / Verdict
Retry storms convert routine provider hiccups into outsized token bills. Adding explicit retry caps and circuit breakers reduces the risk to near zero while preserving reliability.
The pattern repeats across providers whenever client code treats every error as immediately retryable.

Top comments (0)