Short answer (June 2026): Gemini 3 Pro is the cheapest frontier model at $2 input / $12 output per million tokens, undercutting both Claude and OpenAI. Claude Opus 4.8 ($5 / $25) is cheaper on output than GPT-5.5 ($5 / $30). If you need rock-bottom cost, Gemini 3 Flash ($0.50 / $3) is the budget champion. Caching changes the math dramatically — read on.
- Cheapest frontier model: Gemini 3 Pro
- Cheapest overall: Gemini 3 Flash
- Best value premium output: Claude Opus 4.8
At a glance (price per 1M tokens)
| Model | Input | Output | Cached input | Notes |
|---|---|---|---|---|
| Gemini 3 Pro | $2.00 | $12.00 | — | Cheapest frontier tier |
| Gemini 3 Flash | $0.50 | $3.00 | — | Budget champion |
| Claude Opus 4.8 | $5.00 | $25.00 | ~$0.50 | Cache reads ~90% off |
| GPT-5.5 | $5.00 | $30.00 | $0.50 | Cached input tier |
Prices as of June 2026. Always confirm on the provider's pricing page before budgeting.
How we compared
API cost is driven by three things: input price, output price, and how much you can reclaim with prompt caching. Output is often the bigger line item because each output token costs 5–6× as much as an input token for the models here, although agent and RAG workloads that resend large contexts can be dominated by input. We list all three so you can model your real traffic.
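As a rough starting point, the input and output parts of the bill can be sketched as below. The prices are copied from the table in this post; the model keys are illustrative labels rather than real API model identifiers, and `monthly_cost` is a hypothetical helper that ignores caching, batch discounts, and any tiered pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input: float   # USD per 1M uncached input tokens
    output: float  # USD per 1M output tokens


# List prices from the table in this post (June 2026).
# Keys are illustrative labels, not official model IDs.
PRICES = {
    "gemini-3-pro": Price(2.00, 12.00),
    "gemini-3-flash": Price(0.50, 3.00),
    "claude-opus-4.8": Price(5.00, 25.00),
    "gpt-5.5": Price(5.00, 30.00),
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the uncached list-price cost in USD for a token volume."""
    price = PRICES[model]
    return (input_tokens * price.input + output_tokens * price.output) / 1_000_000


if __name__ == "__main__":
    # Example traffic: 200M input tokens and 50M output tokens per month.
    for name in PRICES:
        print(f"{name}: ${monthly_cost(name, 200_000_000, 50_000_000):,.2f}")
```

For that example traffic the sketch gives $250 for Gemini 3 Flash, $1,000 for Gemini 3 Pro, $2,250 for Claude Opus 4.8, and $2,500 for GPT-5.5, before any caching discount.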
Gemini 3 Pro & Flash
Google prices aggressively. Gemini 3 Pro at $2 input / $12 output is the cheapest frontier-class model in 2026, and Gemini 3 Flash drops to $0.50 / $3 for high-volume, latency-sensitive, or background tasks.
For cost-sensitive production workloads, Gemini is the value leader. The trade-off is ecosystem and tooling maturity outside Google's stack, where Claude and OpenAI are more entrenched.
Claude Opus 4.8
Claude Opus 4.8 runs $5 input / $25 output per million tokens, with cache reads discounted roughly 90% (to about $0.50 per million input tokens). For coding and long-form work where Claude leads on quality, its output price undercuts GPT-5.5.
Claude Opus 4.8 is the sweet spot when you want top-tier output quality without GPT-5.5's output premium. It's still markedly pricier than Gemini for raw throughput.
GPT-5.5
GPT-5.5 matches Claude on input ($5) but is the priciest on output at $30 per million tokens. Its cached input tier ($0.50 per million) softens the cost for repeated context, which matters for agents and RAG that resend large system prompts.
GPT-5.5 makes sense when you need its ecosystem and all-round strength, and you can lean on cached input to offset its higher output price. On pure list price, it's the most expensive of the three.
How caching changes everything
If your workload resends the same large context (system prompts, documents, codebases), prompt caching can cut the cost of that repeated input by ~90%. Claude's cache reads drop to ~$0.50/M and GPT-5.5's cached input is $0.50/M. For agentic and RAG apps, cached-input pricing often matters more than the headline input rate, so model your cache-hit ratio before picking a provider.
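To see how the cache-hit ratio moves the effective input rate, here is a minimal sketch. Both functions are hypothetical helpers, not part of any provider SDK. The sketch assumes every input token is billed at either the cached or the base rate, and it ignores any cache-write surcharge or cache expiry a provider may apply.

```python
def effective_input_price(base: float, cached: float, hit_ratio: float) -> float:
    """Blended USD price per 1M input tokens at a given cache-hit ratio."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cached + (1.0 - hit_ratio) * base


def break_even_hit_ratio(base: float, cached: float, rival_price: float) -> float:
    """Hit ratio at which the blended price equals a rival's flat input price.

    Assumes base > cached; a result outside 0..1 means no break-even exists.
    """
    return (base - rival_price) / (base - cached)


if __name__ == "__main__":
    # $5.00 base and $0.50 cached are the Claude Opus 4.8 and GPT-5.5
    # figures quoted in this post.
    for hit in (0.0, 0.5, 0.9):
        print(f"hit ratio {hit:.0%}: ${effective_input_price(5.00, 0.50, hit):.2f} per 1M")

    # Comparison against a flat $2.00 input price (Gemini 3 Pro's list rate).
    print(f"break-even hit ratio: {break_even_hit_ratio(5.00, 0.50, 2.00):.0%}")
```

With these numbers the blended input rate falls from $5.00 with no cache hits to about $2.75 at a 50% hit ratio and about $0.95 at 90%. It matches a flat $2.00 input price at roughly a 67% hit ratio. The table lists no cached rate for Gemini, so comparing against its list price alone is an assumption of this example, not a statement about Google's pricing.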
Which API should you choose on price?
- Lowest cost, high volume → Gemini 3 Flash.
- Cheapest frontier quality → Gemini 3 Pro.
- Top output quality, reasonable price → Claude Opus 4.8.
- Ecosystem + heavy caching → GPT-5.5.
- Mixed workload → route cheap/background tasks to Flash, quality tasks to Claude or GPT-5.5.
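The mixed-workload option above can be sketched as a simple router. This is only an illustration: the `Task` type, the `background` flag, and the model labels are hypothetical, and a real router would likely also weigh latency, context length, and quality requirements.

```python
from dataclasses import dataclass

# (input, output) USD per 1M tokens, copied from the table in this post.
# Keys are illustrative labels, not official model IDs.
PRICES = {
    "gemini-3-flash": (0.50, 3.00),
    "claude-opus-4.8": (5.00, 25.00),
}


@dataclass(frozen=True)
class Task:
    input_tokens: int
    output_tokens: int
    background: bool  # True for cheap/background work, False for quality-critical work


def route(task: Task) -> str:
    """Send background work to the budget model and everything else to the premium one."""
    return "gemini-3-flash" if task.background else "claude-opus-4.8"


def cost(task: Task, model: str) -> float:
    """List-price cost in USD of running one task on one model."""
    input_price, output_price = PRICES[model]
    return (task.input_tokens * input_price + task.output_tokens * output_price) / 1_000_000


def routed_cost(tasks: list[Task]) -> float:
    """Total cost when each task goes to the model chosen by route()."""
    return sum(cost(task, route(task)) for task in tasks)


if __name__ == "__main__":
    tasks = [
        Task(1_000_000, 200_000, background=True),
        Task(1_000_000, 200_000, background=False),
    ]
    everything_premium = sum(cost(task, "claude-opus-4.8") for task in tasks)
    print(f"routed: ${routed_cost(tasks):.2f} vs all premium: ${everything_premium:.2f}")
```

In this toy example, routing the background task to Flash brings the total to about $11.10, compared with $20.00 when both tasks run on the premium model.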
Frequently asked questions
Which LLM API is cheapest in 2026?
Gemini 3 Flash is the cheapest at $0.50 input / $3 output per million tokens. Among frontier models, Gemini 3 Pro ($2 / $12) is the cheapest.
Is Claude cheaper than GPT-5.5?
On output, yes — Claude Opus 4.8 is $25 per million output tokens versus GPT-5.5's $30. Input is the same at $5 per million.
How much can prompt caching save?
Roughly 90% on input for repeated context. Both Claude (~$0.50/M cache reads) and GPT-5.5 ($0.50/M cached input) offer steep discounts on input, which is often the dominant cost in agent and RAG workloads.
Why is output more expensive than input?
Generating tokens is more compute-intensive than reading them, so providers price output higher: 5–6× the input rate for the four models compared here. In workloads that generate a lot of text relative to what they read, output price usually drives your bill.
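Whether output actually drives your bill depends on your token mix, which a short sketch can check. The prices come from the table in this post; the model keys are illustrative labels, `output_share` is a hypothetical helper, and caching is ignored.

```python
# (input, output) USD per 1M tokens, copied from the table in this post.
# Keys are illustrative labels, not official model IDs.
PRICES = {
    "gemini-3-pro": (2.00, 12.00),
    "gemini-3-flash": (0.50, 3.00),
    "claude-opus-4.8": (5.00, 25.00),
    "gpt-5.5": (5.00, 30.00),
}


def output_to_input_ratio(model: str) -> float:
    """How many times more one output token costs than one input token."""
    input_price, output_price = PRICES[model]
    return output_price / input_price


def output_share(model: str, input_tokens: int, output_tokens: int) -> float:
    """Fraction of the uncached bill that comes from output tokens."""
    input_price, output_price = PRICES[model]
    input_cost = input_tokens * input_price
    output_cost = output_tokens * output_price
    return output_cost / (input_cost + output_cost)


if __name__ == "__main__":
    for name in PRICES:
        print(f"{name}: output costs {output_to_input_ratio(name):.0f}x input")

    # A workload that reads 6 tokens for every token it writes.
    print(f"output share on gpt-5.5: {output_share('gpt-5.5', 6_000_000, 1_000_000):.0%}")
```

At a 6× price ratio, output only becomes the larger half of the bill once you generate more than one token for every six you read. A workload that resends long contexts and writes short replies falls below that line, and there the input side (and caching) matters more.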
Conclusion
On price in 2026, Gemini wins outright, Claude beats GPT-5.5 on output, and caching can reshape the whole comparison for context-heavy apps. Model your real input/output split and cache-hit ratio before committing. What's your monthly token bill looking like? Share your numbers below.