AI API Pricing Comparison
Every major model's API cost in one table. Prices per million tokens, updated monthly.
Gemini 2.0 Flash is the cheapest quality option at $0.10/M input. For flagship models, GPT-4o offers the best price-to-performance ratio. Claude Opus 4 is the most expensive but leads in long-document analysis and coding tasks.
All Models — March 2026
| Model | Provider | Input $/M | Output $/M | Context | Best For |
|---|---|---|---|---|---|
| GPT-4o | OpenAI | $2.50 | $10.00 | 128K | Multimodal, general |
| Claude Opus 4 | Anthropic | $15.00 | $75.00 | 200K | Long docs, analysis |
| Claude Sonnet 4 | Anthropic | $3.00 | $15.00 | 200K | Balanced performance |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | 1M | Massive context |
| o3 | OpenAI | $10.00 | $40.00 | 200K | Complex reasoning |
| o3-mini | OpenAI | $1.10 | $4.40 | 200K | Budget reasoning |
| Grok 3 | xAI | $3.00 | $15.00 | 131K | Real-time data |
| Llama 4 Maverick | Meta | $0.20 | $0.60 | 1M | Cost efficiency |
| Mistral Large 2 | Mistral | $2.00 | $6.00 | 128K | EU compliance |
| Qwen 3 235B | Alibaba | $0.80 | $3.20 | 128K | Multilingual |
| Command R+ | Cohere | $2.50 | $10.00 | 128K | RAG / Enterprise |
| Claude Haiku 4.5 | Anthropic | $0.80 | $4.00 | 200K | Fast & cheap |
| GPT-4o-mini | OpenAI | $0.15 | $0.60 | 128K | Budget tasks |
| Gemini 2.0 Flash | Google | $0.10 | $0.40 | 1M | Cheapest quality |
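The per-token arithmetic behind the table is simple to apply in code. A minimal sketch (model names and the `request_cost` helper are illustrative; prices are copied from the table above, per million tokens):

```python
# Cost of a single API request, given per-million-token prices.
# (input $/M, output $/M) — values copied from the table above.
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "claude-opus-4": (15.00, 75.00),
    "gemini-2.0-flash": (0.10, 0.40),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one request for a model in PRICES."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: a 10K-token prompt with a 1K-token reply on GPT-4o:
# 10,000 × $2.50/M + 1,000 × $10.00/M = $0.025 + $0.010 = $0.035
cost = request_cost("gpt-4o", 10_000, 1_000)
```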
Key Takeaways
- Cheapest flagship: Gemini 2.5 Pro at $1.25/$10.00, with GPT-4o close behind at $2.50/$10.00
- Best budget: Gemini 2.0 Flash at $0.10/$0.40 — 25x cheaper than GPT-4o
- Largest context: Gemini and Llama 4 at 1M tokens (2,500 pages)
- Open source winner: Llama 4 Maverick at $0.20/$0.60 via API providers
- Most expensive: Claude Opus 4 at $15/$75 — justified for complex analysis
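Because input and output are priced differently, a single blended rate makes models easier to rank for a given workload. A sketch assuming a hypothetical 3:1 input-to-output token mix (the ratio is an assumption, not from the table; prices are copied from the table above):

```python
# Rank a few models by blended $/M tokens under an assumed
# 3:1 input:output token mix. Prices copied from the table above.
PRICES = {
    "Gemini 2.0 Flash": (0.10, 0.40),
    "GPT-4o-mini": (0.15, 0.60),
    "Llama 4 Maverick": (0.20, 0.60),
    "GPT-4o": (2.50, 10.00),
    "Claude Opus 4": (15.00, 75.00),
}

def blended(in_price: float, out_price: float, ratio: float = 3.0) -> float:
    """Weighted $/M for `ratio` input tokens per output token."""
    return (ratio * in_price + out_price) / (ratio + 1)

ranking = sorted(PRICES, key=lambda m: blended(*PRICES[m]))
# Gemini 2.0 Flash stays cheapest; Claude Opus 4 stays most expensive.
```

Under this mix, GPT-4o blends to $4.375/M versus $0.175/M for Gemini 2.0 Flash, so the headline gap between flagship and budget tiers survives any realistic input/output ratio.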
Frequently Asked Questions
What are tokens?
Tokens are chunks of text — roughly 3/4 of a word. "Hello world" is 2 tokens. Pricing is per million tokens (M), so $1/M input means processing 750,000 words costs about $1.
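The words-to-cost conversion above can be written out directly. A sketch using the rough 3/4-word-per-token rule from the answer (the helper names are illustrative):

```python
def words_to_tokens(words: int) -> int:
    """Estimate tokens from a word count (1 token ≈ 0.75 words)."""
    return round(words / 0.75)

def input_cost(words: int, price_per_million: float) -> float:
    """Estimated input cost in dollars at a given $/M token rate."""
    return words_to_tokens(words) * price_per_million / 1_000_000

# 750,000 words ≈ 1,000,000 tokens → about $1.00 at a $1/M input rate.
cost = input_cost(750_000, 1.00)
```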
Why is output more expensive than input?
Generating text (output) requires more computation than reading it (input). The model has to make decisions for each token it produces, which is more GPU-intensive than encoding input.
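This asymmetry means the input/output split of a workload, not just its total token count, drives the bill. A quick check using GPT-4o's prices from the table (the token counts are hypothetical):

```python
IN_PRICE, OUT_PRICE = 2.50, 10.00  # GPT-4o $/M, from the table above

def cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a request at GPT-4o rates."""
    return (input_tokens * IN_PRICE + output_tokens * OUT_PRICE) / 1_000_000

# Same 101K total tokens, opposite mixes:
summarize = cost(100_000, 1_000)  # input-heavy: $0.25 + $0.01 = $0.26
generate = cost(1_000, 100_000)   # output-heavy: $0.0025 + $1.00 = $1.0025
# The output-heavy job costs roughly 4x more for the same total tokens.
```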
Which model offers the best value?
For most use cases, Gemini 2.0 Flash offers the best quality-per-dollar ratio. For tasks requiring top-tier reasoning, Claude Sonnet 4 offers the best balance of capability and cost.