Stop overpaying. Most teams send every request to a flagship model. Here's exactly which model to use for every task — and how to cut your API bill by 60–80%.
| Use Case | Model | When to Use |
|---|---|---|
| Prototyping / Vibe Coding | Gemini Flash$0.30/1M in | Low stakes, high iteration. Don't burn flagship budget while you're figuring out what to build. |
| Production Coding & Agents | Claude Sonnet 4.6$3/$15 per 1M | Best tool use and agent reliability in production. 79.6% SWE-bench Verified — within 1.2pts of Opus at 40% less cost. |
| Hard Reasoning & Research | Claude Opus 4.7$5/$25 per 1M | Complex analysis, deep research, long documents. Reserve for tasks where quality directly impacts outcome. |
| High Volume / Routing Layer | Haiku 4.5 or Gemini Flash$1/$5 per 1M (Haiku) | Classification, routing, simple extraction. 90% of your requests belong here. ~5x cheaper than Opus. |
| Large Codebase / Long Context | Gemini 3.1 Pro$2/$12 per 1M | 2M token context window. Load an entire codebase in one prompt. Nothing else at this price comes close. |
| Broad Generalist Tasks | GPT-5.5$5/$30 per 1M | Multi-step knowledge work, writing, research, data analysis. Most versatile model when you need one that does everything. |
| Image Generation | FluxVaries | Diffusion model — completely different architecture from LLMs. Best-in-class for image generation. |
| Video Generation | Veo 3.1 or Kling 3.0~$0.15/sec (Veo fast) | Sora shut down April 2026. Veo 3.1 leads on quality + native audio. Kling 3.0 is cheapest premium option at ~$0.10/sec. |
| Fine-tuning / Privacy / Self-host | Gemma 4Free (open source) | Full control, air-gapped environments, custom fine-tuning. No data leaves your infra. |
Most teams make one mistake: they send every request to a flagship model. That kills your budget fast. Here's what engineers actually do in production:
Stack prompt caching (up to 90% savings) and batch processing (50% off) on top of routing. On a high-volume workload this can cut your bill by up to 95%.
Most engineers default to proprietary APIs. But there are specific situations where open source wins every time.
Google's open model. Strong coding performance. Runs locally. Free. Best for custom fine-tuning and self-hosted environments.
Meta's flagship open model. Large community, lots of tooling. Great for self-hosted agents and general production use.
~40% of GPT-5.4 cost, comparable performance on most tasks. Best for cost-sensitive production workloads.
$0.28 per 1M input tokens. Tier A coding performance. Best value for coding at scale when cost is the primary constraint.
Open source requires you to manage infra, handle scaling, and maintain the model yourself. Real engineering overhead. Only go this route if the use case genuinely requires it.
"The model is not the product.A cheap model in a great system beats a frontier model in a bad one.
The system around it is."
Bookmark these to keep your model choices up to date:
I've mapped it all out. Follow @aiwithunnati on Instagram for weekly AI content, or comment "AI" to get the full model decision framework.
Follow @aiwithunnati → Back to Top