Free Guide by Unnati Tripathi · @aiwithunnati · June 2026

The AI Model Decision Cheat Sheet

Stop overpaying. Most teams send every request to a flagship model. Here's exactly which model to use for every task — and how to cut your API bill by 60–80%.

Based on real production data · Prices current as of June 2026
The Full Breakdown

Model Decision Table

Use Case | Model | Price | When to Use
Prototyping / Vibe Coding | Gemini Flash | $0.30 per 1M input tokens | Low stakes, high iteration. Don't burn flagship budget while you're figuring out what to build.
Production Coding & Agents | Claude Sonnet 4.6 | $3 in / $15 out per 1M tokens | Best tool use and agent reliability in production. Scores 79.6% on SWE-bench Verified, within 1.2 points of Opus at 40% lower cost.
Hard Reasoning & Research | Claude Opus 4.7 | $5 in / $25 out per 1M tokens | Complex analysis, deep research, long documents. Reserve for tasks where quality directly impacts outcome.
High Volume / Routing Layer | Haiku 4.5 or Gemini Flash | $1 in / $5 out per 1M tokens (Haiku) | Classification, routing, simple extraction. 90% of your requests belong here. Haiku is about 5x cheaper than Opus.
Large Codebase / Long Context | Gemini 3.1 Pro | $2 in / $12 out per 1M tokens | 2M token context window. Load an entire codebase in one prompt. Nothing else at this price comes close.
Broad Generalist Tasks | GPT-5.5 | $5 in / $30 out per 1M tokens | Multi-step knowledge work, writing, research, data analysis. Most versatile model when you need one that does everything.
Image Generation | Flux | Varies | Diffusion model, a completely different architecture from LLMs. Best-in-class for image generation.
Video Generation | Veo 3.1 or Kling 3.0 | ~$0.15/sec (Veo fast) | Sora shut down April 2026. Veo 3.1 leads on quality and native audio. Kling 3.0 is the cheapest premium option at ~$0.10/sec.
Fine-tuning / Privacy / Self-host | Gemma 4 | Free (open source) | Full control, air-gapped environments, custom fine-tuning. No data leaves your infra.
The Big Mistake

The Routing Rule

Most teams make one mistake: they send every request to a flagship model. That kills your budget fast. Here's what engineers actually do in production:

Result: 60–80% cost reduction. Users notice nothing.
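
The routing rule can be sketched as a small dispatch table. This is a minimal illustration, not a production setup: the task labels, the `route` function, and the fall-back-to-cheap default are hypothetical, while the model names and input prices come from this guide. It also sanity-checks the savings figure under one simplifying assumption, that every request uses the same number of tokens.

```python
# Input prices in USD per 1M tokens, as listed in this guide (June 2026).
INPUT_PRICE = {"haiku-4.5": 1.0, "sonnet-4.6": 3.0, "opus-4.7": 5.0}

# Hypothetical task labels mapped to the tier the guide recommends.
ROUTES = {
    "classification": "haiku-4.5",
    "routing": "haiku-4.5",
    "extraction": "haiku-4.5",
    "coding": "sonnet-4.6",
    "agent": "sonnet-4.6",
    "deep_research": "opus-4.7",
}


def route(task_type: str) -> str:
    """Return the model for a task; unknown tasks fall back to the cheap tier (an assumption)."""
    return ROUTES.get(task_type, "haiku-4.5")


def blended_savings(traffic_share: dict[str, float], baseline: str = "opus-4.7") -> float:
    """Fraction saved versus sending everything to `baseline`, assuming equal tokens per request."""
    blended = sum(share * INPUT_PRICE[model] for model, share in traffic_share.items())
    return 1 - blended / INPUT_PRICE[baseline]


# 90% of traffic on the cheap tier, 10% still on the flagship.
savings = blended_savings({"haiku-4.5": 0.9, "opus-4.7": 0.1})
print(f"{savings:.0%}")
```

With 90% of traffic on Haiku and 10% left on Opus, the blended price works out to about 72% below an all-Opus baseline, which sits inside the 60–80% range. Output prices in this guide follow the same 5x ratio between the two tiers, so the result holds for output tokens as well.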
Claude Pricing Tiers

Cost Breakdown

Fast / Cheap: Haiku 4.5, $1 in / $5 out per 1M tokens
Production: Sonnet 4.6, $3 in / $15 out per 1M tokens
Frontier: Opus 4.7, $5 in / $25 out per 1M tokens
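
To see what these tiers mean in dollars, the per-1M prices can be turned into a workload estimate. The sketch below uses a made-up workload (1 million requests, 1,000 input and 300 output tokens each); only the prices come from this guide, and `workload_cost` is an illustrative helper, not part of any SDK.

```python
# (input, output) prices in USD per 1M tokens, from the Claude tiers in this guide.
PRICES = {
    "Haiku 4.5": (1.0, 5.0),
    "Sonnet 4.6": (3.0, 15.0),
    "Opus 4.7": (5.0, 25.0),
}


def workload_cost(model: str, requests: int, in_tokens: int, out_tokens: int) -> float:
    """Total USD cost for `requests` calls with fixed token counts per call."""
    price_in, price_out = PRICES[model]
    total_in = requests * in_tokens / 1_000_000    # input volume in millions of tokens
    total_out = requests * out_tokens / 1_000_000  # output volume in millions of tokens
    return total_in * price_in + total_out * price_out


# Hypothetical workload: 1M requests, 1,000 tokens in and 300 tokens out per request.
for model in PRICES:
    print(f"{model}: ${workload_cost(model, 1_000_000, 1_000, 300):,.0f}")
```

For this assumed workload the totals come to $2,500 on Haiku, $7,500 on Sonnet, and $12,500 on Opus, so the gap between tiers grows with volume even though the per-token prices look small.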
Pro Tip

Stack prompt caching (up to 90% savings) and batch processing (50% off) on top of routing. On a high-volume workload this can cut your bill by up to 95%.
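
The 95% figure follows if the two discounts multiply, which is an assumption here rather than something this guide spells out: a 90% caching discount leaves 10% of the price, and a 50% batch discount halves that again. The sketch below works through the arithmetic for input tokens only (output tokens are not cached). `cached_fraction` is a hypothetical parameter for how much of the input is actually served from cache.

```python
def effective_input_multiplier(cached_fraction: float,
                               cache_discount: float = 0.90,
                               batch_discount: float = 0.50) -> float:
    """Fraction of list price paid for input tokens when caching and batching stack.

    Assumes the discounts multiply and that the cache discount applies only to
    the cached share of the input. Real provider billing may differ (for example,
    cache writes can carry a surcharge), so treat the result as a best case.
    """
    uncached = 1 - cached_fraction                   # billed at full price before batching
    cached = cached_fraction * (1 - cache_discount)  # billed at the discounted cache rate
    return (uncached + cached) * (1 - batch_discount)


best_case = effective_input_multiplier(cached_fraction=1.0)    # every input token cached
half_cached = effective_input_multiplier(cached_fraction=0.5)  # half the input cached
print(f"all cached: {1 - best_case:.0%} off, half cached: {1 - half_cached:.1%} off")
```

The 95% ceiling is only reached when essentially all input tokens are cache hits and the whole job can run in batch mode. Workloads with short or highly variable prompts will land well below it.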

When Proprietary Isn't the Answer

When to Use Open Source

Most engineers default to proprietary APIs. But there are specific situations where open source wins every time.

The Open Source Stack 2026

Your Options

Fine-tuning: Gemma 4

Google's open model. Strong coding performance. Runs locally and is free to use. Best for custom fine-tuning and self-hosted environments.

General Purpose: Llama 4

Meta's flagship open model. Large community, lots of tooling. Great for self-hosted agents and general production use.

Cost-sensitive: Mistral Small 4

Costs about 40% of GPT-5.4, with comparable performance on most tasks. Best for cost-sensitive production workloads.

Budget Coding: DeepSeek V4

$0.28 per 1M input tokens. Tier A coding performance. Best value for coding at scale when cost is the primary constraint.

The Tradeoff

Open source requires you to manage infra, handle scaling, and maintain the model yourself. Real engineering overhead. Only go this route if the use case genuinely requires it.

"The model is not the product.
The system around it is."
A cheap model in a great system beats a frontier model in a bad one.
Optimize your architecture before you optimize your model choice.
Models change every few weeks in 2026

Stay Current

Bookmark these to keep your model choices up to date:

Want More?

Want the Full AI Guide?

I've mapped it all out. Follow @aiwithunnati on Instagram for weekly AI content, or comment "AI" to get the full model decision framework.
