Free Guide by Unnati Tripathi · @aiwithunnati · June 2026

The AI Model Decision Cheat Sheet

Stop overpaying. Most teams send every request to a flagship model. Here's exactly which model to use for every task — and how to cut your API bill by 60–80%.

Based on real production data · Prices current as of June 2026

The Full Breakdown

Model Decision Table

Use Case	Model	Price	When to Use
Prototyping / Vibe Coding	Gemini Flash	$0.30 per 1M input	Low stakes, high iteration. Don't burn flagship budget while you're figuring out what to build.
Production Coding & Agents	Claude Sonnet 4.6	$3/$15 per 1M	Best tool use and agent reliability in production. 79.6% SWE-bench Verified, within 1.2 points of Opus at 40% lower cost.
Hard Reasoning & Research	Claude Opus 4.7	$5/$25 per 1M	Complex analysis, deep research, long documents. Reserve for tasks where quality directly impacts outcome.
High Volume / Routing Layer	Haiku 4.5 or Gemini Flash	$1/$5 per 1M (Haiku)	Classification, routing, simple extraction. 90% of your requests belong here. ~5x cheaper than Opus.
Large Codebase / Long Context	Gemini 3.1 Pro	$2/$12 per 1M	2M token context window. Load an entire codebase in one prompt. Nothing else at this price comes close.
Broad Generalist Tasks	GPT-5.5	$5/$30 per 1M	Multi-step knowledge work, writing, research, data analysis. Most versatile model when you need one that does everything.
Image Generation	Flux	Varies	Diffusion model, a completely different architecture from LLMs. Best-in-class for image generation.
Video Generation	Veo 3.1 or Kling 3.0	~$0.15/sec (Veo fast)	Sora shut down April 2026. Veo 3.1 leads on quality + native audio. Kling 3.0 is the cheapest premium option at ~$0.10/sec.
Fine-tuning / Privacy / Self-host	Gemma 4	Free (open source)	Full control, air-gapped environments, custom fine-tuning. No data leaves your infra.

The Big Mistake

The Routing Rule

Most teams make one mistake: they send every request to a flagship model. That kills your budget fast. Here's what engineers actually do in production:

1. Put Haiku 4.5 or Gemini Flash in front as a cheap classifier
2. Route 90% of requests there: simple, fast, cheap
3. Escalate the hard 10% to Sonnet or Opus
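
The three steps above can be sketched as a small router. This is a minimal illustration, not a production implementation: the tier labels, the `classify_difficulty` keyword heuristic, and the `Route` type are all hypothetical stand-ins. In a real system the classifier would itself be a call to a cheap model such as Haiku 4.5 or Gemini Flash.

```python
from dataclasses import dataclass

# Hypothetical tier labels; map them to real model IDs in your own client code.
CHEAP_TIER = "haiku-4.5"
MID_TIER = "sonnet-4.6"
TOP_TIER = "opus-4.7"

# Placeholder keyword heuristic standing in for a cheap classifier model.
HARD_HINTS = ("prove", "refactor", "architecture", "multi-step", "research")


@dataclass
class Route:
    model: str
    reason: str


def classify_difficulty(prompt: str) -> str:
    """Return 'simple', 'hard', or 'very_hard' (illustrative heuristic only)."""
    text = prompt.lower()
    hits = sum(hint in text for hint in HARD_HINTS)
    if hits >= 2 or len(text) > 8000:
        return "very_hard"
    if hits == 1:
        return "hard"
    return "simple"


def route(prompt: str) -> Route:
    """Keep simple requests on the cheap tier and escalate the rest."""
    level = classify_difficulty(prompt)
    if level == "simple":
        return Route(CHEAP_TIER, "simple request")
    if level == "hard":
        return Route(MID_TIER, "escalated: hard request")
    return Route(TOP_TIER, "escalated: very hard request")
```

Calling `route("Classify this ticket as billing or support")` stays on the cheap tier, while a prompt that trips one or more hints is escalated. The thresholds here are arbitrary; tune them against your own traffic so that roughly 90% of requests stay on the cheap tier.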

Result: 60–80% cost reduction. Users notice nothing.
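
The 60–80% range can be sanity-checked with the list prices in this guide. The sketch below assumes every request uses the same number of tokens regardless of tier and ignores the cost of the classifier call itself; `cheap_share` and both function names are illustrative, not part of any provider's API.

```python
def blended_price(cheap_price: float, premium_price: float,
                  cheap_share: float = 0.90) -> float:
    """Average price per 1M tokens when cheap_share of traffic uses the cheap tier."""
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    return cheap_share * cheap_price + (1.0 - cheap_share) * premium_price


def savings_vs_premium(cheap_price: float, premium_price: float,
                       cheap_share: float = 0.90) -> float:
    """Fraction saved compared with sending everything to the premium tier."""
    return 1.0 - blended_price(cheap_price, premium_price, cheap_share) / premium_price


# Input prices per 1M tokens from this guide: Haiku $1, Sonnet $3, Opus $5.
vs_sonnet = savings_vs_premium(1.0, 3.0)  # escalating to Sonnet
vs_opus = savings_vs_premium(1.0, 5.0)    # escalating to Opus
```

Under these assumptions the saving is roughly 60% against an all-Sonnet baseline and roughly 72% against an all-Opus baseline. Because output prices in all three Claude tiers are 5x the input price, the same fractions hold for output tokens.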

Claude Pricing Tiers

Cost Breakdown

Tier	Model	Input (per 1M tokens)	Output (per 1M tokens)
Fast / Cheap	Haiku 4.5	$1	$5
Production	Sonnet 4.6	$3	$15
Frontier	Opus 4.7	$5	$25
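
To turn the per-1M prices above into a per-request figure, multiply each token count by its price and divide by one million. The sketch below uses the Claude prices listed in this guide; the dictionary keys are shorthand labels, not official model IDs, and the token counts in the example are made up.

```python
# USD per 1M tokens as (input, output), copied from the tiers above.
PRICES = {
    "haiku-4.5": (1.00, 5.00),
    "sonnet-4.6": (3.00, 15.00),
    "opus-4.7": (5.00, 25.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the list-price cost in USD of a single request."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical request: 2,000 input tokens and 500 output tokens.
costs = {model: request_cost(model, 2_000, 500) for model in PRICES}
```

For that example request the cost comes to about $0.0045 on Haiku, $0.0135 on Sonnet, and $0.0225 on Opus, which is the 5x spread between the cheapest and most expensive tier.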

Pro Tip

Stack prompt caching (up to 90% savings) and batch processing (50% off) on top of routing. On a high-volume workload this can cut your bill by up to 95%.
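
The "up to 95%" figure follows if the two discounts multiply and every input token is a cache hit: 10% of list price after caching, then half of that after batching, leaves 5%. The sketch below makes that assumption explicit. Whether and how the discounts combine depends on the provider, so treat the multiplication as an assumption to verify; `cached_share` and the function name are illustrative, and only input-token cost is modeled.

```python
def input_cost_fraction(cached_share: float,
                        cache_discount: float = 0.90,
                        batch_discount: float = 0.50) -> float:
    """Fraction of list input price still paid after caching and batching.

    Assumes the cache discount applies only to the cached share of input
    tokens and that the batch discount then applies to everything.
    """
    if not 0.0 <= cached_share <= 1.0:
        raise ValueError("cached_share must be between 0 and 1")
    after_cache = cached_share * (1.0 - cache_discount) + (1.0 - cached_share)
    return after_cache * (1.0 - batch_discount)


best_case = input_cost_fraction(cached_share=1.0)  # every input token cached
half_cached = input_cost_fraction(cached_share=0.5)
no_cache = input_cost_fraction(cached_share=0.0)   # batch discount only
```

With every token cached you pay about 5% of list price; with half cached, about 27.5%; with no cache hits, 50%. The headline saving is a ceiling that only prompt-heavy, highly repetitive workloads will approach.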

When Proprietary Isn't the Answer

When to Use Open Source

Most engineers default to proprietary APIs. But there are specific situations where open source wins every time.

Privacy is non-negotiable: your data cannot leave your infrastructure. Healthcare, legal, finance, government.
You need to fine-tune: proprietary APIs offer limited fine-tuning, if any, and you never get the weights. Open models let you train on your own data with full control.
Massive scale: self-hosting replaces per-token fees with a fixed infrastructure bill. At high enough volume, infra cost beats API pricing.
Air-gapped environments: no internet access. Open source is your only option.
Full control: model behavior, versioning, deployment, everything.
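
The "massive scale" point above comes down to a break-even calculation: self-hosting wins once your monthly token volume, priced at API rates, exceeds what the infrastructure costs you. The sketch below is a rough estimate under stated assumptions. The $6,000 monthly infra figure and the $3 blended API price are hypothetical inputs, and engineering time, which is often the larger cost, is not included.

```python
def breakeven_tokens_per_month(monthly_infra_usd: float,
                               api_price_per_million_usd: float) -> float:
    """Monthly token volume above which self-hosting is cheaper than the API."""
    if api_price_per_million_usd <= 0:
        raise ValueError("API price must be positive")
    return monthly_infra_usd / api_price_per_million_usd * 1_000_000


# Hypothetical inputs: $6,000/month for GPUs and ops, $3 per 1M tokens on the API.
threshold = breakeven_tokens_per_month(6_000, 3.0)
```

With those made-up numbers the threshold is 2 billion tokens per month. Below that volume the API is cheaper even before you count the engineering overhead of running the model yourself.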

The Open Source Stack 2026

Your Options

Fine-tuning

Gemma 4

Google's open model. Strong coding performance. Runs locally. Free. Best for custom fine-tuning and self-hosted environments.

General Purpose

Llama 4

Meta's flagship open model. Large community, lots of tooling. Great for self-hosted agents and general production use.

Cost-sensitive

Mistral Small 4

~40% of GPT-5.4 cost, comparable performance on most tasks. Best for cost-sensitive production workloads.

Budget Coding

DeepSeek V4

$0.28 per 1M input tokens. Tier A coding performance. Best value for coding at scale when cost is the primary constraint.

The Tradeoff

Open source requires you to manage infra, handle scaling, and maintain the model yourself. Real engineering overhead. Only go this route if the use case genuinely requires it.
