Same workflow. Same Claude. Just the right settings, cache habits, and a few small swaps that most people never make.
Root Cause 1
If your prompt cache breaks mid-session, you pay full price every turn instead of 0.1×.
⚠ What breaks the cache
✓ What to do
Root Cause 2
The longer your session runs, the more Claude has to remember — and that costs tokens. Opus 4.6 defaults to a 1M context window which is overkill.
Settings to add
CLAUDE_CODE_DISABLE_1M_CONTEXT = 1
CLAUDE_AUTOCOMPACT_PCT_OVERRIDE = 80
5 habits to keep context clean
Subagent model guide
Haiku
boring mechanical work
Sonnet
research, code exploration
Opus
planning, complex decisions only
Root Cause 3
Most people use Opus-level reasoning on tasks that only need Haiku. Default reasoning uses ~2× more tokens than medium.
Effort levels to set per prompt
Model routing strategy
Root Cause 4
Some file types cost way more tokens than necessary by default.
Don't use
Screenshots for web pages
Use instead
agent-browser
⚡ ~90% fewer tokens
Don't use
Claude's built-in PDF reader
Use instead
pdftotext
⚡ Avoids loading PDFs as images
Don't use
Re-reading whole repo each task
Use instead
code-review-graph
⚡ 6.8× fewer tokens on reviews, up to 49× on daily tasks
Root Cause 5
You can't fix what you can't see. Three tools to track it.
✦ The One-Line Summary
Lock your tools and model before you start. Compact early and often. Use cheaper models for simple tasks. Use the right input format. Track your usage. That's it.