4 min read · 2026

Claude Code Sub-Agents Save 25x Tokens in Your Main Session

Your AI isn't getting dumber. Your main session is overloaded. Sub-agents keep it lean and accurate for over an hour.

I keep hearing the same complaint: “Claude Code gets worse the longer I use it.”

The cause is almost always the same. Everything — file reads, searches, code exploration — is piled into a single main session. As tokens accumulate in the context window, the AI retains information at the beginning and end but starts missing what’s buried in the middle. If the session relies on simple message concatenation rather than compaction, the earliest content can be deleted entirely.

Sub-agents change the equation. By offloading work to independent agent processes, the tokens that land in your main session can drop to one twenty-fifth of what they’d otherwise be. Sessions that used to degrade after 30 minutes now hold up for over an hour at the same quality level.

After sharing this pattern with the team, the complaints disappeared.

What Lands in Your Main Session Determines Answer Quality

Read three files directly in the main session, and you dump 15,000+ tokens of raw source code into your context. Delegate the same work to three sub-agents, and each returns a 200-token summary. Total: 600 tokens in main.

The larger the context window grows, the better the AI handles the beginning and end — but the worse it handles information in the middle. Stanford researchers call this “Lost in the Middle”: retrieval accuracy for information placed in the middle of a long context drops by over 30%.

Keeping the main session lean eliminates this problem structurally. One teammate who used to see quality degrade after 30 minutes now runs sessions for over an hour without issues.

  • Inline exploration: 15,000+ tokens in main; agent summaries: 600 tokens total
  • Sub-agents work in isolated context and return only the essentials
  • A shorter main context means fewer mid-context blind spots
  • 30-minute quality ceiling → 1+ hour sessions at the same quality
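As a rough sketch, the difference shows up directly in how you phrase the request. The file names below are illustrative, not from the article:

```text
Instead of:
  "Read src/auth/session.ts, src/auth/token.ts, and src/auth/middleware.ts"

Try:
  "Use a sub-agent to explore the auth module and report back a brief
   summary of how sessions, tokens, and middleware fit together."
```

The first prompt dumps raw source into your main context; the second lands only the summary.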

Starting with General Agents Wastes the Most Money

There are four built-in agent types.

Explore is read-only and runs on Haiku — fast and cheap. Plan and General inherit the main session's model: if you're on Sonnet, they use Sonnet; if you're on Opus, Opus. Bash is for terminal commands only.

Here’s the trap: many people use General for tasks that only require reading — code exploration, structure analysis, pattern searches. Few realize that Explore produces nearly identical results for those tasks at a fraction of the cost.

  • Explore (Haiku-based) saves 80%+ compared to General
  • Use General only for implementation work; Explore handles the rest
  • Plan is for wide-scope reads like architecture analysis
  • Bash is for test runs and build isolation
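For read-only tasks, you can steer Claude toward the cheaper agent explicitly. The wording below is illustrative:

```text
"Use an Explore agent to map the directory structure under src/ and
 list where the payment logic lives. Read-only — don't modify anything."
```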

One Prompt, Three Parallel Agents, Half the Onboarding Time

Independent tasks can run concurrently. A single prompt like “Investigate the auth system, the database schema, and the API routes separately” spins up three Explore agents at once.

A new team member I taught this pattern to got a working understanding of the codebase in half the time. The only rule: never run parallel agents that modify the same file — they'll conflict.

  • Independent tasks → parallel; dependent tasks → sequential
  • Same-file edits in parallel = guaranteed conflicts
  • Add “in parallel” to your prompt and Claude splits automatically
  • Three concurrent summaries occupy roughly 600 tokens in main

Ctrl+B Lets You Start the Next Feature While Tests Run

Press Ctrl+B and the current agent moves to the background. Run your full test suite while you immediately start building the next feature. Without this, you’re just watching a progress bar.

Background agents can’t ask questions and can’t use MCP tools. They only have file read/write access — but that’s enough for test runs and code reviews.

  • Ctrl+B sends the current agent to background
  • Check results later: “What did the tests return?”
  • Background agents: no MCP tools, file read/write only
  • Run code review in background while you keep implementing

One Custom Agent File Gets Reused Across Five Tools

Create a single file at .claude/agents/reviewer.md. Add YAML frontmatter with name, description, and model fields — Claude Code picks it up automatically and routes matching tasks to it.
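A minimal sketch of such a file — the frontmatter fields follow the description above, while the prompt body and exact values are illustrative:

```markdown
---
name: reviewer
description: Reviews code changes for bugs, style issues, and missing tests
model: haiku
---

You are a code reviewer. For each diff you receive, return a short,
prioritized list of issues. Keep your reply under 200 tokens so the
main session stays lean.
```

The model field is what makes this cheap: routine reviews run on Haiku while the main session stays on whatever model you chose.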

This file format follows the agentskills.io standard, which means agents you build once work across Cursor, Copilot, Codex, and Gemini CLI without modification.

Run npx ai-agent-skills install code-review to download 47 pre-built, vetted agents instantly.

  • Drop a markdown file in .claude/agents/ → auto-detected
  • Set model: haiku for cheap reviews; use opus or sonnet for security audits and error-handling checks
  • Compatible with Claude Code, Cursor, Copilot, and Codex

The Real Problem Isn’t AI Getting Dumber

Your AI isn’t losing capability. Your main session is gaining too much context for it to see everything clearly. Sub-agents aren’t about using more AI — they’re about protecting the space where your AI thinks.
