Multi-Agent Architecture: Splitting Blindly Will Backfire
Not all multi-agent patterns are equal. Learn when subagents, skills, handoffs, and routers actually outperform a single agent - with real scenarios and numbers.
“Will splitting my agent into multiple agents make it smarter?”
The answer is “it depends.” Anthropic reported that its multi-agent research system outperformed a single agent by 90.2% on an internal research evaluation - but that result came from breadth-first research tasks, and in practice the performance gap varies dramatically with the type of task you’re solving.
Here are three representative scenarios that reveal when each pattern actually delivers.
The Four Patterns at a Glance
Before diving in, a quick summary of the architectures:
- Subagents: A main agent invokes specialized agents as tools. Strong at parallel execution, but every result must route back through the main agent
- Skills: A single agent dynamically loads expert prompts on demand. Lightweight, but context accumulates over time
- Handoffs: The active agent is swapped out at each stage. Purpose-built for sequential workflows, but cannot run tasks in parallel
- Router: Classifies queries, dispatches them in parallel, and aggregates results. Stateless - no conversational context is retained
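As a concrete reference point for the Router bullet, here is a minimal Python sketch using `asyncio`. The three domain agents are hypothetical stubs standing in for LLM-backed agents, and the keyword-based `classify` is a deliberate simplification (a real router would more likely make a model call to classify).

```python
import asyncio

# Hypothetical domain agents: stubs standing in for LLM-backed agents.
async def python_agent(query: str) -> str:
    await asyncio.sleep(0)  # placeholder for a model call
    return f"python view on: {query}"

async def javascript_agent(query: str) -> str:
    await asyncio.sleep(0)
    return f"javascript view on: {query}"

async def rust_agent(query: str) -> str:
    await asyncio.sleep(0)
    return f"rust view on: {query}"

AGENTS = {
    "python": python_agent,
    "javascript": javascript_agent,
    "rust": rust_agent,
}

def classify(query: str) -> list[str]:
    # Naive keyword classifier (an assumption for this sketch).
    lowered = query.lower()
    return [name for name in AGENTS if name in lowered]

async def route(query: str) -> dict[str, str]:
    # 1. Classify: decide which domains the query touches.
    domains = classify(query)
    # 2. Dispatch: run the selected agents concurrently.
    answers = await asyncio.gather(*(AGENTS[d](query) for d in domains))
    # 3. Aggregate: combine results. Nothing is kept between calls (stateless).
    return dict(zip(domains, answers))

if __name__ == "__main__":
    print(asyncio.run(route("Compare Python vs JavaScript vs Rust")))
```

Because `route` stores nothing between invocations, every query pays the full classify, dispatch, and aggregate cycle.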
Now let’s see how these play out in real scenarios.
Scenario 1: One-Shot Requests
Imagine a user says “Buy me a coffee” - a single request where a specialized agent can call a buy_coffee tool.
Performance comparison:
- Subagents: 4 calls (main agent delegates → subagent calls buy_coffee → subagent returns the result to main → main agent responds)
- Skills / Handoffs / Router: 3 calls (direct execution)
Key insight: One-shot tasks don’t need state management, so Skills, Handoffs, and Router are the most efficient. Subagents add an extra round-trip back through the main agent, and that translates directly to latency. For simple tasks, there’s no reason to reach for a multi-agent architecture.
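One way to see where the fourth call comes from is to trace model invocations explicitly. This sketch assumes a “call” means one LLM invocation and that running the tool itself is not counted; `Trace`, `buy_coffee`, and both flow functions are hypothetical stand-ins, not any framework’s API.

```python
from dataclasses import dataclass, field

@dataclass
class Trace:
    """Records each model (LLM) invocation made while serving a request."""
    steps: list[str] = field(default_factory=list)

    def model_call(self, agent: str, action: str) -> None:
        self.steps.append(f"{agent}: {action}")

def buy_coffee() -> str:
    # Hypothetical tool. Executing it is not counted as a model call.
    return "coffee ordered"

def subagents_one_shot(trace: Trace) -> str:
    trace.model_call("main", "delegate to coffee subagent")
    trace.model_call("coffee_subagent", "decide to call buy_coffee")
    result = buy_coffee()
    trace.model_call("coffee_subagent", "report result back to main")
    trace.model_call("main", "respond to user")  # the extra round-trip
    return result

def handoffs_one_shot(trace: Trace) -> str:
    trace.model_call("main", "hand off to coffee agent")
    trace.model_call("coffee_agent", "decide to call buy_coffee")
    result = buy_coffee()
    trace.model_call("coffee_agent", "respond to user directly")
    return result

if __name__ == "__main__":
    for flow in (subagents_one_shot, handoffs_one_shot):
        trace = Trace()
        flow(trace)
        print(f"{flow.__name__}: {len(trace.steps)} model calls")
```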
In practice: FAQ bots, simple command execution, and one-off data lookups work perfectly fine with a single agent. Multi-agent here is over-engineering.
Scenario 2: Repeated Requests
Now the user says “Buy me another coffee” - the same request made twice. Conversational context carries over.
Performance comparison (calls in the second turn → cumulative total across both turns):
- Subagents: 4 calls → 8 total (stateless, full cycle every time)
- Skills / Handoffs: 2 calls → 5 total (37.5% fewer than Subagents)
- Router: 3 calls → 6 total (25% fewer than Subagents)
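The reduction figures are measured against the Subagents total of 8 calls. A quick check of the arithmetic, using the per-turn counts from both scenarios (the dictionary keys are just labels chosen for this sketch):

```python
def percent_fewer(baseline: int, total: int) -> float:
    """How many fewer calls `total` needs, as a percentage of `baseline`."""
    return (baseline - total) / baseline * 100

# Model calls per turn (first turn, second turn), from the comparisons above.
calls_per_turn = {
    "Subagents": (4, 4),
    "Skills / Handoffs": (3, 2),
    "Router": (3, 3),
}

totals = {name: sum(turns) for name, turns in calls_per_turn.items()}
baseline = totals["Subagents"]

for name, total in totals.items():
    print(f"{name}: {total} calls, {percent_fewer(baseline, total):.1f}% fewer than Subagents")
```

Skills / Handoffs come out at exactly 37.5%, which rounds to roughly 40%.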
Key insight: Stateful patterns like Skills and Handoffs dominate here. They reuse previously loaded context and skip the routing and initialization steps entirely. Subagents, being stateless by design, repeat the full cycle every time; the Router is stateless too, but its shorter three-call cycle still keeps it ahead of Subagents. The trade-off is that subagents provide better context isolation - worth the overhead if security or sandboxing is a priority.
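A minimal sketch of why a stateful pattern gets cheaper on the second turn, assuming the Skills flow is “load skill → call tool → respond.” `SkillsAgent` and its methods are hypothetical, and model calls are only counted, never actually made.

```python
class SkillsAgent:
    """Hypothetical single agent that loads expert prompts (skills) on demand.

    Loaded skills and history persist across turns, so a repeated request
    can skip the skill-loading step.
    """

    def __init__(self, skill_library: dict[str, str]):
        self.skill_library = skill_library
        self.loaded: dict[str, str] = {}
        self.history: list[str] = []
        self.model_calls = 0

    def _model_call(self, note: str) -> None:
        # Stand-in for an LLM invocation; we only count and log it.
        self.model_calls += 1
        self.history.append(note)

    def handle(self, request: str, skill: str) -> int:
        """Serve one turn and return how many model calls it took."""
        before = self.model_calls
        if skill not in self.loaded:
            self._model_call(f"load skill {skill!r}")
            self.loaded[skill] = self.skill_library[skill]
        self._model_call(f"call tool for {request!r}")
        self._model_call("respond to user")
        return self.model_calls - before

if __name__ == "__main__":
    agent = SkillsAgent({"coffee": "You are a barista assistant."})
    print(agent.handle("Buy me a coffee", "coffee"))
    print(agent.handle("Buy me another coffee", "coffee"))
```

The first turn costs three calls and the repeat costs two, for five in total. A stateless subagent setup has no equivalent of `loaded` to consult, so it pays the full cycle again.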
In practice: Chatbots, conversational assistants, and session-based services need stateful patterns. If users frequently say things like “do it the same way as before,” prioritize Skills or Handoffs. A Router can be wrapped inside a stateful agent as a tool if needed.
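Wrapping a Router inside a stateful agent can be as simple as keeping the conversation history in the outer agent and passing a resolved query to the router. In this sketch, `stateless_router` and `StatefulAgent` are hypothetical, and the “as before” check is a deliberately naive stand-in for the outer agent’s own reasoning.

```python
from typing import Callable

def stateless_router(query: str) -> dict[str, str]:
    """Hypothetical router: picks domains and answers, but remembers nothing."""
    domains = [d for d in ("python", "javascript", "rust") if d in query.lower()]
    return {d: f"{d} answer" for d in domains}

class StatefulAgent:
    """Outer agent that owns the conversation history and uses the router as a tool."""

    def __init__(self, router_tool: Callable[[str], dict[str, str]]):
        self.router_tool = router_tool
        self.history: list[str] = []

    def handle(self, message: str) -> dict[str, str]:
        # Naive follow-up detection; a real agent would let the model decide.
        if "as before" in message.lower() and self.history:
            query = self.history[-1]
        else:
            query = message
        self.history.append(query)
        return self.router_tool(query)

if __name__ == "__main__":
    agent = StatefulAgent(stateless_router)
    print(agent.handle("Compare Python and Rust"))
    print(agent.handle("Do it the same way as before"))
```

Called directly, the router has nothing to go on for “do it the same way as before”; the outer agent supplies the missing context from its history.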
Scenario 3: Multi-Domain Queries
A user asks “Compare Python vs JavaScript vs Rust” - a query that spans multiple specialized domains. Assume roughly 2K tokens of reference documentation per language.
Performance comparison:
- Subagents: 5 calls, ~9K tokens (each sub works in isolated context)
- Skills: 3 calls, ~15K tokens (all three skill contexts accumulate in main)
- Handoffs: 7+ calls, ~14K+ tokens (sequential only)
- Router: 5 calls, ~9K tokens (parallel execution)
Key insight: Patterns with parallel execution - Subagents and Router - win decisively. Subagents process each language’s documentation in isolated contexts and use 40% fewer tokens than Skills (9K vs 15K). Skills make fewer calls but stack all three domains’ knowledge into the main context, causing token costs to spike. Handoffs, limited to sequential execution, are the worst fit for this type of work.
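Percentage comparisons depend on which figure is the baseline, so the same 9K vs 15K gap can be described two ways. A small check, using the approximate token totals from the comparison above:

```python
def percent_change(new: float, reference: float) -> float:
    """Signed change from `reference` to `new`, as a percentage of `reference`."""
    return (new - reference) / reference * 100

subagents_tokens = 9_000  # ~9K, from the comparison above
skills_tokens = 15_000    # ~15K

# Baseline = Skills: how much Subagents save.
print(f"Subagents vs Skills: {percent_change(subagents_tokens, skills_tokens):+.0f}%")
# Baseline = Subagents: how much extra Skills spend.
print(f"Skills vs Subagents: {percent_change(skills_tokens, subagents_tokens):+.0f}%")
```

Relative to Skills, Subagents save 40%; relative to Subagents, Skills spend about 67% more.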
In practice: Research systems, multi-source comparative analysis, and enterprise knowledge bases that query multiple independent domains simultaneously call for Subagents or Router. When handling large domain-specific knowledge, context isolation directly impacts token costs.
Pattern Selection Guide
| Scenario | Recommended Pattern |
|---|---|
| One-shot tasks | Single agent is enough |
| Frequent repeated requests | Skills or Handoffs |
| Multiple domains queried simultaneously | Subagents or Router |
| Sequential workflows | Handoffs |
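The table can also be read as a simple decision function. This is one possible encoding; `Pattern` and `recommend_pattern` are hypothetical names, and the order in which traits are checked (parallel domains first, then sequential stages, then repetition) is an assumption rather than something the scenarios prescribe.

```python
from enum import Enum

class Pattern(Enum):
    SINGLE_AGENT = "single agent"
    SKILLS_OR_HANDOFFS = "Skills or Handoffs"
    SUBAGENTS_OR_ROUTER = "Subagents or Router"
    HANDOFFS = "Handoffs"

def recommend_pattern(
    *,
    repeated_requests: bool = False,
    parallel_domains: bool = False,
    sequential_stages: bool = False,
) -> Pattern:
    """Map workload traits to the table's recommendation (hypothetical helper)."""
    # Independent domains that can be queried at once favor parallel patterns.
    if parallel_domains:
        return Pattern.SUBAGENTS_OR_ROUTER
    # Stage-by-stage workflows fit swapping the active agent.
    if sequential_stages:
        return Pattern.HANDOFFS
    # Repeated requests in one conversation benefit from retained state.
    if repeated_requests:
        return Pattern.SKILLS_OR_HANDOFFS
    # One-shot tasks: start simple.
    return Pattern.SINGLE_AGENT

if __name__ == "__main__":
    print(recommend_pattern(parallel_domains=True).value)
```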
One More Practical Tip
Don’t start with multi-agent. Begin with a single agent backed by good prompts and well-defined tools. Only reach for multi-agent patterns when you hit a clear limitation - then use the scenarios above to pick the right one.
The lesson from Anthropic’s research isn’t “more agents = better.” It’s that matching the architecture to the task is what makes gains like that 90% improvement possible.