5 min read · 2026

4 Tool Design Principles Claude Code Learned After 3 Rebuilds

Anthropic's Claude Code team rebuilt their tools three times. Fewer tools made the AI perform better. Here are four hard-won design principles.


Fewer tools made the AI perform better. When building an agent, the most natural instinct is “if it can’t do something, give it another tool.” The Anthropic team spent a year building Claude Code and discovered the opposite. Every additional tool increases the cognitive overhead for the AI — “should I call this or not?” — and that cost compounds.

I felt this pain firsthand while building my own agents, which is why this story from Thariq, a developer on the Claude Code team, hit so hard. Here’s a chronological breakdown of how they added, removed, and redesigned their tools.

One Tool, One Job — Or the AI Freezes

This was the very first problem the Claude Code team hit. They needed a way to ask users questions, so they bundled a question feature into the “planning” tool. Implementation was fast, but the AI would try to formulate a plan and ask a question simultaneously. When the user’s answer conflicted with the plan, the AI couldn’t resolve it.

Their second attempt had the AI output questions in Markdown format. The AI kept ignoring the format or appending extra text.

On the third try, they split the question feature into a dedicated tool — AskUserQuestion. That’s when things finally stabilized. One tool, one role. It sounds obvious, but you don’t truly feel it until you’ve been burned.

  • Plan + question combined — The AI called the same tool twice in error
  • Markdown format output — The AI appended sentences or ignored the structure
  • Dedicated tool separation — Structured responses finally worked reliably
  • No matter how good the design, it’s meaningless if the AI doesn’t want to call it

Tools Have an Expiration Date

Building well-separated tools isn’t the end of the story. I’d call this “tool decay” — the phenomenon where a once-essential tool becomes a bottleneck after a model upgrade.

Early Claude Code had a Todo tool (TodoWrite), and the system sent reminders every five turns: “Don’t forget your task list.” After the model improved, these reminders backfired. The AI stubbornly stuck to its original plan even when it should have adapted. When Opus 4.5 introduced sub-agent collaboration, the existing Todo structure couldn’t share tasks between agents at all.

They ended up replacing it entirely with the Task Tool.

  • TodoWrite replaced by Task Tool — Enabled dependency sharing between agents
  • “Is this tool still valid?” requires periodic review, as important as adding new tools
  • Supporting fewer models speeds up these judgment calls
  • The shape of the tool matters more than the number of tools — match it to model capabilities
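The structural difference between a flat todo list and a shared task graph can be sketched as follows. This is a hypothetical minimal model, not the actual Task Tool implementation; the `TaskStore` and its field names are assumptions for illustration:

```python
# Hypothetical sketch: a task store that multiple agents share, with
# explicit dependencies -- the capability a flat per-agent Todo list lacked.
from dataclasses import dataclass, field

@dataclass
class Task:
    id: str
    description: str
    owner: str                              # which agent holds the task
    depends_on: list[str] = field(default_factory=list)
    done: bool = False

class TaskStore:
    """Shared store: an agent can only claim tasks whose dependencies are met."""

    def __init__(self) -> None:
        self.tasks: dict[str, Task] = {}

    def add(self, task: Task) -> None:
        self.tasks[task.id] = task

    def ready(self, owner: str) -> list[Task]:
        return [
            t for t in self.tasks.values()
            if t.owner == owner and not t.done
            and all(self.tasks[d].done for d in t.depends_on)
        ]

store = TaskStore()
store.add(Task("schema", "design the API schema", owner="planner"))
store.add(Task("impl", "implement endpoints", owner="coder", depends_on=["schema"]))

# "impl" stays blocked until the planner finishes "schema"
assert store.ready("coder") == []
store.tasks["schema"].done = True
assert [t.id for t in store.ready("coder")] == ["impl"]
```

The point is the shape: once dependencies are first-class data rather than lines in a single agent's list, coordination between sub-agents falls out of the structure.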

Spoon-Feeding Context Makes the AI Worse

Through the process of adding and removing tools, the Claude Code team uncovered a more fundamental pattern: letting the AI find information itself beats injecting it.

Initially, they used a RAG vector database to pre-load context. It was fast and powerful, but indexing broke across environments, and the AI became passive — relying only on what it was given.

When they gave the AI a Grep tool to search the codebase directly, context quality improved. They added Skills files on top, creating a structure where the AI could recursively explore referenced files within files.

This is what I’d call progressive disclosure — instead of dumping all information at once, the AI discovers what it needs on its own.

  • RAG — High environment dependency, AI passively consumed context
  • Grep + Skills — AI actively explored multiple layers of files
  • Over one year, evolved from “AI that can’t find context” to “AI that finds it itself”
  • Capabilities expanded through Skills files alone, no new tools needed

Expanding Capabilities Without Adding Tools

This progressive disclosure pattern proved its value in another case. Users would ask how to use Claude Code, and it couldn’t answer. The team could have stuffed all usage docs into the system prompt, but that question only comes up occasionally. When rarely-used information permanently occupies the context window, it degrades performance on the core task — writing code. I’d call this context rot.

The solution was a dedicated sub-agent. When a usage question came in, a guide agent searched the docs and returned just the answer. The number of tools stayed the same, but the AI’s capabilities grew.

  • All info in system prompt — Context rot degraded code quality
  • Just providing doc links — AI loaded too many results into context
  • Dedicated sub-agent + search instructions — Clean, focused answers returned
  • Solved by changing structure, not by adding tools
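The routing pattern can be sketched like this. The docs, the routing heuristic, and both function names are hypothetical; the point is that only the sub-agent's short answer, never the docs themselves, reaches the main agent's context:

```python
# Hypothetical sketch of the sub-agent pattern: usage questions are routed
# to a doc-searching helper that returns only a short answer, so the full
# docs never occupy the main agent's context window.

DOCS = {
    "shortcuts": "Press Ctrl+R to search command history.",
    "config": "Settings live in the project settings file.",
}

def guide_subagent(question: str) -> str:
    """Search the docs and return just the relevant snippet."""
    for topic, snippet in DOCS.items():
        if topic in question.lower():
            return snippet
    return "No matching doc found."

def main_agent(user_input: str) -> str:
    # Route usage questions to the sub-agent; keep the coding context clean.
    if user_input.lower().startswith("how do i"):
        return guide_subagent(user_input)
    return "continue with the coding task"
```

Note that the tool count never changes here; capability grows by adding a structure (the sub-agent) behind an existing boundary.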

There Is No Formula

Adding, removing, and redesigning tools — there’s no universal formula for this process. When models change, tools must change too. Yesterday’s optimal structure can become tomorrow’s bottleneck.

One thing the Anthropic team repeated throughout the year: read the AI’s output, experiment, and fix it again. In the end, the people who build great agents are the ones who can see from the AI’s perspective.
