Claude Code's 29 Tools vs Codex's 7: The Design Philosophies Are Polar Opposites
I dug into SDK type definitions and system prompts for both tools. The 29 vs 7 gap isn't about feature count. It's about two fundamentally different answers to the same question: how should an AI coding agent interact with your system?
I got tired of chasing weekly updates for both Claude Code and Codex, so I went back to first principles. I opened every SDK type definition, every system prompt, every settings schema I could find. I wanted to understand not just what each tool does, but why the tool counts diverge so dramatically.
Claude Code exposes 29 tools. Codex exposes 7. That ratio kept nagging at me, because it can’t simply be a feature gap. Two well-funded teams with top-tier engineers don’t accidentally land on a 4:1 ratio. The gap is intentional, and the reasoning behind it reveals two genuinely different philosophies about how AI should interact with your development environment.
Tool Granularity Is a Security Decision
The most striking difference is how each tool handles file operations. Claude Code splits file manipulation into four separate tools: Read, Write, Edit, and MultiEdit. Search gets its own dedicated tools too, with Grep and Glob fully independent from Bash. This means you can configure settings.json to allow Read but block Write. You can let the agent search your codebase without ever granting it permission to modify a single file. Permission control happens at the tool level.
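As a concrete sketch of what tool-level permissioning looks like, a `settings.json` fragment along these lines would let the agent read and search but never modify files. The `permissions.allow`/`permissions.deny` keys follow Claude Code's published settings schema, but treat the exact values as illustrative assumptions rather than a verified config:

```json
{
  "permissions": {
    "allow": ["Read", "Grep", "Glob"],
    "deny": ["Write", "Edit", "MultiEdit", "Bash"]
  }
}
```

Because each operation is a distinct tool, the deny list is a hard boundary: there is no shell command the agent can phrase creatively to route around it.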
Codex takes a different path. It gives the agent shell, apply_patch, and file_read as the core primitives. Everything else goes through the shell. You want to search files? That’s a shell command. You want to list directories? Shell again. Security doesn’t come from tool-level permissions but from execpolicy rules that pattern-match against specific shell commands, classifying them into allow, prompt, or block categories.
Neither approach is wrong. Claude Code’s model gives you fine-grained locks but requires maintaining a larger tool surface. Codex’s model is simpler to reason about but pushes security enforcement into string-matching on shell commands, which gets fragile when commands get creative. I’ve seen cases where a well-crafted pipe chain bypasses an execpolicy rule that was written for the straightforward version of the same command.
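To make that fragility concrete, here is a minimal shell sketch. The prefix check is a stand-in for a hypothetical execpolicy rule, not actual execpolicy syntax; it shows why string-matching the literal command misses a semantically equivalent pipeline:

```shell
# Model a policy rule that blocks commands beginning with "rm -rf"
# as a simple prefix check on the command string:
check() {
  case "$1" in
    "rm -rf"*) echo "blocked" ;;
    *)         echo "allowed" ;;
  esac
}

check 'rm -rf build'                # the straightforward form: blocked
check 'echo build | xargs rm -rf'   # same effect via a pipe: allowed
```

The second command deletes the same directory, but its string starts with `echo`, so a rule written for the straightforward form never fires.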
The full breakdown:
- Claude Code (29 tools): 4 file tools (Read/Write/Edit/MultiEdit), 3 search tools (Glob/Grep/LS), 2 web tools, 3 cron tools, 4 MCP tools, Bash, and more
- Codex (7 tools): `shell`, `apply_patch`, `file_read`, `web_search`, `update_plan`, `write_stdin`, `js_repl`
Skill Deployment Splits the Ecosystem
Both tools adopted the Agent Skills open standard, where a single SKILL.md file defines a skill’s behavior. The structure is identical. The distribution model is not.
Codex built a centralized distribution system. Running $skill-installer pulls curated skills from OpenAI’s official skills repository. Pass a GitHub URL and you can install third-party skills too. There’s even $skill-creator for generating new skills interactively through conversation. The experience feels like npm: one command, one registry, instant availability.
Claude Code went the other direction. You create SKILL.md files in .claude/skills/ manually, or you install bundles from git repositories through /plugin marketplace add. There’s no single official registry. Skills get discovered through community repos, shared links, and word of mouth.
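For reference, a hand-written skill under the Agent Skills standard is just a directory containing a `SKILL.md` with YAML frontmatter. The `name` and `description` fields come from the standard; the skill itself (`changelog-entry`) and its body are an invented example:

```markdown
---
name: changelog-entry
description: Draft a changelog entry for the current branch's changes
---

Summarize the staged diff into a single changelog entry.
Follow the Keep a Changelog format and keep it under 80 words.
```

Dropped into `.claude/skills/changelog-entry/SKILL.md`, it is immediately available, which is exactly what makes mid-session editing possible.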
I initially preferred Codex’s centralized model because discoverability is better. But after using both for several weeks, the decentralized approach has a genuine advantage: I can edit a skill file mid-session and the changes apply immediately without restarting. With Codex’s installed skills, changes require reinstallation. When you’re iterating on a custom workflow, that difference matters more than I expected.
Comparison at a glance:
- Invocation: Claude Code uses `/skill-name`, Codex uses `$skill-name`
- Storage: `.claude/skills/` vs `.agents/skills/`
- Built-in skills: Claude Code ships `/simplify`, `/batch`, `/loop`, `/claude-api`; Codex ships `$skill-installer`, `$skill-creator`
- Distribution: Decentralized marketplace vs centralized repository
Session Diagnostics Is Where the Gap Gets Real
Both tools share the basics: /model, /plan, /review, /clear, /fast. The divergence shows up in session introspection.
Claude Code invested heavily in letting you understand what’s happening inside your session. /compact manually triggers context compression. /context shows what’s loaded. /cost tracks token spend in real time. /doctor diagnoses configuration problems. /rewind rolls back to a previous conversation state. /insights analyzes a month of usage patterns and suggests improvements. /usage shows cumulative consumption across sessions. That’s seven commands dedicated purely to understanding and managing session state.
Codex focused elsewhere. /personality adjusts the agent’s communication style. /theme changes the visual appearance. /apps manages connected applications. These are UX customization features, not diagnostic tools.
This reflects a deeper philosophical split. Claude Code treats the session as something you should actively monitor and steer. Codex treats it as something that should just work in the background while you focus on customizing the experience. After months of use, I find myself wanting both. The diagnostics save me when a session goes sideways, but I also appreciate being able to adjust personality when I switch between detailed architecture work and quick bug fixes.
- Claude Code (~35 commands + 4 bundled skills): heavy on session diagnostics like `/compact`, `/context`, `/cost`, `/doctor`, `/rewind`, `/insights`, `/usage`
- Codex (~19 commands): stronger in UX customization with `/personality`, `/theme`, `/copy`, `/apps`, `/skills`, `/agent`, `/tools`
Team Architectures Start From Different Assumptions
How each tool handles multi-agent collaboration reveals perhaps the deepest design difference.
Claude Code’s Agent Teams use peer-to-peer communication. Teammates send messages directly to each other without routing through a lead agent. They share a task list and coordinate autonomously. You can run 2 to 16 agents, and they’ll negotiate among themselves who handles what. I tested this with three agents on a refactoring task, and token consumption was 3 to 7 times higher than a single session doing the same work. The coordination overhead is real. But when the task genuinely benefits from parallel exploration (like debugging a race condition where you want agents probing different hypotheses simultaneously), the P2P model finds answers faster.
Codex uses a hub-spoke model. Child agents report only to the parent. There’s no lateral communication. The spawn_agents_on_csv command creates agents in bulk from a CSV file, which is optimized for embarrassingly parallel tasks where each unit of work is independent. Think: “apply this migration to 200 files” or “run this check against every endpoint in this list.”
P2P isn’t universally better. I wasted significant tokens on a straightforward batch task because Claude Code’s agents kept discussing their overlapping work with each other. Codex’s hub-spoke would have been the right choice for that particular job.
- Claude Code: P2P messaging with shared task list, 2 to 16 agents, tmux split-pane support
- Codex: Hub-spoke architecture, CSV-based bulk agent spawning via `spawn_agents_on_csv`
Hook Granularity Determines Automation Depth
Claude Code lets you intercept tool execution at multiple lifecycle points. PreToolUse fires before a tool runs, letting you validate or modify the call. PostToolUse fires after, so you can attach a formatter that auto-runs on every file save. Notification hooks capture agent communications. PreCompact fires before context compression, giving you a chance to preserve critical information. HTTP Hooks can POST JSON to external URLs, connecting Claude Code to CI pipelines, Slack, or custom dashboards.
Codex keeps it simple. One execpolicy file with allow/prompt/block rules applied to shell commands. That’s the entire extensibility surface for controlling agent behavior.
I set up a PostToolUse hook that runs Prettier after every Write operation. It took five minutes and eliminated an entire category of formatting-related follow-up prompts. That kind of surgical automation isn’t possible in Codex’s model, where you’d need to include “and run prettier after writing” in every prompt or build it into a skill.
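That formatter hook looks approximately like this in `settings.json`. The `PostToolUse`/`matcher` structure follows Claude Code's hooks schema, and hooks receive the tool call as JSON on stdin; the exact command and the `tool_input.file_path` field I extract should be treated as assumptions about my setup rather than a copy-paste recipe:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write",
        "hooks": [
          {
            "type": "command",
            "command": "jq -r '.tool_input.file_path' | xargs npx prettier --write"
          }
        ]
      }
    ]
  }
}
```

The matcher scopes the hook to `Write` calls only, so reads and searches pay no overhead.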
But Codex’s simplicity has value too. I’ve never accidentally broken my Codex setup with a misconfigured hook. I’ve done that twice with Claude Code, once with a PreToolUse hook that silently blocked legitimate file reads and caused twenty minutes of confused debugging.
- Claude Code: PreToolUse, PostToolUse, Notification, PreCompact, and HTTP Hooks
- Codex: execpolicy rule file with three levels (allow/prompt/block)
Choose the Architecture, Not the Feature List
The 29 vs 7 comparison is not about one tool being more capable than the other. It’s about two different answers to the same design question: how much should an AI coding agent decompose its capabilities into individually controllable units?
Claude Code says “everything.” Every operation gets its own tool, its own permission surface, its own hook points. This gives you maximum control at the cost of configuration complexity. Codex says “only the essentials.” Core operations get dedicated tools; everything else flows through the shell with policy-based guardrails. This gives you simplicity at the cost of granularity.
When I pick which tool to use for a given project, the feature list barely matters. What matters is whether the project needs fine-grained permission control (regulated codebase, multiple contributors with different access levels) or lightweight simplicity (solo project, fast iteration, minimal setup). The architecture you build on top of shapes every workflow decision that follows.