4 min read · 2026

Create Three Spec Files Before Using Claude Code and Codex

I spent a year getting wildly inconsistent results from Claude Code and Codex. Three spec files, each with a distinct role, fixed it.

After nearly a year of using AI coding agents daily, inconsistency bothered me more than hallucinated code or wrong APIs. The same task, phrased slightly differently, produced clean output one day and a mess the next. Longer sessions made it worse because the agent would forget agreements made twenty messages earlier.

The root cause was structural. I had no framework for communicating intent to the agent across sessions, so every conversation started from zero and drifted in unpredictable directions.

Three layers that changed the outcome

Splitting instructions into three files, each with a distinct role, fixed the consistency problem almost overnight.

Layer 1: CLAUDE.md loads automatically at the start of every session. It holds project-level constants: build commands, tech stack versions, folder structure, coding conventions, and behavioral boundaries (always do X, ask before Y, never do Z).

Layer 2: SPEC.md captures the what and why of a specific feature: purpose, requirements, success criteria, technical constraints, and scope boundaries. Nothing about implementation details.

Layer 3: plan.md breaks a spec into two-to-five-minute executable tasks with explicit file paths, test-first ordering, and commit checkpoints.
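As a sketch, one plan.md entry might look like this (the feature, file paths, and function name are hypothetical, not from a real project):

```markdown
## Task 3: Add password validation (~5 min)
1. Write a failing test in `tests/test_auth.py` for the minimum-length rule.
2. Implement `validate_password()` in `src/auth/validation.py`.
3. Run the test suite and confirm the new test passes.
4. Commit: "auth: validate password length".
```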

Why cramming everything into one file fails

Research on LLM instruction-following calls this the “Curse of Instructions.” As the number of directives in a single context grows, compliance with each individual directive drops sharply. I tested this directly: a single file containing project rules, feature specs, and task lists caused the agent to ignore roughly half the instructions near the bottom.

Separating by role solved it. The agent reads universal rules from Layer 1, feature intent from Layer 2, and execution steps from Layer 3. Each file stays short enough for the model to fully attend to.

CLAUDE.md works best when kept lean

Because this file loads on every session, it needs to contain only what applies universally. Build commands, stack versions, folder layout, naming conventions. Add a three-tier boundary: things to always do, things to ask about first, things to never do.
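A minimal CLAUDE.md along these lines might look like the following; the stack, commands, and rules are placeholders to show the shape, not a recommendation:

```markdown
# CLAUDE.md
## Stack
- Node 22, TypeScript 5.6, pnpm
## Commands
- Build: `pnpm build` · Test: `pnpm test` · Lint: `pnpm lint`
## Conventions
- Components live in `src/components/`, one per file, PascalCase names.
## Boundaries
- Always: run `pnpm test` before committing.
- Ask first: adding a dependency, changing the folder layout.
- Never: edit files under `migrations/` or commit secrets.
```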

The approach that worked best was starting minimal and adding rules incrementally whenever I noticed the agent repeating a mistake. Trying to write a comprehensive CLAUDE.md upfront led to bloat that triggered the same compliance drop I was trying to avoid.

A few specific traps: don’t put API keys in here (they’re a security risk once committed) or code snippets (they go stale fast). Don’t duplicate rules your linter already enforces. And move anything specific to a single feature into Layer 2.

Spec files focus on the what and why and leave the how to the agent

The difference from a traditional PRD is that these specs are structured for agent consumption. Five sections are enough: purpose, requirements, success criteria, technical constraints, and boundaries.
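A skeleton with those five sections could look like this; the feature and the contents of each section are purely illustrative:

```markdown
# SPEC: CSV export
## Purpose
Let admins download a report as a CSV file.
## Requirements
- The export respects the currently active filters.
## Success criteria
- A 10,000-row report downloads in under 5 seconds.
## Technical constraints
- No new runtime dependencies.
## Boundaries
- Out of scope: scheduled exports, XLSX format.
```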

Leave implementation decisions to the agent. It can analyze the existing codebase and choose the right approach. When I tried dictating the how, the agent either followed it blindly into conflicts with existing patterns or ignored it entirely.

An alternative to writing specs manually: have the agent interview you. Set two ground rules: one question at a time, and multiple-choice options whenever possible. The resulting specs had fewer gaps than ones I wrote from scratch.
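One way to phrase such an interview prompt, as a sketch (the exact wording is mine, not a fixed recipe):

```markdown
Interview me to produce a SPEC.md for <feature>.
Rules:
1. Ask exactly one question at a time.
2. Offer multiple-choice answers whenever possible.
When purpose, requirements, success criteria, technical constraints,
and boundaries are all covered, output the finished spec.
```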

Save completed specs to docs/specs/YYYY-MM-DD-topic.md and commit them to Git. This gives the agent access to spec change history through git diff and provides reviewers with a record of intent behind code changes.
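The save-and-commit flow, sketched end to end in a throwaway repository; the date, topic, and spec contents are made up to fill the YYYY-MM-DD-topic pattern:

```shell
# Sketch: commit a dated spec, revise it, then inspect its history.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name dev

# Save the completed spec under docs/specs/ and commit it.
mkdir -p docs/specs
printf '# Purpose\nAdmins can export reports as CSV.\n' \
  > docs/specs/2026-01-15-csv-export.md
git add docs/specs && git commit -qm "spec: csv export"

# A later revision of intent becomes part of the Git history.
printf '# Purpose\nAdmins can export filtered reports as CSV.\n' \
  > docs/specs/2026-01-15-csv-export.md
git commit -qam "spec: scope export to active filters"

# Reviewers (and the agent) can now trace how the spec changed.
git log --oneline -- docs/specs/2026-01-15-csv-export.md
git diff HEAD~1 -- docs/specs/2026-01-15-csv-export.md
```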

Session separation is the most overlooked practice

Brainstorming and Q&A during spec writing consume a significant portion of the context window. Jumping straight into implementation from the same session means the agent gradually loses access to earlier context.

I restructured into three sessions: Session 1 produces SPEC.md, Session 2 creates plan.md, Session 3 onward executes tasks one by one. The accuracy improvement was noticeable within the first week.

Three habits that made this work: write explicit file paths in every task so the agent stops guessing, lock TDD ordering (test first, implement, verify, commit) into plan.md, and start a new session whenever context usage hits roughly 50%.

The maturity model points toward spec-as-source

Martin Fowler’s team recently defined a maturity model for agent-based software development with three levels. Level 1: write a spec, implement, discard the spec. Level 2: maintain the spec as a living document alongside the code. Level 3: the spec becomes the single source of truth, and code is a generated artifact.

Most teams and individuals I know haven’t reached Level 1 yet. They send one-shot prompts and hope for the best.

Moving from Level 1 to Level 2 requires one habit: commit your spec files to Git. Moving from Level 2 to Level 3 means updating the spec before changing the code. The smallest step you can take today is creating a CLAUDE.md for your project and writing a SPEC.md before your next feature.
