5 min read 2026

Create Three Spec Files Before Giving Work to Claude Code or Codex

After a year of agent-assisted development, I found that structured spec files fixed the inconsistency problem better than any prompt technique.

If you’ve used AI coding agents for more than a week, you’ve probably hit this: the same task produces clean output on Monday and garbage on Tuesday. The prompt barely changed. The codebase didn’t change. But the result is completely different.

I spent nearly a year chasing this problem. I tried better prompts, more detailed instructions, different models. None of it solved the core issue. The real problem was that I had no structured way to communicate intent. I was relying on the prompt alone to carry everything: project rules, feature requirements, task steps, coding conventions. That’s too much weight for a single input.

The fix turned out to be splitting my instructions into three files at three different layers. Since making that change, the quality gap between sessions has mostly disappeared.

Why a Single File Breaks Down

Research on LLM instruction-following shows a pattern called the “Curse of Instructions.” As the number of directives in a single context increases, compliance with each individual directive drops sharply. I tested this directly: when I packed project rules, feature specs, and task lists into one CLAUDE.md file, the agent ignored roughly half of the instructions near the end of the file.

This makes sense if you think about how attention works. A long document full of mixed concerns forces the model to decide what matters most in real time. It makes bad trade-offs. The solution is separating information by role so each layer stays focused.
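A rough way to see why compliance collapses is simple arithmetic. The sketch below assumes every directive is followed independently with the same probability, which is a simplification, and the 0.95 rate is an illustrative number rather than a measurement. Under that assumption, the chance that all directives are followed is the per-directive rate raised to the number of directives.

```python
def all_followed_probability(per_instruction_rate: float, instruction_count: int) -> float:
    """Chance that every instruction is followed, assuming independence."""
    if not 0.0 <= per_instruction_rate <= 1.0:
        raise ValueError("per_instruction_rate must be between 0 and 1")
    if instruction_count < 0:
        raise ValueError("instruction_count must not be negative")
    return per_instruction_rate ** instruction_count


if __name__ == "__main__":
    # Illustrative rate only: 95% compliance per instruction.
    for count in (5, 10, 20, 40):
        print(count, round(all_followed_probability(0.95, count), 2))
```

Even with a high per-directive rate, the combined figure falls quickly as the list grows, which is the argument for keeping each file short and focused.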

Layer 1: CLAUDE.md as a Project Constitution

CLAUDE.md (or the equivalent for your agent) loads automatically in every session. That makes it the wrong place for feature-specific instructions. It should contain only things that are true across the entire project:

  • Build commands and tech stack versions
  • Folder structure and naming conventions
  • Three tiers of behavioral rules: “always do,” “ask first,” and “never do”

I learned not to front-load this file with everything I could think of. Instead, I watched where the agent made repeated mistakes and added rules incrementally. That approach stuck better than any upfront specification.

A few things to keep out of CLAUDE.md: API keys (they are secrets, and they go stale), code snippets (they drift from the real code), style rules that a linter already enforces, and anything specific to a single feature. Feature-specific details belong in the next layer.
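The keep-out list can be partly checked by a script. This is a minimal sketch, not an established tool: the function name and the credential pattern are hypothetical heuristics, and the check flags every fenced block, including legitimate build commands, so treat its output as a prompt to review rather than a hard failure.

```python
import re

# Hypothetical heuristic: a key-like name followed by ":" or "=" and a value.
SECRET_PATTERN = re.compile(r"(?i)\b(api[_-]?key|secret|token)\b\s*[:=]\s*\S+")
FENCE = "`" * 3  # Markdown code fence marker


def lint_constitution(text: str) -> list[str]:
    """Return warnings for content that probably does not belong in CLAUDE.md."""
    warnings = []
    in_fence = False
    for number, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            if not in_fence:
                # Warn once per block, at the opening fence.
                warnings.append(f"line {number}: fenced code snippet")
            in_fence = not in_fence
            continue
        if SECRET_PATTERN.search(line):
            warnings.append(f"line {number}: looks like a credential")
    return warnings
```

Running it against the file before each commit is one option; it does not replace a real secret scanner.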

Layer 2: SPEC.md for the What and Why

A spec file describes one feature. It answers what the feature does, why it exists, what the success criteria are, what the technical constraints are, and where the boundaries lie. That’s it. Five sections.
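A small check can confirm that a draft covers all five sections before it goes to the agent. The heading names below are my own labels for the five questions, and the sketch assumes the spec uses Markdown headings; rename them to match whatever your specs actually use.

```python
# Hypothetical heading names for the five sections described above.
REQUIRED_SECTIONS = (
    "What",
    "Why",
    "Success Criteria",
    "Technical Constraints",
    "Boundaries",
)


def missing_sections(spec_text: str) -> list[str]:
    """List required sections that have no matching Markdown heading."""
    headings = set()
    for line in spec_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # Drop the leading "#" characters and normalize case.
            headings.add(stripped.lstrip("#").strip().lower())
    return [name for name in REQUIRED_SECTIONS if name.lower() not in headings]
```

An empty result means the structure is complete. It says nothing about whether the content is any good.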

The critical difference from a traditional PRD: spec files are structured for agent consumption, not human review meetings. And they deliberately omit “how.” The agent should figure out implementation by analyzing the existing codebase, not by following step-by-step coding instructions from the spec.

There are two ways to write a good spec. You can draft it yourself using the five-section structure above. Or you can have the agent interview you. The interview approach works surprisingly well if you enforce two rules: one question at a time, and prefer multiple-choice over open-ended questions. The resulting spec tends to cover edge cases I would have missed writing it alone.

Save completed specs to docs/specs/YYYY-MM-DD-topic.md and commit them. This creates a trail the agent can follow with git log and git diff, and it gives code reviewers a way to distinguish intentional decisions from accidental ones.
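The naming scheme is easy to get inconsistent by hand, so here is a sketch of a helper that builds the path. The function name and the slug rules (lowercase, hyphens for everything that is not a letter or digit) are assumptions on my part; only the docs/specs/YYYY-MM-DD-topic.md shape comes from the convention above.

```python
import re
from datetime import date
from pathlib import PurePosixPath


def spec_path(topic: str, on: date) -> PurePosixPath:
    """Build docs/specs/YYYY-MM-DD-topic.md for a given topic and date."""
    # Assumed slug rule: lowercase, runs of other characters become one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    if not slug:
        raise ValueError("topic must contain at least one letter or digit")
    return PurePosixPath("docs/specs") / f"{on.isoformat()}-{slug}.md"
```

For example, spec_path("Password Reset", date.today()) gives a path under docs/specs that sorts chronologically in a directory listing.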

Layer 3: plan.md for Executable Tasks

The plan file is the narrowest layer. It breaks the spec into tasks that each take two to five minutes, with explicit file paths and a fixed execution order. I anchor every plan to a TDD cycle: write the test first, implement the minimum code to pass it, confirm the test passes, then commit.

Explicit file paths matter more than I expected. Without them, the agent guesses where files should go, and those guesses diverge across sessions. Pinning the paths removes that variance.
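To make those constraints concrete, here is one way to model a plan before writing it out. It is a sketch under my own assumptions: the Task fields, the validation messages, and the rendered checklist layout are all hypothetical, not a format that Claude Code or Codex requires.

```python
from dataclasses import dataclass

# The four-step TDD cycle, repeated for every task.
TDD_STEPS = (
    "Write the failing test",
    "Implement the minimum code to pass it",
    "Confirm the test passes",
    "Commit",
)


@dataclass(frozen=True)
class Task:
    order: int  # fixed position in the execution order, starting at 1
    title: str
    test_path: str
    source_path: str
    estimate_minutes: int


def validate_plan(tasks: list[Task]) -> list[str]:
    """Return problems that would reintroduce guessing or oversized tasks."""
    problems = []
    for position, task in enumerate(tasks, start=1):
        if task.order != position:
            problems.append(f"task {position}: expected order {position}, found {task.order}")
        if not task.test_path.strip() or not task.source_path.strip():
            problems.append(f"task {position}: missing an explicit file path")
        if not 2 <= task.estimate_minutes <= 5:
            problems.append(f"task {position}: estimate of {task.estimate_minutes} min is outside 2 to 5")
    return problems


def render_task(task: Task) -> str:
    """Render one task as a Markdown checklist entry for plan.md."""
    lines = [
        f"## Task {task.order}: {task.title} (~{task.estimate_minutes} min)",
        f"- Test file: {task.test_path}",
        f"- Source file: {task.source_path}",
    ]
    lines += [f"- [ ] {step}" for step in TDD_STEPS]
    return "\n".join(lines)
```

The validation is deliberately strict about paths and size, since those are the two properties that keep results consistent across sessions.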

The Session Split That Made It Work

Here’s the part I missed for months. Even with three clean files, quality still degraded when I wrote the spec and ran the implementation in the same session. The reason is straightforward: brainstorming and Q&A during spec writing consume a large chunk of the context window. By the time implementation starts, the agent is already forgetting earlier decisions.

The fix is simple. Session 1: write SPEC.md and save it. Session 2: generate plan.md from the spec. Session 3 onward: execute tasks one at a time. I also switch to a new session whenever the context window is roughly half full. The accuracy improvement was immediately noticeable.
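The split is mechanical enough to write down as two rules. In this sketch the token counts are assumed to come from whatever usage display your tool provides, and both function names are hypothetical. The 0.5 default mirrors my half-full habit and is not a documented limit.

```python
def phase_for_session(session_number: int) -> str:
    """Map a session number to the one job that session should do."""
    if session_number < 1:
        raise ValueError("session_number starts at 1")
    if session_number == 1:
        return "write SPEC.md and save it"
    if session_number == 2:
        return "generate plan.md from the spec"
    return "execute one task from plan.md"


def should_start_new_session(used_tokens: int, window_tokens: int, threshold: float = 0.5) -> bool:
    """True once the context window is at least `threshold` full."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    return used_tokens / window_tokens >= threshold
```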

Where This Is Heading

The team behind the Agile Manifesto published a maturity model for agent-based software development that maps to this layered approach:

  • Level 1: Write a spec, implement, then discard the spec.
  • Level 2: Keep the spec alive as a living document that evolves with the code.
  • Level 3: The spec becomes the single source of truth, and code is a generated artifact.

Most developers I talk to haven’t reached Level 1 yet. They’re still operating in a mode where the prompt is the only input and the code is the only output. That works for small tasks, but it falls apart as complexity grows.

The transition from Level 1 to Level 2 is just one habit: commit your spec files to Git. The transition from Level 2 to Level 3 is a workflow change: when you need to modify the code, update the spec first and regenerate.

You can start today. Create a CLAUDE.md with your project basics, write a SPEC.md for your next feature before touching any code, and see what happens. The difference compounds fast.
