# Create Three Spec Files Before Using Claude Code and Codex

> Author: Tony Lee
> Published: 2026-03-19
> URL: https://tonylee.im/en/blog/create-three-spec-files-before-using-claude-code-codex/
> Reading time: 4 minutes
> Language: en
> Tags: ai, claude-code, codex, ai-agents, spec-driven-development, developer-tools

## Rollout Alternates

en: https://tonylee.im/en/blog/create-three-spec-files-before-using-claude-code-codex/
ko: https://tonylee.im/ko/blog/create-three-spec-files-before-using-claude-code-codex/
ja: https://tonylee.im/ja/blog/create-three-spec-files-before-using-claude-code-codex/
zh-CN: https://tonylee.im/zh-CN/blog/create-three-spec-files-before-using-claude-code-codex/
zh-TW: https://tonylee.im/zh-TW/blog/create-three-spec-files-before-using-claude-code-codex/

## Description

I spent a year getting wildly inconsistent results from Claude Code and Codex. Three spec files, each with a distinct role, fixed it.

## Outline

- Three layers that changed the outcome
- Why cramming everything into one file fails
- CLAUDE.md works best when kept lean
- Spec files focus on what and why, leave how to the agent
- Session separation is the most overlooked practice
- The maturity model points toward spec-as-source

## Content

After nearly a year of using AI coding agents daily, inconsistency bothered me more than hallucinated code or wrong APIs. The same task, phrased slightly differently, produced clean output one day and a mess the next. Longer sessions made it worse because the agent would forget agreements made twenty messages earlier.

The root cause was structural. I had no framework for communicating intent to the agent across sessions, so every conversation started from zero and drifted in unpredictable directions.

## Three layers that changed the outcome

Splitting instructions into three files, each with a distinct role, fixed the consistency problem almost overnight.

**Layer 1, CLAUDE.md** loads automatically every session. It holds project-level constants: build commands, tech stack versions, folder structure, coding conventions, and behavioral boundaries (always do X, ask before Y, never do Z).

**Layer 2, SPEC.md** captures the what and why of a specific feature: purpose, requirements, success criteria, technical constraints, and scope boundaries. It says nothing about implementation details.

**Layer 3, plan.md** breaks a spec into two-to-five-minute executable tasks with explicit file paths, test-first ordering, and commit checkpoints.
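
To make the three layers concrete, here is a minimal Python sketch that scaffolds one skeleton per layer. The section names come from the descriptions above; the `scaffold` helper, the placeholder wording, and the flat file layout are assumptions for illustration, not a required format.

```python
from pathlib import Path

# Skeleton headings follow the article; the placeholder text is illustrative.
SKELETONS = {
    "CLAUDE.md": (
        "# Project rules\n\n"
        "## Build commands\n\n"
        "## Tech stack and versions\n\n"
        "## Folder structure\n\n"
        "## Coding conventions\n\n"
        "## Boundaries\n\n"
        "### Always\n\n### Ask first\n\n### Never\n"
    ),
    "SPEC.md": (
        "# Feature spec\n\n"
        "## Purpose\n\n## Requirements\n\n## Success criteria\n\n"
        "## Technical constraints\n\n## Boundaries\n"
    ),
    "plan.md": (
        "# Plan\n\n"
        "- [ ] Task 1 (2-5 min): path/to/file | failing test | implement | verify | commit\n"
    ),
}


def scaffold(root: Path) -> list[Path]:
    """Create any missing skeleton under root; existing files are never overwritten."""
    root.mkdir(parents=True, exist_ok=True)
    created = []
    for name, body in SKELETONS.items():
        path = root / name
        if not path.exists():
            path.write_text(body, encoding="utf-8")
            created.append(path)
    return created
```

Calling `scaffold(Path("."))` in a project root writes only the files that are missing, so it is safe to rerun.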

## Why cramming everything into one file fails

Research on LLM instruction-following describes a "Curse of Instructions": as the number of directives in a single context grows, compliance with each individual directive drops sharply. I tested this directly. A single file containing project rules, feature specs, and task lists caused the agent to ignore roughly half the instructions near the bottom.

Separating by role solved it. The agent reads universal rules from Layer 1, feature intent from Layer 2, and execution steps from Layer 3. Each file stays short enough for the model to fully attend to.
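
One rough way to notice when a file is drifting back toward the overloaded state is to count its directive-like lines. The sketch below treats bullets and numbered items as a proxy for instructions; the `count_directives` and `over_budget` helpers and the default budget of 40 are assumptions chosen for illustration, not figures from the research.

```python
import re

FENCE = "`" * 3  # markdown code fence marker


def count_directives(markdown: str) -> int:
    """Count bullet and numbered-list lines outside fenced code blocks."""
    count = 0
    in_fence = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        if stripped.startswith(("- ", "* ")) or re.match(r"\d+\.\s", stripped):
            count += 1
    return count


def over_budget(markdown: str, budget: int = 40) -> bool:
    """Return True when the file holds more directives than the chosen budget."""
    return count_directives(markdown) > budget
```

The count is a crude signal, but a file that keeps tripping the budget is a good candidate for splitting by role.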

## CLAUDE.md works best when kept lean

Because this file loads on every session, it should contain only what applies universally: build commands, stack versions, folder layout, and naming conventions. Add a three-tier boundary: things to always do, things to ask about first, things to never do.

The approach that worked best was starting minimal and adding rules incrementally whenever I noticed the agent repeating a mistake. Trying to write a comprehensive CLAUDE.md upfront led to bloat that triggered the same compliance drop I was trying to avoid.

A few specific traps: don't put API keys or code snippets in here because they go stale fast. Don't duplicate rules your linter already enforces. And move anything specific to a single feature into Layer 2.
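
Two of these traps can be caught mechanically. The following sketch flags fenced code and credential-looking lines in a rules file. The `lint_rules_file` helper and its regular expression are hypothetical heuristics, not a real secret scanner, and they will miss plenty of cases.

```python
import re

FENCE = "`" * 3  # markdown code fence marker

# Heuristic only: matches lines such as "API_KEY = abc123" or "token: xyz".
SECRET_HINT = re.compile(r"(api[_-]?key|secret|token)\s*[:=]\s*\S+", re.IGNORECASE)


def lint_rules_file(markdown: str) -> list[str]:
    """Return warnings for fenced code and credential-like lines."""
    warnings = []
    lines = markdown.splitlines()
    if any(line.strip().startswith(FENCE) for line in lines):
        warnings.append("contains fenced code: snippets go stale fast")
    for number, line in enumerate(lines, start=1):
        if SECRET_HINT.search(line):
            warnings.append(f"line {number}: looks like a credential")
    return warnings
```

Rules that duplicate the linter still need a human eye; this check only covers what a regular expression can see.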

## Spec files focus on what and why, leave how to the agent

The difference from a traditional PRD is that these specs are structured for agent consumption. Five sections are enough: purpose, requirements, success criteria, technical constraints, and boundaries.
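
A quick completeness check for the five sections could look like the sketch below. It assumes each section appears as a markdown heading with exactly these names; the `missing_sections` helper is illustrative.

```python
REQUIRED_SECTIONS = (
    "purpose",
    "requirements",
    "success criteria",
    "technical constraints",
    "boundaries",
)


def missing_sections(spec_markdown: str) -> list[str]:
    """Return the required section names that have no matching heading."""
    headings = {
        line.lstrip("#").strip().lower()
        for line in spec_markdown.splitlines()
        if line.startswith("#")
    }
    return [name for name in REQUIRED_SECTIONS if name not in headings]
```

An empty list means all five headings are present; it says nothing about whether their content is any good.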

Leave implementation decisions to the agent. It can analyze the existing codebase and choose the right approach. When I tried dictating the how, the agent either followed it blindly into conflicts with existing patterns or ignored it entirely.

An alternative to writing specs manually: have the agent interview you. Set two rules: ask one question at a time, and make questions multiple-choice whenever possible. The resulting specs had fewer gaps than ones I wrote from scratch.

Save completed specs to `docs/specs/YYYY-MM-DD-topic.md` and commit them to Git. This gives the agent access to spec change history through git diff and provides reviewers with a record of intent behind code changes.
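
The naming scheme is easy to generate rather than type by hand. A minimal sketch, assuming topics are slugged to lowercase ASCII with hyphens (the slug rule and the `spec_path` helper are my assumptions; only the `docs/specs/YYYY-MM-DD-topic.md` pattern comes from the text above):

```python
import re
from datetime import date
from pathlib import PurePosixPath


def spec_path(topic: str, day: date) -> PurePosixPath:
    """Build docs/specs/YYYY-MM-DD-topic.md from a free-form topic."""
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    if not slug:
        raise ValueError("topic must contain at least one letter or digit")
    return PurePosixPath("docs/specs") / f"{day.isoformat()}-{slug}.md"
```

For example, `spec_path("User Login!", date(2026, 3, 19))` yields `docs/specs/2026-03-19-user-login.md`.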

## Session separation is the most overlooked practice

Brainstorming and Q&A during spec writing consume a significant portion of the context window. Jumping straight into implementation from the same session means the agent gradually loses access to earlier context.

I restructured into three sessions: Session 1 produces SPEC.md, Session 2 creates plan.md, Session 3 onward executes tasks one by one. The accuracy improvement was noticeable within the first week.

Three habits that made this work: write explicit file paths in every task so the agent stops guessing, lock TDD ordering (test first, implement, verify, commit) into plan.md, and start a new session whenever context usage hits roughly 50%.
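
These three habits can be expressed as small checks. The sketch below is one possible shape: the `Task` structure, the step names, and the token-based measure of context usage are assumptions for illustration, and how you read actual context usage depends on your tool.

```python
from dataclasses import dataclass

TDD_STEPS = ("test", "implement", "verify", "commit")


@dataclass(frozen=True)
class Task:
    title: str
    files: tuple[str, ...]
    steps: tuple[str, ...] = TDD_STEPS


def problems(task: Task) -> list[str]:
    """List ways a task breaks the explicit-path and TDD-order habits."""
    issues = []
    if not task.files:
        issues.append("no explicit file paths")
    if task.steps != TDD_STEPS:
        issues.append("steps are not in test, implement, verify, commit order")
    return issues


def should_start_new_session(used_tokens: int, window_tokens: int,
                             threshold: float = 0.5) -> bool:
    """Return True once context usage reaches the threshold (default 50%)."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    return used_tokens / window_tokens >= threshold
```

A task with no file paths or with steps out of order produces a non-empty list from `problems`, which is a cue to fix plan.md before handing it to the agent.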

## The maturity model points toward spec-as-source

Martin Fowler's team recently defined a maturity model for agent-based software development with three levels. Level 1: write a spec, implement, discard the spec. Level 2: maintain the spec as a living document alongside the code. Level 3: the spec becomes the single source of truth, and code is a generated artifact.

Most teams and individuals I know haven't reached Level 1 yet. They send one-shot prompts and hope for the best.

Moving from Level 1 to Level 2 requires one habit: commit your spec files to Git. Moving from Level 2 to Level 3 means updating the spec before changing the code. The smallest step you can take today is creating a CLAUDE.md for your project and writing a SPEC.md before your next feature.

## Related URLs

- Author: https://tonylee.im/en/author/
- Publication: https://tonylee.im/en/blog/about/
- Related article: https://tonylee.im/en/blog/eight-hooks-that-guarantee-ai-agent-reliability/
- Related article: https://tonylee.im/en/blog/medvi-two-person-430m-ai-compressed-funnel/
- Related article: https://tonylee.im/en/blog/claude-code-layers-over-tools-2026/

## Citation

- Author: Tony Lee
- Site: tonylee.im
- Canonical URL: https://tonylee.im/en/blog/create-three-spec-files-before-using-claude-code-codex/
