Updated Feb 18, 2026

How OpenAI Built 1 Million Lines of Code Using Only Agents: 5 Harness Engineering Principles

OpenAI's Codex team built a 1M-line codebase using only AI agents. Here are the five harness engineering principles they discovered along the way.

The word “harness” has been showing up everywhere lately. A blog post released by OpenAI finally gave this concept a clear definition. Here’s what engineers actually need to do in the age of agents.

A harness is the tool shell that allows an AI agent to affect the real world. If the reasoning model is the brain, the harness is the hands and feet. Reading files, fixing code, running tests, deploying to production: all of it happens inside the harness.

An internal OpenAI team started from an empty repository in late August 2025 and built a 1-million-line product using only Codex agents. The condition was simple: no human-written code. They reported it took one-tenth the time compared to doing it manually. The five principles they discovered during this process are outlined below.

Knowledge the Agent Can’t See Doesn’t Exist

From Codex’s perspective, information it can’t access at runtime might as well not exist. Planning docs in Google Docs, architecture decisions agreed upon in Slack, tacit knowledge locked inside someone’s head. None of it is visible. It’s the same situation a new hire joining three months from now would face.

So the team pushed every decision into the repository as markdown, schemas, and execution plans (ExecPlans).

  • An ExecPlan is a self-contained design document whose format is defined in PLANS.md
  • The passing criterion: a beginner should be able to read it and implement the feature end to end
  • There are cases where Codex worked continuously for over 7 hours on a single prompt
  • The structure extends matklad’s ARCHITECTURE.md concept for agent use

Ask “What Capability Is Missing” Instead of “Try Harder”

Early on, agent velocity was slower than expected. The cause wasn’t model performance. It was an under-equipped environment. Every time something failed, the team asked: “What capability is missing, and how do we make it readable and verifiable by the agent?”

  • Built custom concurrency helpers instead of pulling in external libraries, so every primitive is fully instrumented with OpenTelemetry
  • So-called “boring technology” turns out to favor agents (due to API stability and higher representation in training data)

Mechanical Enforcement, Not Documentation, Maintains Code Consistency

Documentation alone couldn’t keep the agent-generated codebase consistent. So the team chose to mechanically enforce invariant rules rather than prescribing implementation details. They mandated parsing at data boundaries but left the choice of library to the agent. Architecture was locked into a layered domain structure with dependency directions verified by linters.
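The post doesn't show the enforced pattern itself. As an illustration only, a "parse at the boundary" rule in Python might look like this, with all names hypothetical: untrusted input is converted into a validated domain object once, so nothing downstream has to re-check it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Validated domain object; past the boundary, invalid states are unrepresentable."""
    id: str
    retries: int


def parse_job(raw: dict) -> Job:
    """Parse untrusted input at the data boundary, raising on anything malformed."""
    job_id = raw.get("id")
    if not isinstance(job_id, str) or not job_id:
        raise ValueError("job id must be a non-empty string")
    retries = raw.get("retries", 0)
    if not isinstance(retries, int) or retries < 0:
        raise ValueError("retries must be a non-negative integer")
    return Job(id=job_id, retries=retries)
```

The invariant being enforced is the shape of the rule, not the library: the team could swap in any parsing library, as long as raw data never crosses the boundary unparsed.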

  • Fixed layers per business domain: Providers → Service → Runtime → UI
  • Cross-cutting concerns structure where Types, Config, and Repo are shared at lower levels
  • Custom linters and structural tests fail the build immediately on violation
  • The linters themselves were also written by Codex

Give the Agent Eyes and It Works Alone for 6 Hours

The team connected the Chrome DevTools Protocol to the agent runtime, giving Codex access to DOM snapshots, screenshots, and navigation. The resulting loop compares pre- and post-task snapshots, observes runtime events, and applies fixes until everything is clean.
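The post describes this loop only at a high level. A schematic version, with the CDP calls stubbed out as plain callables (all names here are illustrative, not OpenAI's implementation):

```python
from typing import Callable


def fix_until_clean(
    snapshot: Callable[[], str],                 # e.g. a DOM snapshot taken via CDP
    problems: Callable[[str, str], list[str]],   # diff pre/post snapshots into issues
    apply_fix: Callable[[str], None],            # let the agent patch the code for one issue
    max_rounds: int = 10,
) -> bool:
    """Iterate snapshot -> diff -> fix until no issues remain, or give up."""
    before = snapshot()
    for _ in range(max_rounds):
        after = snapshot()
        issues = problems(before, after)
        if not issues:
            return True
        for issue in issues:
            apply_fix(issue)
    return False
```

The point of the structure is that the agent closes its own feedback loop: it can see the effect of each fix in the next snapshot instead of waiting for a human to look at the screen.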

Observability tools were attached the same way. A temporary observability stack spins up per git worktree and disappears when the work is done.

  • VictoriaLogs (LogsQL) and VictoriaMetrics (PromQL) let the agent query logs and metrics directly
  • Prompts like “make the service start in under 800ms” become executable
  • Single Codex runs regularly sustain focus on one task for over 6 hours
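The exact wiring isn't shown in the post. As a sketch, turning the 800ms prompt into a pass/fail check against VictoriaMetrics' Prometheus-compatible `/api/v1/query` endpoint could look like this (the base URL, port, and metric name are assumptions, not details from the post):

```python
from urllib.parse import urlencode


def query_url(base: str, promql: str) -> str:
    """Build a query URL for VictoriaMetrics' Prometheus-compatible HTTP API."""
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"


def startup_within_budget(p99_startup_ms: float, budget_ms: float = 800.0) -> bool:
    """Turn 'make the service start in under 800ms' into an executable check."""
    return p99_startup_ms < budget_ms


# Hypothetical metric name; the real one depends on the service's instrumentation.
PROMQL = "histogram_quantile(0.99, sum(rate(service_startup_ms_bucket[5m])) by (le))"
url = query_url("http://localhost:8428", PROMQL)
```

This is what makes a prompt like "under 800ms" executable: the agent can run the query, read the number, and know mechanically whether it is done.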

Give a Map, Not a 1,000-Page Manual

Context management determines agent effectiveness. The team initially tried cramming everything into one massive AGENTS.md file. It failed. The ARCHITECTURE.md concept written by matklad in 2021 proved its worth here. The principle: provide a brief bird’s-eye view of the project structure, including only what rarely changes. The same principle applies to agents.

  • ARCHITECTURE.md is a code map, not a code atlas
  • Architectural invariants are often expressed in the form of “something does not exist”
  • Stating boundaries explicitly constrains all downstream implementation
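"Something does not exist" invariants are exactly the kind that can be checked mechanically. A hypothetical structural test enforcing "no raw SQL exists outside the Repo layer" (the rule itself is an invented example, not one from the post) might be:

```python
import re
from pathlib import Path

SQL_PATTERN = re.compile(r"\b(SELECT|INSERT|UPDATE|DELETE)\b\s", re.IGNORECASE)


def sql_outside_repo(root: Path) -> list[str]:
    """Return source files outside the repo/ layer that appear to contain raw SQL."""
    offenders = []
    for path in sorted(root.rglob("*.py")):
        if "repo" in path.parts:
            continue  # the Repo layer is the only place SQL is allowed
        if SQL_PATTERN.search(path.read_text()):
            offenders.append(str(path))
    return offenders
```

An absence invariant stated this way constrains every future change at once: any code the agent writes anywhere in the tree is checked against the boundary, with no per-feature review needed.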

Open Questions

Even for the Codex team, some questions remain unanswered. No one knows whether a system built entirely by agents can maintain architectural consistency over years. How this framework itself will evolve as models improve is also uncertain.

One thing is clear: the era of writing code well is ending, and the era of designing environments well has begun.
