
I Was Too Lazy to Write CLAUDE.md — Turns Out That Was the Right Call

New benchmark data shows AGENTS.md and CLAUDE.md context files actually hurt coding agent performance. Sometimes laziness is the best engineering decision.

Every time a post about CLAUDE.md (or AGENTS.md) showed up in my timeline, I told myself “I’ll set it up later” and scrolled past. Watching others build elaborate AGENTS.md configurations made me a little anxious. Was I falling behind?

Then recent benchmark data came out, and that anxiety vanished. Turns out my laziness was a fairly rational engineering decision.

LLM-generated context files make things worse

“Surely giving the agent more context helps, right?” That’s what I thought too.

When researchers tested LLM-auto-generated context on SWE-bench Lite, the success rate dropped by 0.5%. On AgentBench, it fell another 2%. Even carefully hand-written files only managed a 4% improvement. I’d call this “context overfitting.”

  • 0.5% success rate decrease with LLM-generated context on SWE-bench Lite
  • Additional 2% drop on AgentBench
  • 20–23% increase in inference costs
  • Positive effect (2.7%) observed only in repos with zero documentation

The paper “Evaluating AGENTS.md” by Gloaguen et al. confirmed it: context files tend to reduce task success rates compared to providing no repository context at all.

Agents follow instructions too well — and that’s the problem

The issue isn’t that agents ignore your instructions. It’s the opposite.

Write one line in your context file telling the agent to use uv, and it will install and run uv even in situations where it’s completely unnecessary, adding extra steps every time.

With GPT-5.2, inference tokens increased 14–22% when context files were present. The agent was so busy following instructions that it lost focus on actually solving the problem.

  • Unnecessary pytest runs increased
  • grep and read tool usage expanded far beyond what was needed

“Don’t do X” makes agents think about X more

I covered how SKILL.md body content gets read at specific timings in a previous post, and AGENTS.md has a similar problem.

It sits in the “developer message” layer between the system prompt and the user prompt. This position heavily constrains agent reasoning.

Write “don’t touch this file” and the agent will think about that file an extra time. Researchers called this the “pink elephant effect.” Tell someone not to think about a pink elephant, and that’s exactly what pops into their head.

  • Priority order: provider instructions → system prompt → AGENTS.md → user prompt
  • Manually maintained files can’t keep up with code changes, so the information goes stale fast

If you must write one, keep it minimal

If your repo has absolutely zero documentation, context files can help — the data showed a 2.7% positive effect in those cases. But if you do write one, keep the volume to a minimum.

One line for repo-specific build tool usage. One line for correcting a pattern the agent keeps getting wrong.
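Under those constraints, a whole context file might be just a couple of lines. A sketch of what that could look like (the paths and the pip rule are invented for illustration; only the uv example comes from this post):

```markdown
# AGENTS.md (intentionally minimal)
Build and test with `uv run pytest`; never call pip directly.
API route handlers live in `app/routes/`, not `app/api/`.
```

Anything beyond a line per correction is volume the agent will dutifully act on whether the task needs it or not.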

One hack that does pull its weight: a line like “if you find something structurally odd, flag it immediately” turns the agent into a reporter of codebase weak spots. Beyond that, making your code structure more intuitive is far more effective than writing instructions about it.

  • Strengthening unit tests and type checks beats context files
  • If file locations are confusing, move the files instead of writing directions
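The same idea applies inside the code itself: instead of a context-file note like “all timeouts are in seconds,” you can encode the unit in a type so that neither humans nor agents can misread it. A minimal sketch, with names invented for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Seconds:
    """Carrying the unit in the type replaces a 'timeouts are in seconds' note."""
    value: float


def retry_delay(attempt: int) -> Seconds:
    # Exponential backoff; the return type documents the unit for free.
    return Seconds(float(2 ** attempt))
```

A type checker now catches the unit mix-ups that a context-file instruction could only describe, and the constraint can never go stale the way a hand-maintained file does.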

Writing good context files isn’t necessarily a sign of skill. Understanding the structure of context files and designing meta-systems around them — that’s skill. And sometimes, “being lazy” is the best engineering decision you can make.
