Eight Hooks That Guarantee AI Agent Reliability
CLAUDE.md rules get followed about 80% of the time. Hooks get followed 100% of the time. After six months of testing, these are the eight I never removed.
CLAUDE.md is a suggestion. If you write “run Prettier before committing,” the agent will skip it roughly one time in five. Writing more precise instructions doesn’t fix the problem. An 80% compliance rate isn’t a model quality issue. It’s a structural limitation of putting rules inside the context window and hoping the model follows them.
Hooks operate on a completely different plane. They are scripts that execute automatically at specific lifecycle points, regardless of what the model decides. PreToolUse fires right before the agent modifies a file or runs a command. PostToolUse fires right after. The model doesn’t choose whether to run them. They just run.
The practical difference is immediate. Ten lines of rules in CLAUDE.md and one hook in .claude/settings.json are entirely different interventions. Exit code 2 blocks the agent’s action outright. Exit code 0 lets it pass. Any other exit code surfaces a warning but doesn’t block. And because hooks live in settings.json, you commit them once and the whole team gets them through git.
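As a sketch, the wiring looks roughly like this in .claude/settings.json. The matcher and the script path are illustrative placeholders, and the exact schema is Claude Code’s, so check the hooks reference before copying:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "bash .claude/hooks/block-dangerous.sh" }
        ]
      }
    ]
  }
}
```

One entry per lifecycle event; each matcher scopes its scripts to a tool, and the script’s exit code is what the agent sees.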
The four guards before execution
I’ve been running hooks for over six months. These four PreToolUse hooks survived every project without being removed once.
Block Dangerous Commands catches destructive patterns like rm -rf, git reset --hard, and DROP TABLE through regex matching, then returns exit code 2 to kill the action before it happens. I’ve watched agents attempt rm -rf on directories they shouldn’t touch. Without this hook, the damage would have been real.
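A minimal sketch of that guard, assuming Claude Code pipes the tool-call JSON to the hook’s stdin and that jq is available. The pattern list is illustrative, not exhaustive:

```shell
#!/usr/bin/env bash
# Illustrative PreToolUse guard. The stdin JSON shape and the jq
# dependency are assumptions; extend the patterns for your environment.

# Succeeds (exit 0) when the command matches a destructive pattern.
is_dangerous() {
  grep -Eq 'rm -rf|git reset --hard|DROP TABLE' <<< "$1"
}

main() {
  local cmd
  cmd=$(jq -r '.tool_input.command // empty')  # tool call arrives on stdin
  if is_dangerous "$cmd"; then
    echo "Blocked dangerous command: $cmd" >&2
    return 2   # exit code 2 = block the action outright
  fi
  return 0     # exit code 0 = allow
}
```

The script’s last line would be `main; exit $?` so the function’s status becomes the hook’s exit code.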
Protect Sensitive Files blocks any modification attempt on .env, package-lock.json, *.pem, and similar files. The agent never gets the chance to overwrite your lock file or leak credentials into a commit.
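A sketch of the path check. In practice the target path comes from the hook’s stdin JSON; it arrives as `$1` here for brevity, and the protected list is a starting point, not a complete one:

```shell
# Illustrative sensitive-file check; extend the case patterns as needed.
is_protected() {
  case "$1" in
    .env|*/.env|*.pem|package-lock.json|*/package-lock.json)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# In the hook script: is_protected "$file" && exit 2
```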
Require Tests Before PR gates pull request creation on passing tests. Set the matcher to mcp__github__create_pull_request and the agent literally cannot open a PR until tests pass. No more “I’ll fix the tests in a follow-up.”
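A sketch of the gate, written generically so any test command can be plugged in (`npm test` is an assumption about the project):

```shell
# Illustrative PR gate: run the test command and translate a failure
# into the blocking exit code.
gate_on_tests() {
  if ! "$@" >/dev/null 2>&1; then
    echo "Tests failing: refusing to open a PR until they pass." >&2
    return 2   # blocks the mcp__github__create_pull_request call
  fi
  return 0
}

# In the hook script: gate_on_tests npm test; exit $?
```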
Log Every Command writes every bash command the agent runs to .claude/command-log.txt with timestamps. Three days later, when something looks wrong, you can trace exactly what happened.
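A sketch of the logger. The command text is passed as `$1` here for clarity; in practice it would be extracted from the hook’s stdin JSON:

```shell
# Illustrative command logger. A logging hook should always return 0
# so it never blocks the agent.
log_command() {
  local logfile="${2:-.claude/command-log.txt}"
  mkdir -p "$(dirname "$logfile")"
  printf '%s  %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" >> "$logfile"
  return 0
}
```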
The three inspectors after execution
PostToolUse hooks run immediately after the agent modifies a file. I chain three of them in sequence.
Auto-Format runs Prettier on every changed file. For Python projects, swap in Black. For Go, use gofmt. The formatter runs whether or not the agent remembered to format.
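A sketch of the dispatch, with the extension-to-formatter mapping as an assumption to adapt per project:

```shell
# Illustrative formatter dispatch: pick a tool by file extension.
pick_formatter() {
  case "$1" in
    *.py) echo "black" ;;
    *.go) echo "gofmt -w" ;;
    *)    echo "npx prettier --write" ;;
  esac
}

format_file() {
  # Never block the agent on a formatter failure.
  $(pick_formatter "$1") "$1" || true
  return 0
}
```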
Auto-Lint runs ESLint right after formatting. If ESLint finds errors, the agent sees them immediately and fixes them in the same turn. The number of lint issues that make it to human code review drops to nearly zero.
Auto-Test runs the relevant test suite after each file change. When a test fails, the agent knows within seconds and attempts a fix. I pipe the output through tail -5 to keep only the summary, which prevents test output from flooding the context window.
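A sketch of the scoped run. Jest’s `--findRelatedTests` flag is an assumption about the runner; swap in your own equivalent. The `pipefail` line matters: without it, a pipeline reports tail’s exit status, not the runner’s:

```shell
# Illustrative scoped test hook for a changed file.
summarize() {
  tail -5   # keep only the summary lines out of the full output
}

run_related_tests() {
  set -o pipefail   # so tail's success can't mask a failing test run
  npx jest --findRelatedTests "$1" 2>&1 | summarize
}
```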
The order matters. Prettier first, then ESLint, then tests. By the time a human looks at the code, formatting and lint have already passed. Style comments in code review disappear.
Preserving work when the agent stops
One Stop hook handles this: Auto-Commit runs git add -A && git commit every time the agent finishes a response. Each unit of work gets its own commit. Two tasks never bleed into a single commit.
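A sketch of the hook body, assuming the working directory is the repo root. The dirty-tree check keeps clean stops from creating empty commits:

```shell
# Illustrative Stop hook: commit the turn's work only when needed.
auto_commit() {
  if git status --porcelain | grep -q .; then
    git add -A
    git commit -qm "checkpoint: agent turn $(date -u +%Y-%m-%dT%H:%M:%SZ)"
  fi
  return 0   # a Stop hook should never block
}
```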
Combine this with git worktrees and you get automatic per-branch commits on feature branches. If the agent crashes or you interrupt it, you never lose the last chunk of work.
What didn’t work cleanly
Hook chaining sounds elegant, but debugging a failing chain is harder than debugging a single script. When the auto-test hook started failing silently because the test runner wasn’t installed in a new project, I spent an hour tracing why the agent kept producing untested code. The runner’s “command not found” error (exit code 127) never reached the agent: the output was piped through tail -5, and without pipefail a pipeline reports the last command’s exit status, so the hook returned tail’s 0 and registered as a pass. I had to add an explicit check for the test runner’s existence before invoking it.
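A sketch of that fix. The dependency check returns a non-zero, non-2 code so a missing runner surfaces as a warning instead of a silent pass (`npx jest` is still an assumption about the project):

```shell
# Illustrative guard against the silent-failure trap described above.
require_cmd() {
  command -v "$1" >/dev/null 2>&1 || {
    echo "hook dependency '$1' not found" >&2
    return 1   # non-zero, non-2: warn, don't block, never fake a pass
  }
}

run_tests_safely() {
  require_cmd npx || return 1
  set -o pipefail   # otherwise `| tail -5` reports tail's status, not jest's
  npx jest 2>&1 | tail -5
}
```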
Performance is the other constraint. The conventional worry is that many hooks slow things down, but that’s not quite right. The real question is whether each individual hook finishes in under 200 milliseconds. A Prettier run on a single file takes about 50ms. An ESLint check takes about 80ms. Tests vary, but scoping to affected files keeps most runs fast. If any single hook takes over a second, the agent’s feedback loop starts to feel sluggish.
Why this maps to a broader pattern
OpenAI’s Harness Engineering blog argued that agents work best inside rigid boundaries and predictable structure. React’s design philosophy says the same thing about components: composable units with defined lifecycle phases and state.
Hooks in Claude Code follow the same abstraction. State corresponds to sessions and memory. Hooks are the functions that intervene at lifecycle boundaries. PreToolUse sets the boundaries. PostToolUse makes the structure predictable. Stop preserves the result.
The “run Prettier” line I used to keep in CLAUDE.md is gone now. The hook runs it every time, without being asked.