# Eight Hooks That Guarantee AI Agent Reliability

> Author: Tony Lee
> Published: 2026-04-05
> URL: https://tonylee.im/en/blog/eight-hooks-that-guarantee-ai-agent-reliability/
> Reading time: 5 minutes
> Language: en
> Tags: claude-code, ai-agents, hooks, developer-tools, harness-engineering

## Canonical

https://tonylee.im/en/blog/eight-hooks-that-guarantee-ai-agent-reliability/

## Rollout Alternates

- en: https://tonylee.im/en/blog/eight-hooks-that-guarantee-ai-agent-reliability/
- ko: https://tonylee.im/ko/blog/eight-hooks-that-guarantee-ai-agent-reliability/
- ja: https://tonylee.im/ja/blog/eight-hooks-that-guarantee-ai-agent-reliability/
- zh-CN: https://tonylee.im/zh-CN/blog/eight-hooks-that-guarantee-ai-agent-reliability/
- zh-TW: https://tonylee.im/zh-TW/blog/eight-hooks-that-guarantee-ai-agent-reliability/

## Description

CLAUDE.md rules get followed about 80% of the time. Hooks get followed 100% of the time. After six months of testing, these are the eight I never removed.

## Summary

Eight Hooks That Guarantee AI Agent Reliability is part of Tony Lee's ongoing coverage of AI agents, developer tools, startup strategy, and AI industry shifts.

## Outline

- The four guards before execution
- The three inspectors after execution
- Preserving work when the agent stops
- What didn't work cleanly
- Why this maps to a broader pattern

## Content

CLAUDE.md is a suggestion. If you write "run Prettier before committing," the agent will skip it roughly one time in five. Writing more precise instructions doesn't fix the problem. An 80% compliance rate isn't a model quality issue. It's a structural limitation of putting rules inside the context window and hoping the model follows them.

Hooks operate on a completely different plane. They are scripts that execute automatically at specific lifecycle points, regardless of what the model decides. PreToolUse fires right before the agent modifies a file or runs a command. PostToolUse fires right after.
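As a concrete sketch of the PreToolUse idea, here is what the core logic of a dangerous-command guard might look like. The pattern list and exit-code convention follow the article; the function names are illustrative assumptions, and a real hook would read the tool call as JSON from stdin rather than taking an argument.

```shell
# Hypothetical PreToolUse guard logic: decide whether a bash command
# the agent is about to run should be blocked.
is_dangerous() {
  case "$1" in
    *"rm -rf"*|*"git reset --hard"*|*"DROP TABLE"*) return 0 ;;  # destructive pattern
    *) return 1 ;;                                               # looks safe
  esac
}

guard() {
  if is_dangerous "$1"; then
    echo "Blocked dangerous command: $1" >&2
    return 2   # exit code 2 blocks the agent's action outright
  fi
  return 0     # exit code 0 lets it pass
}
```

In a real setup this would be wrapped in a script registered under PreToolUse, exiting with the returned code.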
The model doesn't choose whether to run them. They just run.

The practical difference is immediate. Adding ten lines of rules to CLAUDE.md and adding one hook to `.claude/settings.json` feel like entirely different interventions. Exit code 2 blocks the agent's action outright. Exit code 0 lets it pass. Any other exit code logs a warning but doesn't block. And because hooks live in `settings.json`, you commit them once and the whole team gets them through git.

## The four guards before execution

I've been running hooks for over six months. These four PreToolUse hooks survived every project without being removed once.

**Block Dangerous Commands** catches destructive patterns like `rm -rf`, `git reset --hard`, and `DROP TABLE` through regex matching, then returns exit code 2 to kill the action before it happens. I've watched agents attempt `rm -rf` on directories they shouldn't touch. Without this hook, the damage would have been real.

**Protect Sensitive Files** blocks any modification attempt on `.env`, `package-lock.json`, `*.pem`, and similar files. The agent never gets the chance to overwrite your lock file or leak credentials into a commit.

**Require Tests Before PR** gates pull request creation on passing tests. Set the matcher to `mcp__github__create_pull_request` and the agent literally cannot open a PR until tests pass. No more "I'll fix the tests in a follow-up."

**Log Every Command** writes every bash command the agent runs to `.claude/command-log.txt` with timestamps. Three days later, when something looks wrong, you can trace exactly what happened.

## The three inspectors after execution

PostToolUse hooks run immediately after the agent modifies a file. I chain three of them in sequence.

**Auto-Format** runs Prettier on every changed file. For Python projects, swap in Black. For Go, use gofmt. The formatter runs whether or not the agent remembered to format.

**Auto-Lint** runs ESLint right after formatting.
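The per-language swap in the Auto-Format step can be sketched as a small dispatch helper inside the hook. This is a sketch under assumptions: the function name and the extension-to-formatter mapping are mine, mirroring the article's Prettier/Black/gofmt split.

```shell
# Hypothetical Auto-Format helper: pick a formatter command by file
# extension, so one PostToolUse hook covers a polyglot repository.
formatter_for() {
  case "$1" in
    *.py)  echo "black" ;;
    *.go)  echo "gofmt -w" ;;
    *.js|*.ts|*.tsx|*.json|*.css|*.md) echo "npx prettier --write" ;;
    *)     echo "" ;;   # no formatter registered for this file type
  esac
}
```

The hook would then run `$(formatter_for "$file") "$file"` when the helper returns a non-empty command, and skip formatting otherwise.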
If ESLint finds errors, the agent sees them immediately and fixes them in the same turn. The number of lint issues that make it to human code review drops to nearly zero.

**Auto-Test** runs the relevant test suite after each file change. When a test fails, the agent knows within seconds and attempts a fix. I pipe the output through `tail -5` to keep only the summary, which prevents test output from flooding the context window.

The order matters. Prettier first, then ESLint, then tests. By the time a human looks at the code, formatting and lint have already passed. Style comments in code review disappear.

## Preserving work when the agent stops

One Stop hook handles this: **Auto-Commit** runs `git add -A && git commit` every time the agent finishes a response. Each unit of work gets its own commit. Two tasks never bleed into a single commit. Combine this with git worktrees and you get automatic per-branch commits on feature branches. If the agent crashes or you interrupt it, you never lose the last chunk of work.

## What didn't work cleanly

Hook chaining sounds elegant, but debugging a failing chain is harder than debugging a single script. When the auto-test hook started failing silently because the test runner wasn't installed in a new project, I spent an hour tracing why the agent kept producing untested code. The hook reported exit code 0 (pass) even though the test runner itself wasn't found: the runner's output was piped through `tail`, and a pipeline's exit status is that of its last command, so "command not found" sailed through as a pass. I had to add an explicit check for the test runner's existence before invoking it.

Performance is the other constraint. The conventional worry is that many hooks slow things down, but that's not quite right. The real question is whether each individual hook finishes in under 200 milliseconds. A Prettier run on a single file takes about 50ms. An ESLint check takes about 80ms. Tests vary, but scoping to affected files keeps most runs fast.
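The fix for that silent failure can be sketched as a wrapper that refuses to proceed when the runner binary is missing. The `tail -5` trimming follows the article; the function name and the choice of returning 1 (which, per the exit-code convention above, surfaces as a warning instead of a silent pass) are illustrative assumptions.

```shell
# Hypothetical Auto-Test hook wrapper: verify the test runner exists
# before invoking it, so a missing binary fails loudly instead of the
# pipeline's tail masking "command not found" as success.
run_tests() {
  runner="$1"; shift
  if ! command -v "$runner" >/dev/null 2>&1; then
    echo "auto-test hook: $runner not found; refusing to pass silently" >&2
    return 1   # non-zero, non-2: logs a warning rather than a fake pass
  fi
  "$runner" "$@" 2>&1 | tail -5   # keep only the summary lines
}
```

Called as, say, `run_tests npx jest --findRelatedTests "$file"`, the wrapper keeps the context-saving `tail` trick while making a missing runner visible.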
If any single hook takes over a second, the agent's feedback loop starts to feel sluggish.

## Why this maps to a broader pattern

OpenAI's Harness Engineering blog made the point that agents work best inside rigid boundaries and predictable structure. React's design philosophy says the same thing about components: composable units with defined lifecycle phases and state. Hooks in Claude Code follow the same abstraction. State corresponds to sessions and memory. Hooks are the functions that intervene at lifecycle boundaries. PreToolUse sets the boundaries. PostToolUse makes the structure predictable. Stop preserves the result.

The "run Prettier" line I used to keep in CLAUDE.md is gone now. The hook runs it every time, without being asked.

## Related URLs

- Author: https://tonylee.im/en/author/
- Publication: https://tonylee.im/en/blog/about/
- Related article: https://tonylee.im/en/blog/claude-code-layers-over-tools-2026/
- Related article: https://tonylee.im/en/blog/codex-folder-structure-why-config-breaks/
- Related article: https://tonylee.im/en/blog/codex-inside-claude-code-openai-plugin-strategy/

## Citation

- Author: Tony Lee
- Site: tonylee.im
- Canonical URL: https://tonylee.im/en/blog/eight-hooks-that-guarantee-ai-agent-reliability/

## Bot Guidance

- This file is intended for AI agents, search assistants, and text-mode retrieval.
- Prefer citing the canonical article URL instead of this text endpoint.
- Use the rollout alternates when you need the same article in another prioritized language.

---

Author: Tony Lee | Website: https://tonylee.im

For more articles, visit: https://tonylee.im/en/blog/

This content is original and authored by Tony Lee. Please attribute when quoting or referencing.