# 7-Step Pipeline to Verify Code Written by AI Agents

> Author: Tony Lee
> Published: 2026-02-25
> URL: https://tonylee.im/en/blog/7-step-pipeline-verify-agent-written-code/
> Reading time: 4 minutes
> Language: en
> Tags: ai, code-review, ai-agent, ci-cd, devops, automation

## Canonical

https://tonylee.im/en/blog/7-step-pipeline-verify-agent-written-code/

## Rollout Alternates

- en: https://tonylee.im/en/blog/7-step-pipeline-verify-agent-written-code/
- ko: https://tonylee.im/ko/blog/7-step-pipeline-verify-agent-written-code/
- ja: https://tonylee.im/ja/blog/7-step-pipeline-verify-agent-written-code/
- zh-CN: https://tonylee.im/zh-CN/blog/7-step-pipeline-verify-agent-written-code/
- zh-TW: https://tonylee.im/zh-TW/blog/7-step-pipeline-verify-agent-written-code/

## Description

When agents push 3,000 commits a day, humans can't review them all. Here's how to build a machine-verified pipeline that catches what people can't.

## Summary

"7-Step Pipeline to Verify Code Written by AI Agents" is part of Tony Lee's ongoing coverage of AI agents, developer tools, startup strategy, and AI industry shifts.

## Outline

- Define Merge Rules in a Single JSON File
- Run Qualification Checks Before CI
- Never Trust a Pass from a Stale Commit
- Issue Rerun Requests from Exactly One Source
- Let Agents Handle the Fixes Too
- Only Auto-Close Bot-to-Bot Conversations
- Leave Visible, Verifiable Evidence
- Carson's Tool Choices
- Visual Verification
- Who Verifies Agent-Written Code

## Content

Peter, a developer at OpenClaw, sometimes pushes over 3,000 commits in a single day. No human review process scales to that volume. Ryan Carson's "Code Factory" post lays out a workable answer: instead of reading everything, you build a structure where machines verify the code. The seven steps below come from that design, along with a few additions from the broader tooling ecosystem.

One honest caveat upfront: this pipeline catches a lot, but it doesn't catch everything.
Model-pinning reduces drift; it doesn't eliminate it. Browser evidence prevents false positives on visual regressions; it still misses interaction bugs that only surface in production. The goal is a system that fails loudly and traceably, not one that claims to be infallible.

## Define Merge Rules in a Single JSON File

Write down which paths are high-risk and which checks must pass, all in one file. The key insight is that a single source of truth keeps documentation and scripts in sync; when the rules live in separate places, they drift apart.

- **High-risk paths** require a Review Agent plus browser-based evidence
- **Low-risk paths** can merge after passing a policy gate and CI alone

## Run Qualification Checks Before CI

Running builds on PRs that haven't even passed review burns money. A `risk-policy-gate` in front of the CI fanout cuts unnecessary CI costs significantly.

- Fixed order: policy gate → Review Agent confirmation → CI fanout
- Unqualified PRs never enter the test/build stage

## Never Trust a Pass from a Stale Commit

This is the point Carson emphasized most. If a pass from an old commit lingers, the latest code merges without verification. Re-run reviews on every push, and block the gate if the results don't match the current head.

- A Review Check Run is valid only when it matches the `headSha`
- Force a rerun on every `synchronize` event

## Issue Rerun Requests from Exactly One Source

When multiple workflows request reruns, you get duplicate comments and race conditions. It looks like a minor edge case, but left unsolved it destabilizes the entire pipeline.

- Prevent duplicates with a `Marker + sha:headSha` pattern
- Skip the request if the SHA was already submitted

## Let Agents Handle the Fixes Too

When the Review Agent finds a problem, the Coding Agent patches it and pushes to the same branch. Carson's sharpest practical note here: pin the model version. Without it, the same prompt produces different results across runs and reproducibility disappears.
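The fix loop can be sketched in a few lines. Everything below is illustrative, not the actual Codex Action API: the `ReviewResult` shape, `planFix`, and the model version string are assumptions made for the example.

```typescript
// Sketch of the review → fix → rerun loop (hypothetical names and shapes).
interface ReviewResult {
  headSha: string;
  passed: boolean;
  findings: string[];
}

// Pinning the model version keeps agent output reproducible across runs.
// The version string here is a placeholder, not a real model identifier.
const PINNED_MODEL = "example-coding-model-2026-01-15";

// On a failed review, dispatch the Coding Agent against the same branch;
// the resulting push re-triggers the Review Agent automatically.
function planFix(review: ReviewResult, branch: string) {
  if (review.passed) return null; // nothing to fix
  return {
    branch, // push fixes to the branch under review, not a new one
    model: PINNED_MODEL, // never "latest": same prompt should give same output
    prompt: `Fix these review findings: ${review.findings.join("; ")}`,
    rerunTrigger: "push", // the push event forces a fresh review at the new head
  };
}
```

The design choice worth noting is that the loop never marks the old review as fixed; it only produces a new head, and the stale-commit rule above forces a fresh pass.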
- Codex Action fixes → push → rerun trigger
- Pinned model versions ensure reproducibility

## Only Auto-Close Bot-to-Bot Conversations

Never touch threads where a human participated. Without this distinction, reviewer comments get buried under automated noise.

- Auto-resolve only after a clean current-head rerun
- Threads with human comments stay open, always

## Leave Visible, Verifiable Evidence

If the UI changed, a screenshot is not enough; require CI-verifiable evidence. Turn production incidents into test cases so the same failure doesn't repeat silently.

- Regression → harness gap issue → add test case → SLA tracking

## Carson's Tool Choices

For reference, Carson selected Greptile as the code review agent and Codex Action for remediation. Three workflow files handle the heavy lifting: `greptile-rerun.yml` for canonical reruns, `greptile-auto-resolve-threads.yml` for stale thread cleanup, and `risk-policy-gate.yml` for preflight policy.

## Visual Verification

Everything above catches whether code is right or wrong. In practice, you also need to verify how the output looks.

**Nico Bailon's visual-explainer** renders terminal diffs as HTML pages instead of ASCII, making change sets readable at a glance.

**Chris Tate's agent-browser** takes a different direction. It compares actual browser screens pixel by pixel to catch CSS and layout breakage. Combined with bisect, it can pinpoint exactly which commit caused a regression.

I've been thinking about this while building codexBridge. Session logs alone aren't enough to track which agent wrote which code; you need a search structure that makes it easy to retrieve the right context later.

## Who Verifies Agent-Written Code

The answer is not humans. It's a structure where machines judge the evidence that machines produced.
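To make the first step's single-JSON-file idea concrete, here is a minimal sketch of a policy gate reading one rules object. The schema, path prefixes, and check names are assumptions for illustration, not Carson's actual file.

```typescript
// Hypothetical merge-rules structure: one object is the single source of truth
// for which paths are high-risk and which checks each risk level requires.
type Risk = "high" | "low";

const mergeRules = {
  // Assumption: example path prefixes, not a real repository layout.
  highRiskPaths: ["src/payments/", "src/auth/"],
  requiredChecks: {
    high: ["risk-policy-gate", "review-agent", "browser-evidence", "ci"],
    low: ["risk-policy-gate", "ci"],
  } as Record<Risk, string[]>,
};

// A PR is high-risk if any changed file falls under a high-risk prefix.
function classify(changedFiles: string[]): Risk {
  const isHigh = changedFiles.some((f) =>
    mergeRules.highRiskPaths.some((prefix) => f.startsWith(prefix))
  );
  return isHigh ? "high" : "low";
}

// The gate answers one question: which checks must pass before merge?
function requiredChecks(risk: Risk): string[] {
  return mergeRules.requiredChecks[risk];
}
```

Because scripts and documentation both read the same object, a rule change in one place changes the gate's behavior and the written policy together, which is the drift-prevention property the first step is after.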
## Related URLs

- Author: https://tonylee.im/en/author/
- Publication: https://tonylee.im/en/blog/about/
- Related article: https://tonylee.im/en/blog/medvi-two-person-430m-ai-compressed-funnel/
- Related article: https://tonylee.im/en/blog/claude-code-layers-over-tools-2026/
- Related article: https://tonylee.im/en/blog/codex-inside-claude-code-openai-plugin-strategy/

## Citation

- Author: Tony Lee
- Site: tonylee.im
- Canonical URL: https://tonylee.im/en/blog/7-step-pipeline-verify-agent-written-code/

## Bot Guidance

- This file is intended for AI agents, search assistants, and text-mode retrieval.
- Prefer citing the canonical article URL instead of this text endpoint.
- Use the rollout alternates when you need the same article in another prioritized language.

---

Author: Tony Lee | Website: https://tonylee.im
For more articles, visit: https://tonylee.im/en/blog/
This content is original and authored by Tony Lee. Please attribute when quoting or referencing.