# 7-Step Pipeline: Verifying Code Written by AI Agents

> Author: Tony Lee
> Published: 2026-02-25
> URL: https://tonylee.im/zh-CN/blog/7-step-pipeline-verify-agent-written-code/
> Reading time: 1 minute
> Language: zh-CN
> Tags: ai, code-review, ai-agent, ci-cd, devops, automation

## Canonical

https://tonylee.im/zh-CN/blog/7-step-pipeline-verify-agent-written-code/

## Rollout Alternates

en: https://tonylee.im/en/blog/7-step-pipeline-verify-agent-written-code/
ko: https://tonylee.im/ko/blog/7-step-pipeline-verify-agent-written-code/
ja: https://tonylee.im/ja/blog/7-step-pipeline-verify-agent-written-code/
zh-CN: https://tonylee.im/zh-CN/blog/7-step-pipeline-verify-agent-written-code/
zh-TW: https://tonylee.im/zh-TW/blog/7-step-pipeline-verify-agent-written-code/

## Description

When agents push 3,000 commits a day, humans cannot review them all. Here is how to build a machine-verified pipeline that catches the problems humans miss.

## Summary

7-Step Pipeline: Verifying Code Written by AI Agents is part of Tony Lee's ongoing coverage of AI agents, developer tools, startup strategy, and AI industry shifts.

## Outline

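- Put the merge rules in a single JSON file
- Run an eligibility check before CI
- Never trust a "pass" from an old commit
- Send rerun requests from a single source
- Let agents handle the fixes
- Auto-resolve only bot-to-bot threads
- Leave visible, verifiable evidence
- Carson's tool choices
- Beyond correctness: visual verification
- Closing thoughts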

## Content

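This is the hottest topic right now. Agents produce hundreds of commits a day, and no one can review them all.

Peter, the developer of OpenClaw, sometimes pushes more than 3,000 commits in a single day. That is far beyond what any person can handle, and reviewing it has become a task humans simply cannot complete on their own.

At first I thought there was no solution. Then I read Ryan Carson's "Code Factory" and it clicked: instead of trying to read all the code, build a structure in which machines verify the code.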

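## Put the merge rules in a single JSON file

Write down which paths are high risk and which checks must pass, all in one file. The key benefit is that this prevents the documentation and the scripts from drifting apart.

- **High-risk paths** require the Review Agent plus browser screenshots as supporting evidence
- **Low-risk paths** can merge once they pass the policy gate and CI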
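
To make the single-file idea concrete, here is a minimal sketch of what such a contract and its lookup could look like. The field names (`riskTiers`, `requiredChecks`), the path globs, and the check names are illustrative assumptions, not Carson's actual schema.

```python
import json
from fnmatch import fnmatchcase

# Hypothetical contract: field names, globs, and check names are illustrative only.
POLICY_JSON = """
{
  "riskTiers": {
    "high": ["app/api/*", "db/schema/*"]
  },
  "requiredChecks": {
    "high": ["risk-policy-gate", "review-agent", "browser-evidence", "ci"],
    "low": ["risk-policy-gate", "ci"]
  }
}
"""


def load_policy(text: str) -> dict:
    """Parse the single source of truth that both docs and scripts read."""
    return json.loads(text)


def risk_tier(policy: dict, changed_files: list[str]) -> str:
    """Return "high" if any changed file matches a high-risk glob, else "low"."""
    high_globs = policy["riskTiers"]["high"]
    for path in changed_files:
        # fnmatchcase's "*" also matches "/" so "app/api/*" covers nested files.
        if any(fnmatchcase(path, glob) for glob in high_globs):
            return "high"
    return "low"


def required_checks(policy: dict, changed_files: list[str]) -> list[str]:
    """Look up the checks a PR must pass, based on the files it touches."""
    return policy["requiredChecks"][risk_tier(policy, changed_files)]


policy = load_policy(POLICY_JSON)
```

Because every script derives its behavior from `policy`, changing a rule means editing one file rather than hunting through workflows.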

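## Run an eligibility check before CI

Running builds on PRs that have not even passed review is just burning money. Adding a `risk-policy-gate` before the CI fan-out is enough on its own to cut unnecessary CI cost substantially.

- Fixed order: policy gate → Review Agent confirmation → CI fan-out
- Ineligible PRs never reach the test/build stage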
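
The fixed ordering can be sketched as a short-circuiting sequence of stages. This is an illustration under assumptions: the stage callables and the `pr` dictionary are hypothetical stand-ins for real workflow jobs, and in practice the ordering would usually be expressed as job dependencies in the CI system.

```python
from typing import Callable

# A stage takes a PR description and reports whether it passed.
Stage = Callable[[dict], bool]


def run_pipeline(
    pr: dict,
    policy_gate: Stage,
    review_agent: Stage,
    ci_fanout: Stage,
) -> list[str]:
    """Run stages in a fixed order and stop at the first failure.

    Returns the names of the stages that were actually executed, so a
    caller can confirm that expensive CI never ran for a rejected PR.
    """
    stages = [
        ("risk-policy-gate", policy_gate),
        ("review-agent", review_agent),
        ("ci-fanout", ci_fanout),
    ]
    executed: list[str] = []
    for name, stage in stages:
        executed.append(name)
        if not stage(pr):
            break  # later, more expensive stages are skipped
    return executed
```

For example, `run_pipeline(pr, lambda p: False, review, ci)` executes only the gate, and `ci` is never called.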

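## Never trust a "pass" from an old commit

This is the point Carson stresses most. If a "pass" status from an old commit lingers, the latest code gets merged without being verified. Re-trigger the review on every push, and block the gate if the SHA does not match.

- A Review Check Run is valid only when it matches `headSha`
- Force a re-run on every `synchronize` event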
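
A minimal sketch of the staleness check follows. The `CheckRun` record and the `review-agent` check name are assumptions for illustration. The point is only that a passing result counts when, and only when, it was produced for the current head SHA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckRun:
    name: str
    head_sha: str     # the commit this result was computed for
    conclusion: str   # e.g. "success" or "failure"


def review_is_current(
    check_runs: list[CheckRun],
    head_sha: str,
    name: str = "review-agent",  # hypothetical check name
) -> bool:
    """True only if the named check succeeded on exactly this head SHA."""
    return any(
        run.name == name
        and run.head_sha == head_sha
        and run.conclusion == "success"
        for run in check_runs
    )
```

A success recorded for an earlier commit is simply ignored, so a new push closes the gate until the review runs again.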

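## Send rerun requests from a single source

When multiple workflows request a rerun at the same time, you get duplicate comments and race conditions. It looks like a small problem, but left unsolved it makes the whole pipeline unstable.

- Use a `Marker + sha:headSha` pattern to prevent duplicates
- If a request has already been submitted for that SHA, skip it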
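
The deduplication rule can be sketched as below. The marker string is a made-up placeholder, and the comment bodies are passed in as plain strings. Fetching and posting comments through a real API is left out.

```python
# Hypothetical marker; any unique, stable string would do.
MARKER = "<!-- review-rerun-request -->"


def rerun_comment(head_sha: str) -> str:
    """Build the comment body: the marker plus the SHA it applies to."""
    return f"{MARKER}\nsha:{head_sha}"


def should_request_rerun(existing_comments: list[str], head_sha: str) -> bool:
    """Skip the request if a marked comment already exists for this SHA."""
    token = f"sha:{head_sha}"
    for body in existing_comments:
        # Compare whole lines so one SHA cannot match as a prefix of another.
        if MARKER in body and token in body.splitlines():
            return False
    return True
```

Checking before posting makes the request idempotent per commit. It narrows the race window but does not remove it, which is why the request should still come from a single workflow.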

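## Let agents handle the fixes

When the Review Agent finds a problem, the Coding Agent automatically patches it and pushes to the same branch. The sharpest insight in Carson's article: pin the model version. Otherwise results differ from run to run and reproducibility is gone.

- Codex Action fixes → push → rerun triggered
- Pinning the model version ensures reproducibility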
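
The fix loop might look roughly like this. Everything here is a sketch under assumptions: `review` and `fix` are hypothetical callables standing in for the Review Agent and the Coding Agent, the model identifier is invented, and the round limit is my own addition to keep the loop from running forever.

```python
from typing import Callable

# Hypothetical identifier. The point is to pin an exact version,
# not a floating alias that can change between runs.
PINNED_MODEL = "example-coder-2026-01-15"


def remediate(
    review: Callable[[], list[str]],
    fix: Callable[..., None],
    max_rounds: int = 3,
) -> bool:
    """Alternate review and fix until the review is clean or rounds run out.

    review() returns the current list of findings; fix(findings, model=...)
    is expected to patch the branch and push, which triggers the next review.
    """
    for _ in range(max_rounds):
        findings = review()
        if not findings:
            return True
        fix(findings, model=PINNED_MODEL)
    # Final check after the last allowed fix attempt.
    return not review()
```

Passing the pinned identifier explicitly on every call keeps the model choice visible in one place and reviewable in version control.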

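## Auto-resolve only bot-to-bot threads

Never touch a thread in which a human has participated. Without this distinction, reviewers' comments get buried.

- Auto-resolve only after the rerun on the current head passes
- Threads with human comments always stay open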
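
The two rules above reduce to a small filter. In this sketch the `Thread` record and the bot login names are assumptions. A real implementation would read threads from the code host's API.

```python
from dataclasses import dataclass

# Hypothetical bot accounts.
BOT_LOGINS = frozenset({"review-bot", "fix-bot"})


@dataclass
class Thread:
    id: str
    authors: list[str]      # login of every commenter in the thread
    resolved: bool = False


def threads_to_resolve(
    threads: list[Thread],
    head_rerun_passed: bool,
    bots: frozenset = BOT_LOGINS,
) -> list[str]:
    """Return IDs of open threads that are safe to auto-resolve."""
    if not head_rerun_passed:
        return []  # nothing is resolved until the current head is clean
    return [
        t.id
        for t in threads
        # A single human comment keeps the thread open.
        if not t.resolved and t.authors and all(a in bots for a in t.authors)
    ]
```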

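## Leave visible, verifiable evidence

If the UI changed, a screenshot alone is not enough. Require evidence that CI can verify. Turn production incidents into test cases so the same failure does not happen again.

- Regression → find the gap in the test suite → add a test case → track against an SLA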

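## Carson's tool choices

For reference, these are the tools Carson chose: Greptile as the code review agent and Codex Action as the fix agent, with three workflow files that each have one job. `greptile-rerun.yml` handles standard reruns, `greptile-auto-resolve-threads.yml` cleans up stale threads, and `risk-policy-gate.yml` runs the preflight policy check.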

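## Beyond correctness: visual verification

All of the steps above check whether the code logic is correct. In practice, you also need to verify how the output actually looks.

Two approaches are worth watching.

**Nico Bailon's visual explainer** renders terminal diffs as HTML pages instead of ASCII characters, so a change set is understandable at a glance.

**Chris Tate's agent browser** takes a different route. It compares real browser screenshots pixel by pixel to catch CSS and layout anomalies. Combined with bisect, it can pinpoint exactly which commit introduced a regression.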
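
To illustrate the idea of pixel comparison plus bisect, here is a toy sketch. It is not how the agent browser is implemented: screenshots are modeled as plain 2D lists of pixel values, and the bisect assumes the newest commit is bad and that every commit after the first bad one is also bad.

```python
from typing import Callable

Image = list[list[int]]  # toy stand-in for a decoded screenshot


def pixel_diff_ratio(a: Image, b: Image) -> float:
    """Fraction of pixels that differ between two same-sized images."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have identical dimensions")
    total = sum(len(row) for row in a)
    changed = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return changed / total if total else 0.0


def first_bad_commit(commits: list[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search commits (oldest to newest) for the first bad one.

    Assumes commits[-1] is bad and badness never reverts to good.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid + 1
    return commits[lo]
```

With `is_bad` defined as "the screenshot at this commit differs from the baseline", the search needs only about log2(n) screenshot comparisons for n commits.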

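I have been thinking about this problem while building codexBridge. Session logs alone are not enough to track which agent wrote which code. You need a structure that is easy to search.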

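## Closing thoughts

The answer to "who verifies the code agents write" is not humans. It is a structure in which machines judge the evidence that machines produce.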

## Related URLs

- Author: https://tonylee.im/zh-CN/author/
- Publication: https://tonylee.im/zh-CN/blog/about/
- Related article: https://tonylee.im/zh-CN/blog/medvi-two-person-430m-ai-compressed-funnel/
- Related article: https://tonylee.im/zh-CN/blog/claude-code-layers-over-tools-2026/
- Related article: https://tonylee.im/zh-CN/blog/codex-inside-claude-code-openai-plugin-strategy/

## Citation

- Author: Tony Lee
- Site: tonylee.im
- Canonical URL: https://tonylee.im/zh-CN/blog/7-step-pipeline-verify-agent-written-code/

## Bot Guidance

- This file is intended for AI agents, search assistants, and text-mode retrieval.
- Prefer citing the canonical article URL instead of this text endpoint.
- Use the rollout alternates when you need the same article in another prioritized language.

---

Author: Tony Lee | Website: https://tonylee.im
For more articles, visit: https://tonylee.im/zh-CN/blog/
This content is original and authored by Tony Lee. Please attribute when quoting or referencing.