# A 7-Step Pipeline for Verifying Agent-Written Code

> Author: Tony Lee
> Published: 2026-02-25
> URL: https://tonylee.im/zh-HK/blog/7-step-pipeline-verify-agent-written-code/
> Reading time: 2 minutes
> Language: zh-HK
> Tags: ai, code-review, ai-agent, ci-cd, devops, automation

## Canonical

https://tonylee.im/zh-HK/blog/7-step-pipeline-verify-agent-written-code/

## Rollout Alternates

en: https://tonylee.im/en/blog/7-step-pipeline-verify-agent-written-code/
ko: https://tonylee.im/ko/blog/7-step-pipeline-verify-agent-written-code/
ja: https://tonylee.im/ja/blog/7-step-pipeline-verify-agent-written-code/
zh-CN: https://tonylee.im/zh-CN/blog/7-step-pipeline-verify-agent-written-code/
zh-TW: https://tonylee.im/zh-TW/blog/7-step-pipeline-verify-agent-written-code/

## Description

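When agents push 3,000 commits a day, humans cannot review them all. This article shows how to build a pipeline in which machines verify the code automatically and catch the problems humans miss.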

## Summary

"A 7-Step Pipeline for Verifying Agent-Written Code" is part of Tony Lee's ongoing coverage of AI agents, developer tools, startup strategy, and AI industry shifts.

## Outline

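- Write the merge rules into a single JSON file
- Run an eligibility check before CI
- Never trust a "pass" from an old commit
- Issue rerun requests from a single source
- Let the agent handle the fix as well
- Auto-resolve only bot-to-bot threads
- Leave clear, verifiable evidence
- Carson's tool choices
- Beyond correctness: visual verification
- Summary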

## Content

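This is the hottest topic right now. Agents generate hundreds of commits a day, and no one has the capacity to review each one.

Peter, the developer of OpenClaw, sometimes pushes more than 3,000 commits in a single day. That is far beyond the limit of any manual review; it is a task humans cannot handle alone.

At first I thought there was no solution. Then I read Ryan Carson's "Code Factory" and it clicked. Instead of trying to read every commit, build a structure in which machines verify the code.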

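## Write the merge rules into a single JSON file

Record which paths are high-risk and which checks must pass in one file. The point is to keep the documentation and the scripts from drifting apart.

- **High-risk paths** require the Review Agent plus browser screenshots as evidence
- **Low-risk paths** can merge once they pass the policy gate and CI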
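
To make the single-file idea concrete, here is a minimal sketch of how a script might read such a policy and derive the required checks for a PR. The key names (`highRiskPaths`, `requiredChecks`), the path patterns, and the check names are all assumptions made for illustration; Carson's actual schema is not shown in this article.

```python
import json
from fnmatch import fnmatchcase

# Hypothetical policy contents. In practice this would be one JSON file in the
# repository that both the docs and the gate scripts read.
POLICY_JSON = """
{
  "highRiskPaths": ["src/payments/*", "migrations/*"],
  "requiredChecks": {
    "high": ["risk-policy-gate", "review-agent", "browser-evidence", "ci"],
    "low": ["risk-policy-gate", "ci"]
  }
}
"""


def risk_tier(changed_files, policy):
    """Return "high" if any changed file matches a high-risk pattern, else "low"."""
    for path in changed_files:
        if any(fnmatchcase(path, pattern) for pattern in policy["highRiskPaths"]):
            return "high"
    return "low"


def required_checks(changed_files, policy):
    """Look up the checks that must pass for this set of changed files."""
    return policy["requiredChecks"][risk_tier(changed_files, policy)]


policy = json.loads(POLICY_JSON)
print(required_checks(["src/payments/refund.py"], policy))
print(required_checks(["docs/readme.md"], policy))
```

Because every script derives its answer from the same parsed object, changing a rule means editing one file rather than hunting for copies.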

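## Run an eligibility check before CI

Running a build on a PR that has not even passed review is a waste of money. Adding a `risk-policy-gate` before CI fanout is enough, on its own, to cut unnecessary CI spend substantially.

- Fixed order: policy gate → Review Agent confirmation → CI fanout
- PRs that fail never reach the test/build stage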
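
The fixed ordering can be sketched as a short-circuiting runner. This is only an illustration of the control flow: the stage names follow the list above, while the `run_pipeline` helper and the lambda checks are hypothetical stand-ins for real workflow jobs.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure, so later stages never start."""
    executed = []
    for name, check in stages:
        executed.append(name)
        if not check():
            break
    return executed


stages = [
    ("risk-policy-gate", lambda: True),
    ("review-agent", lambda: False),  # the review rejects this PR
    ("ci-fanout", lambda: True),      # expensive builds and tests
]
print(run_pipeline(stages))  # ['risk-policy-gate', 'review-agent']
```

The expensive `ci-fanout` stage is never reached for the rejected PR, which is where the savings come from.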

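## Never trust a "pass" from an old commit

This is the point Carson stresses most. If a "pass" status from an old commit lingers, the latest code gets merged without being verified. Re-run the review on every push, and block the gate if the result does not match the current head.

- A Review Check Run counts only when it matches `headSha`
- Force a re-run on every `synchronize` event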
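
A minimal sketch of the staleness check, assuming the gate has already fetched the check runs for the PR. The `CheckRun` dataclass and the `review-agent` check name are illustrative; the field names are modeled on what a GitHub check run reports, but the real payload should be confirmed against the API documentation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CheckRun:
    name: str
    head_sha: str
    conclusion: str  # for example "success" or "failure"


def review_is_current(check_runs: List[CheckRun], head_sha: str, name: str = "review-agent") -> bool:
    """True only if the named check succeeded on the PR's current head commit."""
    return any(
        run.name == name and run.head_sha == head_sha and run.conclusion == "success"
        for run in check_runs
    )


runs = [CheckRun("review-agent", "abc123", "success")]
print(review_is_current(runs, "abc123"))  # True
print(review_is_current(runs, "def456"))  # False: a new push moved the head
```

The second call shows the failure mode the rule guards against: the old success is still present, but it no longer describes the code that would be merged.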

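## Issue rerun requests from a single source

When several workflows issue rerun requests at once, you get duplicate comments and race conditions. It looks trivial, but left unhandled it destabilizes the whole pipeline.

- Use the `Marker + sha:headSha` pattern to prevent duplicates
- If a request has already been submitted for that SHA, skip it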
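
One way to read the `Marker + sha:headSha` pattern is as an idempotency key embedded in the comment body. The sketch below assumes the existing PR comments are available as plain strings; the marker text and the comment wording are hypothetical.

```python
from typing import Iterable, Optional

MARKER = "<!-- review-rerun-request -->"  # hypothetical marker string


def rerun_comment(head_sha: str) -> str:
    """Build a rerun request that carries the marker and the SHA it applies to."""
    return f"{MARKER}\nsha:{head_sha}\nPlease re-review this commit."


def maybe_request_rerun(existing_comments: Iterable[str], head_sha: str) -> Optional[str]:
    """Return the comment to post, or None if this SHA was already requested."""
    token = f"sha:{head_sha}"
    for body in existing_comments:
        # Match the token as a whole line so one SHA cannot match a longer one by prefix.
        if MARKER in body and token in body.splitlines():
            return None
    return rerun_comment(head_sha)


first = maybe_request_rerun([], "abc123")
print(first is not None)                       # True: nothing posted yet
print(maybe_request_rerun([first], "abc123"))  # None: already requested for this SHA
```

This check only removes duplicates that are already visible. Two workflows running at the same instant can still both pass it, which is why the rerun request should have a single owner in the first place.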

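## Let the agent handle the fix as well

When the Review Agent finds a problem, the Coding Agent patches it immediately and pushes to the same branch. The sharpest insight in Carson's article: pin the model version. Otherwise the results differ on every run and reproducibility is gone.

- Codex Action fixes → push → trigger rerun
- Pin the model version to ensure reproducibility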
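
The fix, push, rerun cycle can be sketched as a bounded loop with the model identifier held constant. Everything here is an assumption for illustration: the `review` and `fix` callables stand in for the review and coding agents, the model ID is made up, and the attempt limit is not something the article specifies.

```python
from typing import Callable, List

# Hypothetical identifier. The point is a fixed version rather than a floating alias.
PINNED_MODEL = "example-model-2026-01-15"


def remediate(
    review: Callable[[str], List[str]],
    fix: Callable[[str, List[str], str], str],
    head_sha: str,
    max_attempts: int = 3,
) -> bool:
    """Review the head, let the coding agent fix findings, then re-review each new head."""
    for _ in range(max_attempts):
        findings = review(head_sha)
        if not findings:
            return True
        # The fix produces a new commit on the same branch, so the head moves.
        head_sha = fix(head_sha, findings, PINNED_MODEL)
    # The last fix has not been reviewed yet, so check it once more.
    return not review(head_sha)


# Fake agents for demonstration: one finding on "abc", none after the fix.
findings_by_sha = {"abc": ["unchecked null"], "abc+fix": []}
print(remediate(lambda sha: findings_by_sha.get(sha, []),
                lambda sha, findings, model: sha + "+fix",
                "abc"))  # True
```

Passing the same `PINNED_MODEL` on every attempt is what keeps two runs over the same findings comparable.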

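## Auto-resolve only bot-to-bot threads

Never touch a thread that a human has taken part in. Without this distinction, reviewers' comments get buried.

- Auto-resolve only after the rerun on the current head passes
- Threads with human comments always stay open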
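
Both rules fit in one small filter. The sketch below assumes review threads have already been loaded into simple records; the `Thread` and `Comment` shapes and the `is_bot` flag are hypothetical, and how a bot account is detected depends on the platform.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Comment:
    author: str
    is_bot: bool


@dataclass
class Thread:
    comments: List[Comment]
    resolved: bool = False


def threads_to_resolve(threads: List[Thread], head_rerun_passed: bool) -> List[Thread]:
    """Select unresolved threads in which every comment was written by a bot."""
    if not head_rerun_passed:
        return []
    return [
        t for t in threads
        if not t.resolved and t.comments and all(c.is_bot for c in t.comments)
    ]


bot_only = Thread([Comment("review-bot", True), Comment("fix-bot", True)])
with_human = Thread([Comment("review-bot", True), Comment("alice", False)])
print(len(threads_to_resolve([bot_only, with_human], head_rerun_passed=True)))   # 1
print(len(threads_to_resolve([bot_only, with_human], head_rerun_passed=False)))  # 0
```

A single human comment anywhere in the thread is enough to exclude it, which errs on the side of leaving discussions open.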

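## Leave clear, verifiable evidence

If the UI changed, a screenshot alone is not enough. Require evidence that CI can verify. Turn production incidents into test cases so the same failure does not happen again.

- Regression → find the harness gap → add a test case → track against SLA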

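## Carson's tool choices

For reference, these are the tools Carson uses: Greptile as the code review agent and Codex Action as the fix agent, plus three workflow files, each with its own job. `greptile-rerun.yml` handles standard reruns, `greptile-auto-resolve-threads.yml` cleans up stale threads, and `risk-policy-gate.yml` runs the preflight policy check.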

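## Beyond correctness: visual verification

The steps above address whether the code is correct. In practice, you also need to verify how the output looks.

Two approaches are worth noting.

**Nico Bailon's visual-explainer** renders terminal diffs as HTML pages instead of ASCII, so a changeset can be read at a glance.

**Chris Tate's agent-browser** takes a different route. It compares actual browser renders pixel by pixel to catch CSS and layout problems. Combined with bisect, it can pinpoint exactly which commit introduced a regression.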
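
The combination of pixel comparison and bisect can be illustrated in a few lines. This is a generic sketch of the technique, not agent-browser's API: images are modeled as nested lists of grayscale values, the renders are hard-coded, and the search assumes that once a commit breaks the layout, every later commit stays broken.

```python
from typing import Callable, List, Sequence

Image = Sequence[Sequence[int]]  # rows of grayscale pixel values


def changed_pixels(a: Image, b: Image) -> int:
    """Count positions where two same-sized images differ."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def first_bad_commit(commits: List[str], looks_good: Callable[[str], bool]) -> str:
    """Binary-search an ordered history where commits[0] is good and commits[-1] is bad."""
    lo, hi = 0, len(commits) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if looks_good(commits[mid]):
            lo = mid
        else:
            hi = mid
    return commits[hi]


baseline = [[0, 0], [0, 0]]
# Hypothetical renders per commit, oldest first.
renders = {"c1": baseline, "c2": baseline, "c3": [[0, 9], [0, 0]], "c4": [[0, 9], [9, 0]]}
culprit = first_bad_commit(list(renders), lambda c: changed_pixels(baseline, renders[c]) == 0)
print(culprit)  # c3
```

Real screenshots usually need a tolerance threshold rather than an exact match, since anti-aliasing and font rendering can shift individual pixels between runs.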

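I have been thinking about this problem while building codexBridge. Session logs alone are not enough to trace which agent wrote which piece of code; you need a search structure that is easy to query.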

## Summary

The answer to "who verifies the code agents write?" is not humans but a structure in which machines judge evidence produced by machines.

## Related URLs

- Author: https://tonylee.im/en/author/
- Publication: https://tonylee.im/en/blog/about/
- Related article: https://tonylee.im/zh-HK/blog/medvi-two-person-430m-ai-compressed-funnel/
- Related article: https://tonylee.im/zh-HK/blog/claude-code-layers-over-tools-2026/
- Related article: https://tonylee.im/zh-HK/blog/codex-inside-claude-code-openai-plugin-strategy/

## Citation

- Author: Tony Lee
- Site: tonylee.im
- Canonical URL: https://tonylee.im/zh-HK/blog/7-step-pipeline-verify-agent-written-code/

## Bot Guidance

- This file is intended for AI agents, search assistants, and text-mode retrieval.
- Prefer citing the canonical article URL instead of this text endpoint.
- Use the rollout alternates when you need the same article in another prioritized language.

---

Author: Tony Lee | Website: https://tonylee.im
For more articles, visit: https://tonylee.im/zh-HK/blog/
This content is original and authored by Tony Lee. Please attribute when quoting or referencing.