# I Reviewed 300 Agent Failure Logs, and the Problem Was Never the Prompt.

> Author: Tony Lee
> Published: 2026-02-26
> URL: https://tonylee.im/zh-HK/blog/context-engineering-agent-skills-10k-github-stars/
> Reading time: 1 minute
> Language: zh-HK
> Tags: ai, ai-agents, context-engineering, open-source, multi-agent, evaluation

## Canonical

https://tonylee.im/zh-HK/blog/context-engineering-agent-skills-10k-github-stars/

## Rollout Alternates

en: https://tonylee.im/en/blog/context-engineering-agent-skills-10k-github-stars/
ko: https://tonylee.im/ko/blog/context-engineering-agent-skills-10k-github-stars/
ja: https://tonylee.im/ja/blog/context-engineering-agent-skills-10k-github-stars/
zh-CN: https://tonylee.im/zh-CN/blog/context-engineering-agent-skills-10k-github-stars/
zh-TW: https://tonylee.im/zh-TW/blog/context-engineering-agent-skills-10k-github-stars/

## Description

An open-source context engineering skillset just passed 10,000 GitHub stars. After applying it to my own agent stack, I finally understood why agents fail.

## Summary

"I Reviewed 300 Agent Failure Logs, and the Problem Was Never the Prompt." is part of Tony Lee's ongoing coverage of AI agents, developer tools, startup strategy, and AI industry shifts.

## Outline

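- A smaller context window is actually more accurate
- Tool descriptions determine 80% of agent performance
- Multi-agent systems need architecture first, agents second
- Vector search alone cannot handle memory management
- Evaluation is the most underrated agent skill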

## Content

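Three hundred agent failure logs. I spent two weeks going through them one by one and sorting them by root cause. The result surprised me: prompt problems accounted for only about 12%. The rest all traced back to context, which was polluted, overflowing, or missing entirely. Swapping models did not help, and neither did swapping tools. The same pattern showed up every time.

I have been studying context engineering for a while, so when an open-source project called [Agent Skills for Context Engineering](https://github.com/muratcankoylan/Agent-Skills-for-Context-Engineering) appeared out of nowhere and quickly passed 10,000 GitHub stars, I noticed right away. The project is MIT-licensed, built by a context engineer named Muratcan Koylan, and cited in a paper from an AI lab at Peking University. That last point is what convinced me to clone it and take a close look.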

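## A smaller context window is actually more accurate

I had always assumed that packing more tokens into the context could only help. I was wrong. The first principle of this skillset is "information density, not information volume."

As context grows longer, the model gradually loses track of what sits in the middle. This is the U-shaped curve effect, often called "lost in the middle": the model handles the beginning and the end well but skips over the middle. I tested this myself by filling the context to 128K tokens and then compressing the same information into 32K. The compressed version scored higher on accuracy.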

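Processing cost does not grow linearly with token count. Self-attention compute grows roughly quadratically with sequence length, so long inputs get expensive quickly. When I cut the context in half, response latency dropped by 40 to 60%. Even with prefix caching, long inputs remain expensive. In one sentence: what matters is how much useful information you can fit into a given token budget.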
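To make the token-budget idea concrete, here is a minimal sketch of packing by density rather than volume. It is not taken from the skillset. The `Snippet` type, the `relevance` score, and the 4-characters-per-token estimate are all assumptions made for illustration, and a real pipeline would use the model's own tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    text: str
    relevance: float  # hypothetical score in [0, 1], e.g. from a reranker


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    # Swap in a real tokenizer for anything beyond a sketch.
    return max(1, len(text) // 4)


def pack_context(snippets: list[Snippet], budget: int) -> list[Snippet]:
    """Greedily keep the snippets with the highest relevance per token."""
    ranked = sorted(
        snippets,
        key=lambda s: s.relevance / estimate_tokens(s.text),
        reverse=True,
    )
    chosen: list[Snippet] = []
    used = 0
    for snippet in ranked:
        cost = estimate_tokens(snippet.text)
        if used + cost <= budget:  # skip anything that would overflow the budget
            chosen.append(snippet)
            used += cost
    return chosen
```

With this ranking, a long snippet with a slightly higher relevance score can still lose to two short ones, which is the "density, not volume" trade-off in miniature.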

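## Tool descriptions determine 80% of agent performance

However perfect the prompt, an agent with sloppy tool descriptions will still pick the wrong tool. The skillset puts it well: "Tools are contracts that the LLM reads, not something written for humans." When my team built an MCP server, we rewrote our tool descriptions following this guide, and tool-selection errors dropped noticeably.

Every tool description needs to state when to use the tool and what it returns. When two tools overlap in function, humans already get confused, and an agent gets more confused still. One comprehensive tool usually beats several narrow ones. Error messages should also tell the agent what to do next, not just what went wrong.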
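As an illustration of those rules, here is a rough sketch of a tool definition and an actionable error. Everything in it is hypothetical: `search_orders`, `lookup_customer`, the field names, and the JSON-Schema-style `parameters` layout are assumptions for this example, not the skillset's format or my team's actual MCP server.

```python
# Hypothetical tool definition: the description says when to use the tool,
# when not to, and what comes back.
SEARCH_ORDERS_TOOL = {
    "name": "search_orders",
    "description": (
        "Look up a customer's orders. Use this when the user asks about "
        "order status, order history, or delivery dates. Do not use it for "
        "refunds. Returns a list of orders, each with order_id, status, "
        "and placed_at."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "customer_id": {
                "type": "string",
                "description": "Internal customer id, not an email address.",
            },
        },
        "required": ["customer_id"],
    },
}


def search_orders(customer_id: str, orders_db: dict) -> dict:
    """Toy implementation backed by an in-memory dict (assumed for the demo)."""
    if customer_id not in orders_db:
        return {
            "ok": False,
            "error": f"No customer with id '{customer_id}'.",
            # Tell the agent how to recover, not only what failed.
            "next_step": (
                "Call lookup_customer with the user's email to get a valid "
                "customer_id, then retry search_orders."
            ),
        }
    return {"ok": True, "orders": orders_db[customer_id]}
```

The `next_step` field is the part that matters: it gives the agent a recovery path instead of leaving it to guess or retry the same call.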

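## Multi-agent systems need architecture first, agents second

Expecting multiple agents to collaborate automatically just because you launched them is wishful thinking. The repo clearly defines three patterns: an orchestrator directing subordinate agents, a peer-to-peer model where agents talk to each other as equals, and a hierarchical delegation chain.

After trying all three in production, I found the orchestrator pattern the most predictable and the easiest to debug. Subordinate agents hand off results through the file system. The peer-to-peer model works better for creative tasks but risks getting stuck in infinite loops. For structured queries, shared files are more practical than vector search. In practice, I found three agents to be the upper limit for stability.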
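The orchestrator pattern with file-based handoff can be sketched roughly as follows. This is an assumption-heavy toy, not the repo's implementation: each "sub-agent" is a plain Python function standing in for an LLM call, and `run_orchestrator`, `MAX_SUBAGENTS`, and the one-JSON-file-per-agent layout are names and choices I made up for the example.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

# Stand-in for an LLM-backed sub-agent: takes a task string, returns a result dict.
SubAgent = Callable[[str], dict]

MAX_SUBAGENTS = 3  # the stability ceiling I observed; tune for your own stack


def run_orchestrator(
    tasks: dict[str, str],
    agents: dict[str, SubAgent],
    workdir: Path,
) -> dict[str, dict]:
    """Run each sub-agent once and collect results through files in workdir."""
    if len(tasks) > MAX_SUBAGENTS:
        raise ValueError(f"at most {MAX_SUBAGENTS} sub-agents per orchestrator")

    # Phase 1: every sub-agent writes its result to its own file.
    for name, task in tasks.items():
        result = agents[name](task)
        (workdir / f"{name}.json").write_text(json.dumps(result), encoding="utf-8")

    # Phase 2: the orchestrator reads the files back instead of sharing context.
    return {
        name: json.loads((workdir / f"{name}.json").read_text(encoding="utf-8"))
        for name in tasks
    }


def demo() -> dict[str, dict]:
    agents = {
        "research": lambda task: {"task": task, "status": "done"},
        "summarize": lambda task: {"task": task, "status": "done"},
    }
    tasks = {"research": "collect failure logs", "summarize": "group by root cause"}
    with tempfile.TemporaryDirectory() as tmp:
        return run_orchestrator(tasks, agents, Path(tmp))
```

Because every handoff is a file on disk, debugging a bad run is mostly a matter of opening the file a sub-agent wrote, which is a large part of why this pattern felt predictable.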

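## Vector search alone cannot handle memory management

Vector search easily finds "customer X bought product Y on a given date." But it cannot answer "what else did customers who bought product Y buy?" Relational information gets lost in embeddings.

The skillset proposes a four-layer memory architecture: working memory inside the context window, short-term memory within a session, long-term memory across sessions, and permanent memory as an archive. Of the approaches I tested, the "file system as memory" pattern was the most practical. You navigate context with `ls` and `grep` rather than embedding queries. Dumping tool results into a scratch file saved a considerable amount of context window space.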
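Here is a small sketch of the scratch-file idea, assuming a plain-text tool result. The helper names `offload_tool_result` and `grep_scratch` are hypothetical and the `grep`-like output format is my own choice. The point is only that the agent's context receives a short pointer and preview while the full payload stays on disk.

```python
import re
import tempfile
from pathlib import Path


def offload_tool_result(
    result: str, scratch_dir: Path, name: str, preview_lines: int = 3
) -> str:
    """Write the full tool output to a file; return a short pointer for the context."""
    path = scratch_dir / f"{name}.txt"
    path.write_text(result, encoding="utf-8")
    lines = result.splitlines()
    preview = "\n".join(lines[:preview_lines])
    return f"Saved {len(lines)} lines to {path.name}. Preview:\n{preview}"


def grep_scratch(scratch_dir: Path, pattern: str) -> list[str]:
    """Search every scratch file, returning 'file:line_number:text' like grep -n."""
    regex = re.compile(pattern)
    hits: list[str] = []
    for path in sorted(scratch_dir.glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append(f"{path.name}:{lineno}:{line}")
    return hits
```

A 100-line tool result then costs the context a four-line pointer, and the agent pulls in only the matching lines when it needs them.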

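## Evaluation is the most underrated agent skill

I almost skipped this chapter, and it turned out to be the most valuable part. The repo ships with a TypeScript evaluation framework that uses LLMs as judges. It can even generate scoring rubrics automatically.

What impressed me most was the position-bias mitigation. When comparing two responses side by side, the framework evaluates twice with the order swapped, to offset the tendency to score the answer shown first more highly. It supports both direct scoring and pairwise comparison. Once I had an evaluation pipeline in place, I could finally measure whether a prompt change actually improved performance instead of guessing.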
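The swap-and-judge-twice idea is simple enough to sketch. Note that the repo's framework is written in TypeScript. What follows is my own Python approximation, not its code. The `Judge` signature is an assumption, and treating a disagreement between the two passes as a `"tie"` is one possible policy among several.

```python
from typing import Callable

# Assumed judge interface: (prompt, first_response, second_response) -> "first" | "second".
# In practice this would wrap an LLM call; here it can be any function.
Judge = Callable[[str, str, str], str]


def compare_debiased(prompt: str, a: str, b: str, judge: Judge) -> str:
    """Return "a", "b", or "tie" after judging the pair in both orders."""
    first_pass = judge(prompt, a, b)   # A is shown first
    second_pass = judge(prompt, b, a)  # B is shown first

    a_wins_first = first_pass == "first"
    a_wins_second = second_pass == "second"

    if a_wins_first and a_wins_second:
        return "a"
    if not a_wins_first and not a_wins_second:
        return "b"
    # The verdict flipped with the order, so position likely drove it.
    return "tie"


def always_first(prompt: str, x: str, y: str) -> str:
    # A maximally position-biased judge, useful for checking the mitigation.
    return "first"


def prefers_longer(prompt: str, x: str, y: str) -> str:
    # A judge with a real (if naive) preference that ignores position.
    return "first" if len(x) >= len(y) else "second"
```

A judge that always picks whichever answer comes first ends up with a tie under this scheme, while a judge with a consistent preference gives the same winner in both orders.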

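One thing this repo does not solve: evaluation rubrics still need manual calibration. The auto-generated rubrics are a reasonable starting point, but I had to adjust the scoring weights for my own domain before the results became trustworthy.

When your agent gets something wrong, check the context before you blame the model. [The repo is here](https://github.com/muratcankoylan/Agent-Skills-for-Context-Engineering).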

## Related URLs

- Author: https://tonylee.im/en/author/
- Publication: https://tonylee.im/en/blog/about/
- Related article: https://tonylee.im/zh-HK/blog/eight-hooks-that-guarantee-ai-agent-reliability/
- Related article: https://tonylee.im/zh-HK/blog/medvi-two-person-430m-ai-compressed-funnel/
- Related article: https://tonylee.im/zh-HK/blog/claude-code-layers-over-tools-2026/

## Citation

- Author: Tony Lee
- Site: tonylee.im
- Canonical URL: https://tonylee.im/zh-HK/blog/context-engineering-agent-skills-10k-github-stars/

## Bot Guidance

- This file is intended for AI agents, search assistants, and text-mode retrieval.
- Prefer citing the canonical article URL instead of this text endpoint.
- Use the rollout alternates when you need the same article in another prioritized language.

---

Author: Tony Lee | Website: https://tonylee.im
For more articles, visit: https://tonylee.im/zh-HK/blog/
This content is original and authored by Tony Lee. Please attribute when quoting or referencing.