31 AI Coding Agent Terms You Should Know, Sorted Into Five Pillars
I classified every term I kept encountering while using Claude Code and Codex daily. Five groups emerged, and they map the entire system these tools run on.
Every week a new term shows up in my feed. Context engineering. Harness engineering. RLM. Progressive disclosure. I use AI coding agents daily, and the vocabulary was growing faster than my understanding of it.
So I stopped and sorted all 31 terms I had collected into groups. Five pillars emerged, and once I saw them, the entire architecture of tools like Claude Code and Codex made sense in a way it hadn’t before.
The five pillars follow a logical sequence: you design what the agent sees, you divide the work across agents, you control how they execute, you help them remember across sessions, and you connect them to the outside world.
Design. Divide. Control. Remember. Connect.
Designing What the Agent Sees
An AI model processes exactly one thing: its context window. Every system prompt, user instruction, attached file, conversation history entry, memory block, and loaded skill gets concatenated into a single stream of tokens. That stream is the model’s entire universe. AGENTS.md, the file that many teams use to configure agent behavior, is just another piece of that stream.
Prompt is the direct instruction you give the model. Prompt engineering is the practice of designing those instructions, including examples and output formats, to get reliable results. These two terms are well established, but they only cover a fraction of what actually enters the model.
Context is everything the model can reference: system prompts, conversation history, attached files, memory, skills, and tool outputs combined. Context engineering is the discipline of deciding what goes in, what stays out, and in what order. The difference matters. I have seen identical prompts produce wildly different results depending on whether a 2,000-line file was placed before or after the instruction. Order is not cosmetic.
Intent is the user’s actual goal, which may differ from what they literally type. When you write “fix the tests,” the intent might be “make CI green” or “refactor the test suite to match the new API.” Agent routing starts here, and getting intent wrong cascades through everything downstream.
Skill is a reusable bundle of expert instructions that loads into context when invoked. Think of it as a function for prompts. Instead of pasting the same 200-line instruction every time you want a specific behavior, you call /refactor-clean and the skill’s content enters the context window on demand.
Progressive disclosure is the design pattern where you do not load all skills into context at once. Instead, the agent loads only the skill it needs at the moment. Anthropic published this approach in their skills blog post. It matters because context window space is finite. Loading 40 skills upfront burns tokens before the model even starts working. Progressive disclosure keeps the window lean and the model focused.
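The pattern above can be sketched in a few lines. This is a minimal, hypothetical skill registry (the class and file layout are my own invention, not any tool's actual API): only skill names are surfaced upfront, and a skill's full text enters context only when invoked.

```python
from pathlib import Path

class SkillRegistry:
    """Lazily loads skill files into context only when invoked."""

    def __init__(self, skills_dir: str):
        self.skills_dir = Path(skills_dir)
        # Only the names are known upfront; the bodies stay on disk.
        self.available = {p.stem: p for p in self.skills_dir.glob("*.md")}

    def manifest(self) -> str:
        # A short listing costs a handful of tokens instead of
        # the full text of every skill.
        return "Available skills: " + ", ".join(sorted(self.available))

    def load(self, name: str) -> str:
        # The full skill content enters the context window on demand.
        return self.available[name].read_text()
```

The manifest is what lives in context permanently; the `load` call is what happens when you type something like /refactor-clean.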
The failure mode I hit repeatedly early on: stuffing too much into context and wondering why the model’s output quality degraded. The 200K context window is a theoretical maximum. In practice, once you account for system prompts, MCP server definitions, and conversation history, usable space can drop to 70K or less. Context engineering is about respecting that constraint.
Dividing Work Across Agents
A single agent handling everything sounds simple until the context window fills up and output quality drops. This is why multi-agent architectures exist.
A subagent is a child process that a main agent delegates work to. The main agent keeps its own context clean by offloading specialized tasks. In Claude Code, when you launch a background research task, that is a subagent operating in its own context window and returning only the result.
A swarm is a pattern where multiple agents work in parallel on different parts of the same problem. If you need to analyze five files simultaneously, a swarm lets five agents each handle one file instead of one agent processing them sequentially.
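A toy version of the five-files case, assuming each agent call is wrapped in an ordinary function (`analyze` here stands in for a real model call with its own context window):

```python
from concurrent.futures import ThreadPoolExecutor

def analyze(path: str) -> str:
    # Placeholder for one agent's work on a single file; in practice
    # this would be a model call operating in its own context window.
    return f"summary of {path}"

def swarm_analyze(paths: list[str]) -> dict[str, str]:
    # One worker per file, all running in parallel.
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        return dict(zip(paths, pool.map(analyze, paths)))
```

The coordination cost is real, though: the results still have to be merged by something, usually the main agent.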
Fleet is the operational view of your running agents. It is a management term, not an architecture term. When you have three subagents and two background agents active, that collection is your fleet.
Handoff is the transfer of work from one agent (or person) to another. In sequential workflows, Agent A completes its phase and hands off to Agent B. The important detail is what gets transferred: just the output, or the full context? Most handoffs transfer a summary, which means information loss is possible and should be accounted for.
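One way to make the "what gets transferred" question concrete is to model the handoff as a data structure. This is an illustrative sketch, not any framework's real payload format:

```python
from dataclasses import dataclass, field

@dataclass
class Handoff:
    """What Agent B receives when Agent A finishes its phase."""
    task: str
    summary: str                              # lossy by design
    artifacts: list[str] = field(default_factory=list)  # paths B can inspect

def hand_off(task: str, steps: list[str]) -> Handoff:
    # Only the last few steps survive the compression; earlier
    # detail is lost unless it was written to an artifact.
    return Handoff(task=task, summary="; ".join(steps[-3:]))
```

Anything not in `summary` or reachable through `artifacts` is gone from Agent B's perspective, which is exactly the information loss the paragraph above warns about.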
A background agent runs asynchronously without user interaction. GitHub’s Copilot Workspace and Anthropic’s Claude Code both support this pattern. You describe a task, close your laptop, and the agent works independently. The results appear when you return.
The trap I fell into: splitting work across too many agents too early. A single agent with well-designed context handles 80% of tasks better than a poorly coordinated multi-agent setup. Split only when you have evidence that a single agent is hitting context limits or quality degradation.
Controlling How Agents Execute
An agent that generates correct code is useless if it also silently calls dangerous tools or modifies files it should not touch. Control is the third pillar, and it is the one most teams underinvest in.
Harness is the operational frame that wraps an agent’s execution, verification, and lifecycle. It includes everything from permission checks to output validation to retry logic. Harness engineering is designing the constraints and feedback loops within that frame. OpenAI brought this term into the mainstream when they published how Codex generated over a million lines of code with structured harness patterns.
Trace is the execution log of every step and decision an agent made. I started taking traces seriously after discovering that an agent was calling a web search tool 14 times per task when it only needed the information once. Without the trace, I would have assumed the agent was working efficiently. Traces are the closest thing to debugging for AI agents.
Diff is the comparison of code before and after an agent’s changes. Together with traces, diffs form the verification backbone. You cannot review what you cannot see, and diffs make agent changes reviewable in the same way pull requests make human changes reviewable.
Guardrails are rules and validation checks that prevent dangerous outputs. They can be as simple as “never execute shell commands containing rm -rf” or as sophisticated as content classifiers that block sensitive data from appearing in outputs.
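The simple end of that spectrum fits in a dozen lines. A hypothetical deny-list check (a real harness would combine patterns with allow-lists and classifiers):

```python
import re

# Hypothetical deny-list; patterns chosen for illustration only.
DANGEROUS = [
    re.compile(r"rm\s+-rf\s+/"),
    re.compile(r"curl\s+[^|]*\|\s*sh"),  # piping a download straight into a shell
]

def check_command(cmd: str) -> bool:
    """Return True if the command passes the guardrail."""
    return not any(p.search(cmd) for p in DANGEROUS)
```

The point is where this runs: inside the harness, between the agent proposing a tool call and the tool actually executing.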
A sandbox is an isolated execution environment with restricted permissions. Codex runs inside a Docker sandbox where the agent can write code and run tests but cannot access the network or modify the host system. This is the difference between “the agent made a mistake” and “the agent made a mistake that affected production.”
The CLI (command-line interface) is having a resurgence in the agent era. Running tools through a terminal turns out to be more token-efficient than routing through protocol layers. When every token costs money and consumes context space, the directness of the CLI matters.
REPL (read-eval-print loop) is an interactive environment for executing code immediately. Agents use REPLs to test hypotheses, validate intermediate results, and iterate on solutions without writing files to disk first.
Remembering Across Sessions
Large language models have a hard boundary: the context window. When it fills up, older content gets evicted. For tasks that span hours or days, this creates a real problem.
Memory is any system that stores conversation history and task state beyond a single context window. Memory hierarchy organizes these stores into layers, typically short-term (current conversation), medium-term (recent sessions), and long-term (persistent knowledge). The design parallels CPU cache hierarchies for the same reason: different access patterns need different storage strategies.
Embeddings convert text into numerical vectors that capture semantic meaning. They are the foundation of RAG (retrieval-augmented generation), where an agent searches a vector database to pull relevant information into its context window. When your agent “remembers” something from a previous session, it is usually performing an embedding-based similarity search.
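That similarity search reduces to comparing vectors. A toy sketch with hand-made 3-dimensional "embeddings" (real ones come from an embedding model and have hundreds or thousands of dimensions):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vector store mapping remembered text to its embedding.
store = {
    "auth flow uses JWT": [0.9, 0.1, 0.0],
    "tests run on CI":    [0.1, 0.8, 0.2],
}

def retrieve(query_vec: list[float]) -> str:
    # Nearest-neighbor search: the "memory" pulled back into context.
    return max(store, key=lambda text: cosine(store[text], query_vec))
```

When the agent "remembers," it is running a query embedding through something like `retrieve` and injecting the winner into the context window.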
A long-running agent maintains state across multiple context windows, working on tasks that take longer than a single session. This requires external state management because the model itself has no persistent memory.
The Ralph Loop, created by Geoffrey Huntley, is an autonomous coding loop that solves the memory problem pragmatically. Each iteration starts a fresh agent instance, but progress is persisted through git commits and progress files. The new instance reads the git history and progress notes to understand what has been done, then continues from there. It maximizes test-time scaling by iterating repeatedly, with each loop benefiting from the accumulated context in the repository itself.
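The loop's skeleton is simple enough to sketch. This is a schematic reading of the pattern, not Huntley's actual implementation; `run_agent` stands in for spawning a fresh agent instance, and a real loop would `git commit` where this one only writes a progress file:

```python
from pathlib import Path

PROGRESS = Path("progress.md")

def run_agent(task: str, history: str) -> str:
    # Placeholder for a fresh agent instance: it sees only the task
    # plus the persisted notes, never a previous context window.
    return f"step after reading {len(history.splitlines())} notes"

def ralph_loop(task: str, iterations: int) -> list[str]:
    results = []
    for _ in range(iterations):
        history = PROGRESS.read_text() if PROGRESS.exists() else ""
        result = run_agent(task, history)
        # Persist progress so the next fresh instance can pick it up.
        PROGRESS.write_text(history + result + "\n")
        results.append(result)
    return results
```

Each iteration starts cold, but the repository itself carries the accumulated state forward.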
RLM (Recursive Language Model) takes a fundamentally different approach. Instead of feeding a long input directly into the model (where it would exceed the context window), RLM stores the original data in REPL variables and lets the model write code to explore it. The model issues targeted queries against the stored data through recursive function calls. Because the original data never enters the context window, information is never lost to truncation. The authors claim this approach handles inputs equivalent to 100x the normal context window.
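A crude illustration of the core move, keeping data outside the context window. Here the "model" is simulated by a snippet of query code; in the real system the model itself emits such snippets (and `eval` on model output would of course need sandboxing):

```python
# The long input lives in an ordinary variable, never in the context window.
document = "\n".join(f"line {i}: payload-{i}" for i in range(100_000))

def query(code: str) -> str:
    # The model emits a small snippet that runs against the stored data;
    # only the short result would ever enter the context window.
    return str(eval(code, {"doc": document}))
```

A call like `query("doc.splitlines()[42]")` returns one line of the input. The model pays tokens for the query and the answer, not for the 100,000 lines in between.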
Both approaches acknowledge the same constraint but solve it differently. Ralph Loop works with the context window’s limitations by using external persistence. RLM works around the context window entirely by keeping data outside it. Neither is universally better; the right choice depends on whether your bottleneck is task continuity (Ralph Loop) or input size (RLM).
Connecting Agents to the Outside World
An agent that cannot reach external tools, APIs, or services is limited to text generation. Protocols solve the integration problem.
MCP (Model Context Protocol) standardizes how models connect to external tools. Without MCP, integrating N models with M tools requires N x M custom implementations. With MCP, each model and each tool implements the protocol once, reducing the integration cost to N + M. This is the same principle that made USB successful: agree on one interface, and everything connects.
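The integration-count arithmetic is worth seeing with numbers plugged in:

```python
def integrations(models: int, tools: int, shared_protocol: bool) -> int:
    # Point-to-point: every model needs a custom adapter for every tool.
    # Shared protocol: each model and each tool implements it once.
    return models + tools if shared_protocol else models * tools
```

With 4 models and 10 tools, that is 40 custom adapters without a protocol versus 14 implementations with one, and the gap widens as either side grows.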
ACP (Agent Client Protocol) standardizes communication between editors and coding agents. Zed and JetBrains are leading its development. The problem it solves is similar to MCP but at a different layer: instead of model-to-tool, it is editor-to-agent.
LSP (Language Server Protocol) is the established standard for editor-to-code-analysis-server communication. It is the original proof that protocol standardization works in developer tools. A reference search that took grep 30 seconds completes in 50ms through LSP. Token usage drops from 2,000+ to around 500 because LSP returns structured, precise results instead of raw file contents. LSP is also the reference model for ACP’s design, which makes sense: the problem shapes are nearly identical.
These three protocols operate at different layers but share the same architectural insight. Custom point-to-point integrations do not scale. Standard interfaces do.
The Map, Not the Territory
Most of these terms did not exist six months ago. If they feel unfamiliar, that is expected. The vocabulary is growing because the field is growing, and new concepts need names.
The value of these five pillars is not in memorizing definitions. It is in having a mental framework that tells you where a new term fits the moment you encounter it. When someone mentions “agent memory,” you know it belongs to the fourth pillar. When a new protocol launches, you know it is in the fifth. The framework absorbs new vocabulary without breaking.
I still look up terms regularly. The difference is that now I know which shelf they belong on.