7-Step Pipeline to Verify Code Written by AI Agents
When agents push 3,000 commits a day, humans can't review them all. Here's how to build a machine-verified pipeline that catches what people can't.
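The core mechanism is simple to sketch: run every agent commit through a fixed series of machine checks and block the merge on the first failure, so humans only ever see code that already passed verification. Below is a minimal illustration in Python; the specific commands (ruff, mypy, pytest) are placeholder assumptions for whatever checks your pipeline defines, not the article's actual seven steps.

```python
# Hypothetical illustration: a minimal merge gate that runs a series of
# machine checks against an agent's commit and fails fast on the first error.
# The commands below (ruff, mypy, pytest) are placeholders.
import subprocess
import sys

# Each entry is (step name, command). Order checks cheapest-first so
# failures surface as early as possible.
CHECKS: list[tuple[str, list[str]]] = [
    ("lint", ["ruff", "check", "."]),
    ("types", ["mypy", "."]),
    ("tests", ["pytest", "-q"]),
]

def run_gate() -> int:
    """Run every check in order; return 0 only if all pass."""
    for name, cmd in CHECKS:
        print(f"[gate] running {name}: {' '.join(cmd)}")
        result = subprocess.run(cmd)
        if result.returncode != 0:
            # Block the merge: a human never has to review a commit
            # that failed machine verification.
            print(f"[gate] {name} failed (exit {result.returncode})")
            return result.returncode
    print("[gate] all checks passed")
    return 0

if __name__ == "__main__":
    sys.exit(run_gate())
```

Running the cheapest checks first matters at agent scale: when thousands of commits arrive per day, most failures should be caught in seconds by lint or type checks before the test suite ever runs.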