March 3, 2026

I Ran a 12-Hour Agent Hackathon and Now I Understand Why Stripe Ditched localhost

After building a product with agents overnight, I finally get why Stripe Minions and Ramp Inspect both chose cloud-isolated environments over running everything locally.

AI Agents & Developer Tools | Tony Lee Blog

Last night I ran a hackathon with one rule: set up the spec and harness by 8 PM, then step away from the keyboard until 8 AM. Twelve hours, agents only, ship something real.

In those twelve hours, I viscerally understood why Stripe declared “localhost is over” when they published the Minions platform, and why Ramp came to the same conclusion after building Inspect, their own background agent system.

Running Agents in Parallel on Your Laptop Means Constant Conflict

When multiple agents share a single machine, state gets messy fast. Secrets collide, ports overlap, and the moment your laptop goes to sleep, an entire 12-hour loop evaporates.

When Stripe and Ramp each published their agent architectures, the common thread was obvious: both gave every agent its own isolated VM and dev container.

Stripe’s Minions run inside isolated environments they call “devboxes”: the same machine type engineers use, but cut off from production resources and the internet. Devboxes spin up in under 10 seconds, so tasks can run in parallel without the overhead of git worktrees.

Ramp’s Inspect is built on Modal Sandboxes. Each session gets a fully independent stack — Postgres, Redis, Temporal, RabbitMQ — all its own. Zero contention between sessions, and near-instant startup thanks to filesystem snapshots.

The key distinction: coding agents need your laptop and your attention, but background agents need neither. I watched it happen firsthand — one sleep event killed an entire running loop. That doesn’t happen on a cloud VM.
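To make the collision problem concrete, here is a minimal sketch of what has to be kept separate per agent even on one machine: a working directory, a port block, and a secrets environment. Everything here (`AgentSlot`, `make_slot`, `PORT_BASE`, `PORTS_PER_AGENT`) is a hypothetical name I made up for illustration, not anything from Stripe or Ramp, and it is a much weaker form of isolation than a VM or container.

```python
import tempfile
from dataclasses import dataclass

PORT_BASE = 20000      # assumed start of the port range reserved for agents
PORTS_PER_AGENT = 10   # assumed block size (app server, database, cache, ...)


@dataclass
class AgentSlot:
    """Hypothetical per-agent state. Nothing in here is shared between agents."""
    index: int
    workdir: str
    ports: range
    env: dict


def make_slot(index: int, secrets: dict) -> AgentSlot:
    # Each agent gets its own scratch directory, so checkouts never overlap.
    workdir = tempfile.mkdtemp(prefix=f"agent-{index}-")
    # Each agent gets a disjoint block of ports derived from its index.
    start = PORT_BASE + index * PORTS_PER_AGENT
    ports = range(start, start + PORTS_PER_AGENT)
    # Secrets are copied into a per-agent environment instead of the shared shell.
    env = {"HOME": workdir, "PORT": str(ports[0]), **secrets}
    return AgentSlot(index, workdir, ports, env)


if __name__ == "__main__":
    a = make_slot(0, {"API_KEY": "key-for-agent-0"})
    b = make_slot(1, {"API_KEY": "key-for-agent-1"})
    print(a.workdir, list(a.ports)[:2], b.workdir, list(b.ports)[:2])
```

This only separates files, ports, and environment variables. It does nothing about the laptop going to sleep, which is the part that only a remote machine fixes.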

Running Agents Sequentially Caps You at Simple Features

This was the most painful lesson of the hackathon. Sequential execution handles simple CRUD features just fine. The problem appears as soon as modules depend on each other: agents that ran later repeatedly overwrote or conflicted with modules that earlier agents had already finished.

This is where the distinction between an agent fleet and an agent swarm matters.

An agent fleet applies the same change across many repositories simultaneously. This is how Stripe merges over 1,000 PRs per week — pushing the same migration or lint fix across hundreds of services at once.

An agent swarm assigns different parts of a problem to different agents and converges on a single result. Frontend, backend, and tests handled by separate agents, integrated at the PR boundary.

Without parallel execution followed by structured merging, complex products simply can’t be built this way. Running the two approaches back-to-back made the quality difference unmistakable.
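The fleet and swarm shapes described above differ mainly in what varies across the parallel runs. A rough sketch, assuming a stand-in `run_agent` function (hypothetical, it just returns a label where a real harness would launch an agent and open a PR):

```python
from concurrent.futures import ThreadPoolExecutor


def run_agent(task: str, target: str) -> str:
    # Stand-in for a real agent run. Returns a label for the work it produced.
    return f"{target}: {task}"


def fleet(task: str, repos: list[str]) -> list[str]:
    """Same task, many repositories. One independent result per repository."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda repo: run_agent(task, repo), repos))


def swarm(repo: str, assignments: dict[str, str]) -> str:
    """Different tasks, one repository. Results converge into a single change."""
    with ThreadPoolExecutor() as pool:
        parts = list(
            pool.map(lambda kv: run_agent(kv[1], kv[0]), assignments.items())
        )
    # The merge step is where the real work is. Here it is only a string join.
    return f"{repo} <- " + " | ".join(parts)


if __name__ == "__main__":
    print(fleet("bump lint config", ["svc-a", "svc-b"]))
    print(swarm("shop", {"frontend": "build cart page", "backend": "add cart API"}))
```

In the fleet case the results never need to meet. In the swarm case the join at the end stands in for the structured merge, which is exactly the step my sequential runs were missing.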

Rate Limits and Inter-Agent Coordination Are Infrastructure Problems, Not Prompt Problems

Hitting rate limits at some point during a 12-hour loop was inevitable. On top of that, I needed agents to review each other’s commits and automatically re-evaluate ambiguous parts of the spec.
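For the rate-limit half of that, the fix belongs in the harness rather than the prompt. A minimal sketch of capped exponential backoff, assuming a hypothetical `RateLimited` exception that the model client raises when it is throttled (the name and the default delays are my assumptions, not any provider's API):

```python
import time


class RateLimited(Exception):
    """Hypothetical error a model client raises when it is throttled."""


def call_with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Retry `call` on RateLimited, doubling the wait each time up to max_delay."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the orchestrator
            sleep(min(max_delay, base_delay * 2 ** attempt))
```

The `sleep` parameter is injectable so the loop can be tested without waiting. A production version would probably add jitter so that parallel agents do not all retry at the same instant.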

There’s a phrase that stuck with me: “Writing ‘don’t delete files’ in a system prompt is a request, not a control.” Exactly right.

Stripe solved this at the execution layer. Minions have production resources and internet access blocked at the infrastructure level — they can run safely without any permission checks at runtime. Over 400 MCP tools are hosted on an internal server called “Toolshed,” and each agent gets a curated subset of those tools.
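The difference between a request and a control can be shown in a few lines. This is a sketch of the general idea of a curated tool subset, not Toolshed's actual interface, which I have not seen. `REGISTRY`, `ToolRunner`, and the two tools are hypothetical names.

```python
class ToolNotAllowed(PermissionError):
    """Raised when an agent asks for a tool outside its curated subset."""


# Hypothetical shared registry of every tool the platform knows about.
REGISTRY = {
    "read_file": lambda path: f"contents of {path}",
    "delete_file": lambda path: f"deleted {path}",
}


class ToolRunner:
    def __init__(self, registry: dict, allowed: set):
        # Only allowed tools are copied in. The agent's runner never holds
        # a reference to anything else, whatever the prompt says.
        self._tools = {name: fn for name, fn in registry.items() if name in allowed}

    def call(self, name: str, *args, **kwargs):
        if name not in self._tools:
            raise ToolNotAllowed(name)
        return self._tools[name](*args, **kwargs)


if __name__ == "__main__":
    runner = ToolRunner(REGISTRY, allowed={"read_file"})
    print(runner.call("read_file", "spec.md"))
    try:
        runner.call("delete_file", "spec.md")
    except ToolNotAllowed as exc:
        print("blocked:", exc)
```

An in-process allowlist like this is still weaker than blocking access at the network or VM level, but it makes the point: the agent cannot delete files because the capability is absent, not because it was asked nicely.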

Ramp took a different angle: Inspect requires all PRs to be created under a real user’s GitHub account via OAuth, not an app ID. Because the code is attributed to a person, and GitHub does not let authors approve their own pull requests, a repository that requires an approving review cannot merge the change until a second human has looked at it.

Locking permission scope at the execution layer, maintaining audit logs, and limiting the blast radius of failures — without these, no security team will ever approve autonomous agents in production.

Individual Velocity Doesn’t Automatically Become Organizational Velocity

There’s a phenomenon worth naming: the false summit. You adopt coding agents, PRs start flooding in, but cycle time stays exactly the same. Reviews pile up, CI breaks, merge conflicts accumulate.

In the hackathon, agents generating code quickly was never the bottleneck. All the time got eaten by the process of integrating and validating what they produced.

Stripe addresses this bottleneck through automation. Minions use a hybrid orchestration model that interleaves agent loops with deterministic code operations — ensuring linting, tests, and git operations always complete while preserving the agent’s creative latitude. CI test retries are capped at two iterations so nothing gets stuck in an infinite loop.
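Here is how I would sketch that hybrid loop, assuming "capped at two" means at most two CI rounds per task. The function names (`run_task`, `agent_step`, `lint`, `run_ci`) and the status strings are hypothetical, and this is my reading of the pattern rather than Stripe's code.

```python
MAX_CI_ROUNDS = 2  # the cap described in the text


def run_task(agent_step, lint, run_ci, max_ci_rounds=MAX_CI_ROUNDS):
    """Interleave free-form agent steps with deterministic steps that always run.

    agent_step(feedback) -> change    non-deterministic, the agent's part
    lint(change) -> change            deterministic, runs after every agent step
    run_ci(change) -> (passed, info)  deterministic, at most max_ci_rounds times
    """
    log = []
    change = lint(agent_step(None))
    log += ["agent", "lint"]
    for round_no in range(1, max_ci_rounds + 1):
        passed, failure = run_ci(change)
        log.append(f"ci-{round_no}")
        if passed:
            return "ready-for-review", log
        if round_no < max_ci_rounds:
            # Feed the CI failure back to the agent for one more attempt.
            change = lint(agent_step(failure))
            log += ["agent", "lint"]
    # Cap reached: stop looping and hand the task to a person.
    return "needs-human", log
```

The agent decides what to write, but it never decides whether linting or CI happens, and it never decides to keep retrying. Those choices live in ordinary code.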

Ramp measures success by merged PRs, not created ones. More than 50% of PRs that Inspect opens actually get merged, and over 80% of Inspect itself was written by Inspect.

Organizational velocity only catches up when background agents are handling PR review triage, CI failure analysis, and merge conflict resolution before humans even look. The shift is from “in the loop” to “on the loop” — you review outcomes, not every action.

The Real Competition Is Integration Architecture, Not Generation Speed

Making agents fast is a solved problem. Stripe merges over 1,000 agent-created PRs per week. At Ramp, more than half of the PRs Inspect opens get merged.

The real competition is designing the system that safely integrates what agents produce: isolated execution environments, parallel-then-merge workflows, infrastructure-level governance, and automated validation. Without all four, agents are just fast toys.
