The Token Divide: How AI Pricing Creates New Economic Inequality
Opus 4.6 Fast mode costs $150 per million output tokens. This isn't just pricing; it's the birth of a new economic divide where token access determines competitive advantage.
When Anthropic released Opus 4.6 Fast mode pricing, I double-checked the numbers. Input at $30, output at $150 per million tokens. For the first time, a single AI model costs more per token than a senior software engineer’s hourly rate.
The pricing story is real enough, but the more durable question is what it means for who can build what.
The 6x Price Multiplier for the Same Intelligence
Opus 4.6 standard mode costs $5 input, $25 output. Toggle Fast mode and you pay 6x more for identical model capabilities. Claude Code team lead Boris Cherny called it “a massive breakthrough for tackling difficult back-and-forth conversations.” That description is accurate. The speed improvement genuinely changes what’s practical in tight iteration loops.
But you’re not buying better reasoning. You’re buying faster iteration cycles, which compound into substantially higher productivity for teams that can sustain the cost. Teams that can’t sustain it face a real choice: accept slower cadences or constrain usage to fit the budget. Neither option is free.
The 50x Gap Between Best and Cheapest
I subscribe to five AI services simultaneously. The price spectrum has widened more than I expected even a year ago.
Current output token pricing across the services I use: GPT-4.5 at $14, Gemini 3 Pro at $12, Kimi-K2.5 at $3, GLM-4.7 at roughly $1.50, and Opus 4.6 Fast at $150. That’s a 100x spread between premium and commodity, with a 50x gap between Opus 4.6 Fast and Kimi-K2.5 alone.
Price gaps at this scale don’t just separate tiers. They separate what’s computationally practical. A team running Opus 4.6 Fast at sustained volume is operating in a different problem space than a team capped to commodity models, not because the commodity models are useless, but because the cost of iteration at scale changes which approaches are viable.
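To make the spread concrete, here is a back-of-the-envelope monthly cost comparison using the output prices quoted above. It's a sketch, not a billing model: real workloads also pay for input tokens, caching, and retries, and the 5M-tokens-per-day volume is an illustrative assumption.

```python
# Output-token prices in USD per million tokens, as quoted above.
PRICES = {
    "Opus 4.6 Fast": 150.00,
    "GPT-4.5": 14.00,
    "Gemini 3 Pro": 12.00,
    "Kimi-K2.5": 3.00,
    "GLM-4.7": 1.50,
}

def monthly_cost(price_per_mtok: float, mtok_per_day: float, days: int = 30) -> float:
    """Cost of generating `mtok_per_day` million output tokens per day."""
    return price_per_mtok * mtok_per_day * days

# A hypothetical team generating 5M output tokens a day:
for model, price in PRICES.items():
    print(f"{model:>14}: ${monthly_cost(price, 5):>9,.2f}/month")
```

At that volume, Opus 4.6 Fast runs $22,500 a month against $225 for GLM-4.7: the same 100x ratio, but expressed as a line item a CFO will actually read.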
Token Accessibility and Compounding Productivity
One formula I keep returning to: tokens consumed per hour, weighted by reasoning quality, approximates high-difficulty task throughput. OpenClaw demonstrated this. An AI system running 24/7 without human intervention can sustain output that would require multiple engineers, but at token volumes that would exceed most startups’ monthly AI budgets in days.
The compounding effect is real. One hour of expensive token usage on a hard problem can clear blockers that take much longer with cheaper models. Over weeks and months, that gap accumulates. The uncertainty is whether the productivity differential actually justifies the cost at your scale, or whether you’re paying for speed you don’t need. For most workloads I’ve tested, the cheaper models are good enough more often than the pricing gap implies.
The Economic Reality Contradicts Government Strategy
The U.S. government is betting on AI-driven productivity as an economic counterweight to debt, inflation, and stagnant wage growth. The models that could actually support broad access, those with reasonable pricing and strong general capability, exist. GPT-5.3-Codex and several alternatives sit in a range that smaller teams can afford.
The problem is that the highest-leverage models, the ones producing the results that get cited in productivity studies, are the expensive ones. Affordability has become a genuine policy-relevant question. Ray Dalio has acknowledged job displacement publicly; unemployment in the US, Europe, and South Korea is moving in directions that make the distributional effects of AI productivity harder to ignore. The pricing structure of frontier models doesn’t help.
Finding the Best Cost-Solution Fit
The practical competitive edge right now is knowing which model fits which problem. That means matching tool to task instead of defaulting to the most expensive option, optimizing token spend deliberately, and treating cost constraints as design inputs rather than obstacles.
Knowing when Opus matters, when Gemini suffices, and when a smaller model is the right call takes calibration. That calibration is worth building. Teams that develop it are paying significantly less for comparable outputs than teams that default to the frontier model for everything. Whether that skill remains the primary edge as pricing evolves is genuinely unclear; model costs have moved in both directions over the past two years.
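One way to encode that calibration is a simple router: pick the cheapest model whose capability covers the task, and fall back to the frontier only when nothing cheaper qualifies. This is a minimal sketch; the capability scores are my own rough, hypothetical calibrations, and a real router would also weigh latency, context length, and failure cost.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    price_per_mtok: float   # USD per million output tokens
    capability: float       # 0-1; illustrative assumption, not a benchmark

# Hypothetical tiers spanning the pricing spectrum discussed above.
MODELS = [
    Model("GLM-4.7", 1.50, 0.70),
    Model("Kimi-K2.5", 3.00, 0.78),
    Model("Gemini 3 Pro", 12.00, 0.88),
    Model("Opus 4.6 Fast", 150.00, 0.97),
]

def route(task_difficulty: float) -> Model:
    """Cheapest model whose capability covers the task; frontier as fallback."""
    for m in sorted(MODELS, key=lambda m: m.price_per_mtok):
        if m.capability >= task_difficulty:
            return m
    return max(MODELS, key=lambda m: m.capability)

print(route(0.6).name)   # routine task -> cheapest adequate model
print(route(0.95).name)  # hard task -> frontier
```

The design choice worth noting: the router makes cost the default and the frontier model the exception, which is the opposite of how most teams behave today.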
What to Watch
Token stratification will likely deepen before it narrows. Cheaper and faster commodity options will continue improving. Frontier models will push into capability ranges that currently require human experts. New business models will emerge around finding exploitable gaps in the pricing spectrum.
The concrete near-term reality: access to expensive tokens at scale is now a form of capital investment with compounding returns, and the gap between what well-funded teams can build and what budget-constrained teams can build is wider than it was eighteen months ago. Whether that gap closes depends on pricing decisions made by a small number of companies, which is its own kind of concentration risk.