Why 80% of AI Workloads Will Switch to Cheap Models Within 18 Months

Coinbase co-founder Brian Armstrong predicts 80% of AI workloads will run on models 99% cheaper in 12-18 months, as Harvey's test shows 3x inference cost reduction without quality loss.

If Brian Armstrong is right, 80% of AI workloads will shift to models that are 99% cheaper within 12 to 18 months, and the big labs' business models are about to get squeezed. The Coinbase co-founder laid it out plainly on X: demand for intelligence is near infinite, but the vast majority of tasks don't need frontier models. That would be a direct hit to the economics of OpenAI and Anthropic, both heading for IPOs.

The 80% Prediction That Changes Everything

Armstrong's forecast (80% of workloads running on 99% cheaper models in 12-18 months) is a direct challenge to the scaling-first dogma that has driven AI spending. For years, companies defaulted to the most advanced model because investors subsidized the cost. Now token prices are rising, subsidies are slowing, and clients are feeling real cost pressure for the first time.
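To put the forecast in numbers, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that every workload costs the same on a frontier model and that total volume stays constant; neither assumption comes from Armstrong, and the function name `blended_cost` is made up for this example.

```python
def blended_cost(shifted_share: float, cheap_cost_ratio: float) -> float:
    """Return total spend relative to an all-frontier baseline of 1.0.

    shifted_share: fraction of workloads moved to the cheaper model (0..1).
    cheap_cost_ratio: cheaper model's cost as a fraction of the frontier cost.
    Assumes equal cost per workload and constant volume (illustrative only).
    """
    if not 0.0 <= shifted_share <= 1.0:
        raise ValueError("shifted_share must be between 0 and 1")
    if cheap_cost_ratio < 0.0:
        raise ValueError("cheap_cost_ratio must be non-negative")
    # Workloads that stay on the frontier model keep their full cost;
    # shifted workloads pay only the cheaper model's fraction of it.
    return (1.0 - shifted_share) + shifted_share * cheap_cost_ratio


# Armstrong's scenario: 80% of workloads on models that are 99% cheaper.
relative_spend = blended_cost(shifted_share=0.80, cheap_cost_ratio=0.01)
print(f"relative spend: {relative_spend:.3f}")
print(f"reduction factor: {1 / relative_spend:.2f}x")
```

Under those assumptions, total spend falls to roughly a fifth of the baseline, and nearly all of what remains is the 20% of workloads still running on frontier models.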

Harvey, the legal AI startup, has already tested the concept. In a test with Fireworks AI, Harvey combined Claude Opus and GLM 5.1, routing only the most intensive tasks to Opus. The result: a 3x reduction in inference costs with no quality degradation. Harvey co-founder Gabe Pereyra said quality will always come first in legal, but the definition of quality is evolving from "biggest model" to "right model for the job."
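The routing idea can be sketched in a few lines. This is not Harvey's or Fireworks AI's implementation, which the source does not describe; the difficulty score, the threshold, the prices, and the task list below are all hypothetical placeholders chosen to show the mechanism.

```python
from dataclasses import dataclass

# Hypothetical prices in dollars per million tokens (not real price lists).
PRICE_PER_MTOK = {"large": 30.0, "small": 1.0}


@dataclass
class Task:
    name: str
    tokens: int
    difficulty: float  # hypothetical 0..1 score from an upstream classifier


def route(task: Task, threshold: float = 0.8) -> str:
    """Send only the most demanding tasks to the large model."""
    return "large" if task.difficulty >= threshold else "small"


def total_cost(tasks: list[Task], threshold: float) -> float:
    """Sum the cost of every task at the price of the model it is routed to."""
    return sum(
        t.tokens / 1_000_000 * PRICE_PER_MTOK[route(t, threshold)]
        for t in tasks
    )


# Made-up workload: two routine tasks and one demanding one.
tasks = [
    Task("clause lookup", 200_000, 0.2),
    Task("citation formatting", 300_000, 0.1),
    Task("multi-jurisdiction memo", 500_000, 0.9),
]

baseline = total_cost(tasks, threshold=0.0)  # threshold 0 sends everything to "large"
routed = total_cost(tasks, threshold=0.8)    # only the memo goes to "large"
print(f"all-large: ${baseline:.2f}, routed: ${routed:.2f}")
```

The savings depend entirely on how much traffic falls below the threshold and on the price gap between the two models, so the ratio here says nothing about Harvey's actual 3x figure. The harder part in practice is the difficulty score itself, which this sketch takes as given.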

Large vs Small, Not Closed vs Open

This isn't about proprietary models versus open-weight Chinese models. The real divide is between large and small. You can save money by switching from GPT-5.5 to DeepSeek's V4 Flash, but switching to GPT-5.4-mini works just as well. An active price war between in-house inference from big labs and independently served open-weight models is underway, but for the core question of whether smaller models take over, it doesn't matter which small model wins.

The bitter lesson pushed labs to train the most compute-intensive models possible in order to advance the frontier. That strategy worked when investors footed the bill. With cost pressure now hitting users, the reflexive "use the biggest" habit is breaking. If most deployments run fine on smaller models, demand for frontier inference flattens, and justifying the cost of training a frontier model becomes much harder.

Armstrong's timeline is aggressive, but Harvey's numbers suggest the shift is already underway. The question isn't whether companies will switch; it's whether the big labs can survive losing 80% of their volume to models that cost 1% as much to run.

Source: Can tech companies learn to love cheaper AI models?
Domain: techcrunch.com
