
Five reasons frontier AI pricing is about to collapse

aditya.patadia.org · @bold_lion · 2 hours ago · Artificial Intelligence · 2 comments

GPT 5.5 costs $5/$30 per million tokens, but open-weight GLM-5.2 beats it at 1/10th the price. Here's why the gap won't last.


$54. That's what it cost Aditya Patadia to fix TypeScript types across 50 files using GPT 5.5, the most expensive model on OpenRouter at $5 per million input tokens and $30 per million output tokens. A single afternoon of routine type fixes drained more budget than a small team's monthly AWS bill.
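
To make the arithmetic concrete, here is a minimal Python sketch of how a bill like this adds up at the quoted per-token rates. The token counts are hypothetical, chosen only so that the total lands on $54; the source does not give the actual input/output split. The one-tenth prices in the second call follow the article's claim about GLM-5.2 and are likewise an assumption rather than a published price list.

```python
# Prices quoted in the article for GPT 5.5 (USD per million tokens).
INPUT_PRICE_PER_M = 5.0
OUTPUT_PRICE_PER_M = 30.0


def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float = INPUT_PRICE_PER_M,
                 output_price: float = OUTPUT_PRICE_PER_M) -> float:
    """Return the cost in USD for the given input and output token counts."""
    input_cost = (input_tokens / 1_000_000) * input_price
    output_cost = (output_tokens / 1_000_000) * output_price
    return input_cost + output_cost


# Hypothetical split (an assumption, not from the source):
# 6M input tokens and 0.8M output tokens.
frontier = request_cost(6_000_000, 800_000)

# The same workload at one-tenth the per-token price.
open_weight = request_cost(6_000_000, 800_000, input_price=0.5, output_price=3.0)

print(f"frontier: ${frontier:.2f}, open-weight: ${open_weight:.2f}")
# prints "frontier: $54.00, open-weight: $5.40"
```

Output tokens are priced six times higher than input tokens here, so even a modest amount of generated code can account for a large share of the bill.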

The frontier premium is already cracking

Uber burned through its entire year's AI budget in four months. Microsoft, Salesforce, and GitHub are actively throttling employee AI spend. Meanwhile, open-weight models like GLM-5.2 beat both GPT 5.5 and Claude Opus 4.8 on coding benchmarks, and they cost one-tenth as much per token. That's not a future trend; that's a price list today.

Patadia runs through five structural reasons why frontier AI pricing can't hold. Performance improvements are shrinking with each release - Claude Opus 4.8 costs the same as its predecessor because there's no leap to charge for. Training data, he argues, is basically exhausted: labs have already scraped the public internet and print archives. Without a new breakthrough, the performance-per-price curve flattens.

Open weights kill the research tax

Frontier labs are charging for inference, yes, but also for hundreds of millions of dollars in training runs, employee salaries, and marketing. Hosts of open-weight models skip all that. Once a model like GLM-5.2 is released, any inference provider can serve it at a small markup over raw compute costs. That gap - 10x in this case - is not a bug. It's the economics of commoditisation.

Specialised silicon is accelerating the drop. Cerebras, Groq, and Google are building AI-native chips. A TPU can be 30-70% cheaper than an Nvidia H100 for inference. Once the architecture is proven, mass production drops the per-token cost again. Model architecture improvements like MoE and caching are piling on, making each token cheaper without sacrificing quality.

Zero switching costs are a pricing bomb

Traditional software (Salesforce, Figma, Office) takes months to swap out. AI models switch in seconds via gateways like OpenRouter.ai, and you can program them to change providers on every request. With zero friction, the moment a cheaper model matches an expensive one on your task, you route to it. No contracts, no data migration, no retraining.
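
The routing rule described above can be sketched in a few lines of Python. This is an illustrative sketch, not OpenRouter's API: the `ModelOption` type, the `pick_model` helper, the `small-local-model` entry, and every task score in the table are hypothetical. Only the $30 output price and the one-tenth ratio come from the article. In practice the scores would come from your own evaluation on your own task.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelOption:
    name: str
    output_price_per_m: float  # USD per million output tokens
    task_score: float          # quality on your own task, 0.0-1.0 (hypothetical)


def pick_model(options: list[ModelOption], min_score: float) -> Optional[ModelOption]:
    """Return the cheapest option whose task score meets the bar, or None."""
    good_enough = [m for m in options if m.task_score >= min_score]
    if not good_enough:
        return None
    return min(good_enough, key=lambda m: m.output_price_per_m)


# Hypothetical catalogue; the scores are made up for illustration.
catalogue = [
    ModelOption("gpt-5.5", output_price_per_m=30.0, task_score=0.90),
    ModelOption("glm-5.2", output_price_per_m=3.0, task_score=0.91),
    ModelOption("small-local-model", output_price_per_m=0.0, task_score=0.60),
]

choice = pick_model(catalogue, min_score=0.85)
print(choice.name if choice else "no model meets the bar")
# prints "glm-5.2"
```

Because the choice is recomputed on every request, updating a price or a score in the table is all it takes to move traffic to a different model.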

Patadia's boldest bet is on local models. He predicts that within 4-5 years, newer chips and crashing RAM prices will put capable models on every laptop and phone. Operating systems will bundle a local model interface, and apps will connect to it for code completion, proofreading, and fact-checking. Cloud models will handle only the hardest tasks. The $20 or $200 monthly subscription? Gone.

Whether his timeline holds doesn't change the signal: the gap between frontier pricing and real compute cost is widening, and the structural forces pushing it open are not speculative - they're shipping today.


Source: Why current LLM costs are not sustainable
Domain: aditya.patadia.org
