Claude Sonnet 5 just debuted at #1 on the Artificial Analysis Intelligence Index with a score of 53, far above the category average of 8. The comparison set covers 226 models, and Anthropic's latest reasoning variant leads all of them on intelligence.
Free Pricing and a 1M-Context Window
Input price? $0.00 per 1M tokens. Output price? Also $0.00. I don't know how Anthropic is making that work, but the benchmarked model is listed as free on API pricing. The context window stretches to 1 million tokens, roughly 1,500 A4 pages of 12-point Arial. The model accepts text and image inputs, outputs text, and explicitly advertises reasoning capabilities.
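To make those listing figures concrete, here is a minimal Python sketch of the arithmetic. The helper names and the sample request size (200,000 input tokens, 4,000 output tokens) are hypothetical illustrations. Only the $0.00 prices, the 1M-token window, and the 1,500-page estimate come from the listing.

```python
# Figures quoted in the listing; everything else here is illustrative.
CONTEXT_TOKENS = 1_000_000   # advertised context window
A4_PAGES = 1_500             # the listing's page estimate for that window
INPUT_PRICE_PER_M = 0.00     # USD per 1M input tokens, as listed
OUTPUT_PRICE_PER_M = 0.00    # USD per 1M output tokens, as listed


def tokens_per_page(context_tokens: int, pages: int) -> float:
    """Implied average number of tokens on one page."""
    return context_tokens / pages


def request_cost(input_tokens: int, output_tokens: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    """Cost in USD of one request under per-million-token pricing."""
    return (input_tokens / 1_000_000 * in_price_per_m
            + output_tokens / 1_000_000 * out_price_per_m)


if __name__ == "__main__":
    print(round(tokens_per_page(CONTEXT_TOKENS, A4_PAGES)))  # 667
    # Hypothetical request: 200k tokens in, 4k tokens out.
    print(request_cost(200_000, 4_000,
                       INPUT_PRICE_PER_M, OUTPUT_PRICE_PER_M))  # 0.0
```

At the listed rates every request costs $0.00 regardless of size, so the same function only becomes interesting if the listed price ever changes.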
What the Benchmark Actually Measures
The Intelligence Index v4.1 runs nine evaluations: GDPval-AA v2, τ³-Banking, Terminal-Bench v2.1, SciCode, Humanity's Last Exam, GPQA Diamond, CritPt, AA-Omniscience, and AA-LCR. That covers agentic tool use, coding, terminal work, scientific reasoning, long-context reasoning, and hallucination rate. Claude Sonnet 5 didn't just win; it also generated 300 million output tokens during the benchmark, placing it at #17 out of 226 for verbosity. The average model spat out 37M tokens, so Sonnet 5 produced roughly 8x more output across the full benchmark run. Note that this is a total over the run, not a per-task figure.
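The verbosity comparison is easy to check by hand. The sketch below uses only the token counts and rank quoted above; the function names are hypothetical.

```python
# Token totals and rank as quoted from the benchmark page.
SONNET_5_OUTPUT_TOKENS_M = 300   # millions of output tokens
AVERAGE_OUTPUT_TOKENS_M = 37     # millions, average across models
VERBOSITY_RANK = 17
MODELS_COMPARED = 226


def verbosity_ratio(model_tokens: float, average_tokens: float) -> float:
    """How many times more output a model produced than the average."""
    return model_tokens / average_tokens


def top_share(rank: int, total: int) -> float:
    """Rank expressed as a percentage of the comparison set."""
    return rank / total * 100


if __name__ == "__main__":
    ratio = verbosity_ratio(SONNET_5_OUTPUT_TOKENS_M, AVERAGE_OUTPUT_TOKENS_M)
    print(f"{ratio:.1f}x the average output")  # 8.1x the average output
    share = top_share(VERBOSITY_RANK, MODELS_COMPARED)
    print(f"top {share:.1f}% for verbosity")   # top 7.5% for verbosity
```

So "8x" is a fair rounding of 300M / 37M, and rank #17 of 226 puts the model in roughly the most verbose 7.5% of the set.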
Why This Matters for the Field
On this third-party index, Sonnet 5 is the new intelligence leader, sitting far above the field average, and it's listed as free. If that holds, the combination will pressure every other provider to either drop prices or push scores. Speed data is still listed as N/A, so we don't know how fast it runs. But if throughput turns out to be competitive, Anthropic just made every other reasoning model look overpriced.
Claude Sonnet 5 resets the bar for what a top-tier reasoning model can cost: $0 for the top score on this index changes the economics of AI inference overnight.
Source: Claude Sonnet 5 - benchmark results
Domain: artificialanalysis.ai