Claude Sonnet 5 Scores 53 on Intelligence Index, Tops All 226 Models

artificialanalysis.ai · @systems_wire · 2 hours ago · Artificial Intelligence

Anthropic's latest Claude Sonnet 5 achieves the highest Artificial Analysis Intelligence Index score at 53, offers a 1M-token context window and $0 per 1M tokens pricing, and ranks #1 for both intelligence and input...

anthropic · claude sonnet 5 · artificial analysis · benchmarks · large language models · reasoning

Claude Sonnet 5 just debuted at #1 on the Artificial Analysis Intelligence Index with a score of 53, far above the category average of 8. The comparison set covers 226 models, and Anthropic's latest reasoning variant leads them all on intelligence.

Free Pricing and a 1M-Context Window

Input price? $0.00 per 1M tokens. Output price? Also $0.00. I don't know how Anthropic is making that work, but the benchmarked model is listed as free in the API pricing data. The context window stretches to 1 million tokens, roughly 1,500 A4 pages of 12-point Arial. The model accepts text and image inputs, outputs text, and explicitly advertises reasoning capabilities.
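As a quick sanity check on that page estimate, here is a minimal sketch of the arithmetic it implies. The two constants come from the figures quoted above; the function name `tokens_per_page` is just an illustrative helper, and the real tokens-per-page number depends on the tokenizer and page layout.

```python
# Figures quoted in the article; everything else here is illustrative.
CONTEXT_WINDOW_TOKENS = 1_000_000
ESTIMATED_A4_PAGES = 1_500


def tokens_per_page(context_tokens: int, pages: int) -> float:
    """Return the average number of tokens per page implied by the estimate."""
    if pages <= 0:
        raise ValueError("pages must be positive")
    return context_tokens / pages


per_page = tokens_per_page(CONTEXT_WINDOW_TOKENS, ESTIMATED_A4_PAGES)
print(f"about {per_page:.0f} tokens per page")  # prints "about 667 tokens per page"
```

So the 1,500-page figure assumes a dense page of somewhere around 667 tokens, which should be read as a rough conversion and not a measured value.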

What the Benchmark Actually Measures

The Intelligence Index v4.1 runs nine evaluations: GDPval-AA v2, τ³-Banking, Terminal-Bench v2.1, SciCode, Humanity's Last Exam, GPQA Diamond, CritPt, AA-Omniscience, and AA-LCR. That covers agentic tool use, coding, terminal work, scientific reasoning, long-context reasoning, and hallucination rate. Claude Sonnet 5 didn't just win; it also generated 300 million output tokens during the benchmark, placing it at #17 out of 226 for verbosity. The average model produced 37M tokens, so Sonnet 5 generated roughly 8x more output over the full benchmark run.
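The verbosity comparison is simple division, sketched below using only the two token counts quoted above. The helper name `verbosity_ratio` is hypothetical, and the result is a ratio of benchmark-wide totals, not a per-task measurement.

```python
# Output tokens generated across the whole benchmark run, as quoted in the article.
SONNET_5_OUTPUT_TOKENS = 300_000_000
AVERAGE_OUTPUT_TOKENS = 37_000_000


def verbosity_ratio(model_tokens: int, average_tokens: int) -> float:
    """Return how many times more output tokens a model produced than the average."""
    if average_tokens <= 0:
        raise ValueError("average_tokens must be positive")
    return model_tokens / average_tokens


ratio = verbosity_ratio(SONNET_5_OUTPUT_TOKENS, AVERAGE_OUTPUT_TOKENS)
print(f"{ratio:.1f}x the average")  # prints "8.1x the average"
```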

Why This Matters for the Field

On this third-party benchmark, Sonnet 5 is the new intelligence leader by a wide margin, and it's listed as free. That combination will pressure every other provider to either drop prices or push scores. Speed data is still listed as N/A, so we don't know how fast it runs. But if the throughput turns out to be competitive, Anthropic just made every other reasoning model look overpriced.

Claude Sonnet 5 resets the bar for what a top-tier reasoning model can cost: a $0 price for the top score on the index changes the economics of AI inference overnight.

Source: Claude Sonnet 5 - benchmark results
Domain: artificialanalysis.ai
