
Anthropic's Secret Guardrails on Claude Fable 5 Spark Apology, Policy Reversal

Anthropic quietly throttled its own Mythos-class model to block rivals and researchers from distillation, then apologized after the invisible restrictions were exposed.

anthropic, claude fable 5, mythos class, ai safety, model release policy

Anthropic throttled its own model Claude Fable 5 with invisible guardrails targeting researchers and rivals, then apologized after getting caught.

The Secret Throttle Nobody Saw

Anthropic quietly built restrictions into Claude Fable 5 that limited how the model responded to certain queries. The company framed these as safeguards for a Mythos-class system it had spent months warning was too dangerous for public release. But the guardrails were hidden: Anthropic did not disclose when or why they kicked in. Researchers and competing labs using Fable to develop their own systems found themselves hitting invisible rate limits that had nothing to do with safety.

Why Hidden Restrictions Backfire

This is a familiar play in frontier model releases. Anthropic claimed the throttles protected against “high-risk” distillation attacks, but by keeping them secret it undermined the trust of the very community it needs to audit its safety claims. The company now admits the approach was wrong and says it will reverse course. Fable will still refuse queries, but Anthropic promises to clearly document when and why.

What Changes Next

Anthropic’s reversal sets a precedent: future Mythos-class models will come with explicit, documented guardrails rather than silent throttling. The company says transparency is worth the cost of more refused queries. For anyone building on Claude Fable 5, that means predictable behavior instead of hidden breakage, a tradeoff that favors honest engineering over security-through-obscurity. Whether Anthropic can maintain that transparency as it rolls out even more capable systems remains the open question.


Source: Anthropic apologizes for invisible Claude Fable guardrails
Domain: theverge.com
