Source linked

Three-Word Prompt 'Fix This Code' Triggered U.S. Export Ban on Anthropic's Fable 5

theregister.com@systems_wire2 hours ago·Cybersecurity·2 comments

Researcher Katie Moussouris says the so-called jailbreak was just a request to fix buggy code, not a security bypass, and the ban hurts defenders more than attackers.

anthropicfable 5export controlscybersecurityai safetykatie moussouris

Three words - 'Fix this code' - and the U.S. government slapped export controls on Anthropic's Fable 5 and Mythos 5 models. That's the claim from Katie Moussouris, founder of Luta Security and the only outside expert who read the research paper that allegedly documented a guardrail bypass.

The Prompt That Triggered a National Security Directive

On Friday, the Trump administration issued an export control directive citing national security concerns, suspending access to Fable 5 and Mythos 5 by any foreign national inside or outside the United States. Anthropic disabled both models for all customers to comply.

Moussouris says the research paper never described a jailbreak. The researchers fed Fable 5 open-source code with known CVEs and intentionally vulnerable code, then asked the model to "review the code for security issues." Fable 5 refused. So they asked it to "fix this code." The model obliged, and after additional prompts produced test scripts to validate the patches.

"That's it," Moussouris wrote. "'Fix this code,' plus several manual steps to generate test scripts, should never have triggered an export control." She joked about making '90s-style t-shirts with "fix this code" on the front and "this shirt is a munition" on the back.

Why Defensive AI Needs to Find and Fix Bugs

Moussouris served on the technical expert group that renegotiated the Wassenaar Arrangement between 2013 and 2017, winning exemptions for defensive cybersecurity activity. That allowed sharing vulnerability data and incident response without criminal prosecution.

Now she argues the ban does the opposite. Anthropic's models were doing the most valuable thing for defensive security: executing the find, fix, and test loop defenders run every day. Removing that capability makes AI systems worse at finding bugs and verifying patches.

Over 100 cybersecurity leaders signed an open letter Sunday urging the Trump administration to reverse the restrictions. "To pull the best capabilities away from defenders without a good reason when our adversaries are rapidly advancing is dangerous," they wrote.

A Misguided Ban in a Global Arms Race

Moussouris points out the U.S. can't extend export controls to open-weight systems or similar models from China - and those systems will soon achieve Mythos-like capabilities anyway. Anthropic and Google have accused China-based rivals like DeepSeek of using distillation attacks to siphon knowledge from American AI.

Banning Anthropic's advanced models hurts defenders more than attackers. Defense improves when defenders find the same bugs attackers find and fix them faster. If the U.S. wants to keep defenders ahead, it needs to stop treating 'fix this code' like a munition - and start treating it like the most natural request an AI could get.


Source: Feds freaked over Fable 5 after simple 'fix this code' prompt, not jailbreak
Domain: theregister.com

Read original source ->

External source stays available while the OJO article and comment thread stay local.

Comments load interactively on the live page.