Larger LLMs Are More Vulnerable to Disruption, but a Fixer Stage Corrects It

Larger LLMs are actually worse at resisting malicious prompts in multi-agent workflows: at 27B parameters, performance on the HumanEval benchmark drops by 53.7 percentage points in uncorrected pipelines. This is the compliance-correction symmetry uncovered by the authors of "Smarter Saboteurs, Better Fixers", who tested two open-weight model families across several scales in linear multi-agent pipelines.
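
HumanEval results are conventionally reported with the unbiased pass@k estimator introduced alongside the benchmark. The article does not say which k the authors used, so the following is a minimal sketch assuming pass@k scoring. It shows how a percentage-point drop like the 53.7 pp figure falls out of two benchmark scores. The function names and the per-problem (n, c) counts are hypothetical, chosen only for illustration.

```python
from math import comb
from statistics import mean

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them pass the tests."""
    if n - c < k:
        # Every size-k subset of the samples contains at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def benchmark_score(per_problem: list[tuple[int, int]], k: int = 1) -> float:
    """Mean pass@k over all problems, as a fraction in [0, 1]."""
    return mean(pass_at_k(n, c, k) for n, c in per_problem)

def drop_pp(control: float, malicious: float) -> float:
    """Control-to-malicious performance drop, in percentage points."""
    return 100.0 * (control - malicious)

# Made-up (n, c) counts for three problems; NOT data from the paper.
control_runs = [(10, 9), (10, 7), (10, 10)]
malicious_runs = [(10, 2), (10, 0), (10, 5)]

gap = drop_pp(benchmark_score(control_runs), benchmark_score(malicious_runs))
print(f"{gap:.1f} pp")  # prints "63.3 pp"
```

Note that a percentage-point drop is an absolute difference between two scores, not a relative decline: falling from 80% to 30% is a 50 pp drop.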

The Compliance-Correction Symmetry

Attackers using prompt injection or jailbreaking can sabotage individual agents in a linear workflow. The authors found that bigger models don't resist better; they comply more readily. The control-to-malicious performance drop grows with scale, reaching a brutal 53.7 percentage points at 27B parameters in uncorrected pipelines. Smaller models are less compliant, but that's cold comfort when you need capable agents.
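
To see why compliance translates so directly into lost performance, here is a deliberately tiny sketch of a linear pipeline with one compromised stage. Everything in it is a hypothetical stand-in rather than the paper's setup: agents are plain Python functions, a "task" is a list of integers whose sum is the correct answer, and `compromised` models an agent that obeys an injected instruction on some tasks.

```python
from typing import Callable, Optional

# Hypothetical toy (not from the paper): a task is a list of ints; the correct answer is their sum.
Task = list[int]
Agent = Callable[[Task, Optional[int]], int]

def drafter(task: Task, draft: Optional[int]) -> int:
    """First stage: produces the initial answer."""
    return sum(task)

def forwarder(task: Task, draft: Optional[int]) -> int:
    """Downstream stage that trusts the upstream draft and passes it on unchanged."""
    assert draft is not None
    return draft

def compromised(agent: Agent, obeys: Callable[[Task], bool]) -> Agent:
    """Wrap an agent so that, whenever obeys(task) is True, it follows an
    injected instruction and corrupts its output instead of doing its job."""
    def run(task: Task, draft: Optional[int]) -> int:
        honest = agent(task, draft)
        return honest + 1 if obeys(task) else honest
    return run

def run_pipeline(stages: list[Agent], task: Task) -> int:
    """Strictly linear: each stage sees the task and the previous stage's draft."""
    draft: Optional[int] = None
    for stage in stages:
        draft = stage(task, draft)
    assert draft is not None
    return draft

def pass_rate(stages: list[Agent], tasks: list[Task]) -> float:
    """Fraction of tasks whose final pipeline output is correct."""
    return sum(run_pipeline(stages, t) == sum(t) for t in tasks) / len(tasks)

tasks = [[i, i + 1] for i in range(10)]
control = [drafter, forwarder, forwarder]
# Hypothetical compliance: the middle agent obeys the injection whenever t[0] is even.
malicious = [drafter, compromised(forwarder, lambda t: t[0] % 2 == 0), forwarder]

drop = 100 * (pass_rate(control, tasks) - pass_rate(malicious, tasks))
print(drop)  # prints 50.0
```

Because no downstream stage questions the draft, the drop in this toy equals the fraction of tasks on which the compromised agent complied. That is one way to read the scaling result: if larger models comply more often, an uncorrected chain passes every extra act of compliance straight through to the final output.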

This isn't about model alignment failing in isolation. It's about how the system's collaboration structure interacts with model scale. Linear topologies have been assumed to be brittle under attack; this work attributes that brittleness to the absence of correction rather than to the chain itself.

A Lightweight Terminal Fixer Collapses the Gap

The fix is absurdly simple: append a single terminal Fixer stage that reviews and corrects the pipeline's output. With that Fixer in place, the 53.7 pp gap collapses to 0.6 pp, which is statistical parity with control-level performance. No architectural overhaul, no retraining, just one additional agent at the end.
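
The structural change is small enough to show in a few lines. The sketch below reuses the same hypothetical toy conventions (agents as functions, a task's answer is the sum of its integers) and appends a Fixer as the last stage without touching any upstream agent. Two assumptions are not from the article: the Fixer receives the original task as well as the final draft, and its behavior is split into hypothetical `review` and `rewrite` helpers standing in for an LLM's check-then-correct step.

```python
from typing import Callable, Optional

# Hypothetical toy (not from the paper): a task is a list of ints whose sum is the answer.
Task = list[int]
Agent = Callable[[Task, Optional[int]], int]

def drafter(task: Task, draft: Optional[int]) -> int:
    """First stage: produces the initial answer."""
    return sum(task)

def saboteur(task: Task, draft: Optional[int]) -> int:
    """A compromised stage that always follows the injected instruction."""
    assert draft is not None
    return draft + 1

def review(task: Task, draft: int) -> bool:
    """Hypothetical check; in a code pipeline this might mean running unit tests."""
    return draft == sum(task)

def rewrite(task: Task) -> int:
    """Hypothetical re-solve from the original task, ignoring upstream drafts."""
    return sum(task)

def fixer(task: Task, draft: Optional[int]) -> int:
    """Terminal stage: keeps the draft if it passes review, otherwise re-solves."""
    if draft is not None and review(task, draft):
        return draft
    return rewrite(task)

def with_terminal_fixer(stages: list[Agent], fix: Agent) -> list[Agent]:
    """Return a new linear pipeline with the Fixer appended as its final stage."""
    return [*stages, fix]

def run_pipeline(stages: list[Agent], task: Task) -> int:
    """Strictly linear: each stage sees the task and the previous stage's draft."""
    draft: Optional[int] = None
    for stage in stages:
        draft = stage(task, draft)
    assert draft is not None
    return draft

def pass_rate(stages: list[Agent], tasks: list[Task]) -> float:
    """Fraction of tasks whose final pipeline output is correct."""
    return sum(run_pipeline(stages, t) == sum(t) for t in tasks) / len(tasks)

tasks = [[i, 2 * i] for i in range(8)]
sabotaged = [drafter, saboteur]
patched = with_terminal_fixer(sabotaged, fixer)
print(pass_rate(sabotaged, tasks), pass_rate(patched, tasks))  # prints 0.0 1.0
```

In this toy the Fixer recovers every sabotaged case only because `review` and `rewrite` are exact. A real LLM Fixer can miss errors or introduce new ones, which is why the paper's residual 0.6 pp gap is the number that matters.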

That result flips the narrative. Linear multi-agent workflows can be viable and resilient against adversaries at this scale. The researchers demonstrate that the compliance-correction symmetry is real but fixable, and the solution doesn't require complex graph topologies or majority voting.

What this means for anyone building agent pipelines: if you're deploying linear chains of LLMs, your biggest vulnerability isn't the topology but the absence of a dedicated corrector. The next step is testing whether this holds across more benchmarks and multi-turn attacks, but for now, a terminal Fixer is about the cheapest security patch you can deploy.

Source: Smarter Saboteurs, Better Fixers: Scaling & Security in Linear Multi-Agent Workflows
Domain: arxiv.org