Why LLM Prompt Injection Defenses Break Real Tasks - And How SecFid Measures It

96.5% fidelity at 47.8% security — and 99.3% security at only 71% fidelity. That’s the tradeoff frontier the authors of "Security–Fidelity Tradeoffs" (arXiv:2606.30783) mapped across 1,168 examples and 48 model–defense configurations. Attack-success metrics alone can’t see this, because a model that ignores an injection and one that faithfully processes it as data score identically. Fidelity is the hidden cost.

SecFid Makes Fidelity Measurable

The paper introduces SecFid, a benchmark designed so that executing an injection, processing it as data, and ignoring it produce distinguishable outputs. For tasks like translation and document editing, suppressing untrusted text to resist indirect prompt injection corrupts the very job the model is supposed to do. The highest-fidelity model (no specific name given in the abstract) reaches 96.5% fidelity but only 47.8% security — it faithfully keeps your text intact but leaves the door open to hijacks. Flip the configuration, and the most secure defenses hit 99.3% security while fidelity drops to 71.0%–73.9%. That’s a quarter of useful content thrown away to stop a handful of attacks.

No Free Lunch: Decision-Theoretic Frontier

Even defenses with identical security scores differ in how they earn it. Some repair hijacked outputs into faithful processing; others simply suppress benign content. That distinction matters, and standard reporting hides it. The authors’ decision-theoretic analysis shows why no fixed choice can be right everywhere: the correct behavior is not a property of the defense but of the deployment, determined by the relative cost of a hijack versus a dropped span. Security alone measures only half of robustness. Reporting it without fidelity hides the price at which it was bought.

Deploying an LLM in a high-stakes pipeline? Stop chasing a single security number. You need to know what fidelity you’re trading for it, and that choice — hijack cost vs. drop cost — belongs to your threat model, not the defense vendor.

Source: Security--Fidelity Tradeoffs: The Hidden Cost of Prompt Injection Defense
Domain: arxiv.org

Why LLM Prompt Injection Defenses Break Real Tasks - And How SecFid Measures It

SecFid Makes Fidelity Measurable

No Free Lunch: Decision-Theoretic Frontier

More in Artificial Intelligence