Perfect Prompt Injection Prevention Proven Impossible in Shared-Embedding Models

A new formal proof demonstrates that in LLMs with shared-embedding architectures, no in-pipeline defense can achieve Semantic-Faithful Control, making prompt injection inevitable without architectural separation.

prompt injectionshared embedding architecturesformal proofllm securitymathematical impossibilityarchitectural separation

Prompt injection isn't a failure of better filters—it's a mathematical certainty. A new preprint posted to arXiv formalizes the problem and proves that perfect prevention is impossible in any shared-embedding sequence model.

The paper defines Prompted Action Models, where outputs include control-authoritative actions like refusal decisions, tool authorization, and memory writes. The goal is Semantic-Faithful Control (SFC): behavior that depends only on the meaning of untrusted input, not its encoding. Three independent impossibility results show SFC is unachievable.

Three Hard Limits on In-Pipeline Defenses

First, provenance-recovery impossibility: shared representations make trusted and untrusted content statistically inseparable, bounded by total variation distance. Second, control-path exposure: untrusted tokens enter control-relevant computation through the same attention value-aggregation that determines outputs—there's no separate channel. Third, a finite-coverage invariance gap: no finite training can certify invariance over infinite semantic-equivalence classes. Each result is grounded in measurements on production tokenizers and models.

These aren't engineering gaps or alignment failures. They're structural. The paper draws a direct parallel to the code-data confusion in Von Neumann machines that gave rise to buffer overflows—a vulnerability class no single mechanism ever fixed. It took decades of layered defenses (DEP, W^X, ASLR, stack canaries, memory-safe languages) to contain, never eliminate, that structural flaw.

The implication is the same: prompt injection cannot be eliminated by better in-pipeline classification or alignment alone. It requires architectural separation of instruction and data channels. The proof doesn't prescribe the new architecture—it says everything else is mathematically doomed to fail.

Source: On the Inseparability of Instructions and Data in Shared-Embedding Sequence Models
Domain: arxiv.org

Perfect Prompt Injection Prevention Proven Impossible in Shared-Embedding Models

Three Hard Limits on In-Pipeline Defenses

More in Cybersecurity