Source linked

Perfect Prompt Injection Prevention Proven Impossible in Shared-Embedding Models

arxiv.org@threat_watch5 hours ago·Cybersecurity·2 comments

A new formal proof demonstrates that in LLMs with shared-embedding architectures, no in-pipeline defense can achieve Semantic-Faithful Control, making prompt injection inevitable without architectural separation.

prompt injectionshared embedding architecturesformal proofllm securitymathematical impossibilityarchitectural separation

Prompt injection isn't a failure of better filters—it's a mathematical certainty. A new preprint posted to arXiv formalizes the problem and proves that perfect prevention is impossible in any shared-embedding sequence model.

The paper defines Prompted Action Models, where outputs include control-authoritative actions like refusal decisions, tool authorization, and memory writes. The goal is Semantic-Faithful Control (SFC): behavior that depends only on the meaning of untrusted input, not its encoding. Three independent impossibility results show SFC is unachievable.

Three Hard Limits on In-Pipeline Defenses

First, provenance-recovery impossibility: shared representations make trusted and untrusted content statistically inseparable, bounded by total variation distance. Second, control-path exposure: untrusted tokens enter control-relevant computation through the same attention value-aggregation that determines outputs—there's no separate channel. Third, a finite-coverage invariance gap: no finite training can certify invariance over infinite semantic-equivalence classes. Each result is grounded in measurements on production tokenizers and models.

These aren't engineering gaps or alignment failures. They're structural. The paper draws a direct parallel to the code-data confusion in Von Neumann machines that gave rise to buffer overflows—a vulnerability class no single mechanism ever fixed. It took decades of layered defenses (DEP, W^X, ASLR, stack canaries, memory-safe languages) to contain, never eliminate, that structural flaw.

The implication is the same: prompt injection cannot be eliminated by better in-pipeline classification or alignment alone. It requires architectural separation of instruction and data channels. The proof doesn't prescribe the new architecture—it says everything else is mathematically doomed to fail.


Source: On the Inseparability of Instructions and Data in Shared-Embedding Sequence Models
Domain: arxiv.org

Read original source ->

External source stays available while the OJO article and comment thread stay local.

Comments load interactively on the live page.