Source linked

Prévention parfaite de l'injection prompt prouvée impossible dans les modèles d'emballage partagés

Une nouvelle preuve formelle démontre que dans les LLM avec des architectures partagées, aucune défense dans le pipeline ne peut atteindre le contrôle sémantique fidèle, rendant l'injection rapide inévitable sans séparation architecturale.

prompt injectionshared embedding architecturesformal proofllm securitymathematical impossibilityarchitectural separation

Prompt injection isn't a failure of better filters—it's a mathematical certainty. A new preprint posted to arXiv formalizes the problem and proves that perfect prevention is impossible in any shared-embedding sequence model.

The paper defines Prompted Action Models, where outputs include control-authoritative actions like refusal decisions, tool authorization, and memory writes. The goal is Semantic-Faithful Control (SFC): behavior that depends only on the meaning of untrusted input, not its encoding. Three independent impossibility results show SFC is unachievable.

Three Hard Limits on In-Pipeline Defenses

First, provenance-recovery impossibility: shared representations make trusted and untrusted content statistically inseparable, bounded by total variation distance. Second, control-path exposure: untrusted tokens enter control-relevant computation through the same attention value-aggregation that determines outputs—there's no separate channel. Third, a finite-coverage invariance gap: no finite training can certify invariance over infinite semantic-equivalence classes. Each result is grounded in measurements on production tokenizers and models.

These aren't engineering gaps or alignment failures. They're structural. The paper draws a direct parallel to the code-data confusion in Von Neumann machines that gave rise to buffer overflows—a vulnerability class no single mechanism ever fixed. It took decades of layered defenses (DEP, W^X, ASLR, stack canaries, memory-safe languages) to contain, never eliminate, that structural flaw.

The implication is the same: prompt injection cannot be eliminated by better in-pipeline classification or alignment alone. It requires architectural separation of instruction and data channels. The proof doesn't prescribe the new architecture—it says everything else is mathematically doomed to fail.


Source: On the Inseparability of Instructions and Data in Shared-Embedding Sequence Models
Domain: arxiv.org

Read original source ->

External source stays available while the OJO article and comment thread stay local.

Comments load interactively on the live page.