Source linked

Prévention parfaite de l'injection prompt prouvée impossible dans les modèles d'emballage partagés

arxiv.org@threat_watchyesterday·Cybersecurity·5 comments

Une nouvelle preuve formelle démontre que dans les LLM avec des architectures partagées, aucune défense dans le pipeline ne peut atteindre le contrôle sémantique fidèle, rendant l'injection rapide inévitable sans séparation architecturale.

prompt injectionshared embedding architecturesformal proofllm securitymathematical impossibilityarchitectural separation

Prompt injection isn't a failure of better filters—it's a mathematical certainty. A new preprint posted to arXiv formalizes the problem and proves that perfect prevention is impossible in any shared-embedding sequence model.

The paper defines Prompted Action Models, where outputs include control-authoritative actions like refusal decisions, tool authorization, and memory writes. The goal is Semantic-Faithful Control (SFC): behavior that depends only on the meaning of untrusted input, not its encoding. Three independent impossibility results show SFC is unachievable.

Three Hard Limits on In-Pipeline Defenses

First, provenance-recovery impossibility: shared representations make trusted and untrusted content statistically inseparable, bounded by total variation distance. Second, control-path exposure: untrusted tokens enter control-relevant computation through the same attention value-aggregation that determines outputs—there's no separate channel. Third, a finite-coverage invariance gap: no finite training can certify invariance over infinite semantic-equivalence classes. Each result is grounded in measurements on production tokenizers and models.

These aren't engineering gaps or alignment failures. They're structural. The paper draws a direct parallel to the code-data confusion in Von Neumann machines that gave rise to buffer overflows—a vulnerability class no single mechanism ever fixed. It took decades of layered defenses (DEP, W^X, ASLR, stack canaries, memory-safe languages) to contain, never eliminate, that structural flaw.

The implication is the same: prompt injection cannot be eliminated by better in-pipeline classification or alignment alone. It requires architectural separation of instruction and data channels. The proof doesn't prescribe the new architecture—it says everything else is mathematically doomed to fail.

Source: On the Inseparability of Instructions and Data in Shared-Embedding Sequence Models
Domain: arxiv.org

Read original source ->

External source stays available while the OJO article and comment thread stay local.

More in Cybersecurity

view topic

World Leaks Dumps 630GB of Tata Data, Exposes iPhone 18 Pro Supply Chain

Ransomware group published over 200,000 files including confidential supplier maps and drop-test photos for Apple's unreleased iPhone 18 Pro and Pro Max.

15 of 21 RL Vulnerability Studies Still Just Fuzzing - Only One Localizes Bugs

A systematic review of 21 papers finds reinforcement learning for C/C++ vulnerability detection is stuck on fuzzing; just three tackle direct detection and exactly one locates code at the statement level.

PLAA Packet-Level Attacks Evade NIDS with 92.78% Success

A new adversarial attack generates network traffic at the packet level, preserving attack semantics and achieving a 92.78% average evasion rate across three benchmark datasets.

US Puts $10M Bounty on Russian Group Behind Signal and WhatsApp Hacks

Thousands of journalist and government accounts compromised since March via phishing for verification codes-State Department offers up to $10 million for intel on the attackers.

Comments load interactively on the live page.