CAREATTACK Edits Retriever Weights to Inject Malicious Knowledge Into RAG Systems

arxiv.org · @threat_watch · 3 hours ago · Cybersecurity · 3 comments

A new model-centric attack edits a retriever's parameters directly and in closed form, boosting malicious passages over benign ones without crafting detectable synthetic text.

careattack · rag · retriever editing · llm security · open source models · qwen3 embedding

Open-source retrievers in RAG systems are no longer just passive data fetchers -- they are now the attack surface for a direct parameter-level injection that doesn't need a single crafted document.

CAREATTACK, detailed in a new arXiv paper, applies a closed-form edit to the parameters of a dense retriever so that a malicious passage ranks above all competing benign passages for a target query. The researchers instantiated it on Qwen3-Embedding-0.6B and BGE-M3 across three benchmark datasets, and report that it also works for batches of target prompts and passages.

The Model-Centric Pivot

Existing RAG injection attacks manipulate the knowledge base itself: craft synthetic text, inject it into a corpus, and hope the retriever picks it up. That synthetic text can be flagged by perplexity filters or given away by linguistic cues. CAREATTACK skips the text-crafting step entirely. It goes straight to the retriever's weights.

Given write access to the weights of an open-source retrieval model, for example a locally downloaded checkpoint of the kind many RAG pipelines use, the attacker performs a closed-form parameter edit that changes how the target query and passage rank in embedding space. No fine-tuning, no gradient descent loops. Just a direct algebraic adjustment with a conflict-resolution step.
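To illustrate what "closed-form" can mean in this setting, here is a minimal sketch on a toy linear map. It is not the paper's procedure: the single weight matrix W, the key vector k, the desired output v_target, and the function name rank_one_edit are all hypothetical, and the update shown is the standard rank-one correction used in generic model-editing work. The point is only that one algebraic step, with no gradient loop, can force a chosen input to a chosen output.

```python
import numpy as np

def rank_one_edit(W, k, v_target):
    """Return W' such that W' @ k == v_target, using a single rank-one update.

    Toy illustration only: W stands in for one linear layer, k for the
    feature vector of a target input, v_target for the output we want.
    """
    residual = v_target - W @ k          # how far the current output is from the goal
    return W + np.outer(residual, k) / (k @ k)

# Random toy data (assumed shapes, not taken from any real retriever).
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
k = rng.normal(size=16)
v_target = rng.normal(size=8)

W_edited = rank_one_edit(W, k, v_target)
print(np.allclose(W_edited @ k, v_target))
```

Because the update is rank-one, it touches the matrix only along the direction of k, which is also why unrelated inputs that happen to overlap with k would be disturbed unless the edit is constrained further.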

How CAREATTACK Works

The attack runs in two stages. First, conflict-aware retriever editing: it identifies the model parameters that control the ranking of a target passage, then projects the edit onto a subspace that avoids interference with non-target queries. A graph-based conflict detection step catches parameter overlap and resolves it.
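The projection idea in the first stage can be sketched with ordinary linear algebra. The example below is an assumption-laden toy, not the paper's algorithm: it uses one linear map W, a matrix K_benign of hypothetical non-target feature vectors, and a null-space projector so that the edit changes the output for the target key while leaving the outputs for the benign keys untouched. The graph-based conflict detection step is not modeled here.

```python
import numpy as np

def null_space_projector(K):
    """Projector onto the subspace orthogonal to every row of K."""
    d = K.shape[1]
    return np.eye(d) - np.linalg.pinv(K) @ K

def projected_edit(W, k, v_target, K_benign):
    """Edit W so that W' @ k == v_target while W' @ k_b == W @ k_b
    for every benign key k_b (row of K_benign)."""
    P = null_space_projector(K_benign)
    k_proj = P @ k                       # part of k that no benign key shares
    residual = v_target - W @ k
    # k_proj @ k equals ||k_proj||^2 because P is a symmetric projector.
    return W + np.outer(residual, k_proj) / (k_proj @ k)

# Toy data with fewer benign keys than dimensions, so a null space exists.
rng = np.random.default_rng(1)
W = rng.normal(size=(8, 16))
k = rng.normal(size=16)
v_target = rng.normal(size=8)
K_benign = rng.normal(size=(5, 16))

W_new = projected_edit(W, k, v_target, K_benign)
print(np.allclose(W_new @ k, v_target))
print(np.allclose(W_new @ K_benign.T, W @ K_benign.T))
```

The sketch only works when the target key has a component outside the span of the benign keys; if it does not, k_proj is zero and no interference-free edit of this form exists, which is one way to picture why a separate conflict-handling step is needed.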

Second, attack-preserving anchor repair does a lightweight calibration on the edited retriever. It minimizes the ranking shift for benign, non-target prompts while preserving the boost for the malicious ones. The paper reports this calibration eliminates side effects without sacrificing attack success.
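"Ranking shift" on benign prompts can be made concrete with a simple metric. The sketch below is one possible measure, not the one used in the paper: it assumes two hypothetical score matrices (queries by passages) from the original and the edited retriever, and reports the average fraction of top-k passages the two share. The function name topk_overlap and all data are illustrative.

```python
import numpy as np

def topk_overlap(scores_before, scores_after, k=5):
    """Mean fraction of top-k passages shared per query.

    1.0 means every query keeps the same top-k set; lower values
    mean the edit moved passages in or out of the top-k.
    """
    top_before = np.argsort(-scores_before, axis=1)[:, :k]
    top_after = np.argsort(-scores_after, axis=1)[:, :k]
    overlaps = [len(set(b) & set(a)) / k for b, a in zip(top_before, top_after)]
    return float(np.mean(overlaps))

# Toy data: 4 queries, 20 passages, similarity = dot product.
rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 8))
passages = rng.normal(size=(20, 8))
scores_before = queries @ passages.T

# Simulate a targeted change: the worst passage for query 0 is pushed to rank 1.
scores_after = scores_before.copy()
worst = int(np.argmin(scores_before[0]))
scores_after[0, worst] = scores_before[0].max() + 1.0

print(topk_overlap(scores_before, scores_after, k=5))
```

A defender could run the same kind of comparison between a trusted reference copy of a retriever and the copy actually deployed, using a fixed set of canary queries.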

What This Means for RAG Security

Every RAG system that pulls in an open-source retriever from Hugging Face or elsewhere now has a blind spot. The retriever is assumed trustworthy because it's just a search engine. But CAREATTACK shows that with model parameter access, an adversary can silently change what gets retrieved for specific queries.

The implication is clear: retriever integrity must be part of the RAG threat model. Hash checks, signed model registries, or runtime embedding validation are no longer optional. The paper's code is public at https://anonymous.4open.science/r/CareAttack-3F1C, so defenders can start evaluating their own pipelines right away.
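A hash check of the kind mentioned above can be as simple as the following sketch. It uses only Python's standard library; the function names, the manifest format (relative path to SHA-256 digest), and the idea of recording the manifest at a trusted point in time are assumptions made for illustration, not something the paper prescribes.

```python
import hashlib
from pathlib import Path

def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large weight files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(model_dir):
    """Map each file's path (relative to model_dir) to its SHA-256 digest."""
    root = Path(model_dir)
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def verify_manifest(model_dir, manifest):
    """Return the sorted names of files that are altered, missing, or unexpected."""
    current = build_manifest(model_dir)
    changed = [name for name in manifest if current.get(name) != manifest[name]]
    unexpected = [name for name in current if name not in manifest]
    return sorted(changed + unexpected)
```

In use, build_manifest would run once on a copy of the retriever obtained from a trusted source, and verify_manifest would run before each deployment or load; an empty list means the files on disk match the recorded digests. A check like this detects any change to the weight files, but only if the manifest itself is stored where an attacker cannot rewrite it.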


Source: Conflict-Aware Retriever Editing for Knowledge Injection Attacks on LLM-Based RAG Systems
Domain: arxiv.org