
Black-Box Attack Fingerprints Embedding Models from Retrieved Documents Alone

arxiv.org · @threat_watch · 2 hours ago · Cybersecurity · 2 comments

By crafting tailored queries, an adversary can identify which embedding model powers an information retrieval (IR) system from only the unordered set of returned documents: no scores, no rankings, no model access.

embedding inference attack · black box attack · information retrieval · retrieval augmented generation · security

Adversaries can fingerprint which embedding model powers an information retrieval system using nothing but the unordered set of retrieved documents—no relevance scores, no ranking order, and no model access.

How the Attack Works

The embedding inference attack (EIA) assumes the attacker holds a set of candidate embedding models (say, those offered by major API providers) and can craft queries that yield a distinctive document set under each one. By observing only which documents come back, the adversary matches the output to a specific candidate. The attacker therefore does not need to know or guess the deployed model beforehand, only that it is among the candidates.
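The matching step can be pictured with a minimal sketch. This is not the authors' procedure: it assumes the attacker has already worked out, offline, which document set each candidate would return for each probe query. The function names, the Jaccard scoring rule, and the toy document IDs are all illustrative assumptions.

```python
from typing import Dict, FrozenSet, List, Tuple


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Overlap between two document-ID sets; 1.0 means the sets are identical."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def identify_model(
    observed: List[FrozenSet[str]],
    predicted: Dict[str, List[FrozenSet[str]]],
) -> Tuple[str, Dict[str, float]]:
    """Pick the candidate whose predicted sets best match the observed sets.

    observed:  one unordered document set per probe query, as returned by the target.
    predicted: for each candidate model name, the sets the attacker expects
               for the same probe queries (an assumption of this sketch).
    """
    if not observed:
        raise ValueError("need at least one probe query")
    scores = {
        name: sum(jaccard(o, p) for o, p in zip(observed, sets)) / len(observed)
        for name, sets in predicted.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy data: two hypothetical candidates, two probe queries.
predicted = {
    "model_a": [frozenset({"d1", "d2", "d3"}), frozenset({"d4", "d5", "d6"})],
    "model_b": [frozenset({"d1", "d7", "d8"}), frozenset({"d2", "d4", "d9"})],
}
observed = [frozenset({"d1", "d7", "d8"}), frozenset({"d2", "d4", "d9"})]
best, scores = identify_model(observed, predicted)
print(best, scores)
```

Only set membership enters the score, which is why hiding scores and rank order does not stop this kind of matching.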

The key insight: different embedding models rank documents differently for the same query, so the top-k sets they return can differ. EIA exploits that difference by designing queries that maximize the discrepancy between the retrieved document sets across candidate models.
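A rough sketch of what "maximize the discrepancy" could mean in practice: score each candidate query by how much the candidates' top-k sets disagree, then keep the best one. The two toy embedders are hypothetical stand-ins for real models, and mean pairwise Jaccard distance is an assumed objective, not necessarily the one used in the paper.

```python
import math
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List

Vector = List[float]
Embedder = Callable[[str], Vector]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k_set(query: str, docs: Dict[str, str], embed: Embedder, k: int) -> FrozenSet[str]:
    """Unordered set of the k document IDs closest to the query under `embed`."""
    q_vec = embed(query)
    ranked = sorted(docs, key=lambda doc_id: cosine(q_vec, embed(docs[doc_id])), reverse=True)
    return frozenset(ranked[:k])


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def discrepancy(query: str, docs: Dict[str, str], embedders: Dict[str, Embedder], k: int) -> float:
    """Mean pairwise Jaccard distance between the candidates' top-k sets."""
    sets = [top_k_set(query, docs, embed, k) for embed in embedders.values()]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


def best_probe(candidates: List[str], docs: Dict[str, str], embedders: Dict[str, Embedder], k: int) -> str:
    """Candidate query on which the models disagree most."""
    return max(candidates, key=lambda q: discrepancy(q, docs, embedders, k))


# Hypothetical toy "models": character-count features, not real embeddings.
def toy_model_a(text: str) -> Vector:
    return [float(text.count(c)) for c in "aeiou"]


def toy_model_b(text: str) -> Vector:
    return [float(text.count(c)) for c in "rstln"]
```

A query with discrepancy 1.0 yields completely disjoint sets across candidates, so a single response would be enough to tell them apart in this toy setting.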

Bypassing Defenses: Rerankers and RAG Systems

Rerankers don’t help. The authors show that certain queries remain discriminative even when a reranker is inserted after the initial embedding-based retrieval. The reranker may shuffle the order, but the set of documents—the raw pool—still leaks model identity.
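Why the set can survive reranking, sketched under one simple assumption: the reranker only reorders and truncates the pool handed to it by the embedding stage and never pulls in documents from elsewhere. The pools and scores below are made up for illustration.

```python
from typing import Dict, FrozenSet, List


def rerank_and_truncate(pool: List[str], rerank_score: Dict[str, float], keep: int) -> FrozenSet[str]:
    """Reorder the first-stage pool by a (hypothetical) reranker score, keep the top `keep`."""
    ranked = sorted(pool, key=lambda doc_id: rerank_score.get(doc_id, 0.0), reverse=True)
    return frozenset(ranked[:keep])


# Hypothetical first-stage pools that two different embedding models surface
# for the same discriminative query.
pool_model_a = ["d1", "d2", "d3", "d4"]
pool_model_b = ["d5", "d6", "d7", "d8"]
rerank_score = {
    "d1": 0.2, "d2": 0.9, "d3": 0.5, "d4": 0.1,
    "d5": 0.7, "d6": 0.3, "d7": 0.8, "d8": 0.4,
}

final_a = rerank_and_truncate(pool_model_a, rerank_score, keep=2)
final_b = rerank_and_truncate(pool_model_b, rerank_score, keep=2)

# The final set is always a subset of the first-stage pool, so disjoint
# pools stay distinguishable after reranking.
print(final_a, final_b, final_a.isdisjoint(final_b))
```

When the candidates' pools overlap heavily, truncation can erase the difference, which may be why only certain queries remain discriminative.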

Against a real retrieval-augmented generation (RAG) system, the raw attack queries are ones the LLM would normally reject as malformed, so the attack adapts them. With careful phrasing, the adversary gets the LLM to pass the queries through to the retriever, exposing the embedding model underneath.

Mitigation Strategies

EIA’s authors propose similarity thresholds as a defense: if the system discards retrieved documents that are too close to each other or to the query in embedding space, the adversary loses the fine-grained signal needed to fingerprint the model. This is a practical knob for API providers—reduce the information content in the retrieved set without wrecking retrieval quality.
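A minimal sketch of a threshold filter as described here, not the paper's exact defense: it drops any retrieved document that is too similar to the query or to a document already kept. The function name and both threshold values are hypothetical; the source gives no concrete numbers.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def threshold_filter(
    query_vec: Vector,
    doc_vecs: Dict[str, Vector],
    ranked_ids: List[str],
    max_query_sim: float,
    max_pair_sim: float,
) -> List[str]:
    """Keep documents that are neither too close to the query nor to an already-kept document.

    Both thresholds are illustrative knobs (assumptions of this sketch).
    """
    kept: List[str] = []
    for doc_id in ranked_ids:
        vec = doc_vecs[doc_id]
        if cosine(query_vec, vec) > max_query_sim:
            continue  # too close to the query
        if any(cosine(vec, doc_vecs[other]) > max_pair_sim for other in kept):
            continue  # near-duplicate of a document already returned
        kept.append(doc_id)
    return kept


# Toy 2-D vectors standing in for real embeddings.
doc_vecs = {"d1": [1.0, 0.0], "d2": [1.0, 1.0], "d3": [1.0, 1.01], "d4": [0.0, 1.0]}
filtered = threshold_filter([1.0, 0.0], doc_vecs, ["d1", "d2", "d3", "d4"], 0.95, 0.99)
print(filtered)
```

Tighter thresholds remove more of the signal but also more legitimate results, so any real deployment would need to tune them against retrieval quality.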

The takeaway: even minimal output leakage can expose model identity. System designers need to audit exactly what their APIs reveal, because the retrieved document list itself is a side channel waiting to be exploited.


Source: Embedding Inference Attack
Domain: arxiv.org
