LLMs Can Identify Their Own Kind Through Fingerprints, Even When Anonymized

A fine-tuned T5 model achieves 0.991 F1 at identifying which LLM family generated a political analysis text, even after anonymization, showing that current mitigations are ineffective and raising concerns under the EU AI Act.

multi agent llms, stylometric fingerprinting, eu ai act, t5, claude sonnet, llama

A fine-tuned T5-base model identifies which commercial LLM family wrote a political analysis statement with Macro F1 = 0.991 (±0.008) — even after prompt-level anonymization strips away explicit identifiers. That score comes from a new cross-validation protocol that guarantees zero content overlap between training and test data, so the model is reading genuine stylometric fingerprints, not memorizing phrases.
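For readers unfamiliar with the metric, Macro F1 is the unweighted mean of the per-class F1 scores, so every class counts equally regardless of how many texts it contains. The following is a minimal sketch in plain Python, not the paper's evaluation code; the function name `macro_f1` and the labels and predictions are made-up illustrative values.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gets a false positive
            fn[truth] += 1  # true class gets a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: two placeholder families plus an "unknown" class.
y_true = ["family_A", "family_A", "family_B", "family_B", "unknown"]
y_pred = ["family_A", "family_B", "family_B", "family_B", "unknown"]
score = macro_f1(y_true, y_pred)
print(round(score, 3))  # 0.822
```

In this toy case the per-class scores are 0.667, 0.8, and 1.0, and the single misclassified text pulls the average down to about 0.822. A reported Macro F1 of 0.991 therefore means almost no class is being confused with another.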

The paper introduces a statement-disjoint cross-validation (SD-CV) protocol that forces a 2.1x increase in train-test content distance compared to a naive run-disjoint split (0.767 vs 0.366 cosine distance, p<0.001). T5 still delivers F1 = 0.978 on 24 completely held-out statements. The classifier assigns texts to one of five classes: four named LLM families (Claude Sonnet 4.6, Llama-3.3-70B, plus two others) and an open-world 'unknown' bucket. Zero-shot Claude Sonnet 4.6 and Llama-3.3-70B also perform above chance, but the fine-tuned T5 dominates.
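The idea behind a statement-disjoint split can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' protocol: the record fields (`statement_id`, `run_id`, `family`), the function name `statement_disjoint_folds`, and the toy dataset are all hypothetical. The point is only that folds are built over statements rather than over runs, so a statement seen in training can never reappear in the test set through a different run.

```python
import random
from collections import defaultdict


def statement_disjoint_folds(records, n_folds=5, seed=0):
    """Yield (train, test) lists in which no statement_id appears on both sides."""
    by_statement = defaultdict(list)
    for rec in records:
        by_statement[rec["statement_id"]].append(rec)
    ids = sorted(by_statement)
    random.Random(seed).shuffle(ids)  # deterministic shuffle of statement ids
    for k in range(n_folds):
        test_ids = set(ids[k::n_folds])  # every n_folds-th statement goes to test
        train = [r for s in ids if s not in test_ids for r in by_statement[s]]
        test = [r for s in ids if s in test_ids for r in by_statement[s]]
        yield train, test


# Hypothetical toy corpus: 10 statements x 3 placeholder families x 2 runs each.
records = [
    {"statement_id": s, "run_id": r, "family": f}
    for s in range(10)
    for f in ("family_A", "family_B", "family_C")
    for r in range(2)
]

folds = list(statement_disjoint_folds(records, n_folds=5))
for train, test in folds:
    train_ids = {r["statement_id"] for r in train}
    test_ids = {r["statement_id"] for r in test}
    assert not (train_ids & test_ids)  # no shared statements
```

A run-disjoint split would instead group by `run_id`, which lets the same statement land on both sides of the split. That is the situation the reported 0.366 versus 0.767 cosine-distance gap describes.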

Prompt-Level Anonymization Is Not a Mitigation

Prior work assumed that stripping model names from prompts, that is, asking an LLM to analyze a statement without revealing its own identity, would eliminate peer-preservation bias. This paper undercuts that assumption: the classifier still reaches Macro F1 = 0.991 on anonymized outputs, and a fractional SD-CV analysis shows the performance knee occurs at about 40% of the training data (~440 texts). A few hundred examples are enough for the model to learn the style of each family's role-constrained outputs. The stylometric signal is baked into phrasing, tone, punctuation, and reasoning patterns that survive anonymization.

Real-World Stakes: Multi-Agent Pipelines and the EU AI Act

Multi-agent LLM pipelines are vulnerable to a specific failure mode called peer-preservation bias: agents go easy on texts they suspect came from another LLM, skewing results. If agents can fingerprint their peers, prompt-level anonymization is theater. The authors directly link this to EU AI Act Articles 13 (transparency), 14 (human oversight), and 26 (obligations of deployers). Any system that uses multiple LLM backends for quality-critical decisions — say, political content moderation, compliance auditing, or financial analysis — now has an unaddressed model-identity leak that regulators will care about.

The paper doesn't propose a fix; it establishes that the problem is real and measurable. Future work will need to attack stylometric leakage at the generation level, not just the prompt level. That means retraining or post-processing to scrub unwitting authorial signatures from LLM outputs — a much harder engineering problem than slapping a "don't say your name" instruction on a system prompt.


Source: Can Multi-Agent LLMs Identify Their Peers? Stylometric Fingerprinting in Role-Constrained Political Analysis
Domain: arxiv.org
