SIGIL Framework Embeds Imperceptible Canaries to Prove LLM Training Data Inclusion

SIGIL achieves an overall AUC of 0.892 across 36,000 trials, with detection rates of up to 78%, and maintains an AUC of 0.864 even when 100% of the training data is paraphrased.

Tags: sigil, llm training data, membership inference, canary sequences, forensic methods, ai ethics

An AUC of 0.892 across 36,000 trials means SIGIL can show with high confidence that your copyrighted content was used to train an LLM, and it keeps working even when the training data is aggressively paraphrased.

How SIGIL Works: Five Canary Strategies and a Statistical Test

SIGIL (Subtle Injection for Ground-truth Inference of LLM training data) embeds imperceptible canary sequences into protected text or code. If that document ends up in an LLM’s training set, the model learns those sequences and leaves a detectable behavioral signature when probed with targeted queries. The framework defines five canary strategies: lexical-rare, lexical-phrase, syntactic, semantic, and code-pattern. Each strategy is paired with a Membership Inference Score (MIS) grounded in the Neyman-Pearson hypothesis testing framework, giving formal false-positive rate control.
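
The false-positive control described above can be illustrated with a minimal sketch. It is not the paper's MIS definition: it assumes you already have some scalar score per probe, plus a set of "null" scores from models known not to have seen the canaries. The names `calibrate_threshold` and `is_member` are hypothetical.

```python
import math

def calibrate_threshold(null_scores, alpha):
    """Pick a decision threshold from scores of models that did NOT see the canaries.

    Assumption: higher score means stronger evidence of membership.
    The returned threshold t guarantees that the fraction of null scores
    strictly greater than t is at most alpha (empirical false-positive rate).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    ordered = sorted(null_scores)
    n = len(ordered)
    # At most floor(alpha * n) null scores may exceed the threshold.
    allowed = math.floor(alpha * n)
    return ordered[n - allowed - 1]

def is_member(score, threshold):
    """Flag a document as 'likely in the training set' if its score exceeds the threshold."""
    return score > threshold

# Hypothetical null scores, for illustration only.
null_scores = [3, 1, 2, 8, 7, 6, 5, 4]
threshold = calibrate_threshold(null_scores, alpha=0.25)
false_positives = sum(is_member(s, threshold) for s in null_scores)
```

With these made-up scores the threshold lands at 6, so two of the eight null scores exceed it, exactly the 25% budget. In a Neyman-Pearson setting the point is the same: fix the tolerated false-positive rate first, then make the test as sensitive as possible under that constraint.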

Detection Performance Across Conditions

SIGIL’s performance varies meaningfully with injection rate, model size, and canary type. Overall AUC is 0.892; broken down by injection rate, it rises from 0.831 at just 0.1% injection to 0.947 at 10%. Detection rates range from 33% to 78% depending on conditions. Code-pattern canaries are the standout, with an AUC of 0.903 and Cohen's d = 1.84. Syntactic canaries lag at 0.875 (d = 1.63). Every experimental factor (injection rate, model size, canary strategy, mixture ratio) had a significant independent effect on MIS (p < 0.001).
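
For readers unfamiliar with the two metrics quoted above, here is a small sketch of how AUC and Cohen's d are computed from two groups of scores. The score lists are invented for illustration and the function names are hypothetical; this is the textbook definition of each metric, not SIGIL's evaluation code.

```python
from statistics import mean, variance

def auc(member_scores, non_member_scores):
    """Probability that a random member outscores a random non-member (ties count half)."""
    wins = 0.0
    for m in member_scores:
        for n in non_member_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(non_member_scores))

def cohens_d(member_scores, non_member_scores):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(member_scores), len(non_member_scores)
    pooled_var = (
        (n1 - 1) * variance(member_scores) + (n2 - 1) * variance(non_member_scores)
    ) / (n1 + n2 - 2)
    return (mean(member_scores) - mean(non_member_scores)) / pooled_var ** 0.5

# Made-up scores: documents that were in training vs. documents that were not.
members = [2, 3, 4]
non_members = [1, 2, 3]
print(auc(members, non_members), cohens_d(members, non_members))
```

An AUC of 0.5 would mean the score cannot tell the groups apart, while 1.0 would mean perfect separation. Cohen's d expresses the gap between group means in units of standard deviation, so values above roughly 0.8 are conventionally read as large effects.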

Robustness Against Paraphrasing: Semantic Leakage That Survives Surface-Level Rewriting

Paraphrasing is the obvious countermeasure. SIGIL largely shrugs it off. Even under 100% paraphrasing, AUC drops only to 0.864, just 0.028 below the overall 0.892. That’s because canary sequences exploit semantic structure, not just surface form. The framework detects the residual semantic leakage that survives rewriting, making it far more resilient than simple n-gram-based membership inference.

SIGIL gives content owners a practical forensic tool to enforce data rights against unlicensed LLM training, shifting the burden of proof back onto model builders.


Source: Subtle Injection for Ground-truth Inference of LLM Training Data
Domain: arxiv.org
