
SIGIL Framework Embeds Imperceptible Canaries to Prove Inclusion in LLM Training Data

SIGIL achieves an overall AUC of 0.892 across 36,000 trials, with detection rates of up to 78%, and maintains an AUC of 0.864 even when the training data is 100% paraphrased.

Tags: SIGIL, LLM training data, membership inference, canary sequences, forensic methods, AI ethics

An AUC of 0.892 across 36,000 trials means SIGIL can reliably distinguish models trained on your copyrighted content from models that were not. It keeps working even when the training data has been fully paraphrased.

How SIGIL Works: Five Canary Strategies and a Statistical Test

SIGIL (Subtle Injection for Ground-truth Inference of LLM training data) embeds imperceptible canary sequences into protected text or code. If that document ends up in an LLM’s training set, the model learns those sequences. Targeted probe queries then reveal a detectable behavioral signature. The framework defines five canary strategies: lexical-rare, lexical-phrase, syntactic, semantic, and code-pattern. Each strategy is paired with a Membership Inference Score (MIS) grounded in the Neyman-Pearson hypothesis-testing framework, which provides formal control over the false-positive rate.
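The source doesn't spell out how the MIS is computed or calibrated. The sketch below illustrates what Neyman-Pearson-style false-positive control typically looks like. It fixes a significance level alpha and calibrates a rejection threshold on scores collected under the null hypothesis (content not in the training data). It then flags membership only when an observed score exceeds that threshold. The functions `calibrate_threshold` and `is_member`, and the idea of a null set of control scores, are hypothetical and are not SIGIL's actual API.

```python
import math
import random


def calibrate_threshold(null_scores, alpha):
    """Pick a threshold so that at most a fraction alpha of null scores exceed it.

    null_scores: MIS values measured where the canary is known NOT to be in the
    training data (a hypothetical calibration set; not specified by the source).
    """
    if not null_scores:
        raise ValueError("need at least one null score")
    if not 0 < alpha < 1:
        raise ValueError("alpha must be in (0, 1)")
    ranked = sorted(null_scores, reverse=True)
    # At most k = floor(alpha * n) null scores can lie strictly above ranked[k],
    # so the empirical false-positive rate on the null set is at most alpha.
    k = math.floor(alpha * len(ranked))
    return ranked[k]


def is_member(observed_mis, threshold):
    """Reject H0 ("canary not in training data") only if the score beats the threshold."""
    return observed_mis > threshold


# Usage with synthetic null scores (standard normal noise, purely illustrative).
random.seed(0)
null_mis = [random.gauss(0.0, 1.0) for _ in range(10_000)]
threshold = calibrate_threshold(null_mis, alpha=0.01)
print(is_member(3.5, threshold))
```

Lowering alpha raises the threshold, trading detection power for fewer false accusations. The Neyman-Pearson lemma says that, for a fixed alpha, a likelihood-ratio statistic gives the most powerful such test.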

Detection Performance Across Conditions

SIGIL’s performance varies substantially with injection rate, model size, and canary type. Overall AUC is 0.892, rising from 0.831 at an injection rate of just 0.1% to 0.947 at 10%. Detection rates range from 33% to 78% depending on conditions. Code-pattern canaries perform best, with an AUC of 0.903 and Cohen's d = 1.84, while syntactic-structure canaries lag at 0.875 (d = 1.63). Every experimental factor (injection rate, model size, canary strategy, and mixture ratio) had a significant independent effect on MIS (p < 0.001).
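The source reports AUC and Cohen's d but doesn't say how they were computed. The sketch below uses the textbook definitions. AUC is the probability that a randomly chosen member (trained-on) score exceeds a randomly chosen non-member score, with ties counted as one half. Cohen's d is the difference in means divided by the pooled standard deviation. The scores are made-up toy values, not SIGIL data.

```python
from statistics import mean, stdev


def auc(member_scores, nonmember_scores):
    """P(random member score > random non-member score), ties counted as 0.5.

    This equals the Mann-Whitney U statistic divided by n_members * n_nonmembers.
    The double loop is O(n*m), which is fine for a small illustration.
    """
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


def cohens_d(a, b):
    """Difference in means divided by the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5


# Toy MIS values (hypothetical, for illustration only).
members = [0.9, 0.8, 0.7, 0.4]
nonmembers = [0.5, 0.3, 0.2, 0.1]
print(auc(members, nonmembers))  # 0.9375
print(round(cohens_d(members, nonmembers), 2))
```

An AUC of 0.5 means the score is no better than chance, and 1.0 means perfect separation. A d around 1.8, as reported for code-pattern canaries, means the member and non-member score distributions have means nearly two pooled standard deviations apart.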

Robustness Against Paraphrasing: Semantic Leakage That Survives Surface-Level Rewriting

Paraphrasing is the obvious countermeasure, and SIGIL largely shrugs it off. Even when 100% of the training data is paraphrased, AUC remains 0.864, still well above the 0.5 expected from random guessing. That is because the canary signal lives in semantic structure, not just surface form. The framework detects the semantic leakage that survives rewriting, which makes it far more resilient than simple n-gram-based membership inference.
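To see why surface-form detection is fragile, here is a toy n-gram overlap check, the kind of signal a simple n-gram-based membership test relies on. The canary sentence and both documents are invented for illustration. SIGIL's real canaries are designed to be imperceptible, and its semantic scoring is not shown here.

```python
def word_ngrams(text, n):
    """Set of lowercase, whitespace-split word n-grams in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def ngram_overlap(canary, document, n=3):
    """Fraction of the canary's word n-grams that appear verbatim in the document."""
    canary_grams = word_ngrams(canary, n)
    if not canary_grams:
        return 0.0
    return len(canary_grams & word_ngrams(document, n)) / len(canary_grams)


# Hypothetical canary and documents, for illustration only.
canary = "the amber heron counts seven quiet lanterns"
verbatim = "notes: the amber heron counts seven quiet lanterns at dusk"
paraphrased = "at dusk, a heron of amber hue tallies seven silent lamps"

print(ngram_overlap(canary, verbatim))     # 1.0
print(ngram_overlap(canary, paraphrased))  # 0.0
```

The verbatim copy matches every trigram. The paraphrase carries the same meaning but matches none, so a purely lexical test loses the signal entirely. The reported 0.864 AUC under full paraphrasing is what suggests SIGIL's signal does not depend on exact wording.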

SIGIL gives content owners a practical forensic tool to enforce data rights against unlicensed LLM training, shifting the burden of proof back onto model builders.


Source: Subtle Injection for Ground-truth Inference of LLM Training Data
Domain: arxiv.org
