An AUC of 0.892 across 36,000 trials means SIGIL can detect, with high confidence, whether your copyrighted content was used to train an LLM, and the signal holds up even when the training data is aggressively paraphrased.
How SIGIL Works: Five Canary Strategies and a Statistical Test
SIGIL (Subtle Injection for Ground-truth Inference of LLM training data) embeds imperceptible canary sequences into protected text or code. If that document ends up in an LLM’s training set, the model learns those sequences and leaves a detectable behavioral signature when probed with targeted queries. The framework defines five canary strategies: lexical-rare, lexical-phrase, syntactic, semantic, and code-pattern. Each strategy is paired with a Membership Inference Score (MIS) grounded in the Neyman-Pearson hypothesis testing framework, giving formal false-positive rate control.
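The Neyman-Pearson idea of fixing the false-positive rate α in advance and then flagging membership only when the evidence clears that bar can be illustrated with a minimal sketch. It assumes, hypothetically, that you already have one MIS value for the suspect model and a set of "null" MIS values computed on control canaries that were never published, so no model could have trained on them. The helper names `empirical_p_value` and `neyman_pearson_decision` are illustrative, not SIGIL's API, and the rank-based p-value is a generic calibration technique rather than the paper's own derivation of the MIS.

```python
import numpy as np


def empirical_p_value(score: float, null_scores) -> float:
    """Rank-based p-value: how often a null (non-member) MIS is at least as large.

    The +1 in numerator and denominator counts the observed score itself, which
    keeps P(p <= alpha) <= alpha when the score is exchangeable with the null set.
    """
    null = np.asarray(null_scores, dtype=float)
    n_extreme = np.count_nonzero(null >= score)
    return float((1 + n_extreme) / (null.size + 1))


def neyman_pearson_decision(score: float, null_scores, alpha: float = 0.01):
    """Flag membership (reject H0: 'content was not trained on') only if p <= alpha.

    alpha is the false-positive budget chosen before looking at the data;
    any score that does not clear it is reported as 'no evidence'.
    """
    p = empirical_p_value(score, null_scores)
    return p <= alpha, p


if __name__ == "__main__":
    # Hypothetical null MIS values from control canaries that were never released.
    null = np.linspace(-3.0, 3.0, 999)
    for observed in (4.0, 0.5):
        flagged, p = neyman_pearson_decision(observed, null, alpha=0.01)
        print(f"MIS={observed}: p={p:.3f}, flagged={flagged}")
```

With that synthetic null set, a score of 4.0 gets p = 0.001 and is flagged at α = 0.01, while a score of 0.5 gets p = 0.417 and is not. The design choice is to calibrate against real non-member scores instead of an assumed distribution, so the false-positive guarantee does not depend on the MIS being normally distributed.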
Detection Performance Across Conditions
SIGIL’s performance varies meaningfully with injection rate, model size, and canary type. Overall AUC is 0.892; broken down by injection rate, it rises from 0.831 at just 0.1% injection to 0.947 at 10%. Detection rates range from 33% to 78% depending on conditions. Code-pattern canaries are the standout, with AUC 0.903 and Cohen's d = 1.84, while syntactic-structure canaries lag at 0.875 (d = 1.63). Every experimental factor (injection rate, model size, canary strategy, and mixture ratio) had a significant independent effect on MIS (p < 0.001).
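The two metrics quoted above can be computed from raw scores with a short sketch. It assumes you have MIS values for documents known to be in the training set ("members") and for documents known to be absent ("non-members"); the function names are hypothetical, and this is not a claim about how the paper computed its numbers. AUC is estimated as the probability that a random member outscores a random non-member (the Mann-Whitney form), and Cohen's d uses the pooled sample standard deviation. As a consistency check, if both score distributions were normal with equal variance, AUC would equal Φ(d/√2); plugging in d = 1.84 and d = 1.63 gives roughly 0.903 and 0.875, in line with the reported per-strategy AUCs.

```python
import math

import numpy as np


def auc_mann_whitney(member_scores, nonmember_scores) -> float:
    """AUC as P(member MIS > non-member MIS), counting ties as one half."""
    pos = np.asarray(member_scores, dtype=float)
    neg = np.asarray(nonmember_scores, dtype=float)
    diff = pos[:, None] - neg[None, :]  # every member/non-member pair
    wins = np.count_nonzero(diff > 0) + 0.5 * np.count_nonzero(diff == 0)
    return float(wins / diff.size)


def cohens_d(member_scores, nonmember_scores) -> float:
    """Standardized mean difference using the pooled sample standard deviation."""
    a = np.asarray(member_scores, dtype=float)
    b = np.asarray(nonmember_scores, dtype=float)
    na, nb = a.size, b.size
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return float((a.mean() - b.mean()) / math.sqrt(pooled_var))


def binormal_auc(d: float) -> float:
    """AUC implied by Cohen's d when both classes are normal with equal variance.

    AUC = Phi(d / sqrt(2)) = 0.5 * (1 + erf(d / 2)).
    """
    return 0.5 * (1.0 + math.erf(d / 2.0))


if __name__ == "__main__":
    # Toy scores, not data from the paper.
    members, nonmembers = [2, 3, 4], [1, 2, 3]
    print("AUC:", auc_mann_whitney(members, nonmembers))
    print("Cohen's d:", cohens_d(members, nonmembers))
    for d in (1.84, 1.63):  # effect sizes quoted in the article
        print(f"d={d}: binormal AUC={binormal_auc(d):.3f}")
```

The pairwise-difference matrix is O(n·m) in memory, which is fine for a few thousand scores; for larger sets a rank-based implementation such as SciPy's Mann-Whitney U test is the usual choice.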
Robustness Against Paraphrasing: Semantic Leakage That Survives Surface-Level Rewriting
Paraphrasing is the obvious countermeasure, and SIGIL largely shrugs it off. Even under 100% paraphrasing, AUC drops only to 0.864, far above the 0.5 of random guessing. That’s because the canary sequences exploit semantic structure, not just surface form: the framework detects the residual semantic leakage that survives rewriting, making it far more resilient than simple n-gram-based membership inference.
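To see why surface-form signals are fragile, here is a hypothetical n-gram overlap check of the kind a simple membership-inference baseline might rely on. The functions and the example sentences are illustrative assumptions; SIGIL's semantic scoring is not shown here.

```python
def word_ngrams(text: str, n: int = 3) -> set:
    """Set of lowercase word n-grams, a crude surface-form fingerprint."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def ngram_jaccard(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap of word n-gram sets: 1.0 = identical sets, 0.0 = nothing shared."""
    ga, gb = word_ngrams(a, n), word_ngrams(b, n)
    union = ga | gb
    if not union:
        return 0.0  # both texts too short to form any n-gram
    return len(ga & gb) / len(union)


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog"
    paraphrase = "a fast brown fox leaps over a lazy dog"
    print("trigram overlap:", ngram_jaccard(original, paraphrase, n=3))
    print("bigram overlap:", ngram_jaccard(original, paraphrase, n=2))
```

On this pair the trigram overlap is 0.0 and the bigram overlap is only 1/7, even though the meaning is essentially unchanged. A detector keyed to exact word sequences loses its signal after a light rewrite, which is the gap the article says SIGIL's semantic canaries are designed to close.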
SIGIL gives content owners a practical forensic tool to enforce data rights against unlicensed LLM training, shifting the burden of proof back onto model builders.
Source: Subtle Injection for Ground-truth Inference of LLM Training Data
Domain: arxiv.org