SIGIL Framework Embeds Imperceptible Canaries to Prove LLM Training Data Inclusion


SIGIL achieves an overall AUC of 0.892 across 36,000 trials, with detection rates up to 78%, and maintains 0.864 AUC even when training data is 100% paraphrased.

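In practical terms, an AUC of 0.892 means SIGIL's score ranks a document that was in the training set above one that was not about 89% of the time. That gives content owners statistical evidence, rather than absolute proof, that their copyrighted material was used to train an LLM, and the evidence holds up even when the training data is aggressively paraphrased.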

How SIGIL Works: Five Canary Strategies and a Statistical Test

SIGIL (Subtle Injection for Ground-truth Inference of LLM training data) embeds imperceptible canary sequences into protected text or code. If that document ends up in an LLM’s training set, the model learns those sequences and leaves a detectable behavioral signature when probed with targeted queries. The framework defines five canary strategies: lexical-rare, lexical-phrase, syntactic, semantic, and code-pattern. Each strategy is paired with a Membership Inference Score (MIS) grounded in the Neyman-Pearson hypothesis testing framework, giving formal false-positive rate control.
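The false-positive control described above can be illustrated with a minimal sketch. It does not reproduce how SIGIL computes its Membership Inference Score; the scores here are simulated Gaussian draws, and `calibrate_threshold` and `is_member` are hypothetical names introduced for illustration. It only shows the Neyman-Pearson style idea of fixing the false-positive rate first and then deriving the decision cutoff from scores observed on data known not to be in the training set.

```python
import random


def calibrate_threshold(null_scores, alpha):
    """Pick a cutoff so at most a fraction `alpha` of null scores exceed it."""
    if not null_scores:
        raise ValueError("need at least one null score")
    if not 0 < alpha < 1:
        raise ValueError("alpha must be in (0, 1)")
    ordered = sorted(null_scores)
    n = len(ordered)
    k = int(alpha * n)         # how many null scores may exceed the cutoff
    return ordered[n - k - 1]  # exactly k sorted entries lie after this index


def is_member(score, threshold):
    """Flag a document as 'seen in training' when its score exceeds the cutoff."""
    return score > threshold


# Simulated stand-in for scores of documents that were NOT in the training
# set (an assumption for illustration, not data from the paper).
rng = random.Random(0)
null_scores = [rng.gauss(0.0, 1.0) for _ in range(2000)]

threshold = calibrate_threshold(null_scores, alpha=0.05)
false_positive_rate = sum(is_member(s, threshold) for s in null_scores) / len(null_scores)
print(f"threshold={threshold:.3f}, empirical FPR={false_positive_rate:.3f}")
```

With the cutoff fixed this way, any increase in detection rate has to come from better separation between member and non-member scores, not from a looser threshold.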

Detection Performance Across Conditions

SIGIL’s performance varies meaningfully with injection rate, model size, and canary type. Overall AUC is 0.892, rising from 0.831 at an injection rate of just 0.1% to 0.947 at 10%. Detection rates range from 33% to 78% depending on conditions. Code-pattern canaries are the standout, with an AUC of 0.903 and Cohen's d = 1.84. Syntactic canaries lag at 0.875 (d = 1.63). Every experimental factor (injection rate, model size, canary strategy, and mixture ratio) had a significant independent effect on MIS (p < 0.001).
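For readers less familiar with the two metrics quoted above, here is a small sketch of how AUC and Cohen's d are commonly computed from two sets of scores. The numbers are toy values, not results from the paper, and the function names `auc` and `cohens_d` are chosen here for illustration. AUC is computed as the fraction of (member, non-member) pairs in which the member scores higher, and Cohen's d as the difference in means divided by the pooled standard deviation.

```python
from math import sqrt
from statistics import mean, variance


def auc(member_scores, non_member_scores):
    """Probability that a random member outscores a random non-member (ties count half)."""
    wins = 0.0
    for m in member_scores:
        for u in non_member_scores:
            if m > u:
                wins += 1.0
            elif m == u:
                wins += 0.5
    return wins / (len(member_scores) * len(non_member_scores))


def cohens_d(member_scores, non_member_scores):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(member_scores), len(non_member_scores)
    pooled_var = (
        (n1 - 1) * variance(member_scores) + (n2 - 1) * variance(non_member_scores)
    ) / (n1 + n2 - 2)
    return (mean(member_scores) - mean(non_member_scores)) / sqrt(pooled_var)


# Toy scores for illustration only.
members = [2, 4, 6]
non_members = [1, 3, 5]
print(auc(members, non_members), cohens_d(members, non_members))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported 0.831 to 0.947 range should be read.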

Robustness Against Paraphrasing: Semantic Leakage That Survives Surface-Level Rewriting

Paraphrasing is the obvious countermeasure. SIGIL shrugs it off. Even under 100% paraphrasing, AUC only drops from 0.892 to 0.864, still far above the 0.5 expected from chance. That’s because canary sequences exploit semantic structure, not just surface form. The framework detects the residual semantic leakage after rewriting, making it far more resilient than simple n-gram based membership inference.
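To see why surface-level matching is fragile, consider a minimal sketch of an n-gram overlap check. This is not SIGIL's detector and not a method taken from the paper; `ngrams` and `ngram_overlap` are hypothetical helpers, and the two sentences are invented examples. It only illustrates that a paraphrase can preserve meaning while sharing no word trigrams with the original.

```python
def ngrams(text, n):
    """Return the set of lowercase word n-grams in `text`."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(original, candidate, n=3):
    """Fraction of the original's n-grams that also appear in the candidate."""
    original_grams = ngrams(original, n)
    if not original_grams:
        return 0.0
    return len(original_grams & ngrams(candidate, n)) / len(original_grams)


# Invented example sentences, for illustration only.
original = "the quick brown fox jumps over the lazy dog"
paraphrase = "a fast brown fox leaps above the sleepy dog"

print(ngram_overlap(original, original))    # identical text shares every trigram
print(ngram_overlap(original, paraphrase))  # the paraphrase shares none
```

A detector that relies only on this kind of overlap loses its signal once the text is rewritten, which is the gap the semantic canaries are meant to close.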

SIGIL gives content owners a practical forensic tool to enforce data rights against unlicensed LLM training, shifting the burden of proof back onto model builders.

Source: Subtle Injection for Ground-truth Inference of LLM Training Data
Domain: arxiv.org
