
SIGIL Framework Embeds Imperceptible Canaries to Prove Inclusion in LLM Training Data

SIGIL achieves an overall AUC of 0.892 across 36,000 trials, with detection rates of up to 78%, and maintains an AUC of 0.864 even when the training data is 100% paraphrased.

sigil, llm training data, membership inference, canary sequences, forensic methods, ai ethics

An AUC of 0.892 across 36,000 trials suggests SIGIL can give you strong statistical evidence that your copyrighted content was used to train an LLM, and the signal holds up even when the training data is aggressively paraphrased.

How SIGIL Works: Five Canary Strategies and a Statistical Test

SIGIL (Subtle Injection for Ground-truth Inference of LLM training data) embeds imperceptible canary sequences into protected text or code. If that document ends up in an LLM’s training set, the model learns those sequences and leaves a detectable behavioral signature when probed with targeted queries. The framework defines five canary strategies: lexical-rare, lexical-phrase, syntactic, semantic, and code-pattern. Each strategy is paired with a Membership Inference Score (MIS) grounded in the Neyman-Pearson hypothesis testing framework, giving formal false-positive rate control.
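The false-positive control described above can be sketched as a threshold calibration step. This is a minimal illustration, not the paper's implementation: it assumes the MIS is a single number per probed document, that higher means "more likely a member", and that you have a set of null scores from documents known not to be in the training set. The function names (`calibrate_threshold`, `is_member`, `empirical_fpr`) are hypothetical.

```python
import math
import random


def calibrate_threshold(null_scores, alpha):
    """Pick a decision threshold whose empirical false-positive rate
    on the null (non-member) scores is at most alpha.

    Assumption: higher score = stronger evidence of membership.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    ordered = sorted(null_scores)
    n = len(ordered)
    # At most floor(alpha * n) null scores may lie strictly above the threshold.
    allowed = math.floor(alpha * n)
    return ordered[n - allowed - 1]


def is_member(score, threshold):
    """Reject the null hypothesis (not in training set) when score > threshold."""
    return score > threshold


def empirical_fpr(null_scores, threshold):
    """Fraction of null scores that would be wrongly flagged as members."""
    flagged = sum(1 for s in null_scores if is_member(s, threshold))
    return flagged / len(null_scores)


if __name__ == "__main__":
    random.seed(0)
    # Synthetic null scores, for illustration only (not data from the paper).
    null_scores = [random.gauss(0.0, 1.0) for _ in range(1000)]
    threshold = calibrate_threshold(null_scores, alpha=0.05)
    print("threshold:", round(threshold, 3))
    print("empirical FPR:", empirical_fpr(null_scores, threshold))
```

Fixing the false-positive rate first and then maximizing detection is the core idea of the Neyman-Pearson setup; how SIGIL actually computes the score being thresholded is not specified in this summary.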

Detection Performance Across Conditions

SIGIL’s performance varies meaningfully with injection rate, model size, and canary type. Overall AUC hits 0.892, rising from 0.831 at just 0.1% injection to 0.947 at 10%. Detection rates range from 33% to 78% depending on conditions. Code Pattern canaries are the standout — AUC 0.903 with Cohen's d = 1.84. Syntactic Structure lags at 0.875 (d = 1.63). Every experimental factor — injection rate, model size, canary strategy, mixture ratio — had significant independent effects on MIS (p < 0.001).
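For readers who want to see how the two headline metrics relate, here is a small sketch of AUC and Cohen's d computed from two lists of scores (members and non-members). The helper names and the toy inputs are illustrative assumptions, not the authors' code or data. The last helper uses a standard result: if both score distributions are Gaussian with equal variance, AUC equals Φ(d/√2).

```python
import math
from statistics import mean, variance


def auc(member_scores, nonmember_scores):
    """AUC as the probability that a random member outscores a random
    non-member (ties count as half), i.e. the Mann-Whitney estimate."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


def cohens_d(member_scores, nonmember_scores):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(member_scores), len(nonmember_scores)
    pooled_var = (
        (n1 - 1) * variance(member_scores) + (n2 - 1) * variance(nonmember_scores)
    ) / (n1 + n2 - 2)
    return (mean(member_scores) - mean(nonmember_scores)) / math.sqrt(pooled_var)


def auc_from_d(d):
    """AUC implied by Cohen's d under an equal-variance Gaussian assumption:
    AUC = Phi(d / sqrt(2)) = 0.5 * (1 + erf(d / 2))."""
    return 0.5 * (1.0 + math.erf(d / 2.0))


if __name__ == "__main__":
    # Toy scores for illustration only.
    members = [2.0, 3.0, 4.0]
    nonmembers = [1.0, 2.0, 3.0]
    print("AUC:", auc(members, nonmembers))
    print("Cohen's d:", cohens_d(members, nonmembers))
    # Effect sizes quoted in the article.
    for d in (1.84, 1.63):
        print(d, "->", round(auc_from_d(d), 3))
```

Under that Gaussian assumption, the quoted effect sizes of 1.84 and 1.63 imply AUCs of roughly 0.903 and 0.875, which lines up with the figures reported for Code Pattern and Syntactic Structure canaries. That is a consistency check only; the summary does not say how the authors derived either number.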

Robustness Against Paraphrasing: Semantic Leakage That Survives Surface-Level Rewriting

Paraphrasing is the obvious countermeasure. SIGIL shrugs it off. Even under 100% paraphrasing, AUC drops only to 0.864, down from 0.892 overall and still far above the 0.5 expected from chance. That’s because canary sequences exploit semantic structure, not just surface form. The framework detects the residual semantic leakage after rewriting, making it far more resilient than simple n-gram-based membership inference.

SIGIL gives content owners a practical forensic tool to enforce data rights against unlicensed LLM training, shifting the burden of proof back onto model builders.


Source: Subtle Injection for Ground-truth Inference of LLM Training Data
Domain: arxiv.org
