LoRA Slashes Fine-Tuning Membership Leakage from 82.6% to 1.5%

blog.jetbrains.com · @rapid_condor · 2 hours ago · Artificial Intelligence · 2 comments

JetBrains' new Error Zone MIA detects up to 9x more memorization than existing methods, and shows that LoRA cuts privacy risk dramatically but not completely.

jetbrains · membership inference attacks · lora · large language models · privacy · gpt 2

Fine-tuning GPT-2 (124M) on XSum with full weight updates lets an attacker detect 82.6% of training members at a 0.1% false positive rate. Switch to LoRA and that number falls to 1.5%. That 81-point gap is the clearest evidence I have seen that parameter-efficient fine-tuning dramatically reduces privacy exposure.

JetBrains researchers published a new membership inference attack called Error Zone MIA (EZ MIA) that exploits exactly where the model makes mistakes. Instead of averaging loss across a whole sequence, they look only at tokens the model predicts incorrectly and measure how much fine-tuning shifts probability toward the true token. The result: EZ MIA achieves up to 9x better detection rates than baselines like LOSS, Min-K++, and SPV-MIA.

Where the Signal Actually Lives

Existing methods treat sequence-level average loss as the membership signal. That washes out the valuable information concentrated at error positions. EZ MIA computes token-level log probabilities from the fine-tuned model and a reference model (the pre-trained checkpoint). It identifies tokens where the target's top prediction disagrees with ground truth, then calculates a ratio of upward to downward probability shifts relative to the reference. A higher ratio indicates the model has memorized that sequence.
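The scoring idea described above can be sketched in a few lines. This is a minimal illustration, not the JetBrains implementation: the function name `ez_score`, the use of raw probability differences as the "shift", and the `eps` smoothing term are all assumptions made for the example, and the toy three-token vocabulary stands in for real model outputs.

```python
import math
from typing import Sequence


def ez_score(
    target_logprobs: Sequence[Sequence[float]],
    ref_logprobs: Sequence[Sequence[float]],
    token_ids: Sequence[int],
    eps: float = 1e-8,
) -> float:
    """Ratio of upward to downward probability shift at the target's error positions.

    Each row holds log-probabilities over the vocabulary for one position.
    Measuring the shift as a probability difference and smoothing with `eps`
    are assumptions of this sketch, not details taken from the paper.
    """
    up = 0.0
    down = 0.0
    for tgt_row, ref_row, true_id in zip(target_logprobs, ref_logprobs, token_ids):
        predicted = max(range(len(tgt_row)), key=lambda i: tgt_row[i])
        if predicted == true_id:
            continue  # not an error position: the target's top prediction is correct
        # How far fine-tuning moved probability mass toward (or away from) the true token.
        shift = math.exp(tgt_row[true_id]) - math.exp(ref_row[true_id])
        if shift > 0:
            up += shift
        else:
            down += -shift
    return up / (down + eps)


def to_logprobs(rows):
    """Convert rows of probabilities into rows of log-probabilities."""
    return [[math.log(p) for p in row] for row in rows]


# Toy data: three positions, vocabulary of three tokens (hypothetical values).
target = to_logprobs([[0.5, 0.4, 0.1], [0.1, 0.8, 0.1], [0.6, 0.1, 0.3]])
reference = to_logprobs([[0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [0.5, 0.1, 0.4]])
tokens = [1, 1, 2]

print(ez_score(target, reference, tokens))
```

In the toy data, the second position is skipped because the target already predicts it correctly. Only the first and third positions contribute, one with an upward shift and one with a downward shift.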

Only two forward passes needed: one through the target, one through the reference. No shadow models, no auxiliary classifiers, no training set samples required. That makes EZ MIA practical for real-world auditing, unlike the resource-heavy LiRA approach that can require hundreds of shadow models.

Full Fine-Tuning vs. LoRA: The Numbers

JetBrains evaluated GPT-2 (124M), GPT-2-XL (1.5B), and Llama-2 (7B) on XSum with 128-token sequences. For each model, they compared full fine-tuning and LoRA. The results across the board:

  • GPT-2: full TPR@0.1%FPR = 82.6%, LoRA = 1.5%
  • GPT-2-XL: full = 74.3%, LoRA = 2.1%
  • Llama-2 7B: full = 68.9%, LoRA = 6.8%
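For readers unfamiliar with the metric in this list, here is a rough sketch of how a true positive rate at a fixed false positive rate can be computed from attack scores. The function name `tpr_at_fpr`, the tie-handling rule, and the toy scores are assumptions for illustration; the article does not describe the exact evaluation code.

```python
import math
from typing import Sequence


def tpr_at_fpr(
    member_scores: Sequence[float],
    nonmember_scores: Sequence[float],
    target_fpr: float = 0.001,
) -> float:
    """Fraction of members flagged when at most `target_fpr` of non-members are flagged.

    Higher scores are assumed to mean "more likely a training member".
    """
    if not member_scores or not nonmember_scores:
        raise ValueError("both score lists must be non-empty")
    neg = sorted(nonmember_scores, reverse=True)
    # Number of non-members we are allowed to flag by mistake.
    allowed_fp = math.floor(target_fpr * len(neg) + 1e-9)
    # Flag only scores strictly above this threshold, so ties never exceed the budget.
    threshold = neg[allowed_fp] if allowed_fp < len(neg) else float("-inf")
    true_positives = sum(1 for s in member_scores if s > threshold)
    return true_positives / len(member_scores)


# Toy example with a looser 5% FPR so that 100 non-member scores are enough.
nonmembers = [i / 100 for i in range(100)]
members = [0.5, 0.95, 0.97, 1.2]
print(tpr_at_fpr(members, nonmembers, target_fpr=0.05))
```

At a 0.1% false positive rate, the threshold is set by the top one in a thousand non-member scores, so a meaningful estimate needs at least a few thousand non-member samples.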

LoRA consistently cuts leakage, but the residual leakage grows with model size: 1.5% for GPT-2, 2.1% for GPT-2-XL, and 6.8% for Llama-2 7B. That 6.8% is not negligible. LoRA reduces the capacity to encode fine-grained details from the fine-tuning set, but does not eliminate memorization.
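The reported figures can be compared directly with a little arithmetic. The snippet below uses only the numbers quoted in this article; the `reduction` helper is a hypothetical name introduced for the example.

```python
# TPR at 0.1% FPR, in percent, as reported in the article: (full fine-tuning, LoRA).
results = {
    "GPT-2 (124M)": (82.6, 1.5),
    "GPT-2-XL (1.5B)": (74.3, 2.1),
    "Llama-2 (7B)": (68.9, 6.8),
}


def reduction(full: float, lora: float) -> tuple[float, float]:
    """Return the absolute drop in percentage points and the full-to-LoRA ratio."""
    return full - lora, full / lora


for model, (full, lora) in results.items():
    points, factor = reduction(full, lora)
    print(f"{model}: -{points:.1f} points, {factor:.1f}x lower with LoRA")
```

The ratio shrinks from roughly 55x for GPT-2 to roughly 10x for Llama-2 7B, which is another way of seeing that LoRA's protection weakens as the model grows.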

Why This Matters Now

Fine-tuned models are the high-risk case. Models pre-trained on massive corpora for only a few epochs barely leak, but fine-tuning on small datasets for multiple epochs creates strong memorization signals. The JetBrains paper gives developers a lightweight tool to measure when fine-tuning has crossed from useful adaptation into risky retention.

LoRA is a practical mitigation, but it is not a privacy panacea. Larger models still leak, and the attack surface will only grow as more companies fine-tune on proprietary data. EZ MIA provides the measurement needed to decide where to draw the line.


Source: Our Research on Membership Inference Attacks and Preventing Privacy Leaks
Domain: blog.jetbrains.com
