
6 of 8 LLMs Flunk Secondary Subject Privacy Test in IDP-Bench

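A new benchmark shows that LLMs ace co-ownership recognition (6 of 8 score above 90%) but struggle to identify secondary subjects and to judge whether sharing is appropriate (5 of 8 score below 77%).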

Tags: llms, idp bench, interdependent privacy, contextual integrity, tisl lab, ai privacy

IDP-Bench drops 8 open-source LLMs into 3 levels of interdependent privacy reasoning, and the results reveal a gap: these models know when data belongs to multiple people, but they’re blind to the secondary victims.

Co-Ownership Recognition Isn't the Problem

6 of 8 models exceed 90% accuracy on co-ownership recognition. That sounds good, until you realize this is the easy part. Interdependent privacy (IDP) concerns data that is about more than one person: your chat history with a friend also contains their secrets. The LLM can grasp that the data isn't solely yours, but that awareness doesn't translate into protecting the other person.

The real failure shows up when models must identify secondary subjects: the people whose data is exposed without their consent. Here, 7 of 8 models score below 74%. That's not just a statistical dip; it's a systematic blind spot. IDP-Bench is built on the Contextual Integrity (CI) framework, and models also consistently miss the information attribute and the primary subject, but secondary subjects are where they do worst.
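
To make the three parameters concrete, here is a minimal Python sketch of how one annotated information flow might be scored. The FlowAnnotation class, its field names, and the exact-match scoring rule are assumptions made for illustration; they are not taken from the IDP-Bench code or schema, and the example values are invented.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FlowAnnotation:
    """Hypothetical annotation of one information flow (not the IDP-Bench schema)."""
    information_attribute: str
    primary_subject: str
    secondary_subjects: frozenset = field(default_factory=frozenset)


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences do not count as errors."""
    return " ".join(name.lower().split())


def score_parameters(gold: FlowAnnotation, predicted: FlowAnnotation) -> dict:
    """Return a per-parameter exact-match verdict (an assumed scoring rule)."""
    gold_secondary = {normalize(s) for s in gold.secondary_subjects}
    pred_secondary = {normalize(s) for s in predicted.secondary_subjects}
    return {
        "information_attribute": normalize(gold.information_attribute)
        == normalize(predicted.information_attribute),
        "primary_subject": normalize(gold.primary_subject)
        == normalize(predicted.primary_subject),
        "secondary_subjects": gold_secondary == pred_secondary,
    }


# Invented example: the model finds the attribute and the primary subject
# but misses the friend whose data is also exposed.
gold = FlowAnnotation("medical history", "Alice", frozenset({"Bob"}))
predicted = FlowAnnotation("Medical History", "alice", frozenset())
result = score_parameters(gold, predicted)
print(result)
```

In this invented case the prediction matches on the attribute and the primary subject and fails only on the secondary subject, which is the failure pattern the article describes.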

Judging Sharing Appropriateness: Scale Helps, But Not Enough

5 of 8 models score below 77% on the task of deciding whether sharing a given piece of information is appropriate in an IDP context. Larger models show better judgment — the scaling trend is real. But smaller models get demolished, and prompt sensitivity is high across the board.

When you rephrase an IDP question, the model's answer can flip. That's not robustness; that's fragility dressed up as language fluency. For a personal assistant that might decide whether to share your dinner plans or your friend's medical history, a fragile privacy guardrail is no guardrail at all.
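
One simple way to put a number on this kind of prompt sensitivity is a flip rate: the share of questions for which paraphrased prompts do not all get the same answer. The sketch below is an assumption about how one might measure it, not the metric used in IDP-Bench, and the answers are made up.

```python
def flip_rate(answers_by_question: dict) -> float:
    """Fraction of questions whose paraphrases did not all receive the same answer."""
    if not answers_by_question:
        raise ValueError("need at least one question")
    flipped = sum(
        1 for answers in answers_by_question.values() if len(set(answers)) > 1
    )
    return flipped / len(answers_by_question)


# Made-up model answers: each list holds the answers to paraphrases of one question.
answers = {
    "q1": ["inappropriate", "inappropriate", "inappropriate"],
    "q2": ["appropriate", "inappropriate", "appropriate"],
    "q3": ["appropriate", "appropriate", "appropriate"],
    "q4": ["inappropriate", "appropriate", "appropriate"],
}
rate = flip_rate(answers)
print(rate)  # 0.5: two of the four questions flip under rephrasing
```

A flip rate of 0 would mean every paraphrase of every question gets the same answer; anything well above that means the privacy judgment depends on wording rather than on the situation.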

Two LLM Judges, Three Reasoning Levels — Same Story

IDP-Bench evaluates models across three levels of IDP reasoning (co-ownership, parameter identification, appropriateness judgment) using two separate LLM judges. The consistency across judges strengthens the finding. The benchmark and code are public on GitHub (tisl-lab/Interdependent_Privacy_Bench), so anyone can reproduce or extend these tests.
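
Agreement between two judges can be checked with raw agreement and Cohen's kappa, which corrects for agreement expected by chance. The sketch below assumes each judge returns a correct/incorrect verdict per model response; that setup and the verdict lists are hypothetical and are not drawn from the IDP-Bench repository.

```python
def raw_agreement(judge_a: list, judge_b: list) -> float:
    """Share of items on which both judges give the same boolean verdict."""
    if len(judge_a) != len(judge_b) or not judge_a:
        raise ValueError("verdict lists must be non-empty and of equal length")
    return sum(a == b for a, b in zip(judge_a, judge_b)) / len(judge_a)


def cohens_kappa(judge_a: list, judge_b: list) -> float:
    """Cohen's kappa for two raters giving binary verdicts."""
    observed = raw_agreement(judge_a, judge_b)
    rate_a = sum(judge_a) / len(judge_a)  # share of "correct" verdicts from judge A
    rate_b = sum(judge_b) / len(judge_b)  # share of "correct" verdicts from judge B
    expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if expected == 1.0:
        return 1.0  # both judges always give the same single verdict
    return (observed - expected) / (1 - expected)


# Hypothetical verdicts from two judges on the same eight responses.
judge_a = [True, True, False, True, False, True, True, False]
judge_b = [True, True, False, False, False, True, True, True]

agreement = raw_agreement(judge_a, judge_b)
kappa = cohens_kappa(judge_a, judge_b)
print(f"agreement={agreement:.2f} kappa={kappa:.2f}")
```

Kappa is lower than raw agreement whenever both judges lean toward the same verdict, so it is a more conservative check on whether two judges really tell the same story.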

Eight open-source models were tested. The authors don't name them in the abstract, but the results are clear: scale improves appropriateness judgment, but the core IDP reasoning — who else is affected, what attribute is at stake — remains weak regardless of size.

If LLMs are to serve as personal assistants without leaking others' secrets, the IDP-Bench results make clear that current alignment techniques aren't enough — the secondary subject remains invisible to most models.


Source: IDP-Bench: Benchmarking ability of LLMs to protect personal information in interdependent privacy contexts
Domain: arxiv.org
