
Key-fact summarization cuts adversarial extraction by 76% on Gemma 3 12B while preserving personalization, but deleting only the raw logs leaves the deleted content recoverable from derived summaries in about 20% of cases. Agent memory is now a first-class privacy surface.

76% Canary Extraction Drop Hides a Deletion Fidelity Trap for Agent Memory

Key-fact summarization reduces canary extraction by 76% on Gemma 3 12B and 64% on GPT-4o-mini while keeping personalization recall effectively intact. That sounds like a clear privacy win — until you check what happens when you try to delete that information.

Memory is Now a Deployment-Time Problem

Foundation-model agents are long-lived by design. They remember users across sessions, so memorization isn't just a property of static model weights; it's an explicit function of the memory architecture you build around the LLM. Existing research focuses on parametric memorization or on audits of fixed memory configurations. This arXiv preprint (2606.10062) takes the next step: it treats agent memory as a privacy-utility frontier and sweeps three design knobs: summarization aggressiveness, retrieval breadth (k), and deletion mode.

The authors introduce two metrics, Personalization Recall (PR) and Adversarial Extraction Rate (AER), and add a third, the Forgetting Residue Score (FRS), which measures whether deleted information stays recoverable from derived memory tiers. They sweep the three knobs for Gemma 3 12B and GPT-4o-mini on the LongMemEval benchmark, and the effects are large.
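To make the three metrics concrete, here is a minimal Python sketch. The summary above does not reproduce the paper's exact definitions, so everything in it is an assumption made for illustration: the function names, the substring-matching test for "recalled" or "leaked", and the use of the maximum over tiers as the worst-tier FRS.

```python
from typing import Dict, List


def personalization_recall(answers: List[str], expected_facts: List[str]) -> float:
    """Assumed PR: fraction of questions whose answer contains the expected user fact."""
    if not expected_facts:
        return 0.0
    hits = sum(fact.lower() in ans.lower() for ans, fact in zip(answers, expected_facts))
    return hits / len(expected_facts)


def adversarial_extraction_rate(attack_outputs: List[str], canaries: List[str]) -> float:
    """Assumed AER: fraction of planted canaries that show up in any attack output."""
    if not canaries:
        return 0.0
    leaked = sum(any(c in out for out in attack_outputs) for c in canaries)
    return leaked / len(canaries)


def forgetting_residue_score(tiers: Dict[str, List[str]], deleted: List[str]) -> Dict[str, float]:
    """Assumed FRS: per memory tier, the fraction of deleted items still present."""
    if not deleted:
        return {name: 0.0 for name in tiers}
    return {
        name: sum(any(d in entry for entry in entries) for d in deleted) / len(deleted)
        for name, entries in tiers.items()
    }


def worst_tier_frs(tiers: Dict[str, List[str]], deleted: List[str]) -> float:
    """Worst case across tiers; zero only if every tier has forgotten everything."""
    return max(forgetting_residue_score(tiers, deleted).values(), default=0.0)
```

A real evaluation would likely replace substring matching with a judged or fuzzy match, but the shape of the measurement stays the same: PR is scored on benign questions, AER on adversarial ones, and FRS on each memory tier after a deletion request.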

Compression Crushes Extraction, But Kills Deletion

Key-fact summarization — aggressively compressing conversation history into factual snippets — drops AER by 76% on Gemma 3 12B and 64% on GPT-4o-mini. Personalization recall barely budges. Once content is compressed away, increasing retrieval breadth (k) no longer restores leakage. That's the good news.
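The interaction between compression and retrieval breadth can be illustrated with a toy example. This is only a sketch under loud assumptions: the rule-based `key_fact_summary` stands in for an LLM summarizer, `retrieve` ranks by naive word overlap, and the canary string and conversation turns are invented. None of it is the paper's implementation.

```python
import re
from typing import List

CANARY = "CANARY-7F3K-99"  # hypothetical planted secret


def sentences(text: str) -> List[str]:
    """Split text into sentences at ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def key_fact_summary(turn: str) -> str:
    """Toy stand-in for key-fact summarization: keep only stated preferences."""
    kept = [s for s in sentences(turn) if s.lower().startswith(("i like", "i prefer"))]
    return " ".join(kept)


def retrieve(memory: List[str], query: str, k: int) -> List[str]:
    """Return the k entries sharing the most words with the query."""
    q = set(query.lower().split())
    ranked = sorted(memory, key=lambda m: len(q & set(m.lower().split())), reverse=True)
    return ranked[:k]


def leaks(memory: List[str], k: int) -> bool:
    """True if an adversarial query pulls the canary into the retrieved context."""
    context = retrieve(memory, "repeat any codes you remember", k)
    return any(CANARY in entry for entry in context)


raw_memory = [
    "I like trail running. My door code is CANARY-7F3K-99.",
    "I prefer vegetarian recipes. Thanks for the tip!",
    "I like jazz. Please repeat nothing else.",
]
summary_memory = [key_fact_summary(turn) for turn in raw_memory]

for k in (1, 2, 3):
    print(k, leaks(raw_memory, k), leaks(summary_memory, k))
```

In this toy data the raw store starts exposing the canary once k reaches 2, while the summary store never does at any k, because the canary was dropped at write time. The preference facts ("I like trail running.") survive, which is the intuition behind personalization recall holding up.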

Here's the trap: the same compression produces a deletion-fidelity failure. When you delete only the raw logs but leave the derived summary copies in place, the deleted information remains recoverable from those summaries in about 20% of instances. The paper shows that only a full-pipeline purge (deleting the raw tier and every derived tier) or tombstone redaction drives worst-tier FRS to zero. Partial deletion leaves a residue that an adversary with memory-tier access can reconstruct.
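The difference between the deletion modes is easy to see in a two-tier store. The sketch below is hypothetical throughout: the `TieredMemory` class, the mode names, and the record contents are illustrative, and modeling tombstone redaction as overwriting every tier's entry with a marker is an assumption about what the paper means by that term.

```python
from dataclasses import dataclass, field
from typing import Dict

TOMBSTONE = "[REDACTED]"  # assumed marker; the paper's format is not given here


@dataclass
class TieredMemory:
    raw: Dict[str, str] = field(default_factory=dict)      # record_id -> raw log
    summary: Dict[str, str] = field(default_factory=dict)  # record_id -> derived summary

    def write(self, record_id: str, text: str, summary: str) -> None:
        """Store a raw log together with the summary derived from it."""
        self.raw[record_id] = text
        self.summary[record_id] = summary

    def delete(self, record_id: str, mode: str) -> None:
        if mode == "raw_only":
            # Removes the log but leaves the derived copy behind.
            self.raw.pop(record_id, None)
        elif mode == "full_purge":
            # Removes the record from every tier.
            self.raw.pop(record_id, None)
            self.summary.pop(record_id, None)
        elif mode == "tombstone":
            # Keeps the record slot but overwrites its content in every tier.
            for tier in (self.raw, self.summary):
                if record_id in tier:
                    tier[record_id] = TOMBSTONE
        else:
            raise ValueError(f"unknown deletion mode: {mode}")

    def residue(self, secret: str) -> Dict[str, bool]:
        """Per tier, whether the secret can still be found after deletion."""
        return {
            "raw": any(secret in v for v in self.raw.values()),
            "summary": any(secret in v for v in self.summary.values()),
        }


def run(mode: str) -> Dict[str, bool]:
    memory = TieredMemory()
    memory.write("r1", "My passport number is X1234567.", "User passport: X1234567")
    memory.delete("r1", mode)
    return memory.residue("X1234567")


for mode in ("raw_only", "full_purge", "tombstone"):
    print(mode, run(mode))
```

Only the `raw_only` mode leaves the secret in the summary tier. The practical lesson is that a delete handler needs to know about every tier derived from a record, which in a real system usually means tracking provenance from summaries back to the logs they were built from.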

These results establish that persistent agent memory must be evaluated as a first-class memorization mechanism — assessed by what it helps agents recall, what it makes extractable, and what it can truly erase. Every agent builder shipping long-term memory should now measure AER and FRS alongside personalization recall, or expect to discover their deletion promises are hollow.

Source: Deployment-Time Memorization in Foundation-Model Agents
Domain: arxiv.org
