Memora Cuts Context Tokens by Up to 98% While Beating Every Agent Memory Baseline

86.3% LLM-judge accuracy on LoCoMo and 87.4% on LongMemEval — while using up to 98% fewer context tokens than feeding the full history to the model. That's the headline from Microsoft Research's Memora, published at ICML 2026.

Every long-horizon AI agent today hits the same wall: you either dump the entire conversation into context (expensive, doesn't scale) or compress it into summaries that lose the fine-grained details that matter, such as revised deadlines, stakeholder preferences, and numeric constraints. Memora cuts that knot by decoupling what gets stored from how it gets retrieved.

Two Components, One Harmonic System

Each memory in Memora consists of a primary abstraction (a 6-8 word phrase that captures the memory's essence) and a memory value holding the full content. Only the primary abstraction gets embedded for similarity search. The value stays rich and untouched. New information about an evolving topic merges into the same memory entry under its primary abstraction rather than fragmenting into dozens of partial duplicates.
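To make the split concrete, here is a minimal sketch of a store where only the short abstraction is exposed for indexing and new facts merge into the existing entry. The class and method names (MemoryEntry, MemoryStore, write, index_texts) are hypothetical, and matching on the exact abstraction string is a simplifying assumption; this is not Memora's actual code or API.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    # Short phrase that names the memory; the only text that would be embedded.
    abstraction: str
    # Full content, kept verbatim and never embedded.
    value: list[str] = field(default_factory=list)


class MemoryStore:
    """Toy store keyed by primary abstraction (illustrative names only)."""

    def __init__(self) -> None:
        self.entries: dict[str, MemoryEntry] = {}

    def write(self, abstraction: str, content: str) -> MemoryEntry:
        # Assumption: an exact string match decides whether two updates
        # belong to the same memory. A real system would need fuzzier matching.
        entry = self.entries.setdefault(abstraction, MemoryEntry(abstraction))
        entry.value.append(content)
        return entry

    def index_texts(self) -> list[str]:
        # Only the abstractions are handed to the similarity index.
        return list(self.entries)


store = MemoryStore()
key = "Updated Project Orion timeline agreed by Dave and Sarah"
store.write(key, "Prototype moved to April 1.")
store.write(key, "Pilot moved to May 2; MVP moved to May 30.")
```

After both writes the store still holds one entry whose value carries both updates, which is the merge-instead-of-fragment behavior described above.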

Cue anchors complement primary abstractions: short, context-aware tags extracted from each memory value. They act as organic metadata, providing alternative retrieval paths without forcing the system into rigid ontologies. When a user says "Dave and Sarah agreed to push the prototype to April 1, the pilot to May 2, the MVP to May 30," Memora stores the single primary abstraction "Updated Project Orion timeline agreed by Dave and Sarah" and generates cue anchors like "Dave Project Orion update" and "prototype schedule" — all routing to the same memory value.
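One way to picture cue anchors is as an inverted index from tag to memory. The sketch below uses the article's Project Orion example; the ids, dictionaries, and function names (add_anchors, lookup) are hypothetical, and exact-match lookup on the anchor text is an assumption made to keep the example small.

```python
from collections import defaultdict

# Hypothetical in-memory layout; ids and field names are illustrative only.
memory_values = {
    "m1": "Dave and Sarah agreed to push the prototype to April 1, "
          "the pilot to May 2, the MVP to May 30.",
}
abstractions = {"m1": "Updated Project Orion timeline agreed by Dave and Sarah"}

# Each cue anchor maps to the ids of the memories it was extracted from.
anchor_index = defaultdict(set)


def add_anchors(memory_id: str, anchors: list[str]) -> None:
    for anchor in anchors:
        anchor_index[anchor.lower()].add(memory_id)


def lookup(anchor: str) -> list[str]:
    # .get avoids creating empty entries for anchors that were never stored.
    ids = anchor_index.get(anchor.lower(), set())
    return [memory_values[i] for i in sorted(ids)]


add_anchors("m1", ["Dave Project Orion update", "prototype schedule"])
via_person = lookup("Dave Project Orion update")
via_topic = lookup("prototype schedule")
```

Both lookups return the same stored value, so a question phrased around the person and one phrased around the schedule reach the same memory.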

Policy-Guided Retrieval Beats Top-K Similarity

Memora's retriever treats memory access as an active reasoning process. Instead of returning the top-k semantically similar items, it iteratively refines the query, expands through cue anchors to surface related-but-not-similar memories, and decides when to stop. This catches multi-hop dependencies that pure semantic search misses — the kind of recall a human would use when connecting an offhand comment from three weeks ago to today's question.
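The loop below is a rough sketch of that retrieve, expand, and stop pattern. It swaps in deliberately simple stand-ins: word overlap instead of embedding similarity, and a fixed rule ("stop when expansion finds nothing new") instead of an LLM-driven policy. It also skips query refinement entirely. All data and names are hypothetical.

```python
MEMORIES = {
    "m1": {"abstraction": "Updated Project Orion timeline agreed by Dave and Sarah",
           "anchors": {"project orion", "prototype schedule"}},
    "m2": {"abstraction": "Hardware vendor delay reported in standup",
           "anchors": {"project orion", "vendor delay"}},
    "m3": {"abstraction": "Team lunch preferences for offsite",
           "anchors": {"offsite catering"}},
}


def tokens(text: str) -> set[str]:
    return set(text.lower().split())


def seed(query: str, k: int = 1) -> list[str]:
    # Stand-in for embedding search: rank by word overlap with the abstraction.
    def overlap(mid: str) -> int:
        return len(tokens(query) & tokens(MEMORIES[mid]["abstraction"]))

    ranked = sorted(MEMORIES, key=lambda mid: (-overlap(mid), mid))
    return [mid for mid in ranked[:k] if overlap(mid) > 0]


def retrieve(query: str, max_steps: int = 3) -> list[str]:
    found = seed(query)
    for _ in range(max_steps):
        # Collect every cue anchor attached to what we have so far.
        anchors = set().union(*(MEMORIES[mid]["anchors"] for mid in found))
        # Expand to memories that share an anchor, even if they are not
        # textually similar to the query.
        new = [mid for mid in sorted(MEMORIES)
               if mid not in found and MEMORIES[mid]["anchors"] & anchors]
        if not new:  # fixed stopping rule standing in for the learned policy
            break
        found.extend(new)
    return found


result = retrieve("When is the Orion prototype timeline")
```

Here the vendor-delay memory shares no words with the query, so a pure similarity search would skip it; it is reached only through the shared "project orion" anchor, while the unrelated offsite memory stays out.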

The retrieval policy can be hand-prompted with a strong LLM or distilled into a smaller model via reinforcement learning.

Benchmarks That Back the Claims

On LoCoMo (dialogues averaging 600 turns), Memora scores 86.3%. On LongMemEval (115,000-token contexts), it hits 87.4%. It outperforms RAG, Mem0, Nemori, Zep, LangMem, and even full-context inference. The advantage grows on multi-hop reasoning questions. Efficiency is equally stark: Memora stores 344 memory entries per conversation versus 651 for Mem0, and consumes up to 98% fewer tokens than full-context inference.
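For a sense of scale, the efficiency figures can be turned into relative reductions with a few lines of arithmetic. The entry counts come from the numbers above. The token budget line is purely illustrative: it assumes the 98% ceiling applies to a 115,000-token context, which the reported figures do not state.

```python
def reduction(baseline: float, ours: float) -> float:
    """Fraction saved relative to the baseline."""
    return 1 - ours / baseline


# Memory entries per conversation: 344 for Memora versus 651 for Mem0.
entry_reduction = reduction(651, 344)
print(f"Fewer entries than Mem0: {entry_reduction:.1%}")  # 47.2%

# Illustrative only: what a 98% cut would leave of a 115,000-token context.
hypothetical_tokens = round(115_000 * (1 - 0.98))
print(f"Hypothetical remaining tokens: {hypothetical_tokens}")  # 2300
```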

Microsoft is already pursuing extensions: MemLoop learns from retrieval and task failures, Deferred Memory delays storage until sufficient context accumulates, and Group Memory shares knowledge across agents while preserving provenance and access boundaries. Code is on GitHub under microsoft/Memora.

Long-horizon AI assistants that remember what happened last month, not just five minutes ago — that's finally within reach.

Source: Memora: A Harmonic Memory Representation Balancing Abstraction and Specificity
Domain: microsoft.com
