Source linked

LRU и LFU Beat by FIFO в LLM Caches - SOLAR Fixes It with Bayesian Regret

Классическая кэш-политика не справляется с семантическими буферами поиска: LRU и LFU последовательно уступают наивному FIFO. SOLAR, расширенная система обучения, обеспечивает улучшение на 5–75% при постоянном соотношении конкуренции ≤3.

arxivsolarmemorybench fullloco modialsimsemantic caching

On MemoryBench-Full, across LoCoMo and DialSim with 8 replacement policies, LRU and LFU both underperform FIFO on semantic workloads. That's not a typo — the heuristics that power every web server and database are worse than a naive queue when cache items are matched by embedding similarity and hit quality is continuous.

Jason Wu, David Yang, and their coauthors formalize this as an online semantic cache replacement problem with switching costs. The key insight: semantic retrieval buffers lack the temporal locality and frequency concentration that LRU and LFU exploit. Items are recalled by similarity, not by recency or count, so classic policies degrade.

SOLAR: Learning When to Evict and What to Keep

SOLAR decouples modification timing from content selection. Modification timing uses regret accumulation — the system only modifies the cache when the cumulative regret of holding a suboptimal set crosses a threshold, achieving roughly 17% modification rate. Content selection uses Bayesian online learning over implicit retrieval feedback: each eviction decision treats the cache as a multi-armed bandit where arms are candidate items.

Theoretical guarantees hold up under scrutiny. SOLAR achieves a constant competitive ratio ≤3, independent of cache size and horizon — compare that to FIFO's Omega(K) ratio. Eviction regret is O(sqrt(K T log T)), matching the Omega(sqrt(K T)) lower bound up to log factors.

5–75% Improvement and the Inverted-U Surprise

Experiments at tight cache sizes show SOLAR beating FIFO by 5–75% relative improvement. A clear phase transition emerges at the working set boundary: once the cache fits the active working set, gains shrink. More provocative is the synthetic experiment with 5000-item pools revealing an inverted-U relationship between pool size and retrieval quality. Cache capacity constraints aren't about storage — they're a noise filter. Too many candidates hurt retrieval quality.

The message for anyone building LLM agents: stop cargo-culting LRU into your retrieval buffer. SOLAR's hybrid of regret-gated updates and Bayesian selection points toward caches that treat eviction as an online learning problem, not a bookkeeping chore.


Source: When Classic Cache Policies Fail: Learning-Augmented Replacement for Semantic Retrieval Buffers
Domain: arxiv.org

Read original source ->

External source stays available while the OJO article and comment thread stay local.

Comments load interactively on the live page.