Persistent CXL Switch Speeds Up Memory Pooling by 33%

A new Distributed Persistence Domain (DPD) architecture integrates persistence directly into CXL switches, achieving a 33% speedup over volatile designs and up to 36% with read forwarding.

Tags: cxl, persistent memory, memory pooling, computer architecture, systems engineering, splash 4

33% average speedup — that’s what a new CXL switch architecture with built-in persistence support delivers over volatile switches, according to simulations using SPLASH-4 and YCSB workloads.

Every data-center engineer who’s looked at CXL memory pooling knows the core trade-off: you get flexible, disaggregated memory, but persist operations incur a brutal latency tax. Writes have to traverse the entire CXL fabric (switches, links, protocol layers) before touching persistent memory. Every persist pays that full round trip, and the cost grows as the fabric gets deeper, which limits how far a pool can scale.

The paper proposes Distributed Persistence Domain (DPD), a framework that pushes persistence semantics into CXL switches themselves. Instead of treating the switch as a dumb relay, the Persistent CXL Switch becomes an active participant: it can acknowledge persists early, forward reads directly, and coalesce writes before they hit remote persistent memory.
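To make those three behaviors concrete, here is a toy Python model. It is only a sketch and not the paper's design: the class `ToyPersistentSwitch`, its method names, and the assumption that the switch's buffer is itself power-safe are all invented for illustration.

```python
from collections import OrderedDict


class ToyPersistentSwitch:
    """Toy switch with a persist buffer that is ASSUMED to be power-safe."""

    def __init__(self):
        self.buffer = OrderedDict()  # addr -> newest buffered value
        self.remote = {}             # stands in for remote persistent memory
        self.remote_writes = 0       # writes that actually crossed the fabric

    def persist(self, addr, value):
        # Write coalescing: a newer write to the same address replaces the old one.
        self.buffer[addr] = value
        self.buffer.move_to_end(addr)
        # Early acknowledgement: reply before the data reaches remote memory.
        return "ack"

    def read(self, addr):
        # Read forwarding: serve the newest buffered value if there is one.
        if addr in self.buffer:
            return self.buffer[addr]
        return self.remote.get(addr)

    def drain(self):
        # Write buffered values back to remote memory, one write per address.
        for addr, value in self.buffer.items():
            self.remote[addr] = value
            self.remote_writes += 1
        self.buffer.clear()


switch = ToyPersistentSwitch()
switch.persist(0x10, "a")
switch.persist(0x10, "b")   # coalesced with the previous write
switch.persist(0x20, "c")
print(switch.read(0x10))    # b (forwarded from the buffer)
switch.drain()
print(switch.remote_writes) # 2 (three persists, two fabric writes)
```

In this toy, the latency win comes from `persist` returning before any fabric traversal, and the bandwidth win comes from draining two writes instead of three.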

The Correctness Problem Nobody Talks About

Moving persistence into the network breaks the centralized persistence domain model. Without careful coordination, stale reads and inconsistent writes appear. DPD formalizes the distributed persistence domain and identifies exactly which hazards emerge when switch-level caches and persist buffers interact with memory nodes.
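One way such a stale read could arise is easy to show in a few lines of Python. This is an illustrative scenario, not the paper's hazard taxonomy: the names `memory_node` and `switch_a_buffer` and the two read paths are assumptions made up for the example.

```python
# Hypothetical state: a persist was acknowledged by switch A
# but has not yet been written back to the shared memory node.
memory_node = {"x": "old"}
switch_a_buffer = {"x": "new"}


def read_bypassing_switch(addr):
    """A read that reaches the memory node without consulting switch A."""
    return memory_node.get(addr)


def read_via_switch_a(addr):
    """A read that checks switch A's buffer first, then the memory node."""
    if addr in switch_a_buffer:
        return switch_a_buffer[addr]
    return memory_node.get(addr)


print(read_bypassing_switch("x"))  # old  (stale: the persist was already acked)
print(read_via_switch_a("x"))      # new
```

The writer was told its data is durable, yet a reader on another path still sees the previous value. With a single, centralized persistence domain there is only one place to look, so this gap cannot open.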

From that analysis come the design requirements: read forwarding must be order-preserving, write coalescing must respect crash boundaries, and switch metadata must survive power loss. The Persistent CXL Switch implements all three.
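The second requirement can be sketched as coalescing that never merges writes across an ordering point. The Python below is a hedged illustration under assumed semantics: `EpochCoalescingBuffer`, the `fence` call that closes an epoch, and the `max_writes` cutoff that mimics a crash mid-drain are all hypothetical, and the sketch does not model how the switch's own metadata survives power loss.

```python
class EpochCoalescingBuffer:
    """Coalesce writes only within an epoch; drain epochs oldest first."""

    def __init__(self):
        self.epochs = [{}]  # oldest first; each dict maps addr -> value

    def write(self, addr, value):
        # Coalescing is limited to the currently open (newest) epoch.
        self.epochs[-1][addr] = value

    def fence(self):
        # Close the current epoch so later writes cannot merge into it.
        if self.epochs[-1]:
            self.epochs.append({})

    def drain(self, remote, max_writes=None):
        """Write back oldest epochs first; max_writes mimics a crash mid-drain."""
        sent = 0
        while self.epochs:
            epoch = self.epochs[0]
            for addr in list(epoch):
                if max_writes is not None and sent >= max_writes:
                    return sent
                remote[addr] = epoch.pop(addr)
                sent += 1
            if len(self.epochs) == 1:
                break  # keep one open epoch for future writes
            self.epochs.pop(0)
        return sent


buf = EpochCoalescingBuffer()
buf.write("a", 1)
buf.write("b", 1)
buf.write("a", 2)   # coalesced: same epoch as the first write to "a"
buf.fence()
buf.write("a", 3)   # new epoch: must not overtake the pending write to "b"

remote = {}
buf.drain(remote, max_writes=1)  # interrupted after one write
print(remote)                    # {'a': 2}
buf.drain(remote)                # finish the drain
print(remote)                    # {'a': 3, 'b': 1}
```

Had the write of 3 been merged into the first epoch, the interrupted drain could have exposed a post-fence value of `a` while the pre-fence write to `b` was still missing.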

Benchmarks Don't Lie

Simulations across SPLASH-4 and YCSB show two clear results. First, the baseline Persistent CXL Switch beats volatile switches by 33% on average. Second, enabling read forwarding pushes that to 36% — a material gain for read-heavy workloads like YCSB.
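It helps to translate those speedups into execution time, since a 33% speedup is not the same thing as 33% less time. The short Python calculation below assumes the usual definition, speedup = baseline time / new time; the article's source does not spell out which definition it uses, and `time_reduction` is just a helper name chosen here.

```python
def time_reduction(speedup_pct):
    """Fraction of execution time saved, assuming speedup = old_time / new_time."""
    ratio = 1 + speedup_pct / 100
    return 1 - 1 / ratio


for pct in (33, 36):
    print(f"{pct}% speedup -> {time_reduction(pct):.1%} less execution time")
# 33% speedup -> 24.8% less execution time
# 36% speedup -> 26.5% less execution time
```

Under that assumption, the reported gains correspond to roughly a quarter less run time on the simulated workloads.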

These numbers come from cycle-accurate simulation, not hand-waving. The paper doesn’t detail the simulator or memory hierarchy parameters, but the methodology is standard for architecture papers at this level.

If this architecture sees silicon, the implication is simple: disaggregated persistent memory can finally scale without forcing developers to choose between crash consistency and performance.


Source: Distributed Persistence Domain for Persistent Memory Pooling
Domain: arxiv.org
