Hash-Based Speculative Address Translation Delivers 15% Speedup Over Prior Art

Address translation is the silent killer of memory-intensive workloads, and Revelator from CMU-SAFARI just handed it a hash-based knockout. Across 11 data-intensive benchmarks, Revelator delivers a 15.3% average speedup over the previous best speculative translation technique when memory is heavily fragmented. In virtualized environments, it gains 13.6% on average over classic Nested Paging. On 16-core server mixes from Google, it hits a 1.40x speedup over Transparent Huge Pages under medium fragmentation and 1.50x under high fragmentation. The hardware cost? 0.02% area and 0.03% power on a server-grade CPU. Revelator is open source at github.com/CMU-SAFARI/Virtuoso.

Why Address Translation Predictability Matters More Than Page Size

Conventional OS memory allocation scatters physical pages unpredictably, making speculative address translation a guessing game. Prior work forced contiguity or huge pages to create predictable virtual-to-physical (VA-to-PA) mappings, but those approaches break down under fragmentation or incur high hardware overhead. Revelator sidesteps both problems by introducing a tiered hash-based memory allocation policy for both program data and last-level page table entries. After an L2 TLB miss, a lightweight hardware engine runs the same hash functions the OS used at allocation time to predict the physical address and the page table entry location, then prefetches the corresponding cache blocks before translation completes. No large pages required. No VA-to-PA contiguity required. Just a small change to the OS memory allocator and a minimal hash hardware block.
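To make the idea concrete, here is a minimal Python sketch of a tiered hash-based allocator paired with a predictor that reuses the same hashes. It is an illustration under stated assumptions, not Revelator's actual design: the hash function, the tier count, the frame count, and the names `tier_hash`, `ToyAllocator`, and `predict_frames` are all hypothetical, and the sketch covers only data pages, not page table entries.

```python
NUM_FRAMES = 64   # assumed size of the toy physical memory, in frames
NUM_TIERS = 3     # assumed number of hash tiers


def tier_hash(vpn: int, tier: int, num_frames: int = NUM_FRAMES) -> int:
    """Hypothetical per-tier hash of a virtual page number (not the paper's)."""
    mixed = (vpn * 0x9E3779B1 + tier * 0x85EBCA6B) & 0xFFFFFFFF
    mixed ^= mixed >> 15
    return mixed % num_frames


class ToyAllocator:
    """OS side: place each page at a hash-derived frame whenever possible."""

    def __init__(self, num_frames: int = NUM_FRAMES, num_tiers: int = NUM_TIERS):
        self.num_frames = num_frames
        self.num_tiers = num_tiers
        self.free = set(range(num_frames))
        self.page_table: dict[int, int] = {}

    def allocate(self, vpn: int) -> int:
        if vpn in self.page_table:
            return self.page_table[vpn]
        # Try each tier's candidate frame in order.
        for tier in range(self.num_tiers):
            frame = tier_hash(vpn, tier, self.num_frames)
            if frame in self.free:
                break
        else:
            # Every candidate is taken: fall back to any free frame.
            # Such a mapping is no longer predictable from the hashes.
            if not self.free:
                raise MemoryError("no free frames")
            frame = min(self.free)
        self.free.remove(frame)
        self.page_table[vpn] = frame
        return frame


def predict_frames(vpn: int, num_frames: int = NUM_FRAMES,
                   num_tiers: int = NUM_TIERS) -> list[int]:
    """Hardware side: recompute the same hashes to get candidate frames."""
    return [tier_hash(vpn, t, num_frames) for t in range(num_tiers)]


if __name__ == "__main__":
    alloc = ToyAllocator()
    for vpn in range(40):
        alloc.allocate(vpn)
    hits = sum(alloc.page_table[v] in predict_frames(v) for v in alloc.page_table)
    print(f"predictable mappings: {hits}/{len(alloc.page_table)}")
```

The point of the sketch is that the predictor never reads the page table: it needs only the virtual page number and the shared hash functions, so candidate addresses are available before the page walk finishes. How the real hardware chooses among candidates and verifies a prediction against the completed walk is not modeled here.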

Measured Speedups That Make Server Operators Take Notice

The real test is performance under realistic memory fragmentation, on Google's data center workloads. Across 11 data-intensive applications (including SPEC CPU 2017 and graph processing), Revelator beats VILO, the prior state-of-the-art speculative translation technique, by 15.3% on average. In virtualized environments with Nested Paging, Revelator predicts both guest and host physical addresses, yielding a 13.6% average speedup. The 1.50x over Transparent Huge Pages under high fragmentation is the headline figure: huge pages need large contiguous physical regions and can degrade under fragmentation, but Revelator's hash mappings stay predictable. RTL synthesis on a high-end server CPU puts the hardware additions at 0.02% area and 0.03% power. Revelator makes the case that the path to faster memory access is not bigger pages but smarter cooperation between the OS and the hardware speculation engine.

Source: Revelator: Rapid Data Fetching via System-Software-Guided Hash-based Speculative Address Translation
Domain: arxiv.org
