Calibrating Query Costs for Confidential VMs Recovers Up to 48% Performance

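Analytical queries in AMD SEV-SNP confidential VMs run slower than they need to, and a large part of the culprit is a query optimizer cost model that was never calibrated for CVM overheads. Recalibrating it recovers up to 48% performance relative to the uncalibrated CVM.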

The Mismatch Between Optimizer Assumptions and CVM Reality

Confidential VMs (CVMs) encrypt memory and CPU state at runtime, but the query optimizer inside the DBMS has no idea. It still assumes the data movement and page translation costs of a regular KVM guest. That assumption is wrong inside a CVM, and a wrong cost model steers the optimizer toward plans that run slower than the alternatives it rejected.

The authors of this arXiv preprint (2606.26385) show that the dominant overheads come from two sources: higher data movement latency due to memory encryption, and extra Reverse Map Table (RMP) checks during virtual-to-physical address translation. These aren't second-order effects; they shift the relative cost of operators like hash joins versus nested loops.
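To make the "relative cost shift" concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the PlanCost structure, the split of cost into CPU, data movement, and translation components, and all the numbers are made up and do not come from the preprint. The direction of the flip (hash join losing to nested loop) is arbitrary too. The only point is that inflating two cost components unevenly can change which plan is cheapest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanCost:
    """Hypothetical per-plan cost breakdown, in abstract optimizer cost units."""
    name: str
    cpu: float        # compute-bound work, assumed unaffected by the CVM
    data_move: float  # work attributed to moving data through memory
    translate: float  # work attributed to address translation

    def total(self, move_factor: float = 1.0, translate_factor: float = 1.0) -> float:
        # Scale only the two components that the CVM makes more expensive.
        return (self.cpu
                + self.data_move * move_factor
                + self.translate * translate_factor)


def cheapest(plans, move_factor: float = 1.0, translate_factor: float = 1.0) -> PlanCost:
    """Return the plan with the lowest total cost under the given factors."""
    return min(plans, key=lambda p: p.total(move_factor, translate_factor))


# Made-up numbers: the hash join is memory-heavy, the nested loop is CPU-heavy.
PLANS = [
    PlanCost("hash_join", cpu=40.0, data_move=50.0, translate=10.0),
    PlanCost("nested_loop", cpu=90.0, data_move=15.0, translate=5.0),
]

if __name__ == "__main__":
    # With factors of 1.0 (the non-confidential assumption) the hash join wins.
    print(cheapest(PLANS).name)
    # With illustrative CVM factors the ranking flips.
    print(cheapest(PLANS, move_factor=1.5, translate_factor=2.0).name)
```

An optimizer that keeps using factors of 1.0 inside a CVM would keep picking the first plan even though, under these invented numbers, the second one is cheaper there.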

Lightweight Calibration Using Physical Proxies

The fix doesn't require intrusive DBMS changes. The team proposes a CVM-aware cost calibration that models the two overhead sources using simple physical proxies already visible to the optimizer: cache miss rates for data movement and TLB miss rates for RMP-related translation.

Because these proxies are measured at runtime, the calibration adapts to varying hardware and workload characteristics. No recompilation, no new kernel modules. It's a small set of scaling factors applied to the existing cost model.
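A minimal sketch of what "a small set of scaling factors" could look like, in Python. This is not the authors' formula: the linear form, the names ProxySample, Sensitivity, scaling_factors, and calibrated_cost, and the alpha and beta coefficients are all hypothetical. The sketch assumes the miss rates have already been measured elsewhere (for example from hardware counters) and only shows how they might be folded into an existing cost estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProxySample:
    """Runtime proxy measurements, each a ratio in [0, 1]."""
    cache_miss_rate: float  # proxy for encrypted data movement overhead
    tlb_miss_rate: float    # proxy for RMP-related translation overhead


@dataclass(frozen=True)
class Sensitivity:
    """Hypothetical coefficients that would be fitted per platform."""
    alpha: float  # extra data movement cost per unit of cache miss rate
    beta: float   # extra translation cost per unit of TLB miss rate


def scaling_factors(sample: ProxySample, sens: Sensitivity) -> tuple[float, float]:
    """Map proxy measurements to (data movement factor, translation factor).

    Assumed linear model: a miss rate of zero leaves the original cost as is.
    """
    for rate in (sample.cache_miss_rate, sample.tlb_miss_rate):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("miss rates must be ratios in [0, 1]")
    move_factor = 1.0 + sens.alpha * sample.cache_miss_rate
    translate_factor = 1.0 + sens.beta * sample.tlb_miss_rate
    return move_factor, translate_factor


def calibrated_cost(cpu: float, data_move: float, translate: float,
                    factors: tuple[float, float]) -> float:
    """Apply the factors to the two CVM-sensitive parts of an existing estimate."""
    move_factor, translate_factor = factors
    return cpu + data_move * move_factor + translate * translate_factor


if __name__ == "__main__":
    sample = ProxySample(cache_miss_rate=0.2, tlb_miss_rate=0.05)  # made-up values
    factors = scaling_factors(sample, Sensitivity(alpha=2.0, beta=10.0))
    print(factors)
    print(calibrated_cost(cpu=40.0, data_move=50.0, translate=10.0, factors=factors))
```

The design choice worth noting is that the existing cost model is left untouched: the factors multiply terms it already computes, which matches the article's description of a calibration layered on top, with no recompilation.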

Recovering Up to 48% Performance, Sometimes Beating KVM

Experiments on real AMD SEV-SNP hardware show the calibration narrows the KVM/CVM performance gap significantly. Across TPC-H query workloads, the worst-case slowdown drops from over 60% to under 15%, and some queries actually run faster than the unencrypted KVM baseline.

That last result needs repeating: a confidential VM, properly cost-calibrated, can outperform a non-encrypted VM on the same hardware. The calibration recovers up to 48% performance relative to the uncalibrated CVM, and for some queries the "overhead" becomes a speedup.

What This Enables

This work turns an annoying benchmarking result ("CVMs are slow") into an actionable engineering knob. Any DBMS running in AMD SEV-SNP can adopt this calibration with minimal code changes, and the same proxy-driven approach should generalize to Intel TDX and other CVM architectures.

The era of treating confidential VMs as a performance tax is over. The cost model just needs to match the hardware it's running on.

Source: Query Cost Model Calibration in Confidential Virtual Machines
Domain: arxiv.org