Source linked

Compaction-Only Reallocation Captures 96-99% of Adaptive LSM Tuning

A new analysis of LSM Bloom-filter tuning reveals a log-law and a robustness law showing continuous adaptivity offers marginal gains over simple compaction-triggered reallocation, validated on Twitter traces and RocksDB.

lsm treesbloom filtersrocksdbadaptive tuningstorage systemstwitter production traces

96-99% of the benefit from adaptive Bloom-filter tuning in LSM trees comes from a simple strategy: reallocate the memory budget only when compaction rebuilds the filters. Continuous tracking? Not worth the machinery in most workloads.

The Log-Law That Makes Misestimation Cheap

Log-structured merge trees attach an approximate-membership filter (Bloom filter) to each run, splitting a fixed memory budget across them. The static optimum was known from Monkey. A large systems literature then made allocation adaptive, tracking shifting hotness online. This paper asks the prior question: when is adaptivity worth its complexity?

Three analytical answers come out. First, a log-law: optimal bits-per-key is affine in the logarithm of access frequency, at a fixed slope. Because the workload enters only logarithmically, the excess read cost from a hotness misestimate is half the size-weighted variance of the log error. A common-factor misestimate is absorbed by the budget multiplier. Coarse estimates lose little.

Compaction Is a Free Clock for Reallocation

Second, a robustness law: the cost of misestimation grows only with the log error variance, not the raw error. Third, an adaptivity-value frontier: compaction rebuilds filters for free on its own clock. The value of continuous tracking over an allocation recomputed only at compaction grows quadratically in the within-epoch drift, with a closed-form scale. Translation: unless workloads shift wildly between compactions, you get almost all the benefit from a simpler policy.

Three Regimes: When to Track, When to Ignore

The authors propose a three-regime policy: coarse-at-compaction suffices for mild drift, continuous tracking for moderate drift, and at extreme drift fall back to uniform. More skew makes fine tracking matter less, counter to intuition. On a real cluster, reallocating only at compaction captures 96-99% of tracking's benefit. On RocksDB the false-positive primitive holds within four percent to eight bits per key.

No new filter, no engine fork. The contribution is a characterization of when adaptive tuning pays, validated on synthetic sweeps, real Twitter production cache traces, and a live RocksDB engine. Code and pre-registration are public. This paper gives storage engineers a clear rule of thumb: invest in adaptive tracking only when your workload drift exceeds the compaction window; otherwise, compaction-triggered reallocation is enough.


Source: The Value of Adaptivity in LSM Bloom-Filter Tuning: A Log-Law and a Two-Clock Frontier
Domain: arxiv.org

Read original source ->

External source stays available while the OJO article and comment thread stay local.

Comments load interactively on the live page.