96-99% of the benefit from adaptive Bloom-filter tuning in LSM trees comes from a simple strategy: reallocate the memory budget only when compaction rebuilds the filters. Continuous tracking? Not worth the machinery in most workloads.
The Log-Law That Makes Misestimation Cheap
Log-structured merge trees attach an approximate-membership filter (Bloom filter) to each run, splitting a fixed memory budget across them. The static optimum was known from Monkey. A large systems literature then made allocation adaptive, tracking shifting hotness online. This paper asks the prior question: when is adaptivity worth its complexity?
Three analytical answers come out. First, a log-law: optimal bits-per-key is affine in the logarithm of access frequency, at a fixed slope. Because the workload enters only logarithmically, the excess read cost from a hotness misestimate is half the size-weighted variance of the log error. A common-factor misestimate is absorbed by the budget multiplier. Coarse estimates lose little.
Compaction Is a Free Clock for Reallocation
Second, a robustness law: the cost of misestimation grows only with the log error variance, not the raw error. Third, an adaptivity-value frontier: compaction rebuilds filters for free on its own clock. The value of continuous tracking over an allocation recomputed only at compaction grows quadratically in the within-epoch drift, with a closed-form scale. Translation: unless workloads shift wildly between compactions, you get almost all the benefit from a simpler policy.
Three Regimes: When to Track, When to Ignore
The authors propose a three-regime policy: coarse-at-compaction suffices for mild drift, continuous tracking for moderate drift, and at extreme drift fall back to uniform. More skew makes fine tracking matter less, counter to intuition. On a real cluster, reallocating only at compaction captures 96-99% of tracking's benefit. On RocksDB the false-positive primitive holds within four percent to eight bits per key.
No new filter, no engine fork. The contribution is a characterization of when adaptive tuning pays, validated on synthetic sweeps, real Twitter production cache traces, and a live RocksDB engine. Code and pre-registration are public. This paper gives storage engineers a clear rule of thumb: invest in adaptive tracking only when your workload drift exceeds the compaction window; otherwise, compaction-triggered reallocation is enough.
Source: The Value of Adaptivity in LSM Bloom-Filter Tuning: A Log-Law and a Two-Clock Frontier
Domain: arxiv.org
Comments load interactively on the live page.