Source linked

Compaction-Only Reallocation capture 96-99% du réglage LSM adaptif

Une nouvelle analyse de l'ajustement du filtre Bloom de LSM révèle une loi du journal et une loi de la robustesse montrant que l'adaptation continue offre des gains marginaux par rapport à une simple redistribution déclenchée par la compression, validée sur Twitter traces et RocksDB.

lsm treesbloom filtersrocksdbadaptive tuningstorage systemstwitter production traces

96-99% of the benefit from adaptive Bloom-filter tuning in LSM trees comes from a simple strategy: reallocate the memory budget only when compaction rebuilds the filters. Continuous tracking? Not worth the machinery in most workloads.

The Log-Law That Makes Misestimation Cheap

Log-structured merge trees attach an approximate-membership filter (Bloom filter) to each run, splitting a fixed memory budget across them. The static optimum was known from Monkey. A large systems literature then made allocation adaptive, tracking shifting hotness online. This paper asks the prior question: when is adaptivity worth its complexity?

Three analytical answers come out. First, a log-law: optimal bits-per-key is affine in the logarithm of access frequency, at a fixed slope. Because the workload enters only logarithmically, the excess read cost from a hotness misestimate is half the size-weighted variance of the log error. A common-factor misestimate is absorbed by the budget multiplier. Coarse estimates lose little.

Compaction Is a Free Clock for Reallocation

Second, a robustness law: the cost of misestimation grows only with the log error variance, not the raw error. Third, an adaptivity-value frontier: compaction rebuilds filters for free on its own clock. The value of continuous tracking over an allocation recomputed only at compaction grows quadratically in the within-epoch drift, with a closed-form scale. Translation: unless workloads shift wildly between compactions, you get almost all the benefit from a simpler policy.

Three Regimes: When to Track, When to Ignore

The authors propose a three-regime policy: coarse-at-compaction suffices for mild drift, continuous tracking for moderate drift, and at extreme drift fall back to uniform. More skew makes fine tracking matter less, counter to intuition. On a real cluster, reallocating only at compaction captures 96-99% of tracking's benefit. On RocksDB the false-positive primitive holds within four percent to eight bits per key.

No new filter, no engine fork. The contribution is a characterization of when adaptive tuning pays, validated on synthetic sweeps, real Twitter production cache traces, and a live RocksDB engine. Code and pre-registration are public. This paper gives storage engineers a clear rule of thumb: invest in adaptive tracking only when your workload drift exceeds the compaction window; otherwise, compaction-triggered reallocation is enough.


Source: The Value of Adaptivity in LSM Bloom-Filter Tuning: A Log-Law and a Two-Clock Frontier
Domain: arxiv.org

Read original source ->

External source stays available while the OJO article and comment thread stay local.

Comments load interactively on the live page.