RB-PEM Beats Noisy Evolution Strategies by Betting on Depth

A Rao-Blackwellized trick called probabilistic elite membership cuts through ranking noise, delivering consistent gains on COCO bbob-noisy and RL tasks under tight evaluation budgets.

probabilistic elite membership, rb pem, coco bbob noisy, evolution strategies, rao blackwellization, optimization

When you have a fixed budget of noisy function evaluations, every wasted sample hurts. The standard play in evolution strategies is to spend evaluations denoising the ranking within each generation — but that steals budget from the optimizer's next distribution update. A new paper argues the opposite: depth over fidelity.
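The trade-off is easy to see with a little arithmetic. The sketch below uses made-up numbers (a budget of 10,000 evaluations and a population of 20, neither taken from the paper) to show how re-evaluating each candidate shrinks the number of generations you can afford.

```python
def generations_affordable(budget, population, reevals_per_candidate):
    """Whole generations that fit in a fixed evaluation budget."""
    return budget // (population * reevals_per_candidate)

# Hypothetical numbers, chosen only for illustration.
depth_first = generations_affordable(10_000, 20, 1)     # one evaluation per candidate
fidelity_first = generations_affordable(10_000, 20, 5)  # five evaluations per candidate

print(depth_first)     # 500
print(fidelity_first)  # 100
```

Averaging five evaluations per candidate gives a cleaner ranking in each generation, but in this example it leaves the optimizer one fifth as many distribution updates.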

Probabilistic elite membership (PEM), proposed in an arXiv paper (2606.06555), replaces hard rank-based weights with conditional expected rank weights that integrate over ranking uncertainty. This is a Rao-Blackwellization of the noisy rank-based step: it preserves the conditional mean of the update while reducing its conditional dispersion. In plain terms, you get a less jittery parameter update for the same evaluation budget.
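To make "conditional expected rank weights" concrete, here is a minimal sketch of one way to read the idea, not the paper's estimator. It assumes minimization, a top-`mu` elite with equal weights, and independent Gaussian noise of known standard deviation `noise_sd`. All of these, along with the function names, are our assumptions for illustration. The expected weight is approximated by averaging hard weights over many plausible re-rankings.

```python
import random

def hard_rank_weights(values, mu):
    """Weight 1/mu for the mu lowest observed values (minimization), else 0."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    weights = [0.0] * len(values)
    for i in order[:mu]:
        weights[i] = 1.0 / mu
    return weights

def expected_rank_weights(values, noise_sd, mu, draws=2000, seed=0):
    """Monte Carlo average of hard rank weights over plausible re-rankings.

    Assumption (ours, not the paper's): each observed value is perturbed
    with independent Gaussian noise of known standard deviation noise_sd.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(values)
    for _ in range(draws):
        resampled = [v + rng.gauss(0.0, noise_sd) for v in values]
        for i, w in enumerate(hard_rank_weights(resampled, mu)):
            totals[i] += w
    return [t / draws for t in totals]

observed = [0.10, 0.12, 0.50, 0.90]  # noisy fitness values, lower is better
hard = hard_rank_weights(observed, mu=2)
soft = expected_rank_weights(observed, noise_sd=0.2, mu=2)

print(hard)                         # all weight on the two apparent best
print([round(w, 3) for w in soft])  # some weight leaks to plausible elites
```

The hard weights commit fully to the observed ranking, so a single unlucky evaluation can flip who gets weight. The averaged weights change smoothly as observed values move, which is the intuition behind the less jittery update.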

Residual Bootstrapping Makes It Practical

The PEM idea is instantiated via residual bootstrapping (RB-PEM) with capped per-generation overhead. An adaptive probe-and-switch mechanism kicks in for low-noise regimes where the bootstrapping overhead isn't justified. No free lunch, but the switch avoids wasting compute when the noise is small.
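The article does not spell out the algorithm, so the following is only a rough sketch of how a residual bootstrap with a capped draw count and a probe-and-switch step might fit together. The pooled `residuals` are assumed to be available already (how the paper obtains them is not described here), and `max_draws`, `probe_draws`, and `switch_below` are hypothetical parameters introduced for illustration.

```python
import random

def elite_set(values, mu):
    """Indices of the mu lowest values (minimization)."""
    return frozenset(sorted(range(len(values)), key=lambda i: values[i])[:mu])

def bootstrap_membership(values, residuals, mu, max_draws=200, seed=0):
    """Estimate each candidate's probability of belonging to the elite.

    Residual bootstrap: resample pooled residuals with replacement, add them
    to the observed values, and record who lands in the top mu.
    max_draws caps the per-generation overhead (hypothetical parameter).
    Returns (membership probabilities, fraction of draws whose elite set
    differs from the observed one).
    """
    rng = random.Random(seed)
    observed_elite = elite_set(values, mu)
    counts = [0] * len(values)
    flips = 0
    for _ in range(max_draws):
        perturbed = [v + rng.choice(residuals) for v in values]
        elite = elite_set(perturbed, mu)
        if elite != observed_elite:
            flips += 1
        for i in elite:
            counts[i] += 1
    return [c / max_draws for c in counts], flips / max_draws

def generation_weights(values, residuals, mu, probe_draws=20, switch_below=0.05):
    """Probe cheaply; fall back to hard weights when misranking looks rare."""
    _, flip_rate = bootstrap_membership(values, residuals, mu, max_draws=probe_draws)
    if flip_rate < switch_below:
        elite = elite_set(values, mu)
        return [1.0 / mu if i in elite else 0.0 for i in range(len(values))]
    probs, _ = bootstrap_membership(values, residuals, mu)
    return [p / mu for p in probs]  # membership probabilities sum to mu

values = [0.10, 0.12, 0.50, 0.90]
low_noise = generation_weights(values, residuals=[-0.001, 0.0, 0.001], mu=2)
high_noise = generation_weights(values, residuals=[-0.4, -0.1, 0.1, 0.4], mu=2)
print(low_noise)
print([round(w, 3) for w in high_noise])
```

With tiny residuals the probe never sees the elite set change, so the sketch skips the full bootstrap and returns ordinary hard weights. With large residuals it pays for the capped bootstrap and spreads weight according to estimated membership probability.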

Results across the COCO bbob-noisy suite, plus external tasks like RL policy search and hyperparameter optimization, show consistent gains specifically in high-misranking, budget-constrained settings. That's exactly where conventional approaches spend samples just to figure out which candidate is actually better.

What This Means for Practitioners

If you're tuning hyperparameters or running policy search with a tight evaluation budget, the reported results suggest RB-PEM gives you more effective update steps per unit cost. The depth-over-fidelity principle plausibly extends beyond evolution strategies: in principle, any optimizer that ranks noisy candidates could borrow this conditional expectation trick.

Next time someone says you need more samples to get a clean ranking, ask whether a Rao-Blackwellized estimate would let you spend that budget on an extra generation instead.


Source: Depth over Fidelity in Fixed-Budget Noisy Evolution Strategies
Domain: arxiv.org
