What is the significance of: Hybrid Centaur Beats Both LLMs and Classical HPO in Direct Comparison?

Ferreira et al. show that CMA-ES and TPE consistently beat LLM-based hyperparameter optimizers, but a new hybrid named Centaur, which shares CMA-ES's internal state with an LLM, achieves the best results using just a 0.8B model.

Hybrid Centaur Beats Both LLMs and Classical HPO in Direct Comparison

Classical hyperparameter optimization (HPO) algorithms (CMA-ES and TPE) handily beat LLM-based agents when tuning a small language model under a fixed compute budget. That's the headline from Ferreira, Wobbe, Krishnakumar, Hutter, and Zela in their preprint "Can LLMs Beat Classical Hyperparameter Optimization Algorithms?" The gap narrows when the LLM can edit training code directly, but even Claude Opus 4.6 and Gemini 3.1 Pro Preview can't surpass the classical methods.

The testbed: autoresearch lets LLMs edit training code

The authors use the autoresearch repository, which gives an LLM agent full access to edit the training script. This isn't a static search space over a fixed grid: the LLM can rewrite any line. Classical methods like CMA-ES and TPE, by contrast, operate on a defined search space, and they win because they avoid out-of-memory failures more reliably than the LLMs do. The LLMs also struggle to track optimization state across trials, a known weakness of LLM agents.
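To make the "defined search space" idea concrete, here is a minimal sketch of a fixed search space with a trial runner that counts an out-of-memory crash as a failed trial. Everything in it is an assumption for illustration: the hyperparameter names and bounds, the `sample_config` and `run_trial` helpers, and the toy training function are hypothetical and not taken from the paper or from autoresearch. Random sampling stands in for a real optimizer.

```python
import math
import random

# Hypothetical fixed search space; names and bounds are illustrative only.
SEARCH_SPACE = {
    "learning_rate": (1e-5, 1e-2, "log"),
    "batch_size": (8, 128, "int"),
    "weight_decay": (0.0, 0.3, "linear"),
}


def sample_config(rng):
    """Draw one configuration that respects the declared bounds."""
    config = {}
    for name, (low, high, kind) in SEARCH_SPACE.items():
        if kind == "log":
            # Sample uniformly in log space, then map back.
            config[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
        elif kind == "int":
            config[name] = rng.randint(low, high)
        else:
            config[name] = rng.uniform(low, high)
    return config


def run_trial(train_fn, config, failure_loss=float("inf")):
    """Run one trial; an out-of-memory crash is recorded as a failed trial."""
    try:
        return train_fn(config)
    except MemoryError:
        return failure_loss


def toy_train(config):
    """Toy stand-in for a training run: large batches 'run out of memory'."""
    if config["batch_size"] > 64:
        raise MemoryError("simulated out-of-memory failure")
    return abs(math.log10(config["learning_rate"]) + 3.0) + config["weight_decay"]


rng = random.Random(0)
configs = [sample_config(rng) for _ in range(20)]
results = [(run_trial(toy_train, c), c) for c in configs]
best_loss, best_config = min(results, key=lambda pair: pair[0])
```

Every failed trial here burns one slot of the budget without producing a usable loss, which is one way to see why a method that rarely crashes can come out ahead of one that explores more widely.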

Centaur: the hybrid that wins with a 0.8B model

To combine classical precision with LLM domain knowledge, the team introduces Centaur. It shares CMA-ES's interpretable internal state (mean vector, step-size, and covariance matrix) with an LLM. Centaur achieves the best results in all experiments, and even with a tiny 0.8B LLM it outperforms every pure classical and pure LLM method. Larger models don't help Centaur much; the bottleneck isn't model scale but the ability to use the optimizer's state. Unconstrained code editing, on the other hand, needs frontier models to stay competitive with classical methods.
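What "sharing the optimizer's state with an LLM" might look like in practice can be sketched as a small serialization step. The snippet below is an assumption-laden illustration, not Centaur's actual code: the `OptimizerState` class, its field names, the `summarize_state` and `state_to_prompt` helpers, and the prompt wording are all hypothetical. It only shows that a mean vector, a step-size, and a covariance matrix can be turned into text a language model can read.

```python
import json
from dataclasses import dataclass


@dataclass
class OptimizerState:
    """Hypothetical snapshot of CMA-ES internals; field names are illustrative."""
    names: list[str]
    mean: list[float]
    step_size: float
    covariance: list[list[float]]


def summarize_state(state, best_loss):
    """Turn the optimizer state into a plain dict that serializes to JSON."""
    per_dim_std = {
        # Effective standard deviation per dimension: sigma * sqrt(C_ii).
        name: round(state.step_size * state.covariance[i][i] ** 0.5, 4)
        for i, name in enumerate(state.names)
    }
    return {
        "mean": {n: round(m, 4) for n, m in zip(state.names, state.mean)},
        "step_size": round(state.step_size, 4),
        "per_dim_std": per_dim_std,
        "covariance": [[round(v, 4) for v in row] for row in state.covariance],
        "best_loss_so_far": best_loss,
    }


def state_to_prompt(state, best_loss):
    """Build a text prompt (wording is an assumption) around the state summary."""
    summary = json.dumps(summarize_state(state, best_loss), indent=2)
    return (
        "You are assisting a CMA-ES hyperparameter search.\n"
        "Current optimizer state:\n"
        f"{summary}\n"
        "Propose one new configuration as a JSON object keyed by name."
    )


state = OptimizerState(
    names=["log_learning_rate", "weight_decay"],
    mean=[-3.0, 0.1],
    step_size=0.5,
    covariance=[[4.0, 0.0], [0.0, 1.0]],
)
prompt = state_to_prompt(state, best_loss=1.23)
```

The per-dimension standard deviation is included because it is the part of the state a reader (human or model) can interpret most directly: it says how widely the optimizer is currently searching along each hyperparameter.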

What this means for HPO practice

LLMs are best as complements to classical optimizers, not replacements. Centaur shows a clean path: keep the interpretable, stateful optimizer and let the LLM propose trials informed by that state. The authors provide code and an interactive demo, so you can poke at the results yourself. If you're reaching for an LLM to tune your next model, think again—CMA-ES with a small language model grafted on might be all you need.
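The "stateful optimizer plus LLM proposals" pattern can be sketched as an ask/tell loop. This is a rough illustration under several assumptions: `SimpleES` is a deliberately simplified Gaussian evolution strategy standing in for CMA-ES (it has no covariance adaptation), `stub_llm_propose` is a placeholder for a real model call, and replacing one sampled candidate per generation with the LLM's proposal is a guess at the integration, not a description of how Centaur does it.

```python
import random


class SimpleES:
    """Stand-in for CMA-ES: isotropic Gaussian search with a mean and step-size.

    This is NOT CMA-ES; it has no covariance matrix adaptation.
    """

    def __init__(self, mean, step_size, seed=0):
        self.mean = list(mean)
        self.step_size = step_size
        self.rng = random.Random(seed)

    def ask(self, n):
        """Sample n candidates around the current mean."""
        return [
            [m + self.step_size * self.rng.gauss(0.0, 1.0) for m in self.mean]
            for _ in range(n)
        ]

    def tell(self, candidates, losses):
        """Move the mean to the average of the better half, then shrink the step."""
        ranked = sorted(zip(losses, candidates), key=lambda pair: pair[0])
        elite = [c for _, c in ranked[: max(1, len(ranked) // 2)]]
        self.mean = [sum(column) / len(elite) for column in zip(*elite)]
        self.step_size *= 0.9


def stub_llm_propose(state):
    """Placeholder for an LLM call: proposes a point halfway from the mean to zero."""
    return [0.5 * m for m in state["mean"]]


def hybrid_search(loss_fn, optimizer, propose, generations=20, population=6):
    """Each generation: sample candidates, add one state-informed proposal, update."""
    best_loss, best_point = float("inf"), None
    for _ in range(generations):
        candidates = optimizer.ask(population - 1)
        state = {"mean": list(optimizer.mean), "step_size": optimizer.step_size}
        candidates.append(propose(state))  # proposal informed by optimizer state
        losses = [loss_fn(c) for c in candidates]
        optimizer.tell(candidates, losses)  # proposals update the optimizer too
        for loss, point in zip(losses, candidates):
            if loss < best_loss:
                best_loss, best_point = loss, point
    return best_loss, best_point


def sphere(point):
    """Toy objective with its minimum at the origin."""
    return sum(x * x for x in point)


best_loss, best_point = hybrid_search(
    sphere, SimpleES(mean=[3.0, -2.0], step_size=1.0), stub_llm_propose
)
```

The design point is that the optimizer stays in charge of the search state. The proposer only contributes candidates, and those candidates are scored and fed back through the same `tell` step as everything else.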

Source: Can LLMs Beat Classical Hyperparameter Optimization Algorithms?
Domain: arxiv.org
