Diverse First-Query Initialization Boosts Agentic Search by 5-7 Points

Standard parallel sampling in agentic search hits diminishing returns because of redundant first queries; DivInit selects diverse initial seeds and gains 5-7 accuracy points on multi-hop QA at the same compute budget.

Tags: DivInit, agentic search, large language models, Carnegie Mellon University, multi-hop QA, reasoning

With standard parallel sampling, the returns from breadth scaling in agentic search drop off quickly after a few trajectories. The CMU team behind DivInit traced that decay to a single cause: models issue nearly identical first-turn queries across rollouts, so every thread retrieves the same evidence and subsequent turns are conditioned on that shared, often shallow, set.

Why Parallel Sampling Stalls

Breadth scaling should be the easy win for test-time compute. Run k independent trajectories, pick the best answer. But the authors show that as k grows, accuracy gains plateau fast. Across five open-weight models and eight benchmarks, the marginal benefit of each new rollout collapses. The bottleneck isn't the model's reasoning ability - it's that every trajectory starts from the same query distribution and converges on the same retrieved documents.

DivInit: Pick Diverse Seeds, Not Independent Samples

DivInit is a training-free swap at the first turn. Instead of sampling k independent first queries from the model, you draw n candidates in a single call, then select k < n of those that are maximally diverse in embedding space. Each diverse seed becomes the starting point for a parallel trajectory. No fine-tuning, no auxiliary models, no extra inference cost beyond that single n-sized generation.
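The selection step can be sketched in a few lines. The article does not say how "maximally diverse" is computed, so the sketch below assumes greedy farthest-point (max-min) selection under cosine distance as a stand-in; it is not necessarily what the authors implement. The function names, the candidate labels, and the 2-D embeddings are all hypothetical; in practice the vectors would come from whatever embedding model is used for the n candidate queries.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def select_diverse(embeddings, k):
    """Greedy max-min selection (an assumed stand-in, not the paper's method).

    Starts from candidate 0, then repeatedly adds the candidate whose
    distance to its nearest already-selected candidate is largest.
    Returns the indices of the k selected candidates.
    """
    n = len(embeddings)
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= number of candidates")

    selected = [0]
    # min_dist[i] = distance from candidate i to its nearest selected seed
    min_dist = [cosine_distance(e, embeddings[0]) for e in embeddings]

    while len(selected) < k:
        remaining = [i for i in range(n) if i not in selected]
        nxt = max(remaining, key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, e in enumerate(embeddings):
            min_dist[i] = min(min_dist[i], cosine_distance(e, embeddings[nxt]))

    return selected


# Hypothetical example: n = 5 candidate first queries, three of which
# (A1, A2, A3) are near-duplicates in embedding space.
candidates = ["query A1", "query A2", "query B", "query C", "query A3"]
embeddings = [
    [1.00, 0.00],
    [0.99, 0.10],
    [0.00, 1.00],
    [0.70, 0.70],
    [0.98, 0.05],
]

seed_indices = select_diverse(embeddings, k=3)
seeds = [candidates[i] for i in seed_indices]
print(seeds)
```

With these toy vectors, the near-duplicate candidates collapse to a single seed, and the remaining two slots go to the queries that point in different directions. Each selected seed would then start its own trajectory.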

On multi-hop QA benchmarks (HotpotQA, 2WikiMultihop, MuSiQue, and others), DivInit delivers consistent 5-7 point accuracy gains over standard parallel sampling at matched compute. The improvement holds across model sizes and families - Llama, Mistral, Qwen, Gemma, and more. The code is open at https://github.com/cxcscmu/diverse-query-initialization.

This is the sort of fix that feels obvious only after you see it. Expect the pattern of diversifying initial conditions to spread beyond agentic search to any system where parallel trajectories share a fragile first step.


Source: Beyond Parallel Sampling: Diverse Query Initialization for Agentic Search
Domain: arxiv.org
