
Diverse First-Query Initialization Boosts Agentic Search by 5-7 Points

arxiv.org · @frontier_wire · 4 hours ago · Artificial Intelligence · 5 comments

Standard parallel sampling in agentic search hits diminishing returns because of redundant first queries; DivInit selects diverse initial seeds and gains 5-7 accuracy points on multi-hop QA at the same compute budget.

divinit · agentic search · large language models · carnegie mellon university · multi hop qa · reasoning

When agentic search is scaled in breadth with standard parallel sampling, returns drop off quickly after a few trajectories. The CMU team behind DivInit traced that decay to a single cause: models issue nearly identical first-turn queries across rollouts, so every thread retrieves the same evidence, and subsequent turns are conditioned on that shared, often shallow, set.

Why Parallel Sampling Stalls

Breadth scaling should be the easy win for test-time compute. Run k independent trajectories, pick the best answer. But the authors show that as k grows, accuracy gains plateau fast. Across five open-weight models and eight benchmarks, the marginal benefit of each new rollout collapses. The bottleneck isn't the model's reasoning ability - it's that every trajectory starts from the same query distribution and converges on the same retrieved documents.
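One rough way to see this redundancy is to measure how much the first-turn queries of a set of rollouts overlap. The sketch below is only an illustration, not the authors' metric: the token-set Jaccard measure, the function names, and the example queries are all assumptions made up for this example.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two queries (1.0 = same word set)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def mean_pairwise_similarity(queries: list[str]) -> float:
    """Average Jaccard similarity over all unordered pairs of first-turn queries."""
    pairs = list(combinations(queries, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Made-up first-turn queries from four hypothetical rollouts of one question.
redundant = [
    "who directed the film Inception",
    "who directed the film Inception",
    "who directed the movie Inception",
    "who directed the film Inception",
]
varied = [
    "who directed the film Inception",
    "Inception 2010 cast and crew",
    "Christopher Nolan filmography",
    "science fiction films about dreams",
]

print(mean_pairwise_similarity(redundant))  # close to 1: rollouts start almost identically
print(mean_pairwise_similarity(varied))     # close to 0: rollouts start from different angles
```

A high score on the first set is the situation the article describes: extra rollouts add compute but little new evidence.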

DivInit: Pick Diverse Seeds, Not Independent Samples

DivInit is a training-free swap at the first turn. Instead of sampling k independent first queries from the model, you draw n candidates in a single call, then select k < n of those that are maximally diverse in embedding space. Each diverse seed becomes the starting point for a parallel trajectory. No fine-tuning, no auxiliary models, no extra inference cost beyond that single n-sized generation.
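The selection step can be sketched as follows. The article says only that the k seeds are "maximally diverse in embedding space", so everything specific here is an assumption: greedy farthest-point selection, cosine distance, starting from the first candidate, the name `select_diverse`, and the toy 2-D vectors standing in for real query embeddings. The released code may do this differently.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; treats a zero vector as maximally unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def select_diverse(embeddings, k):
    """Greedy farthest-point selection (an assumed stand-in for DivInit's rule).

    Returns the indices of k candidates, each chosen to be as far as possible
    from the candidates already selected.
    """
    n = len(embeddings)
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= n")
    selected = [0]  # assumption: always keep the first candidate
    # min_dist[i] = distance from candidate i to its nearest selected candidate
    min_dist = [cosine_distance(embeddings[0], e) for e in embeddings]
    while len(selected) < k:
        remaining = (i for i in range(n) if i not in selected)
        nxt = max(remaining, key=lambda i: min_dist[i])
        selected.append(nxt)
        for i in range(n):
            d = cosine_distance(embeddings[nxt], embeddings[i])
            min_dist[i] = min(min_dist[i], d)
    return selected


# Toy embeddings for n = 5 candidate first queries (hypothetical values).
# Candidates 0, 1 and 3 point almost the same way, i.e. near-duplicate queries.
candidate_embeddings = [
    [1.0, 0.0],
    [0.99, 0.05],
    [0.0, 1.0],
    [0.98, 0.1],
    [-1.0, 0.0],
]

seeds = select_diverse(candidate_embeddings, k=3)
print(seeds)  # indices of the candidates that would seed the k trajectories
```

With these toy vectors the near-duplicates 1 and 3 are skipped, and each selected index would become the first query of one parallel trajectory.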

On multi-hop QA benchmarks (HotpotQA, 2WikiMultihop, MuSiQue, and others), DivInit delivers consistent 5-7 point accuracy gains over standard parallel sampling at matched compute. The improvement holds across model sizes and families - Llama, Mistral, Qwen, Gemma, and more. The code is open at https://github.com/cxcscmu/diverse-query-initialization.

This is the sort of fix that feels obvious only after you see it. Expect the pattern of diversifying initial conditions to spread beyond agentic search to any system where parallel trajectories share a fragile first step.

Source: Beyond Parallel Sampling: Diverse Query Initialization for Agentic Search
Domain: arxiv.org


