
RISE Cuts Agentic Search Costs by 75% With No Loss in Accuracy

The new method uses BM25 to scope agent exploration, matching a pure shell baseline at a quarter of the cost and scaling to 1M documents without wall-clock timeouts.

Tags: rise, agentic search, bm25, browsecomp plus, large language models, interaction space

On BrowseComp-Plus, RISE matches a brute-force shell-based search agent at 78% accuracy for roughly one quarter of the per-query cost. But the real insight is why we need to rethink retrieval for agents.

Current retrieval inherits a non-agentic mindset: rank the corpus, feed the top documents to an LLM. That works when you just need facts, but agents need to interact. Recent Direct Corpus Interaction (DCI) work lets agents use shell tools like grep and file reads. That is fast on small corpora, but unbounded. Every broad command scans the whole corpus, so as the corpus grows, latency balloons.

Retrieval Should Build a Bounded Playground, Not Just a Reading List

The paper argues that retrieval for agents should construct an interaction space: a bounded subset of the corpus the agent can explore with associated tools. Two design consequences: the space needs a boundary (supplied by retrieval), and the objects inside must be pre-processed for interaction. Enter RISE — Retrieving Interaction SpacE. It uses BM25 to draw that boundary, and during indexing it transforms documents for shell-style navigation.
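To make the idea concrete, here is a minimal sketch of a BM25-bounded interaction space. It is not the paper's implementation: the names `BM25`, `build_interaction_space`, and `grep`, the parameters `k1=1.5` and `b=0.75`, and the toy corpus are all assumptions for illustration, and the paper's index-time document transformation is not modeled. It only shows retrieval drawing a boundary and a shell-style tool operating inside it.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase alphanumeric tokens; a deliberately simple tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Plain Okapi BM25 over an in-memory {doc_id: text} mapping."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.doc_ids = list(docs)
        self.tfs = [Counter(tokenize(docs[d])) for d in self.doc_ids]
        self.lengths = [sum(tf.values()) for tf in self.tfs]
        self.avgdl = sum(self.lengths) / max(len(self.lengths), 1)
        n = len(self.doc_ids)
        df = Counter(term for tf in self.tfs for term in tf)
        self.idf = {t: math.log(1 + (n - c + 0.5) / (c + 0.5)) for t, c in df.items()}

    def score(self, query, i):
        tf, dl = self.tfs[i], self.lengths[i]
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def top_k(self, query, k):
        scored = [(self.score(query, i), d) for i, d in enumerate(self.doc_ids)]
        scored = [(s, d) for s, d in scored if s > 0]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [d for _, d in scored[:k]]


def build_interaction_space(docs, query, k):
    # The boundary: only the top-k BM25 hits are visible to the agent's tools.
    index = BM25(docs)
    return {d: docs[d] for d in index.top_k(query, k)}


def grep(space, pattern):
    # A shell-style tool that scans only the bounded space, not the full corpus.
    rx = re.compile(pattern, re.IGNORECASE)
    return [(d, line) for d, text in space.items()
            for line in text.splitlines() if rx.search(line)]


corpus = {
    "a.txt": "BM25 ranks documents by term frequency.\nIt is a lexical retriever.",
    "b.txt": "Shell tools such as grep scan files line by line.",
    "c.txt": "Cooking pasta requires salted boiling water.",
}
space = build_interaction_space(corpus, "bm25 lexical retriever", k=2)
print(grep(space, "lexical"))
```

In this toy run only `a.txt` shares terms with the query, so the space holds a single document and `grep` never touches the other two files. A real system would build the index once, offline, rather than per query as this sketch does.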

Quarter the Cost, No Failures at 1M Documents

RISE-BM25 hits 78% accuracy on BrowseComp-Plus using gpt-5.4-mini — the same as the pure DCI baseline — but at roughly 25% of the per-query cost. Scale up to 1 million documents and RISE manages 81% accuracy. Meanwhile, DCI on gpt-5.4-nano (a weaker model) collapses to 60% accuracy, with 33 out of 100 queries hitting wall-clock timeouts. That’s not just slower; it’s broken.

The numbers make the point: unbounded tool use doesn’t scale. RISE shows that a cheap BM25 boundary plus indexed preprocessing lets an agent navigate a large corpus with shell tools without scanning everything each time. No exotic embeddings, no learned retrievers — just a principled shift in what retrieval is supposed to do.
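A back-of-the-envelope model shows why the boundary matters. The function `lines_scanned` and every number passed to it below (50 lines per document, 20 broad commands per query, a 1,000-document boundary) are hypothetical values chosen for illustration, not figures from the paper.

```python
def lines_scanned(num_docs, lines_per_doc, commands, boundary=None):
    """Total lines read by broad scan commands (e.g. grep) during one query.

    With boundary=None every command scans the full corpus (DCI-style).
    With a boundary, each command scans at most that many documents.
    """
    docs_in_scope = num_docs if boundary is None else min(boundary, num_docs)
    return docs_in_scope * lines_per_doc * commands


# Hypothetical workload: 1M documents, 50 lines each, 20 broad commands.
unbounded = lines_scanned(1_000_000, 50, 20)
bounded = lines_scanned(1_000_000, 50, 20, boundary=1_000)
print(unbounded, bounded, unbounded // bounded)
```

Under these assumptions the unbounded agent's scan work grows linearly with corpus size, while the bounded agent's work stays fixed by the boundary no matter how large the corpus becomes.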

Expect future agentic search systems to borrow this design: tie the retriever to the tool set, not to the prompt. The interaction space is the right abstraction.


Source: Towards Retrieving Interaction Spaces for Agentic Search
Domain: arxiv.org
