
TASR retains 95% of F1 while cutting retrieval calls by 37%

arxiv.org · @systems_wire · 2 days ago · Artificial Intelligence · 8 comments

A training-free stopping rule based on the logit margin beats fixed baselines and learned stopping policies across 24 configurations.

tasr · iterative retrieval · retrieval augmented generation · arxiv · training free · stopping rule

Under a fixed five-round schedule, iterative RAG agents spend roughly 37% of their retrieval calls on rounds that change neither the answer nor the evidence. TASR, a one-line stopping rule proposed in arXiv:2606.13814, cuts that waste without any training or fine-tuning.

How TASR's One-Line Rule Works

The rule fires when two conditions hold: the model repeats its previous-round normalized answer, and the isotonically calibrated logit margin exceeds 0.25. No classifier, no value head, no learned policy. On a 3-model × 2-dataset distractor grid, TASR retains 94.8% of the macro F1 of fixed-k=5 while making only 62.6% of its retrieval calls. Against fixed-k=3, it gains 3.42 F1 points at essentially the same call count.
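The two-condition predicate can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' code: `normalize_answer` uses SQuAD-style normalization, `logit_margin` takes the gap between the top two logits, and `IsotonicCalibrator` fits a pool-adjacent-violators (PAV) isotonic map from raw margins to an estimated probability of correctness on held-out (margin, correct) pairs. These helpers and `tasr_should_stop` are hypothetical names; the paper may define the margin and the calibration target differently.

```python
import re
import string
from bisect import bisect_right
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

_PUNCT = str.maketrans("", "", string.punctuation)


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization (assumed): lowercase, drop punctuation and articles."""
    text = text.lower().translate(_PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def logit_margin(logits: Sequence[float]) -> float:
    """Assumed margin definition: top-1 logit minus top-2 logit (needs >= 2 logits)."""
    top1, top2 = sorted(logits, reverse=True)[:2]
    return top1 - top2


@dataclass
class IsotonicCalibrator:
    """Non-decreasing step function from raw margin to estimated P(correct)."""

    thresholds: list[float]  # smallest raw margin covered by each pooled block
    values: list[float]  # calibrated value of each block, non-decreasing

    @classmethod
    def fit(cls, margins: Sequence[float], labels: Sequence[int]) -> "IsotonicCalibrator":
        # Aggregate tied margins first so each distinct margin gets one value.
        agg: dict[float, list[float]] = {}
        for x, y in zip(margins, labels):
            acc = agg.setdefault(x, [0.0, 0.0])
            acc[0] += y
            acc[1] += 1
        # Pool-adjacent-violators: each block is [label_sum, count, min_margin].
        blocks: list[list[float]] = []
        for x in sorted(agg):
            blocks.append([agg[x][0], agg[x][1], x])
            # Merge backwards while a block's mean exceeds the mean of the block after it.
            while len(blocks) > 1 and (
                blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
            ):
                label_sum, count, _ = blocks.pop()
                blocks[-1][0] += label_sum
                blocks[-1][1] += count
        return cls([b[2] for b in blocks], [b[0] / b[1] for b in blocks])

    def __call__(self, margin: float) -> float:
        # Margins below the smallest fitted margin reuse the first block's value.
        i = bisect_right(self.thresholds, margin) - 1
        return self.values[max(i, 0)]


def tasr_should_stop(
    prev_answer: Optional[str],
    answer: str,
    logits: Sequence[float],
    calibrate: Callable[[float], float],
    threshold: float = 0.25,
) -> bool:
    """Stop iff the normalized answer repeats AND the calibrated margin exceeds threshold."""
    if prev_answer is None:  # first round: nothing to compare against
        return False
    repeated = normalize_answer(prev_answer) == normalize_answer(answer)
    return repeated and calibrate(logit_margin(logits)) > threshold


if __name__ == "__main__":
    # Toy held-out data (illustrative only): raw margins and 0/1 correctness labels.
    cal = IsotonicCalibrator.fit([0.1, 0.5, 1.0, 2.0, 3.0], [0, 1, 0, 1, 1])
    print(tasr_should_stop("The Eiffel Tower", "eiffel tower", [3.0, 2.3, 0.1], cal))  # True
```

Because the fitted calibrator is non-decreasing, requiring its output to exceed 0.25 is equivalent to requiring the raw margin to clear a single fixed cut point, which is part of what keeps a rule like this easy to audit.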

The same pattern holds across nine open-domain BM25 cells: 55.01 F1 at 2.98 calls versus 54.33 F1 at 3.00 calls for fixed-k=3. On nine dense-retrieval cells spanning two retriever families, TASR shows no statistically significant regressions. The 0.25 threshold was not tuned per task; it was set once and left unchanged across all configurations.

Why Logit Margins Beat Verbalized Confidence

The authors show why verbalized confidence fails on RLHF-tuned models: 96.5% of verbalized values equal 5, giving an entropy of just 0.182 nats. Logit margins achieve 44× better class-conditional separation. The gap traces to a concrete pathology: on these models, verbalized confidence collapses onto a single value, leaving almost nothing to threshold.
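To see why a collapsed score carries so little signal, the sketch below computes the Shannon entropy, in nats, of a list of verbalized confidence values. The two samples are hypothetical and only illustrate the scale of the numbers: if the scale runs from 1 to 5 (an assumption), the maximum possible entropy is ln 5 ≈ 1.61 nats, while a sample in which 96.5% of values are 5 stays near 0.15 to 0.18 nats. The helper name `entropy_nats` is illustrative, and the sketch does not reproduce the paper's class-conditional separation metric, whose exact definition is not given here.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def entropy_nats(values: Iterable[Hashable]) -> float:
    """Shannon entropy, in nats, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Hypothetical sample: 965 answers verbalize "5" and 35 verbalize "4".
collapsed = [5] * 965 + [4] * 35
# Hypothetical sample spread evenly over a 1-5 scale, for comparison.
spread = [1, 2, 3, 4, 5] * 200

print(f"collapsed: {entropy_nats(collapsed):.3f} nats")  # 0.152
print(f"spread:    {entropy_nats(spread):.3f} nats")  # 1.609 (= ln 5)
```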

TASR was selected from an exhaustive enumeration of 381 candidate stopping rules, and no alternative Pareto-dominates it on any evaluated configuration. That is a strong but bounded claim: no other predicate is at least as good on both F1 and call count while strictly better on one, though some rules may still beat TASR on a single axis.
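To make the non-domination claim concrete, here is a minimal Pareto filter over (F1, calls) pairs. It is an illustrative sketch, not the authors' enumeration code: `RuleResult`, `dominates`, and `pareto_front` are hypothetical names. The TASR and fixed-k=3 points reuse the BM25 aggregates quoted in the article, and the third point is invented to show that a rule can beat TASR on F1 alone without dominating it.

```python
from typing import List, NamedTuple


class RuleResult(NamedTuple):
    name: str
    f1: float  # macro F1; higher is better
    calls: float  # mean retrieval calls per question; lower is better


def dominates(a: RuleResult, b: RuleResult) -> bool:
    """True if a is no worse than b on both axes and strictly better on at least one."""
    no_worse = a.f1 >= b.f1 and a.calls <= b.calls
    strictly_better = a.f1 > b.f1 or a.calls < b.calls
    return no_worse and strictly_better


def pareto_front(results: List[RuleResult]) -> List[RuleResult]:
    """Keep every rule that no other rule Pareto-dominates."""
    return [r for r in results if not any(dominates(o, r) for o in results if o is not r)]


results = [
    RuleResult("TASR", 55.01, 2.98),  # BM25 aggregate from the article
    RuleResult("fixed-k=3", 54.33, 3.00),  # BM25 aggregate from the article
    RuleResult("hypothetical-rule", 56.00, 4.10),  # invented point, illustration only
]
print([r.name for r in pareto_front(results)])  # ['TASR', 'hypothetical-rule']
```

Here fixed-k=3 drops out because TASR is better on both axes, while the invented rule survives because it trades extra calls for higher F1. Surviving the filter is exactly the sense in which TASR is "not dominated"; it does not mean TASR wins on every axis.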

A Pareto Baseline for Future Controllers

TASR does not claim to be optimal; it provides an auditable, training-free Pareto baseline. A learned stopping controller that cannot beat this one-line rule on both F1 and call count is hard to justify given its training cost. The code is publicly available for reproduction.

Source: TASR: Training-Free Adaptive Stopping for Iterative Retrieval
Domain: arxiv.org



More in Artificial Intelligence


Dr-DCI Hits 73.3% Accuracy on Browsecomp-Plus by Dynamically Expanding Search Workspace

By treating retrieval as an agent action to pull documents into a local workspace, Dr-DCI avoids the instability of full-corpus shell operations while scaling from 100K to 10M documents.

When Models Disagree, Route to a Different Model: Video QA Gains 1.81 Points

Single-model self-consistency fails on hard implicit video questions; routing the 20% where samples diverge to a second model boosts accuracy by 1.43-1.81 points, with motion and counting categories gaining 5+ points.

RAMS Dynamically Switches YOLOv8 Tiers to Cut Latency 5.6x on Embedded Edge

RAMS drops inference latency from ~19 ms to 3.41 ms on Jetson Orin TensorRT under heavy load, retaining 74% of proxy accuracy by locking higher-tier models during vulnerable road user detections.

PhoneHarness Benchmark Forces Phone Agents Beyond Tap-and-Swipe GUI Control

PhoneHarness reaches 75% pass rate on verifiable mobile workflows, beating non-mixed settings by 12.9 points by routing agents across GUI, CLI, and tool actions.
