Adaptive Re-Ranking cuts latency by up to 53x with minimal accuracy loss

A utility-based routing classifier decides per query whether to use sparse retrieval, a lightweight re-ranker, or a heavy neural model, achieving 1.15x to 53x lower median latency with an nDCG@10 change of only -17.5% to +4.0%.

adaptive re-ranking, BM25, BGE-v2-m3, MiniLM-L6-v2, information retrieval, machine learning

Adaptive Re-Ranking cuts median latency by up to 53x while shifting nDCG@10 by between -17.5% and +4.0% relative to a heavy neural re-ranker across multiple datasets. That's the headline from a new arXiv paper that attacks a dumb inefficiency baked into most modern retrieval pipelines.

The One-Size-Fits-All Reranking Tax

Most IR systems run a static retrieve-then-rerank pipeline: BM25 spits out candidates, then a fat cross-encoder like BGE-v2-m3 re-ranks every query the same way. Simple queries get the same heavyweight treatment as complex ones. The authors of the Adaptive Re-Ranking paper point out that this wastes compute and adds latency on the easy stuff. Their oracle analysis shows huge potential savings if you could magically pick the right ranker per query.
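To make the oracle idea concrete, here is a minimal sketch of per-query oracle routing. It assumes you already know each ranker's nDCG@10 and cost for a query, which is only possible offline with relevance labels. The function name, the `tolerance` parameter, and all numbers are hypothetical illustrations, not taken from the paper.

```python
def oracle_choice(scores, costs, tolerance=0.0):
    """Pick the cheapest ranker whose nDCG@10 is within `tolerance` of the best.

    scores: ranker name -> nDCG@10 on this query (known only offline)
    costs:  ranker name -> latency in milliseconds
    """
    best = max(scores.values())
    # Every ranker that is "good enough" for this particular query.
    eligible = [name for name, s in scores.items() if s >= best - tolerance]
    # Among those, the cheapest one wins.
    return min(eligible, key=lambda name: costs[name])


# Hypothetical per-ranker latencies in milliseconds.
COSTS = {"bm25": 5.0, "minilm": 20.0, "bge": 200.0}

easy_query = {"bm25": 0.80, "minilm": 0.80, "bge": 0.80}
hard_query = {"bm25": 0.20, "minilm": 0.50, "bge": 0.70}

print(oracle_choice(easy_query, COSTS))  # all rankers tie, so the cheapest is chosen
print(oracle_choice(hard_query, COSTS))  # only the heavy ranker reaches the best score
```

A real router cannot see these scores at query time, which is why the oracle is an upper bound rather than a deployable policy.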

Utility-Based Routing That Actually Works

The trick is a utility function that weighs ranking gain against computational cost. They train a routing classifier with three options: BM25 alone (cheapest), a lightweight re-ranker (MiniLM-L6-v2), and the heavy neural re-ranker (BGE-v2-m3). The router learns to assign each query to the cheapest ranker that keeps nDCG@10 above a threshold.
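One simple way to picture such a utility is quality minus a weighted cost. The sketch below assumes a linear form, `nDCG@10 - cost_weight * latency`, which is an illustrative assumption and not necessarily the paper's definition. The class, function names, and numbers are all hypothetical. The winning ranker per query could serve as a training label for the routing classifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RankerOutcome:
    name: str
    ndcg_at_10: float   # ranking quality on this query
    latency_ms: float   # cost of running this ranker


def utility(outcome, cost_weight):
    # Assumed linear trade-off: quality minus weighted latency.
    return outcome.ndcg_at_10 - cost_weight * outcome.latency_ms


def best_ranker(outcomes, cost_weight):
    # The ranker with the highest utility becomes the routing label.
    return max(outcomes, key=lambda o: utility(o, cost_weight)).name


# Hypothetical outcomes for two queries.
easy = [
    RankerOutcome("bm25", 0.80, 5.0),
    RankerOutcome("minilm", 0.82, 20.0),
    RankerOutcome("bge", 0.83, 200.0),
]
hard = [
    RankerOutcome("bm25", 0.20, 5.0),
    RankerOutcome("minilm", 0.35, 20.0),
    RankerOutcome("bge", 0.70, 200.0),
]

print(best_ranker(easy, cost_weight=0.001))
print(best_ranker(hard, cost_weight=0.001))
```

Raising `cost_weight` pushes more queries toward the cheap rankers, and setting it to zero reduces the rule to picking the highest-quality ranker every time.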

Results are clean. Compared to always using BGE-v2-m3, the adaptive method delivers 1.15-53x lower median latency and 1.11-5.22x lower mean latency across all tested datasets. The nDCG@10 change ranges from -17.5% (worst case) to +4.0% (actually better on some datasets). For many production workloads, that's a trade-off worth taking.
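The gap between the median and mean speedups is worth a second look. One way such a gap can arise is when most queries go to a cheap ranker while a minority still pays for the heavy one: the median lands on the cheap path, but the mean is dragged up by the expensive tail. The toy numbers below are hypothetical and only illustrate the arithmetic, not the paper's measurements.

```python
import statistics

# Hypothetical per-query latencies in milliseconds for 10 queries.
HEAVY_MS = 200.0
always_heavy = [HEAVY_MS] * 10

# Assumed routing outcome: 7 queries to BM25, 1 to MiniLM, 2 to the heavy ranker.
routed = [5.0] * 7 + [20.0] * 1 + [HEAVY_MS] * 2

median_speedup = statistics.median(always_heavy) / statistics.median(routed)
mean_speedup = statistics.mean(always_heavy) / statistics.mean(routed)

print(f"median speedup: {median_speedup:.1f}x")
print(f"mean speedup:   {mean_speedup:.1f}x")
```

In this toy setup the median speedup is several times larger than the mean speedup, the same direction as the gap in the reported ranges.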

Why This Is Harder Than It Looks

The authors admit that learning a good router from limited supervision is non-trivial. Their oracle upper bound shows huge room for improvement over the trained router, meaning there is still signal left on the table. But even this first pass is compelling enough to make you question any static pipeline.

If you run IR at scale, Adaptive Re-Ranking gives you a concrete knob to trade quality for latency, query by query. The next step is likely end-to-end learned routers that incorporate feedback from downstream tasks.

Source: Adaptive Re-Ranking
Domain: arxiv.org
