Adaptive Re-Ranking cuts median latency by up to 53x compared with always running a heavy neural re-ranker, with nDCG@10 shifting by anywhere from -17.5% to +4.0% depending on the dataset. That's the headline from a new arXiv paper that attacks a dumb inefficiency baked into most modern retrieval pipelines.
The One-Size-Fits-All Reranking Tax
Most IR systems run a static retrieve-then-rerank pipeline: BM25 spits out candidates, then a fat cross-encoder like BGE-v2-m3 re-ranks them, the same way for every query. Simple queries get the same heavyweight treatment as complex ones. The authors of the Adaptive Re-Ranking paper point out that this burns compute and adds latency on the easy stuff. Their oracle analysis, which assumes you could magically pick the right ranker for each query, shows huge potential savings.
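To make the oracle idea concrete, here is a minimal sketch of what such an analysis could look like. Everything in it is an assumption for illustration: the latency figures, the per-query nDCG@10 scores, the `oracle_choice` helper, and the rule "cheapest ranker that matches the best score" are not taken from the paper.

```python
# Hypothetical end-to-end latency per pipeline, in milliseconds.
# Illustrative numbers only, not measurements from the paper.
LATENCY_MS = {"bm25": 5.0, "minilm": 40.0, "bge": 400.0}


def oracle_choice(ndcg_by_ranker, tolerance=0.0):
    """Return the cheapest ranker whose nDCG@10 is within `tolerance` of the best."""
    best = max(ndcg_by_ranker.values())
    eligible = [r for r, score in ndcg_by_ranker.items() if score >= best - tolerance]
    return min(eligible, key=LATENCY_MS.__getitem__)


# Made-up per-query nDCG@10 scores for three queries.
queries = [
    {"bm25": 0.82, "minilm": 0.82, "bge": 0.82},  # easy: BM25 is already as good
    {"bm25": 0.40, "minilm": 0.61, "bge": 0.61},  # medium: the light re-ranker suffices
    {"bm25": 0.20, "minilm": 0.35, "bge": 0.70},  # hard: only the heavy model helps
]

choices = [oracle_choice(q) for q in queries]
static_cost = LATENCY_MS["bge"] * len(queries)    # always run the heavy re-ranker
oracle_cost = sum(LATENCY_MS[c] for c in choices) # pick per query with hindsight

print(choices, static_cost, oracle_cost)
```

The oracle needs relevance labels to score each ranker, so it is only an upper bound on what a real router could save, not something you can deploy.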
Utility-Based Routing That Actually Works
The trick is a utility function that weighs ranking gain against computational cost. They train a routing classifier with three options: BM25 alone (cheapest), a dense re-ranker (MiniLM-L6-v2), and the heavy neural re-ranker (BGE-v2-m3). The router learns to assign each query to the cheapest ranker that keeps nDCG@10 above a threshold.
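A rough sketch of how utility-based routing labels might be derived is below. The linear form `nDCG - lam * cost`, the normalized `COST` values, the `lam` parameter, and the function names are all assumptions made for illustration; the paper's actual utility function may differ. Training the classifier on these labels is left out.

```python
# Hypothetical normalized cost per ranker (0 = free, 1 = most expensive).
# Listed cheapest first so that ties in utility go to the cheaper ranker.
COST = {"bm25": 0.0, "minilm": 0.1, "bge": 1.0}


def utility(ndcg, ranker, lam):
    """Ranking quality minus a cost penalty; `lam` sets how much cost hurts."""
    return ndcg - lam * COST[ranker]


def utility_label(ndcg_by_ranker, lam=0.2):
    """Return the ranker with the highest utility for one training query.

    max() keeps the first maximal key it sees, so the dict order of
    `ndcg_by_ranker` decides ties.
    """
    return max(ndcg_by_ranker, key=lambda r: utility(ndcg_by_ranker[r], r, lam))


# Made-up nDCG@10 scores for one hard query.
hard_query = {"bm25": 0.20, "minilm": 0.35, "bge": 0.70}

for lam in (0.2, 0.5, 2.0):
    print(lam, utility_label(hard_query, lam=lam))
```

With these made-up numbers the label moves from the heavy re-ranker to the light one to plain BM25 as `lam` grows, which is the kind of quality-versus-cost dial the article describes. A router would then be trained to predict these labels from query features alone.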
Results are clean on the latency side. Compared to always using BGE-v2-m3, the adaptive method delivers 1.15-53x lower median latency and 1.11-5.22x lower mean latency across all tested datasets. The change in nDCG@10 relative to that baseline ranges from -17.5% (worst case) to +4.0% (actually better on some datasets). For many production workloads that's a trade-off worth taking, though the worst-case drop is large enough that you'd want to check it on your own data.
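Why would the median speedup be so much larger than the mean speedup? One plausible reading, sketched below with invented numbers, is that once most queries are routed to a cheap ranker the median collapses to the cheap latency, while the mean is still dragged up by the queries that go to the heavy model. The latencies and the 60/20/20 routing split are assumptions for illustration, not figures from the paper.

```python
from statistics import mean, median

# Hypothetical end-to-end latency per pipeline, in milliseconds.
LATENCY_MS = {"bm25": 5.0, "minilm": 40.0, "bge": 400.0}

# Invented routing decisions for ten queries: 6 easy, 2 medium, 2 hard.
routed = ["bm25"] * 6 + ["minilm"] * 2 + ["bge"] * 2

adaptive = [LATENCY_MS[r] for r in routed]   # latency under adaptive routing
static = [LATENCY_MS["bge"]] * len(routed)   # latency when every query hits the heavy model

median_speedup = median(static) / median(adaptive)
mean_speedup = mean(static) / mean(adaptive)

print(f"median speedup: {median_speedup:.1f}x, mean speedup: {mean_speedup:.2f}x")
```

In this toy setup the median speedup is 80x while the mean speedup is only about 4.4x, the same shape of gap as the 53x versus 5.22x upper ends reported above.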
Why This Is Harder Than It Looks
The authors admit that learning a good router from limited supervision is non-trivial. Their oracle upper bound sits well above what the trained router achieves, meaning there is still signal left on the table. But even this first pass is compelling enough to make you question any static pipeline.
If you run IR at scale, Adaptive Re-Ranking gives you a concrete knob for trading ranking quality against latency, query by query. A plausible next step is end-to-end learned routers that incorporate feedback from downstream tasks.
Source: Adaptive Re-Ranking
Domain: arxiv.org