IRENE Boosts Zero-Shot Retrieval by Up to 15 Percentage Points With On-the-Fly Classifiers

Siamese encoders for zero-shot retrieval are too small to hold real-world knowledge, and extreme classifiers that train a separate classifier per item can't handle novel items. That's the gap Microsoft Research just closed with IRENE, a method that synthesizes classifiers for never-before-seen items on the fly.

IRENE lives inside a new framework called EMMETT (Extreme Meta-Classification for Large-Scale Zero-Shot Retrieval). Instead of training a fresh classifier for each new item, IRENE builds one from the already-trained classifiers of similar observed items. That means you get the high-capacity representation of extreme classification without the latency penalty at inference time.

Up to 15 Points on Recall@10, 4.2% on Real Ad CTR

The numbers tell the story. Across a wide range of retrieval benchmarks, IRENE improves zero-shot Recall@10 by up to 15 percentage points when stacked on top of leading Siamese encoders. That's not a 15% relative gain; it's 15 absolute points, which in retrieval is a chasm.
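To make the absolute-versus-relative distinction concrete, here is a minimal Python sketch. The `recall_at_k` helper uses the standard definition of Recall@k, and the 40 and 55 figures are made-up numbers chosen only for the arithmetic. They are not results from the paper.

```python
def recall_at_k(ranked, relevant, k=10):
    """Fraction of the relevant items that appear in the top-k ranked results."""
    if not relevant:
        return 0.0
    hits = len(set(ranked[:k]) & set(relevant))
    return hits / len(relevant)

# Toy query: one of the two relevant items shows up in the top 3.
toy_recall = recall_at_k(["a", "b", "c", "d"], {"b", "z"}, k=3)

# Hypothetical Recall@10 scores in percent (illustrative, not from the paper).
baseline_pct = 40.0
improved_pct = 55.0

absolute_gain_points = improved_pct - baseline_pct               # percentage points
relative_gain_pct = absolute_gain_points / baseline_pct * 100.0  # percent of baseline

print(f"toy Recall@3: {toy_recall}")
print(f"absolute gain: {absolute_gain_points} points")
print(f"relative gain: {relative_gain_pct}%")
```

With these made-up numbers, a 15-point absolute gain is a 37.5% relative gain, which is why the two framings should not be confused.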

More convincing than any benchmark: a live A/B test on a major search engine's ad retrieval pipeline. The IRENE-augmented system lifted ad click-through rate by 4.2%. That's real revenue impact, not another academic SOTA.

How IRENE Works (Skipping the Fluff)

Standard extreme classification learns a weight vector for each training item. For a novel item, IRENE computes a weighted combination of those existing weight vectors, where the weights come from the similarity between the novel item's embedding and the observed items' embeddings. No new training, no huge encoder, no fine-tuning on the fly. The key theoretical contribution is a generalization bound that tells you how to set the weighting scheme and how many training items you need for reliable zero-shot performance.
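The description above can be sketched in a few lines of NumPy. This is a simplified illustration of the idea as summarized here, not the released IRENE code, which may differ (for example, by learning the combination weights rather than fixing them). The function name, the top-`k` neighbor cutoff, the cosine similarity, and the softmax `temperature` are all assumptions made for the example.

```python
import numpy as np

def synthesize_classifier(novel_emb, seen_embs, seen_classifiers, k=2, temperature=0.1):
    """Build a classifier for a novel item from its k most similar seen items.

    Hypothetical sketch: cosine similarity plus softmax weighting are assumptions.
    """
    # Cosine similarity between the novel item and every seen item.
    novel = novel_emb / np.linalg.norm(novel_emb)
    seen = seen_embs / np.linalg.norm(seen_embs, axis=1, keepdims=True)
    sims = seen @ novel

    # Keep the k most similar seen items.
    top = np.argsort(-sims)[:k]

    # Turn their similarities into weights that sum to 1 (stable softmax).
    logits = sims[top] / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()

    # Weighted combination of the neighbors' already-trained weight vectors.
    return weights @ seen_classifiers[top], top, weights

# Toy data: three seen items with 2-d embeddings and 2-d classifier vectors.
seen_embs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
seen_classifiers = np.array([[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
novel_emb = np.array([0.9, 0.1])

w_novel, neighbors, weights = synthesize_classifier(novel_emb, seen_embs, seen_classifiers)

# Score a query embedding against the synthesized classifier, as with any seen item.
query_emb = np.array([1.0, 0.0])
score = float(w_novel @ query_emb)
print(neighbors, weights, w_novel, score)
```

No gradient step is taken for the novel item: the only per-item work is a nearest-neighbor lookup and a small weighted sum, which is what makes on-the-fly synthesis plausible at serving time.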

The source code is available at https://aka.ms/irene, so you can verify the results yourself.

Why This Changes How We Build Retrieval Pipelines

If you've ever tried to deploy a large-scale retrieval system that needs to handle thousands of new items per second, you know the trade-off: small Siamese models for speed but poor accuracy, or huge classification layers with fixed vocabularies that can't adapt. IRENE breaks that trade-off by reusing what you already have.

Expect to see this pattern show up in product search, content recommendation, and any system where the item catalog changes faster than you can retrain.

Source: Extreme Meta-Classification for Large-Scale Zero-Shot Retrieval
Domain: arxiv.org
