SproutRAG outperforms the strongest baseline by an average of 6.1% in information efficiency across four long-document benchmarks, without a single extra LLM call during retrieval.
Most retrieval-augmented generation (RAG) pipelines choke on long documents because they either split text into fixed-size chunks (losing context) or rely on expensive LLM-generated summaries (losing fidelity). Amir Abaskohi and collaborators at the University of Toronto and Vector Institute sidestep both traps with a hierarchical framework that learns the document's structure from its own attention patterns.
How SproutRAG Builds a Context Tree From Attention Heads
The core trick: SproutRAG treats sentence-level embeddings as leaves, then iteratively merges the most closely related pairs, scoring relatedness with the attention heads and layers that best capture inter-sentence relevance. The result is a binary tree in which each internal node represents a progressively larger but still coherent text block. No LLM calls, no hand-crafted chunking rules.
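To make the merging idea concrete, here is a minimal sketch of bottom-up tree construction. It is not the authors' implementation. It assumes a precomputed sentence-by-sentence relevance matrix `rel` (a stand-in for whatever scores SproutRAG derives from attention), merges only positionally neighboring spans, and scores two spans by their mean pairwise relevance. The names `Node`, `span_affinity`, and `build_tree` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    start: int  # index of the first sentence covered (inclusive)
    end: int    # index of the last sentence covered (inclusive)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def span_affinity(a: Node, b: Node, rel: Sequence[Sequence[float]]) -> float:
    """Mean pairwise relevance between the sentences of two spans (assumed scoring rule)."""
    total = 0.0
    for i in range(a.start, a.end + 1):
        for j in range(b.start, b.end + 1):
            total += rel[i][j]
    return total / ((a.end - a.start + 1) * (b.end - b.start + 1))


def build_tree(rel: Sequence[Sequence[float]]) -> Node:
    """Greedily merge the most related neighboring spans until one root remains."""
    if len(rel) == 0:
        raise ValueError("need at least one sentence")
    nodes = [Node(i, i) for i in range(len(rel))]  # one leaf per sentence
    while len(nodes) > 1:
        # Pick the neighboring pair with the highest affinity.
        best = max(
            range(len(nodes) - 1),
            key=lambda k: span_affinity(nodes[k], nodes[k + 1], rel),
        )
        merged = Node(nodes[best].start, nodes[best + 1].end,
                      nodes[best], nodes[best + 1])
        nodes[best:best + 2] = [merged]  # replace the pair with its parent
    return nodes[0]


# Toy relevance matrix: sentences 0-1 belong together, as do 2-3.
rel = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
root = build_tree(rel)
```

On this toy matrix the root covers sentences 0 to 3, with one child spanning sentences 0-1 and the other spanning 2-3. In the real system the relevance scores would come from learned attention heads rather than a hand-written matrix.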
At retrieval time, SproutRAG runs a hierarchical beam search over this tree. It gathers candidates at multiple granularities simultaneously, pulling out a multi-sentence passage when a single sentence lacks context, but falling back to finer chunks when the signal is sharp. This multi-granularity retrieval is what flat RAG or single-level context expansion cannot do without blowing up cost.
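The retrieval step can be sketched in the same spirit. The example below is an illustrative beam search over a binary context tree, not the paper's algorithm. It assumes every node already carries an embedding, scores nodes by dot product with the query, keeps `beam_width` nodes per level, and pools candidates from every level it visits. `TreeNode`, `dot`, and `beam_search` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TreeNode:
    text: str
    embedding: List[float]
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None


def dot(u: List[float], v: List[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def beam_search(root: TreeNode, query: List[float],
                beam_width: int = 2, top_k: int = 3) -> List[Tuple[float, str]]:
    """Descend the tree level by level, keeping the best-scoring nodes at each depth."""
    frontier = [root]
    candidates: List[Tuple[float, TreeNode]] = []
    while frontier:
        scored = sorted(
            ((dot(n.embedding, query), n) for n in frontier),
            key=lambda pair: pair[0],
            reverse=True,
        )[:beam_width]
        candidates.extend(scored)  # coarse and fine spans compete in one pool
        # Expand only the surviving nodes; leaves contribute no children.
        frontier = [c for _, n in scored for c in (n.left, n.right) if c is not None]
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [(score, node.text) for score, node in candidates[:top_k]]


# Toy tree: a two-sentence parent over two leaves.
leaf_a = TreeNode("A", [1.0, 0.0])
leaf_b = TreeNode("B", [0.0, 1.0])
parent = TreeNode("A B", [0.5, 0.5], leaf_a, leaf_b)
results = beam_search(parent, [1.0, 0.0], beam_width=1, top_k=2)
```

Because candidates from all visited levels share one ranked pool, a sharp query surfaces the single sentence "A" first, while the wider "A B" block stays available as context. That is the multi-granularity behavior described above, in miniature.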
Joint Training Lifts Embeddings and Tree Structure Together
Standard RAG separates embedding training from retrieval logic. SproutRAG trains end-to-end with a joint objective that optimizes both the embedding space and the tree construction heads. The model learns to allocate attention to heads that produce semantically coherent merges, making the tree itself a better retrieval index.
Evaluation spans HotpotQA and 2WikiMultihopQA (both multi-hop question answering over Wikipedia), QASPER (questions over NLP research papers), and QASA (scientific). SproutRAG posts consistent information-efficiency (IE) gains across all four, peaking at 8.1% on the scientific QASA dataset. The paper includes ablation studies confirming that the attention-guided tree construction and hierarchical beam search each contribute roughly half the total lift.
Code is on GitHub at github.com/AmirAbaskohi/SproutRAG, enabling direct reproduction. The framework opens a clear path toward retrieval systems that understand document structure as well as a reader does, without burning inference budget on LLM orchestrators.
Source: SproutRAG: Attention-Guided Tree Search with Progressive Embeddings for Long-Document RAG
Domain: arxiv.org