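KG Triples Steal 2–3× More LLM Attention Than Plain Text, Even When Irrelevant

Even irrelevant knowledge graph triples command 2–3× more attention per token than natural language, compressing demonstration attention by up to 42% in Mistral and LLaMA models.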

structural attention tax, rag, knowledge graphs, in context learning, mistral 7b, llama 3 8b

Knowledge graph triples capture 2–3× more attention per token than semantically equivalent natural language text, even when the triples are complete noise. That's the central finding from a new formal analysis of retrieval-augmented generation (RAG) that isolates format from content.

The Structural Attention Tax: 0.70 vs 0.25 per Token

The authors, working with Mistral-7B and LLaMA-3-8B across three QA benchmarks, decompose attention scores into semantic and structural components. KG triples, with their relational delimiters and repeated slot patterns, draw roughly 0.70 attention per token against 0.25 for neutral natural-language text, a ratio of about 2.8×. This effect compresses the attention paid to in-context demonstrations by up to 42%, whether the triples are relevant or noise. That is the structural attention tax: format claims a share of the model's limited attention budget before content even gets a vote.
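To make the arithmetic concrete, here is a minimal toy sketch in Python. It is not the paper's decomposition: it simply assumes attention mass is split in proportion to (token count × per-token weight). The token counts, the weight assigned to demonstration tokens, and the helper name `demo_attention_share` are all hypothetical, chosen only for illustration.

```python
def demo_attention_share(n_demo, w_demo, n_ctx, w_ctx):
    """Fraction of total attention mass that lands on demonstration tokens.

    Toy proportional model (an assumption, not the paper's formula):
    mass = token count * per-token attention weight.
    """
    demo_mass = n_demo * w_demo
    ctx_mass = n_ctx * w_ctx
    return demo_mass / (demo_mass + ctx_mass)


# Per-token attention figures quoted in the article.
W_TEXT = 0.25  # neutral natural-language text
W_KG = 0.70    # KG triples

# Hypothetical prompt layout: 200 demonstration tokens, 50 retrieved tokens.
N_DEMO, N_CTX = 200, 50
W_DEMO = W_TEXT  # assumption: demonstrations are ordinary prose

share_text = demo_attention_share(N_DEMO, W_DEMO, N_CTX, W_TEXT)
share_kg = demo_attention_share(N_DEMO, W_DEMO, N_CTX, W_KG)

# Relative loss of demonstration attention when the same number of
# retrieved tokens is formatted as triples instead of prose.
compression = 1 - share_kg / share_text

print(f"per-token ratio:      {W_KG / W_TEXT:.1f}x")
print(f"demo share (text):    {share_text:.3f}")
print(f"demo share (triples): {share_kg:.3f}")
print(f"relative compression: {compression:.1%}")
```

Under these assumed values the demonstration share falls from 0.80 to roughly 0.59, a relative compression of about 26%. The toy is not meant to reproduce the paper's 42% figure; it only shows why a higher per-token weight on retrieved context shrinks the share left for demonstrations even when the content is unchanged.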

Task-Matched Retrieval Dominates—But Format Still Bites

Source-task alignment still dominates overall performance: BM25 retrieval over the task-matched corpus reaches 58–62% on HotpotQA, while ConceptNet, with the same model and gating strategy, drops to 25–27%. That gap of more than 30 percentage points dwarfs the effect of any gating strategy (≤2 pp). Within a fixed retrieval source, however, the structural tax persists. The paper derives a formal compression bound (Proposition 1) linking token-level format bias to demonstration attention loss, and shows that the structural term governs how much attention is diverted while the semantic term governs whether that diversion helps or hurts.

Five Mitigation Strategies, From Zero-Cost to Training-Time

The framework yields five structure-aware mitigations, ranging from zero-cost prompt modifications to training-time regularisation. Format flattening (S3), which rewrites triples as verbalized sentences, is validated by both accuracy and attention-level evidence. Structural dispersal (S1) produces mixed results, which illustrates how difficult format-level intervention can be. The key insight is that RAG optimisation has two orthogonal axes: semantic (what to retrieve) and structural (how to present it). If you're building RAG pipelines, treat the format of your retrieval chunks as a first-class design choice.
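As an illustration of what format flattening might look like in a pipeline, the following Python sketch verbalizes (subject, relation, object) triples into plain sentences before they are placed in the prompt. It is a sketch under stated assumptions, not the paper's S3 implementation: the `TEMPLATES` table, the fallback rule, and the function names `verbalize_triple` and `flatten` are hypothetical. The relation names follow ConceptNet's style.

```python
import re

# Hypothetical sentence templates keyed by ConceptNet-style relation names.
TEMPLATES = {
    "IsA": "{s} is a kind of {o}.",
    "UsedFor": "{s} is used for {o}.",
    "PartOf": "{s} is part of {o}.",
}


def _clean(term):
    """Turn an underscore-joined KG term into ordinary words."""
    return term.replace("_", " ").strip()


def verbalize_triple(subject, relation, obj):
    """Render one triple as a natural-language sentence."""
    s, o = _clean(subject), _clean(obj)
    template = TEMPLATES.get(relation)
    if template is None:
        # Fallback (assumption): split the CamelCase relation into words.
        words = re.sub(r"(?<!^)(?=[A-Z])", " ", relation).lower()
        return f"{s} {words} {o}."
    return template.format(s=s, o=o)


def flatten(triples):
    """Join verbalized triples into one prose passage for the prompt."""
    return " ".join(verbalize_triple(*t) for t in triples)


if __name__ == "__main__":
    retrieved = [
        ("hammer", "UsedFor", "driving_nails"),
        ("wheel", "PartOf", "car"),
    ]
    print(flatten(retrieved))
```

The design choice here is to remove the delimiters and repeated slot patterns that the article identifies as attention magnets while keeping the facts themselves. A production version would need better templates or a learned verbalizer, since the fallback produces stilted sentences for unseen relations.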


Source: The Structural Attention Tax: How Retrieval Format Hijacks In-Context Learning Independent of Content
Domain: arxiv.org
