RAG Boosts LLM Reading Recommendations by 26-35 Points Across Three Models

arxiv.org · @systems_wire · 2 days ago · Artificial Intelligence · 10 comments

A four-module architecture that combines retrieval-augmented generation with LLMs improved groundedness and relevance by up to 35 percentage points on Meta LLaMA 4 Scout, LLaMA 3.1, and Google Gemma2.

meta llama 4 scout · llama 3.1 · google gemma2 · retrieval augmented generation · llm as a judge

Adding retrieval-augmented generation to three modern LLMs boosted groundedness by 26-35 percentage points in a personalized reading content system. That is the headline result for a new architecture that pairs RAG with LLaMA 4 Scout, LLaMA 3.1 8B Instant, and Google Gemma2 9B.

Four-Module Pipeline with an Auto-Judge

The system splits into Input, RAG, Generation, and Judging modules. Users specify a question and a target complexity level. RAG pulls relevant information from the Internet to ground the output. Three prompting strategies (Chain-of-Thought, zero-shot, and few-shot) generate the reading material. An LLM-as-a-Judge module automatically scores answer quality and whether the text matches the desired readability.
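To make the four-module flow concrete, here is a minimal Python sketch. It is not the paper's implementation: the in-memory `CORPUS`, the word-overlap retriever (standing in for Internet retrieval), the prompt wording, and the `generator` and `judge` callables are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical in-memory corpus; the described system retrieves from the Internet.
CORPUS = [
    "Photosynthesis converts light energy into chemical energy in plants.",
    "The mitochondria produce most of a cell's energy supply.",
    "Plants absorb carbon dioxide and release oxygen during photosynthesis.",
]


@dataclass
class Request:
    """Input module: the user's question and target complexity level."""
    question: str
    level: str  # e.g. "beginner"; the label set is an assumption


def tokens(text: str) -> set[str]:
    """Lowercased word set used by the toy retriever."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def retrieve(question: str, k: int = 2) -> list[str]:
    """RAG module (toy): rank passages by word overlap with the question."""
    q = tokens(question)
    ranked = sorted(CORPUS, key=lambda p: len(q & tokens(p)), reverse=True)
    return ranked[:k]


def build_prompt(req: Request, passages: list[str], strategy: str) -> str:
    """Generation module, step 1: wrap the context in one of three strategies."""
    context = "\n".join(f"- {p}" for p in passages)
    base = (
        f"Context:\n{context}\n\n"
        f"Write reading material at a {req.level} level answering: {req.question}"
    )
    if strategy == "zero-shot":
        return base
    if strategy == "few-shot":
        # A single made-up demonstration; real few-shot prompts would use more.
        demo = "Q: What is rain?\nA: Rain is water that falls from clouds.\n\n"
        return demo + base
    if strategy == "chain-of-thought":
        return base + "\nThink step by step before writing the final text."
    raise ValueError(f"unknown strategy: {strategy}")


def run_pipeline(
    req: Request,
    strategy: str,
    generator: Callable[[str], str],  # any text-in, text-out model
    judge: Callable[[str], str],      # LLM-as-a-Judge, also text-in, text-out
) -> dict:
    """Chain Input -> RAG -> Generation -> Judging and return every artifact."""
    passages = retrieve(req.question)
    prompt = build_prompt(req, passages, strategy)
    text = generator(prompt)
    judge_prompt = (
        f"Rate this text for answer quality and for fit to a {req.level} "
        f"reading level.\nQuestion: {req.question}\nText: {text}"
    )
    return {"passages": passages, "prompt": prompt, "text": text,
            "verdict": judge(judge_prompt)}
```

Passing the generator and the judge in as plain callables keeps the sketch independent of any vendor SDK; in practice each would wrap a call to a hosted or local model.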

Consistent Gains Across Models and Prompts

Every model and every prompt strategy saw a lift when RAG was added. Relevance improved, but groundedness, the measure of factual anchoring, jumped by 26-35 percentage points. That's not a marginal gain; it's the difference between a model making up plausible-sounding text and one sticking to real sources. LLaMA 4 Scout and Gemma2 9B both benefited, though the paper doesn't break out which model gained most.
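A note on units: a percentage-point gain is an absolute difference between two rates, not a relative change. The short Python sketch below shows the distinction. The baseline and with-RAG scores are hypothetical numbers chosen for easy arithmetic, not figures from the paper.

```python
def percentage_point_gain(before: float, after: float) -> float:
    """Absolute difference between two rates expressed in percent."""
    return after - before


def relative_gain(before: float, after: float) -> float:
    """Relative change in percent, measured against the starting rate."""
    return (after - before) * 100 / before


# Hypothetical groundedness scores (percent); NOT taken from the paper.
baseline = 50.0
with_rag = 80.0

print(percentage_point_gain(baseline, with_rag))  # 30.0 percentage points
print(relative_gain(baseline, with_rag))          # 60.0 percent relative
```

With these made-up values, the same improvement reads as 30 points or 60 percent depending on the convention, which is why the reported 26-35 figure should be read as an absolute gain.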

What This Means for Content Personalization

Tailoring reading material to a user's query and complexity preference is a practical use case that educational platforms and recommendation engines can act on. The architecture is modular: swap in any LLM, any retrieval backend. The auto-judge removes the manual review bottleneck for scaling. I'd like to see a head-to-head comparison of RAG vs. fine-tuning for this task, but the 26-35 point gap makes a strong case for retrieval-first approaches.
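One way to picture the "swap in any LLM, any retrieval backend" claim is with structural interfaces. The Python sketch below is an assumption about how such modularity could look, not the paper's code: `Retriever`, `TextModel`, `StaticRetriever`, and `EchoModel` are hypothetical names, and the echo model only stands in for a real LLM client.

```python
from typing import Protocol


class Retriever(Protocol):
    """Anything that can return passages for a query."""
    def search(self, query: str) -> list[str]: ...


class TextModel(Protocol):
    """Anything that can turn a prompt into text."""
    def complete(self, prompt: str) -> str: ...


class StaticRetriever:
    """Hypothetical backend that always returns the same passages."""
    def __init__(self, passages: list[str]) -> None:
        self.passages = passages

    def search(self, query: str) -> list[str]:
        return self.passages


class EchoModel:
    """Hypothetical stand-in for an LLM client; tags output with its name."""
    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


def grounded_answer(question: str, retriever: Retriever, model: TextModel) -> str:
    """Depends only on the two interfaces, so either side can be replaced."""
    context = " ".join(retriever.search(question))
    return model.complete(f"Context: {context} Question: {question}")


retriever = StaticRetriever(["Plants make food from light."])
for name in ("model-a", "model-b"):  # placeholder model names
    print(grounded_answer("How do plants eat?", retriever, EchoModel(name)))
```

Because `grounded_answer` only requires `search` and `complete` methods, a different retrieval backend or model wrapper can be dropped in without touching the calling code.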

Source: Combining Retrieval-Augmented Text Generation with LLMs for Reading Content Recommendations
Domain: arxiv.org

