Only Strong Teachers Beat Repeated Attempts: Feedback Study on 13 Models

Repeated attempts alone can explain most of the accuracy gains in multi-turn language agent interactions; self-generated feedback adds almost nothing beyond simple retrying.

The Controlled Student-Teacher Protocol That Reveals the Truth

J. Lojek and coauthors designed a protocol to separate feedback-driven improvement from gains that come from resampling, format correction, or extra test-time compute. They ran it across four benchmarks — Omni-MATH, Codeforces, BBEH Linguini, and ARC-AGI1 — using thirteen open-weight models as both students and teachers. The setup compares external feedback, self-feedback, and unguided self-refinement while varying interaction history, task difficulty, and whether the teacher has privileged information.
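The core comparison in a protocol like this can be sketched in a few lines. The snippet below is only an illustration, not the authors' framework: the condition names, the `ConditionResult` class, and the outcome lists are hypothetical and the numbers are made up. It shows why a retry-only baseline matters: a gain measured against the single-turn answer can vanish once it is measured against unguided retrying.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConditionResult:
    """Final-answer outcomes for one condition (hypothetical structure)."""
    name: str
    correct: list[bool]  # one entry per problem, True if the final answer was right

    @property
    def accuracy(self) -> float:
        return sum(self.correct) / len(self.correct)


def feedback_specific_gain(feedback: ConditionResult, retry: ConditionResult) -> float:
    """Accuracy with feedback minus accuracy from retrying with no guidance."""
    return feedback.accuracy - retry.accuracy


# Made-up outcomes on 10 problems, for illustration only.
single_turn = ConditionResult("single_turn", [True] * 4 + [False] * 6)
retry_only = ConditionResult("retry_only", [True] * 6 + [False] * 4)
self_feedback = ConditionResult("self_feedback", [True] * 6 + [False] * 4)
external_feedback = ConditionResult("external_feedback", [True] * 8 + [False] * 2)

# Naive view: compare against the first attempt and credit the feedback.
naive_gain = self_feedback.accuracy - single_turn.accuracy

# Controlled view: compare against retrying without any guidance.
self_gain = feedback_specific_gain(self_feedback, retry_only)
external_gain = feedback_specific_gain(external_feedback, retry_only)

print(f"naive self-feedback gain:      {naive_gain:+.2f}")
print(f"controlled self-feedback gain: {self_gain:+.2f}")
print(f"controlled external gain:      {external_gain:+.2f}")
```

In this toy data the self-feedback condition looks useful against the single-turn baseline but adds nothing over retrying, while the external teacher still shows a gain after the control.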

Self-Feedback Is a Hollow Signal

Across all settings, self-generated feedback added little beyond just letting the model retry the same question without any guidance. The paper is blunt: multi-turn improvement is often not evidence of feedback use. If a model talks to itself and gets a better answer, it’s usually just benefiting from another pass at the problem, not from any new information. Only the strongest external teachers — models that can actually provide guidance beyond a generic "try again" — produced substantial feedback-specific accuracy gains.

The Real Bottleneck: Acting on Feedback, Not Having It

Dense student-teacher interaction matrices point to a second finding: interactive gains are driven more by the student's ability to use feedback than by which teacher is talking. Teacher identity still matters for a fixed student, but the bigger lever is whether the student can actually change its behavior based on what it hears. The authors argue this flips the usual assumption: what matters is not who gives the advice, but whether the agent can take it. They released the full evaluation framework at https://j-lojek.github.io/feedback-generation-is-a-bottleneck/ so that anyone can run the same controlled tests on their own agents.
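One simple way to read a student-teacher matrix is to compare how much the average gain varies across students with how much it varies across teachers. The sketch below assumes a fully populated matrix of feedback-specific gains; the function name `effect_spread`, the model labels, and the values are all hypothetical, and this is not necessarily the analysis used in the paper.

```python
from statistics import mean, pvariance


def effect_spread(gains: dict[str, dict[str, float]]) -> tuple[float, float]:
    """Return (variance of per-student mean gains, variance of per-teacher mean gains).

    `gains[student][teacher]` is the feedback-specific gain for that pairing.
    Assumes every student has an entry for every teacher (a dense matrix).
    """
    students = list(gains)
    teachers = list(gains[students[0]])
    student_means = [mean([gains[s][t] for t in teachers]) for s in students]
    teacher_means = [mean([gains[s][t] for s in students]) for t in teachers]
    return pvariance(student_means), pvariance(teacher_means)


# Made-up gains, chosen so that rows (students) differ more than columns (teachers).
gains = {
    "student_a": {"teacher_x": 0.02, "teacher_y": 0.04},
    "student_b": {"teacher_x": 0.10, "teacher_y": 0.12},
    "student_c": {"teacher_x": 0.18, "teacher_y": 0.20},
}

student_var, teacher_var = effect_spread(gains)
print(f"spread across students: {student_var:.5f}")
print(f"spread across teachers: {teacher_var:.5f}")
```

If the spread across students is much larger than the spread across teachers, as in this toy matrix, the gains depend more on who receives the feedback than on who gives it. A real analysis would also need to account for noise and interaction effects between specific pairings.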

Source: What Drives Interactive Improvement from Feedback?
Domain: arxiv.org
