The Tool-Overuse Illusion: Knowledge Epistemic Illusion and Outcome-Only Rewards

LLMs' reliance on external tools is not always justified, and this paper shows how to reduce tool usage by 82.8% without sacrificing accuracy.

llm-inference, knowledge-aware-epistemic-boundary-alignment, outcome-only-rewards, frontier, automated, arxiv_ai

The authors identify a critical yet under-explored phenomenon in LLMs: tool overuse, where a model calls an external tool even though it could answer from its own internal knowledge. They examine the underlying mechanisms through two lenses.

First, they analyze tool-use behavior across regions of differing internal knowledge availability and identify a knowledge epistemic illusion: models misjudge the boundaries of their internal knowledge and fail to perceive what they actually know. To mitigate this, they propose a knowledge-aware epistemic boundary alignment strategy based on direct preference optimization (DPO), which reduces tool usage by 82.8% while also improving accuracy.

Second, they establish a causal link between reward structure and tool-use behavior by visualizing the tool-augmented training process. The visualization shows that outcome-only rewards inadvertently encourage tool overuse, because they reward final correctness alone and ignore how efficiently tools were used. To verify this, the authors balance the reward signals during training instead of relying on outcome-only rewards, which cuts unnecessary tool calls by 66.7% (7B) and 60.7% (32B) without sacrificing accuracy.

Finally, they provide theoretical justification for why tool overuse arises.
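The two mitigations described above can be illustrated with a minimal Python sketch. This is not the paper's implementation: the `Rollout` type, the function names, the per-call `tool_cost`, the `max_penalty` cap, and the 0.8 knowledge threshold are all hypothetical choices made for illustration. The sketch shows (1) how DPO preference pairs might be built so that the tool-free answer is preferred whenever the model already knows the answer, and (2) how a balanced reward differs from an outcome-only reward when two rollouts are both correct but use different numbers of tool calls.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Rollout:
    correct: bool    # did the final answer match the reference?
    tool_calls: int  # number of external tool invocations in the trajectory


def is_known_internally(no_tool_correct: Sequence[bool], threshold: float = 0.8) -> bool:
    """Assumed proxy for knowledge availability: the share of tool-free
    samples that were correct meets a (hypothetical) threshold."""
    if not no_tool_correct:
        return False
    return sum(no_tool_correct) / len(no_tool_correct) >= threshold


def build_preference_pair(prompt: str, no_tool_response: str,
                          tool_response: str, known_internally: bool) -> Dict[str, str]:
    """Build one DPO-style (chosen, rejected) pair.

    Inside the model's knowledge boundary the tool-free answer is preferred;
    outside it, the tool-using answer is preferred.
    """
    if known_internally:
        chosen, rejected = no_tool_response, tool_response
    else:
        chosen, rejected = tool_response, no_tool_response
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


def outcome_only_reward(r: Rollout) -> float:
    """Reward depends on final correctness alone; tool calls cost nothing."""
    return 1.0 if r.correct else 0.0


def balanced_reward(r: Rollout, tool_cost: float = 0.1, max_penalty: float = 0.5) -> float:
    """Hypothetical balanced reward: correctness minus a capped per-call cost.

    The cap keeps any correct answer above any incorrect one, so the
    efficiency term only breaks ties between correct rollouts.
    """
    if not r.correct:
        return 0.0
    penalty = min(tool_cost * r.tool_calls, max_penalty)
    return 1.0 - penalty
```

Under the outcome-only reward, a correct rollout with zero tool calls and a correct rollout with three tool calls both score 1.0, so training has no signal that discourages the extra calls. Under the balanced variant in this sketch, the tool-free rollout scores higher, which is the kind of pressure the summary credits with reducing unnecessary calls.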


Source: The Tool-Overuse Illusion: Why Does LLM Prefer External Tools over Internal Knowledge?
