The authors of this paper reveal a critical yet under-explored phenomenon in LLMs: tool overuse. They experimentally elucidate its underlying mechanisms through two key lenses. First, they analyze tool-use behavior across regions of differing internal knowledge availability and identify a knowledge epistemic illusion: models misjudge their internal knowledge boundaries and fail to accurately perceive what they actually know. To mitigate this, they propose a knowledge-aware epistemic boundary alignment strategy based on direct preference optimization (DPO), which reduces tool usage by 82.8% while also improving accuracy. Second, they establish a causal link between reward structures and tool-use behavior by visualizing the tool-augmented training process. The visualization reveals that outcome-only rewards inadvertently encourage tool overuse: they reward final correctness alone, regardless of tool efficiency. To verify this, they balance reward signals during training instead of relying on outcome-only rewards, cutting unnecessary tool calls by 66.7% (7B model) and 60.7% (32B model) without sacrificing accuracy. Finally, they provide a theoretical account of tool overuse, offering clear practical takeaways for AI researchers.
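The reward-balancing idea can be illustrated with a minimal sketch. This is a hypothetical formulation, not the paper's exact reward design: an outcome-only reward pays out on final correctness alone, while a balanced reward subtracts a small per-tool-call penalty, so a rollout that reaches the correct answer with fewer tool calls scores higher. The function name and penalty weight below are illustrative assumptions.

```python
def balanced_reward(correct: bool, num_tool_calls: int,
                    outcome_weight: float = 1.0,
                    tool_penalty: float = 0.1) -> float:
    """Hypothetical balanced reward for tool-augmented RL training.

    An outcome-only reward would return just `outcome_weight * correct`,
    making the policy indifferent to how many tools it invoked. The
    penalty term (weight is an illustrative choice) makes unnecessary
    tool calls costly while still prioritizing final correctness.
    """
    outcome = outcome_weight if correct else 0.0
    return outcome - tool_penalty * num_tool_calls
```

Under this shaping, a correct answer with zero tool calls scores 1.0, while the same correct answer reached via three tool calls scores only 0.7, nudging the policy toward internal knowledge when tools are unnecessary.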
Source: The Tool-Overuse Illusion: Why Does LLM Prefer External Tools over Internal Knowledge?