Branch Recall@5 jumped from 0.73 to 0.93 across 37 held-out trace families without running a single replay. That's the concrete result from a new paper that frames multi-agent LLM trace debugging as a knowledge-based prediction problem instead of a brute-force search.
The Replay Tax Is Killing Your Debugging Budget
Multi-agent LLM systems generate execution traces that are long, tangled messes of messages, routing decisions, memory writes, and tool calls. The few causally decisive events are buried in unstructured logs. The standard fix is counterfactual replay: wind back time, tweak an event, and re-run to measure its effect. That works, but the cost grows linearly with the number of candidate events. At scale, exhaustive replay is a non-starter.
The authors treat the problem as decision support: compile each trace into a structured event knowledge graph covering routing, memory, tool-use, uncertainty, and latent evidence, then use a calibrated predictor to decide where to spend a scarce replay budget. The contribution is not a new replay oracle; it is predicting the oracle's results without paying the replay cost.
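As a rough illustration of that pipeline, the sketch below compiles a toy trace into an event graph and spends a fixed replay budget on the highest-scoring events. The `Event` fields, the `caused_by` links, and the `score` values are all hypothetical; the paper's actual graph schema and predictor are not reproduced here.

```python
from dataclasses import dataclass, field

# Hypothetical event schema, invented for illustration only.
@dataclass
class Event:
    event_id: str
    kind: str                                       # e.g. "routing", "memory", "tool"
    caused_by: list = field(default_factory=list)   # ids of upstream events
    score: float = 0.0                              # assumed predictor output in [0, 1]

def build_event_graph(events):
    """Return an adjacency map: upstream event id -> list of downstream event ids."""
    graph = {e.event_id: [] for e in events}
    for e in events:
        for parent in e.caused_by:
            graph[parent].append(e.event_id)
    return graph

def select_for_replay(events, budget):
    """Pick the `budget` highest-scoring events as replay candidates."""
    ranked = sorted(events, key=lambda e: e.score, reverse=True)
    return [e.event_id for e in ranked[:budget]]

# Toy trace with made-up scores, not data from the paper.
trace = [
    Event("e1", "routing", [], 0.82),
    Event("e2", "memory", ["e1"], 0.10),
    Event("e3", "tool", ["e1"], 0.64),
    Event("e4", "routing", ["e2", "e3"], 0.31),
]
graph = build_event_graph(trace)
chosen = select_for_replay(trace, budget=2)
print(graph["e1"])  # events directly downstream of e1
print(chosen)       # events that would receive the replay budget
```

With a budget of 2, only two of the four events would ever be replayed, which is the point of scoring before replaying: replay cost is set by the budget rather than by the number of candidate events.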
Zero-Replay Prediction via BranchPoint-Latent
BranchPoint-Latent is a lightweight predictor that operates over observable, structural, uncertainty, and latent features extracted from the knowledge graph. A single learning-to-rank gradient-boosted model, calibrated against a deterministic replay oracle on 37 trace families, lifts per-trace localization, measured as Branch Recall@5, from 0.73 to 0.93 at zero oracle-replay cost.
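To make the metric concrete, the sketch below assumes Branch Recall@k is the fraction of traces whose true branch point appears among the top k ranked events. That definition is an assumption made for this example, and the paper's exact definition may differ. The function name and the toy data are illustrative only.

```python
def branch_recall_at_k(rankings, true_branch_points, k=5):
    """Fraction of traces whose true branch point is in the top-k ranked events.

    rankings: dict of trace id -> list of event ids, best candidate first
    true_branch_points: dict of trace id -> event id confirmed by the oracle
    """
    if not rankings:
        raise ValueError("no traces to evaluate")
    hits = sum(
        1
        for trace_id, ranked in rankings.items()
        if true_branch_points[trace_id] in ranked[:k]
    )
    return hits / len(rankings)

# Toy rankings, not results from the paper.
rankings = {
    "t1": ["e3", "e1", "e7", "e2", "e9", "e4"],
    "t2": ["e5", "e8", "e2", "e6", "e1", "e3"],
    "t3": ["e1", "e2", "e3", "e4", "e5", "e6"],
    "t4": ["e9", "e8", "e7", "e6", "e5", "e4"],
}
truth = {"t1": "e7", "t2": "e3", "t3": "e1", "t4": "e5"}
recall = branch_recall_at_k(rankings, truth, k=5)
print(recall)  # 0.75: the branch point for t2 sits at rank 6, outside the top 5
```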
The paper explicitly characterizes when cheap graph centrality suffices and when learned evidence is necessary. That is the honest engineering contribution: not claiming universal dominance, but mapping the cost-accuracy frontier so you know which tool to apply.
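For a sense of what the cheap end of that frontier looks like, the sketch below ranks events by out-degree, one simple form of graph centrality. Out-degree is chosen here purely for illustration; which centrality measures the paper evaluates is not stated in this summary, and the graph is a toy example.

```python
def rank_by_out_degree(graph):
    """Rank event ids by number of direct downstream events, highest first.

    graph: dict of event id -> list of downstream event ids
    Ties are broken by event id so the ordering is deterministic.
    """
    return sorted(graph, key=lambda node: (-len(graph[node]), node))

# Toy event graph, invented for illustration.
toy_graph = {
    "e1": ["e2", "e3"],
    "e2": ["e4"],
    "e3": ["e4"],
    "e4": [],
}
ranking = rank_by_out_degree(toy_graph)
print(ranking)  # ['e1', 'e2', 'e3', 'e4']
```

A baseline like this needs no training data and no oracle calls, so it is a natural first thing to try before reaching for a learned ranker.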
What This Enables Next
This is an auditable, cost-efficient decision-support system for AI-reliability debugging. The reproducible artifacts mean teams can trial zero-replay prediction in their CI pipelines and reserve replay compute for the events the predictor flags. The size of the savings will depend on trace length and per-replay cost.
Source: Knowledge-Based Zero-Replay Debugging of Multi-Agent LLM Traces
Domain: arxiv.org