BranchPoint-Latent Predicts Debugging Effects Without Replaying Multi-Agent LLM Traces

A lightweight predictor raises per-trace localization from 0.73 to 0.93 on 37 trace families by predicting counterfactual effects before any replay runs.

Tags: branchpoint latent, multi agent systems, llm debugging, knowledge graph, ai reliability, arxiv

Branch Recall@5 jumped from 0.73 to 0.93 across 37 held-out trace families without running a single replay. That's the concrete result from a new paper that frames multi-agent LLM trace debugging as a knowledge-based prediction problem instead of a brute-force search.

The Replay Tax Is Killing Your Debugging Budget

Multi-agent LLM systems generate execution traces that are long, tangled messes of messages, routing decisions, memory writes, and tool calls. The few causally decisive events are buried in unstructured logs. The standard fix is counterfactual replay: wind back time, tweak an event, and re-run to measure its effect. That works, but the cost grows linearly with the number of candidate events. At scale, exhaustive replay is a non-starter.
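To make the linear growth concrete, here is a back-of-the-envelope sketch. The function names, the per-replay cost, and the budget are hypothetical placeholders for illustration, not figures from the paper.

```python
def exhaustive_replay_cost(num_candidates: int, cost_per_replay: float) -> float:
    """One replay per candidate event, so total cost is linear in the candidate count."""
    return num_candidates * cost_per_replay


def budgeted_replay_cost(num_candidates: int, cost_per_replay: float, budget: int) -> float:
    """Replay only the top `budget` candidates picked by some ranking."""
    return min(num_candidates, budget) * cost_per_replay


# Hypothetical trace: 200 candidate events, 30 seconds per replay.
full = exhaustive_replay_cost(200, 30.0)      # every candidate is replayed
capped = budgeted_replay_cost(200, 30.0, 5)   # only five ranked candidates are replayed
```

The gap between the two numbers is the budget a good ranking can save, provided the ranking actually puts the decisive events near the top.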

The authors treat the problem as decision-support: compile each trace into a structured event knowledge graph covering routing, memory, tool-use, uncertainty, and latent evidence, then use a calibrated predictor to decide where to spend a scarce replay budget. No new replay oracle here; they predict the oracle's results without paying the replay cost.
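A minimal sketch of what such an event graph could look like in code. The schema here (`Event`, `TraceGraph`, the `kind` labels, the `uncertainty` field) is an assumption made for illustration; the paper's actual graph format may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class Event:
    event_id: str
    kind: str                 # hypothetical labels: "message", "routing", "memory_write", "tool_call"
    agent: str
    uncertainty: float = 0.0  # hypothetical per-event uncertainty score


@dataclass
class TraceGraph:
    events: Dict[str, Event] = field(default_factory=dict)
    edges: Dict[str, List[str]] = field(default_factory=dict)  # event_id -> downstream event ids

    def add_event(self, event: Event) -> None:
        self.events[event.event_id] = event
        self.edges.setdefault(event.event_id, [])

    def add_edge(self, src: str, dst: str) -> None:
        if src not in self.events or dst not in self.events:
            raise KeyError("both endpoints must be added before linking them")
        self.edges[src].append(dst)

    def downstream(self, event_id: str) -> Set[str]:
        """All events reachable from event_id, i.e. what a change there could touch."""
        seen: Set[str] = set()
        stack = list(self.edges[event_id])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self.edges[node])
        return seen


def build_example() -> TraceGraph:
    """Tiny made-up trace: a routing decision feeds a tool call, a memory write, and a message."""
    graph = TraceGraph()
    for event in [
        Event("e1", "routing", "planner", 0.4),
        Event("e2", "tool_call", "worker"),
        Event("e3", "memory_write", "worker", 0.1),
        Event("e4", "message", "critic"),
    ]:
        graph.add_event(event)
    graph.add_edge("e1", "e2")
    graph.add_edge("e2", "e3")
    graph.add_edge("e1", "e4")
    return graph
```

Once the trace is in this shape, structural features such as downstream reach or degree are cheap to compute for every candidate event, which is what makes prediction without replay plausible.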

Zero-Replay Prediction via BranchPoint-Latent

BranchPoint-Latent is a lightweight predictor that operates over observable, structural, uncertainty, and latent features extracted from the knowledge graph. A single learning-to-rank gradient-boosted model, calibrated against a deterministic replay oracle on 37 trace families, lifts per-trace localization (Branch Recall@5) from 0.73 to 0.93 at zero oracle-replay cost.
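For readers unfamiliar with recall-at-k metrics, the sketch below shows one common way to score a ranking against oracle labels. It assumes recall means the fraction of oracle-decisive events that land in the top k predictions; the paper's exact definition of Branch Recall@5 may differ, and the scores and event ids are made up.

```python
from typing import Dict, List, Set


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Event ids sorted by predicted effect (highest first), ties broken by id."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [event_id for event_id, _ in ranked[:k]]


def recall_at_k(scores: Dict[str, float], decisive: Set[str], k: int = 5) -> float:
    """Fraction of oracle-decisive events that appear in the top-k predictions."""
    if not decisive:
        raise ValueError("need at least one decisive event to compute recall")
    hits = sum(1 for event_id in top_k(scores, k) if event_id in decisive)
    return hits / len(decisive)


# Hypothetical predicted effects for seven events in one trace.
predicted = {"e1": 0.9, "e2": 0.1, "e3": 0.8, "e4": 0.3, "e5": 0.2, "e6": 0.7, "e7": 0.05}
# Hypothetical oracle labels: the events a replay would show to be decisive.
oracle_decisive = {"e1", "e7"}

score = recall_at_k(predicted, oracle_decisive, k=5)
```

In this toy trace the predictor surfaces `e1` but misses `e7`, so only half of the decisive events make the top five. In a real pipeline the scores would come from the trained ranking model rather than a hand-written dictionary.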

The paper explicitly characterizes when cheap graph centrality suffices and when learned evidence is necessary. That is the honest engineering contribution: not claiming universal dominance, but mapping the cost-accuracy frontier so you know which tool to apply.
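As a rough illustration of the cheap end of that frontier, here is a degree-centrality baseline in plain Python. It is a generic sketch over a hypothetical adjacency mapping, not the paper's baseline, and it ignores event content entirely.

```python
from typing import Dict, List


def degree_centrality(edges: Dict[str, List[str]]) -> Dict[str, float]:
    """In-degree plus out-degree of each node, normalized by (number of nodes - 1)."""
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)
    degree = {node: 0 for node in nodes}
    for src, targets in edges.items():
        for dst in targets:
            degree[src] += 1
            degree[dst] += 1
    denom = max(len(nodes) - 1, 1)  # avoid dividing by zero on a single-node graph
    return {node: count / denom for node, count in degree.items()}


def rank_by_centrality(edges: Dict[str, List[str]]) -> List[str]:
    """Event ids from most to least connected, ties broken by id."""
    centrality = degree_centrality(edges)
    return sorted(centrality, key=lambda node: (-centrality[node], node))


# Hypothetical trace graph: event id -> downstream event ids.
example_edges = {"e1": ["e2", "e4"], "e2": ["e3"], "e3": [], "e4": []}
ranking = rank_by_centrality(example_edges)
```

A baseline like this costs almost nothing to compute. It can only see how connected an event is, so a decisive event that is structurally unremarkable will rank low, which is the kind of case where learned evidence would be needed.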

What This Enables Next

The result is an auditable, cost-efficient decision-support system for AI-reliability debugging. With reproducible artifacts available, teams can try zero-replay prediction in their CI pipelines to decide which events deserve a replay at all. How much replay compute and debugging time that saves will depend on trace length and on what each replay costs.


Source: Knowledge-Based Zero-Replay Debugging of Multi-Agent LLM Traces
Domain: arxiv.org
