
EvidenceLens audits financial LLMs with a claim-evidence matrix

A new system decomposes LLM answers into atomic claims and aligns them with source tables, charts, and text, making unsupported synthesis immediately visible.

evidencelens, financial qa, llm auditing, claim evidence matrix, visual analytics, ai safety

You've seen it before: an LLM answers a question about an annual report with perfect fluency, but checking whether a revenue figure came from the actual table or was just a plausible guess takes twenty minutes of cross-referencing. EvidenceLens, a new visual analytics research prototype, treats this as a claim-evidence alignment problem and makes the audit trail visible in a single matrix.

The Auditability Problem in Financial QA

Financial workflows run on documents like earnings decks, analyst notes, and 10-Ks. LLMs are increasingly asked to answer questions over these multimodal sources, but their outputs blend directly grounded statements, weak synthesis, and complete fabrications. A persuasive answer hides the distinction. EvidenceLens decomposes the LLM's response into atomic claims, then scores each claim by how much support it has across text, tables, and charts.

The system's core representation is a multimodal claim-evidence matrix. Rows are individual claims from the answer; columns represent evidence sources (passages, table cells, chart regions). Color and opacity reveal coverage, contradiction, and modality imbalance at a glance. If a claim about gross margin has support only from a chart region but not from the corresponding table cell, the matrix makes that gap obvious.
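To make the matrix idea concrete, here is a minimal Python sketch of a claim-evidence matrix with a modality-gap check. It is an illustration under stated assumptions, not the authors' implementation: the class names, the relation labels, the strength values, and the sample claim are all hypothetical and do not come from the paper.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Hypothetical relation labels; the paper's actual schema may differ.
SUPPORT, CONTRADICT = "support", "contradict"


@dataclass(frozen=True)
class Evidence:
    evidence_id: str
    modality: str  # "text", "table", or "chart"


@dataclass
class ClaimEvidenceMatrix:
    claims: list[str]          # rows: atomic claims from the answer
    evidence: list[Evidence]   # columns: passages, table cells, chart regions
    # Sparse cells: (claim index, evidence index) -> (relation, strength in [0, 1])
    cells: dict[tuple[int, int], tuple[str, float]] = field(default_factory=dict)

    def set_cell(self, claim_idx: int, ev_idx: int, relation: str, strength: float) -> None:
        self.cells[(claim_idx, ev_idx)] = (relation, strength)

    def supporting_modalities(self, claim_idx: int) -> set[str]:
        """Modalities with at least one supporting cell for this claim."""
        return {
            self.evidence[j].modality
            for (i, j), (relation, _) in self.cells.items()
            if i == claim_idx and relation == SUPPORT
        }

    def modality_gaps(self, claim_idx: int, expected=("text", "table", "chart")) -> list[str]:
        """Expected modalities that offer no support for this claim."""
        have = self.supporting_modalities(claim_idx)
        return [m for m in expected if m not in have]


# Made-up example data: one claim, supported only by a chart region.
matrix = ClaimEvidenceMatrix(
    claims=["Gross margin improved year over year."],
    evidence=[
        Evidence("passage-1", "text"),
        Evidence("table-2:cell-B4", "table"),
        Evidence("chart-1:region-3", "chart"),
    ],
)
matrix.set_cell(0, 2, SUPPORT, 0.8)
gaps = matrix.modality_gaps(0)
print(gaps)
```

In this sketch, a claim backed by a chart region but not by the corresponding table cell shows up as a non-empty gap list, which is the kind of imbalance the matrix view is meant to surface visually.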

How EvidenceLens Visualizes Claim-Evidence Alignment

EvidenceLens doesn't just show a matrix and walk away. It includes a deterministic review-priority ranking that maps backend signals into an auditable visual structure. The prototype also ships a JSON-based artifact schema and a lightweight multimodal alignment pipeline, so the approach can be replicated and extended.
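A deterministic review-priority ranking of this kind could look like the Python sketch below. The scoring rule, the weights, the field names (`max_support`, `contradicted`), and the JSON layout are all assumptions chosen for illustration; the article does not describe the paper's actual ranking function or artifact schema.

```python
import json


def review_priority(claim: dict) -> float:
    """Higher score means review sooner. Weights are illustrative assumptions."""
    score = 1.0 - claim["max_support"]   # weakly supported claims rank higher
    if claim["contradicted"]:
        score += 2.0                     # contradicted claims go to the top
    return score


def rank_claims(claims: list[dict]) -> list[dict]:
    # Sort by descending priority; break ties on claim id so the
    # same input always yields the same order.
    return sorted(claims, key=lambda c: (-review_priority(c), c["id"]))


# Made-up backend signals for three claims.
claims = [
    {"id": "c1", "max_support": 0.9, "contradicted": False},
    {"id": "c2", "max_support": 0.2, "contradicted": False},
    {"id": "c3", "max_support": 0.6, "contradicted": True},
]

ranked = rank_claims(claims)

# Serialize with sorted keys so the artifact is byte-for-byte reproducible.
artifact = json.dumps({"claims": ranked}, sort_keys=True, indent=2)
print(artifact)
```

The point of the explicit tie-break and the sorted keys is reproducibility: two auditors running the same inputs should see the same review order and the same artifact, with no dependence on dictionary ordering or sort instability.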

In representative auditing scenarios, the authors demonstrate that analysts can distinguish grounded claims from overconfident synthesis far faster than with conventional chat interfaces. Those interfaces flatten everything into linear text; EvidenceLens preserves the provenance of each atomic piece of the answer.

What This Means for High-Stakes LLM Deployments

The financial industry doesn't tolerate black boxes. When an LLM's answer drives a trading decision or a regulatory filing, being able to prove where each claim came from isn't optional. EvidenceLens shows that auditability isn't a separate post-processing step but a design principle that can be built into the question-answering pipeline itself.

EvidenceLens points toward a future where every LLM answer in regulated workflows comes with a built-in audit trail.


Source: EvidenceLens: A Claim-Evidence Matrix for Auditing Financial Question Answering
Domain: arxiv.org
