The authors formalize evaluation as policy-grounded correctness, introducing the Defensibility Index (DI) and Ambiguity Index (AI) to measure reasoning stability without additional audit passes. The Probabilistic Defensibility Signal (PDS) is derived from audit-model token logprobs and is used to verify whether a proposed decision is logically derivable from the governing rule hierarchy. The authors validate the framework on more than 193,000 Reddit moderation decisions across multiple communities and evaluation cohorts, finding a significant gap between agreement-based and policy-grounded metrics. They further show that measured ambiguity is driven by rule specificity, and that a Governance Gate built on these signals achieves 78.6% automation coverage with 64.9% risk reduction. This preprint is relevant to Principal Engineers, CISOs, ML Researchers, and Technical Founders because it provides a novel evaluation framework for rule-governed AI systems.
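The summary states that the PDS is derived from audit-model token logprobs and that a Governance Gate routes decisions based on such signals. The paper's exact formulas are not given here, so the sketch below is only illustrative: it aggregates token logprobs into a length-normalized sequence probability (a common confidence heuristic, assumed rather than taken from the paper) and applies a hypothetical threshold to decide between automation and human review.

```python
import math

def pds_from_logprobs(token_logprobs):
    """Illustrative defensibility signal: length-normalized sequence
    probability computed from audit-model token logprobs.
    (Assumed aggregation; the paper's actual PDS definition may differ.)"""
    if not token_logprobs:
        raise ValueError("need at least one token logprob")
    avg_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_logprob)  # maps to (0, 1]

def governance_gate(pds, threshold=0.9):
    """Hypothetical gate: automate only when the signal clears the
    threshold; otherwise route the decision to human review."""
    return "automate" if pds >= threshold else "human_review"

signal = pds_from_logprobs([-0.05, -0.02, -0.10])
print(round(signal, 3), governance_gate(signal))
```

The threshold value and the length normalization are design choices introduced for this sketch; in practice they would be calibrated against the defensibility and risk-reduction targets the paper reports.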
Source: Escaping the Agreement Trap: Defensibility Signals for Evaluating Rule-Governed AI