Sentence-level media bias detectors treat each sentence as an island, missing the contextual cues human annotators rely on. HierBias closes that gap with a hierarchical model that formally conditions bias predictions on document context, and the authors back it with a proof of why context helps.
Evaluated on the BABE and BASIL benchmarks, HierBias hits 0.853 F1 and 0.723 MCC (Matthews correlation coefficient). That beats the previous state-of-the-art bias detector by 2.6% F1 and 4.3% MCC, with McNemar's test at p < 0.05. This is not a hand-wavy improvement: it is a statistically significant gain from modeling context.
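For readers less familiar with the two statistics, here is a minimal sketch of how MCC and a McNemar p-value can be computed. It assumes the exact (binomial) form of McNemar's test, which may not be the variant the authors used. The function names and all counts are made up for illustration and are not results from the paper.

```python
from math import comb, sqrt


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from a binary confusion matrix."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def mcnemar_exact_p(only_a_correct, only_b_correct):
    """Two-sided exact McNemar p-value from the two discordant counts.

    only_a_correct: sentences model A gets right and model B gets wrong.
    only_b_correct: sentences model B gets right and model A gets wrong.
    """
    n = only_a_correct + only_b_correct
    if n == 0:
        return 1.0
    k = min(only_a_correct, only_b_correct)
    # Binomial tail under the null that either kind of disagreement is equally likely.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts for illustration only.
print(round(mcc(tp=50, tn=40, fp=5, fn=5), 3))   # 0.798
print(round(mcnemar_exact_p(15, 5), 4))          # 0.0414
```

McNemar's test only looks at the sentences where the two models disagree, which is why it suits a paired comparison on the same test set.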
Context Reduces Bayes Error, Not Just Empirically
The authors introduce a formal notion they call context-conditioned bias probability. They prove that when inter-sentence mutual information is non-zero, leveraging document context strictly reduces the Bayes error of sentence-level classification. This is not an empirical observation dressed up as theory - it's a theorem about the fundamental limits of the task.
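A toy calculation makes the direction of the claim concrete. This is not the paper's proof. It assumes a made-up joint distribution over a sentence feature x, a document-context feature c, and a bias label y, and compares the error of the best possible classifier with and without access to c.

```python
from collections import defaultdict

# Hypothetical joint counts (out of 100) over (x, c, y); not data from the paper.
JOINT = {
    (0, 0, 0): 30, (0, 0, 1): 5,
    (0, 1, 0): 5,  (0, 1, 1): 20,
    (1, 0, 0): 5,  (1, 0, 1): 10,
    (1, 1, 0): 5,  (1, 1, 1): 20,
}


def bayes_error(joint, use_context):
    """Error of the optimal classifier that sees x alone or the pair (x, c)."""
    mass = defaultdict(lambda: defaultdict(int))
    for (x, c, y), n in joint.items():
        key = (x, c) if use_context else x
        mass[key][y] += n
    total = sum(joint.values())
    # The optimal rule predicts the most probable label for each observed input.
    correct = sum(max(by_label.values()) for by_label in mass.values())
    return (total - correct) / total


err_sentence_only = bayes_error(JOINT, use_context=False)
err_with_context = bayes_error(JOINT, use_context=True)
print(err_sentence_only, err_with_context)  # 0.35 0.2
```

In this toy case the context flips the best prediction for x = 0 when c = 1, so no sentence-only classifier can match the context-aware one.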
A multi-task generalization bound extends the argument: jointly training binary bias detection with fine-grained bias type classification (four classes) improves sample efficiency on small annotated corpora. That matters because media bias datasets are expensive to annotate at scale.
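One common way to train two such tasks jointly is a weighted sum of their cross-entropy losses. The sketch below assumes that form. The summary does not give the paper's actual objective, and `type_weight` is a hypothetical hyperparameter.

```python
import math


def cross_entropy(logits, target):
    """Negative log-softmax probability of the target class."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


def joint_loss(bias_logits, type_logits, bias_label, type_label, type_weight=0.5):
    """Binary bias loss plus a weighted four-class bias-type loss (assumed form)."""
    return (cross_entropy(bias_logits, bias_label)
            + type_weight * cross_entropy(type_logits, type_label))


# One sentence: 2 logits for biased vs. not biased, 4 logits for bias type.
loss = joint_loss([0.2, 1.5], [0.1, 2.0, -0.3, 0.4], bias_label=1, type_label=1)
```

Both terms push gradients through the same shared representation, which is the usual intuition for why the auxiliary type labels can help when annotated data is scarce.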
RoBERTa Encoder Plus Cross-Sentence Aggregator
The architecture is straightforward. A sentence-level RoBERTa encoder produces one representation per sentence. Those representations pass through a cross-sentence Transformer aggregator that models inter-sentence dependencies. Dual output heads handle the two tasks simultaneously: binary detection and four-class type classification (e.g., lexical vs. structural bias).
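The data flow can be sketched in PyTorch. This is a shape-level illustration under stated assumptions, not the authors' implementation: the class name, layer counts, and sizes are hypothetical, and random vectors stand in for the per-sentence RoBERTa outputs so the example runs without downloading a model. The hidden size of 64 is a toy value (RoBERTa-base produces 768-dimensional vectors).

```python
import torch
import torch.nn as nn


class HierarchicalBiasSketch(nn.Module):
    """Sentence vectors -> cross-sentence Transformer -> two output heads."""

    def __init__(self, hidden=64, heads=4, layers=2, num_types=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=heads, dim_feedforward=4 * hidden, batch_first=True
        )
        # Each sentence vector attends to every other sentence in the document.
        self.aggregator = nn.TransformerEncoder(layer, num_layers=layers)
        self.binary_head = nn.Linear(hidden, 2)         # biased vs. not biased
        self.type_head = nn.Linear(hidden, num_types)   # four bias types

    def forward(self, sent_vecs):
        # sent_vecs: (batch, n_sentences, hidden), one vector per sentence.
        ctx = self.aggregator(sent_vecs)
        return self.binary_head(ctx), self.type_head(ctx)


model = HierarchicalBiasSketch()
docs = torch.randn(2, 5, 64)  # 2 documents, 5 sentences each, stand-in encoder output
bias_logits, type_logits = model(docs)
print(bias_logits.shape, type_logits.shape)
```

Every sentence gets its own pair of predictions, but each prediction is computed from a representation that has already seen the rest of the document.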
Ablation experiments confirm each theoretical component contributes independently. Drop the cross-sentence aggregator and performance falls. Remove the multi-task head and sample efficiency drops. The system earns its complexity.
The next step is obvious: apply this hierarchical context conditioning to other document-level NLP tasks where sentence independence is a known weakness - stance detection, fact-checking, propaganda identification. The theoretical framework generalizes beyond media bias.
Source: HierBias: Context-Conditioned Hierarchical Media Bias Detection with Multi-Task Type Classification
Domain: arxiv.org