1,820 Annotated Tables, 9 Languages: New Benchmark Exposes Extraction Failures

Nearly half of the tables in PulseBench-Tab contain merged cells, and most commercial extractors score below 60% on the new graph-based fidelity metric.

48.1% of tables in the new PulseBench-Tab benchmark have merged or spanning cells — a structural quirk that flattens most commercial extractors.

Existing table extraction benchmarks dodge hard cases. PulseBench-Tab doesn't. Its 1,820 human-annotated tables come from 380 real-world documents: financial filings, government reports, and regulatory disclosures. Nine languages across Latin, CJK, Arabic, and Cyrillic scripts. Table sizes range from 2 cells to 1,183. No cherry-picked clean layouts.

T-LAG Turns Tables Into Graphs

The authors propose T-LAG (Table Logical Adjacency Graph), an evaluation metric that models each table as a directed graph over cell adjacencies. Instead of pixel-level overlap or cell-by-cell F1, T-LAG computes structural and content fidelity in a single score via optimal bipartite matching. If an extractor misreads a column span or drops a row boundary, the graph mismatch shows up immediately. Traditional cell-level metrics score that as a single error; T-LAG sees the ripple effect.
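To make the idea concrete, here is a toy sketch of a graph-based table score. It is not the paper's T-LAG definition: the cell layout `(text, row, col, rowspan, colspan)`, the function names, the "right"/"down" edge labels, and the F1-style combination are all assumptions made for illustration. The brute-force search over permutations stands in for a real optimal bipartite matcher (such as the Hungarian algorithm) and is only practical for very small tables.

```python
from difflib import SequenceMatcher
from itertools import permutations

# Hypothetical cell layout: (text, row, col, rowspan, colspan).


def adjacency_edges(cells):
    """Return directed edges (src, dst, direction) between neighbouring cells."""
    owner = {}  # grid position -> index of the cell covering it
    for idx, (_, r, c, rs, cs) in enumerate(cells):
        for rr in range(r, r + rs):
            for cc in range(c, c + cs):
                owner[(rr, cc)] = idx
    edges = set()
    for (r, c), src in owner.items():
        for dr, dc, label in ((0, 1, "right"), (1, 0, "down")):
            dst = owner.get((r + dr, c + dc))
            if dst is not None and dst != src:
                edges.add((src, dst, label))
    return edges


def best_matching(ref, pred):
    """Optimal one-to-one cell matching by text similarity (brute force)."""
    n = max(len(ref), len(pred))

    def sim(i, j):
        # Padding slots (index past the end of either list) match nothing.
        if i >= len(ref) or j >= len(pred):
            return 0.0
        return SequenceMatcher(None, ref[i][0], pred[j][0]).ratio()

    table = [[sim(i, j) for j in range(n)] for i in range(n)]
    best = max(permutations(range(n)),
               key=lambda p: sum(table[i][p[i]] for i in range(n)))
    return {i: (best[i], table[i][best[i]])
            for i in range(len(ref)) if best[i] < len(pred)}


def toy_graph_score(ref, pred):
    """F1 over adjacency edges, each edge weighted by endpoint text similarity."""
    match = best_matching(ref, pred)
    ref_edges, pred_edges = adjacency_edges(ref), adjacency_edges(pred)
    if not ref_edges and not pred_edges:
        return 1.0
    credit = 0.0
    for a, b, direction in ref_edges:
        if a in match and b in match:
            (pa, sa), (pb, sb) = match[a], match[b]
            if (pa, pb, direction) in pred_edges:
                credit += sa * sb
    precision = credit / len(pred_edges) if pred_edges else 0.0
    recall = credit / len(ref_edges) if ref_edges else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Reference: a header spanning two columns above two year cells.
reference = [("Revenue", 0, 0, 1, 2), ("2023", 1, 0, 1, 1), ("2024", 1, 1, 1, 1)]
# Extraction with the same text, but the column span was missed.
missed_span = [("Revenue", 0, 0, 1, 1), ("2023", 1, 0, 1, 1), ("2024", 1, 1, 1, 1)]

print(round(toy_graph_score(reference, reference), 3))    # 1.0
print(round(toy_graph_score(reference, missed_span), 3))  # 0.8
```

In this example every cell's text is extracted correctly, so a purely content-based cell comparison would see nothing wrong. The missed span removes the "Revenue" to "2024" adjacency, and the edge-based score drops accordingly.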

Nine Systems, Uneven Results

PulseBench-Tab evaluates 9 commercial and open-source table extraction systems. The paper reports per-language breakdowns. Systems that perform well on clean English PDFs drop sharply on Arabic or CJK documents with merged cells. Nearly half the benchmark tables contain spans — the kind of layout that breaks naive row-column parsers.

The full dataset, scoring code, and all provider outputs are publicly available on GitHub. Teams can now pinpoint exactly which cell relationships their extractors get wrong, and that is where the next generation of table parsers is likely to improve.

Source: PulseBench-Tab: A Multilingual Benchmark for Table Extraction with Graph-Based Evaluation
Domain: arxiv.org
