Source linked

General Parsers Are Only 3x Slower Than LR(1) - Rust Benchmark

A head-to-head Rust benchmark of 6 generalized parsers across 22 grammars shows the GLR family costs just a 3x median slowdown over deterministic LR(1), busting the myth that generality must be expensive.

rustgllrnglrbrnglrparsingcompilers

GLR parsers are only 3x slower than LR(1) on deterministic grammars. That's the headline result from the first unified, controlled benchmark of general context-free parsing algorithms, and it's a number that should make every compiler engineer stop assuming they need to hand-craft an LR grammar.

The paper, from an anonymous research team, implements six generalized algorithms — CYK, Valiant, Earley, GLL, RNGLR, and BRNGLR — plus deterministic LL(1) and LR(1) baselines, all in Rust with shared data structures and parse-tree extraction. They ran the lot across 22 grammars, from trivial expression languages all the way up to full C++ and Java specifications.

The 3x Number That Changes the Tradeoff

On deterministic grammars, the GLR family (GLL, RNGLR, BRNGLR) shows a median 3x slowdown compared to LR(1). That variance is narrow and predictable — not the unpredictable explosion you might expect from general algorithms. For non-deterministic grammars, the penalty is naturally larger, but the paper argues that most software engineering tools deal with grammars that are largely deterministic anyway.

CYK and Valiant, the classic cubic-time algorithms, fare worse. Valiant's asymptotically superior $O(n^\omega)$ matrix multiplication approach doesn't translate to practical speedups on realistic grammar sizes — the constant factors and memory overhead eat the theoretical advantage.

Why This Matters for Tools

Right now, developers building compilers, static analyzers, language servers, and fuzzers routinely choose LL or LR parsers because they assume general parsing is too slow. That assumption forces grammar contortions and language simplification — you literally cannot express natural language constructs like C++'s template disambiguation or JavaScript's automatic semicolon insertion without hacks.

Earley and GLL parsers eliminate those constraints. The 3x slowdown is a cost many tools can absorb, especially when general parsers let you keep your grammar clean and your language expressive. The benchmark's design — shared Rust implementation, identical data structures, apple-to-apple parse-tree extraction — means these numbers aren't artifact noise.

The next time you reach for yet another LALR conflict resolution trick, remember: GLR is your default now, not your emergency escape hatch.


Source: An Empirical Comparison of General Context-Free Parsers
Domain: arxiv.org

Read original source ->

External source stays available while the OJO article and comment thread stay local.

Comments load interactively on the live page.