VERITAS Feeds Verifier Errors Back Into Proof Search, Hits 40.6% on miniF2F

By routing syntax errors and type mismatches into a critic-guided MCTS pass, VERITAS solves 7.3% of a hard combinatorics benchmark where Best-of-5 managed only 1.8%.

Tags: veritas, minif2f, combinatorics, formal theorem proving, mcts, zero shot

VERITAS solves 40.6% of miniF2F theorems without any fine-tuning, 3.7 percentage points ahead of a Best-of-5 baseline built on the same LLM. That gap is the whole point: most LLM-based provers throw away every signal from the verifier except a pass/fail bit. VERITAS keeps everything - syntax errors, type mismatches, partial goal progress - and routes it back into search.

Two-phase protocol that turns failures into fuel

The VERITAS team builds a zero-shot framework with two phases. Phase 1 runs standard Best-of-N sampling to generate candidate proofs. Phase 2 then hands those failures - not just the final verifier rejection but the exact line numbers and error types - to a critic-guided Monte Carlo Tree Search pass. The critic uses Phase 1 errors as explicit negative examples, so the MCTS exploration avoids dead paths that already failed. Crucially, the protocol preserves every theorem solved by Phase 1 alone; any additional solves in Phase 2 are directly attributable to feedback-driven exploration, not just more sampling.
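To make the control flow concrete, here is a minimal sketch of the two phases. It is not the authors' implementation: Phase 2 is reduced to resampling with a dead-prefix filter, a much simpler stand-in for the critic-guided MCTS, and every name in it (`VerifierError`, `phase1_best_of_n`, `phase2_with_feedback`, `solve`, the toy verifier) is hypothetical. It also assumes the verifier reports the first failing line, so any proof sharing the prefix up to that line is known to fail.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass(frozen=True)
class VerifierError:
    line: int   # 1-based index of the first failing proof step (assumed format)
    kind: str   # e.g. "syntax" or "type_mismatch"

Proof = List[str]
Failure = Tuple[Proof, VerifierError]

def phase1_best_of_n(sample: Callable[[], Proof],
                     verify: Callable[[Proof], Optional[VerifierError]],
                     n: int) -> Tuple[Optional[Proof], List[Failure]]:
    """Plain Best-of-N, except that every failure is kept with its error."""
    failures: List[Failure] = []
    for _ in range(n):
        candidate = sample()
        error = verify(candidate)
        if error is None:
            return candidate, failures
        failures.append((candidate, error))
    return None, failures

def phase2_with_feedback(sample, verify, failures: List[Failure],
                         budget: int) -> Optional[Proof]:
    """Stand-in for the critic: skip candidates that repeat a known dead prefix."""
    dead = {tuple(cand[:err.line]) for cand, err in failures}
    for _ in range(budget):
        candidate = sample()
        if any(tuple(candidate[:k]) in dead for k in range(1, len(candidate) + 1)):
            continue  # rejected from Phase 1 feedback, no verifier call spent
        error = verify(candidate)
        if error is None:
            return candidate
        dead.add(tuple(candidate[:error.line]))  # new failures also become feedback
    return None

def solve(sample, verify, n: int, budget: int):
    proof, failures = phase1_best_of_n(sample, verify, n)
    if proof is not None:
        return proof, "phase1"  # Phase 1 solves are always preserved
    proof = phase2_with_feedback(sample, verify, failures, budget)
    return proof, ("phase2" if proof is not None else "unsolved")

# Toy usage: a fixed queue of candidates plays the LLM, and a step-by-step
# comparison against a known proof plays the verifier.
TARGET = ["intro", "ring"]
verifier_calls = [0]

def toy_verify(proof: Proof) -> Optional[VerifierError]:
    verifier_calls[0] += 1
    for i, (got, want) in enumerate(zip(proof, TARGET), start=1):
        if got != want:
            return VerifierError(line=i, kind="type_mismatch")
    return None

queue = iter([["simp", "ring"], ["intro", "simp"],                      # Phase 1
              ["simp", "simp"], ["intro", "simp"], ["intro", "ring"]])  # Phase 2
result = solve(lambda: next(queue), toy_verify, n=2, budget=3)
print(result, verifier_calls[0])
```

In the toy run, two of the three Phase 2 candidates are discarded using Phase 1 errors alone, so the verifier is only called once in Phase 2. A real tree search would go further and use the same signal to decide which branches to expand, not just which samples to drop.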

On miniF2F the gain is modest but clear: 40.6% vs 36.9% for Best-of-5. The real stress test is VERITAS-CombiBench, a 55-theorem combinatorics benchmark the authors release alongside the paper. Here Best-of-5 craters to 1.8% - even below a simple Portfolio approach at 3.6% - while VERITAS hits 7.3%. Unguided sampling breaks down when finding the correct lemma names requires consulting the verifier iteratively.
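It helps to translate those percentages back into theorem counts. The article only gives percentages and the benchmark size, so the counts below are inferred, assuming each figure is simply solved theorems divided by 55 and rounded to one decimal place.

```python
BENCHMARK_SIZE = 55  # theorems in VERITAS-CombiBench, as stated above

# Percentages as reported; the solved counts are inferred, not quoted.
reported_pct = {"Best-of-5": 1.8, "Portfolio": 3.6, "VERITAS": 7.3}

implied_counts = {}
for method, pct in reported_pct.items():
    solved = round(pct / 100 * BENCHMARK_SIZE)
    # Sanity check: the inferred count must round back to the reported figure.
    assert round(100 * solved / BENCHMARK_SIZE, 1) == pct
    implied_counts[method] = solved

# miniF2F gap in percentage points, from the two reported solve rates.
minif2f_gap = round(40.6 - 36.9, 1)

print(implied_counts)  # {'Best-of-5': 1, 'Portfolio': 2, 'VERITAS': 4}
print(minif2f_gap)     # 3.7
```

Under that assumption the benchmark comparison comes down to 1, 2 and 4 solved theorems out of 55.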

Rich feedback beats raw compute

I've seen plenty of papers treat the verifier as a black-box oracle. VERITAS shows that opening that box, even with no additional training data, can squeeze measurably more capability out of the same LLM. The authors release artifacts on GitHub, so anyone can replicate the two-phase search on their own formal domains. Expect this approach to spread beyond theorem proving into any code generation task where a compiler or type checker can emit structured error messages - that's where the real leverage lives.
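As a rough illustration of what "structured error messages" could look like outside theorem proving, the sketch below uses Python's built-in `compile` as the checker and turns a `SyntaxError` into a small record with an error kind and line number. The `Feedback` class and `check_python` function are hypothetical names of mine, and nothing here comes from the VERITAS artifacts.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Feedback:
    kind: str             # error category, here always "syntax"
    line: Optional[int]   # 1-based line number reported by the compiler
    message: str          # the compiler's own message text

def check_python(source: str) -> Optional[Feedback]:
    """Return None if `source` compiles, else structured feedback.

    compile() only parses and compiles the code; it does not execute it.
    """
    try:
        compile(source, "<candidate>", "exec")
    except SyntaxError as exc:
        return Feedback(kind="syntax", line=exc.lineno, message=exc.msg)
    return None

ok = check_python("x = 1\n")
bad = check_python("x = 1\ny = = 2\n")
print(ok)   # None
print(bad)  # a Feedback record pointing at line 2
```

A search loop can key on `kind` and `line` exactly the way a prover can key on a failing tactic, which is far more to work with than a pass/fail bit.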


Source: VERITAS: Verifier-Guided Proof Search for Zero-Shot Formal Theorem Proving
Domain: arxiv.org
