Red Queen Gödel Machine Co-Evolves Agents and Evaluators for Self-Improvement

1.91x. That's how much more likely a baseline AI reviewer was to accept AI-generated papers than human-written ones. The Red Queen Gödel Machine (RQGM) addresses that by making evaluation itself an evolving adversary.

Why Static Evaluators Fail Self-Improving Agents

Self-improving agents today typically assume a fixed verifier: a benchmark, a labeled dataset, or a static judge. That works until the agent's improvements start exploiting the evaluation criterion instead of building general capability. The RQGM paper, posted on arXiv, argues that this setup leaves out a key feature of biological evolution: species don't adapt to a fixed landscape; they co-evolve with predators, prey, and changing environments. The proposed solution is to put the evaluator inside the improvement loop.

RQGM's Controlled Evolution in Epochs

The trick is controlled utility evolution. The RQGM organizes search into epochs, and the evaluation criterion stays fixed within each epoch. The utility can shift only at epoch boundaries. That preserves self-improvement guarantees within each epoch while letting the objective evolve across them. The result is a structured way for agents and their evaluators to co-adapt without the objective drifting mid-search.
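The epoch structure can be illustrated with a toy loop. This is a minimal sketch, not the paper's algorithm: the agent is a single number, the evaluator is a hypothetical distance-to-target score, and the evaluator update is replaced by a preset list of targets. All names here (make_utility, run_epoch, co_evolve) are illustrative assumptions. The one property it demonstrates is that the utility is frozen inside an epoch and may change only at the boundary.

```python
import random
from typing import Callable, List, Tuple

Agent = float  # toy stand-in: an "agent" is just one number
Utility = Callable[[Agent], float]


def make_utility(target: float) -> Utility:
    """Hypothetical evaluator: higher score the closer the agent is to target."""
    return lambda agent: -abs(agent - target)


def run_epoch(agent: Agent, utility: Utility, steps: int, rng: random.Random) -> Agent:
    """Hill-climb under a utility that stays fixed for the whole epoch."""
    for _ in range(steps):
        candidate = agent + rng.uniform(-1.0, 1.0)
        # Accept only strict improvements, so in-epoch utility never decreases.
        if utility(candidate) > utility(agent):
            agent = candidate
    return agent


def co_evolve(
    targets: List[float], steps: int = 200, seed: int = 0
) -> Tuple[Agent, List[Tuple[float, float]]]:
    """Run one epoch per target; the utility changes only between epochs."""
    rng = random.Random(seed)
    agent: Agent = 0.0
    history: List[Tuple[float, float]] = []
    for target in targets:
        utility = make_utility(target)  # the only place the utility may shift
        before = utility(agent)
        agent = run_epoch(agent, utility, steps, rng)
        history.append((before, utility(agent)))
    return agent, history


if __name__ == "__main__":
    final_agent, history = co_evolve([1.0, 3.0, 5.0])
    for epoch, (before, after) in enumerate(history):
        print(f"epoch {epoch}: utility {before:.3f} -> {after:.3f}")
```

Within each epoch the score under that epoch's utility can only go up, which is the toy analogue of an in-epoch improvement guarantee. Across epochs the score may drop at the boundary, because the agent is now being measured against a new objective.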

Beating Baselines on Coding, Writing, and Grading

On verifiable coding tasks, the RQGM adds a complementary agent-as-a-judge code-review signal that improves test pass rate over the prior state-of-the-art while using 1.35x to 1.72x fewer tokens. That's cheaper and better. For scientific paper writing, co-evolved writers achieved 1.78x to 1.86x higher acceptance rates under a diverse panel of AI judges. Co-evolved graders hit 9% higher ground-truth accuracy. The most striking fix: the baseline reviewer's 1.91x over-acceptance of AI papers vanished when the RQGM introduced an adversarial objective that forced reviewers to be equally stringent on human and AI work.
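To make the over-acceptance figure concrete, the ratio can be computed from two lists of accept/reject decisions. The sketch below is an assumption-laden illustration: the decision lists are made-up toy data, and parity_penalized_score is a hypothetical objective (accuracy minus a penalty on the log acceptance ratio), not the adversarial objective defined in the paper.

```python
import math
from typing import Sequence


def acceptance_rate(decisions: Sequence[bool]) -> float:
    """Fraction of papers accepted (True = accept)."""
    if not decisions:
        raise ValueError("need at least one decision")
    return sum(decisions) / len(decisions)


def acceptance_ratio(ai: Sequence[bool], human: Sequence[bool]) -> float:
    """How many times more likely the reviewer is to accept AI papers."""
    human_rate = acceptance_rate(human)
    if human_rate == 0:
        raise ValueError("human acceptance rate is zero; ratio undefined")
    return acceptance_rate(ai) / human_rate


def parity_penalized_score(
    accuracy: float, ai: Sequence[bool], human: Sequence[bool], weight: float = 1.0
) -> float:
    """Hypothetical reviewer fitness: reward accuracy, penalize unequal stringency.

    The penalty is zero when both groups are accepted at the same rate.
    Assumes the AI acceptance rate is nonzero (log of zero is undefined).
    """
    return accuracy - weight * abs(math.log(acceptance_ratio(ai, human)))


if __name__ == "__main__":
    # Toy data only: 6 of 10 AI papers accepted, 3 of 10 human papers accepted.
    ai_decisions = [True] * 6 + [False] * 4
    human_decisions = [True] * 3 + [False] * 7
    print(acceptance_ratio(ai_decisions, human_decisions))
    print(parity_penalized_score(0.8, ai_decisions, human_decisions))
```

A ratio of 1.0 means the reviewer is equally stringent on both groups. Using the absolute log of the ratio treats favoring AI papers and favoring human papers symmetrically, which is one reasonable design choice among several.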

By embedding evaluation into the evolutionary loop, the RQGM suggests a path to agents that can escape the Nash equilibrium of static benchmarks, and maybe even produce reviewers that don't favor their own kind.

Source: The Red Queen Gödel Machine: Co-Evolving Agents and Their Evaluators
Domain: arxiv.org
