
8B Embodied-R1.5 Beats GPT-5.4 on 16 of 24 Embodied VLM Benchmarks

An 8B-parameter model trained on 15B tokens of embodied data surpasses GPT-5.4 and Gemini-Robotics-ER-1.5 on most embodied VLM benchmarks, then fine-tunes into a VLA that beats π0.5 across four manipulation suites.

Tags: Embodied-R1.5, embodied foundation models, Gemini Robotics, GPT-5.4, π0.5, reinforcement learning

Embodied-R1.5 posts state-of-the-art scores on 16 out of 24 embodied VLM benchmarks, snatching the top spot from models like Gemini-Robotics-ER-1.5 and GPT-5.4. And it does it with only 8 billion parameters.

15B Tokens of Embodied Data, One Architecture

The team behind Embodied-R1.5 built three automated data construction pipelines to cover the capabilities that matter: embodied cognition, task planning, correction, and pointing. Together, those pipelines churned out over 15 billion tokens of training data. To handle the resulting heterogeneity across tasks, they designed a multi-task balanced reinforcement learning (RL) recipe that keeps training balanced across task types.
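The article does not say how the balancing works. As a rough illustration only, here is a minimal Python sketch of two common ways to keep a multi-task RL mix balanced; neither is claimed to be the paper's actual recipe. Temperature-scaled task sampling stops a huge pointing dataset from drowning out a small planning one, and per-task reward normalization gives tasks with different reward scales comparable advantages. The function names, task names, dataset sizes, and the `temperature` parameter are all hypothetical.

```python
import random
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple


def task_weights(task_sizes: Dict[str, int], temperature: float = 0.5) -> Dict[str, float]:
    """Sampling probability per task, p_t proportional to n_t ** temperature.

    temperature=1.0 samples in proportion to dataset size;
    temperature=0.0 samples every task equally often.
    """
    scaled = {task: n ** temperature for task, n in task_sizes.items()}
    total = sum(scaled.values())
    return {task: s / total for task, s in scaled.items()}


def sample_tasks(task_sizes: Dict[str, int], k: int, temperature: float = 0.5,
                 seed: int = 0) -> List[str]:
    """Draw k task labels for one RL batch using the balanced weights."""
    weights = task_weights(task_sizes, temperature)
    tasks = list(weights)
    rng = random.Random(seed)
    return rng.choices(tasks, weights=[weights[t] for t in tasks], k=k)


def per_task_advantages(rollouts: List[Tuple[str, float]], eps: float = 1e-6) -> List[float]:
    """Normalize each reward by the mean and std of its own task's rewards.

    A task whose rewards span 0-10 and one whose rewards span 0-1 then yield
    advantages on the same scale, so neither dominates the policy gradient.
    """
    rewards_by_task: Dict[str, List[float]] = defaultdict(list)
    for task, reward in rollouts:
        rewards_by_task[task].append(reward)
    stats = {task: (statistics.fmean(rs), statistics.pstdev(rs))
             for task, rs in rewards_by_task.items()}
    return [(reward - stats[task][0]) / (stats[task][1] + eps)
            for task, reward in rollouts]


if __name__ == "__main__":
    sizes = {"pointing": 1_000_000, "planning": 10_000}  # hypothetical dataset sizes
    print(task_weights(sizes, temperature=0.5))
    print(sample_tasks(sizes, k=8))
    print(per_task_advantages([("pointing", 1.0), ("pointing", 0.0),
                               ("planning", 10.0), ("planning", 0.0)]))
```

With `temperature=0.5`, the two hypothetical tasks are sampled at roughly 0.91 and 0.09 instead of 0.99 and 0.01 under size-proportional sampling. Per-task normalization is similar in spirit to the group-normalized advantages of GRPO-style methods, but here the group is a task rather than a single prompt.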

Planner-Grounder-Corrector: Closed-Loop Autonomy

Single models usually choke on long-horizon tasks because they can't self-correct. Embodied-R1.5 structures its reasoning as a closed loop, the Planner-Grounder-Corrector (PGC) framework. The same model plans, grounds its actions, detects failures, and replans, with no separate modules and no external verifier. In zero-shot real-robot experiments, it handled instruction following, affordance grounding, articulated object manipulation, and complex long-horizon tasks without task-specific tuning.
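To make the closed loop concrete, here is a minimal Python sketch of a plan, ground, act, check cycle. It assumes hypothetical callables standing in for the three model roles (`plan`, `ground`, `succeeded`) and for the robot controller (`act`). In Embodied-R1.5 one model fills all three roles; this sketch only mimics that by accepting plain functions. The `run_pgc` name and the text log fed back to the planner are illustrative choices, not the paper's interface.

```python
from typing import Callable, List, Tuple

Point = Tuple[int, int]  # hypothetical: an (x, y) pixel the grounder points at


def run_pgc(
    task: str,
    plan: Callable[[str, List[str]], List[str]],  # Planner: task + feedback log -> steps
    ground: Callable[[str], Point],               # Grounder: step -> point in the image
    act: Callable[[str, Point], None],            # robot controller executes the step
    succeeded: Callable[[str], bool],             # Corrector: did the step work?
    max_replans: int = 3,
) -> Tuple[bool, List[str]]:
    """Run a plan -> ground -> act -> check loop, replanning after any failure.

    Returns (success, log), where log records each step outcome in order.
    """
    log: List[str] = []
    for _ in range(max_replans + 1):  # first plan plus up to max_replans retries
        for step in plan(task, log):
            act(step, ground(step))
            if not succeeded(step):
                log.append(f"failed: {step}")
                break  # abandon this plan and replan with the failure in the log
            log.append(f"done: {step}")
        else:
            return True, log  # every step in the plan was verified
    return False, log  # replan budget exhausted


if __name__ == "__main__":
    grasp_attempts = 0

    def plan(task: str, log: List[str]) -> List[str]:
        # Skip steps already verified as done; re-emit the rest.
        done = {entry[len("done: "):] for entry in log if entry.startswith("done: ")}
        return [s for s in ["grasp cup", "place cup on shelf"] if s not in done]

    def ground(step: str) -> Point:
        return (120, 80)  # fixed dummy point for the demo

    def act(step: str, point: Point) -> None:
        pass  # a real controller would move the arm here

    def succeeded(step: str) -> bool:
        global grasp_attempts
        if step == "grasp cup":
            grasp_attempts += 1
            return grasp_attempts > 1  # simulate the first grasp slipping
        return True

    print(run_pgc("put the cup on the shelf", plan, ground, act, succeeded))
```

Running the demo prints `(True, ['failed: grasp cup', 'done: grasp cup', 'done: place cup on shelf'])`: the first grasp fails, the planner sees the failure in the log, and the second plan completes.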

VLA Fine-Tuning with Pocket Change

Because the base model internalizes embodied reasoning, you can turn Embodied-R1.5 into a Vision-Language-Action (VLA) policy with a surprisingly small amount of data. The resulting VLA outperforms leading models such as π0.5 across four popular manipulation benchmark suites. No need to train from scratch or collect millions of demonstrations.

Open Source, Not a Press Release

The authors released model weights, the full training dataset, training code, and a new evaluation framework called EmbodiedEvalKit designed specifically for embodied tasks. If you're building a robot that needs to reason, plan, and recover from mistakes, you now have a concrete baseline to beat, and the data to beat it with.


Source: Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models
Domain: arxiv.org
