Foresight Lets Robots Reason About Navigation Clues Iteratively, Lifting Success 37%

A test-time VLM framework that critiques its own motion plans before moving improves navigation success by 37% and roughly halves human interventions, all on a Jetson AGX Orin.

foresight, vlms, ut austin, robot navigation, test time reasoning, reinforcement learning

A 37% higher success rate and 52% fewer human interventions: that’s what Foresight delivers over state-of-the-art test-time reasoning and foundation-model baselines for mapless navigation from sparse language instructions. And it runs in real time on a Jetson AGX Orin.

Why Sparse Language Navigation Stalls Without Iterative Reasoning

Open-world mapless navigation has a nasty problem: underspecified goals like “go to the loading dock” leave robots guessing which clues—ramps, signs, detours—actually matter. Prior work either relied on known, closed-set factors or identified cues before motion planning, missing plan-dependent clues. Foresight sidesteps that by making a finetuned VLM alternate between proposing image-space motion plans and critiquing them against the language goal and visual context. Each subsequent plan is conditioned on prior critiques, so the robot refines its motion before taking a step.
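
The propose-critique alternation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names refine_plan, propose, critique, and Round, the max_rounds budget, and the "empty critique means no objections" stopping rule are all assumptions made for the example. In the real system both callables would be calls to the finetuned VLM.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Waypoint = Tuple[int, int]  # (u, v) pixel coordinates in the camera image


@dataclass
class Round:
    """One iteration of the loop: a proposed plan and the critique it received."""
    plan: List[Waypoint]
    critique: str


def refine_plan(
    propose: Callable[[str, Any, List[Round]], List[Waypoint]],
    critique: Callable[[str, Any, List[Waypoint]], str],
    goal: str,
    image: Any,
    max_rounds: int = 3,  # hypothetical budget; the source does not give one
) -> Tuple[List[Waypoint], List[Round]]:
    """Alternate proposing and critiquing; return the last plan and the history."""
    history: List[Round] = []
    plan: List[Waypoint] = []
    for _ in range(max_rounds):
        # Each new plan sees every earlier plan and its critique.
        plan = propose(goal, image, history)
        feedback = critique(goal, image, plan)
        history.append(Round(plan, feedback))
        if feedback == "":  # assumed convention: empty critique = no objections
            break
    return plan, history


# Toy stand-ins so the sketch runs without a model.
def toy_propose(goal: str, image: Any, history: List[Round]) -> List[Waypoint]:
    shift = 40 * len(history)  # move the path right once per earlier critique
    return [(320 + shift, 400), (320 + shift, 300)]


def toy_critique(goal: str, image: Any, plan: List[Waypoint]) -> str:
    # Pretend everything left of pixel column 380 is blocked.
    if any(u < 380 for u, _ in plan):
        return "path crosses the blocked area; move right"
    return ""


plan, history = refine_plan(toy_propose, toy_critique, "go to the loading dock", image=None)
```

With the toy stand-ins, the first two plans are rejected and the third clears the blocked region, so the loop stops with a plan that has drawn no objection. The point of the structure is that the critique depends on the plan, which is how plan-dependent clues can surface.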

RL from Human Feedback Aligns the Critiques

Learning what makes a good critique isn’t trivial. The UT Austin team trained a reward model from human feedback, then used reinforcement learning to post-train the VLM inside the plan-critique loop. That loop—propose, critique, refine, repeat—teaches the model to favor open-set behavior preferences that generalize beyond closed factor categories. No need to enumerate every possible ramp or sign.
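
The article does not say what form the human feedback took or how the reward model was fit. One common setup, assumed here purely for illustration, is pairwise comparison: annotators pick the better of two outputs, and the reward model is trained with a Bradley-Terry loss that pushes the preferred output's score above the rejected one's. The function name preference_loss and the example scores are hypothetical.

```python
import math
from typing import List, Tuple


def preference_loss(pairs: List[Tuple[float, float]]) -> float:
    """Mean Bradley-Terry negative log-likelihood over scored pairs.

    Each pair is (reward of the human-preferred item, reward of the rejected item).
    This is an assumed training objective, not one confirmed by the source.
    """
    total = 0.0
    for r_preferred, r_rejected in pairs:
        margin = r_preferred - r_rejected
        # -log(sigmoid(margin)) == log(1 + exp(-margin)), written in a
        # numerically stable form that avoids overflow for large |margin|.
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(pairs)


agrees = preference_loss([(2.0, 0.0)])     # reward model ranks the pair like the human
disagrees = preference_loss([(0.0, 2.0)])  # reward model ranks the pair the other way
```

The loss is small when the reward model agrees with the annotator and grows as it disagrees, so minimizing it pulls the scores toward the human ranking. The trained scores can then serve as the reward signal for reinforcement learning.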

Real-World Validation: Six Environments, One Board

Foresight’s offline evaluations and tests across six real-world environments stack up cleanly against baselines: 37% improvement in average task success, 52% reduction in interventions per mission. The system runs on a Jetson AGX Orin at real-time rates, which means the compute fits on a mobile robot without a rack. Code, data, and training details are promised for release, giving the field a concrete test-time reasoning recipe for robot motion refinement.

Foresight turns a static pre-trained VLM into an agent that argues with itself before moving—and that self-critique cycle is what makes the difference between rolling into a wall and finding the loading dock.


Source: Foresight: Iterative Reasoning About Clues that Matter for Navigation
Domain: arxiv.org
