Foresight Lets Robots Reconsider Their Plans Before Moving, Raising Success by 37%

A test-time VLM that critiques its own proposed steps before moving raises navigation success by 37% and cuts human interventions roughly in half, running on a Jetson AGX Orin.

foresight, vlms, ut austin, robot navigation, test time reasoning, reinforcement learning

A 37% higher success rate and 52% fewer human interventions: that is what Foresight delivers over state-of-the-art test-time reasoning and foundation-model baselines for mapless navigation from sparse language instructions. It also runs in real time on a Jetson AGX Orin.

Why Sparse Language Navigation Stalls Without Iterative Reasoning

Open-world mapless navigation has a nasty problem: underspecified goals like “go to the loading dock” leave robots guessing which clues (ramps, signs, detours) actually matter. Prior work either relied on a known, closed set of factors or identified cues before motion planning, which misses clues that only matter in light of a particular plan. Foresight sidesteps that by making a fine-tuned VLM alternate between proposing image-space motion plans and critiquing them against the language goal and visual context. Each subsequent plan is conditioned on prior critiques, so the robot refines its motion before taking a step.
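The propose-and-critique alternation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `refine_plan`, `Critique`, the `max_rounds` cap, and the two stub functions are hypothetical names invented here, and in a real system both callables would be calls to the fine-tuned VLM with the language goal and camera image bound in.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Waypoint = Tuple[int, int]   # (u, v) pixel coordinates in the camera image
Plan = List[Waypoint]        # an image-space motion plan


@dataclass
class Critique:
    text: str          # free-form feedback on the plan
    acceptable: bool   # hypothetical stop signal; the article does not specify one


def refine_plan(
    propose: Callable[[List[Critique]], Plan],
    critique: Callable[[Plan], Critique],
    max_rounds: int = 3,
) -> Tuple[Plan, List[Critique]]:
    """Alternate proposing and critiquing before the robot moves."""
    history: List[Critique] = []
    plan = propose(history)              # first plan, no critiques yet
    for _ in range(max_rounds):
        verdict = critique(plan)
        if verdict.acceptable:
            break
        history.append(verdict)
        plan = propose(history)          # next plan is conditioned on prior critiques
    # If the round budget runs out, the last proposal is returned uncritiqued.
    return plan, history


# Stubs standing in for the VLM (purely illustrative).
def stub_propose(history: List[Critique]) -> Plan:
    offset = 40 * len(history)           # shift right once per critique received
    return [(320 + offset, 400), (320 + offset, 300)]


def stub_critique(plan: Plan) -> Critique:
    if any(u < 380 for u, _ in plan):
        return Critique("path crosses the stairs; prefer the ramp to the right", False)
    return Critique("path follows the ramp", True)


final_plan, critiques = refine_plan(stub_propose, stub_critique)
```

With these stubs, the first two proposals are rejected and the third is accepted, so the returned plan reflects two rounds of critique. The round cap is a design choice of this sketch to keep latency bounded on an embedded board.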

RL from Human Feedback Aligns the Critiques

Learning what makes a good critique isn’t trivial. The UT Austin team trained a reward model from human feedback, then used reinforcement learning to post-train the VLM inside the plan-critique loop. That loop (propose, critique, refine, repeat) teaches the model open-set behavior preferences that generalize beyond a closed list of factor categories, so there is no need to enumerate every possible ramp or sign.
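The article does not say what form the human feedback takes. One common way to train a reward model, assumed here purely for illustration, is from pairwise preferences with a Bradley-Terry style loss: the model is penalized when it scores the human-preferred output below the rejected one. The function name `preference_loss` and the scalar scores are hypothetical.

```python
import math


def preference_loss(r_preferred: float, r_rejected: float) -> float:
    """Pairwise loss -log(sigmoid(r_preferred - r_rejected)).

    Assumption: feedback arrives as (preferred, rejected) pairs; the
    source article does not confirm this is the setup Foresight uses.
    """
    margin = r_preferred - r_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# The loss shrinks as the reward model ranks the preferred output higher.
agrees = preference_loss(2.0, 0.0)      # model agrees with the human
disagrees = preference_loss(0.0, 2.0)   # model disagrees with the human
```

A reward model trained this way yields a scalar score that a reinforcement learning step can then maximize, which is the general shape of the pipeline the article describes.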

Real-World Validation: Six Environments, One Board

In offline evaluations and tests across six real-world environments, Foresight outperforms the baselines: a 37% improvement in average task success and a 52% reduction in interventions per mission. The system runs on a Jetson AGX Orin at real-time rates, which means the compute fits on a mobile robot without a rack. The authors promise to release code, data, and training details, giving the field a concrete test-time reasoning recipe for robot motion refinement.

Foresight turns a static pre-trained VLM into an agent that argues with itself before moving—and that self-critique cycle is what makes the difference between rolling into a wall and finding the loading dock.


Source: Foresight: Iterative Reasoning About Clues that Matter for Navigation
Domain: arxiv.org
