RL4F Benchmark Puts Model-Based Offline RL Ahead on Fusion Plasma Control

Offline RL benchmark using real DIII-D tokamak data shows model-based methods beat imitation learning on 3 of 4 plasma control tasks, but no single algorithm wins all.

RL4F, an offline RL benchmark, shows model-based methods beating imitation learning on 3 of 4 plasma control tasks. No single method fits every task, and that is exactly why the field needs a standardized testbed.

Why Offline RL for Fusion Matters

Training plasma controllers on live tokamaks is expensive and risky. A single runaway instability can damage the vessel. Offline reinforcement learning sidesteps that risk by learning from historical discharge data. The new RL4F benchmark makes that approach measurable for the first time.

RL4F uses real-world data from the DIII-D tokamak at General Atomics, the largest magnetic fusion experiment in the U.S. The benchmark covers four full-profile tracking tasks: rotation, density, temperature, and pressure. Each requires controlling multiple actuators over long horizons, which makes the tasks a realistic proxy for actual reactor control.
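To make "full-profile tracking" concrete, here is a minimal sketch of how such a task could be scored. It assumes each profile is a list of values sampled on a shared radial grid and uses root-mean-square error as an illustrative metric. The function names and the choice of RMSE are assumptions made for this example, not RL4F's actual scoring code.

```python
import math

def profile_rmse(profile, target):
    """RMSE between a measured profile and its target on the same radial grid."""
    if len(profile) != len(target) or not profile:
        raise ValueError("profiles must be non-empty and the same length")
    squared = sum((p - t) ** 2 for p, t in zip(profile, target))
    return math.sqrt(squared / len(profile))

def episode_tracking_error(profiles, targets):
    """Mean per-step RMSE over one episode (hypothetical scoring rule)."""
    if len(profiles) != len(targets) or not profiles:
        raise ValueError("need one target per time step")
    errors = [profile_rmse(p, t) for p, t in zip(profiles, targets)]
    return sum(errors) / len(errors)
```

A lower value means the controller held the profile closer to its target across the whole discharge, not just at one radial location.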

What the Numbers Say

The team behind RL4F ran a broad set of baselines under a unified evaluation protocol. Model-based offline RL methods (e.g., MOPO, COMBO) achieved the highest average scores across objectives. They outperformed imitation learning and value-based offline RL on three of the four tasks. Pressure tracking was the lone exception, where no method stood out.
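For readers unfamiliar with model-based offline RL, the core idea behind MOPO-style methods is to train a policy inside a learned dynamics model while subtracting an uncertainty penalty from the reward, so the policy avoids regions the data does not cover. The sketch below illustrates that idea only. It measures uncertainty as disagreement among ensemble predictions, which is a simplification chosen for this example, and `penalty_coef` and both function names are hypothetical.

```python
import math

def ensemble_disagreement(predictions):
    """Largest Euclidean distance of any ensemble member's next-state
    prediction from the ensemble mean (a simple uncertainty proxy)."""
    n = len(predictions)
    dim = len(predictions[0])
    mean = [sum(p[i] for p in predictions) / n for i in range(dim)]
    return max(
        math.sqrt(sum((p[i] - mean[i]) ** 2 for i in range(dim)))
        for p in predictions
    )

def penalized_reward(reward, predictions, penalty_coef=1.0):
    """Reward minus a penalty that grows with model uncertainty."""
    return reward - penalty_coef * ensemble_disagreement(predictions)
```

When all ensemble members agree, the penalty vanishes and the policy sees the model's reward unchanged. When they diverge, the reward drops, which discourages the policy from exploiting model errors.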

No single algorithm dominated all four tasks. That's not a failure. It signals that the community may need to specialize methods for different plasma regimes. The benchmark exposes where each approach breaks down.

Open Code, Open Data, Open Questions

RL4F ships with a full codebase, preprocessed datasets from DIII-D, and a closed-loop evaluation environment. That means any RL researcher can start comparing new algorithms on fusion problems without needing access to a tokamak. The authors open-sourced everything, lowering the barrier for cross-disciplinary work.
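As a rough picture of what closed-loop evaluation means, the sketch below rolls a policy forward against a step function and averages the tracking error. Everything here is a toy stand-in: the scalar state, `toy_step`, `proportional_policy`, and the `rollout` signature are assumptions for illustration and do not reflect the RL4F codebase's actual interface.

```python
def rollout(policy, step_fn, initial_state, target, horizon):
    """Run a policy in closed loop and return the mean absolute tracking error."""
    state = initial_state
    total_error = 0.0
    for _ in range(horizon):
        action = policy(state, target)   # policy sees the current state
        state = step_fn(state, action)   # environment advances one step
        total_error += abs(state - target)
    return total_error / horizon

# Toy stand-ins (hypothetical): scalar state, action added directly to it.
def toy_step(state, action):
    return state + action

def proportional_policy(state, target, gain=0.5):
    return gain * (target - state)

if __name__ == "__main__":
    print(rollout(proportional_policy, toy_step, 0.0, 1.0, horizon=3))
```

The key difference from scoring logged actions is that each action changes the next state the policy sees, so errors can compound over the horizon.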

The next step is clear: run these algorithms on real DIII-D shots to see if offline policies transfer. The benchmark gives us the tools to find out.


Source: Offline Reinforcement Learning for Plasma Control in Nuclear Fusion: Codebase and Benchmark
Domain: arxiv.org
