Shipping a trillion parameters just got a lot cheaper.

Async RL has a dirty secret: every step, the trainer has to ship the whole model to the inference engine. For a 7B model in bf16, that's 14 GB. For a frontier 1T-parameter checkpoint, that's on the order of a terabyte. Per step.
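Those figures are just parameter count times 2 bytes per bf16 value. Here is a quick check that assumes uncompressed, dense bf16 weights. A 1T-parameter model comes to about 2 TB per step, which is the "on the order of a terabyte" figure.

```python
# Dense sync cost: every parameter is shipped, 2 bytes each in bf16.
BF16_BYTES = 2

def dense_sync_bytes(num_params: int, bytes_per_param: int = BF16_BYTES) -> int:
    """Bytes the trainer sends per step when it ships the whole model."""
    return num_params * bytes_per_param

for label, n in [("7B", 7 * 10**9), ("1T", 10**12)]:
    print(f"{label}: {dense_sync_bytes(n) / 1e9:,.0f} GB per step")
# 7B: 14 GB per step
# 1T: 2,000 GB per step
```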
The One Terabyte Problem

Hugging Face landed a TRL PR that encodes only the elements that changed since the previous step as a sparse safetensors file, uploads it to a Hugging Face Bucket, and tells vLLM to fetch it. On Qwen3-0.6B, the per-step payload drops from 1.2 GB (the full model in bf16) to 20-35 MB.
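Below is a minimal, stdlib-only sketch of the "ship only what changed" idea. It is not the TRL implementation. The function names are hypothetical, and nothing is written to safetensors or uploaded. The layout (a uint32 index plus the raw bf16 bit pattern for each changed element) is an assumed, illustrative encoding.

```python
from array import array

def encode_delta(old: array, new: array) -> tuple[array, array]:
    """Return (indices, values) for every element whose bf16 bits changed.

    old and new hold raw bf16 bit patterns as unsigned 16-bit ints ('H').
    """
    if len(old) != len(new):
        raise ValueError("old and new must have the same length")
    indices = array("I", (i for i, (a, b) in enumerate(zip(old, new)) if a != b))
    values = array("H", (new[i] for i in indices))
    return indices, values

def apply_delta(weights: array, indices: array, values: array) -> None:
    """Inference side: overwrite only the changed elements, in place."""
    for i, v in zip(indices, values):
        weights[i] = v

def payload_bytes(indices: array, values: array) -> int:
    """Size of the sparse payload under this (assumed) layout."""
    return indices.itemsize * len(indices) + values.itemsize * len(values)

# 1,000 weights that all hold bf16 1.0 (bit pattern 0x3F80).
old = array("H", [0x3F80] * 1000)
new = array("H", old)
new[3] = 0x3F81    # next bf16 value above 1.0
new[500] = 0x3F7F  # next bf16 value below 1.0

indices, values = encode_delta(old, new)
replica = array("H", old)  # what the inference engine currently holds
apply_delta(replica, indices, values)
print(list(indices), replica == new)  # [3, 500] True
print(payload_bytes(indices, values), "bytes vs", old.itemsize * len(old), "dense")
```

Comparing raw bit patterns rather than float values keeps the check exact. For example, +0.0 and -0.0 compare equal as floats but are different bf16 encodings, and a NaN never equals itself.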
Why bf16 RL Weights Are Almost Always Sparse

The "98% of weights do not change" claim sounds suspiciously like one of those numbers that works in the demo and falls apart in the wild. It's not. It falls out of how bf16 arithmetic works at the small learning rates RL uses. A bf16 number has 7 explicit mantissa bits, so between two consecutive powers of two there are exactly $2^7 = 128$ representable values. For $2^k \le |w| < 2^{k+1}$, the spacing between adjacent bf16 numbers is therefore $2^{k-7}$, roughly $|w| \cdot 2^{-7}$. An update smaller than half the gap to the neighboring bf16 value, about $|w| \cdot 2^{-8}$, rounds straight back to the original number, so the stored weight does not change at all.
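A stdlib-only simulation makes the spacing argument concrete. `to_bf16` emulates bf16 by rounding the float32 bit pattern to its top 16 bits (round-half-to-even). `bf16_spacing` returns the gap $2^{k-7}$ for $2^k \le |w| < 2^{k+1}$. The weight distribution (normal, std 0.02) and the per-step update size (1e-6 with random sign, roughly what an Adam-style step at learning rate 1e-6 would produce) are illustrative assumptions, not numbers from the post. The sketch also models a single step applied directly to bf16 weights.

```python
import math
import random
import struct

def to_bf16(x: float) -> float:
    """Round x to the nearest bf16 value (round-half-to-even).

    bf16 is the top 16 bits of an IEEE float32, so we round the float32
    bit pattern at bit 16. NaN handling is omitted for brevity.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round half to even
    bits &= 0xFFFF0000                   # drop the low 16 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def bf16_spacing(w: float) -> float:
    """Gap to the next bf16 value for 2**k <= |w| < 2**(k+1): 2**(k-7)."""
    _, e = math.frexp(abs(w))  # abs(w) = m * 2**e with 0.5 <= m < 1, so k = e - 1
    return 2.0 ** (e - 1 - 7)

print(bf16_spacing(1.0))           # 0.0078125, i.e. 2**-7
print(to_bf16(1.0 + 1e-3) == 1.0)  # True: 1e-3 is below half the gap at 1.0

# Illustrative assumptions: weights ~ N(0, 0.02), update of size 1e-6 per step.
rng = random.Random(0)
n, step = 50_000, 1e-6
unchanged = 0
for _ in range(n):
    w = to_bf16(rng.gauss(0.0, 0.02))
    w_new = to_bf16(w + rng.choice((-step, step)))
    unchanged += (w_new == w)
print(f"unchanged after one step: {unchanged / n:.1%}")
```

Under these assumptions the unchanged fraction lands in the high 90s. Only weights with $|w|$ below about $2^{-11}$ move, because there half the gap drops below 1e-6. If you change the assumed weight scale or step size, the fraction moves with it.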
Delta Weight Sync in Action

The cherry on top: Hugging Face ran a fully disaggregated training run in which the trainer was on one box, vLLM lived in a Hugging Face Space, the Wordle environment lived in another Space, and weights flowed through a single Hub bucket. No shared cluster, no RDMA, no VPN. Async RL just got a lot cheaper.
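In this setup the weights never travel over a direct link. The trainer uploads each delta to the bucket and the inference side pulls it when told a new step exists. The toy below sketches that flow under some assumptions. `ToyBucket` is an in-memory dict standing in for the Hub bucket and is not the `huggingface_hub` API. The key names (`deltas/step-N`) and class names are hypothetical, and the weights are plain Python lists.

```python
from dataclasses import dataclass, field

@dataclass
class ToyBucket:
    """In-memory stand-in for a shared object store (NOT the Hub API)."""
    objects: dict = field(default_factory=dict)

    def put(self, key: str, value) -> None:
        self.objects[key] = value

    def get(self, key: str):
        return self.objects[key]

class ToyTrainer:
    def __init__(self, bucket: ToyBucket, weights: list):
        self.bucket, self.weights, self.step = bucket, list(weights), 0

    def publish(self, new_weights: list) -> int:
        """Upload only the changed positions and return the step to fetch."""
        delta = {i: v for i, (old, v) in enumerate(zip(self.weights, new_weights)) if old != v}
        self.step += 1
        self.bucket.put(f"deltas/step-{self.step}", delta)
        self.weights = list(new_weights)
        return self.step

class ToyInferenceWorker:
    def __init__(self, bucket: ToyBucket, weights: list):
        self.bucket, self.weights, self.step = bucket, list(weights), 0

    def fetch(self, step: int) -> None:
        """Called when told a new step exists; replays every delta it missed."""
        for s in range(self.step + 1, step + 1):
            for i, v in self.bucket.get(f"deltas/step-{s}").items():
                self.weights[i] = v
        self.step = step

bucket = ToyBucket()
init = [0.0, 1.0, 2.0, 3.0]
trainer, worker = ToyTrainer(bucket, init), ToyInferenceWorker(bucket, init)
trainer.publish([0.0, 1.5, 2.0, 3.0])
latest = trainer.publish([0.0, 1.5, 2.0, 2.5])
worker.fetch(latest)   # the worker only hears about step 2
print(worker.weights)  # [0.0, 1.5, 2.0, 2.5]
```

Each delta is relative to the step before it. A worker that misses a notification must therefore replay the intermediate deltas in order, which is why `fetch` loops over every step it has not yet seen.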
Source: Shipping a Trillion Parameters With a Hub Bucket: Delta Weight Sync in TRL
Domain: huggingface.co