
Hugging Face's Open R1 Distills DeepSeek-R1, Beats Claude 3.7 Sonnet on IOI24

Hugging Face has released the first stage of a fully open reproduction of DeepSeek-R1, including a 350K-trace reasoning dataset and a 7B model that beats Claude 3.7 Sonnet on hard olympiad problems.

huggingface, deepseek r1, open r1, reasoning, large language models, codeforces

A 7B parameter model trained on 100k competitive programming solutions from DeepSeek-R1 beats Claude 3.7 Sonnet on the IOI24 benchmark—and a 32B version surpasses DeepSeek-R1 itself. That's the headline from Hugging Face's Open R1 project, which just shipped the first reproducible chunk of the DeepSeek-R1 pipeline.

Open R1 Hits Step One: 350k Verified Reasoning Traces

The Open R1 repo reproduces DeepSeek-R1 in three planned steps. Step 1 is now complete: Hugging Face released Mixture-of-Thoughts, a curated dataset of 350k verified reasoning traces distilled from DeepSeek-R1. The dataset covers mathematics, coding, and science and is designed to teach step-by-step reasoning. Alongside it came OpenR1-Distill-7B, a model trained on that data that replicates the reasoning capabilities of DeepSeek's own 7B distill, DeepSeek-R1-Distill-Qwen-7B. The training code, including the SFT and GRPO scripts, lives in src/open_r1/ and runs on a vLLM + FlashAttention stack pinned to CUDA 12.4 and PyTorch 2.6.0.
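The article doesn't describe how the traces were verified. The sketch below is only a rough illustration of answer-based filtering for math traces. It assumes each trace ends with a final answer in \boxed{...} and carries a reference answer. The Trace fields and helper names are hypothetical, and the verifiers in the actual Open R1 pipeline are not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trace:
    # Hypothetical record layout; not the Mixture-of-Thoughts schema.
    problem_id: str
    completion: str
    reference_answer: str


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...}, respecting nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = content_start = start + len("\\boxed{")
    depth = 1
    while i < len(text):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[content_start:i]
        i += 1
    return None  # unbalanced braces: treat as no answer


def normalize(answer: str) -> str:
    """Crude normalization: drop spaces and a trailing period."""
    return answer.replace(" ", "").rstrip(".")


def is_verified(trace: Trace) -> bool:
    predicted = extract_boxed(trace.completion)
    return predicted is not None and normalize(predicted) == normalize(trace.reference_answer)


def filter_verified(traces: List[Trace]) -> List[Trace]:
    """Keep only traces whose final boxed answer matches the reference."""
    return [t for t in traces if is_verified(t)]


traces = [
    Trace("p1", "... so the answer is \\boxed{\\frac{1}{2}}", "\\frac{1}{2}"),
    Trace("p2", "... giving \\boxed{3}", "4"),
    Trace("p3", "no final answer given", "5"),
]
print([t.problem_id for t in filter_verified(traces)])  # ['p1']
```

Exact string matching is deliberately simple here. Real math verifiers also have to treat equivalent forms (for example 0.5 and \frac{1}{2}) as equal, and code traces would be checked by running tests rather than by comparing answers.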

From Math to Code: Datasets That Outperform the Originals

Hugging Face didn't stop at math. They released CodeForces-CoTs, a dataset of 10k competitive programming problems with 100k solution traces distilled from R1. Also new: IOI24, a benchmark of very hard problems from the 2024 International Olympiad in Informatics. A 7B Qwen model trained on CodeForces-CoTs outperforms Claude 3.7 Sonnet on IOI24, and a 32B version tops DeepSeek-R1 itself. Earlier, they released OpenR1-Math-220k: 220k R1 traces generated on a new version of the NuminaMath dataset. Models trained on it match the performance of DeepSeek's distilled models.
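What "outperforms on IOI24" measures depends on how olympiad solutions are scored. The sketch below assumes the standard IOI rule: a subtask's points are awarded only when every test case in it passes, and a task's score is the sum of the best result per subtask across all submissions. The data layout, names, and point split are hypothetical. This is not the benchmark's actual evaluation harness.

```python
from typing import Dict, List

# Per-submission results: subtask name -> pass/fail for each of its test cases.
SubmissionResults = Dict[str, List[bool]]


def subtask_scores(results: SubmissionResults, points: Dict[str, int]) -> Dict[str, int]:
    """Award a subtask's full points only if every one of its test cases passed."""
    # A subtask missing from the results counts as failed.
    return {
        name: points[name] if all(results.get(name, [False])) else 0
        for name in points
    }


def task_score(submissions: List[SubmissionResults], points: Dict[str, int]) -> int:
    """Sum, over subtasks, the best score any submission achieved on that subtask."""
    best = {name: 0 for name in points}
    for results in submissions:
        for name, score in subtask_scores(results, points).items():
            best[name] = max(best[name], score)
    return sum(best.values())


points = {"subtask1": 30, "subtask2": 70}  # hypothetical point split
sub_a = {"subtask1": [True, True], "subtask2": [True, False]}
sub_b = {"subtask1": [False, True], "subtask2": [True, True]}
print(task_score([sub_a], points))  # 30
print(task_score([sub_a, sub_b], points))  # 100
```

With per-subtask maxima, two partially correct submissions can combine into a full score. That is one reason the number of allowed submissions matters when comparing models on a benchmark like this.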

Training Recipe for Reproducibility

Every step is scripted and configurable. The reference commands use accelerate launch with DeepSpeed ZeRO-3 on a single node of 8 x H100 GPUs; on other topologies, adjust the batch sizes accordingly. The repo's src/open_r1/ directory includes sft.py, grpo.py, and generate.py for supervised fine-tuning, reinforcement learning, and synthetic data generation, and the configs live in recipes/. Want to reproduce OpenR1-Distill-7B? The README gives the exact accelerate launch command, with --model_name_or_path open-r1/Qwen2.5-Math-7B-RoPE-300k, --dataset_name open-r1/Mixture-of-Thoughts, and --max_seq_length 32768.
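When you move to a different GPU count, the usual approach is to keep the global batch size constant. The global batch size is per-device batch × GPUs × gradient-accumulation steps. The sketch below shows that arithmetic and assembles an illustrative argv.

A few assumptions to note:
- The reference values (2 per device, 8 accumulation steps) are hypothetical, not the recipe's actual settings.
- The accelerate config path recipes/accelerate_configs/zero3.yaml is assumed.
- Only the three flags named above come from the article. --per_device_train_batch_size and --gradient_accumulation_steps are standard Hugging Face training arguments.

Use the README's exact command for a real run.

```python
from typing import List


def global_batch_size(per_device: int, num_gpus: int, grad_accum: int) -> int:
    """Samples consumed per optimizer step across all GPUs."""
    return per_device * num_gpus * grad_accum


def grad_accum_for(target_global: int, per_device: int, num_gpus: int) -> int:
    """Gradient-accumulation steps needed to reach target_global on a new topology."""
    per_step = per_device * num_gpus
    if target_global % per_step != 0:
        raise ValueError(
            f"global batch {target_global} is not divisible by {per_device} x {num_gpus}"
        )
    return target_global // per_step


def build_sft_command(num_gpus: int, per_device: int, grad_accum: int) -> List[str]:
    """Illustrative accelerate launch argv for sft.py; not the README's exact command."""
    return [
        "accelerate", "launch",
        "--config_file", "recipes/accelerate_configs/zero3.yaml",  # assumed path
        "--num_processes", str(num_gpus),
        "src/open_r1/sft.py",
        "--model_name_or_path", "open-r1/Qwen2.5-Math-7B-RoPE-300k",
        "--dataset_name", "open-r1/Mixture-of-Thoughts",
        "--max_seq_length", "32768",
        "--per_device_train_batch_size", str(per_device),
        "--gradient_accumulation_steps", str(grad_accum),
    ]


# Hypothetical reference run: 2 per device x 8 GPUs x 8 accumulation steps = 128.
reference = global_batch_size(per_device=2, num_gpus=8, grad_accum=8)
# Keeping that global batch on a 4-GPU node doubles the accumulation steps.
accum_4gpu = grad_accum_for(reference, per_device=2, num_gpus=4)
print(" ".join(build_sft_command(4, 2, accum_4gpu)))
```

Raising the accumulation steps rather than the per-device batch keeps per-GPU memory unchanged. The trade-off is wall-clock time: each optimizer step now takes more forward/backward passes.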

Step 2 will replicate the pure RL pipeline that created DeepSeek-R1-Zero, and Step 3 will show the multi-stage path from a base model to an RL-tuned one. The scaffolding is public, so the recipe behind one of 2025's most influential reasoning models is no longer a black box.
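DeepSeek-R1-Zero was trained with GRPO using rule-based rewards instead of a learned reward model. Those rewards checked answer accuracy and whether the reasoning sits inside <think>...</think> tags. The toy sketch below shows both pieces, plus GRPO's group-relative advantage: each sampled completion's reward is normalized against the other completions for the same prompt.

Simplifying assumptions:
- Answers are matched as exact strings.
- The group normalization uses the population standard deviation. Implementations differ here; some use the sample std.

The function names are hypothetical. This is not the repo's grpo.py.

```python
import re
import statistics
from typing import List

# Completion must open with a <think>...</think> block, followed by the answer.
THINK_PATTERN = re.compile(r"<think>.*?</think>.*", re.DOTALL)


def format_reward(completion: str) -> float:
    """1.0 if the reasoning is wrapped in <think> tags, else 0.0."""
    return 1.0 if THINK_PATTERN.fullmatch(completion) else 0.0


def accuracy_reward(completion: str, reference: str) -> float:
    """1.0 if the text after the last </think> exactly matches the reference answer."""
    answer = completion.split("</think>")[-1].strip()
    return 1.0 if answer == reference.strip() else 0.0


def group_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """GRPO-style advantage: (r_i - mean) / (std + eps) within one prompt's group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# One prompt, a group of three sampled completions.
reference = "4"
completions = ["<think>2+2=4</think>4", "<think>guess</think>5", "4"]
rewards = [format_reward(c) + accuracy_reward(c, reference) for c in completions]
advantages = group_advantages(rewards)
print(rewards)  # [2.0, 1.0, 1.0]
print(advantages)
```

Because advantages are relative within the group, a correct but badly formatted answer still scores below a correct, well-formatted one. If every completion earns the same reward, all advantages are zero and that prompt contributes no learning signal.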


Source: Open Reproduction of DeepSeek-R1
Domain: github.com
