smolR1

A reproducible, small-scale DeepSeek R1 implementation using Qwen2.5-0.5B on two 4090 GPUs. It provides a compact, stable GRPO baseline for rapid RL experimentation.

Overview

smolR1 reproduces DeepSeek's R1 at the smallest scale, using Qwen2.5-0.5B on two 4090 GPUs.
It is meant as a smol and stable baseline for rapid experimentation.
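
For orientation, the core idea of GRPO, the algorithm behind this baseline, is to score each sampled completion relative to the other completions drawn for the same prompt, rather than against a learned value function. The sketch below is a minimal illustration of that group-relative normalization and is not taken from this repository: the function name `group_relative_advantages`, the `eps` value, and the choice of population standard deviation are all assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper for illustration; eps and the use of population
    standard deviation are assumptions, not this project's settings.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt, scored 1.0 if correct and 0.0 otherwise
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Correct completions end up with positive advantages and incorrect ones with negative advantages. When every completion in a group receives the same reward, all advantages are zero and that group contributes no learning signal.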