This repository is the official PyTorch implementation of Pion Optimizer, by Kexuan Shi, Hanxuan Li, Zeju Qiu, Yandong Wen, Simon Buchholz, Weiyang Liu.
The code is coming soon. Stay tuned. :)
The RL experiments are built on top of verl. Please follow the installation instructions in verl/README.md to set up the environment.
Before running, you need to edit the scripts and replace the placeholder paths:
- /path/to/your/dataset/ : path to the preprocessed dataset (see verl data preparation)
- /path/to/your/model : path to the pretrained model
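If you prefer not to edit each script by hand, the substitution can be scripted. The helper below is a minimal sketch, not part of this repository. It assumes the scripts contain the two placeholder strings literally, that GNU sed is available (BSD/macOS sed needs `-i ''`), and that your paths contain no `#` or `&` characters. The function name `fill_placeholders` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the repo): replace the placeholder
# paths in a launch script in place.
#   $1: script to edit
#   $2: preprocessed dataset directory
#   $3: pretrained model path
fill_placeholders() {
  local script="$1" data_dir="$2" model_dir="$3"
  # '#' is the sed delimiter because the paths themselves contain '/'.
  # ${data_dir%/} drops a trailing slash so the result has exactly one '/'
  # before whatever file name followed the dataset placeholder.
  sed -i \
    -e "s#/path/to/your/dataset/#${data_dir%/}/#g" \
    -e "s#/path/to/your/model#${model_dir}#g" \
    "$script"
}
```

Example usage from inside `verl`, with illustrative paths: `fill_placeholders examples/grpo_trainer/run_qwen3_1.7b_pion_deepmath.sh /data/deepmath /models/Qwen3-1.7B`.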
We provide ready-to-use scripts for training Qwen3-1.7B and DeepSeek-R1-Distilled-Qwen-1.5B on the DeepMath dataset using GRPO with the Pion optimizer:
cd verl
bash examples/grpo_trainer/run_qwen3_1.7b_pion_deepmath.sh # for Qwen3-1.7B
bash examples/grpo_trainer/run_distilled_pion_deepmath.sh # for DeepSeek-R1-Distilled-Qwen-1.5B

To run baseline comparisons with AdamW and Muon:
# Qwen3-1.7B
bash examples/grpo_trainer/run_qwen3_1.7b_adamw_deepmath.sh # AdamW
bash examples/grpo_trainer/run_qwen3_1.7b_muon_deepmath.sh # Muon
# DeepSeek-R1-Distilled-Qwen-1.5B
bash examples/grpo_trainer/run_distilled_adamw_deepmath.sh # AdamW
bash examples/grpo_trainer/run_distilled_muon_deepmath.sh # Muon
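To reproduce a full Pion vs. AdamW vs. Muon comparison for one model, the three scripts can be run back to back. The function below is a hypothetical convenience wrapper, not part of the repository. It relies only on the script naming pattern shown above (`<prefix>_<optimizer>_deepmath.sh`), must be called from inside `verl`, and runs the jobs sequentially on the assumption that each one needs the same GPUs.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run the Pion, AdamW and Muon scripts for one model in turn.
#   $1: script prefix, e.g. "run_qwen3_1.7b" or "run_distilled"
run_optimizer_sweep() {
  local prefix="$1" opt script
  for opt in pion adamw muon; do
    script="examples/grpo_trainer/${prefix}_${opt}_deepmath.sh"
    echo "==> ${script}"
    # Stop at the first failing run instead of launching the remaining ones.
    bash "$script" || { echo "failed: ${script}" >&2; return 1; }
  done
}
```

For example, `run_optimizer_sweep run_qwen3_1.7b` runs the three Qwen3-1.7B scripts, and `run_optimizer_sweep run_distilled` runs the three DeepSeek-R1-Distilled-Qwen-1.5B scripts.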