Pion: A Spectrum-Preserving Optimizer via Orthogonal Equivalence Transformation

This repository is the official PyTorch implementation of the Pion optimizer, by Kexuan Shi, Hanxuan Li, Zeju Qiu, Yandong Wen, Simon Buchholz, and Weiyang Liu.

The code is coming soon. Stay tuned. :)

Running RL Experiments

Environment Setup

The RL experiments are built on top of verl. Please follow the installation instructions in verl/README.md to set up the environment.

Before running, you need to edit the scripts and replace the placeholder paths:

/path/to/your/dataset/ — path to the preprocessed dataset (see verl data preparation)
/path/to/your/model — path to the pretrained model.
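
The placeholder replacement can also be scripted rather than done by hand. The following is a minimal sketch, not part of the repository: the helper name `replace_placeholders` and the example target paths are hypothetical, and it assumes your paths contain no `|` or `&` characters (both are special to `sed` in this form).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the repository): substitute the two
# placeholder paths in a single launch script.
#   $1 = script to edit, $2 = dataset directory, $3 = model directory
replace_placeholders() {
  local script="$1" data_dir="$2" model_dir="$3"
  # '|' is used as the sed delimiter because the paths contain '/'.
  # -i.bak edits in place and keeps a backup copy at "$script.bak".
  # ${data_dir%/}/ normalizes the dataset path to exactly one trailing slash.
  sed -i.bak \
    -e "s|/path/to/your/dataset/|${data_dir%/}/|g" \
    -e "s|/path/to/your/model|${model_dir}|g" \
    "$script"
}

# Example usage, run from the verl directory (the target paths are examples only):
# replace_placeholders examples/grpo_trainer/run_qwen3_1.7b_pion_deepmath.sh \
#   "$HOME/data/deepmath" "$HOME/models/Qwen3-1.7B"
```

The `.bak` copy makes it easy to restore the original script if the substitution is not what you wanted.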

Running GRPO Training with Pion Optimizer

We provide ready-to-use scripts for GRPO training with the Pion optimizer on the DeepMath dataset, one for Qwen3-1.7B and one for DeepSeek-R1-Distilled-Qwen-1.5B:

cd verl
bash examples/grpo_trainer/run_qwen3_1.7b_pion_deepmath.sh # for Qwen3-1.7B
bash examples/grpo_trainer/run_distilled_pion_deepmath.sh # for DeepSeek-R1-Distilled-Qwen-1.5B

To run baseline comparisons with AdamW and Muon:

# Qwen3-1.7B
bash examples/grpo_trainer/run_qwen3_1.7b_adamw_deepmath.sh   # AdamW
bash examples/grpo_trainer/run_qwen3_1.7b_muon_deepmath.sh    # Muon

# DeepSeek-R1-Distilled-Qwen-1.5B
bash examples/grpo_trainer/run_distilled_adamw_deepmath.sh    # AdamW
bash examples/grpo_trainer/run_distilled_muon_deepmath.sh     # Muon
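
To compare all three optimizers on one model, the scripts can be run back to back. This is only a sketch under stated assumptions: the helper `script_for`, the `MODEL_TAG` variable, and the `logs/` directory are hypothetical conveniences, and the path pattern is inferred from the script names listed above. Each run is a full training job, so the loop runs them sequentially.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a launch-script path from a model tag and an
# optimizer name, following the naming pattern of the scripts listed above.
script_for() {
  local model_tag="$1" optimizer="$2"
  echo "examples/grpo_trainer/run_${model_tag}_${optimizer}_deepmath.sh"
}

# Run from the verl directory. MODEL_TAG is "qwen3_1.7b" or "distilled".
MODEL_TAG="${MODEL_TAG:-qwen3_1.7b}"
mkdir -p logs   # assumed location for per-run logs

for optimizer in pion adamw muon; do
  script="$(script_for "$MODEL_TAG" "$optimizer")"
  if [ -f "$script" ]; then
    # Keep a copy of each run's output so the runs can be compared later.
    bash "$script" 2>&1 | tee "logs/${MODEL_TAG}_${optimizer}.log"
  else
    echo "skipping missing script: $script" >&2
  fi
done
```

Setting `MODEL_TAG=distilled` before invoking the loop switches it to the DeepSeek-R1-Distilled-Qwen-1.5B scripts.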
