Orthogonal Model Merging

The Chinese University of Hong Kong

🌐 Homepage | 📑 Paper | 📖 arXiv | 🤗 Models

🔔News

🎉[2026-05-01]: OrthoMerge is accepted by ICML 2026.

🔥[2026-02-06]: We released our paper, models, and code.

Introduction

We introduce a geometry-preserving model merging framework, called Orthogonal Model Merging (OrthoMerge). For models trained with Orthogonal Finetuning (OFT), the orthogonal matrices representing these transformations are explicit. We map task-specific orthogonal transformations into the Lie algebra, where we perform a magnitude-corrected integration that accounts for both the direction and the intensity of the adaptations. Furthermore, we extend this strategy to models finetuned via standard additive methods (e.g., LoRA, full finetuning), where explicit orthogonal transformations are absent. We introduce an Orthogonal-Residual Decoupling strategy that solves the orthogonal Procrustes problem to extract the implicit orthogonal component from finetuned models. This allows us to merge the orthogonal components of the adaptation on the manifold, while handling the residuals by traditional merging in Euclidean space.
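The Lie-algebra merging step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the OrthoMerge implementation: it uses NumPy rather than the repository's code, the task adaptations are given directly as skew-symmetric generators (as if the map into the Lie algebra had already been applied), and the magnitude correction is assumed to rescale the averaged generator to the mean norm of the inputs. The names `skew`, `exp_skew`, and `merge_generators` are hypothetical.

```python
import numpy as np


def skew(M: np.ndarray) -> np.ndarray:
    """Project a square matrix onto the skew-symmetric matrices (the Lie algebra so(d))."""
    return 0.5 * (M - M.T)


def exp_skew(A: np.ndarray) -> np.ndarray:
    """Matrix exponential of a real skew-symmetric A, giving a rotation in SO(d)."""
    # H = -iA is Hermitian, so exp(A) = exp(iH) = V diag(exp(i w)) V^H.
    w, V = np.linalg.eigh(-1j * A)
    return ((V * np.exp(1j * w)) @ V.conj().T).real


def merge_generators(generators, eps: float = 1e-12) -> np.ndarray:
    """Average the generators, restore their mean norm, and map back to SO(d)."""
    A_mean = np.mean(generators, axis=0)
    # Assumed magnitude correction: plain averaging shrinks the norm when
    # the task directions disagree, so rescale to the mean input norm.
    target = np.mean([np.linalg.norm(A) for A in generators])
    scale = target / (np.linalg.norm(A_mean) + eps)
    return exp_skew(scale * A_mean)


rng = np.random.default_rng(0)
A1 = 0.3 * skew(rng.normal(size=(4, 4)))  # stand-in generator for task 1
A2 = 0.3 * skew(rng.normal(size=(4, 4)))  # stand-in generator for task 2
R = merge_generators([A1, A2])
print(np.linalg.norm(R.T @ R - np.eye(4)))  # orthogonality error of the merged matrix
```

Because the merge happens in the Lie algebra and is mapped back with the exponential, the result stays on the orthogonal group by construction, which averaging the orthogonal matrices directly would not guarantee.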

A comparison of (a) current model merging, (b) our orthogonal merging, and (c) our orthogonal-residual decoupling merging.

An illustration of OrthoMerge. (a) To merge orthogonal transformations, we first map them to the Lie algebra so(d) of skew-symmetric matrices, perform the merging there with magnitude correction to preserve the strength of the transformations, and finally map the result back to the orthogonal group. (b) For general models, we decouple weights into orthogonal and residual components, merging them separately on the Riemannian manifold formed by the orthogonal group and in Euclidean space, respectively.
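The orthogonal-residual decoupling in (b) can be sketched in a few lines. This is an illustrative NumPy sketch, not the repository's code: it assumes the right-multiplication convention W0 @ R ≈ W1 and assumes the residual is simply what the rotation leaves unexplained, W1 - W0 @ R. The function name `decouple` and the toy weights are hypothetical.

```python
import numpy as np


def decouple(W1: np.ndarray, W0: np.ndarray):
    """Split W1 into an orthogonal part R and a residual: W1 = W0 @ R + residual."""
    # Orthogonal Procrustes: R minimizes ||W0 @ R - W1||_F over orthogonal R.
    U, _, Vh = np.linalg.svd(W0.T @ W1, full_matrices=False)
    R = U @ Vh
    # Assumed residual definition: the part the rotation does not explain.
    residual = W1 - W0 @ R
    return R, residual


rng = np.random.default_rng(0)
W0 = rng.normal(size=(6, 4))                   # stand-in base weight
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # a random orthogonal matrix
W1 = W0 @ Q + 0.01 * rng.normal(size=(6, 4))   # rotated base plus a small additive change

R, residual = decouple(W1, W0)
print(np.linalg.norm(R.T @ R - np.eye(4)))     # orthogonality error of R
print(np.linalg.norm(residual), np.linalg.norm(W1 - W0))  # residual vs. plain additive delta
```

With this split, the R factors of several finetuned models can be merged on the orthogonal group, while the residuals are ordinary matrices that can be combined with an existing Euclidean merging method.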

Quick Start

Installation

git clone https://github.com/Sphere-AI-Lab/OrthoMerge.git
conda create -n OrthoMerge python=3.10 -y
conda activate OrthoMerge
cd OrthoMerge
pip install -r requirements.txt

Models for Merging Experiments

We use the following base models and task-specific finetuned models in our experiments.

1. Merging OFT Models

Base Model: meta-llama/Llama-3.1-8B
Task-Specific Adapters: SphereLab/Llama-3.1-8B_OFT_adapters

2. Merging Non-OFT Models

Llama 3.2 Experiments:

Base Model: meta-llama/Llama-3.2-3B
Task-Specific Models: MergeBench Collection (Llama-3.2-3B)

Qwen 2.5 VL Experiments:

Base Model: Qwen/Qwen2.5-VL-7B-Instruct
Task-Specific Models:

Merge

# For OFT models
bash scripts/OrthoMerge_OFT_models.sh

# For non-OFT models
bash scripts/OrthoMerge_non_OFT_models.sh

Replace SVD with Newton–Schulz Iteration

The layer-wise Procrustes step is solved with an SVD per layer, which is slow, especially for larger models. To speed it up, the SVD-based solver can be replaced with a Newton–Schulz (NS) iteration, which uses only matrix multiplications. NS iteration significantly improves efficiency while maintaining comparable performance.

Original SVD version

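import torch

def orthogonal_procrustes_torch_right(W1: torch.Tensor, W0: torch.Tensor) -> torch.Tensor:
    # Find the orthogonal R that minimizes ||W0 @ R - W1||_F, i.e. W0 @ R ≈ W1
    A = torch.matmul(W0.t(), W1)
    # With A = U S Vh, the minimizer is the orthogonal polar factor U @ Vh
    U, _, Vh = torch.linalg.svd(A, full_matrices=False)
    return torch.matmul(U, Vh)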

Newton–Schulz version

import torch

def orthogonal_procrustes_torch_right(
    W1: torch.Tensor,
    W0: torch.Tensor,
    steps: int = 5,
) -> torch.Tensor:
    # Find R such that W0 @ R ≈ W1; the iteration approximates the
    # orthogonal polar factor of G without computing an SVD
    G = torch.matmul(W0.t(), W1)

    # Newton-Schulz coefficients (quintic iteration)
    a, b, c = 3.4445, -4.7750, 2.0315

    # Use fp32 for numerical stability
    X = G.float()

    # Normalize before iteration (out of place, so G is not modified
    # when G.float() returns G itself)
    X = X / (X.norm() + 1e-7)

    # Usually G is square; keep this for rectangular safety
    transposed = False
    if X.size(0) > X.size(1):
        X = X.T
        transposed = True

    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X

    if transposed:
        X = X.T

    return X.to(G.dtype)
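To see why a Newton–Schulz iteration can stand in for the SVD, the sketch below compares the two on a small example. It is an illustration under assumptions, not the repository's solver: it uses NumPy, and it swaps in the classical cubic iteration X ← 1.5 X − 0.5 X Xᵀ X instead of the tuned quintic coefficients above, because the cubic form converges to the exact polar factor and is easy to check. The helper names `polar_svd`, `polar_newton_schulz`, and `orthogonality_error` are hypothetical.

```python
import numpy as np


def polar_svd(G: np.ndarray) -> np.ndarray:
    """Orthogonal polar factor of G via SVD (the reference solution)."""
    U, _, Vh = np.linalg.svd(G, full_matrices=False)
    return U @ Vh


def polar_newton_schulz(G: np.ndarray, steps: int = 30) -> np.ndarray:
    """Classical cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X."""
    # Frobenius normalization puts every singular value of a full-rank G
    # in (0, 1], where the cubic iteration drives each one toward 1.
    X = G / (np.linalg.norm(G) + 1e-7)
    for _ in range(steps):
        X = 1.5 * X - 0.5 * (X @ X.T @ X)
    return X


def orthogonality_error(R: np.ndarray) -> float:
    """Frobenius norm of R^T R - I; zero for an exactly orthogonal matrix."""
    return float(np.linalg.norm(R.T @ R - np.eye(R.shape[1])))


rng = np.random.default_rng(0)
W0 = np.eye(4) + 0.1 * rng.normal(size=(4, 4))   # well-conditioned stand-in base weight
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))     # a random orthogonal matrix
W1 = W0 @ Q + 0.01 * rng.normal(size=(4, 4))     # stand-in finetuned weight
G = W0.T @ W1

R_svd = polar_svd(G)
R_ns = polar_newton_schulz(G)
print(orthogonality_error(R_svd), orthogonality_error(R_ns))
print(np.linalg.norm(R_svd - R_ns))  # gap between the two solvers
```

A helper like `orthogonality_error` is also a cheap diagnostic when switching solvers: a fixed, small number of iterations only approximates the polar factor, so it may be worth checking how far the returned matrix is from orthogonal on a few layers.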

Evaluation

Evaluation relies on lmms-eval, lm-eval-harness, bigcode-eval, and safety-eval. To set up each environment, please follow the setup instructions provided in the respective repository.

# For OFT models
bash scripts/OrthoMerge_OFT_models.sh

# For non-OFT models
bash scripts/OrthoMerge_non_OFT_models.sh

Citation

If you find our work and this codebase helpful, please consider starring this repo and citing:

  @InProceedings{yang2026orthomerge,
      title={Orthogonal Model Merging},
      author={Yang, Sihan and Shi, Kexuan and Liu, Weiyang},
      booktitle={ICML},
      year={2026}
  }

Contact

Sihan Yang: sihany077@gmail.com
