This is an implementation of SpinQuant and QuaRot for models such as Qwen. We do not intend to reproduce SpinQuant and QuaRot exactly; instead, we provide a framework for customizing rotation operations for any model you want to use.
We provide a unified interface to rotate a model.
```python
import rotate

...  # do whatever you want
rotate.rotate_model(model, ...)  # parameters are customizable
```

You can find an example for Qwen2ForCausalLM and Qwen2VLForConditionalGeneration in qwen2.5-instruct.py.
The rotation operation on a model can be viewed as sequentially executing a series of predefined operations. To add rotation support for a model abc, first create abc.py in rotate/model and define the operations as follows:
```python
from ..common import RotateOperationRegistry


# register the first step of the rotation for model abc
@RotateOperationRegistry.register(abc)
def first_operation(model: abc, ...):
    ...  # do whatever you want


# register the second step of the rotation for model abc
@RotateOperationRegistry.register(abc)
def second_operation(model: abc, ...):
    ...  # do whatever you want
```

After that, rotate.rotate_model(model, ...) will sequentially call first_operation and second_operation on the model.
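To make the registration mechanism concrete, here is a minimal, self-contained sketch of a registry that stores operations per model class and runs them in registration order. ToyOperationRegistry, ToyModel, and the run method are hypothetical names chosen for illustration; this is not the actual RotateOperationRegistry from common.py, whose internals may differ.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class ToyOperationRegistry:
    """Hypothetical stand-in for RotateOperationRegistry (illustration only)."""

    # Maps a model class to its operations, kept in registration order.
    _operations: Dict[type, List[Callable]] = defaultdict(list)

    @classmethod
    def register(cls, model_type: type) -> Callable:
        def decorator(func: Callable) -> Callable:
            cls._operations[model_type].append(func)
            return func

        return decorator

    @classmethod
    def run(cls, model, **kwargs) -> None:
        # Call every operation registered for this exact model class, in order.
        for op in cls._operations[type(model)]:
            op(model, **kwargs)


class ToyModel:
    """Hypothetical model that records which operations touched it."""

    def __init__(self):
        self.log = []


@ToyOperationRegistry.register(ToyModel)
def first_operation(model, **kwargs):
    model.log.append("first")


@ToyOperationRegistry.register(ToyModel)
def second_operation(model, **kwargs):
    model.log.append("second")


model = ToyModel()
ToyOperationRegistry.run(model)
print(model.log)  # ['first', 'second']
```

Keeping the operations in a list per model class is what makes the execution order follow the order of the decorators in the file.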
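To keep the model's output unchanged after rotation, we first fuse the affine parameters of each norm into the adjacent linear module. The elementwise scale of a norm does not commute with a rotation, so it has to be moved out of the norm before rotating.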
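Formally, in layer norm, we have

$$\mathrm{LayerNorm}(x) = \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} \odot \gamma + \beta,$$

where $\mu$ and $\sigma^2$ are the mean and variance of the elements of $x$. In RMS norm, we have

$$\mathrm{RMSNorm}(x) = \frac{x}{\sqrt{\frac{1}{d}\sum_{i=1}^{d} x_i^2 + \epsilon}} \odot \gamma.$$

In LLMs, a norm is usually followed by a linear layer $\mathrm{Linear}(h) = Wh + b$. Writing $\hat{x}$ for the normalized input before scale and shift, this implies that

$$\mathrm{Linear}(\mathrm{Norm}(x)) = W(\gamma \odot \hat{x} + \beta) + b = (W \operatorname{diag}(\gamma))\,\hat{x} + (W\beta + b),$$

so $\gamma$ and $\beta$ can be absorbed into the weight and bias of the linear layer (with $\beta = 0$ for RMS norm).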
This is done by fuse_layer_norms in rotation_utils.py.
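The fusion can be checked numerically. The following is a minimal pure-Python sketch for the RMS norm case, assuming a linear layer without bias; it does not use or reproduce fuse_layer_norms, and all helper names (rms_normalize, matvec) are made up for this example. It folds the norm scale into the input columns of the weight and confirms that the output is unchanged.

```python
import math
import random


def rms_normalize(x, eps=1e-6):
    """RMS normalization without the learnable scale."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


def matvec(W, x):
    """Compute W @ x for a matrix W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


random.seed(0)
d_in, d_out = 4, 3
x = [random.gauss(0, 1) for _ in range(d_in)]
gamma = [random.uniform(0.5, 1.5) for _ in range(d_in)]  # norm scale
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]

# Original: the linear layer consumes the scaled, normalized input.
x_hat = rms_normalize(x)
y_ref = matvec(W, [g * v for g, v in zip(gamma, x_hat)])

# Fused: scale each input column of W by gamma; the norm scale becomes 1.
W_fused = [[w * g for w, g in zip(row, gamma)] for row in W]
y_fused = matvec(W_fused, x_hat)

max_err = max(abs(a - b) for a, b in zip(y_ref, y_fused))
print(max_err < 1e-9)  # True
```

After fusion the norm only normalizes, which is what allows a rotation to pass through it.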
The key problem is how fuse_layer_norms should identify the norm layers and their succeeding linear layers in diverse model architectures.
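In our framework, to support a model like abc, you must implement a NormLinearIterator in abc.py that iterates through the model and yields all (father, norm_name, linears) tuples, where father is the module that owns the norm, norm_name is the attribute name of the norm, and linears are the linear layers that consume its output. An example from qwen.py is shown below: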
```python
import torch.nn as nn
from transformers import Qwen2ForCausalLM, Qwen2VLForConditionalGeneration

from ..common import NormLinearIterator


@NormLinearIterator.register_iterator
class Qwen2NormLinearIterator(NormLinearIterator):
    def __init__(self, model: Qwen2ForCausalLM):
        super().__init__()
        self.model = model

    def __iter__(self):
        for layer in self.model.model.layers:
            # the attention norm feeds the q/k/v projections
            yield layer, "input_layernorm", [
                layer.self_attn.q_proj,
                layer.self_attn.k_proj,
                layer.self_attn.v_proj,
            ]
            # the MLP norm feeds the up and gate projections
            yield layer, "post_attention_layernorm", [
                layer.mlp.up_proj,
                layer.mlp.gate_proj,
            ]
        # the final norm feeds the LM head
        yield self.model.model, "norm", [self.model.lm_head]

    @classmethod
    def supports_model(cls, model: nn.Module) -> bool:
        return isinstance(model, (Qwen2ForCausalLM, Qwen2VLForConditionalGeneration))
```

Rotating a model can be viewed as applying rotations to the inputs or outputs of certain layers while keeping the model mathematically equivalent before and after rotation.
For different layer types (e.g., embedding and linear), the implementation of rotating their outputs varies. However, at an abstract level, both cases involve rotating outputs.
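The equivalence can be illustrated with a toy example: if the outputs of an embedding are rotated by an orthogonal matrix Q and the inputs of the following linear layer are rotated by the same Q, the two rotations cancel. The sketch below uses pure Python and a fixed 4x4 normalized Hadamard matrix. It is an illustration under these assumptions, not the framework's code, and all names in it are hypothetical.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# A 4x4 normalized Hadamard matrix; it is orthogonal (Q @ Q^T = I).
Q = [[0.5 * v for v in row] for row in (
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
)]

random.seed(0)
vocab, hidden, out = 5, 4, 3
E = [[random.gauss(0, 1) for _ in range(hidden)] for _ in range(vocab)]  # embedding table
W = [[random.gauss(0, 1) for _ in range(hidden)] for _ in range(out)]  # linear weight (out x in)


def forward(E, W, token):
    """Look up one token and apply the linear layer: y = W @ E[token]."""
    h = E[token]
    return [sum(w * v for w, v in zip(row, h)) for row in W]


# Rotate the embedding outputs (h -> h Q) and the linear inputs (W -> W Q).
E_rot = matmul(E, Q)
W_rot = matmul(W, Q)

max_err = max(
    abs(a - b)
    for token in range(vocab)
    for a, b in zip(forward(E, W, token), forward(E_rot, W_rot, token))
)
print(max_err < 1e-9)  # True
```

The rotated linear layer computes W Q (h Q)^T = W Q Q^T h^T = W h^T, so the output matches the original for every token.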
To streamline the code logic, our framework introduces the AutoOperation class, which encapsulates the same operation across different layers. This eliminates the need for conditional statements when applying the same operation to different layer types.
For details, you can refer to common.py and qwen.py.
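As a rough illustration of the idea behind dispatching one abstract operation over several layer types, the sketch below uses functools.singledispatch from the Python standard library. This is not how AutoOperation is implemented in common.py; ToyEmbedding, ToyLinear, and rotate_output are hypothetical names, and the weight layouts are assumptions stated in the comments.

```python
from functools import singledispatch


class ToyEmbedding:
    def __init__(self, weight):
        self.weight = weight  # assumed shape (vocab, hidden); output is one row


class ToyLinear:
    def __init__(self, weight):
        self.weight = weight  # assumed shape (out, in); output is weight @ x


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


@singledispatch
def rotate_output(layer, Q):
    raise NotImplementedError(f"no output rotation for {type(layer).__name__}")


@rotate_output.register
def _(layer: ToyEmbedding, Q):
    # The output is a row of weight, so y Q comes from weight @ Q.
    layer.weight = matmul(layer.weight, Q)


@rotate_output.register
def _(layer: ToyLinear, Q):
    # The output is weight @ x, so y Q (as a row vector) comes from Q^T @ weight.
    layer.weight = matmul(transpose(Q), layer.weight)


Q = [[0.0, 1.0], [-1.0, 0.0]]  # a 90 degree rotation, which is orthogonal
emb = ToyEmbedding([[1.0, 2.0]])
lin = ToyLinear([[1.0, 0.0], [0.0, 2.0]])

# The caller applies the same operation without checking the layer type.
for layer in (emb, lin):
    rotate_output(layer, Q)

print(emb.weight)
print(lin.weight)
```

The loop at the end is the point of the design: the per-type logic lives in the registered implementations, so call sites need no conditional statements.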
Currently, the rotation matrices we use are all random Hadamard matrices, which may not achieve optimal performance. According to SpinQuant, we can adopt a QAT (Quantization-Aware Training)-like approach to learn the rotation matrices for better results. This functionality has not yet been implemented and remains a TODO item.
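For reference, one common way to build a random Hadamard matrix is to take a Sylvester Hadamard matrix, flip the sign of each column at random, and normalize by the square root of the size. The pure-Python sketch below follows that construction and checks orthogonality. It is an assumption about the general technique, not a copy of this repository's implementation, and the function names are hypothetical.

```python
import math
import random


def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    H = [[1]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H


def random_hadamard(n, rng):
    """Return H @ diag(s) / sqrt(n) with random signs s; the result is orthogonal."""
    signs = [rng.choice((-1, 1)) for _ in range(n)]
    scale = 1.0 / math.sqrt(n)
    return [[h * s * scale for h, s in zip(row, signs)] for row in hadamard(n)]


Q = random_hadamard(8, random.Random(0))

# Orthogonality check: Q @ Q^T should be the identity.
gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in Q] for r1 in Q]
max_dev = max(abs(gram[i][j] - (i == j)) for i in range(8) for j in range(8))
print(max_dev < 1e-9)  # True
```

A learned rotation, as proposed in SpinQuant, would replace Q with a trainable orthogonal matrix while keeping the rest of the pipeline unchanged.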
