[WIP] Generic quantization support for PEFT methods #3117
BenjaminBossan wants to merge 16 commits into huggingface:main
Conversation
Right now, if a new PEFT method wants to add support for quantized layers, it requires a significant amount of work. Notably, the method needs to implement dedicated layer classes for each quantization method (e.g. one class for bnb 4bit, one for bnb 8bit, one for AWQ, ...). The result of that is that, at the moment, most PEFT methods don't support any, or only very few, quantization methods, even though the amount of actual logic required to support these methods is quite contained. This PR is a suggestion of how to solve the issue. If this approach is accepted, with a few extra lines, we should be able to support all quantization methods in all PEFT methods. The PR is not in a finished state; more to follow. Right now, only VeRA and MiSS have been updated as a POC.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
no reason why it would be nn.Linear instead of nn.Module like the other PEFT methods
+ some docstring cleanups
Pull request overview
This PR introduces a generic “quantization backend” abstraction so PEFT tuner layers can support multiple quantization frameworks without needing per-backend layer subclasses, and wires it into VeRA and MiSS as an initial proof-of-concept.
Changes:
- Add `QuantizationBackend` implementations + backend resolution (`resolve_quantization_backend`) and a helper to surface backend info in the module repr.
- Extend `BaseTunerLayer` with `get_base_weight`/`set_base_weight` to centralize dequantize/requantize handling for merge/unmerge.
- Surface quantization backend info in `get_layer_status()`/`get_model_status()` and add tests (including a new quantization matrix test file).
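For orientation, here is a minimal sketch of what such a backend abstraction could look like. The class and function names mirror the summary above, but the exact signatures and dispatch logic in the PR may differ:

```python
import torch
import torch.nn as nn


class QuantizationBackend:
    """Sketch of a default backend: hides dequantize/requantize details from tuner layers."""

    def __init__(self, base_layer: nn.Module) -> None:
        self.base_layer = base_layer

    def get_base_weight(self) -> torch.Tensor:
        # For a plain nn.Linear there is nothing to dequantize.
        return self.base_layer.weight.data

    def set_base_weight(self, weight: torch.Tensor) -> None:
        # A bnb/torchao backend would requantize here instead of assigning directly.
        self.base_layer.weight.data = weight


def resolve_quantization_backend(base_layer: nn.Module) -> QuantizationBackend:
    # Hypothetical resolution logic: a real implementation would dispatch on the base
    # layer type (bnb Linear4bit/Linear8bitLt, torchao, ...) and return the matching backend.
    return QuantizationBackend(base_layer)
```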
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.
Summary per file:
| File | Description |
|---|---|
| `tests/test_tuners_utils.py` | Adds coverage for layer/model status reporting of `quantization_backend`. |
| `tests/test_quantization.py` | Adds a PEFT-method × quant-backend matrix test suite (bnb/torchao loaders and core behavioral checks). |
| `src/peft/utils/quantization_utils.py` | Introduces quantization backend classes, backend resolution logic, and repr helper. |
| `src/peft/utils/__init__.py` | Exports the new quantization helpers from `peft.utils`. |
| `src/peft/tuners/vera/model.py` | Updates VeRA injection to remove bnb-specific module creation and to forward torchao merge metadata. |
| `src/peft/tuners/vera/layer.py` | Hooks VeRA layers into the new backend mechanism for merge/unmerge and forward safety cloning. |
| `src/peft/tuners/vera/bnb.py` | Removes VeRA's dedicated bitsandbytes layer implementations (intended to be superseded by generic backend support). |
| `src/peft/tuners/tuners_utils.py` | Adds a `quantization_backend` attribute and centralized base-weight getters/setters to `BaseTunerLayer`. |
| `src/peft/tuners/miss/model.py` | Forwards torchao merge metadata during MiSS injection. |
| `src/peft/tuners/miss/layer.py` | Hooks MiSS layers into the new backend mechanism for merge/unmerge and forward safety cloning. |
| `src/peft/peft_model.py` | Extends tuner status dataclasses and status functions to report quantization backend consistency. |
Comments suppressed due to low confidence (1)
src/peft/tuners/vera/model.py:259
- This PR removes the dedicated VeRA bitsandbytes layer implementations, but `peft.tuners.vera` still has lazy attribute resolution for `Linear8bitLt`/`Linear4bit` via `from .bnb import ...` (see `src/peft/tuners/vera/__init__.py`). With `vera/bnb.py` deleted, those imports will raise at runtime, and existing tests/imports that reference `peft.tuners.vera.Linear8bitLt` will break. Please update the VeRA package exports to match the new generic quantization approach (either provide compatible aliases or remove the lazy attributes).
@staticmethod
def _create_new_module(vera_config, vera_A, vera_B, adapter_name, target, **kwargs):
bias = kwargs.pop("bias", False)
if isinstance(target, BaseTunerLayer):
target_base_layer = target.get_base_layer()
else:
target_base_layer = target
if isinstance(target_base_layer, torch.nn.Linear):
if kwargs["fan_in_fan_out"]:
warnings.warn(
"fan_in_fan_out is set to True but the target module is `torch.nn.Linear`. "
"Setting fan_in_fan_out to False."
)
kwargs["fan_in_fan_out"] = vera_config.fan_in_fan_out = False
elif isinstance(target_base_layer, Conv1D):
kwargs["is_target_conv_1d_layer"] = True
if not kwargs["fan_in_fan_out"]:
warnings.warn(
"fan_in_fan_out is set to False but the target module is `Conv1D`. Setting fan_in_fan_out to True."
)
kwargs["fan_in_fan_out"] = vera_config.fan_in_fan_out = True
else:
raise ValueError(
f"Target module {target} is not supported. Currently, only the following modules are supported: "
"`torch.nn.Linear`, `transformers.pytorch_utils.Conv1D`."
)
Also: skip bnb4bit + CPU
Conv not fully fleshed out
Forgot to update some tests
rotated_weight = torch.transpose(rotated_weight, 0, 1)
scaled_rotated_weight = rotated_weight * boft_scale
x_rotated = x @ boft_rotation
This is a reformulation of the forward path of BOFT that avoids using the base layer weight directly. This was necessary because calling torch.mm(boft_rotation, orig_weight) can fail with quantized weights. Instead, we should make a forward pass and let the quantized layer handle the details. I ran the BOFT tests with the old and the new implementation and added an assert that they are identical (up to precision).
Regarding runtime, I checked the MetaMath benchmark and got 147 sec for 250 steps (116 sec for 1 eval run) using the main branch, and 138 sec (108 sec) using the code from this branch. So the new code seems to be on par with, or possibly slightly faster than, the old one.
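A rough sketch of the reformulation (the function and variable names are illustrative, and bias/dtype handling is omitted): instead of materializing the rotated weight with `torch.mm(boft_rotation, orig_weight)`, the activations are rotated first so the (possibly quantized) base layer performs the matmul itself:

```python
import torch
import torch.nn as nn


def boft_style_forward(
    x: torch.Tensor, base_layer: nn.Module, boft_rotation: torch.Tensor, boft_scale: torch.Tensor
) -> torch.Tensor:
    """Sketch of the reformulated forward path; bias and dtype handling are omitted."""
    # Old path (for comparison): builds the rotated weight explicitly, which fails for
    # quantized weights because base_layer.weight is not a plain tensor:
    #   rotated_weight = torch.mm(boft_rotation, base_layer.weight.transpose(0, 1)).transpose(0, 1)
    #   result = F.linear(x, rotated_weight * boft_scale)
    # New path: rotate the activations, let the (possibly quantized) base layer do the matmul,
    # then apply the per-output-channel scale to the output.
    x_rotated = x @ boft_rotation
    return base_layer(x_rotated) * boft_scale.view(1, -1)
```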
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
not stale |
Problem
Right now, if a new PEFT method wants to add support for quantized layers, it requires a significant amount of work. Notably, the method needs to implement dedicated layer classes for each quantization method (e.g. one class for bnb 4bit, one for bnb 8bit, one for AWQ, ...). These classes typically are >90% boilerplate and the actual difference between implementations of these classes is minimal.
The result of that is that, at the moment, most PEFT methods don't support any, or only very few, quantization methods, even though the amount of actual logic required to support these methods is relatively small.
Suggested solution
This PR is a suggestion of how to solve the issue. If this approach is accepted, with a few extra lines, we should be able to support all quantization methods in all PEFT methods. The general approach is to add an attribute to each PEFT layer, `self.quantization_backend`, which supports these methods:

- `get_base_weight`
- `set_base_weight`

When the PEFT layers use these methods to access and write to the base layer weight, and the weight is quantized, the new classes will deal with that correctly. This means that we no longer need a dedicated layer class to deal with quantized layers; the normal layer class will do. E.g. for MiSS, the normal `miss.Linear` class can deal with bnb layers, so there is no need to add a `miss/bnb.py` module with dedicated layers.

A few rewrites in the existing PEFT methods are required to support this new quantization backend class, but the total amount of code needed for that is considerably smaller than adding new classes for each quantization method.
Furthermore, these quantization backend classes are agnostic with regard to the PEFT method. Therefore, with M PEFT methods and N quantization methods, we no longer need MxN implementations to support quantization but only M+N.
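To make this concrete, here is a rough sketch of how merge/unmerge on a PEFT layer could look once the layer only talks to the base weight through the backend. The signatures are illustrative, and `get_delta_weight` is just a stand-in for the method-specific delta computation:

```python
class SomePeftLinear:
    # Sketch only: the actual signatures in the PR may differ.
    def merge(self, adapter_name: str) -> None:
        base_weight = self.quantization_backend.get_base_weight()       # dequantizes if needed
        delta = self.get_delta_weight(adapter_name)                     # PEFT-method specific
        self.quantization_backend.set_base_weight(base_weight + delta)  # requantizes if needed

    def unmerge(self, adapter_name: str) -> None:
        base_weight = self.quantization_backend.get_base_weight()
        delta = self.get_delta_weight(adapter_name)
        self.quantization_backend.set_base_weight(base_weight - delta)
```

The point is that neither `merge` nor `unmerge` needs to know which quantization scheme (if any) is in use; that knowledge lives entirely in the backend.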
Migration
For LoRA, we have already implemented the layer classes for each supported quantization method. For the sake of consistency, it could still make sense to migrate LoRA to the new approach if it's accepted. This needs to be accompanied by detailed regression testing to ensure that everything keeps working. I would only suggest deprecating and removing abandoned quantization methods (perhaps for a v1.0 release).
The bigger issue, however, is that packages that depend on PEFT may break with this change. As an example, if they detect quantized layers via `isinstance` checks, those checks would break, as all layers would just be the normal `lora.Linear`, `lora.Conv2d`, etc. The approach here would most likely involve deprecating the import of these classes. I think it's also possible to "cheat" `isinstance` and pretend that there is inheritance when there isn't, but I'd like to avoid that.

Anyway, this is out of scope for this PR and will be addressed in the future.
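For reference, the `isinstance` "cheat" mentioned above would presumably be done with a custom metaclass implementing `__instancecheck__`, roughly like this (purely illustrative, not part of this PR; the attribute checked here is hypothetical):

```python
class _Linear4bitMeta(type):
    def __instancecheck__(cls, instance):
        # Pretend that plain lora.Linear layers wrapping a bnb 4-bit base layer are
        # instances of this class, even though there is no real inheritance.
        return getattr(instance, "quantization_backend_name", None) == "bnb-4bit"


class Linear4bit(metaclass=_Linear4bitMeta):
    """Compatibility shim whose only purpose is to satisfy isinstance checks."""
```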
Scope
Updating all PEFT methods is too much for a single PR. This PR focuses on only three PEFT methods for now:

- VeRA
- MiSS
- BOFT: this required a reformulation of the `forward` step. Similar changes may be required for other methods too.