How is alternating optimization for MSE-optimized grids implemented in MR-GPTQ / FP-Quant?

Hi, thanks for the great work.

I have a question about the implementation of **Ingredient 1: MSE-Optimized Grids** described in the MR-GPTQ paper.

In the paper, the objective is written as:

$$
\min_{s_T, s_{G_1}, \dots, s_{G_k}} \sum_i \left\lVert \hat{X}_i - X_i \right\rVert_2^2
$$

The paper says:

> We solve this by using alternating optimization over the block scales and the per-tensor scale, respectively.

Could you clarify how this alternating optimization is performed in practice?
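
To make the question concrete, here is a minimal NumPy sketch of what I imagine the procedure could look like. It is only my guess, not taken from the paper or the FP-Quant code. Assumptions on my side: an FP4 (E2M1) value grid, absmax initialization, and a plain grid search over multiplicative candidates for each scale. The names `round_to_grid`, `dequant`, `group_errors`, and `alternating_scales` are hypothetical. The sketch also keeps the group scales in full precision, which makes $s_T$ redundant here; I assume the per-tensor step only becomes meaningful once the group scales are themselves stored in a low-precision format.

```python
import numpy as np

# FP4 (E2M1) magnitudes, assumed here for illustration only.
FP4_MAGNITUDES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
GRID = np.concatenate([-FP4_MAGNITUDES[:0:-1], FP4_MAGNITUDES])  # symmetric grid


def round_to_grid(x):
    """Round every entry of x to the nearest value in GRID."""
    idx = np.abs(x[..., None] - GRID).argmin(axis=-1)
    return GRID[idx]


def dequant(x, s_t, s_g):
    """Quantize-dequantize x of shape (num_groups, group_size)."""
    scale = s_t * s_g[:, None]
    return scale * round_to_grid(x / scale)


def group_errors(x, s_t, s_g):
    """Squared reconstruction error of each group."""
    return ((dequant(x, s_t, s_g) - x) ** 2).sum(axis=1)


def alternating_scales(x, n_iters=3):
    """Alternate between updating group scales and the tensor scale."""
    # Candidate multipliers; both sets contain exactly 1.0, so each step
    # can keep the current scale and the total error never increases.
    group_factors = np.linspace(0.5, 1.0, 21)
    tensor_factors = np.append(np.linspace(0.8, 1.2, 41), 1.0)

    # Absmax initialization (guarded against all-zero groups).
    s_t = 1.0
    s_g = np.maximum(np.abs(x).max(axis=1) / GRID.max(), 1e-12)

    rows = np.arange(x.shape[0])
    for _ in range(n_iters):
        # Step 1: fix s_t, pick the best candidate scale for each group.
        cands = s_g[:, None] * group_factors[None, :]
        errs = np.stack(
            [group_errors(x, s_t, cands[:, c]) for c in range(cands.shape[1])],
            axis=1,
        )
        s_g = cands[rows, errs.argmin(axis=1)]

        # Step 2: fix all s_g, pick the best candidate tensor scale.
        t_cands = s_t * tensor_factors
        t_errs = np.array([group_errors(x, t, s_g).sum() for t in t_cands])
        s_t = t_cands[t_errs.argmin()]
    return s_t, s_g
```

For example, calling `alternating_scales(x)` on a weight matrix reshaped to `(num_groups, group_size)` returns one scalar `s_t` and one scale per group. Is this roughly what the implementation does, or are the updates done differently (for instance with a closed-form step or a different candidate set)?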

