Add support for int8 quantization backend by silveroxides · Pull Request #37 · Comfy-Org/comfy-kitchen

silveroxides · 2026-05-07T22:30:53Z

This PR adds support for int8-quantized models, with full backend support and optimized matmul kernels using Triton.
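
For readers unfamiliar with the idea, here is a rough illustration of what tensor-wise int8 quantization followed by an int8 matmul looks like. This is a sketch in plain Python, not code from this PR: the helper names (`quantize_tensorwise`, `int8_matmul`, `float_matmul`) are hypothetical, and symmetric quantization with a single scale per tensor is an assumption about the general technique rather than a description of the kernels here.

```python
# Illustrative sketch only: symmetric tensor-wise int8 quantization and matmul.
# All helper names are hypothetical and do not come from the PR.

def quantize_tensorwise(matrix):
    """Quantize a 2D list of floats to int8 using one scale for the whole tensor."""
    amax = max(abs(v) for row in matrix for v in row)
    scale = amax / 127.0 if amax > 0 else 1.0
    q = [[max(-127, min(127, round(v / scale))) for v in row] for row in matrix]
    return q, scale

def int8_matmul(qa, scale_a, qb, scale_b):
    """Multiply quantized matrices with integer accumulation, then rescale to float."""
    cols_b = list(zip(*qb))
    return [
        [sum(x * y for x, y in zip(row, col)) * scale_a * scale_b for col in cols_b]
        for row in qa
    ]

def float_matmul(a, b):
    """Reference matmul in floating point, for comparison."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

a = [[0.5, -1.0], [2.0, 0.25]]
b = [[1.0, 0.0], [-0.5, 4.0]]

qa, sa = quantize_tensorwise(a)
qb, sb = quantize_tensorwise(b)

approx = int8_matmul(qa, sa, qb, sb)
exact = float_matmul(a, b)

# Largest absolute difference between the quantized and the float result.
max_err = max(abs(x - y) for ra, re in zip(approx, exact) for x, y in zip(ra, re))
print(approx, exact, max_err)
```

The integer products are accumulated first and the two scales are applied once at the end, which is the part an optimized kernel can run on int8 hardware paths.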

Walegphil · 2026-06-05T14:18:11Z

I'm really excited for this to be merged.
I can build the wheel manually and use it, but having it merged is going to remove some of the setup hassle!

Kranos555 · 2026-06-15T14:33:33Z

Still waiting for this to be merged.

lexterslab · 2026-06-19T07:29:48Z

Why does nobody here care about this? Just merge it into main already, you dimwits. Double the speed, same quality, half the size. What are you waiting for?

Dervlex · 2026-06-19T08:31:10Z

@lexterslab You have no clue how coding works, right?
Has this PR been reviewed? Is the code correct and free of bugs?
Will it break anything else?
Does it contain malicious code?

All of this needs to be reviewed before merging into main.
I guess that's what they are waiting for BEFORE merging it into MAIN :)

silveroxides · 2026-06-19T09:08:16Z

@Dervlex He is not a veteran programmer, no. But he wants to show support and demonstrate that users are interested in getting this merged, which would relieve me of the burden of compiling custom wheels and maintaining the code. That means watching the core ComfyUI repo like a hawk and checking that there are no other sudden version bumps and PyPI publishes.
This is something I have done since January, and the PR has been reviewed by a contractor hired by ComfyOrg, who has delivered a report.
All this, however, happened without ever bringing me into the picture and without any other official communication whatsoever.
If you look at my fork, you will see a plethora of branches where I have done extensive development along various paths.
The code is actually used via silveroxides/ComfyUI-QuantOps and has been for some months now, with daily active users both using prebuilt wheels and compiling it themselves.

I have tried to communicate all of this to ComfyOrg regularly, but the lack of response is extremely unusual for open source, as is the absence of even a single comment about it.
Perhaps they expect a PR on the ComfyUI main repo too but, tbh, considering there are plenty of other merged PRs that do not have ComfyUI implementations yet, I can't see why I would add even more work on top of this.
I already have more improvements to this PR ready to push, but the lack of communication and the delays make me hesitant to commit anything, for fear of having to babysit this for another month.

Dervlex · 2026-06-19T09:12:28Z

@silveroxides Yeah, I can totally understand this...
Sad to see open-source projects getting stuck because of stuff like this.
But what else can you do?
Letting everyone just push is a much more horrible idea :/
Not sure how to handle that. More moderators, in the end.

lexterslab · 2026-06-19T09:32:56Z

@Dervlex Even though I can't code myself, I know very well how it works, I'm a SysAdmin. It's just sad to see that open-source projects from the community are obviously not welcome, while Comfy itself gets worse and worse from update to update.

aksugat · 2026-06-19T10:10:37Z

Well, it had better not be pushed as NVIDIA-only, since it works on ROCm as well: RDNA2 through RDNA4 have INT8 support. And there are already https://github.com/patientx/ComfyUI-INT8-Fast-ROCM and https://huggingface.co/bertbobson/Ideogram-4-INT8-ConvRot

lexterslab · 2026-06-19T10:30:28Z

Nobody who seriously works with SD gives a damn about ROCm and Radeon. And calling a model that was scaled from FP8 to FP32, then downcast to BF16 and finally quantized to int8 the "father" is quite a stretch. Besides, it's generally missing the point. It's about native support via Triton, nothing else.

aksugat · 2026-06-19T10:36:55Z

And scaling from FP8 to BF16 gives you magical quality that was missing from FP8? :D As for then scaling to mixedrow int8: I already ran tests, and it was worse than bert's ConvRot weights. Besides the point, the ComfyUI core engine team is working on supporting int8 natively. They know more about INT8 than hobbyists. And supporting both platforms, ROCm and CUDA, is how the real world works when money is at stake.

lexterslab · 2026-06-19T10:57:39Z

Wtf are you talking about dude. Quantization only goes one direction: High > Low and not Low > Higher > High > Lower. What kind of bullshit is that supposed to be? In the end you get a crappy quant, nothing more.

XD Yeah, of course they do, thanks for that joke to start the weekend.

Heliumrich · 2026-06-19T11:54:34Z

@comfyanonymous please merge

woct0rdho · 2026-06-20T02:08:00Z

> Well its better not be pushed as NVIDIA only since it works on ROCM as well since rdna2-4 has INT8 support.

There are Triton kernels which work on both Nvidia and AMD. Once this gets merged I guess there will be even higher demand to support Triton on Nvidia Pascal and AMD RDNA2.

fappaz · 2026-06-20T06:08:19Z

This will be a game changer for the community, especially for those with modest specs who are struggling to upgrade during this memory shortage. Looking forward to the merge!

silveroxides and others added 10 commits May 7, 2026 19:03

954a4aa  Add support for int8 tensorwise quantization with backend and layouts
60a3b2f  Add functioning cuBLASLt backend for int8 that passes tests.
750cbd6  Add per-channel weight scaling support to TensorWiseINT8Layout via new per_channel flag. Adds dedicated triton per-row dequant kernel and fixes scale broadcasting in eager and CUDA backends.
1499ea4  Merge branch 'merge/upstream-int8-tensorwise' into feature/int8-tensorwise
5952e29  Merge pull request #5 from silveroxides/feature/int8-tensorwise (Feature/int8 tensorwise)
db14686  Merge pull request #6 from silveroxides/merge/upstream-int8-tensorwise
05166ef  Merge branch 'merge/upstream-int8-tensorwise' into feature/int8-tensorwise
11a4aa4  Merge pull request #7 from silveroxides/feature/int8-tensorwise (Feature/int8 tensorwise)
519a902  Merge pull request #8 from silveroxides/merge/upstream-int8-tensorwise
adbd529  Add tests to cover int8 implementation
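
The per-channel commit above mentions per-row dequantization and scale broadcasting. As a hedged illustration of why per-row scales can matter, the sketch below compares one scale per tensor against one scale per weight row. It is plain Python with hypothetical helper names, and it is not the PR's Triton kernel or the actual `TensorWiseINT8Layout` code; the example weight values are made up.

```python
# Illustrative sketch only: per-tensor vs per-row (per-channel) int8 scales.
# Helper names and the example weights are hypothetical.

def _quantize_row(row, scale):
    """Round a row to int8 with the given scale, clamping to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in row]

def quantize_per_tensor(weight):
    """One scale shared by every element of the weight matrix."""
    amax = max(abs(v) for row in weight for v in row)
    scale = amax / 127.0 if amax > 0 else 1.0
    return [_quantize_row(row, scale) for row in weight], scale

def quantize_per_row(weight):
    """One scale per output row, so small rows keep their resolution."""
    q_rows, scales = [], []
    for row in weight:
        amax = max(abs(v) for v in row)
        scale = amax / 127.0 if amax > 0 else 1.0
        scales.append(scale)
        q_rows.append(_quantize_row(row, scale))
    return q_rows, scales

def dequantize_per_row(q_rows, scales):
    """Broadcast each row's scale across that row (shape [rows, 1] in tensor terms)."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]

# Row 0 is tiny compared to row 1, which is the hard case for a single scale.
w = [[0.01, -0.02, 0.015], [5.0, -3.0, 1.0]]

q_ten, s_ten = quantize_per_tensor(w)
q_row, s_row = quantize_per_row(w)

deq_ten = [[q * s_ten for q in row] for row in q_ten]
deq_row = dequantize_per_row(q_row, s_row)

# Reconstruction error on the small-magnitude row under each scheme.
err_row0_tensorwise = max(abs(a - b) for a, b in zip(w[0], deq_ten[0]))
err_row0_per_row = max(abs(a - b) for a, b in zip(w[0], deq_row[0]))
print(err_row0_tensorwise, err_row0_per_row)
```

With a single shared scale, the small row is mostly rounded to zero; with per-row scales it keeps close to full int8 resolution, at the cost of storing one scale per row and broadcasting it correctly during dequantization.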

silveroxides marked this pull request as ready for review June 2, 2026 18:03

silveroxides mentioned this pull request Jun 2, 2026: INT8 support #9 (Open)


9 participants
