Add support for int8 quantization backend #37

Open
silveroxides wants to merge 10 commits into Comfy-Org:main from silveroxides:feature/int8-tensorwise

@silveroxides commented May 7, 2026

This PR adds support for int8-quantized models, with full backend support and optimized matmul kernels written in Triton.
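For readers unfamiliar with the technique: a tensorwise int8 scheme (the branch name `feature/int8-tensorwise` suggests this flavor) stores one float scale per tensor, multiplies the integer values with integer accumulation, and rescales the result once. The following is a minimal pure-Python sketch of that idea, assuming symmetric per-tensor scaling with `scale = max|x| / 127`. The function names are hypothetical; this is not the PR's Triton kernel or its API.

```python
"""Minimal sketch of tensorwise (per-tensor) symmetric int8 quantization.

Assumption: symmetric scaling with scale = max|x| / 127. Function names are
hypothetical and do not correspond to the PR's actual code or kernels.
"""
from typing import List, Tuple

Matrix = List[List[float]]
IntMatrix = List[List[int]]


def quantize_tensorwise(x: Matrix) -> Tuple[IntMatrix, float]:
    """Quantize a matrix to int8 values using a single scale for the whole tensor."""
    max_abs = max(abs(v) for row in x for v in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to nearest (Python's round() breaks ties to even), then clamp to [-127, 127].
    q = [[max(-127, min(127, round(v / scale))) for v in row] for row in x]
    return q, scale


def int8_matmul_dequant(qa: IntMatrix, sa: float, qb: IntMatrix, sb: float) -> Matrix:
    """Multiply int8 matrices with integer accumulation, then rescale to float.

    A real GPU kernel accumulates in int32; Python ints are unbounded, so
    overflow is not modeled here.
    """
    n, k, m = len(qa), len(qb), len(qb[0])
    assert all(len(row) == k for row in qa), "inner dimensions must match"
    out = []
    for i in range(n):
        row = []
        for j in range(m):
            acc = sum(qa[i][t] * qb[t][j] for t in range(k))  # integer dot product
            row.append(acc * sa * sb)  # single rescale by the combined scale
        out.append(row)
    return out


def float_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Reference float matmul used to measure the quantization error."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


if __name__ == "__main__":
    a = [[0.5, -1.2], [2.0, 0.3]]
    w = [[1.0, -0.25], [0.75, 0.5]]
    qa, sa = quantize_tensorwise(a)
    qw, sw = quantize_tensorwise(w)
    approx = int8_matmul_dequant(qa, sa, qw, sw)
    exact = float_matmul(a, w)
    max_err = max(abs(x - y) for r1, r2 in zip(approx, exact) for x, y in zip(r1, r2))
    print("max abs error:", max_err)
```

The single per-tensor scale keeps the inner loop purely integer, which is what lets a dedicated int8 matmul kernel do the heavy lifting; finer-grained schemes (per-row or per-block scales) trade some of that simplicity for lower error.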

@silveroxides marked this pull request as ready for review on June 2, 2026, 18:03
@silveroxides mentioned this pull request on Jun 2, 2026
@Walegphil


I'm really excited for this to be merged.
I can build the wheel manually and use it, but merging it will remove some of the hassle of setting things up!

@Kranos555


Still waiting for this to be merged.

@lexterslab


Why does nobody here care about this? Just merge it into main already, you dimwits. Double the speed, same quality, half the size. What are you waiting for?

@Dervlex commented Jun 19, 2026

@lexterslab You have no clue how coding works, right?
Has this PR been reviewed? Is the code correct and free of bugs?
Will it break anything else?
Does it contain malicious code?

All of this needs to be reviewed before it is merged into main.
I guess that's what they are waiting for BEFORE merging it into MAIN :)

@silveroxides (Author)

@Dervlex No, he is not a veteran programmer. But he wants to show support and demonstrate that users are interested in getting this merged, which would relieve me of the burden of compiling custom wheels and maintaining the code. That means watching the core ComfyUI repo like a hawk and checking that this doesn't get any other sudden version bumps and PyPI publishes.
I have been doing this since January, and the PR has been reviewed by a contractor hired by ComfyOrg, who has delivered a report.
All of this, however, happened without ever bringing me into the picture and without any other official communication whatsoever.
If you look at my fork, you will see a plethora of branches where I have done extensive development along various paths.
The code is already in use via silveroxides/ComfyUI-QuantOps and has been for some months now, with daily active users both installing prebuilt wheels and compiling it themselves.

I have tried to communicate all of this to ComfyOrg regularly, but the lack of response is extremely unusual for open source, as is the absence of even a single comment regarding it.
Perhaps they expect a PR on the main ComfyUI repo too, but tbh, considering that plenty of other merged PRs still have no ComfyUI implementation, I can't see why I would add even more work on top of this.
I already have further improvements to push to this PR, but the lack of communication and the delays make me hesitant to commit anything for fear of having to babysit this for another month.

@Dervlex commented Jun 19, 2026

@silveroxides Yeah, I can totally understand this...
It's sad to see open-source projects getting stuck because of stuff like this.
But what else can you do?
Letting everyone just push would be a much more horrible idea :/
Not sure how to handle that -> more moderators, in the end.

@lexterslab


@Dervlex Even though I can't code myself, I know very well how it works; I'm a SysAdmin. It's just sad to see that open-source projects from the community are obviously not welcome, while Comfy itself gets worse and worse with every update.

@aksugat commented Jun 19, 2026

Well, it had better not be pushed as NVIDIA-only, since it works on ROCm as well: RDNA2 through RDNA4 have INT8 support. And there is already https://github.com/patientx/ComfyUI-INT8-Fast-ROCM and https://huggingface.co/bertbobson/Ideogram-4-INT8-ConvRot

@lexterslab


Nobody who seriously works with SD gives a damn about ROCm and Radeon. And calling a model that was scaled from FP8 to FP32, then downcast to BF16 and finally quantized to int8 the "father" is quite a stretch. Besides, it generally misses the point. It's about native support via Triton, nothing else.

@aksugat commented Jun 19, 2026

And scaling from FP8 to BF16 gives you some magical quality that was missing from FP8 :D?? And as for then scaling to mixed-row int8: I already did tests, and it was worse than bert's ConvRot weights. Beside the point, the ComfyUI core engine team is working on supporting int8 natively. They know more about INT8 than hobbyists. And supporting both platforms, ROCm and CUDA, is how reality and the real world work when money is at stake.

@lexterslab


Wtf are you talking about, dude? Quantization only goes in one direction: High > Low, not Low > Higher > High > Lower. What kind of bullshit is that supposed to be? In the end you get a crappy quant, nothing more.

XD Yeah, of course they do, thanks for that joke to start the weekend.
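The precision round-trip point in this exchange can be illustrated with a toy model. The sketch below assumes each format is a uniform rounding grid with a fixed step, which is a simplification (FP8 and BF16 are floating-point formats with non-uniform spacing), so it shows the principle rather than real error magnitudes. It demonstrates two things: upcasting to a wider format is exact and recovers nothing that was rounded away, and going low > high > low onto a different grid rounds twice, which can land further from the original than one direct quantization. The step sizes and function names are hypothetical.

```python
"""Toy model: precision round trips do not recover lost information.

Assumption: each format is a uniform rounding grid with a fixed step. Real
FP8/BF16 have non-uniform spacing; step sizes here are hypothetical.
"""
from fractions import Fraction

COARSE_STEP = Fraction(1, 8)   # stand-in for a low-precision source format (assumption)
TARGET_STEP = Fraction(1, 10)  # stand-in for a different low-precision target grid (assumption)


def snap(x: Fraction, step: Fraction) -> Fraction:
    """Round x to the nearest multiple of step (round() on a Fraction breaks ties to even)."""
    return round(x / step) * step


def upcast(x: Fraction) -> Fraction:
    """Widening is exact: the value is unchanged, and nothing lost earlier comes back."""
    return Fraction(x)


def direct(x: Fraction) -> Fraction:
    """Quantize the original high-precision value straight to the target grid."""
    return snap(x, TARGET_STEP)


def round_trip(x: Fraction) -> Fraction:
    """Quantize to the coarse grid, upcast, then requantize to the target grid."""
    return snap(upcast(snap(x, COARSE_STEP)), TARGET_STEP)


if __name__ == "__main__":
    for x in (Fraction(33, 100), Fraction(40, 100)):
        coarse = snap(x, COARSE_STEP)
        print(f"x={float(x):.3f} coarse={float(coarse):.3f} "
              f"direct={float(direct(x)):.3f} round_trip={float(round_trip(x)):.3f}")
```

With these steps, 0.33 and 0.40 both collapse to 0.375 on the coarse grid, so no later upcast can tell them apart. For 0.33, direct quantization gives 0.3, while the round trip gives 0.4 because the value is rounded twice.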

@Heliumrich


@comfyanonymous please merge

@woct0rdho commented Jun 20, 2026

> Well its better not be pushed as NVIDIA only since it works on ROCM as well since rdna2-4 has INT8 support.

There are Triton kernels that work on both Nvidia and AMD. Once this gets merged, I guess there will be even higher demand for Triton support on Nvidia Pascal and AMD RDNA2.

@fappaz commented Jun 20, 2026

This will be a game changer for the community, especially for those with modest specs who are struggling to upgrade in this period of memory scarcity. Looking forward to the merge!


Labels

None yet

Projects

None yet


9 participants