Add support for int8 quantization backend #37
|
I'm really excited for this to be merged in. |
|
Still waiting for this to be merged. |
|
Why does nobody here care about this? Just merge it into main already, you dimwits. Double the speed, same quality, half the size. What are you waiting for? |
|
@lexterslab You have no clue how coding works, right? All of this needs to be reviewed before it gets merged into main. |
|
@Dervlex No, he is not a veteran programmer. But he wants to show support and demonstrate that users are interested in getting this merged, which would relieve me of the burden of compiling custom wheels and maintaining code. That means watching the core ComfyUI repo like a hawk and checking that it does not get any other sudden version bumps and PyPI publishes. We have tried to communicate all of this to ComfyOrg on a regular basis, but the amount of response is extremely unusual for open source, as is the lack of even a single comment on it. |
|
@silveroxides Yeah can totally understand this... |
|
@Dervlex Even though I can't code myself, I know very well how it works; I'm a sysadmin. It's just sad to see that open-source projects from the community are obviously not welcome, while Comfy itself gets worse and worse from update to update. |
|
Well, it had better not be pushed as NVIDIA-only, since it works on ROCm as well: RDNA2 through RDNA4 have INT8 support. There are already https://github.com/patientx/ComfyUI-INT8-Fast-ROCM and https://huggingface.co/bertbobson/Ideogram-4-INT8-ConvRot |
|
Nobody who seriously works with SD gives a damn about ROCm and Radeon. And calling a model that was scaled from FP8 to FP32, then downcast to BF16, and finally quantized to int8 the "father" is quite a stretch. Besides, it generally misses the point. It's about native support via Triton, nothing else. |
|
And scaling from FP8 to BF16 gives you magical quality that was missing from FP8? :D As for then scaling to mixedrow int8: I already ran tests, and it was worse than bert's ConvRot weights. Besides, the ComfyUI core engine team is working on supporting int8 natively. They know more about INT8 than hobbyists. And supporting both platforms, ROCm and CUDA, is how the real world works when money is at stake. |
|
Wtf are you talking about dude. Quantization only goes one direction: High > Low and not Low > Higher > High > Lower. What kind of bullshit is that supposed to be? In the end you get a crappy quant, nothing more. XD Yeah, of course they do, thanks for that joke to start the weekend. |
|
@comfyanonymous please merge |
There are Triton kernels which work on both Nvidia and AMD. Once this gets merged I guess there will be even higher demand to support Triton on Nvidia Pascal and AMD RDNA2. |
|
This will be a game changer for the community, especially for those with modest specs who are struggling to upgrade in this moment of memory scarcity. Looking forward to the merge! |
This PR adds support for int8 quantized models, with full backend support and optimized matmul kernels written in Triton.
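
For readers unfamiliar with what an int8 matmul involves, here is a minimal pure-Python sketch of the general idea: quantize both operands to int8 with a per-row scale, accumulate the products as integers, then rescale the result to float. This is not the code from this PR and not a Triton kernel. The function names (`quantize_rows`, `int8_matmul`, `float_matmul`) and the choice of symmetric per-row scaling are assumptions made purely for illustration.

```python
def quantize_rows(matrix):
    """Symmetric per-row int8 quantization (illustrative, hypothetical helper).

    Returns (quantized rows with values in [-127, 127], per-row float scales).
    """
    q_rows, scales = [], []
    for row in matrix:
        max_abs = max(abs(v) for v in row)
        # An all-zero row gets scale 1.0 to avoid dividing by zero.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def int8_matmul(x, w):
    """Approximate x @ w.T using int8 operands and an integer accumulator.

    x: activations with shape (m, k); w: weights with shape (n, k).
    """
    xq, x_scales = quantize_rows(x)
    wq, w_scales = quantize_rows(w)
    out = []
    for x_row, x_scale in zip(xq, x_scales):
        out_row = []
        for w_row, w_scale in zip(wq, w_scales):
            acc = sum(a * b for a, b in zip(x_row, w_row))  # integer dot product
            out_row.append(acc * x_scale * w_scale)  # rescale back to float
        out.append(out_row)
    return out


def float_matmul(x, w):
    """Reference x @ w.T in full floating point."""
    return [[sum(a * b for a, b in zip(x_row, w_row)) for w_row in w] for x_row in x]


x = [[0.5, -1.25, 2.0], [0.1, 0.2, -0.3]]
w = [[1.0, 0.5, -0.5], [-2.0, 0.25, 0.75]]
approx = int8_matmul(x, w)
exact = float_matmul(x, w)
max_err = max(abs(a - e) for ar, er in zip(approx, exact) for a, e in zip(ar, er))
print("max abs error vs float matmul:", max_err)
```

A real GPU kernel would do the integer dot products on hardware int8 units and fuse the rescale into the same kernel; the sketch only shows the arithmetic and where the quantization error comes from.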