Add Ideogram support and improve BF16 dequantization handling #459
molbal wants to merge 6 commits into
Great! I ran it successfully, but my device doesn't support bf16, so computation falls back to fp32, which makes it very slow. Can you make it run in fp16 on my device?
[INFO] got prompt
Hi @yu234567, please try again now. It should work better; can you verify?
Thank you so much, it worked! |
…tization handling.
…prove dequantization efficiency.
…ntized model support and LoRA compatibility.
It classifies the Qwen3-VL-8B-Instruct q4_0 quant as a _k quant and errors out. Is this expected, or what quantization are you running for the te
Summary
This adds support for Ideogram GGUF models.
What Changed
Added ideogram to the supported image GGUF architectures.
Notes
Tested on Windows 11, Python version: 3.12.11 (main, Jul 23 2025, 00:32:20) [MSC v.1944 64 bit (AMD64)]
[INFO] Total VRAM 8192 MB, total RAM 48394 MB
[INFO] pytorch version: 2.12.0+cu130
[INFO] Set vram state to: LOW_VRAM
[INFO] Device: cuda:0 NVIDIA GeForce RTX 3080 Laptop GPU
Tested with the Q4_0 GGUF from https://huggingface.co/leejet/ideogram-4-GGUF
Other GGUF quant types still use the existing dequant paths.
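For readers unfamiliar with the Q4_0 format used in testing, here is a minimal pure-Python sketch of what dequantizing it involves. It is not the code from this PR. It assumes the ggml reference block layout (32 elements per block, stored as one little-endian fp16 scale followed by 16 bytes of packed 4-bit values, with low nibbles holding the first 16 elements and high nibbles the last 16). The function name dequantize_q4_0 is chosen for illustration only.

```python
import struct

QK4_0 = 32                    # elements per Q4_0 block
BLOCK_BYTES = 2 + QK4_0 // 2  # fp16 scale (2 bytes) + 16 bytes of packed nibbles


def dequantize_q4_0(data: bytes) -> list[float]:
    """Dequantize raw Q4_0 blocks into Python floats (illustrative sketch)."""
    if len(data) % BLOCK_BYTES != 0:
        raise ValueError("data length is not a multiple of the Q4_0 block size")

    out: list[float] = []
    for off in range(0, len(data), BLOCK_BYTES):
        # Per-block scale, stored as little-endian IEEE half precision.
        (scale,) = struct.unpack_from("<e", data, off)
        packed = data[off + 2 : off + BLOCK_BYTES]

        # Each byte carries two 4-bit values; subtracting 8 recentres
        # the unsigned range 0..15 onto -8..7.
        low = [(b & 0x0F) - 8 for b in packed]   # elements 0..15
        high = [(b >> 4) - 8 for b in packed]    # elements 16..31
        out.extend(scale * q for q in low + high)
    return out


# One block with scale 0.5 where every byte is 0x80:
# low nibble 0 maps to -8, high nibble 8 maps to 0.
block = struct.pack("<e", 0.5) + bytes([0x80] * 16)
values = dequantize_q4_0(block)
```

A real implementation would do this vectorised on the GPU and then cast the result to the compute dtype, which is where the choice between bf16, fp16, and fp32 discussed above comes in.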