Avoid ROCm Conv3d crash in Qwen35 vision patch embedding by using equivalent linear projection#14215
Conversation
…ivalent linear projection
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThis PR introduces a device-specific optimization in the Qwen 3.5 vision encoder. The change conditionally routes the full-patch convolution computation in 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
The reason subgraphs exist, at least originally I think, was to reduce the need for custom nodes like this. I bet everything this sampler does can already be built with native nodes and packed into a subgraph. Too bad subgraphs are broken, so people have to keep making and relying on custom nodes for this instead. |
|
I think you're right, I have to add though that this issue also happens without the UIT Sampler >~>' I may have forgotten that, I went to bed really late |
Using UIT sampler (https://github.com/easygoing0114/ComfyUI-uit-hidream-o1) and HiDream o1, I got a segfault whenever I wanted to use a reference image:
As you can see, it runs now. But before this PR, this was the error log:
Long log
I looked at the source code from the backtrace, and found out that the apparently the conv3d is quite unstable for rocm kernels. I found out the patch, kernel and stride are all equally big. Because of this, we can simply replace it with a linear layer.
My hardware is:
Since I only tested AMD and a bf16 model I restricted my workaround to AMD and 16-bit only, not affecting other behaviour. The main advantage of this workaround is that you don't need more RAM, and it doens't hurt performance in any way.
This is my first commit to ComfyUI, so I may have done something wrong. Review/feedback is adviced. Thank you!