Avoid ROCm Conv3d crash in Qwen35 vision patch embedding by using equivalent linear projection #14215

Open
peterwilli wants to merge 6 commits into Comfy-Org:master from peterwilli:fix/amd_rocm_qwen35_reference_image_segfault_fix

Conversation

@peterwilli

Using the UIT sampler (https://github.com/easygoing0114/ComfyUI-uit-hidream-o1) with HiDream o1, I got a segfault whenever I tried to use a reference image: (screenshot)

As you can see, it runs now. But before this PR, this was the error log:

Long log

(ComfyUI) ➜  ComfyUI git:(master) python main.py --disable-api-nodes --preview-method=latent2rgb --verbose                                
[INFO] setup plugin alembic.autogenerate.schemas
[INFO] setup plugin alembic.autogenerate.tables
[INFO] setup plugin alembic.autogenerate.types
[INFO] setup plugin alembic.autogenerate.constraints
[INFO] setup plugin alembic.autogenerate.defaults
[INFO] setup plugin alembic.autogenerate.comments
[START] Security scan
[INFO] [ComfyUI-Manager] Using `uv` as Python module for pip operations.
[DONE] Security scan
[DEBUG] Popen(['git', 'version'], cwd=/home/peter/Applications/MachineLearning/ComfyUI, stdin=None, shell=False, universal_newlines=False)
[DEBUG] Popen(['git', 'version'], cwd=/home/peter/Applications/MachineLearning/ComfyUI, stdin=None, shell=False, universal_newlines=False)
## ComfyUI-Manager: installing dependencies done.
** ComfyUI startup time: 2026-06-01 16:55:21.924
** Platform: Linux
** Python version: 3.13.11 (main, Jan 13 2026, 17:36:15) [Clang 21.1.4 ]
** Python executable: /home/peter/Applications/MachineLearning/ComfyUI/.venv/bin/python
** ComfyUI Path: /home/peter/Applications/MachineLearning/ComfyUI
** ComfyUI Base Folder Path: /home/peter/Applications/MachineLearning/ComfyUI
** User directory: /home/peter/Applications/MachineLearning/ComfyUI/user
** ComfyUI-Manager config path: /home/peter/Applications/MachineLearning/ComfyUI/user/__manager/config.ini
** Log path: /home/peter/Applications/MachineLearning/ComfyUI/user/comfyui.log
[INFO] 
Prestartup times for custom nodes:
[INFO]    0.5 seconds: /home/peter/Applications/MachineLearning/ComfyUI/custom_nodes/comfyui-manager
[INFO] 
(null): No such file or directory
(null): No such file or directory
[INFO] Found comfy_kitchen backend triton: {'available': True, 'disabled': True, 'unavailable_reason': None, 'capabilities': ['apply_rope', 'apply_rope1', 'apply_rope_split_half', 'apply_rope_split_half1', 'dequantize_nvfp4', 'dequantize_per_tensor_fp8', 'quantize_mxfp8', 'quantize_nvfp4', 'quantize_per_tensor_fp8']}
[INFO] Found comfy_kitchen backend eager: {'available': True, 'disabled': False, 'unavailable_reason': None, 'capabilities': ['apply_rope', 'apply_rope1', 'apply_rope_split_half', 'apply_rope_split_half1', 'dequantize_mxfp8', 'dequantize_nvfp4', 'dequantize_per_tensor_fp8', 'gemv_awq_w4a16', 'quantize_mxfp8', 'quantize_nvfp4', 'quantize_per_tensor_fp8', 'quantize_svdquant_w4a4', 'scaled_mm_mxfp8', 'scaled_mm_nvfp4', 'scaled_mm_svdquant_w4a4', 'stochastic_rounding_fp8']}
[INFO] Found comfy_kitchen backend cuda: {'available': True, 'disabled': True, 'unavailable_reason': None, 'capabilities': ['apply_rope', 'apply_rope1', 'apply_rope_split_half', 'apply_rope_split_half1', 'dequantize_nvfp4', 'dequantize_per_tensor_fp8', 'gemv_awq_w4a16', 'quantize_mxfp8', 'quantize_nvfp4', 'quantize_per_tensor_fp8', 'quantize_svdquant_w4a4', 'scaled_mm_svdquant_w4a4', 'stochastic_rounding_fp8']}
[INFO] Checkpoint files will always be loaded safely.
[INFO] Total VRAM 32768 MB, total RAM 31875 MB
[INFO] pytorch version: 2.12.0+rocm7.2
[INFO] Set: torch.backends.cudnn.enabled = False for better AMD performance.
[INFO] AMD arch: gfx1150
[INFO] ROCm version: (7, 2)
[INFO] Set vram state to: NORMAL_VRAM
[INFO] Device: cuda:0 AMD Radeon 890M : native
[INFO] Using async weight offloading with 2 streams
[INFO] Enabled pinned memory 28687.0
[INFO] Using pytorch attention
[INFO] Python version: 3.13.11 (main, Jan 13 2026, 17:36:15) [Clang 21.1.4 ]
[INFO] ComfyUI version: 0.22.0
[INFO] comfy-aimdo version: 0.4.7
[INFO] comfy-kitchen version: 0.2.10
[DEBUG] Using selector: EpollSelector
[INFO] comfyui-frontend-package version: 1.44.19
[INFO] comfyui-workflow-templates version: 0.9.92
[INFO] comfyui-embedded-docs version: 0.5.2
[INFO] comfy-kitchen version: 0.2.10
[INFO] comfy-aimdo version: 0.4.7
[INFO] [Prompt Server] web root: /home/peter/Applications/MachineLearning/ComfyUI/.venv/lib/python3.13/site-packages/comfyui_frontend_package/static
[INFO] Asset seeder disabled
[snip]
[INFO] loaded completely;  15377.39 MB loaded, full load: True
[UITSampler] UiT model detected (HiDreamO1).
[INFO] Requested to load HiDreamO1TE
[INFO] loaded completely; 15823.47 MB usable, 0.00 MB loaded, full load: True
[INFO] loaded completely; 15823.47 MB usable, 0.00 MB loaded, full load: True
[INFO] Requested to load HiDreamO1
  0%|          | 0/12 [00:00<?, ?it/s]
Fatal Python error: Segmentation fault

Stack (most recent call first):
  File "/home/peter/Applications/MachineLearning/ComfyUI/.venv/lib/python3.13/site-packages/torch/nn/modules/conv.py", line 730 in _conv_forward
  File "/home/peter/Applications/MachineLearning/ComfyUI/comfy/ops.py", line 552 in _conv_forward
  File "/home/peter/Applications/MachineLearning/ComfyUI/.venv/lib/python3.13/site-packages/torch/nn/modules/conv.py", line 735 in forward
  File "/home/peter/Applications/MachineLearning/ComfyUI/comfy/ops.py", line 565 in forward
  File "/home/peter/Applications/MachineLearning/ComfyUI/.venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1789 in _call_impl
  File "/home/peter/Applications/MachineLearning/ComfyUI/.venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1778 in _wrapped_call_impl
  File "/home/peter/Applications/MachineLearning/ComfyUI/comfy/text_encoders/qwen35.py", line 455 in forward
  File "/home/peter/Applications/MachineLearning/ComfyUI/.venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1789 in _call_impl
  File "/home/peter/Applications/MachineLearning/ComfyUI/.venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1778 in _wrapped_call_impl
  File "/home/peter/Applications/MachineLearning/ComfyUI/comfy/text_encoders/qwen35.py", line 650 in forward
  File "/home/peter/Applications/MachineLearning/ComfyUI/.venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1789 in _call_impl
  File "/home/peter/Applications/MachineLearning/ComfyUI/.venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1778 in _wrapped_call_impl
  File "/home/peter/Applications/MachineLearning/ComfyUI/comfy/ldm/hidream_o1/model.py", line 176 in _forward
  File "/home/peter/Applications/MachineLearning/ComfyUI/comfy/patcher_extension.py", line 113 in execute
  File "/home/peter/Applications/MachineLearning/ComfyUI/comfy/patcher_extension.py", line 106 in __call__
  File "/home/peter/Applications/MachineLearning/ComfyUI/comfy_extras/nodes_hidream_o1.py", line 203 in smoothing_wrapper
  File "/home/peter/Applications/MachineLearning/ComfyUI/comfy/patcher_extension.py", line 114 in execute
  File "/home/peter/Applications/MachineLearning/ComfyUI/comfy/ldm/hidream_o1/model.py", line 137 in forward
  File "/home/peter/Applications/MachineLearning/ComfyUI/.venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1789 in _call_impl
  File "/home/peter/Applications/MachineLearning/ComfyUI/.venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1778 in _wrapped_call_impl
  File "/home/peter/Applications/MachineLearning/ComfyUI/comfy/model_base.py", line 230 in _apply_model
  File "/home/peter/Applications/MachineLearning/ComfyUI/comfy/patcher_extension.py", line 113 in execute
  File "/home/peter/Applications/MachineLearning/ComfyUI/comfy/model_base.py", line 186 in apply_model
  File "/home/peter/Applications/MachineLearning/ComfyUI/comfy/samplers.py", line 334 in _calc_cond_batch
  File "/home/peter/Applications/MachineLearning/ComfyUI/comfy/patcher_extension.py", line 113 in execute
  File "/home/peter/Applications/MachineLearning/ComfyUI/comfy/samplers.py", line 218 in _calc_cond_batch_outer
  File "/home/peter/Applications/MachineLearning/ComfyUI/comfy/samplers.py", line 210 in calc_cond_batch
  File "/home/peter/Applications/MachineLearning/ComfyUI/comfy/samplers.py", line 619 in sampling_function
  File "/home/peter/Applications/MachineLearning/ComfyUI/comfy/samplers.py", line 1212 in predict_noise
  File "/home/peter/Applications/MachineLearning/ComfyUI/comfy/patcher_extension.py", line 113 in execute
  File "/home/peter/Applications/MachineLearning/ComfyUI/comfy/samplers.py", line 1209 in outer_predict_noise
  File "/home/peter/Applications/MachineLearning/ComfyUI/comfy/samplers.py", line 1202 in __call__
  File "/home/peter/Applications/MachineLearning/ComfyUI/comfy/samplers.py", line 639 in __call__
  File "/home/peter/Applications/MachineLearning/ComfyUI/comfy/k_diffusion/sampling.py", line 205 in sample_euler
  File "/home/peter/Applications/MachineLearning/ComfyUI/.venv/lib/python3.13/site-packages/torch/utils/_contextlib.py", line 124 in decorate_context
  File "/home/peter/Applications/MachineLearning/ComfyUI/comfy/samplers.py", line 999 in sample
  File "/home/peter/Applications/MachineLearning/ComfyUI/comfy/patcher_extension.py", line 113 in execute
  File "/home/peter/Applications/MachineLearning/ComfyUI/comfy/samplers.py", line 1229 in inner_sample
  File "/home/peter/Applications/MachineLearning/ComfyUI/comfy/samplers.py", line 1254 in outer_sample
  File "/home/peter/Applications/MachineLearning/ComfyUI/comfy/patcher_extension.py", line 113 in execute
  File "/home/peter/Applications/MachineLearning/ComfyUI/comfy/samplers.py", line 1316 in sample
  File "/home/peter/Applications/MachineLearning/ComfyUI/comfy/samplers.py", line 1334 in sample
  File "/home/peter/Applications/MachineLearning/ComfyUI/comfy/sample.py", line 79 in sample_custom
  File "/home/peter/Applications/MachineLearning/ComfyUI/custom_nodes/ComfyUI-uit-hidream-o1/nodes_uit_hidream.py", line 236 in sample
  File "/home/peter/Applications/MachineLearning/ComfyUI/execution.py", line 298 in process_inputs
  File "/home/peter/Applications/MachineLearning/ComfyUI/execution.py", line 310 in _async_map_node_over_list
  File "/home/peter/Applications/MachineLearning/ComfyUI/execution.py", line 336 in get_output_data
  File "/home/peter/Applications/MachineLearning/ComfyUI/execution.py", line 536 in execute
  File "/home/peter/Applications/MachineLearning/ComfyUI/execution.py", line 774 in execute_async
  File "/home/peter/.local/share/uv/python/cpython-3.13.11-linux-x86_64-gnu/lib/python3.13/asyncio/events.py", line 89 in _run
  File "/home/peter/.local/share/uv/python/cpython-3.13.11-linux-x86_64-gnu/lib/python3.13/asyncio/base_events.py", line 2050 in _run_once
  File "/home/peter/.local/share/uv/python/cpython-3.13.11-linux-x86_64-gnu/lib/python3.13/asyncio/base_events.py", line 683 in run_forever
  File "/home/peter/.local/share/uv/python/cpython-3.13.11-linux-x86_64-gnu/lib/python3.13/asyncio/base_events.py", line 712 in run_until_complete
  File "/home/peter/.local/share/uv/python/cpython-3.13.11-linux-x86_64-gnu/lib/python3.13/asyncio/runners.py", line 118 in run
  File "/home/peter/.local/share/uv/python/cpython-3.13.11-linux-x86_64-gnu/lib/python3.13/asyncio/runners.py", line 195 in run
  File "/home/peter/Applications/MachineLearning/ComfyUI/execution.py", line 714 in execute
  File "/home/peter/Applications/MachineLearning/ComfyUI/main.py", line 327 in prompt_worker
  File "/home/peter/.local/share/uv/python/cpython-3.13.11-linux-x86_64-gnu/lib/python3.13/threading.py", line 995 in run
  File "/home/peter/.local/share/uv/python/cpython-3.13.11-linux-x86_64-gnu/lib/python3.13/threading.py", line 1044 in _bootstrap_inner
  File "/home/peter/.local/share/uv/python/cpython-3.13.11-linux-x86_64-gnu/lib/python3.13/threading.py", line 1015 in _bootstrap

Extension modules: sqlalchemy.cyextension.collections, sqlalchemy.cyextension.immutabledict, sqlalchemy.cyextension.processors, sqlalchemy.cyextension.resultproxy, sqlalchemy.cyextension.util, greenlet._greenlet, markupsafe._speedups, yaml._yaml, PIL._imaging, multidict._multidict, yarl._quoting_c, propcache._helpers_c, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket.mask, aiohttp._websocket.reader_c, frozenlist._frozenlist, chardet.models, chardet.pipeline.ascii, chardet.pipeline.confusion, chardet.pipeline.escape, chardet.pipeline.magic, chardet.pipeline.statistical, chardet.pipeline.structural, chardet.pipeline.utf8, chardet.pipeline.utf1632, chardet.pipeline.validity, chardet.pipeline.orchestrator, charset_normalizer.md, charset_normalizer.cd, requests.packages.chardet.models, requests.packages.chardet.pipeline.ascii, requests.packages.chardet.pipeline.confusion, requests.packages.chardet.pipeline.escape, requests.packages.chardet.pipeline.magic, requests.packages.chardet.pipeline.statistical, requests.packages.chardet.pipeline.structural, requests.packages.chardet.pipeline.utf8, requests.packages.chardet.pipeline.utf1632, requests.packages.chardet.pipeline.validity, requests.packages.chardet.pipeline.orchestrator, numpy._core._multiarray_umath, numpy.linalg._umath_linalg, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._pcg64, numpy.random._generator, numpy.random._mt19937, numpy.random._philox, numpy.random._sfc64, numpy.random.mtrand, psutil._psutil_linux, PIL._imagingft, _cyutility, scipy._cyutility, scipy._lib._ccallback_c, scipy.ndimage._nd_image, scipy.ndimage._rank_filter_1d, scipy.special._ufuncs_cxx, scipy.special._ellip_harm_2, scipy.special._special_ufuncs, 
scipy.special._gufuncs, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, _ni_label, scipy.ndimage._ni_label, regex._regex, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._batched_linalg, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_schur_sqrtm, scipy.linalg._matfuncs_expm, scipy.linalg._linalg_pythran, scipy.linalg.cython_blas, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpacklib, scipy.sparse.linalg._propack, scipy.optimize._group_columns, scipy._lib.messagestream, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._slsqplib, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy._lib._uarray._uarray, scipy.linalg._decomp_interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.spatial._ckdtree, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._hausdorff, scipy.spatial._distance_wrap, scipy.spatial.transform._rotation_cy, scipy.spatial.transform._rigid_transform_cy, scipy.optimize._direct, scipy.interpolate._fitpack, scipy.interpolate._dfitpack, scipy.interpolate._dierckx, scipy.interpolate._ppoly, scipy.interpolate._interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.special.cython_special, scipy.stats._stats, scipy.stats._biasedurn, scipy.stats._stats_pythran, scipy.stats._levy_stable.levyst, scipy.stats._ansari_swilk_statistics, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, 
scipy.sparse.csgraph._reordering, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._rcont.rcont, scipy.stats._qmvnt_cy, av._core, av.logging, av.buffer, av.audio.format, av.error, av.dictionary, av.container.pyio, av.option, av.descriptor, av.format, av.index, av.utils, av.stream, av.container.streams, av.sidedata.encparams, av.sidedata.motionvectors, av.sidedata.sidedata, av.opaque, av.packet, av.container.input, av.container.output, av.container.core, av.codec.context, av.video.format, av.video.reformatter, av.plane, av.video.plane, av.video.frame, av.video.stream, av.codec.hwaccel, av.codec.codec, av.frame, av.audio.layout, av.audio.plane, av.audio.frame, av.audio.stream, av.filter.link, av.filter.context, av.filter.graph, av.filter.filter, av.filter.loudnorm, av.audio.resampler, av.audio.codeccontext, av.audio.fifo, av.bitstream, av.device, av.video.codeccontext, av.subtitles.stream, scipy.signal._sigtools, scipy.signal._max_len_seq_inner, scipy.signal._upfirdn_apply, scipy.signal._spline, scipy.signal._sosfilt, scipy.signal._peak_finding_utils (total: 202)
[1]    418089 segmentation fault (core dumped)  python main.py --disable-api-nodes --preview-method=latent2rgb --verbose
(ComfyUI) ➜  ComfyUI git:(master) 

I looked at the source code from the backtrace and found that Conv3d is apparently quite unstable with ROCm kernels. I also found that the patch, kernel, and stride sizes are all equal, so each patch is covered by exactly one kernel application with no overlap. Because of this, the convolution can simply be replaced with an equivalent linear layer.
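
The equivalence can be checked numerically. The sketch below is not the code from this PR: the sizes (`in_ch`, `embed_dim`, `kernel`) are made up for illustration, and it runs a plain `torch.nn.Conv3d` on CPU in float32. It assumes each input is exactly one patch, so the convolution produces a single output position per patch.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Illustrative sizes only; the real Qwen35 patch-embed dimensions may differ.
in_ch, embed_dim = 3, 8
kernel = (2, 4, 4)  # (temporal, height, width); stride is set to the same value

proj = torch.nn.Conv3d(in_ch, embed_dim, kernel_size=kernel, stride=kernel, bias=True)

# A batch of patches, each exactly one kernel in size.
num_patches = 5
patches = torch.randn(num_patches, in_ch, *kernel)

with torch.no_grad():
    # Reference path: Conv3d output has shape (N, embed_dim, 1, 1, 1).
    conv_out = proj(patches).reshape(num_patches, embed_dim)

    # Linear path: flatten each patch and the kernel in the same (C, T, H, W) order.
    weight_2d = proj.weight.reshape(embed_dim, -1)
    linear_out = F.linear(patches.reshape(num_patches, -1), weight_2d, proj.bias)

max_abs_diff = (conv_out - linear_out).abs().max().item()
print(f"max abs diff: {max_abs_diff:.2e}")
```

The two paths should agree up to floating-point rounding, because with kernel size equal to stride the convolution reduces to one dot product per patch and output channel.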

My hardware is:

  • Framework 16
  • AMD Radeon 890M
  • AMD Ryzen AI 370 (Strix Point)
  • 64GB unified memory (split into 32GB VRAM and 32GB RAM)
  • 4TB SSD
  • NixOS 26.05

Since I only tested on AMD hardware with a bf16 model, I restricted the workaround to AMD GPUs and 16-bit dtypes, so other behaviour is unaffected. The main advantage of this workaround is that it needs no extra RAM and doesn't hurt performance.
This is my first commit to ComfyUI, so I may have done something wrong. Review and feedback are welcome. Thank you!
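
A gate of this kind might look roughly like the following. This is a sketch under stated assumptions, not the code in this PR: the names `should_use_linear`, `patch_embed`, and `IS_ROCM` are hypothetical, and ROCm is detected here through `torch.version.hip`, which may differ from how ComfyUI detects AMD devices.

```python
import torch
import torch.nn.functional as F

# Assumption: ROCm builds of PyTorch set torch.version.hip to a version string.
IS_ROCM = getattr(torch.version, "hip", None) is not None


def should_use_linear(device_type: str, dtype: torch.dtype, is_rocm: bool) -> bool:
    """Hypothetical gate: only AMD GPU tensors in a 16-bit float dtype."""
    return is_rocm and device_type == "cuda" and dtype in (torch.float16, torch.bfloat16)


def patch_embed(x: torch.Tensor, proj: torch.nn.Conv3d, is_rocm: bool = IS_ROCM) -> torch.Tensor:
    """Project patches shaped (N, C, T, H, W), one full kernel-sized patch per row."""
    n = x.shape[0]
    if should_use_linear(x.device.type, x.dtype, is_rocm):
        # Workaround path: same arithmetic as the conv, expressed as a matmul.
        weight_2d = proj.weight.reshape(proj.out_channels, -1)
        return F.linear(x.reshape(n, -1), weight_2d, proj.bias)
    # Default path: unchanged Conv3d call for every other device and dtype.
    return proj(x).reshape(n, -1)
```

Keeping the condition in a small pure function makes it easy to check which device and dtype combinations take the workaround path without needing an AMD GPU.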

@coderabbitai

coderabbitai Bot commented Jun 1, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: ae80a71e-d687-41e7-8f36-9565c467cc1e

📥 Commits

Reviewing files that changed from the base of the PR and between 39c12cf and ddb3bcc.

📒 Files selected for processing (1)
  • comfy/text_encoders/qwen35.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • comfy/text_encoders/qwen35.py

📝 Walkthrough

This PR introduces a device-specific workaround in the Qwen 3.5 vision encoder. The change conditionally routes the full-patch convolution in Qwen35VisionPatchEmbed.forward: when executing on an AMD GPU (which PyTorch's ROCm build exposes as a cuda device) with float16 or bfloat16 tensors, it performs the projection using F.linear with reshaped convolution weights instead of calling self.proj directly, thereby bypassing the Conv3d kernel that crashes at reduced precision. All other device and dtype combinations retain the existing behavior.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: replacing Conv3d with linear projection to avoid ROCm crashes in Qwen35 vision patch embedding. |
| Description check | ✅ Passed | The description provides context about the segmentation fault issue, includes a detailed error log, explains the root cause, and documents the testing environment and scope of the fix. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |




@jprsyt5

jprsyt5 commented Jun 2, 2026

The reason subgraphs exist, at least originally I think, was to reduce the need for custom nodes like this.

I bet everything this sampler does can already be built with native nodes and packed into a subgraph.

Too bad subgraphs are broken, so people have to keep making and relying on custom nodes for this instead.

@peterwilli

Author

I think you're right. I should add, though, that this issue also happens without the UIT Sampler >~>' I may have forgotten to mention that; I went to bed really late.
