Fix device selection on Apple Silicon (MPS) #205

Open: FirasAguel wants to merge 1 commit into 1038lab:main from FirasAguel:fix/mps-device-selection

Summary

On Apple Silicon (Mac, MPS), several nodes selected their device with a module-level assignment:

device = "cuda" if torch.cuda.is_available() else "cpu"

Since torch.cuda.is_available() is False on a Mac, they silently fell back to CPU even when ComfyUI itself had selected MPS. The model still runs, just far slower, so there's no error — only a job that sits at 0% for minutes, which reads as a hang.

This defers device selection to ComfyUI's own device management via a small shared helper, so the nodes follow whatever backend ComfyUI chose (CUDA, MPS, or CPU) — the same approach the SAM2 / SAM3 / Florence2 / Segment / SDMatte nodes already use.

Changes

  • Add get_device() to py/AILab_utils.py, wrapping comfy.model_management.get_torch_device().
  • Use it in the nodes that hard-coded the CUDA-or-CPU check: BiRefNet, RMBG, ClothSegment, BodySegment, FashionSegment, FaceSegment, SegmentV2.
  • Centralizing it in one helper keeps the check from drifting again.
  • LamaRemover already uses get_torch_device() and is left unchanged.

In these nodes, device is only ever passed to .to(device), passed as map_location=device, or used as a hashable cache key, so returning a torch.device object instead of a string is safe at every call site.
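
The three usages named above can be checked in isolation. The snippet below is a minimal sketch that assumes PyTorch is installed and uses the CPU device as a stand-in for whatever the helper returns. The tensor, the in-memory checkpoint, and the model_cache dictionary are illustrative and are not taken from the node code.

```python
import io

import torch

device = torch.device("cpu")  # stand-in for the value returned by the helper

# 1) .to(device) accepts a torch.device just as it accepts a string.
x = torch.zeros(2, 3).to(device)

# 2) map_location accepts a torch.device when loading a checkpoint.
buffer = io.BytesIO()
torch.save({"weight": torch.ones(2)}, buffer)
buffer.seek(0)
state = torch.load(buffer, map_location=device)

# 3) torch.device is hashable, and equal devices map to the same dictionary slot.
model_cache = {device: "loaded-model"}
hit = model_cache.get(torch.device("cpu"))
```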

Evidence (before/after, Apple Silicon)

  • Before (CPU fallback): a single BiRefNet image ≈ 18 min.
  • After (MPS): a 21-second video processed in ≈ 9 min (~1s/frame).

Environment: Apple Silicon, macOS 15.7.4, 48 GB unified memory, ComfyUI 0.20.1 (Desktop), PyTorch 2.10.0, Python 3.13, ComfyUI-RMBG v3.0.0. ComfyUI startup correctly reports Device: mps.

Notes

  • A few ops may not yet be implemented for MPS. If you hit one, launch ComfyUI with PYTORCH_ENABLE_MPS_FALLBACK=1 so those fall back to CPU instead of erroring.
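
One way to apply the fallback setting from the note above is to launch ComfyUI from a small wrapper that sets the variable in the child process environment. The sketch below assumes a source install started through main.py. The function name env_with_mps_fallback is hypothetical, and the Desktop app may need the variable set in its own launch environment instead.

```python
import os


def env_with_mps_fallback(base_env=None):
    """Return a copy of the environment with the MPS CPU fallback enabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
    return env


# Usage, assuming ComfyUI's entry point is main.py in the current directory:
#   import subprocess, sys
#   subprocess.run([sys.executable, "main.py"], env=env_with_mps_fallback())
```

Setting the variable before the ComfyUI process starts avoids any dependence on when PyTorch reads it.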

Related issues

If you'd prefer a user-facing device dropdown (Auto/CUDA/CPU/MPS) on these nodes, matching the YoloV8/SAM3 pattern, rather than this automatic deferral, I'm happy to follow up with that.

Several nodes selected their device with a module-level
`device = "cuda" if torch.cuda.is_available() else "cpu"`. On Apple Silicon
`torch.cuda.is_available()` is False, so they silently ran on CPU even when
ComfyUI had selected MPS. The model still runs, just far slower, so there is no
error -- only a job that sits at 0% for minutes (reads as a hang). A single
BiRefNet image took ~18 min on CPU; on MPS a 21s video processed in ~9 min
(~1s/frame).

Defer to ComfyUI's own device management via a shared helper
`get_device()` in AILab_utils (wraps `comfy.model_management.get_torch_device()`),
so nodes follow whatever backend ComfyUI chose -- matching what the SAM2/SAM3/
Florence2/Segment/SDMatte nodes already do. Centralizing it keeps the check from
drifting again.

Applied to BiRefNet, RMBG, ClothSegment, BodySegment, FashionSegment,
FaceSegment, and SegmentV2. LamaRemover already uses get_torch_device() and is
left unchanged.

Fixes 1038lab#200. Related to 1038lab#91, 1038lab#135.

Development

Successfully merging this pull request may close these issues.

BiRefNet node stuck at 0% with no error on Mac Apple Silicon (ComfyUI Desktop)