feat(executor): pass GPU device requests to spawned method containers by Burhanuddin98 · Pull Request #94 · choras-org/backend

Burhanuddin98 · 2026-05-12T07:58:40Z

Summary

Adds device_requests=[DeviceRequest(count=-1, capabilities=[["gpu"]])] to the client.containers.run() call in LocalExecutor. The backend itself stays CPU-only; only the spawned solver method containers receive GPU passthrough.

Motivation

The workshop methods that benefit from GPU acceleration (Hamilton's PFFDTD via the c_cuda binary, edg-acoustics' CuPy path) cannot reach the host GPU without this — the backend dispatches via the Docker socket and `containers.run()` defaults to no device requests.

What this PR does

One block of changes in `app/services/executors/local_executor.py`:

```python
device_requests=[
docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])
],
```

Verification

End-to-end run on Windows + Docker Desktop + RTX 2060 Max-Q:

`nvidia-smi` inside the spawned method container correctly reports the host GPU.
A GPU-compiled PFFDTD c_cuda binary (Hamilton's `fdtd_main_gpu_double.x`) executes inside the spawned container; GPU utilization tracks the kernel during the run (`~40%` on the chosen mesh).
On a host without `nvidia-container-toolkit`, the device request is silently ignored and the container runs CPU-only — existing CPU-only methods (`pyroomacoustics`) keep working unchanged.

Open question for maintainers

`count=-1` (all visible GPUs) is unconditional here. A possible refinement is to read a per-method `requiresGpu` flag from `methods-config.json` and use `count=1` instead, sparing CPU-only methods a CUDA context reservation on multi-GPU laptops. Left out of this PR to keep the change minimal and let the design choice rest with the maintainers.
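For discussion, the per-method refinement could look roughly like the sketch below. The config layout (a list of entries with `name` and an optional `requiresGpu` key) and the method name `pffdtd` are assumptions made for illustration; the real `methods-config.json` schema may differ, and `gpu_request_kwargs` is a hypothetical helper, not part of this PR.

```python
import json

# Hypothetical config contents; the real methods-config.json schema may differ.
EXAMPLE_CONFIG = """
[
  {"name": "pffdtd", "requiresGpu": true},
  {"name": "pyroomacoustics"}
]
"""


def gpu_request_kwargs(method_entry):
    """Return keyword arguments for a GPU device request, or None for CPU-only methods.

    A missing requiresGpu key is treated as False, so existing entries stay CPU-only.
    """
    if method_entry.get("requiresGpu", False):
        # count=1 asks for a single GPU instead of all visible GPUs (count=-1).
        return {"count": 1, "capabilities": [["gpu"]]}
    return None


# Index the (assumed) list of method entries by name for lookup at dispatch time.
methods = {entry["name"]: entry for entry in json.loads(EXAMPLE_CONFIG)}
```

In the executor, a non-`None` result would be turned into `docker.types.DeviceRequest(**kwargs)` and passed via `device_requests`; for `None`, the argument would be omitted entirely.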

Test plan

`nvidia-smi` inside spawned container reports GPU
PFFDTD c_cuda binary runs to completion inside spawned container
CPU-only method (pyroomacoustics-style) still works on the same host
Pending maintainer review for the count=-1 / per-method-flag design choice

Adds device_requests=[DeviceRequest(count=-1, capabilities=[['gpu']])] to client.containers.run() in LocalExecutor. The backend itself stays CPU-only; only the spawned solver method containers receive GPU passthrough.

On hosts with nvidia-container-toolkit installed, each spawned container sees the host GPU. Verified end-to-end with a GPU-compiled PFFDTD c_cuda binary: nvidia-smi inside the spawned container reports the device and GPU utilization tracks the solver kernel during the run.

On hosts without the toolkit, the request is silently ignored and the container runs CPU-only -- so existing CPU-only methods like pyroomacoustics keep working unchanged.

Open question for review: count=-1 (all visible GPUs) is unconditional here. A future refinement could read a per-method requiresGpu flag from methods-config.json and use count=1 instead, sparing CPU-only methods a CUDA context reservation on multi-GPU laptops. Kept out of this PR to minimize the change surface and let maintainers steer the design.

mberz

Thanks for your PR.

I've just tested it on macOS with arm and receive the following error when trying to boot the container:

backend-1 | [2026-05-12 10:53:34,419: ERROR/MainProcess] Failed to start Docker container: 500 Server Error for http+docker://localhost/v1.52/containers/8a02c31369a7b5430a1a5c42bfd9aab75ffb7561359f50eb638e5bd61e5df4d0/start: Internal Server Error ("could not select device driver "" with capabilities: [[gpu]]")

It seems that the containers fail to start when the GPU capability is not available on the host.
Do you have a solution for that?

Addresses Marco's review on PR choras-org#94 (macOS arm test):

> [2026-05-12 10:53:34,419: ERROR/MainProcess] Failed to start Docker container: 500 Server Error [...] ("could not select device driver "" with capabilities: [[gpu]]")

My original claim that the device request would be silently ignored on hosts without nvidia-container-toolkit was wrong. The Docker daemon returns a hard 500 instead. Apple Silicon, macOS, Linux without nvidia runtime, and Windows without WSL+nvidia all hit this.

Fix: try the GPU-on containers.run() first; on the specific "could not select device driver" / [[gpu]] APIError, log a warning and retry the run() without device_requests. Any other 500 (port conflict, image-pull failure, OOM, etc.) still propagates as before.

Result:
- Hosts with nvidia-container-toolkit: spawned container sees GPU (verified end-to-end with PFFDTD c_cuda binary on RTX 2060).
- Hosts without GPU: spawned container runs CPU-only, no user-visible change vs the pre-PR behaviour. Backend logs one warning per spawn.
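The try-then-retry logic described in this commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the commit: `run_with_gpu_fallback` is a hypothetical helper, `run` stands in for `client.containers.run`, and the exception type is a parameter so the sketch stays independent of the Docker SDK (in real use it would be `docker.errors.APIError`).

```python
import logging

logger = logging.getLogger(__name__)

# Substring of the daemon error quoted in this thread.
GPU_DRIVER_ERROR = "could not select device driver"


def run_with_gpu_fallback(run, gpu_request, api_error=Exception, **kwargs):
    """Call run() with a GPU device request; retry CPU-only if no GPU driver exists.

    run         -- callable standing in for client.containers.run (assumption)
    gpu_request -- e.g. docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])
    api_error   -- exception type to intercept; docker.errors.APIError in real use
    """
    try:
        return run(device_requests=[gpu_request], **kwargs)
    except api_error as exc:
        message = str(exc)
        if GPU_DRIVER_ERROR not in message or "gpu" not in message:
            raise  # unrelated failure (port conflict, image pull, ...): propagate
        logger.warning("No GPU device driver; retrying CPU-only: %s", message)
        # Second attempt omits device_requests altogether.
        return run(**kwargs)
```

One point worth checking in review: the error in the log comes from the `/start` endpoint, so the container from the failed first attempt has already been created. If the run uses a fixed container name, that leftover container may need to be removed before the retry.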

Burhanuddin98 · 2026-05-12T09:02:14Z

Hi Marco, thanks for catching that on macOS arm. You were right: my "silently ignored" assumption was wrong; the daemon returns a hard 500 instead.

Pushed commit 0ace611 to the same branch. It tries the GPU request first and catches the specific `could not select device driver ... [[gpu]]` APIError, retrying CPU-only. Any other 500 (port conflict, image pull, OOM, etc.) still propagates as before.

Verified on the Linux+nvidia path that GPU passthrough still works end-to-end with the PFFDTD c_cuda binary. Ready for re-review.

mberz added this to CHORAS planning May 12, 2026

github-project-automation Bot moved this to Backlog in CHORAS planning May 12, 2026

mberz moved this from Backlog to Require review in CHORAS planning May 12, 2026

mberz self-requested a review May 12, 2026 08:54

mberz requested changes May 12, 2026

mberz linked an issue Jun 18, 2026 that may be closed by this pull request

ENH: GPU support for simulation methods choras-org/CHORAS#53

Open

Burhanuddin98 wants to merge 2 commits into choras-org:dev from Burhanuddin98:feat/gpu-passthrough-method-containers
