Refine the packaging infrastructure. by eirrgang · Pull Request #207 · llnl/LEAP

eirrgang · 2026-03-05T23:28:05Z

Continues the CMake reorganization and migrates from setup.py to pyproject.toml to drive the build and packaging.

Includes some source code patches for more consistent behavior and some shim headers for better compatibility across CUDA, HIP, and CPU runtimes.

Use pyproject.toml and scikit-build-core to drive the CMake build, with some minor CMake modernization. Add CMake infrastructure for the three target build types (CUDA, HIP, and CPU-only), though the project may not yet be set up for non-CUDA builds.

- Fix default accelerator framework.
- Only define `__USE_NOTEX` for AMD

eirrgang · 2026-04-27T22:01:30Z

Commit 6de2200 is a pretty substantial change to restore automatic hipification that setup.py performed via torch.utils.cpp_extension.

Background

torch.utils has fairly elaborate tooling to detect AMD GPUs and hipify the CUDA sources automatically. However, it does not rely on very stable interfaces: it is sensitive to the torch release, the ROCM version, and the CUDA version. It also relies on the distutils-style setuptools Command conventions, which aren't fully compatible with other modern Python packaging tools.

A complete migration off of torch.utils hipify wrappers is not easy because the behavior of hipify-clang is inconsistent across ROCM versions.

CMake-driven hipify

We're trying to call hipify-clang, but, at least in ROCM 7.2, we're having a hard time managing its output files correctly.

In the meantime, we can use the hipify in torch.utils, but

- this requires a torch installation,
- we need to avoid build isolation in order to use the torch installation, and
- we still need to do some tweaking to account for different ROCM and CUDA versions.
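
The first two constraints can be checked before starting a build. A minimal sketch, assuming the check runs in the same environment that the non-isolated build will use; the helper name `missing_build_modules` is hypothetical and not part of LEAP:

```python
import importlib.util


def missing_build_modules(names=("torch",)):
    """Return the top-level modules from `names` that cannot be found.

    Hypothetical helper: with build isolation disabled, the build backend
    sees the current environment, so a missing torch here means the
    torch.utils hipify step cannot run.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_build_modules()
    if missing:
        print("install before building without isolation:", ", ".join(missing))
    else:
        print("torch is importable; a non-isolated build can use it")
```

This only confirms that torch can be located; it does not verify that the installed torch matches the ROCM or CUDA version in use.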

For best results, use the most recent ROCM available and the oldest supported CUDA available.

With ROCM 7.2 and CUDA 12.9, the following seems to work:

python -m build . -Ccmake.define.LEAP_GPU=AMD -Ccmake.define.CMAKE_CXX_COMPILER=`which amdclang++` --no-isolation
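
The same invocation can be assembled from a script, which avoids the shell backticks. This is a sketch only: `leap_build_command` is a hypothetical helper, and it assumes the `build` package is installed and `amdclang++` is on `PATH`. The flags are the ones from the command above.

```python
import shutil
import sys


def leap_build_command(compiler_path, gpu="AMD", source_dir="."):
    """Assemble the `python -m build` argument list used above."""
    return [
        sys.executable, "-m", "build", source_dir,
        f"-Ccmake.define.LEAP_GPU={gpu}",
        f"-Ccmake.define.CMAKE_CXX_COMPILER={compiler_path}",
        "--no-isolation",
    ]


if __name__ == "__main__":
    compiler = shutil.which("amdclang++")
    if compiler is None:
        sys.exit("amdclang++ not found on PATH")
    # Print rather than run; pass the list to subprocess.run(..., check=True) to build.
    print(" ".join(leap_build_command(compiler)))
```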

eirrgang added 6 commits November 12, 2025 17:56

sort file lists for readability

89dd918

Only need CXX component for OpenMP

18f775b

Improve GPU handling.

0faa6b9

- Fix default accelerator framework.
- Only define `__USE_NOTEX` for AMD

Report the selected accelerator flavor.

6effaa9

Add a fallback version.

0577a9c

eirrgang mentioned this pull request Apr 20, 2026

Fix Could NOT find OpenMP_CUDA (missing: OpenMP_CUDA_FLAGS OpenMP_CUD… #203

Open

eirrgang force-pushed the eirrgang-packaging branch from 6de2200 to 0577a9c on June 23, 2026 23:39

eirrgang wants to merge 6 commits into llnl:notex from eirrgang:eirrgang-packaging