Refine the packaging infrastructure. #207

Open
eirrgang wants to merge 6 commits into llnl:notex from eirrgang:eirrgang-packaging

@eirrgang eirrgang commented Mar 5, 2026

Collaborator

Continues the CMake reorganization and migrates from setup.py to pyproject.toml to drive the build and packaging.

Includes some source code patches for more consistent behavior and some shim headers for better compatibility across CUDA, HIP, and CPU runtimes.

Use pyproject.toml and scikit-build-core to drive the CMake build.

Minor CMake modernization.

Add some CMake infrastructure to try to handle the
three target build types (CUDA, HIP, and CPU-only),
but the project infrastructure may not be set up
for non-CUDA builds at this point.
- Fix default accelerator framework.
- Only define `__USE_NOTEX` for AMD

Collaborator Author

Commit 6de2200 is a pretty substantial change that restores the automatic hipification that setup.py performed via torch.utils.cpp_extension.

Background

torch.utils has some fairly elaborate tooling to detect AMD GPUs and hipify the CUDA sources automatically. However, this tooling does not rely on very stable interfaces and is sensitive to the torch release, the ROCm version, and the CUDA version. It also relies on the distutils-style setuptools Command conventions, which aren't fully compatible with other modern Python packaging tools.

A complete migration off of the torch.utils hipify wrappers is not easy because the behavior of hipify-clang is inconsistent across ROCm versions.

CMake-driven hipify

We're trying to call hipify-clang, but at least in ROCm 7.2 we're having a hard time managing its output files correctly.

In the meantime, we can use the hipify in torch.utils, but

  • this requires a torch installation,
  • we need to avoid build isolation in order to use the torch installation, and
  • we still need to do some tweaking to account for different ROCm and CUDA versions.
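
Since a non-isolated build only sees whatever is already installed, it can help to confirm up front that torch and its hipify module are visible to the interpreter that will run the build. The following is a minimal sketch using only the standard library; the helper name `module_available` is hypothetical and not part of this PR. Note that locating `torch.utils.hipify` imports the parent `torch` package, so the check can be slow.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be located in the current environment."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package (e.g. `torch`) is missing or fails
        # to import; ModuleNotFoundError is a subclass of ImportError.
        return False


if __name__ == "__main__":
    # Run this with the same interpreter that will drive the build.
    for name in ("torch", "torch.utils.hipify"):
        status = "found" if module_available(name) else "missing"
        print(f"{name}: {status}")
```

If either module is reported missing, a `--no-isolation` build would presumably fail at the hipify step, so installing torch into the build environment first is the likely fix.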

For best results, use the most recent ROCm available and the oldest supported CUDA available.

With ROCm 7.2 and CUDA 12.9, the following seems to work:

python -m build . -Ccmake.define.LEAP_GPU=AMD -Ccmake.define.CMAKE_CXX_COMPILER=`which amdclang++` --no-isolation
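
For scripted or CI use, the same invocation could be assembled in Python so that a missing compiler fails early with a clear message. This is only a sketch under the assumptions in the command above (the `LEAP_GPU` define, `amdclang++` on `PATH`); the function name `build_command` is hypothetical.

```python
import shutil
import sys


def build_command(gpu="AMD", compiler="amdclang++", source_dir="."):
    """Assemble the argv for a non-isolated `python -m build` run."""
    # Resolve the compiler the way `which` does in the shell version.
    compiler_path = shutil.which(compiler)
    if compiler_path is None:
        raise FileNotFoundError(f"{compiler} not found on PATH")
    return [
        sys.executable, "-m", "build", source_dir,
        # -C passes config settings through to scikit-build-core.
        f"-Ccmake.define.LEAP_GPU={gpu}",
        f"-Ccmake.define.CMAKE_CXX_COMPILER={compiler_path}",
        # Reuse the current environment so the installed torch is visible.
        "--no-isolation",
    ]
```

The resulting list can be handed to `subprocess.run(build_command(), check=True)` from the repository root.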

@eirrgang eirrgang force-pushed the eirrgang-packaging branch from 6de2200 to 0577a9c June 23, 2026 23:39