Context
We are currently developing and testing the SPNG branch of Wire-Cell Toolkit on a GPU machine (wcgpu) using a Spack-based environment (wcwc installation).
As we prepare to scale and deploy to other machines, we are relying heavily on Spack installations of wire-cell-toolkit@spng.
Problem
We are encountering subtle but critical dependency differences across environments, particularly:
- CUDA version differences (e.g., CUDA 11 vs CUDA 12+)
- Boost version differences
These differences require small but necessary source-level changes in SPNG for successful builds.
Example: NVTX changes in CUDA ≥ 12
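In CUDA 12 and newer:
- NVTX is now header-only
- Headers moved to: nvtx3/ (as in the new include below)
- The libnvToolsExt library no longer exists
Required changes in SPNG: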
// Old
#include <nvToolsExt.h>
// New
#include <nvtx3/nvToolsExt.h>
And remove linking against nvToolsExt (the -lnvToolsExt flag).
Current Workaround (Spack patch via package.py)
We are applying temporary fixes using filter_file() inside the patch() function:
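# Stop adding nvToolsExt to the libtorch library list.
filter_file(
    " libs += ['nvToolsExt']",
    " pass # nvToolsExt skipped; NVTX3 is header-only in modern CUDA",
    "waft/libtorch.py",
    string=True,
)
# Strip the matching linker flag (regex match, replaced with nothing).
filter_file(
    r",-lnvToolsExt",
    "",
    "waft/libtorch.py",
)
# Skip 'system' in the loop over Boost libraries in waft/boost.py.
filter_file(
    "if lib == 'python':",
    "if lib == 'system':\n\t\t\t\tcontinue\n\t\t\tif lib == 'python':",
    "waft/boost.py",
    string=True,
)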
Why this is a problem
- These fixes are environment-specific hacks
- Hard to maintain when the software is built on different machines and architectures:
  - When the patched parts of the source code change, these patches must also be updated in the wire-cell-spack repository
- They introduce maintenance overhead in Spack recipes
- Portability overhead
Suggested Solutions
Should we add conditional compilation to the SPNG codebase? As the code is built on more machines and against more dependency versions, the set of conditionals may grow, but in the long run it is probably more stable than patches in package.py. Examples (based on experience building on the wcgpu and sgpu machines, which have different CUDA versions):
- Add conditional compilation in SPNG
  - Detect CUDA version
  - Switch include paths:
    #include <cuda.h> // defines CUDA_VERSION; without it the check below always takes the #else branch
    #if CUDA_VERSION >= 12000
    #include <nvtx3/nvToolsExt.h>
    #else
    #include <nvToolsExt.h>
    #endif
  - Remove explicit linkage to nvToolsExt --> This is already done TorchnvToolks
  - Treat NVTX as optional / header-only where applicable
- Improve build system (waft)
  - Detect NVTX availability dynamically
  - Avoid hardcoding -lnvToolsExt (based on CUDA version available)
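The "detect NVTX availability dynamically" idea could look roughly like the sketch below. It is only an illustration: nvtx_config is a hypothetical helper, not an existing waft function, and it assumes the usual CUDA layout (include/ for headers, lib64/ or lib/ for libraries). It probes the file system rather than comparing version numbers, and preferring NVTX3 whenever its header is present is an assumption of the sketch.

```python
from pathlib import Path


def nvtx_config(cuda_home):
    """Return (header, libs) for the NVTX flavor found under cuda_home.

    Hypothetical helper, not part of waft. Assumes headers live in
    include/ and libraries in lib64/ or lib/.
    """
    root = Path(cuda_home)
    lib_dirs = [root / "lib64", root / "lib"]
    # Legacy NVTX needs a shared library next to the other CUDA libraries.
    has_lib = any(
        d.is_dir() and any(d.glob("libnvToolsExt.so*")) for d in lib_dirs
    )
    has_v3 = (root / "include" / "nvtx3" / "nvToolsExt.h").is_file()
    has_legacy = (root / "include" / "nvToolsExt.h").is_file()

    if has_v3:
        # NVTX3 is header-only, so there is nothing to link.
        return "nvtx3/nvToolsExt.h", []
    if has_legacy and has_lib:
        return "nvToolsExt.h", ["nvToolsExt"]
    # NVTX not found: treat it as optional and link nothing.
    return None, []
```

A build script could then append "-l" + name for each returned library name instead of hardcoding -lnvToolsExt, and skip NVTX instrumentation entirely when the header is None.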
Temporary Recommendation
Until a proper fix is implemented:
- Continue using Spack package.py patches (filter_file) as a workaround
- Track all such dependency-related patches centrally
- Should we add a comment at each place in the SPNG code that a patch touches, so that we know when package.py in wire-cell-spack needs to change?
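One possible way to track the patches centrally, and to notice when the source drifts away from them, is to keep the strings each patch searches for in a single list and check that they still occur in the source tree. The sketch below is a hypothetical standalone check, not something that exists in wire-cell-spack. PATCH_ANCHORS and stale_anchors are illustrative names, and the anchor strings are copied from the filter_file() calls above (with leading whitespace dropped).

```python
from pathlib import Path

# Each entry: (file relative to the source root, literal text a patch expects).
# Copied from the filter_file() calls in this issue; extend as patches are added.
PATCH_ANCHORS = [
    ("waft/libtorch.py", "libs += ['nvToolsExt']"),
    ("waft/libtorch.py", ",-lnvToolsExt"),
    ("waft/boost.py", "if lib == 'python':"),
]


def stale_anchors(source_root, anchors=PATCH_ANCHORS):
    """Return the (file, text) pairs whose text is no longer in the file."""
    stale = []
    for rel_path, text in anchors:
        path = Path(source_root) / rel_path
        # A missing file or a missing string both mean the patch would not apply.
        if not path.is_file() or text not in path.read_text():
            stale.append((rel_path, text))
    return stale
```

Running such a check against a fresh checkout before building would flag a patch whose target line has changed, rather than leaving the mismatch to surface later as a build failure.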