Proteus is a programmable runtime specialization and Just-In-Time (JIT) layer built on LLVM. It embeds into existing C++ codebases and accelerates host, CUDA, and HIP applications by using runtime context to specialize code and enable optimizations beyond static compilation.
Standard ahead-of-time (AOT) compilation can only optimize a program with the information available at build time. Proteus goes further by embedding optimizing JIT compilation directly into C/C++ applications.
Runtime context, such as the actual values of variables during execution, lets it specialize code on the fly and apply advanced compiler optimizations that accelerate performance beyond what static compilation allows.
Several frontends are available, depending on how you want to describe JIT code:
| Interface | Input style | Best for | Specialization model | Requires Clang AOT? |
|---|---|---|---|---|
| Code annotations | Existing C/C++/CUDA/HIP code | Incremental adoption in existing applications | Values, arrays, objects, and launch configuration | Yes |
| C++ frontend API | C++ source strings | Runtime-generated C++ and templates | Values, arrays, objects, and launch configuration | No |
| MLIR frontend API | MLIR source strings | Direct access to MLIR lowering | Encoded in the provided MLIR source | No |
| Embedded DSL API | Programmatic builders | Runtime code generation with high-level constructs | Values, arrays, and launch configuration | No |
These frontends can target host, CUDA, and HIP execution paths, with backend support depending on how Proteus was configured:
| Interface | Host | CUDA | HIP | Notes |
|---|---|---|---|---|
| Code annotations | Yes | Yes | Yes | Requires compiling with Clang and uses the Proteus LLVM pass |
| C++ frontend API | Yes | Yes | Yes | Uses Clang by default; CUDA paths can use NVCC |
| MLIR frontend API | Yes | Yes | Yes | Requires PROTEUS_ENABLE_MLIR=ON |
| Embedded DSL API | Yes | Yes | Yes | Uses the LLVM backend by default; MLIR backend requires PROTEUS_ENABLE_MLIR=ON |
CUDA, HIP, and MLIR support are available when Proteus is built with the corresponding configuration options enabled.
Proteus includes both in-memory and persistent caching, ensuring that once code has been compiled and optimized, the cost of recompilation is avoided.
Proteus consists of an LLVM pass and a runtime library that implements JIT compilation and optimization using LLVM as a library.
- The code annotation interface requires compiling your application with Clang so the Proteus LLVM pass can parse annotations.
- The DSL, C++ frontend, and MLIR frontend APIs don’t depend on which AOT compiler you use.
In all cases, you link your application against the Proteus runtime library. Details are provided later.
Proteus can be installed from source or via spack.
We provide a packaging recipe for Spack in the subdirectory packaging/spack.
Assuming you have a Spack installation and preferably using an isolated Spack environment, you can add the spack repo by cloning Proteus and then install it by running:
git clone https://github.com/Olympus-HPC/proteus.git
spack repo add proteus/packaging/spack
spack install proteusWe provide several variants to match different configurations, including CUDA, ROCm, and MPI support. A complete list of variants and their descriptions is available in the Spack package file, or viewable through:
spack info proteusSome typical examples:
# Install the latest version with CUDA support for sm_90 arch.
spack install proteus +cuda cuda_arch=90
# Install the latest version with ROCm support for gfx942 arch.
spack install proteus +rocm amdgpu_target=gfx942
# Install the latest version with MPI support.
spack install proteus +mpiThe project uses cmake and requires an LLVM installation.
CI tests currently cover LLVM 19, 20, 22 with CUDA versions 12.2, and AMD
ROCm versions 6.4.3 (based on LLVM 19), 7.1.1 (based on LLVM 20), 7.2.0 (based
on LLVM 22).
See the top-level CMakeLists.txt for the available build options.
A typical build looks like this:
mkdir -p build && cd build
cmake -DLLVM_INSTALL_DIR=<llvm_install_path> -DCMAKE_INSTALL_PREFIX=<install_path> ..
make install
The scripts directory contains setup scripts for building on different targets
(host-only, CUDA, ROCm) used on LLNL machines.
They also serve as good starting points to adapt for other environments.
Run them from the repository root:
source scripts/setup-<target>.shThese scripts load environment modules (specific to LLNL systems) and create a
build-<hostname>-<target>-<version> directory with a
working configuration.
To integrate Proteus with CMake, add the install prefix to CMAKE_PREFIX_PATH,
or pass it explicitly with
-Dproteus_DIR=<install_path>/<libdir>/cmake/proteus during configuration
where <libdir> is typically lib or lib64.
Then, in your project's CMakeLists.txt add:
find_package(proteus CONFIG REQUIRED)
add_proteus(<target>)If you only need the DSL, C++ frontend, or MLIR frontend APIs, you can link directly against
proteusFrontend.
In this case, you don’t need to compile your target with Clang:
find_package(proteus CONFIG REQUIRED)
target_link_libraries(<target> ... proteusFrontend ...)With make, annotation-based integration requires adding compilation and
linking flags, for example:
CXXFLAGS += -I<install_path>/include -fpass-plugin=<install_path>/<libdir>/libProteusPass.so
LDFLAGS += -L<install_path>/<libdir> -Wl,-rpath,<install_path>/<libdir> -lproteus $(llvm-config --libs) -lclang-cppIf you don't use code annotations, you can omit the -fpass-plugin option,
since the LLVM pass is only needed for processing annotations.
Proteus's core optimization technique is runtime constant folding.
It replaces runtime values with constants during JIT compilation, which in turn
turbo-charges classical compiler optimizations such as loop unrolling,
control-flow simplification, and constant propagation.
Think of it as doing constexpr, but at runtime.
Values that can be folded include function or kernel arguments, kernel launch dimensions, launch bounds, and other runtime variables.
Choose the interface that matches how your application wants to describe JIT work:
| Interface | Detailed guide |
|---|---|
| Code annotations | Code Annotations |
| C++ frontend API | C++ Frontend API |
| MLIR frontend API | MLIR Frontend API |
| DSL API | DSL API |
Proteus generates a unique specialization for each distinct set of runtime values and caches them in memory and on disk, so JIT overhead is minimized within and across runs.
The Proteus documentation has more extensive information, including a user's guide and developer manual.
We welcome contributions to Proteus in the form of pull requests targeting the
main branch of the repo, as well as questions, feature requests, or bug reports
via issues.
Please note that Proteus has a Code of Conduct. By participating in the Proteus community, you agree to abide by its rules.
Proteus was created by Giorgis Georgakoudis, georgakoudis1@llnl.gov.
Key contributors are:
- David Beckingsale, beckingsale1@llnl.gov
- Konstantinos Parasyris, parasyris1@llnl.gov
- John Bowen, bowen36@llnl.gov
- Zane Fink, fink12@llnl.gov
- Tal Ben Nun, bennun2@llnl.gov
- Thomas Stitt, stitt4@llnl.gov
Proteus is distributed under the terms of the Apache License (Version 2.0) with LLVM Exceptions.
All new contributions must be made under the Apache-2.0 with LLVM Exceptions license.
See LICENSE, COPYRIGHT, and NOTICE for details.
SPDX-License-Identifier: (Apache-2.0 WITH LLVM-exception)
LLNL-CODE-2000857
