[SYCL] Add lightweight free-function kernel property headers#21738
[SYCL] Add lightweight free-function kernel property headers#21738koparasy wants to merge 5 commits intointel:syclfrom
Conversation
8346075 to
295ad05
Compare
| additionally provides the launch-configuration properties that may be applied | ||
| directly to a free function kernel declaration, such as | ||
| `work_group_size`, `work_group_size_hint`, `sub_group_size`, | ||
| `max_work_group_size`, and `max_linear_work_group_size`. |
There was a problem hiding this comment.
I don't think we should call the header function_launch_properties. There are two distinct classes of kernel-related properties, which we call "kernel properties" and "kernel launch properties". The "kernel properties" are those that affect the way the kernel is compiled. For a normal kernel, you specify these via the get(properties_tag) function. For a free-function kernel, they are specified via the SYCL_EXT_ONEAPI_FUNCTION_PROPERTY macro. The "kernel launch properties" are those that affect the way the kernel is launched. These are specified via the launch_config parameter.
All the properties you list here are "kernel properties". Therefore, I think it would be confusing to put them in a header named "launch_properties". A logical breakdown of properties might be:
-
free_function_properties.hpp: Just thesingle_task_kernelandnd_range_kernelproperties along with theSYCL_EXT_ONEAPI_FUNCTION_PROPERTYmacro. -
kernel_properties.hpp: The properties defined insycl_ext_oneapi_kernel_properties, theproperty_tagtype, and theSYCL_EXT_ONEAPI_FUNCTION_PROPERTYmacro.
From your description, it sounds like error checking is a major contribution to the compilation time? If this is the case, could we add a macro that disables the error checking? This might be generally useful as a way to disable error checks in production code that has already been debugged.
Note that the max_work_group_size and max_linear_work_group_size properties are specific to CUDA. If these are causing compilations on Intel devices to be slow, we should do something to fix that. For example, they could be split out into a separate header.
There was a problem hiding this comment.
Thanks for the detailed feedback. I agree that the current function_launch_properties.hpp name is misleading, since the properties in that header are still compile-time kernel properties rather than "launch_config" properties.
My intent with the split was to separate the lightweight free-function annotation path from the heavier property machinery, not to introduce a new distinction between compile-time kernel properties and kernel launch properties. Given your point, I think the naming should better reflect that. One option I had in mind would be:
function_properties.hppfor the minimal free-function annotation pathsingle_task_kernel,nd_range_kernelSYCL_EXT_ONEAPI_FUNCTION_PROPERTY` macro.- A second header with a name along the lines of
function_kernel_properties.hpporfunction_compile_properties.hppfor the broader set of compile-time kernel properties used on that path.
If that direction sounds reasonable to you, I can update the naming and documentation accordingly. I wanted to check with you first before changing it.
On the compile-time side, my current suspicion is that the main cost comes from the general property machinery and associated metaprogramming, rather than primarily from the diagnostics or error checking. I have not ruled the diagnostics out completely, but so far the evidence points more toward the property framework itself being the dominant factor.
| cross-property validation that is performed when properties are combined via | ||
| the full property-list machinery. For example, checks for conflicting | ||
| combinations such as `work_group_size` together with `max_work_group_size` are | ||
| available only through the umbrella header. |
There was a problem hiding this comment.
Something seems wrong with the reasoning here. For a free-function kernel, the properties are never added to a property list. Instead, each property is individually specified via SYCL_EXT_ONEAPI_FUNCTION_PROPERTY. Therefore, I don't see how we can ever check for invalid combinations at compilation time. Maybe we could do that in the CFE or as an LLVMIR pass, but I don't see how we could do that via metaprogramming in the header.
I wonder if the metaprogramming cost you are observing is only relevant for kernel properties defined via get(properties_tag) of a named function object kernel (i.e. kernel defined as a class)? If this is the case, it seems like we could make free-function compilation faster by creating a header that contains only the property definitions for the "kernel properties" and leaving out any metaprogramming machinery that is used only when these properties are added to a property list.
Maybe this is what you are doing already, but I would not describe it as a lighter-weight header that skips validation. Instead, it's a header that is used only when defining a free-function kernel, where validation doesn't happen anyways.
I think we are on the same page about creating a header that defines only: single_task_kernel, nd_range_kernel, and SYCL_EXT_ONEAPI_FUNCTION_PROPERTY (and maybe the associated _key structures if it doesn't add significant overhead). These are the minimal things you need to define any free-function kernel.
I'm on the fence about creating a second header that defines the additional kernel properties like work_group_size. If we just define these properties without defining the metaprogramming that does the cross-property validation, does it really save much to split these off into a separate header?
Two other high level comments that are somewhat relevant:
-
The proposed khr_includes only proposes a set of header files for the core SYCL APIs. We never defined the headers associated with the various extensions. You are starting to do that now with this PR, so there's a lot of uncharted territory here.
-
I am in the process of proposing a KHR version of "compile time properties" here. In doing this, I'm planning to drop some of the heavy metaprogramming (like sorting the properties in the list). There is a POC implementation linked from that PR. It's possible that this KHR version of properties will compile significantly faster that the version we have now.
There was a problem hiding this comment.
Thanks for pushing back on this. Pushing everything into a single /sycl/ext/oneapi/kernel_properties/function_properties.hpp and leaving the cross-property validation out slightly increases compilation time to 29ms (from 26ms in comparison to the previous commit) but keeps the separation clean 👍 . The time was spent on the inclusion (std::array). I avoided the implementation to use std::array except device_has.
device_has is conceptually part of the free-function property world, because it already has FunctionPropertyMetaInfo support and maps naturally to SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(...).
But it is also the one property whose current public API hard-bakes in std::array:
static constexpr std::array<aspect, sizeof...(Aspects)> value;
That makes it different from work_group_size and friends, where std::array was only an implementation detail. So in the current implementation I have the device_has property on the base header (and thus excluded from "function_property" header) that keeps compilation time low, but this introduces an inconsistency in the free-function property model, where one property cannot be included in the lightweight header due to its API dependency
Given this is under the experimental namespace, my inclination would be to adjust the public API of device_has to remove the std::array dependency and make it consistent with the other properties.
Does that align with the expected policy for experimental features?
There was a problem hiding this comment.
I agree this is better described as a free-function-specific header rather than “skipping validation”. I’ll update the PR description accordingly.
For the split itself, I can prioritize the minimal header (single_task_kernel, nd_range_kernel, SYCL_EXT_ONEAPI_FUNCTION_PROPERTY, ...) and keep the device_has property at the initial location.
cperkinsintel
left a comment
There was a problem hiding this comment.
header / sycl stuff LGTM
Introduce lightweight kernel-property headers for free-function kernel annotation and launch-tuning use cases: - function_properties.hpp provides a standalone path for execution-kind annotations such as nd_range_kernel and single_task_kernel. - function_launch_properties.hpp provides a lightweight path for launch properties such as work_group_size, work_group_size_hint, and sub_group_size. - kernel_properties/properties.hpp remains the umbrella path for full property-list semantics and cross-property conflict checking. This reduces compile-time cost for kernel-language and JIT-style usage while preserving the full umbrella interface for users who need complete property-list behavior. Measured header compile times: - function_properties.hpp: ~26 ms - function_launch_properties path: ~58 ms - kernel_properties/properties.hpp umbrella: ~68 ms
property headers instead of the nested kernel_properties/* paths. Keep the legacy kernel_properties/* headers as forwarding compatibility shims, add smoke tests for both legacy include paths, and mark the shims for deletion once support for the old paths is dropped.
ea6ac15 to
faa95e1
Compare
Introduce headers specialized for free-function kernel annotation, separating them from the heavier property-list-based machinery used by named function object kernels.
function_properties.hppprovides a minimal path for defining free-function kernels, including:kernel_properties/properties.hppremains the umbrella header for full property-list semantics, including cross-property validation and metaprogramming used for named kernel objects.By isolating the free-function path, we avoid pulling in unnecessary metaprogramming machinery and reduce compile-time overhead for kernel-language and JIT-style usage.
Measured header compile times:
function_properties.hpp:~29 mskernel_properties/properties.hpp(umbrella): ~68 ms