Skip to content

[SYCL] Add lightweight free-function kernel property headers#21738

Open
koparasy wants to merge 5 commits intointel:syclfrom
koparasy:compile-time/function-properties
Open

[SYCL] Add lightweight free-function kernel property headers#21738
koparasy wants to merge 5 commits intointel:syclfrom
koparasy:compile-time/function-properties

Conversation

@koparasy
Copy link
Copy Markdown
Contributor

@koparasy koparasy commented Apr 10, 2026

Introduce headers specialized for free-function kernel annotation, separating them from the heavier property-list-based machinery used by named function object kernels.

  • function_properties.hpp provides a minimal path for defining free-function kernels, including:
    • single_task_kernel
    • nd_range_kernel
    • SYCL_EXT_ONEAPI_FUNCTION_PROPERTY
    • associated lightweight property definitions
  • kernel_properties/properties.hpp remains the umbrella header for full property-list semantics, including cross-property validation and metaprogramming used for named kernel objects.

By isolating the free-function path, we avoid pulling in unnecessary metaprogramming machinery and reduce compile-time overhead for kernel-language and JIT-style usage.

Measured header compile times:

function_properties.hpp: ~29 ms
kernel_properties/properties.hpp (umbrella): ~68 ms

@koparasy koparasy requested review from a team as code owners April 10, 2026 21:44
@koparasy koparasy requested a review from cperkinsintel April 10, 2026 21:44
@koparasy koparasy force-pushed the compile-time/function-properties branch from 8346075 to 295ad05 Compare April 10, 2026 22:12
additionally provides the launch-configuration properties that may be applied
directly to a free function kernel declaration, such as
`work_group_size`, `work_group_size_hint`, `sub_group_size`,
`max_work_group_size`, and `max_linear_work_group_size`.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should call the header function_launch_properties. There are two distinct classes of kernel-related properties, which we call "kernel properties" and "kernel launch properties". The "kernel properties" are those that affect the way the kernel is compiled. For a normal kernel, you specify these via the get(properties_tag) function. For a free-function kernel, they are specified via the SYCL_EXT_ONEAPI_FUNCTION_PROPERTY macro. The "kernel launch properties" are those that affect the way the kernel is launched. These are specified via the launch_config parameter.

All the properties you list here are "kernel properties". Therefore, I think it would be confusing to put them in a header named "launch_properties". A logical breakdown of properties might be:

  • free_function_properties.hpp: Just the single_task_kernel and nd_range_kernel properties along with the SYCL_EXT_ONEAPI_FUNCTION_PROPERTY macro.

  • kernel_properties.hpp: The properties defined in sycl_ext_oneapi_kernel_properties, the property_tag type, and the SYCL_EXT_ONEAPI_FUNCTION_PROPERTY macro.

From your description, it sounds like error checking is a major contribution to the compilation time? If this is the case, could we add a macro that disables the error checking? This might be generally useful as a way to disable error checks in production code that has already been debugged.

Note that the max_work_group_size and max_linear_work_group_size properties are specific to CUDA. If these are causing compilations on Intel devices to be slow, we should do something to fix that. For example, they could be split out into a separate header.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed feedback. I agree that the current function_launch_properties.hpp name is misleading, since the properties in that header are still compile-time kernel properties rather than "launch_config" properties.

My intent with the split was to separate the lightweight free-function annotation path from the heavier property machinery, not to introduce a new distinction between compile-time kernel properties and kernel launch properties. Given your point, I think the naming should better reflect that. One option I had in mind would be:

  • function_properties.hpp for the minimal free-function annotation path single_task_kernel, nd_range_kernel SYCL_EXT_ONEAPI_FUNCTION_PROPERTY` macro.
  • A second header with a name along the lines of function_kernel_properties.hpp or function_compile_properties.hpp for the broader set of compile-time kernel properties used on that path.

If that direction sounds reasonable to you, I can update the naming and documentation accordingly. I wanted to check with you first before changing it.

On the compile-time side, my current suspicion is that the main cost comes from the general property machinery and associated metaprogramming, rather than primarily from the diagnostics or error checking. I have not ruled the diagnostics out completely, but so far the evidence points more toward the property framework itself being the dominant factor.

cross-property validation that is performed when properties are combined via
the full property-list machinery. For example, checks for conflicting
combinations such as `work_group_size` together with `max_work_group_size` are
available only through the umbrella header.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something seems wrong with the reasoning here. For a free-function kernel, the properties are never added to a property list. Instead, each property is individually specified via SYCL_EXT_ONEAPI_FUNCTION_PROPERTY. Therefore, I don't see how we can ever check for invalid combinations at compilation time. Maybe we could do that in the CFE or as an LLVMIR pass, but I don't see how we could do that via metaprogramming in the header.

I wonder if the metaprogramming cost you are observing is only relevant for kernel properties defined via get(properties_tag) of a named function object kernel (i.e. kernel defined as a class)? If this is the case, it seems like we could make free-function compilation faster by creating a header that contains only the property definitions for the "kernel properties" and leaving out any metaprogramming machinery that is used only when these properties are added to a property list.

Maybe this is what you are doing already, but I would not describe it as a lighter-weight header that skips validation. Instead, it's a header that is used only when defining a free-function kernel, where validation doesn't happen anyways.

I think we are on the same page about creating a header that defines only: single_task_kernel, nd_range_kernel, and SYCL_EXT_ONEAPI_FUNCTION_PROPERTY (and maybe the associated _key structures if it doesn't add significant overhead). These are the minimal things you need to define any free-function kernel.

I'm on the fence about creating a second header that defines the additional kernel properties like work_group_size. If we just define these properties without defining the metaprogramming that does the cross-property validation, does it really save much to split these off into a separate header?

Two other high level comments that are somewhat relevant:

  • The proposed khr_includes only proposes a set of header files for the core SYCL APIs. We never defined the headers associated with the various extensions. You are starting to do that now with this PR, so there's a lot of uncharted territory here.

  • I am in the process of proposing a KHR version of "compile time properties" here. In doing this, I'm planning to drop some of the heavy metaprogramming (like sorting the properties in the list). There is a POC implementation linked from that PR. It's possible that this KHR version of properties will compile significantly faster that the version we have now.

Copy link
Copy Markdown
Contributor Author

@koparasy koparasy Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pushing back on this. Pushing everything into a single /sycl/ext/oneapi/kernel_properties/function_properties.hpp and leaving the cross-property validation out slightly increases compilation time to 29ms (from 26ms in comparison to the previous commit) but keeps the separation clean 👍 . The time was spent on the inclusion (std::array). I avoided the implementation to use std::array except device_has.

device_has is conceptually part of the free-function property world, because it already has FunctionPropertyMetaInfo support and maps naturally to SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(...).

But it is also the one property whose current public API hard-bakes in std::array:

static constexpr std::array<aspect, sizeof...(Aspects)> value;

That makes it different from work_group_size and friends, where std::array was only an implementation detail. So in the current implementation I have the device_has property on the base header (and thus excluded from "function_property" header) that keeps compilation time low, but this introduces an inconsistency in the free-function property model, where one property cannot be included in the lightweight header due to its API dependency

Given this is under the experimental namespace, my inclination would be to adjust the public API of device_has to remove the std::array dependency and make it consistent with the other properties.
Does that align with the expected policy for experimental features?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this is better described as a free-function-specific header rather than “skipping validation”. I’ll update the PR description accordingly.

For the split itself, I can prioritize the minimal header (single_task_kernel, nd_range_kernel, SYCL_EXT_ONEAPI_FUNCTION_PROPERTY, ...) and keep the device_has property at the initial location.

Copy link
Copy Markdown
Contributor

@cperkinsintel cperkinsintel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

header / sycl stuff LGTM

Introduce lightweight kernel-property headers for free-function kernel
annotation and launch-tuning use cases:

- function_properties.hpp provides a standalone path for execution-kind
  annotations such as nd_range_kernel and single_task_kernel.
- function_launch_properties.hpp provides a lightweight path for launch
  properties such as work_group_size, work_group_size_hint, and
  sub_group_size.
- kernel_properties/properties.hpp remains the umbrella path for full
  property-list semantics and cross-property conflict checking.

This reduces compile-time cost for kernel-language and JIT-style usage
while preserving the full umbrella interface for users who need complete
property-list behavior.

Measured header compile times:
- function_properties.hpp: ~26 ms
- function_launch_properties path: ~58 ms
- kernel_properties/properties.hpp umbrella: ~68 ms
property headers instead of the nested kernel_properties/* paths.

Keep the legacy kernel_properties/* headers as forwarding compatibility
shims, add smoke tests for both legacy include paths, and mark the
shims for deletion once support for the old paths is dropped.
@koparasy koparasy force-pushed the compile-time/function-properties branch from ea6ac15 to faa95e1 Compare April 23, 2026 17:45
@koparasy koparasy requested review from a team as code owners April 23, 2026 17:45
@koparasy koparasy requested a review from adamfidel April 23, 2026 17:45
Copy link
Copy Markdown
Contributor

@sarnex sarnex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

esimd lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants