feat[next-dace]: Use SDFG library node for lowering of broadcast and reduce by edopao · Pull Request #2386 · GridTools/gt4py

edopao · 2025-11-11T11:03:07Z

TODO:

Run ICON4Py CI, see ICON4Py PR#1240
Run Blueline (there is some degradation, but in general it works)
Run MuPhys (just to be sure)

philip-paul-mueller

There are some refinements needed.

philip-paul-mueller · 2025-11-11T12:21:46Z

+
+
+@dace_library.node
+class Fill(dace_nodes.LibraryNode):


I would add some more semantics, i.e. an input connector that collects the value to be broadcast and an output connector for the result.

I am also wondering if it would make sense to have two different library nodes:
one where the value that is broadcast is a literal, like 0.0, and one, which is probably the current one, where the value is read from another data descriptor (this might be hard to integrate into the lowering).
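As a rough illustration of the two variants, here is a sketch in plain Python. It does not use the DaCe API. The class names `FillFromLiteral` and `FillFromData` and the connector names `_inp` and `_outp` are hypothetical and only serve to show how the connector sets would differ.

```python
from dataclasses import dataclass


@dataclass
class FillFromLiteral:
    """Hypothetical node: the broadcast value is a literal such as 0.0."""

    value: float
    # A literal needs no input connector, only an output connector.
    in_connectors: frozenset = frozenset()
    out_connectors: frozenset = frozenset({"_outp"})


@dataclass
class FillFromData:
    """Hypothetical node: the broadcast value is read from another data descriptor."""

    # The value arrives through an input connector.
    in_connectors: frozenset = frozenset({"_inp"})
    out_connectors: frozenset = frozenset({"_outp"})


literal_node = FillFromLiteral(value=0.0)
data_node = FillFromData()
```

With this split, a transformation could tell the two cases apart by node type instead of inspecting the incoming edges.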

philip-paul-mueller · 2025-11-12T05:53:39Z

@edopao
I am not sure if we should add the transformations we need already in this PR or in a later one.
If we put it in a later one, we should patch the optimizer to expand the node right at the beginning; this way we preserve the current behaviour and performance.

edopao · 2025-11-24T10:23:52Z

cscs-ci run default

edopao · 2025-11-24T14:43:47Z

cscs-ci run default

edopao · 2025-11-24T15:07:23Z

cscs-ci run default

edopao · 2025-12-10T07:29:43Z

No plan for now to integrate this feature.

edopao

Very good, just some minor comments.

edopao · 2026-05-06T08:08:11Z

+    ```python
+    for i in range(len(broadcast_in_dim)):
+        assert output.shape[broadcast_in_dim[i]] == value_to_broadcast.shape[i]
+    ```


Suggested change

```

```

In other words, the result array shape has the same size as the broadcast domain.
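The shape constraint from the quoted docstring can be checked in isolation. The following is a minimal sketch in plain Python, assuming that `broadcast_in_dim[i]` names the output dimension that input dimension `i` maps to, as the quoted assertion suggests. The function name `check_broadcast_shapes` is hypothetical and not part of the PR.

```python
def check_broadcast_shapes(output_shape, value_shape, broadcast_in_dim):
    """Return True if every input dimension matches its mapped output dimension."""
    # Each input dimension needs exactly one target dimension in the output.
    if len(broadcast_in_dim) != len(value_shape):
        return False
    return all(
        output_shape[out_dim] == value_shape[i]
        for i, out_dim in enumerate(broadcast_in_dim)
    )


# A vector of length 80 mapped to dimension 1 of a (2000, 80) output.
ok = check_broadcast_shapes((2000, 80), (80,), (1,))
# Mapping the same vector to dimension 0 fails, because 2000 != 80.
bad = check_broadcast_shapes((2000, 80), (80,), (0,))
# A scalar has no dimensions, so there is nothing to check.
scalar_ok = check_broadcast_shapes((2000, 80), (), ())
```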

edopao · 2026-05-06T08:10:25Z

+
+    Todo:
+        - While for the output it is probably okay to always require an adjacent
+            AccessNode for the input it might be possible to be on the other side


Suggested change

AccessNode for the input it might be possible to be on the other side

AccessNode, the input nodes might be outside a map scope.

However, I don't understand how this could happen.

I think this comment dates back to when there was still the .value attribute.

edopao · 2026-05-06T12:21:00Z

+        # Check single use data if it was not known at the beginning.
+        if self._single_use_data is None:
+            find_single_use_data = dace_analysis.FindSingleUseData()
+            single_use_data = find_single_use_data.apply_pass(sdfg, None)


Would it be wrong to now store single_use_data? I am asking because it is used again inside apply().

No, we should not store it, because there is no guarantee that nothing changes between now and later that makes the data no longer single use.
By passing it to the constructor, the user effectively guarantees that it is safe.
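The distinction can be sketched in plain Python without DaCe. The class `BroadcastTransformation`, its helper `_get_single_use_data`, and the `analysis` callable are hypothetical stand-ins for the real transformation and for `FindSingleUseData`; only the caching policy is the point.

```python
from typing import Callable, Optional, Set


class BroadcastTransformation:
    """Hypothetical stand-in for the transformation under review."""

    def __init__(self, single_use_data: Optional[Set[str]] = None):
        # A set passed by the caller is trusted: the caller vouches for it.
        self._single_use_data = single_use_data

    def _get_single_use_data(self, analysis: Callable[[], Set[str]]) -> Set[str]:
        if self._single_use_data is not None:
            return self._single_use_data
        # Recompute on every call and deliberately do not cache on `self`,
        # because the graph may change between two calls.
        return analysis()


# Mutable stand-in for the state of the SDFG.
graph_state = {"a", "b"}


def analysis() -> Set[str]:
    return set(graph_state)


fresh = BroadcastTransformation()
first = fresh._get_single_use_data(analysis)
graph_state.discard("b")  # a later change makes "b" no longer single use
second = fresh._get_single_use_data(analysis)

pinned = BroadcastTransformation(single_use_data={"a", "b"})
third = pinned._get_single_use_data(analysis)
```

Here `second` reflects the change to the graph, while `third` keeps the set the caller supplied. That is why the responsibility for a stale set lies with the caller.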

edopao · 2026-05-06T12:29:43Z

+        #   probably yes, as we can remove the read and write of the initial data
+        #   only the write to final destination is left. If the consumers are Maps
+        #   the thing is a bit different. As we have to create the intermediate
+        #   allocation. If the read of the memory is okay the `InlineBroadcastAccess`


InlineBroadcastAccess does not exist yet.

I think that was the old name.
Then I renamed it to Scalar..., because I thought that I only needed to handle scalars, which was wrong.
So there is a todo to rename it; I think I will now switch back to the old name.

I assume that some (horizontal) map fusion transformation runs amok. Probably there needs to be a better place. Before, I was thinking that the issue was with the fusion that the splitter performs. But now I think it happens earlier, i.e. Parallel Mapfusion has already killed them. Maybe it is a good idea to split the transformation, i.e. to make map inlining distinct from AccessNode replacement.

philip-paul-mueller · 2026-05-11T11:34:49Z

Here are the newest data from 8edefc0, which show that the speed is now comparable (or at least much better than before).
Based on my experience with compute_advection_in_vertical, the 3% degradation we observe is most likely caused by some broadcast loops that were (for whatever reason) not integrated into other kernels.

philip-paul-mueller · 2026-05-26T06:35:27Z

My current guess of why they are not integrated is because of their range.
In this PR the ranges are based on the size of the target location, i.e. the range is [0, size_of_patch_to_broadcast), but before the range was based on the grid coordinates that were written, i.e. the range was [136, 2000).
Furthermore, the Map splitter currently only considers the range and not the sizes.

I think the current design (with size) is better, because it makes handling the nodes much simpler and they contain less state (ideally they would be stateless, but this is not possible).
Thus, we should update the splitting tools to consider the size of the iteration spaces instead of their range.
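The difference between comparing ranges and comparing sizes can be shown with a small sketch in plain Python. The helper names `range_size`, `same_range` and `same_size` are hypothetical. The bounds 136 and 2000 come from the comment above, and the sketch assumes the broadcast patch covers exactly that span.

```python
from typing import Tuple

Range = Tuple[int, int]  # half-open interval [start, stop)


def range_size(r: Range) -> int:
    """Number of iterations in a half-open range."""
    return r[1] - r[0]


def same_range(a: Range, b: Range) -> bool:
    """Comparison by bounds, which is what the comment attributes to the splitter."""
    return a == b


def same_size(a: Range, b: Range) -> bool:
    """Comparison by iteration count, which is what the comment proposes."""
    return range_size(a) == range_size(b)


old_style = (136, 2000)                 # based on the grid coordinates written
new_style = (0, range_size(old_style))  # based on the size of the patch

ranges_match = same_range(old_style, new_style)
sizes_match = same_size(old_style, new_style)
```

The two maps iterate over the same number of points, but a check on the bounds alone treats them as different, which would be consistent with the broadcast loops not being merged.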

edopao · 2026-05-26T15:08:39Z

I have moved the library node for reduction with skip values to a separate PR, #2603.

edopao added 2 commits November 11, 2025 12:01

edit

f8180e2

edit

ef9ef92

edopao force-pushed the dace-fill_node branch from f94a037 to ef9ef92 Compare November 11, 2025 12:12

edopao requested a review from philip-paul-mueller November 11, 2025 12:15

undo extra change

bd1b766

philip-paul-mueller reviewed Nov 11, 2025

use library node also in concat_where

25abd36

edopao force-pushed the dace-fill_node branch from bc04cfc to 25abd36 Compare November 11, 2025 12:42

havogt reviewed Nov 11, 2025

Comment thread src/gt4py/next/program_processors/runners/dace/sdfg_library_nodes.py Outdated

edopao added 4 commits November 12, 2025 22:46

edit

071f512

Merge remote-tracking branch 'upstream/main' into dace-fill_node

331bcd3

edit

b90976b

fix for inf expressions

21a79d2

edopao commented Nov 13, 2025

Comment thread src/gt4py/next/program_processors/runners/dace/gtir_to_sdfg_primitives.py Outdated

edopao added 8 commits November 13, 2025 15:14

edit

a1f6f1a

edit

6e97232

Merge branch 'dace-refactor_concat_where' into dace-fill_node

08484df

edit

da12bdb

Merge branch 'dace-refactor_concat_where' into dace-fill_node

5382dce

Merge branch 'main' into dace-fill_node

b04586c

Merge branch 'main' into dace-fill_node

4647c7d

remove special handling for inf symbol

779b164

fix rebase

f250040

fix rebase

68a417f

Merge branch 'main' into dace-fill_node

9d774e4

philip-paul-mueller added 3 commits May 5, 2026 13:11

Implemented a missing case.

1e2089b

Forgot to update them.

8d3a89e

git Fixed a wrong check.

3dcf818

philip-paul-mueller mentioned this pull request May 6, 2026

DO NOT MERGE: Check Broadcast Node C2SM/icon4py#1240

Draft

edopao commented May 6, 2026

philip-paul-mueller added 19 commits May 7, 2026 11:09

Updated the transformation to also handle vectors.

11b8651

No unit test yet and also it has the wrong name.

Moved the ScalarBroadcastInliner (which still has the wrong name) i…

2cd3a56

…nto simplify. Also expanding is now happening at the very end.

Merge remote-tracking branch 'gt4py/main' into dace-fill_node

28e8808

Updated the reconfigure_dataflow_after_rerouting() such that it can…

7071b62

… partially handle additional domain.

Added a note.

fe58d9a

Relocated the expansion of broadcast nodes.

8972c02

They are now conditionally expanded in the auto optimizer if splitting is enabled and no broadcast related transformation has been applied.

Added a transformation that can remove chains of broadcast operations.

03ab21e

Updated the broadcast chain transformation to be more liberal.

e63a4f7

Integrated the broadcast chain remover into simplify.

790c953

Applied Edoardo's suggestions.

4debf5f

Removed name duplication when the broadcast nodes get duplicated in a…

e08f9ea

… transformation.

Updated the broadcast library node

4e445ee

- Added a custom `free_symbols()` function to the library node to ensure that the symbols in `params` are not found.
- Added validation to ensure that the name and label attributes of the node are always in sync.

Updated how expansion is working, should be more uniform in this way.

d38dadd

Fixed how the reconnection is done.

a4af6e8

Splitting tools are now able to handle the broadcast node.

bd52557

I have seen a lot of strange things in DaCe related to other_subset …

555ace1

…but this sets a new level.

Simplify runs the broadcast transformations again.

fa6e301

Updated the top level map optimizer.

8edefc0

Now the expansion of the broadcast node takes place after the first fix point was found. I think this is the best idea, but I am not fully sure.

This was too much validation.

42b3e95

Merge branch 'main' into dace-fill_node

e9bc148
