8384571: C2: Add some basic IGVN optimization for VectorBlendNode by erifan · Pull Request #31333 · openjdk/jdk

erifan · 2026-06-01T02:04:13Z

This PR introduces the basic Ideal/Identity transformations for VectorBlendNode.

The semantics of VectorBlend(X, Y, M) are, per lane: M ? Y : X.

Identity:

  (VectorBlend X Y (Replicate -1)) => Y
  (VectorBlend X Y (MaskAll   -1)) => Y
  (VectorBlend X Y (Replicate  0)) => X
  (VectorBlend X Y (MaskAll    0)) => X
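The identities above follow directly from the lane-wise semantics. A minimal scalar sketch, assuming a mask can be modelled as a boolean per lane; the class `BlendIdentityModel` and its helpers are hypothetical illustrations, not C2 or Vector API code:

```java
import java.util.Arrays;

public class BlendIdentityModel {
    // Scalar model of VectorBlend(X, Y, M): lane i is M[i] ? Y[i] : X[i].
    public static int[] blend(int[] x, int[] y, boolean[] m) {
        int[] r = new int[x.length];
        for (int i = 0; i < x.length; i++) {
            r[i] = m[i] ? y[i] : x[i];
        }
        return r;
    }

    // Hypothetical stand-in for an all-ones (true) or all-zeros (false) mask.
    public static boolean[] maskAll(boolean value, int lanes) {
        boolean[] m = new boolean[lanes];
        Arrays.fill(m, value);
        return m;
    }

    public static void main(String[] args) {
        int[] x = {1, 2, 3, 4};
        int[] y = {10, 20, 30, 40};
        // All-ones mask: every lane selects Y, so the blend is redundant.
        System.out.println(Arrays.equals(blend(x, y, maskAll(true, 4)), y));
        // All-zeros mask: every lane selects X.
        System.out.println(Arrays.equals(blend(x, y, maskAll(false, 4)), x));
    }
}
```

Both checks print `true`, which is why the node can be replaced by one of its inputs in these cases.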

Ideal:

  (VectorBlend (VectorBlend X A M) B M)  => (VectorBlend X B M)
  (VectorBlend A (VectorBlend B X M) M)  => (VectorBlend A X M)
  (VectorBlend A B (XorV/XorVMask M -1)) => (VectorBlend B A M)
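The three Ideal rewrites can be checked against the same lane-wise semantics. The sketch below is a scalar model only, assuming a boolean-per-lane mask and modelling `XorV/XorVMask M -1` as lane-wise negation; `BlendIdealModel` and its methods are hypothetical names used for illustration:

```java
import java.util.Arrays;

public class BlendIdealModel {
    // Scalar model of VectorBlend(X, Y, M): lane i is M[i] ? Y[i] : X[i].
    public static int[] blend(int[] x, int[] y, boolean[] m) {
        int[] r = new int[x.length];
        for (int i = 0; i < x.length; i++) {
            r[i] = m[i] ? y[i] : x[i];
        }
        return r;
    }

    // Models (XorV/XorVMask M -1): flips every mask lane.
    public static boolean[] not(boolean[] m) {
        boolean[] r = new boolean[m.length];
        for (int i = 0; i < m.length; i++) {
            r[i] = !m[i];
        }
        return r;
    }

    // Checks all three rewrites for every possible 4-lane mask.
    public static boolean rulesHold() {
        int[] x = {1, 2, 3, 4};
        int[] a = {10, 20, 30, 40};
        int[] b = {100, 200, 300, 400};
        for (int bits = 0; bits < 16; bits++) {
            boolean[] m = new boolean[4];
            for (int i = 0; i < 4; i++) {
                m[i] = ((bits >> i) & 1) != 0;
            }
            // (VectorBlend (VectorBlend X A M) B M) => (VectorBlend X B M)
            boolean r1 = Arrays.equals(blend(blend(x, a, m), b, m), blend(x, b, m));
            // (VectorBlend A (VectorBlend B X M) M) => (VectorBlend A X M)
            boolean r2 = Arrays.equals(blend(a, blend(b, x, m), m), blend(a, x, m));
            // (VectorBlend A B (not M)) => (VectorBlend B A M)
            boolean r3 = Arrays.equals(blend(a, b, not(m)), blend(b, a, m));
            if (!(r1 && r2 && r3)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(rulesHold());
    }
}
```

In the first two rules the inner blend's other operand is never observable through the outer blend when both share the same mask, so the inner node drops out; in the third, swapping the data inputs absorbs the mask negation.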

Also corrects the VectorBlendNode header comment: across all backends (X86 SSE/AVX, AArch64 NEON/SVE, RISC-V V) the active mask lane selects vec2 (in(2)), and the inactive lane selects vec1 (in(1)).

JTReg and JMH tests are also added for each optimization pattern. All tests (tier1, tier2, and tier3) passed on AArch64 and X86 platforms.

JMH benchmark test results:

On an NVIDIA Grace (Neoverse-V2) machine with 128-bit SVE2:

Benchmark             Unit    Before   Error  After     Error   Uplift
blendNegatedMaskInt   ops/ms  7990.6   2.8    10215.2   11.0    1.3
identityAllOnesInt    ops/ms  3574.8   2.6    7967.1    0.3     2.2
identityAllZerosLong  ops/ms  3575.6   1.0    7966.0    3.6     2.2
nestedBlendInnerLong  ops/ms  3533.8   2.8    478573.0  3178.5  135.4
nestedBlendOuterInt   ops/ms  3537.6   3.4    472242.2  3034.2  133.5

On an AWS Graviton3 (Neoverse-V1) machine with 256-bit SVE1:

Benchmark             Unit    Before   Error  After     Error   Uplift
blendNegatedMaskInt   ops/ms  5171.9   5.2    8129.0    17.3    1.6
identityAllOnesInt    ops/ms  2722.0   0.1    5891.3    0.1     2.2
identityAllZerosLong  ops/ms  2722.4   0.1    5891.1    0.3     2.2
nestedBlendInnerLong  ops/ms  2697.6   0.0    312148.7  2366.4  115.7
nestedBlendOuterInt   ops/ms  2702.7   0.1    308686.0  2709.8  114.2

On an NVIDIA Grace (Neoverse-V2) machine with -XX:UseSVE=0:

Benchmark             Unit    Before   Error  After     Error   Uplift
blendNegatedMaskInt   ops/ms  7718.1   1.9    9515.9    54.0    1.2
identityAllOnesInt    ops/ms  3581.9   0.6    8062.5    0.5     2.3
identityAllZerosLong  ops/ms  3582.7   0.6    8058.5    11.9    2.2
nestedBlendInnerLong  ops/ms  3529.6   1.4    476029.8  5190.2  134.9
nestedBlendOuterInt   ops/ms  3536.9   2.1    486060.0  3442.1  137.4

On an AMD EPYC 9124 16-Core Processor with option -XX:UseAVX=3:

Benchmark             Unit    Before   Error  After     Error     Uplift
blendNegatedMaskInt   ops/ms  36773.6  541.7  46467.4   499.4     1.3
identityAllOnesInt    ops/ms  5262.7   3.7    13644.7   12.1      2.6
identityAllZerosLong  ops/ms  5272.4   3.4    13665.3   8.4       2.6
nestedBlendInnerLong  ops/ms  5256.6   4.9    436643.3  14778.8   83.1
nestedBlendOuterInt   ops/ms  5253.2   1.5    223851.3  106002.6  42.6

On an AMD EPYC 9124 16-Core Processor with option -XX:UseAVX=2:

Benchmark             Unit    Before   Error  After     Error    Uplift
blendNegatedMaskInt   ops/ms  24335.3  32.1   30412.3   28.1     1.2
identityAllOnesInt    ops/ms  5248.8   5.0    13677.5   18.4     2.6
identityAllZerosLong  ops/ms  5248.8   2.2    13655.8   2.9      2.6
nestedBlendInnerLong  ops/ms  5146.2   4.6    649242.6  1174.4   126.2
nestedBlendOuterInt   ops/ms  5141.8   6.2    646255.2  10654.1  125.7
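The Uplift column is not defined in the description, but its values are consistent with After divided by Before, rounded to one decimal place. A small sketch of that arithmetic under this assumption; the class and method names are hypothetical:

```java
import java.util.Locale;

public class Uplift {
    // Assumed definition: throughput after the change divided by throughput
    // before it, formatted with one decimal place.
    public static String uplift(double before, double after) {
        return String.format(Locale.ROOT, "%.1f", after / before);
    }

    public static void main(String[] args) {
        // blendNegatedMaskInt on Grace with 128-bit SVE2
        System.out.println(uplift(7990.6, 10215.2));
        // nestedBlendInnerLong on Grace with 128-bit SVE2
        System.out.println(uplift(3533.8, 478573.0));
    }
}
```

With the figures from the first table this reproduces the reported 1.3 and 135.4.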

The microbenchmarks show a significant speedup on every tested configuration, mainly because this PR removes redundant computations from inside the loop by hoisting them out of it. It also reduces the number of IR uses, which can in turn enable further optimizations.

I confirm that I make this contribution in accordance with the OpenJDK Interim AI Policy.

Progress

Change must not contain extraneous whitespace
Commit message must refer to an issue
Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

JDK-8384571: C2: Add some basic IGVN optimization for VectorBlendNode (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/31333/head:pull/31333
$ git checkout pull/31333

Update a local copy of the PR:
$ git checkout pull/31333
$ git pull https://git.openjdk.org/jdk.git pull/31333/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 31333

View PR using the GUI difftool:
$ git pr show -t 31333

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/31333.diff

Using Webrev

Link to Webrev Comment


bridgekeeper · 2026-06-01T02:06:06Z

👋 Welcome back erfang! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2026-06-01T02:06:14Z

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

openjdk · 2026-06-01T02:07:20Z

@erifan The following labels will be automatically applied to this pull request:

core-libs
hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

openjdk · 2026-06-01T02:07:22Z

The total number of required reviews for this PR has been set to 2 based on the presence of this label: hotspot-compiler. This can be overridden with the /reviewers command.

mlbridge · 2026-06-01T02:12:14Z

Webrevs

00: Full (f8bedad7)

openjdk Bot added the hotspot-compiler (hotspot-compiler-dev@openjdk.org) and core-libs (core-libs-dev@openjdk.org) labels Jun 1, 2026

openjdk Bot added the rfr (Pull request is ready for review) label Jun 1, 2026

erifan wants to merge 1 commit into openjdk:master from erifan:JDK-8384571-vector-blend-opt-pr1
