8384571: C2: Add some basic IGVN optimization for VectorBlendNode #31333
Open
erifan wants to merge 1 commit into
This PR introduces the basic Ideal/Identity transformations for `VectorBlendNode`.

The semantics of `VectorBlend(X, Y, M)` are: `M ? Y : X`.

**Identity**:
```
(VectorBlend X Y (Replicate -1)) => Y
(VectorBlend X Y (MaskAll -1))   => Y
(VectorBlend X Y (Replicate 0))  => X
(VectorBlend X Y (MaskAll 0))    => X
```

**Ideal**:
```
(VectorBlend (VectorBlend X A M) B M)    => (VectorBlend X B M)
(VectorBlend A (VectorBlend B X M) M)    => (VectorBlend A X M)
(VectorBlend A B (XorV/XorVMask M -1))   => (VectorBlend B A M)
```

Also corrects the VectorBlendNode header comment: across all backends (X86 SSE/AVX, AArch64 NEON/SVE, RISC-V V) the active mask lane selects `vec2` (in(2)), and the inactive lane selects `vec1` (in(1)).

JTReg and JMH tests are also added for each optimization pattern. All tests (tier1, tier2, and tier3) passed on AArch64 and X86 platforms.

JMH benchmark test results:

On an Nvidia Grace (Neoverse-V2) machine with 128-bit SVE2:
```
Benchmark              Unit    Before    Error   After      Error     Uplift
blendNegatedMaskInt    ops/ms  7990.6    2.8     10215.2    11.0      1.3
identityAllOnesInt     ops/ms  3574.8    2.6     7967.1     0.3       2.2
identityAllZerosLong   ops/ms  3575.6    1.0     7966.0     3.6       2.2
nestedBlendInnerLong   ops/ms  3533.8    2.8     478573.0   3178.5    135.4
nestedBlendOuterInt    ops/ms  3537.6    3.4     472242.2   3034.2    133.5
```

On an AWS Graviton3 (Neoverse-V1) machine with 256-bit SVE1:
```
Benchmark              Unit    Before    Error   After      Error     Uplift
blendNegatedMaskInt    ops/ms  5171.9    5.2     8129.0     17.3      1.6
identityAllOnesInt     ops/ms  2722.0    0.1     5891.3     0.1       2.2
identityAllZerosLong   ops/ms  2722.4    0.1     5891.1     0.3       2.2
nestedBlendInnerLong   ops/ms  2697.6    0.0     312148.7   2366.4    115.7
nestedBlendOuterInt    ops/ms  2702.7    0.1     308686.0   2709.8    114.2
```

On an Nvidia Grace (Neoverse-V2) machine with `-XX:UseSVE=0`:
```
Benchmark              Unit    Before    Error   After      Error     Uplift
blendNegatedMaskInt    ops/ms  7718.1    1.9     9515.9     54.0      1.2
identityAllOnesInt     ops/ms  3581.9    0.6     8062.5     0.5       2.3
identityAllZerosLong   ops/ms  3582.7    0.6     8058.5     11.9      2.2
nestedBlendInnerLong   ops/ms  3529.6    1.4     476029.8   5190.2    134.9
nestedBlendOuterInt    ops/ms  3536.9    2.1     486060.0   3442.1    137.4
```

On an AMD EPYC 9124 16-Core Processor with option `-XX:UseAVX=3`:
```
Benchmark              Unit    Before    Error   After      Error     Uplift
blendNegatedMaskInt    ops/ms  36773.6   541.7   46467.4    499.4     1.3
identityAllOnesInt     ops/ms  5262.7    3.7     13644.7    12.1      2.6
identityAllZerosLong   ops/ms  5272.4    3.4     13665.3    8.4       2.6
nestedBlendInnerLong   ops/ms  5256.6    4.9     436643.3   14778.8   83.1
nestedBlendOuterInt    ops/ms  5253.2    1.5     223851.3   106002.6  42.6
```

On an AMD EPYC 9124 16-Core Processor with option `-XX:UseAVX=2`:
```
Benchmark              Unit    Before    Error   After      Error     Uplift
blendNegatedMaskInt    ops/ms  24335.3   32.1    30412.3    28.1      1.2
identityAllOnesInt     ops/ms  5248.8    5.0     13677.5    18.4      2.6
identityAllZerosLong   ops/ms  5248.8    2.2     13655.8    2.9       2.6
nestedBlendInnerLong   ops/ms  5146.2    4.6     649242.6   1174.4    126.2
nestedBlendOuterInt    ops/ms  5141.8    6.2     646255.2   10654.1   125.7
```

The microbenchmarks show a significant speedup. This is mainly because this PR eliminates redundant computations inside the loop by hoisting them out of the loop. At the same time, it reduces the number of IR uses, which can in turn enable further optimizations.
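To make the rewrite rules easier to check by hand, here is a minimal scalar sketch of the stated semantics (`M ? Y : X` per lane). It is not HotSpot code: the class `BlendModel` and its methods are hypothetical names introduced only for illustration, masks are modeled as `boolean[]`, and mask negation stands in for `XorV/XorVMask M -1`.

```java
import java.util.Arrays;

// Hypothetical scalar model of VectorBlend for illustration; not part of the PR.
final class BlendModel {
    // Lane-wise M ? Y : X, as described in the PR text.
    static int[] blend(int[] x, int[] y, boolean[] m) {
        int[] r = new int[x.length];
        for (int i = 0; i < x.length; i++) {
            r[i] = m[i] ? y[i] : x[i];
        }
        return r;
    }

    // Lane-wise mask negation; models XOR of the mask with all-ones.
    static boolean[] not(boolean[] m) {
        boolean[] r = new boolean[m.length];
        for (int i = 0; i < m.length; i++) {
            r[i] = !m[i];
        }
        return r;
    }

    // Mask with every lane set to v; models an all-ones or all-zeros mask.
    static boolean[] splat(boolean v, int n) {
        boolean[] r = new boolean[n];
        Arrays.fill(r, v);
        return r;
    }

    // Checks each Identity and Ideal rule on the given inputs.
    static boolean rulesHold(int[] x, int[] a, int[] b, boolean[] m) {
        int n = x.length;
        boolean allOnes = Arrays.equals(blend(x, a, splat(true, n)), a);
        boolean allZeros = Arrays.equals(blend(x, a, splat(false, n)), x);
        // (VectorBlend (VectorBlend X A M) B M) => (VectorBlend X B M)
        boolean nestedInFirst = Arrays.equals(blend(blend(x, a, m), b, m), blend(x, b, m));
        // (VectorBlend A (VectorBlend B X M) M) => (VectorBlend A X M)
        boolean nestedInSecond = Arrays.equals(blend(a, blend(b, x, m), m), blend(a, x, m));
        // (VectorBlend A B (not M)) => (VectorBlend B A M)
        boolean negated = Arrays.equals(blend(a, b, not(m)), blend(b, a, m));
        return allOnes && allZeros && nestedInFirst && nestedInSecond && negated;
    }

    public static void main(String[] args) {
        int[] x = {1, 2, 3, 4};
        int[] a = {10, 20, 30, 40};
        int[] b = {-1, -2, -3, -4};
        boolean[] m = {true, false, true, false};
        System.out.println(rulesHold(x, a, b, m)); // prints: true
    }
}
```

The nested rules hold because both blends share the same mask `M`: in every lane, the outer blend either discards the inner result or the inner blend has already picked the operand the outer blend would pick.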
👋 Welcome back erfang! A progress list of the required criteria for merging this PR into
❗ This change is not yet ready to be integrated.
The total number of required reviews for this PR has been set to 2 based on the presence of this label:
Progress
Issue
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/31333/head:pull/31333
$ git checkout pull/31333
Update a local copy of the PR:
$ git checkout pull/31333
$ git pull https://git.openjdk.org/jdk.git pull/31333/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 31333
View PR using the GUI difftool:
$ git pr show -t 31333
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/31333.diff
Using Webrev
Link to Webrev Comment