Update CI to Run Benchmark Comparisons and Persist Perf Data #873

Merged: KetanReddy merged 6 commits into main from maint/improved-benchmarks on Jun 4, 2026

@KetanReddy

Member
  • On PR branches, benchmark runs are compared against the data from main's last run. Comparisons are posted to GitHub via an automated comment.
  • On main, benchmarks are run and the results are committed to the repo.

Change Type (required)

Indicate the type of change your pull request is:

  • patch
  • minor
  • major
  • N/A

Does your PR have any documentation updates?

  • Updated docs
  • No Update needed
  • Unable to update docs

@KetanReddy added the skip-release label (Preserve the current version when merged) on Jun 1, 2026
@codecov

codecov Bot commented Jun 1, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 0.00%. Comparing base (55defef) to head (b4e3730).

@intuit-svc

intuit-svc commented Jun 1, 2026

Contributor

Benchmark Results

Comparison against baseline from main. ⚠️ = regression (>10% slower), ✅ = improvement (>5% faster)

core/player ⚠️

| Benchmark | Current | Baseline | Change |
| --- | --- | --- | --- |
| core/player/src/binding/__tests__/parser.bench.ts > parser benchmarks > Resolving binding: foo.bar | 777.93K ops/s | 2.37M ops/s | -67.1% ⚠️ |
| core/player/src/binding/__tests__/parser.bench.ts > parser benchmarks > Resolving binding: foo.pets.1.name | 578.27K ops/s | 1.41M ops/s | -58.9% ⚠️ |
| core/player/src/binding/__tests__/parser.bench.ts > parser benchmarks > Resolving binding: foo.pets.01.name | 490.11K ops/s | 1.38M ops/s | -64.6% ⚠️ |
| core/player/src/binding/__tests__/parser.bench.ts > parser benchmarks > Resolving binding: foo.pets['01'].name | 434.23K ops/s | 1.28M ops/s | -66.2% ⚠️ |
| core/player/src/binding/__tests__/parser.bench.ts > parser benchmarks > Resolving binding: foo.pets[01].name | 532.18K ops/s | 1.35M ops/s | -60.7% ⚠️ |
| core/player/src/binding/__tests__/parser.bench.ts > parser benchmarks > Resolving binding: foo.pets[name = "frodo"].type | 298.15K ops/s | 821.60K ops/s | -63.7% ⚠️ |
| core/player/src/binding/__tests__/parser.bench.ts > parser benchmarks > Resolving binding: foo.pets["name" = "sprinkles"].type | 230.80K ops/s | 632.37K ops/s | -63.5% ⚠️ |
| core/player/src/binding/__tests__/parser.bench.ts > parser benchmarks > Resolving binding: foo.pets["isDog" = false].type | 274.45K ops/s | 766.83K ops/s | -64.2% ⚠️ |
| core/player/src/binding/__tests__/parser.bench.ts > parser benchmarks > Resolving binding: foo.pets["isDog" = true].type | 271.09K ops/s | 831.64K ops/s | -67.4% ⚠️ |
| core/player/src/binding/__tests__/parser.bench.ts > binding creation benchmarks > Resolving binding: foo.bar | 606.16K ops/s | 1.78M ops/s | -66.0% ⚠️ |
| core/player/src/binding/__tests__/parser.bench.ts > binding creation benchmarks > Resolving binding: foo.pets.1.name | 374.73K ops/s | 1.13M ops/s | -66.7% ⚠️ |
| core/player/src/binding/__tests__/parser.bench.ts > binding creation benchmarks > Resolving binding: foo.pets.01.name | 378.55K ops/s | 1.05M ops/s | -63.9% ⚠️ |
| core/player/src/binding/__tests__/parser.bench.ts > binding creation benchmarks > Resolving binding: foo.pets['01'].name | 332.05K ops/s | 956.96K ops/s | -65.3% ⚠️ |
| core/player/src/binding/__tests__/parser.bench.ts > binding creation benchmarks > Resolving binding: foo.pets[01].name | 365.35K ops/s | 1.03M ops/s | -64.5% ⚠️ |
| core/player/src/binding/__tests__/parser.bench.ts > binding creation benchmarks > Resolving binding: foo.pets[name = "frodo"].type | 208.69K ops/s | 679.72K ops/s | -69.3% ⚠️ |
| core/player/src/binding/__tests__/parser.bench.ts > binding creation benchmarks > Resolving binding: foo.pets["name" = "sprinkles"].type | 190.23K ops/s | 533.29K ops/s | -64.3% ⚠️ |
| core/player/src/binding/__tests__/parser.bench.ts > binding creation benchmarks > Resolving binding: foo.pets["isDog" = false].type | 227.49K ops/s | 647.06K ops/s | -64.8% ⚠️ |
| core/player/src/binding/__tests__/parser.bench.ts > binding creation benchmarks > Resolving binding: foo.pets["isDog" = true].type | 236.67K ops/s | 678.94K ops/s | -65.1% ⚠️ |
| core/player/src/expressions/__tests__/performance.bench.ts > Expression Parsing/Execution Benchmark > Parsing: {{foo}} = 1 + 3 (sync) | 397.78K ops/s | 1.36M ops/s | -70.8% ⚠️ |
| core/player/src/expressions/__tests__/performance.bench.ts > Expression Parsing/Execution Benchmark > Parsing: {{foo}} = 1 + 3 (async) | 311.97K ops/s | 1.10M ops/s | -71.6% ⚠️ |
| core/player/src/expressions/__tests__/performance.bench.ts > Expression Parsing/Execution Benchmark > Parsing: conditional(true, true, false) (sync) | 462.88K ops/s | 1.30M ops/s | -64.3% ⚠️ |
| core/player/src/expressions/__tests__/performance.bench.ts > Expression Parsing/Execution Benchmark > Parsing: conditional(true, true, false) (async) | 399.12K ops/s | 1.11M ops/s | -64.1% ⚠️ |
| core/player/src/expressions/__tests__/performance.bench.ts > Expression Parsing/Execution Benchmark > Parsing: {{foo}} = conditional({{bar}} > 0, true, false) (sync) | 196.57K ops/s | 598.36K ops/s | -67.1% ⚠️ |
| core/player/src/expressions/__tests__/performance.bench.ts > Expression Parsing/Execution Benchmark > Parsing: {{foo}} = conditional({{bar}} > 0, true, false) (async) | 178.75K ops/s | 537.26K ops/s | -66.7% ⚠️ |
| core/player/src/expressions/__tests__/performance.bench.ts > Expression Parsing/Execution Benchmark > Parsing: {{foo}} = conditional(conditional(true = false, false, true), conditional(false = false, true, false), conditional(true = true, false, true)) (sync) | 112.07K ops/s | 441.36K ops/s | -74.6% ⚠️ |
| core/player/src/expressions/__tests__/performance.bench.ts > Expression Parsing/Execution Benchmark > Parsing: {{foo}} = conditional(conditional(true = false, false, true), conditional(false = false, true, false), conditional(true = true, false, true)) (async) | 145.63K ops/s | 398.01K ops/s | -63.4% ⚠️ |
| core/player/src/expressions/__tests__/performance.bench.ts > Expression Parsing/Execution Benchmark > Parsing: {{foo}} = await(asyncTestFunction(1)) (sync) | N/A | N/A | N/A |
| core/player/src/expressions/__tests__/performance.bench.ts > Expression Parsing/Execution Benchmark > Parsing: {{foo}} = await(asyncTestFunction(1)) (async) | 261.82K ops/s | 811.98K ops/s | -67.8% ⚠️ |
| core/player/src/expressions/__tests__/performance.bench.ts > Expression Parsing/Execution Benchmark > Parsing: {{foo}} = asyncTestFunction(1) (sync) | 316.46K ops/s | 1.02M ops/s | -68.8% ⚠️ |
| core/player/src/expressions/__tests__/performance.bench.ts > Expression Parsing/Execution Benchmark > Parsing: {{foo}} = asyncTestFunction(1) (async) | 290.14K ops/s | 856.32K ops/s | -66.1% ⚠️ |
| core/player/src/expressions/__tests__/performance.bench.ts > Expression Parsing/Execution Benchmark > Parsing: asyncTestFunction(1) (sync) | 797.18K ops/s | 2.28M ops/s | -65.1% ⚠️ |
| core/player/src/expressions/__tests__/performance.bench.ts > Expression Parsing/Execution Benchmark > Parsing: asyncTestFunction(1) (async) | 491.39K ops/s | 1.86M ops/s | -73.6% ⚠️ |
| core/player/src/expressions/__tests__/performance.bench.ts > Expression Parsing/Execution Benchmark > Parsing: {{foo}} = conditional(!{{bar}} == false, await(asyncTestFunction(1)), false) (sync) | 172.56K ops/s | 510.49K ops/s | -66.2% ⚠️ |
| core/player/src/expressions/__tests__/performance.bench.ts > Expression Parsing/Execution Benchmark > Parsing: {{foo}} = conditional(!{{bar}} == false, await(asyncTestFunction(1)), false) (async) | 146.75K ops/s | 460.99K ops/s | -68.2% ⚠️ |
| core/player/src/view/resolver/__tests__/index.bench.ts > resolver benchmarks > initial resolve | 644.98 ops/s | 1.62K ops/s | -60.3% ⚠️ |
| core/player/src/view/resolver/__tests__/index.bench.ts > resolver benchmarks > Resolving from cache | 19.66K ops/s | 48.38K ops/s | -59.4% ⚠️ |
| core/player/src/view/resolver/__tests__/index.bench.ts > resolver benchmarks > data changes | 2.92K ops/s | 6.98K ops/s | -58.1% ⚠️ |
| core/player/src/view/resolver/__tests__/index.bench.ts > resolver benchmarks > data changes slow | 651.72 ops/s | 1.57K ops/s | -58.4% ⚠️ |

plugins/async-node/core ⚠️

| Benchmark | Current | Baseline | Change |
| --- | --- | --- | --- |
| plugins/async-node/core/src/__tests__/index.bench.ts > async node benchmarks > Resolve Async Node 1 times | 15.12K ops/s | 61.54K ops/s | -75.4% ⚠️ |
| plugins/async-node/core/src/__tests__/index.bench.ts > async node benchmarks > Resolve Async Node 5 times | 14.66K ops/s | 45.63K ops/s | -67.9% ⚠️ |
| plugins/async-node/core/src/__tests__/index.bench.ts > async node benchmarks > Resolve Async Node 10 times | 11.64K ops/s | 35.74K ops/s | -67.4% ⚠️ |
| plugins/async-node/core/src/__tests__/index.bench.ts > async node benchmarks > Resolve Async Node 50 times | 3.54K ops/s | 12.54K ops/s | -71.7% ⚠️ |
| plugins/async-node/core/src/__tests__/index.bench.ts > async node benchmarks > Resolve Async Node 100 times | 1.92K ops/s | 6.05K ops/s | -68.3% ⚠️ |
| plugins/async-node/core/src/__tests__/transform.bench.ts > async transform benchmarks > Resolve Async Node 1 times | 7.87K ops/s | 30.34K ops/s | -74.1% ⚠️ |
| plugins/async-node/core/src/__tests__/transform.bench.ts > async transform benchmarks > Resolve Async Node 5 times | 8.22K ops/s | 26.47K ops/s | -68.9% ⚠️ |
| plugins/async-node/core/src/__tests__/transform.bench.ts > async transform benchmarks > Resolve Async Node 10 times | 7.75K ops/s | 19.35K ops/s | -60.0% ⚠️ |
| plugins/async-node/core/src/__tests__/transform.bench.ts > async transform benchmarks > Resolve Async Node 50 times | 2.89K ops/s | 8.58K ops/s | -66.2% ⚠️ |
| plugins/async-node/core/src/__tests__/transform.bench.ts > async transform benchmarks > Resolve Async Node 100 times | 1.78K ops/s | 4.85K ops/s | -63.3% ⚠️ |

react/player ⚠️

| Benchmark | Current | Baseline | Change |
| --- | --- | --- | --- |
| react/player/src/asset/__tests__/index.bench.tsx > ReactAsset benchmarks > Render asset nested in 1 ReactAssets | 611.19 ops/s | 728.40 ops/s | -16.1% ⚠️ |
| react/player/src/asset/__tests__/index.bench.tsx > ReactAsset benchmarks > Bubble errors nested in 1 ReactAssets | 1.08K ops/s | 5.24K ops/s | -79.4% ⚠️ |
| react/player/src/asset/__tests__/index.bench.tsx > ReactAsset benchmarks > Render asset nested in 5 ReactAssets | 623.83 ops/s | 766.87 ops/s | -18.7% ⚠️ |
| react/player/src/asset/__tests__/index.bench.tsx > ReactAsset benchmarks > Bubble errors nested in 5 ReactAssets | 1.09K ops/s | 3.90K ops/s | -72.1% ⚠️ |
| react/player/src/asset/__tests__/index.bench.tsx > ReactAsset benchmarks > Render asset nested in 10 ReactAssets | 627.48 ops/s | 759.73 ops/s | -17.4% ⚠️ |
| react/player/src/asset/__tests__/index.bench.tsx > ReactAsset benchmarks > Bubble errors nested in 10 ReactAssets | 924.98 ops/s | 2.49K ops/s | -62.8% ⚠️ |
| react/player/src/asset/__tests__/index.bench.tsx > ReactAsset benchmarks > Render asset nested in 50 ReactAssets | 462.67 ops/s | 658.59 ops/s | -29.7% ⚠️ |
| react/player/src/asset/__tests__/index.bench.tsx > ReactAsset benchmarks > Bubble errors nested in 50 ReactAssets | 245.66 ops/s | 631.28 ops/s | -61.1% ⚠️ |
| react/player/src/asset/__tests__/index.bench.tsx > ReactAsset benchmarks > Render asset nested in 100 ReactAssets | 405.06 ops/s | 548.45 ops/s | -26.1% ⚠️ |
| react/player/src/asset/__tests__/index.bench.tsx > ReactAsset benchmarks > Bubble errors nested in 100 ReactAssets | 125.38 ops/s | 277.93 ops/s | -54.9% ⚠️ |
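
The flags in these results follow the thresholds stated at the top of the bot comment: more than 10% slower counts as a regression, more than 5% faster as an improvement. A rough sketch of that rule is below. It is not the comparison script from this PR: the function name `classify_change` is hypothetical, and it assumes the change is measured as relative difference in ops/s, with both values already expressed as plain numbers.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not the PR's comparison script.
# Usage: classify_change CURRENT_OPS BASELINE_OPS
# Prints the percent change in throughput, plus a label when it crosses
# the thresholds quoted in the bot comment (assumed to apply to ops/s).
classify_change() {
  awk -v cur="$1" -v base="$2" 'BEGIN {
    # Non-numeric input (for example "N/A") or a zero baseline: no comparison.
    if (cur !~ /^[0-9.]+$/ || base !~ /^[0-9.]+$/ || base + 0 == 0) {
      print "N/A"
      exit
    }
    pct = (cur - base) / base * 100
    label = ""
    if (pct < -10) label = " regression"
    else if (pct > 5) label = " improvement"
    printf "%+.1f%%%s\n", pct, label
  }'
}

# First react/player row from the results: 611.19 ops/s against 728.40 ops/s.
classify_change 611.19 728.40   # prints "-16.1% regression"
```

Rows reported as N/A fall through to the first branch, so they are neither flagged nor counted as a change.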

@KetanReddy marked this pull request as ready for review on June 1, 2026 22:10
@KetanReddy requested review from a team as code owners on June 1, 2026 22:10
@KetanReddy

Member Author

For the record, the regressions show up because the initial baseline benchmarks were run on my local machine, which is 4-5x bigger than the benchmark executor. Once this is merged into main and the new baselines are established, there should be much less variance between runs.

@KetanReddy requested a review from a team as a code owner on June 3, 2026 01:56

@sugarmanz left a comment

Member

Thoughts on tying in other platforms' perf tests? Should this pattern be used for all?

I know we've been somewhat against perf tests as PR checks given it's difficult to guarantee executor consistency across runs. Do we think that'll be an issue? What's the expectation if I open a PR that shows massive negative impact?

Comment thread .circleci/config.yml
Comment on lines +387 to +389
find . -path '*/benchmarks/current.json' -not -path './bazel-*' | while read f; do
cp "$f" "$(dirname $f)/baseline.json"
done
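
A slightly more defensive variant of the same loop, offered only as a sketch and not as what the PR merged. It uses `-print0` with `read -d ''` so that paths containing spaces survive, and it quotes the argument to `dirname`. The wrapper name `promote_baselines` and its optional root argument are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not from the PR): copy every benchmarks/current.json
# under ROOT to a sibling baseline.json, skipping Bazel output directories.
promote_baselines() {
  local root="${1:-.}"
  find "$root" -path '*/benchmarks/current.json' -not -path "$root/bazel-*" -print0 |
    while IFS= read -r -d '' f; do
      # Quote the path so directories with spaces are handled correctly.
      cp "$f" "$(dirname "$f")/baseline.json"
    done
}
```

Called as `promote_baselines .` from the repository root, this should behave like the original loop for ordinary paths.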

Member

Should we just have a Bazel runnable to write the benchmark results back to source?

Member Author

So right now the benchmark tests already write the current run back to source as current.json. This step just copies each of those files to baseline.json to make it the new baseline. Are you suggesting another Bazel command to do that update instead (kind of like how our rules_player doc targets work)?

Member

Oh, gotcha! Makes sense then. We might just want to standardize the way we do "golden" files. For the Kotlin .api files, there is a build target to actually generate the .api file, a test target to diff that build target's output against the "golden" file, and a run target to write that output as the "golden" file. I know we have the comparison script outside of Bazel, which kinda makes sense because it's not a strict pass/fail, but maybe the build/update targets make sense?

I don't think we need to change for this PR, just forward thinking.
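
For readers unfamiliar with the "golden" file pattern described above, its three roles (build, test, update) can be sketched outside of Bazel roughly as follows. All function names here are hypothetical and nothing in this block comes from the repository or from rules_player; it only illustrates the shape of the pattern.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the build / test / update roles for a "golden" file.

# "build": produce the fresh artifact (here, any command writing to stdout).
golden_build() { "$@"; }

# "test": succeed only if the fresh artifact matches the checked-in golden file.
golden_test() {
  local golden="$1"; shift
  golden_build "$@" | diff -u "$golden" -
}

# "update": overwrite the golden file with the fresh artifact.
golden_update() {
  local golden="$1"; shift
  golden_build "$@" > "$golden"
}
```

A strict `golden_test` would not suit benchmark numbers, which change on every run, which matches the point above that the comparison is not a pass/fail check; the build and update roles are the parts that would carry over.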

@KetanReddy

KetanReddy commented Jun 3, 2026

Member Author

> Thoughts on tying in other platforms' perf tests? Should this pattern be used for all?

We can do that! To be honest, I wasn't aware that we had perf tests on the other platforms.

> I know we've been somewhat against perf tests as PR checks given it's difficult to guarantee executor consistency across runs. Do we think that'll be an issue? What's the expectation if I open a PR that shows massive negative impact?

It does look like there are slight variances from run to run, but I think it should be relatively stable. Procedure-wise, I think we should treat this as a "yellow flag" for now. If we see performance loss somewhere, we can test on our local machines, which should be more stable. But this at least surfaces things more easily than before.

@sugarmanz left a comment

Member

Can we make a ticket for tying in the other platforms? Getting the baseline merged makes more sense than holding it up.

@KetanReddy merged commit 08bd284 into main on Jun 4, 2026
24 checks passed
@KetanReddy deleted the maint/improved-benchmarks branch on June 4, 2026 17:33

Labels

skip-release Preserve the current version when merged

3 participants