Update CI to Run Benchmark Comparisons and Persist Perf Data #873
Codecov Report
✅ All modified and coverable lines are covered by tests.
Benchmark Results
Comparison against baseline from
| Benchmark | Current | Baseline | Change |
|---|---|---|---|
| core/player/src/binding/__tests__/parser.bench.ts > parser benchmarks > Resolving binding: foo.bar | 777.93K ops/s | 2.37M ops/s | -67.1% |
| core/player/src/binding/__tests__/parser.bench.ts > parser benchmarks > Resolving binding: foo.pets.1.name | 578.27K ops/s | 1.41M ops/s | -58.9% |
| core/player/src/binding/__tests__/parser.bench.ts > parser benchmarks > Resolving binding: foo.pets.01.name | 490.11K ops/s | 1.38M ops/s | -64.6% |
| core/player/src/binding/__tests__/parser.bench.ts > parser benchmarks > Resolving binding: foo.pets['01'].name | 434.23K ops/s | 1.28M ops/s | -66.2% |
| core/player/src/binding/__tests__/parser.bench.ts > parser benchmarks > Resolving binding: foo.pets[01].name | 532.18K ops/s | 1.35M ops/s | -60.7% |
| core/player/src/binding/__tests__/parser.bench.ts > parser benchmarks > Resolving binding: foo.pets[name = "frodo"].type | 298.15K ops/s | 821.60K ops/s | -63.7% |
| core/player/src/binding/__tests__/parser.bench.ts > parser benchmarks > Resolving binding: foo.pets["name" = "sprinkles"].type | 230.80K ops/s | 632.37K ops/s | -63.5% |
| core/player/src/binding/__tests__/parser.bench.ts > parser benchmarks > Resolving binding: foo.pets["isDog" = false].type | 274.45K ops/s | 766.83K ops/s | -64.2% |
| core/player/src/binding/__tests__/parser.bench.ts > parser benchmarks > Resolving binding: foo.pets["isDog" = true].type | 271.09K ops/s | 831.64K ops/s | -67.4% |
| core/player/src/binding/__tests__/parser.bench.ts > binding creation benchmarks > Resolving binding: foo.bar | 606.16K ops/s | 1.78M ops/s | -66.0% |
| core/player/src/binding/__tests__/parser.bench.ts > binding creation benchmarks > Resolving binding: foo.pets.1.name | 374.73K ops/s | 1.13M ops/s | -66.7% |
| core/player/src/binding/__tests__/parser.bench.ts > binding creation benchmarks > Resolving binding: foo.pets.01.name | 378.55K ops/s | 1.05M ops/s | -63.9% |
| core/player/src/binding/__tests__/parser.bench.ts > binding creation benchmarks > Resolving binding: foo.pets['01'].name | 332.05K ops/s | 956.96K ops/s | -65.3% |
| core/player/src/binding/__tests__/parser.bench.ts > binding creation benchmarks > Resolving binding: foo.pets[01].name | 365.35K ops/s | 1.03M ops/s | -64.5% |
| core/player/src/binding/__tests__/parser.bench.ts > binding creation benchmarks > Resolving binding: foo.pets[name = "frodo"].type | 208.69K ops/s | 679.72K ops/s | -69.3% |
| core/player/src/binding/__tests__/parser.bench.ts > binding creation benchmarks > Resolving binding: foo.pets["name" = "sprinkles"].type | 190.23K ops/s | 533.29K ops/s | -64.3% |
| core/player/src/binding/__tests__/parser.bench.ts > binding creation benchmarks > Resolving binding: foo.pets["isDog" = false].type | 227.49K ops/s | 647.06K ops/s | -64.8% |
| core/player/src/binding/__tests__/parser.bench.ts > binding creation benchmarks > Resolving binding: foo.pets["isDog" = true].type | 236.67K ops/s | 678.94K ops/s | -65.1% |
| core/player/src/expressions/__tests__/performance.bench.ts > Expression Parsing/Execution Benchmark > Parsing: {{foo}} = 1 + 3 (sync) | 397.78K ops/s | 1.36M ops/s | -70.8% |
| core/player/src/expressions/__tests__/performance.bench.ts > Expression Parsing/Execution Benchmark > Parsing: {{foo}} = 1 + 3 (async) | 311.97K ops/s | 1.10M ops/s | -71.6% |
| core/player/src/expressions/__tests__/performance.bench.ts > Expression Parsing/Execution Benchmark > Parsing: conditional(true, true, false) (sync) | 462.88K ops/s | 1.30M ops/s | -64.3% |
| core/player/src/expressions/__tests__/performance.bench.ts > Expression Parsing/Execution Benchmark > Parsing: conditional(true, true, false) (async) | 399.12K ops/s | 1.11M ops/s | -64.1% |
| core/player/src/expressions/__tests__/performance.bench.ts > Expression Parsing/Execution Benchmark > Parsing: {{foo}} = conditional({{bar}} > 0, true, false) (sync) | 196.57K ops/s | 598.36K ops/s | -67.1% |
| core/player/src/expressions/__tests__/performance.bench.ts > Expression Parsing/Execution Benchmark > Parsing: {{foo}} = conditional({{bar}} > 0, true, false) (async) | 178.75K ops/s | 537.26K ops/s | -66.7% |
| core/player/src/expressions/__tests__/performance.bench.ts > Expression Parsing/Execution Benchmark > Parsing: {{foo}} = conditional(conditional(true = false, false, true), conditional(false = false, true, false), conditional(true = true, false, true)) (sync) | 112.07K ops/s | 441.36K ops/s | -74.6% |
| core/player/src/expressions/__tests__/performance.bench.ts > Expression Parsing/Execution Benchmark > Parsing: {{foo}} = conditional(conditional(true = false, false, true), conditional(false = false, true, false), conditional(true = true, false, true)) (async) | 145.63K ops/s | 398.01K ops/s | -63.4% |
| core/player/src/expressions/__tests__/performance.bench.ts > Expression Parsing/Execution Benchmark > Parsing: {{foo}} = await(asyncTestFunction(1)) (sync) | N/A | N/A | N/A |
| core/player/src/expressions/__tests__/performance.bench.ts > Expression Parsing/Execution Benchmark > Parsing: {{foo}} = await(asyncTestFunction(1)) (async) | 261.82K ops/s | 811.98K ops/s | -67.8% |
| core/player/src/expressions/__tests__/performance.bench.ts > Expression Parsing/Execution Benchmark > Parsing: {{foo}} = asyncTestFunction(1) (sync) | 316.46K ops/s | 1.02M ops/s | -68.8% |
| core/player/src/expressions/__tests__/performance.bench.ts > Expression Parsing/Execution Benchmark > Parsing: {{foo}} = asyncTestFunction(1) (async) | 290.14K ops/s | 856.32K ops/s | -66.1% |
| core/player/src/expressions/__tests__/performance.bench.ts > Expression Parsing/Execution Benchmark > Parsing: asyncTestFunction(1) (sync) | 797.18K ops/s | 2.28M ops/s | -65.1% |
| core/player/src/expressions/__tests__/performance.bench.ts > Expression Parsing/Execution Benchmark > Parsing: asyncTestFunction(1) (async) | 491.39K ops/s | 1.86M ops/s | -73.6% |
| core/player/src/expressions/__tests__/performance.bench.ts > Expression Parsing/Execution Benchmark > Parsing: {{foo}} = conditional(!{{bar}} == false, await(asyncTestFunction(1)), false) (sync) | 172.56K ops/s | 510.49K ops/s | -66.2% |
| core/player/src/expressions/__tests__/performance.bench.ts > Expression Parsing/Execution Benchmark > Parsing: {{foo}} = conditional(!{{bar}} == false, await(asyncTestFunction(1)), false) (async) | 146.75K ops/s | 460.99K ops/s | -68.2% |
| core/player/src/view/resolver/__tests__/index.bench.ts > resolver benchmarks > initial resolve | 644.98 ops/s | 1.62K ops/s | -60.3% |
| core/player/src/view/resolver/__tests__/index.bench.ts > resolver benchmarks > Resolving from cache | 19.66K ops/s | 48.38K ops/s | -59.4% |
| core/player/src/view/resolver/__tests__/index.bench.ts > resolver benchmarks > data changes | 2.92K ops/s | 6.98K ops/s | -58.1% |
| core/player/src/view/resolver/__tests__/index.bench.ts > resolver benchmarks > data changes slow | 651.72 ops/s | 1.57K ops/s | -58.4% |
plugins/async-node/core ⚠️
| Benchmark | Current | Baseline | Change |
|---|---|---|---|
| plugins/async-node/core/src/__tests__/index.bench.ts > async node benchmarks > Resolve Async Node 1 times | 15.12K ops/s | 61.54K ops/s | -75.4% |
| plugins/async-node/core/src/__tests__/index.bench.ts > async node benchmarks > Resolve Async Node 5 times | 14.66K ops/s | 45.63K ops/s | -67.9% |
| plugins/async-node/core/src/__tests__/index.bench.ts > async node benchmarks > Resolve Async Node 10 times | 11.64K ops/s | 35.74K ops/s | -67.4% |
| plugins/async-node/core/src/__tests__/index.bench.ts > async node benchmarks > Resolve Async Node 50 times | 3.54K ops/s | 12.54K ops/s | -71.7% |
| plugins/async-node/core/src/__tests__/index.bench.ts > async node benchmarks > Resolve Async Node 100 times | 1.92K ops/s | 6.05K ops/s | -68.3% |
| plugins/async-node/core/src/__tests__/transform.bench.ts > async transform benchmarks > Resolve Async Node 1 times | 7.87K ops/s | 30.34K ops/s | -74.1% |
| plugins/async-node/core/src/__tests__/transform.bench.ts > async transform benchmarks > Resolve Async Node 5 times | 8.22K ops/s | 26.47K ops/s | -68.9% |
| plugins/async-node/core/src/__tests__/transform.bench.ts > async transform benchmarks > Resolve Async Node 10 times | 7.75K ops/s | 19.35K ops/s | -60.0% |
| plugins/async-node/core/src/__tests__/transform.bench.ts > async transform benchmarks > Resolve Async Node 50 times | 2.89K ops/s | 8.58K ops/s | -66.2% |
| plugins/async-node/core/src/__tests__/transform.bench.ts > async transform benchmarks > Resolve Async Node 100 times | 1.78K ops/s | 4.85K ops/s | -63.3% |
react/player ⚠️
| Benchmark | Current | Baseline | Change |
|---|---|---|---|
| react/player/src/asset/__tests__/index.bench.tsx > ReactAsset benchmarks > Render asset nested in 1 ReactAssets | 611.19 ops/s | 728.40 ops/s | -16.1% |
| react/player/src/asset/__tests__/index.bench.tsx > ReactAsset benchmarks > Bubble errors nested in 1 ReactAssets | 1.08K ops/s | 5.24K ops/s | -79.4% |
| react/player/src/asset/__tests__/index.bench.tsx > ReactAsset benchmarks > Render asset nested in 5 ReactAssets | 623.83 ops/s | 766.87 ops/s | -18.7% |
| react/player/src/asset/__tests__/index.bench.tsx > ReactAsset benchmarks > Bubble errors nested in 5 ReactAssets | 1.09K ops/s | 3.90K ops/s | -72.1% |
| react/player/src/asset/__tests__/index.bench.tsx > ReactAsset benchmarks > Render asset nested in 10 ReactAssets | 627.48 ops/s | 759.73 ops/s | -17.4% |
| react/player/src/asset/__tests__/index.bench.tsx > ReactAsset benchmarks > Bubble errors nested in 10 ReactAssets | 924.98 ops/s | 2.49K ops/s | -62.8% |
| react/player/src/asset/__tests__/index.bench.tsx > ReactAsset benchmarks > Render asset nested in 50 ReactAssets | 462.67 ops/s | 658.59 ops/s | -29.7% |
| react/player/src/asset/__tests__/index.bench.tsx > ReactAsset benchmarks > Bubble errors nested in 50 ReactAssets | 245.66 ops/s | 631.28 ops/s | -61.1% |
| react/player/src/asset/__tests__/index.bench.tsx > ReactAsset benchmarks > Render asset nested in 100 ReactAssets | 405.06 ops/s | 548.45 ops/s | -26.1% |
| react/player/src/asset/__tests__/index.bench.tsx > ReactAsset benchmarks > Bubble errors nested in 100 ReactAssets | 125.38 ops/s | 277.93 ops/s | -54.9% |
For the record, the regressions show up because the initial baseline benchmarks were run on my local machine, which is bigger than the benchmark executor by a factor of 4-5x. Once this is merged into main and new baselines are established on the executor, there should be much less variance between runs.
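As a rough sanity check on the explanation above, here is a small sketch of what Change value to expect when the baseline comes from a faster machine. It assumes ops/s scales linearly with machine speed, which is a simplification, and the helper name `expected_change` is made up for illustration.

```shell
# Hypothetical helper: expected Change (%) when the baseline machine is K times
# faster than the benchmark executor. Assumes ops/s scales linearly with speed.
expected_change() {
  awk -v k="$1" 'BEGIN { printf "%.1f\n", (1 / k - 1) * 100 }'
}

expected_change 4   # prints -75.0
expected_change 5   # prints -80.0
```

Most rows in the tables above land between roughly -60% and -75%, which under this assumption corresponds to an effective gap of about 2.5-4x for these workloads.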
sugarmanz left a comment
Thoughts on tying in other platforms perf tests? Should this pattern be used for all?
I know we've been somewhat against perf tests as PR checks given it's difficult to guarantee executor consistency across runs. Do we think that'll be an issue? What's the expectation if I open a PR that shows massive negative impact?
find . -path '*/benchmarks/current.json' -not -path './bazel-*' | while read f; do
  cp "$f" "$(dirname $f)/baseline.json"
done
Should we just have a Bazel runnable to write the benchmark results back to source?
So right now the benchmark tests do write the current run back to source. This is just renaming them to make them the baseline. Are you suggesting another Bazel command to do that update instead (kind of like how our rules_player doc targets work)?
Oh, gotcha! Makes sense then. We might just want to standardize the way we do "golden" files. For the Kotlin .api files, there is a build target to actually generate the .api file, a test target to diff that build target against the "golden" file, and a run target to write the built target as the "golden" file. I know we have the comparison script outside of Bazel, which kinda makes sense because it's not a strict pass/fail, but maybe the build/update targets make sense?
I don't think we need to change for this PR, just forward thinking.
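For reference, a slightly hardened sketch of the promotion step discussed in this thread. It assumes the same layout as the snippet under review: each package writes `benchmarks/current.json`, and Bazel output trees sit directly under the search root with a `bazel-` prefix. The function name `promote_baselines` is hypothetical and not part of the PR.

```shell
# Hypothetical helper: copy every benchmarks/current.json under ROOT to a
# sibling baseline.json, skipping Bazel output trees (ROOT/bazel-*).
# ROOT defaults to "." and should be given without a trailing slash.
promote_baselines() {
  root="${1:-.}"
  find "$root" -path '*/benchmarks/current.json' ! -path "$root/bazel-*" \
    -exec sh -c '
      # Each "$f" is one matched current.json; quoting keeps paths with spaces intact.
      for f in "$@"; do
        cp -- "$f" "$(dirname -- "$f")/baseline.json"
      done
    ' sh {} +
}
```

Compared with the piped `while read` loop, this form copes with spaces in paths and uses the POSIX `!` operator instead of `-not`. The behavior is otherwise meant to match the original.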
We can do that! To be honest I wasn't aware that we had perf tests in the other platforms.
It does look like there are slight variances run to run, but I think it should be relatively stable. Procedure-wise, I think we should treat this as a "yellow flag" for now. If we see performance loss somewhere, we can test on our local machines, which should be more stable. But this at least surfaces issues more easily than before.
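One way to make the "yellow flag" idea concrete is a soft threshold on the Change column. This is a minimal sketch, assuming Change is computed as (current - baseline) / baseline and that a drop of more than 10% is worth a look. The helper names and the 10% default are both hypothetical and not taken from the comparison script.

```shell
# Hypothetical helper: percent change of current vs. baseline throughput (ops/s).
percent_change() {
  awk -v cur="$1" -v base="$2" 'BEGIN { printf "%.1f\n", (cur - base) / base * 100 }'
}

# Hypothetical policy: print "yellow" when throughput drops by more than
# THRESHOLD percent (default 10), otherwise "ok". Never fails the build.
flag_change() {
  awk -v change="$1" -v threshold="${2:-10}" \
    'BEGIN { print ((change < -threshold) ? "yellow" : "ok") }'
}

flag_change "$(percent_change 50 100)"   # prints yellow
```

Because the tables report rounded values, recomputing Change from the displayed Current and Baseline figures can differ from the reported number in the last digit.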
sugarmanz left a comment
Can we make a ticket for tying in the other platforms? Getting the baseline merged makes more sense than holding it up.
main benchmarks are run and committed to the repo.
Change Type (required)
Indicate the type of change your pull request is:
patch, minor, major, N/A
Does your PR have any documentation updates?