ci(git_perf): rerun cross-runner variance experiment to confirm findings #734

Draft
kaihowl wants to merge 4 commits into master from claude/cross-runner-variance-experiment-2dkx2n

kaihowl commented Jun 8, 2026

Summary

  • Adds permissions: contents: write to cross-runner-variance.yml so the analyze job can commit results back to the branch
  • Adds a "Print analysis results" step that echoes verdict/CoV tables to stdout (readable via MCP job logs)
  • Adds a "Commit results to branch" step that copies updated markdown and PNG analysis outputs into docs/ci-variance-experiment/ and pushes them back automatically after each run

Purpose

Reruns the experiment to confirm the previous findings (min+MAD still best for CI benchmarking), since the runner pool may have changed in the meantime.
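
As a rough illustration of what a cross-runner CoV comparison can look like, the sketch below reduces each runner's samples with a candidate aggregation (min, median, mean) and then computes the coefficient of variation of those per-runner values. A lower CoV means the aggregation is more stable across runners. This is only a sketch under assumptions: the runner names, the timings, and the helper names `coefficient_of_variation` and `cross_runner_cov` are hypothetical, and the experiment's actual analysis script may compute its tables differently.

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (unitless)."""
    return statistics.pstdev(values) / statistics.fmean(values)


def cross_runner_cov(samples_by_runner, aggregate):
    """Reduce each runner's samples with `aggregate`, then take the CoV
    of the resulting per-runner values."""
    per_runner = [aggregate(samples) for samples in samples_by_runner.values()]
    return coefficient_of_variation(per_runner)


# Hypothetical timings in milliseconds; not taken from the experiment.
samples_by_runner = {
    "runner-a": [21.0, 21.4, 22.1, 35.0],  # one slow outlier
    "runner-b": [21.2, 21.3, 21.9, 22.0],
    "runner-c": [20.9, 21.5, 28.7, 21.1],  # one slow outlier
}

aggregations = {
    "min": min,
    "median": statistics.median,
    "mean": statistics.fmean,
}

cov_by_aggregate = {
    name: cross_runner_cov(samples_by_runner, fn)
    for name, fn in aggregations.items()
}

for name, cov in cov_by_aggregate.items():
    print(f"{name:>6}: CoV = {cov:.4f}")
```

With these made-up samples the slow outliers inflate the mean on two runners, while the minimum barely moves, which is the kind of effect the experiment is meant to quantify on real runners.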

Test plan

  • Trigger cross-runner-variance.yml on this branch via workflow_dispatch
  • Verify all 20 measure jobs complete (10 ubuntu + 10 macos)
  • Verify analyze job prints verdict table to stdout in logs
  • Verify analyze job commits updated docs/ci-variance-experiment/ back to this branch
  • Compare new CoV values against previous findings in README

https://claude.ai/code/session_01FqAWLBbGE9MFHDRCFBJdKy


Generated by Claude Code

After the analysis step, print verdict and CoV tables to stdout (readable
via job logs through MCP tools) and commit updated markdown/PNG results
back to the branch automatically.

Also add workflow-level `contents: write` permission required for the
commit-push step.

https://claude.ai/code/session_01FqAWLBbGE9MFHDRCFBJdKy

github-actions Bot commented Jun 8, 2026

Performance Report

Performance Results

Audit Results

Auditing measurement "add-benchmark" (os=ubuntu-22.04/rust=beta):
  ⚠️  WARNING: Change points detected in current epoch for 'add-benchmark':
     commit 6e3baa3 (-36.5%)
     commit ee82970 (+63.3%)
     commit 5533710 (-39.0%)
     commit 8ef210c (+53.1%)
     commit 3541c6a (-38.1%)
     Historical z-score comparison may be unreliable due to regime shift.
     Consider bumping epoch or investigating the change.
  ✅ 'add-benchmark'
  Aggregation: min
  z-score (mad): ↑ 2.02
  Head: μ: 25.6ms σ: N/A MAD: 0ns n: 1
  Tail: μ: 21.8ms σ: 4ms MAD: 1.9ms n: 31
   [-38.10% – +10.13%] █▇▇█▆██▇█▇▂▂█▂▇▇█▆▇█▆▁▁▁▇▆▄▁▄▄▄█

Auditing measurement "add-benchmark" (os=ubuntu-22.04/rust=stable):
  ⚠️  WARNING: Change points detected in current epoch for 'add-benchmark':
     commit b194e1c (-62.4%)
     commit c2dc7a7 (+187.3%)
     Historical z-score comparison may be unreliable due to regime shift.
     Consider bumping epoch or investigating the change.
  ✅ 'add-benchmark'
  Aggregation: min
  z-score (mad): ↑ 2.36
  Head: μ: 25.9ms σ: N/A MAD: 0ns n: 1
  Tail: μ: 21.3ms σ: 4.6ms MAD: 1.9ms n: 28
   [-64.47% – +10.59%] ▄▇▇▄█▇▁▇█▇▇█▇▇▆▇█▄▇▄▇▇▇▃▆▆▆▃█

Auditing measurement "bench::add_measurements/add_measurement/1::median" (os=ubuntu-22.04/rust=beta):
  ✅ 'bench::add_measurements/add_measurement/1::median'
  Aggregation: median
  z-score (mad): ↑ 1.24
  Head: μ: 19.5ms σ: N/A MAD: 0ns n: 1
  Tail: μ: 18.9ms σ: 793.9µs MAD: 470µs n: 5
   [-2.52% – +6.76%] ▇▁▃█▁▆

Auditing measurement "bench::add_measurements/add_measurement/1::median" (os=ubuntu-22.04/rust=stable):
  ✅ 'bench::add_measurements/add_measurement/1::median'
  Aggregation: median
  z-score (mad): ↑ 2.04
  Head: μ: 19.4ms σ: N/A MAD: 0ns n: 1
  Tail: μ: 15.9ms σ: 3.9ms MAD: 1.8ms n: 5
   [-36.13% – +9.81%] ▁▇▇▁██

Auditing measurement "bench::report/report_generation/10::median" (os=ubuntu-22.04/rust=beta):
  ✅ 'bench::report/report_generation/10::median'
  Aggregation: median
  z-score (mad): ↑ 1.41
  Head: μ: 23.2ms σ: N/A MAD: 0ns n: 1
  Tail: μ: 22.7ms σ: 840.5µs MAD: 355.4µs n: 7
   [-6.93% – +1.53%] ▇▁▃█▁▇▇▇

Auditing measurement "bench::report/report_generation/10::median" (os=ubuntu-22.04/rust=stable):
  ✅ 'bench::report/report_generation/10::median'
  Aggregation: median
  z-score (mad): ↑ 3.61
  Head: μ: 23.3ms σ: N/A MAD: 0ns n: 1
  Tail: μ: 18.2ms σ: 6.3ms MAD: 1.4ms n: 7
   [-67.12% – +6.48%] ▄██▄█▇▁█
  Note: Passed due to relative deviation (6.0%) being below threshold (10.0%)

Auditing measurement "release-binary-size" (os=ubuntu-22.04/rust=stable):
  ✅ 'release-binary-size'
  Aggregation: min
  z-score (stddev): ↑ 1.04
  Head: μ: 8.3MB σ: N/A MAD: 0B n: 1
  Tail: μ: 8.2MB σ: 78.6kB MAD: 35.3kB n: 36
   [-2.57% – +0.61%] ████▇▇▇▇▇▇▇▇▇▇▇▇▇▇▆▆▆▆▅▅▅▅▅▅▅▁▂▂▂▂▂▂█

Auditing measurement "report" (os=ubuntu-22.04/rust=beta):
  ⚠️  WARNING: Change points detected in current epoch for 'report':
     commit 6e3baa3 (-36.5%)
     commit ee82970 (+61.2%)
     commit 5533710 (-37.7%)
     commit 8ef210c (+53.8%)
     commit 3541c6a (-34.4%)
     commit dc90cee (+53.1%)
     Historical z-score comparison may be unreliable due to regime shift.
     Consider bumping epoch or investigating the change.
  ✅ 'report'
  Aggregation: min
  z-score (stddev): ↑ 0.93
  Head: μ: 29.6ms σ: N/A MAD: 0ns n: 1
  Tail: μ: 25.6ms σ: 4.2ms MAD: 2.3ms n: 33
   [-33.89% – +12.31%] █▆▇█▆█▇▆▇▆▁▁▇▁▆▇▇▇▆▇▆▁▁▁▇▆▅▁▄▄▄▅▅▇

Auditing measurement "report" (os=ubuntu-22.04/rust=stable):
  ⚠️  WARNING: Change points detected in current epoch for 'report':
     commit b194e1c (-62.8%)
     commit c2dc7a7 (+189.9%)
     commit d21b515 (+33.7%)
     Historical z-score comparison may be unreliable due to regime shift.
     Consider bumping epoch or investigating the change.
  ✅ 'report'
  Aggregation: min
  z-score (stddev): ↑ 0.95
  Head: μ: 29.9ms σ: N/A MAD: 0ns n: 1
  Tail: μ: 25.1ms σ: 5ms MAD: 2.1ms n: 30
   [-64.80% – +9.11%] ▄█▇▄█▇▁▇█▇██▇█▆▇█▄▇▄█▇█▄▆▆▇▄▆▆█

Auditing measurement "report-benchmark" (os=ubuntu-22.04/rust=beta):
  ⚠️  WARNING: Change points detected in current epoch for 'report-benchmark':
     commit 6e3baa3 (-35.8%)
     commit ee82970 (+59.6%)
     commit 5533710 (-37.9%)
     commit 8ef210c (+53.9%)
     commit 3541c6a (-34.3%)
     commit dc90cee (+53.0%)
     Historical z-score comparison may be unreliable due to regime shift.
     Consider bumping epoch or investigating the change.
  ✅ 'report-benchmark'
  Aggregation: min
  z-score (stddev): ↑ 0.87
  Head: μ: 29.5ms σ: N/A MAD: 0ns n: 1
  Tail: μ: 25.8ms σ: 4.3ms MAD: 2.1ms n: 31
   [-33.53% – +11.44%] █▆▇█▆▇▇▆▇▆▁▁▇▁▆▇▇▆▆▇▆▁▁▁▇▆▅▁▅▄▅▇

Auditing measurement "report-benchmark" (os=ubuntu-22.04/rust=stable):
  ⚠️  WARNING: Change points detected in current epoch for 'report-benchmark':
     commit b194e1c (-63.1%)
     commit c2dc7a7 (+190.0%)
     Historical z-score comparison may be unreliable due to regime shift.
     Consider bumping epoch or investigating the change.
  ✅ 'report-benchmark'
  Aggregation: min
  z-score (stddev): ↑ 0.88
  Head: μ: 29.8ms σ: N/A MAD: 0ns n: 1
  Tail: μ: 25.2ms σ: 5.2ms MAD: 2ms n: 28
   [-65.10% – +10.16%] ▄▇▇▄█▇▁▇█▇▇█▇▇▆▇█▄▇▄█▇█▄▆▆▆▄█

Auditing measurement "report-size" (os=ubuntu-22.04/rust=beta):
  ✅ 'report-size'
  Aggregation: min
  z-score (stddev): →
  Head: μ: 20.3kB σ: N/A MAD: 0B n: 1
  Tail: μ: 20.3kB σ: 0B MAD: 0B n: 33
   [+0.00% – +0.00%] ▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅

Auditing measurement "report-size" (os=ubuntu-22.04/rust=stable):
  ✅ 'report-size'
  Aggregation: min
  z-score (stddev): →
  Head: μ: 20.3kB σ: N/A MAD: 0B n: 1
  Tail: μ: 20.3kB σ: 0B MAD: 0B n: 30
   [+0.00% – +0.00%] ▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅

Auditing measurement "report-size-benchmark" (os=ubuntu-22.04/rust=beta):
  ✅ 'report-size-benchmark'
  Aggregation: median
  z-score (stddev): →
  Head: μ: 20.3kB σ: N/A MAD: 0B n: 1
  Tail: μ: 20.3kB σ: 0B MAD: 0B n: 33
   [+0.00% – +0.00%] ▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅

Auditing measurement "report-size-benchmark" (os=ubuntu-22.04/rust=stable):
  ✅ 'report-size-benchmark'
  Aggregation: median
  z-score (stddev): →
  Head: μ: 20.3kB σ: N/A MAD: 0B n: 1
  Tail: μ: 20.3kB σ: 0B MAD: 0B n: 30
   [+0.00% – +0.00%] ▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅

Overall: PASSED (15/15 groups passed)
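
The `z-score (mad)` lines and the "Passed due to relative deviation" note in the audit output above can be read as a two-stage check: a z-score test on the head measurement, with a relative-deviation gate that lets small changes pass even when the z-score is high. The sketch below is a minimal illustration of that idea, not git-perf's implementation. It assumes the z-score is the gap between the head value and the tail mean divided by the tail MAD, and that relative deviation is measured against the tail median; the report states neither. The function names, the `sigma` and `min_relative_deviation` parameters, and the sample values are all hypothetical.

```python
import statistics


def mad(values):
    """Median absolute deviation around the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def audit(head, tail, sigma, min_relative_deviation):
    """Return the z-score, relative deviation (%), and pass/fail verdict.

    Assumptions (not confirmed by the report): z uses the tail mean as
    center and the tail MAD as spread; relative deviation uses the tail
    median as baseline.
    """
    center = statistics.fmean(tail)
    spread = mad(tail)
    if spread > 0:
        z = abs(head - center) / spread
    else:
        # No dispersion in the tail: any difference is infinitely far away.
        z = 0.0 if head == center else float("inf")

    baseline = statistics.median(tail)
    relative_deviation = abs(head / baseline - 1.0) * 100.0

    # Pass if the z-score is within bounds, or if the change is too small
    # in relative terms to matter.
    passed = z <= sigma or relative_deviation < min_relative_deviation
    return {"z": z, "relative_deviation": relative_deviation, "passed": passed}


# Hypothetical tail (ms) with one fast outlier that drags the mean down.
tail = [22.0, 22.5, 22.2, 8.0, 22.4, 22.1, 22.3]
result = audit(head=23.3, tail=tail, sigma=3.5, min_relative_deviation=10.0)
print(result)
```

In this made-up case the z-score is far above the threshold because a single outlier shifts the tail mean while the MAD stays small, yet the head is only a few percent away from the tail median, so the relative-deviation gate lets it pass.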

Measurement Storage Size

Live Measurement Size Report
============================

⚠️  Shallow clone detected - measurement counts may be incomplete (see FAQ)

Number of commits with measurements: 364
Total measurement data size (on-disk (compressed)): 523.4kB

Repository Statistics (for context):
-------------------------------------
  Loose objects: 0 (0B)
  Packed objects: 18662 (4.4MB)
  Total repository size: 4.4MB

Created by git-perf

github-actions Bot and others added 3 commits June 8, 2026 20:47
When two workflow dispatches finish at around the same time, both analyze
jobs try to push to the same branch and one push is rejected. Since the
committed files are fully generated content, a force push is correct:
the last run wins.

https://claude.ai/code/session_01FqAWLBbGE9MFHDRCFBJdKy