ci: add gpu benchmarks by JuArce · Pull Request #724 · yetanotherco/lambda_vm

JuArce · 2026-06-26T19:39:18Z

GPU benchmark workflow (`/bench-gpu`)

This PR adds a per-PR GPU prover benchmark that rents a GPU on Vast.ai on demand, runs the prover on it, and posts the result back to the PR — then always tears the box down.

What it does

Rents an RTX 5090 on Vast.ai (hourly) — datacenter-grade host, ≥16 cores, ≥64 GB RAM, ≥64 GB disk, driver ≥ 580, ≤ $1/hr (with retry if the pool is momentarily empty).
Provisions via our Vast template (Rust + LLVM + sysroot + CUDA), checks out the PR head, and runs the drift-free A/B/B/A (ABBA) paired benchmark — the same method as the CPU /bench-abba, but with the CUDA prover path enabled (--features jemalloc-stats,prover/cuda).
Builds the cli at the PR head and at main, interleaves N prove pairs of the headline ethrex 20-transfer block, and reports a paired-t 95% CI + Wilcoxon verdict (PR vs main; + = PR faster).
Always destroys the instance at the end (success, failure, or cancel) so there's no lingering hourly cost.

How to use it

Comment on any PR:

/bench-gpu

/bench-gpu — run with the default 14 pairs (sized to resolve a ~2% change given GPU noise).
/bench-gpu N — run N pairs (clamped to [2, 40]; more pairs → finer resolution, longer/€€).
Restricted to repo members/owners/collaborators. The bot reacts 👀, posts a "started" notice (the run takes ~1 hr: dual CUDA build + the proves), then edits in the result.

It can also be launched manually from the Actions tab via workflow_dispatch (with a pairs input); those runs post the result to the run summary instead of a PR comment.

Reading the result

=== ABBA paired result  (improvement: + = PR faster) ===
  pairs: 14   mean A (PR): 95.49s   mean B (base): 96.38s
  [parametric] paired-t   mean +0.88%   95% CI: [-0.90%, +2.65%]
  [robust]     median +1.11%   Wilcoxon ... p=0.32
  VERDICT: INCONCLUSIVE / REAL IMPROVEMENT / REAL REGRESSION

Trust the verdict when the paired-t CI and Wilcoxon agree. INCONCLUSIVE on a small-looking delta → re-run with more pairs.

Requirements (repo secrets)

VAST_API_KEY — Vast.ai API key.
VAST_TEMPLATE_HASH — the "NVIDIA CUDA Lambda VM" template hash.

Note: /bench-gpu comments only take effect once this is merged to main (GitHub runs issue_comment workflows from the default branch).

# Conflicts: # .github/workflows/benchmark-pr.yml

github-actions · 2026-06-29T15:21:56Z

AI Review

PR #724 · 3 changed files

Findings

Status	Sev	Location	Finding	Found by
confirmed	high	.github/workflows/benchmark-gpu.yml:9	GPU workflow comment claims 'default 10' but the actual default is 14	minimax minimax/MiniMax-M3 glm openrouter/z-ai/glm-5.2 kimi openrouter/moonshotai/kimi-k2.7-code moonmath zro/minimax-m3
confirmed	high	.github/workflows/benchmark-gpu.yml:309	Missing pipefail masks GPU benchmark failures	kimi openrouter/moonshotai/kimi-k2.7-code
confirmed	medium	.github/workflows/benchmark-gpu.yml:113	Unpinned `pip install --upgrade vastai` makes the workflow break silently if Vast CLI ships a breaking change	moonmath zro/minimax-m3
confirmed	medium	.github/workflows/benchmark-gpu.yml:142	Hardcoded rustup toolchain date (nightly-2026-02-01) in onstart check will become stale	nemotron openrouter/nvidia/nemotron-3-ultra-550b-a55b moonmath zro/minimax-m3
confirmed	medium	.github/workflows/benchmark-gpu.yml:372	Teardown does not retry on transient vastai destroy failure — leaks paid GPU instances	minimax minimax/MiniMax-M3
confirmed	medium	scripts/bench_abba.sh:94	BENCH_FEATURES changes do not invalidate cached binaries	kimi openrouter/moonshotai/kimi-k2.7-code
confirmed	low	.github/workflows/benchmark-gpu.yml:352	`listComments` uses default 30-item page, so the GPU comment can be missed and duplicated on a chatty PR	moonmath zro/minimax-m3

Status column reflects the verdict from the verifier: deepseek-verifier (openrouter/deepseek/deepseek-v4-pro).

AI-003: GPU workflow comment claims 'default 10' but the actual default is 14

Status: confirmed
Severity: high
Location: .github/workflows/benchmark-gpu.yml:9
Found by: minimax:minimax/MiniMax-M3, glm:openrouter/z-ai/glm-5.2, kimi:openrouter/moonshotai/kimi-k2.7-code, moonmath:zro/minimax-m3
Verified by: deepseek-verifier:openrouter/deepseek/deepseek-v4-pro
Rejected by: -

Claim

Doc says "N = pair count, default 10", but the workflow's workflow_dispatch input pairs has default: "14" and the issue_comment parser falls back to ${N:-14}.

Evidence

benchmark-gpu.yml line 9 (header): Triggered by a "/bench-gpu [N]" comment on a PR (N = pair count, default 10) or via workflow_dispatch. Line 22: default: "14". Line 73: PAIRS=${N:-14}. Line 77: PAIRS=${DISPATCH_PAIRS:-14}.

Suggested fix

Change line 9 to say "default 14" so the docs match the actual default and the 14 ~ resolves a 2% delta rationale on line 79.

AI-005: Missing pipefail masks GPU benchmark failures

Status: confirmed
Severity: high
Location: .github/workflows/benchmark-gpu.yml:309
Found by: kimi:openrouter/moonshotai/kimi-k2.7-code
Verified by: deepseek-verifier:openrouter/deepseek/deepseek-v4-pro
Rejected by: -

Claim

The Run GPU ABBA benchmark step pipes the SSH command through tee without set -o pipefail. GitHub Actions' default shell is bash -e, so the step succeeds if tee succeeds even when the remote SSH benchmark command fails. This can produce a misleading "success" PR comment with empty or partial results.

Evidence

benchmark-gpu.yml line 309: $SSH "bash -lc \"$REMOTE\"" | tee "$RUNNER_TEMP/abba_out.txt". No set -o pipefail is set in the step, unlike bench-abba.yml line 90 which explicitly uses it.

Suggested fix

Add set -euo pipefail (or at least set -o pipefail) at the top of the Run GPU ABBA benchmark step's run block.

AI-007: Unpinned `pip install --upgrade vastai` makes the workflow break silently if Vast CLI ships a breaking change

Status: confirmed
Severity: medium
Location: .github/workflows/benchmark-gpu.yml:113
Found by: moonmath:zro/minimax-m3
Verified by: deepseek-verifier:openrouter/deepseek/deepseek-v4-pro
Rejected by: -

Claim

The "Install Vast CLI" step runs pip install --quiet --upgrade vastai with no version pin. Any future Vast CLI release that changes the vastai search offers --raw, vastai show instance --raw, vastai create instance --raw, or vastai attach ssh interface (all of which this workflow depends on) will silently break the workflow on the next run, and the failure will surface as a Vast API error rather than a build error, with a Vast box potentially already rented by then.

Evidence

benchmark-gpu.yml line 113: pip install --quiet --upgrade vastai. No version specifier. The workflow parses Vast JSON output (jq on .new_contract, .public_ipaddr, .ports["22/tcp"][0].HostPort, .driver_version, etc.) and calls multiple Vast subcommands — every one of which is a candidate for breakage across releases.

Suggested fix

Pin a specific version, e.g. pip install --quiet vastai==<tested_version>, and bump intentionally after re-testing the workflow.

AI-008: Hardcoded rustup toolchain date (nightly-2026-02-01) in onstart check will become stale

Status: confirmed
Severity: medium
Location: .github/workflows/benchmark-gpu.yml:142
Found by: nemotron:openrouter/nvidia/nemotron-3-ultra-550b-a55b, moonmath:zro/minimax-m3
Verified by: deepseek-verifier:openrouter/deepseek/deepseek-v4-pro
Rejected by: -

Claim

The provisioning check looks for a specific nightly toolchain date that will become outdated, causing false provisioning failures.

Evidence

benchmark-gpu.yml line 262: "$HOME/.cargo/bin/rustup" toolchain list 2>/dev/null | grep -q nightly-2026-02-01. Hard-coded literal. The same template name appears in the GPU benchmark comment on line 54 ("~15-30 min") but not as a configurable knob.

Suggested fix

Make the expected toolchain a workflow env variable (e.g. EXPECTED_TOOLCHAIN: nightly-2026-02-01) so the check can be updated centrally. Even better, drop the toolchain-version check entirely and rely on the === done === marker in /var/log/onstart.log, which is the primary check anyway.

AI-011: Teardown does not retry on transient vastai destroy failure — leaks paid GPU instances

Status: confirmed
Severity: medium
Location: .github/workflows/benchmark-gpu.yml:372
Found by: minimax:minimax/MiniMax-M3
Verified by: deepseek-verifier:openrouter/deepseek/deepseek-v4-pro
Rejected by: -

Claim

If vastai destroy instance $IID --yes fails for any reason (transient network/auth), the workflow emits a ::warning:: and continues. The instance is paid hourly until a human notices.

Evidence

The destroy step (lines 372–380) has no retry loop; it just || swallows the error and prints a warning. Compare with the SSH-attach step (line 193) which loops 12 times.

Suggested fix

Wrap the destroy call in a small retry loop (e.g. 3 attempts with 10s sleep) and, on final failure, also attempt to record the leaked instance id to a workflow artifact so a follow-up cleanup job can sweep it.

AI-012: BENCH_FEATURES changes do not invalidate cached binaries

Status: confirmed
Severity: medium
Location: scripts/bench_abba.sh:94
Found by: kimi:openrouter/moonshotai/kimi-k2.7-code
Verified by: deepseek-verifier:openrouter/deepseek/deepseek-v4-pro
Rejected by: -

Claim

bench_abba.sh caches binaries keyed only by SHA. When the same refs are benchmarked with a different BENCH_FEATURES value, the existing cached binaries are reused and the new features are ignored.

Evidence

Lines 94-102 compare only $WORK/cli_A.sha/$WORK/cli_B.sha against SHA_A/SHA_B. Line 52 introduces BENCH_FEATURES and line 112 uses it in cargo build, but no feature hash is stored or compared.

Suggested fix

Store a feature fingerprint alongside the SHA file (e.g., echo "$BENCH_FEATURES" > "$WORK/$2.features") and compare it before reusing cached binaries.

AI-023: `listComments` uses default 30-item page, so the GPU comment can be missed and duplicated on a chatty PR

Status: confirmed
Severity: low
Location: .github/workflows/benchmark-gpu.yml:352
Found by: moonmath:zro/minimax-m3
Verified by: deepseek-verifier:openrouter/deepseek/deepseek-v4-pro
Rejected by: -

Claim

The "Comment ABBA result on PR" step uses github.rest.issues.listComments(...) without paginate: true, so it only sees the first 30 comments. On a PR with >30 comments where an earlier /bench-gpu run posted the marker comment past page 1, a subsequent run will not find the existing comment and will post a duplicate — leaving stale result comments behind instead of updating the in-place one.

Evidence

benchmark-gpu.yml lines 352-368: const { data: comments } = await github.rest.issues.listComments({...}) with no pagination options; const existing = comments.find(c => c.user.type === 'Bot' && c.body.includes(marker)). Default GitHub REST page size is 30.

Suggested fix

Pass paginate: true to listComments so the call fetches every page before searching for the existing marker comment.

Reviewer Lanes

Lane	Model	Prompt	Status	Findings
glm	openrouter/z-ai/glm-5.2	general	success	3
kimi	openrouter/moonshotai/kimi-k2.7-code	general	success	4
minimax	minimax/MiniMax-M3	general	success	6
moonmath	zro/minimax-m3	general	success	7
nemotron	openrouter/nvidia/nemotron-3-ultra-550b-a55b	general	success	7

Verification Lanes

Lane	Model	Status	Confirmed	Rejected	Uncertain
deepseek-verifier	openrouter/deepseek/deepseek-v4-pro	success	7	10	0

Native Codex and Claude reviews run separately and post their own comments. They are not included in this structured provenance report.

Discarded candidates (10) — rejected by the verifier

BENCH_FEATURES not passed to remote Vast instance - GPU benchmark runs with CPU features only (.github/workflows/benchmark-gpu.yml:170, found by nemotron:openrouter/nvidia/nemotron-3-ultra-550b-a55b) — BENCH_FEATURES is set at the job level (line 42). The step-level env at lines 272-279 does NOT override BENCH_FEATURES. In GitHub Actions, job-level env vars not overridden by a step-level env block remain available in the step's run block. The $BENCH_FEATURES on line 306 is expanded client-side by the runner's shell before being sent via SSH — it's inside double quotes in the REMOTE variable assignment, so it expands to the literal string 'jemalloc-stats,prover/cuda' in the remote command. The variable is correctly passed.
prover/cuda feature does not exist on the cli crate — GPU build will fail (.github/workflows/benchmark-gpu.yml:307, found by minimax:minimax/MiniMax-M3) — The prover dependency in bin/cli/Cargo.toml is named prover (line 9), not optional, and refers to package lambda-vm-prover. In --features prover/cuda on the cargo CLI, the dep_name/feature_name syntax enables the cuda feature of the dependency named prover. This works for non-optional dependencies on the command line. The prover crate has a cuda feature (prover/Cargo.toml:10). The --features prover/cuda passed via bench_abba.sh line 112 will resolve correctly — Cargo maps prover to the dependency and cuda to that dependency's feature.
Chicken-and-egg: GPU workflow requires bench_abba.sh change to be on main before it can be tested (.github/workflows/benchmark-gpu.yml:190, found by nemotron:openrouter/nvidia/nemotron-3-ultra-550b-a55b, kimi:openrouter/moonshotai/kimi-k2.7-code, moonmath:zro/minimax-m3, glm:openrouter/z-ai/glm-5.2) — The workflow explicitly documents this limitation in its own comments (lines 297-299: 'NOTE: requires this PR's bench_abba.sh change... to be on main — i.e. it only takes effect after merge'). The BENCH_FEATURES env variable is still set on the remote command (line 306) — it just won't be consumed by the older bench_abba.sh on main. The benchmark will still run with the old hardcoded features (jemalloc-stats only), so it's not a blocker. This is a known, documented design tradeoff, not an undiscovered bug.
Driver version filter may be overly restrictive (requires >= 580, CUDA 12.8 needs >= 550) (.github/workflows/benchmark-gpu.yml:75, found by nemotron:openrouter/nvidia/nemotron-3-ultra-550b-a55b) — The comment on lines 131-134 explicitly documents the rationale for requiring driver >= 580: 'Older drivers (e.g. 575) lack newer symbols like cuCtxGetDevice_v2 and the GPU path falls back to CPU.' This is an intentional engineering choice to avoid GPU availability issues at runtime, not a mistake about CUDA version requirements. The cuda_max_good>=12.8 filter works at a different abstraction level (Vast API capability) vs. the driver filter (runtime symbol availability).
Instance id persists only to the ephemeral runner tmp — a runner crash leaks the box (.github/workflows/benchmark-gpu.yml:180, found by minimax:minimax/MiniMax-M3, nemotron:openrouter/nvidia/nemotron-3-ultra-550b-a55b, glm:openrouter/z-ai/glm-5.2, moonmath:zro/minimax-m3) — The workflow immediately persists the instance ID to $RUNNER_TEMP (line 180) and the destroy step uses 'if: always()' (line 372). This is the best practice available within GitHub-hosted runner constraints — there is no persistent storage accessible across runner instances. Runner crashes/terminations are an inherent platform limitation, not a code defect. The workflow already mitigates with concurrency control (lines 31-33) and immediate persistence.
workflow_dispatch triggered on main performs a no-op main-vs-main compare while still renting a GPU (.github/workflows/benchmark-gpu.yml:76, found by moonmath:zro/minimax-m3) — On workflow_dispatch triggered on main, BRANCH='main' (line 76), REF_A='origin/main' (line 289), and REF_B='origin/main' (line 307), resulting in a main-vs-main comparison. However, this requires a human to explicitly choose 'main' in the workflow_dispatch UI — the default ref for workflow_dispatch isn't 'main' (it's whatever branch is selected). The workflow is designed for PR branches; dispatching on main is a user error, not a code bug. The code cannot distinguish 'user intentionally tested main' from 'user made a mistake'.
SSH key attachment retry lacks exponential backoff (.github/workflows/benchmark-gpu.yml:113, found by nemotron:openrouter/nvidia/nemotron-3-ultra-550b-a55b) — The SSH attach step (lines 193-199) retries 12 times with fixed 10s intervals. Fixed-interval retries are a standard and widely-used pattern in CI workflows — they are simple, predictable, and work well for transient failures. Exponential backoff is a reasonable improvement suggestion but calling this a 'bug' or an issue is speculative. The fixed delay was chosen to handle the specific timing of Vast's instance initialization (keys take a few seconds to propagate).
Command injection risk: PR_NUM used directly in git fetch refspec (.github/workflows/benchmark-gpu.yml:183, found by nemotron:openrouter/nvidia/nemotron-3-ultra-550b-a55b) — PR_NUM is set from github.event.issue.number (line 63), which is guaranteed by the GitHub API to be an integer. In the git fetch refspec on line 285, this integer is interpolated into a refspec path — there is no shell injection vector from a numeric value. GitHub's event payload is trusted data in the GitHub Actions security model. Defense-in-depth validation is reasonable but the claim of 'command injection risk' is unfounded for this data source.
Comment-updater uses a generic 'GPU Benchmark (ABBA)' marker — collides across runs and across bots (.github/workflows/benchmark-gpu.yml:356, found by minimax:minimax/MiniMax-M3) — The marker 'GPU Benchmark (ABBA)' on line 356 is specific. The existing CPU ABBA workflow (bench-abba.yml line 109) uses a different comment header: '## ABBA tiebreaker —' which does not contain the GPU marker string. There is no current collision. The finding is speculative ('it is plausible a future cross-link could collide') and assumes a future change where another bot comment happens to mention the GPU benchmark marker — this is too hypothetical to qualify as a real issue.
BENCH_FEATURES default in bench_abba.sh is jemalloc-stats, but cli also requires the heap feature to print 'Peak heap' (scripts/bench_abba.sh:52, found by minimax:minimax/MiniMax-M3) — The finding itself states 'this is OK' and 'The ABBA stats step does not actually use the heap value.' It explicitly acknowledges there is no actual bug — it only warns about a potential 'maintenance hazard' if future code changes read heap from prove output. The bench_abba.sh script (line 134) only parses 'Proving time:', not 'Peak heap:'. The jemalloc-stats feature is included for the ABBA run but its output is not consumed. This is not an actual issue in the current code.

Raw lane outputs, candidates, final issues, and model metrics are uploaded as workflow artifacts.

JuArce · 2026-06-29T15:41:33Z

Addressed AI review findings

All 7 confirmed findings are resolved.

#	Sev	Finding	Fix
AI-003	high	Header said "default 10" but default is 14	Header now says `default 14`
AI-005	high	Missing `pipefail` could mask a failed GPU bench as success	`set -o pipefail` before the `tee` in the bench step
AI-007	med	Unpinned `pip install vastai`	Pinned to immutable commit `28494d92…` (git install) + moved `VAST_API_KEY` out of the install step so install-time code can't read it
AI-008	med	Hardcoded `nightly-2026-02-01` provisioning check goes stale	Dropped the toolchain-date check; readiness now relies on the `=== done ===` onstart marker plus cargo/sysroot/repo artifacts
AI-011	med	Teardown didn't retry transient `vastai destroy` failures	Destroy now retries 3× (10s apart); on final failure it warns with the instance label for manual cleanup
AI-012	med	`BENCH_FEATURES` change didn't invalidate the cached binaries	`bench_abba.sh` cache marker now stores `"<sha> <features>"`, so a different feature set forces a rebuild
AI-023	low	`listComments` (30-item page) could duplicate the result comment on chatty PRs	Both the "started" and result comment steps use `github.paginate(..., per_page: 100)`

The 10 candidates the verifier rejected were spot-checked and need no action (e.g. prover/cuda does resolve on the prover dep; BENCH_FEATURES is passed correctly via the job-level env expanded into the SSH command; the runner-crash/chicken-and-egg items are documented platform tradeoffs).

Lint: actionlint + shellcheck clean.

Review follow-ups on the GPU benchmark workflow: - Teardown: fall back to destroying by the unique RUN_LABEL when no instance id was recorded. The id file is written only after `create` succeeds and its JSON parses, so a box created in that window (concurrency cancel, or a parse failure) could otherwise leak and bill indefinitely. - Cap pairs at 32 (was 40) and round odd requests up to even (the AB/BA design wants even N); raise the job timeout to 210 min so a worst-case 32-pair run (64 proves + slow provisioning + dual CUDA build) fits without timing out after the expensive build. - Fix the CUDARC_PIN comment: the boxes are ~CUDA 12.8 (matching cuda-12080 and the cuda_max_good>=12.8 offer floor), not 13.0; tie it to the MIN_DRIVER guard as the opposite end of the same compatibility window. - Log only the needed fields of create.json instead of the full --raw response, so an unexpected sensitive field can't land in the run log. - Validate the workflow_dispatch branch name before it is interpolated into the remote `bash -lc` command. - Move the run-summary write into an always() step so workflow_dispatch failures are visible in the Actions summary rather than only the raw step log.

JuArce added 2 commits June 26, 2026 16:07

ci: add gpu benchmarks

158278c

add retries

a9d848a

JuArce self-assigned this Jun 26, 2026

JuArce added 12 commits June 26, 2026 16:40

Merge branch 'main' into gpu_benchmarks

49faa19

# Conflicts: # .github/workflows/benchmark-pr.yml

ci: use ABBA method to run the benchmark

5ef0fe2

ci: use 64gb ram

337bce0

ci: remove datacenter flag

3fce349

fix: units for RAM

76a137a

fix: min driver and ssh key

09cf1e5

fix: rebuild binaries

f645901

fix: use correct sh

963966c

fix: use 64gb ram

f09578e

fix: use expensive machine with $1 cap

defc4da

fix: remove temporary code

1e0eb39

Merge branch 'main' into gpu_benchmarks

904549d

JuArce marked this pull request as ready for review June 29, 2026 15:09

JuArce requested a review from MauroToscano June 29, 2026 15:10

JuArce added the ai-review Trigger the AI review label Jun 29, 2026

claude Bot reviewed Jun 29, 2026

View reviewed changes

Comment thread .github/workflows/benchmark-gpu.yml

claude Bot reviewed Jun 29, 2026

View reviewed changes

Comment thread .github/workflows/benchmark-gpu.yml Outdated

claude Bot reviewed Jun 29, 2026

View reviewed changes

Comment thread .github/workflows/benchmark-gpu.yml Outdated

claude Bot reviewed Jun 29, 2026

View reviewed changes

Comment thread .github/workflows/benchmark-gpu.yml Outdated

This comment was marked as resolved.

Sign in to view

JuArce added 4 commits June 29, 2026 12:42

fix: apply code review

b075c2a

Merge remote-tracking branch 'origin/gpu_benchmarks' into gpu_benchmarks

1ed4895

test: run on push

d85fb37

fix: cuda

ecf615d

remove test setup

6e6a610

MauroToscano mentioned this pull request Jun 29, 2026

ci(bench-gpu): harden teardown, cap pairs at 32, fix CUDA comment #736

Merged

MauroToscano approved these changes Jun 29, 2026

View reviewed changes

MauroToscano enabled auto-merge June 29, 2026 19:24

MauroToscano added this pull request to the merge queue Jun 29, 2026

Merged via the queue into main with commit ae858b8 Jun 29, 2026
12 checks passed

MauroToscano deleted the gpu_benchmarks branch June 29, 2026 19:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

ci: add gpu benchmarks#724

ci: add gpu benchmarks#724
MauroToscano merged 20 commits into
mainfrom
gpu_benchmarks

JuArce commented Jun 26, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment was marked as resolved.

This comment was marked as resolved.

github-actions Bot commented Jun 29, 2026

Uh oh!

JuArce commented Jun 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

JuArce commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

GPU benchmark workflow (/bench-gpu)

What it does

How to use it

Reading the result

Requirements (repo secrets)

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment was marked as resolved.

This comment was marked as resolved.

github-actions Bot commented Jun 29, 2026

AI Review

Findings

Reviewer Lanes

Verification Lanes

Uh oh!

JuArce commented Jun 29, 2026

Addressed AI review findings

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

JuArce commented Jun 26, 2026 •

edited

Loading

GPU benchmark workflow (`/bench-gpu`)