feat(bench): h2c expansion, mid-size payloads, resources/NIC-bound, full-grid weekly by FumingPower3925 · Pull Request #197 · goceleris/probatorium

FumingPower3925 · 2026-06-17T14:59:21Z

Prepares the next bench round. All changes are harness-only (no celeris/loadgen edits). Everything compiles and the full Go suite is green; every polyglot adapter was built and h2c-smoke-tested locally (curl --http2-prior-knowledge).

1. h2c coverage — +7 framework columns

The h2 scenarios were N/A for every non-Go framework. This adds prior-knowledge h2c columns (`-h2` suffix, `h2c-noupg`, so they run only the h2 rows):

| Framework | h2c via | verified |
|---|---|---|
| axum, ntex, hyper (Rust) | hyper `http2` builder / ntex `HttpService::h2` | HTTP/2 200 cleartext |
| aspnet (Kestrel) | `HttpProtocols.Http2` | ✅ |
| fastapi (Python) | hypercorn (uvicorn can't do h2) | ✅ |
| hono, elysia (Bun) | `node:http2` cleartext bridge to `app.fetch` | ✅ |

drogon gets no h2 column: drogon 1.9.x has no server-side HTTP/2 at all (verified against its headers), so its `-engine h2c` fails fast and its N/A is correct, not a gap. Native h2 columns share their h1 sibling's build via `NativeBinary.BinName`; every native adapter gained `-engine` parsing, and `resolveBenchColumns` now passes `-engine h1` / `-engine h2c`.

2. The 64k problem

- Mid-size payloads: new get/post-json-8k and get/post-json-16k rows stay under the 20G fabric ceiling, so fast servers remain CPU-bound and still differentiate. Endpoints and byte-identical payloads (8286 / 16463 bytes) were added to all 19 adapters.
- NIC-bound flag + CPU efficiency (schema v5.5): `BuildDocument` flags cells whose bandwidth sits at the fabric line rate (>=80%) while the loadgen still has headroom, and a new markdown section ranks those cells by Gbps per CPU% instead of raw RPS.
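The NIC-bound rule reduces to a threshold test plus a sort. A minimal Go sketch, assuming a 20 Gbit/s line rate for the "20G" fabric, the >=80% threshold quoted in the commit message, a hypothetical 90% loadgen-CPU cut-off for "has headroom", and that the CPU% in the ratio is the server's; `Cell`, `networkBound` and `rankByEfficiency` are illustrative names, not the schema v5.5 code.

```go
package main

import "sort"

// Cell is a hypothetical, trimmed-down benchmark cell.
type Cell struct {
	Name          string
	Gbps          float64 // measured bandwidth
	ServerCPUPct  float64 // server CPU utilisation (summed across cores)
	LoadgenCPUPct float64 // loadgen CPU utilisation, 0-100 of its capacity
}

const (
	lineRateGbps       = 20.0 // assumed meaning of the "20G" fabric ceiling
	nicBoundFraction   = 0.80 // ">=80% of fabric line rate"
	loadgenHeadroomPct = 90.0 // assumption: below this the loadgen has headroom
)

// networkBound is true when the wire, not the server or the loadgen,
// is the bottleneck.
func networkBound(c Cell) bool {
	return c.Gbps >= nicBoundFraction*lineRateGbps && c.LoadgenCPUPct < loadgenHeadroomPct
}

// rankByEfficiency keeps only network-bound cells and orders them by
// Gbps per server CPU%, most efficient first.
func rankByEfficiency(cells []Cell) []Cell {
	var out []Cell
	for _, c := range cells {
		if networkBound(c) && c.ServerCPUPct > 0 {
			out = append(out, c)
		}
	}
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].Gbps/out[i].ServerCPUPct > out[j].Gbps/out[j].ServerCPUPct
	})
	return out
}
```

Ranking by efficiency is the point: once every flagged cell is pinned near line rate, RPS mostly ties, but a server that reaches the ceiling with less CPU is still the cheaper one.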

3. Resources — the real bug

Server CPU/RSS (observer.sqlite + cpu.log) was captured and fetched, but the merge→aggregate→document path silently dropped it (CellResult/CellAggregate had no Resources field). It is now threaded end-to-end, so `benchmarks[].resources` populates, which is exactly what makes the NIC-bound 64k cells interpretable.
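A field that doesn't exist can't be copied, and in Go nothing has to complain about that: `encoding/json`, for instance, ignores object keys with no matching struct field by default, which is one way such a drop stays silent. Below is a hypothetical aggregate step once the field exists; the `Resources` shape (mean CPU%, peak RSS) and the `aggregate` function are assumptions for illustration, and only the `CellResult`/`CellAggregate` names come from the PR.

```go
package main

// Resources is a hypothetical shape for the server samples
// (observer.sqlite + cpu.log) carried per cell.
type Resources struct {
	CPUPctAvg    float64 // mean server CPU% over the cell
	RSSPeakBytes int64   // peak server RSS over the cell
}

type CellResult struct {
	Cell      string
	RPS       float64
	Resources *Resources // nil when no samples were captured
}

type CellAggregate struct {
	Cell      string
	MeanRPS   float64
	Resources *Resources // without this field the data has nowhere to go
}

// aggregate merges repeated runs of one cell. It averages CPU% over the runs
// that carry samples and keeps the highest RSS peak; if no run carries
// samples, Resources stays nil rather than reporting misleading zeros.
func aggregate(runs []CellResult) CellAggregate {
	var agg CellAggregate
	if len(runs) == 0 {
		return agg
	}
	agg.Cell = runs[0].Cell
	var cpuSum float64
	var sampled int
	var rssPeak int64
	for _, r := range runs {
		agg.MeanRPS += r.RPS
		if r.Resources == nil {
			continue
		}
		cpuSum += r.Resources.CPUPctAvg
		rssPeak = max(rssPeak, r.Resources.RSSPeakBytes)
		sampled++
	}
	agg.MeanRPS /= float64(len(runs))
	if sampled > 0 {
		agg.Resources = &Resources{CPUPctAvg: cpuSum / float64(sampled), RSSPeakBytes: rssPeak}
	}
	return agg
}
```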

4. Weekly = full grid

HeadlineWeekly() now runs the full grid (Globs: ["*/*"]), same coverage as Full(), differing only by per-cell window:

| profile | grid | window | projected (1 arch) |
|---|---|---|---|
| headline (weekly) | full (~799 cells) | 60s/15s | ~21.6h, fits 24h |
| full (exhaustive) | full (~799 cells) | 90s/20s | ~30h, manual (`BENCH_BUDGET=70h`) |

Both profiles set Arches: 1 (amd64-only today; BenchTier overrides per BENCH_TARGET). Rated sweep stays curated (expanding it would blow the budget). Removed HeadlineServers/HeadlineScenarios/headlineGlobs().

Budget

FullRealizedCells 520→820 (realized grid ~799). Full() projects ~30h, headline ~21.6h, both under the 70h manual budget.

Verification

- Full Go suite + mage-tagged tests green; gofmt/vet clean.
- New tests: resources flow (aggregate→document), NIC-bound flag, native h2c column gating, native binary-sharing, weekly==full coverage.
- Polyglot adapters built locally (Rust `cargo`, .NET `dotnet build`, Bun conformance, Python venv) with h2c confirmed by curl.
- Weekly change adversarially verified (budget arithmetic exact, coverage 799 cells weekly==full, no regressions).

Not yet run on the cluster — staged for the next round.

…ull-grid weekly

Prepares the next bench round. Four changes, all harness-only (no celeris/loadgen edits).

1. h2c coverage (+7 framework columns). Adds prior-knowledge h2c-noupg columns that run only the H2 scenarios: axum-h2/ntex-h2/hyper-h2 (Rust), aspnet-h2 (Kestrel), fastapi-h2 (hypercorn), hono-h2/elysia-h2 (node:http2 bridge). All verified locally with curl --http2-prior-knowledge. drogon gets NO h2 column: drogon 1.9.x has no server-side HTTP/2 (its -engine h2c fails fast). Native h2 columns share their h1 sibling's build via NativeBinary.BinName; every native adapter gained -engine parsing and resolveBenchColumns now passes -engine h1/h2c.

2. 64k network-bound fix. (a) Mid-size payload rows get/post-json-8k/16k stay under the 20G fabric ceiling so fast servers stay CPU-bound and differentiate (byte-identical across all 19 adapters; 8286/16463 bytes). (b) schema v5.5 adds a per-cell network_bound flag (BuildDocument flags cells at >=80% of fabric line rate with loadgen headroom) + a markdown "Network-bound cells ranked by CPU efficiency" section.

3. Resources. Server CPU/RSS (observer.sqlite + cpu.log) was captured + fetched but the merge->Aggregate->BuildDocument path dropped it (CellResult/CellAggregate had no Resources field). Threaded end-to-end; benchmarks[].resources now populates, the lever that makes the NIC-bound 64k cells interpretable.

4. Weekly = full grid. HeadlineWeekly() now runs the full grid (Globs `"*/*"`), same coverage as Full(), differing only by per-cell window (60s/15s, ~21.6h, fits 24h single-arch; vs 90s/20s). Both profiles set Arches:1 (amd64-only; BenchTier overrides per BENCH_TARGET). Rated sweep stays curated. Removed HeadlineServers/HeadlineScenarios/headlineGlobs.

Budget: FullRealizedCells 520->820 (realized grid ~799). Full Go suite green; all polyglot adapters build + h2c-smoke-tested locally.

…lumns + h2o/lithium fixes

Adapters (all contract-verified on the cluster, amd64):
- rust/actix, cpp/lithium, c/libreactor+h2o, zig/httpzig, bun/bunraw, python/starlette, node/{uws,fastify,express}, java/{vertx,netty}, go/nbio
- new ansible roles: node (Node 24 LTS), java (Temurin 21 + Maven); cpp +Boost; c role extended to build libh2o; deploy.yml + build_native_competitor.yml node/java wiring

Registry / matrix:
- 5 new celeris engine columns (adaptive x2, epoll-async x2, iouring-h1-sync)
- 256c / 512c concurrency points; removed auto-mix-111 (synthetic mixed-proto)
- removed ntex-h2: ntex's h2c DNFs under the loadgen's concurrent dial pool (works single-conn; guaranteed-DNF cell, same policy as drogon-h2)

Fixes found via the full smoke sweep:
- h2o: single-evloop -> fork-per-core + SO_REUSEPORT (190k -> 1.3M RPS)
- vertx: pin netty 4.2.0 epoll classes (EpollIoHandler) for vert.x 5.0.0
- libreactor: build from master (the v1.0.1 tag's dynamic.h dep is broken)

celeris adapter pinned to milestone/v1.5.3 (v1.5.3-0.20260619092256-90839abb91cb).

…y schedule

The full grid is 1257 saturation cells (not the stale 820 pin), and rated mode runs on EVERY cell (run_bench_cell.yml gates -rated on the global bench_rated, ignoring the curated RatedServers×RatedScenarios subset): 4 closed-loop passes × ~50s = ~196s/cell, 63% of the run. Together that's ~80–100h, not the ~21.6h the projection (820 cells + 24 rated cells) claimed.

New "fast" profile = the FULL grid (every server × every scenario, capability-gated, `*/*`) in SATURATION ONLY (rated off) at 35s/10s → 1257 cells × 62s = 21h39m, which fits 24h with 2h21m headroom (FitWithin asserts it). RatedPasses=0 makes BenchTier skip the -rated flag entirely. ForProfile default + unknown → fast (full coverage, never a curated subset). full/headline are kept for rare rated deep-dives.

benchmark-tier.yml: default dispatch profile=fast; re-activate the weekly cadence (cron Wed 04:00 UTC) running fast; BENCH_BUDGET asserts 24h for fast and raises to 70h only for an explicit full dispatch.

Also removes the ntex-h2 column (ntex h2c DNFs under the loadgen's concurrent dial pool; ntex-h1 unaffected).

…/loadgen hosts)

The matrix job's ansible control loop + result aggregation must not share a host with the SUT (msa2-server) or loadgen (msa2-client); otherwise it steals CPU/IO from the measurement. Pin runs-on to the otherwise-idle msr1 (safe as a conductor; only unusable as a bench TARGET under NIC load, celeris#312). Applies to future dispatches incl. the weekly cron; the in-flight run is unaffected.

… respawn

Three fixes from the full-run failure analysis (httpzig was the sole publish blocker):
- httpzig: build ReleaseSafe (not ReleaseFast) at BOTH sites (servers.go + the httpzig nativeBuildSpec in mage_cluster.go). http.zig's NonBlocking worker hits an `unreachable` under connection churn (churn-close); ReleaseFast makes it a silent process-killing UB, ReleaseSafe makes it a recoverable panic.
- lithium: pin the upstream SHA + sed-patch the vendored single-header's input buffer from 50KiB -> 2MiB (with a FATAL_ERROR assert if the literal drifts). post-64k/post-1m bodies overflowed the 50KiB cap -> zero successful requests.
- harness: wrap the SUT launch in a bounded (max 5) respawn supervisor so a crash mid-column self-heals in ~0.5s instead of cascading into N server-down cells (httpzig lost 16 cells this way). Teardown kills the supervisor (fixed-path pid) FIRST so the loop can't re-bind the port.

Still pending for a fully clean re-run: loadgen-side drogon churn-backoff + h2 large-body frame/window handling; the audit confirmed the other 49 servers are crash/dead-SUT-safe.

…ol + h1 read-EOF backoff)

v1.4.9 fixes the loadgen-side DNFs from the last full matrix run: drogon churn-close (h1 read-EOF backoff), post-64k-h2 across the 5 h2 columns (frame-size + send-window flow control), and fastapi-h2 under hypercorn (re-dial connections the server GOAWAYs mid-cell; was a 1.1B-error hot loop). Pulls x/net 0.56.0 transitively.
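The post-64k-h2 fix concerns HTTP/2 flow control. As a hedged illustration, not the loadgen's code: RFC 9113 caps each DATA frame at the peer's SETTINGS_MAX_FRAME_SIZE (16,384 bytes initially) and the unacknowledged bytes at both the connection and the stream window (65,535 bytes initially). If the 64k body is 65,536 bytes (an assumption), it is one byte larger than the default window, so a sender has to stop and wait for WINDOW_UPDATE partway through. `sendState` and its methods are hypothetical.

```go
package main

const (
	defaultMaxFrameSize  = 16384 // RFC 9113 initial SETTINGS_MAX_FRAME_SIZE
	defaultInitialWindow = 65535 // RFC 9113 initial flow-control window
)

// sendState tracks the send-side limits for one request stream.
type sendState struct {
	connWindow   int
	streamWindow int
	maxFrameSize int
}

func newSendState() *sendState {
	return &sendState{connWindow: defaultInitialWindow, streamWindow: defaultInitialWindow, maxFrameSize: defaultMaxFrameSize}
}

// nextFrames returns the DATA frame sizes that may be sent right now for
// `remaining` body bytes and debits both windows. If the result sums to
// less than remaining, the sender must wait for WINDOW_UPDATE.
func (s *sendState) nextFrames(remaining int) []int {
	var frames []int
	for remaining > 0 {
		n := min(remaining, s.maxFrameSize, s.connWindow, s.streamWindow)
		if n <= 0 {
			break // a window is exhausted: block until WINDOW_UPDATE
		}
		frames = append(frames, n)
		remaining -= n
		s.connWindow -= n
		s.streamWindow -= n
	}
	return frames
}

// windowUpdate credits the windows as if WINDOW_UPDATE frames arrived on
// stream 0 (connection) and on the request stream.
func (s *sendState) windowUpdate(conn, stream int) {
	s.connWindow += conn
	s.streamWindow += stream
}
```

For a 65,536-byte body this yields frames of 16384, 16384, 16384 and 16383 bytes, then blocks on the final byte until the peer credits the windows.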

… for targeted re-runs

A column that DNF'd in a full run previously forced re-running the entire ~21.6h grid. Expose three workflow_dispatch inputs so a subset can be re-run cheaply:
- competitors -> BENCH_COMPETITORS (which columns to bench)
- deploy_competitors -> DEPLOY_COMPETITORS (which binaries to build; h2 columns reuse their h1 sibling's binary, so the deploy set differs from the bench set)
- publish=false -> BENCH_PUBLISH=0 (data-only; no docs push / no integrity gate) for a re-run you compose + publish manually

All default to the full-grid/publishing behaviour, so the weekly schedule and a no-arg dispatch are unchanged.
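Because every input defaults to the full-grid, publishing behaviour, the harness side only needs to treat unset or empty values as "everything". A hypothetical Go sketch, assuming comma-separated lists; `RunOptions`, `splitList` and `optionsFrom` are illustrative, and only the three variable names come from this commit.

```go
package main

import "strings"

// RunOptions is a hypothetical view of the dispatch inputs once they reach
// the harness as environment variables.
type RunOptions struct {
	Competitors       []string // nil means every column (full grid)
	DeployCompetitors []string // nil means every module
	Publish           bool
}

// splitList parses an assumed comma-separated list, dropping blanks, so
// an empty string yields nil.
func splitList(v string) []string {
	var out []string
	for _, s := range strings.Split(v, ",") {
		if s = strings.TrimSpace(s); s != "" {
			out = append(out, s)
		}
	}
	return out
}

// optionsFrom reads the variables through lookup so that unset or empty
// values keep the full-grid, publishing defaults.
func optionsFrom(lookup func(string) (string, bool)) RunOptions {
	opts := RunOptions{Publish: true}
	if v, ok := lookup("BENCH_COMPETITORS"); ok {
		opts.Competitors = splitList(v)
	}
	if v, ok := lookup("DEPLOY_COMPETITORS"); ok {
		opts.DeployCompetitors = splitList(v)
	}
	if v, ok := lookup("BENCH_PUBLISH"); ok && strings.TrimSpace(v) == "0" {
		opts.Publish = false
	}
	return opts
}
```

Passing `os.LookupEnv` as `lookup` reads the real environment; a workflow that forwards an empty input as `BENCH_COMPETITORS=""` still lands on the full grid.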

… inherits it

A subset re-run (run 27901600674) failed: the explicit Deploy step staged the right binaries, but BenchTier found no manifest (the prior run's Cleanup wiped it) and ran its internal auto-deploy, which falls back to BENCH_COMPETITORS for DEPLOY_COMPETITORS when the latter is unset in its env. BENCH_COMPETITORS is COLUMN slugs (aspnet-h2, ...) and Deploy only accepts MODULE slugs (aspnet, ...), so it aborted: '"aspnet-h2" resolves to neither a Go competitor nor a native one'. Scoping DEPLOY_COMPETITORS to the Deploy step left the BenchTier step's env without it. Move it to the job env so both the explicit Deploy and the auto-deploy use the module names.

… split_args)

The mid-column respawn change added a comment 'the prior column's respawn supervisor' to the free-form (shell: |) stop task. ansible's ModuleArgsParser runs split_args over the ENTIRE free-form string to extract trailing k=v params, and it does not treat a leading # as a shell comment, so the lone apostrophe in column's read as an unterminated single-quote and aborted task loading for the whole file: 'failed at splitting arguments, either an unbalanced jinja2 block or quotes: bp={{ bench_port | default(8080) }}'. The bench step exited 4 before running a single cell (run 27902033388). Reworded the comment with no apostrophes/unpaired quotes and added an NB warning. Verified: all 93 free-form shell/command tasks across ansible/ pass split_args, and every shell body passes bash -n.
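A cheap guard against a repeat is to lint free-form task bodies for unpaired quotes before ansible loads them. This naive Go checker is hypothetical and is not a port of split_args (which also tracks jinja2 blocks and escapes); it only mirrors the behaviour described above, where `#` gets no special treatment, so an apostrophe inside a shell comment still opens a quote.

```go
package main

// quotesBalanced reports whether every single and double quote in s is
// paired. A quote of one kind inside a quoted run of the other kind is
// treated as literal text. '#' is deliberately NOT treated as a comment
// marker, matching the split_args behaviour that broke task loading.
func quotesBalanced(s string) bool {
	inSingle, inDouble := false, false
	for _, r := range s {
		switch r {
		case '\'':
			if !inDouble {
				inSingle = !inSingle
			}
		case '"':
			if !inSingle {
				inDouble = !inDouble
			}
		}
	}
	return !inSingle && !inDouble
}
```

The PR's own verification ran the real split_args over all 93 tasks; a check like this is only a fast approximation suitable for CI.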

… go.mod tidy)

Make feat/h2c-64k-resources pass the Lint + Test merge gates:
- .github/actionlint.yaml: declare the 'msr1' self-hosted-runner label so benchmark-tier.yml's 'runs-on: [self-hosted, celeris-cluster, msr1]' validates (was the sole Lint failure).
- cmd/runner/main_test.go: two stale assertions vs intentional changes. TestParseArgs_Defaults expected Duration=120s but DefaultConfig is now 45s (rated-off fast profile, 2d40f6d); TestNativeH2cColumnsAreH2cOnly listed 'ntex-h2', which was deliberately removed from the registry.
- servers/{gin,hertz,iris}/go.mod+go.sum: go mod tidy (x/net 0.55->0.56 drift) so their Test jobs stop failing the tidy check.

- root: bradfitz/gomemcache -> latest; modernc.org/sqlite v1.52.0 -> v1.53.0 (pulls modernc.org/libc v1.73.4).
- servers/{chi,fasthttp,gin}: bradfitz/gomemcache -> latest.
- CI: actions/checkout v6 -> v7 across all six workflows (supersedes dependabot #198).

The goceleris/probatorium pseudo-version 'updates' are replace-backed local placeholders (untouched); the validation/refapp celeris pins are version-syncs handled at the v1.5.3 re-pin. Root build+vet+runner tests green; servers build; actionlint clean.

celeris v1.5.3 is now released, so replace the smoke-only pseudo-version and the stale 1.4.15 conformance pins with the clean tag:
- servers/celeris: v1.5.3-0.20260619... pseudo-version -> v1.5.3.
- validation/refapp/*: celeris 1.4.15 -> v1.5.3 (all 8 SDD-conformance refapps; observability also middleware/metrics+otel 1.4.15 -> v1.5.3).

All rebuild clean against the released v1.5.3 (+ middleware v1.5.3 tags).

FumingPower3925 added 12 commits June 17, 2026 16:58

FumingPower3925 merged commit 7ed0f6a into main Jun 21, 2026
11 checks passed

FumingPower3925 deleted the feat/h2c-64k-resources branch June 21, 2026 15:43
