feat(bench): h2c expansion, mid-size payloads, resources/NIC-bound, full-grid weekly #197

Merged: FumingPower3925 merged 12 commits into main from feat/h2c-64k-resources on Jun 21, 2026
@FumingPower3925

Prepares the next bench round. All changes are harness-only (no celeris/loadgen edits). Everything compiles and the full Go suite is green; every polyglot adapter was built and h2c-smoke-tested locally (curl --http2-prior-knowledge).

1. h2c coverage — +7 framework columns

The h2 scenarios were N/A for every non-Go framework. This adds prior-knowledge h2c columns (named with a -h2 suffix and registered as h2c-noupg, so they run only the h2 rows):

| Framework | h2c via | verified |
| --- | --- | --- |
| axum, ntex, hyper (Rust) | hyper http2 builder / ntex HttpService::h2 | HTTP/2 200 cleartext |
| aspnet (Kestrel) | HttpProtocols.Http2 | |
| fastapi (Python) | hypercorn (uvicorn can't do h2) | |
| hono, elysia (Bun) | node:http2 cleartext bridge to app.fetch | |

drogon got no h2 column — drogon 1.9.x has zero server-side HTTP/2 (verified against its headers); its -engine h2c fails fast and its N/A is correct, not a gap. Native h2 columns share their h1 sibling's build via NativeBinary.BinName; every native adapter gained -engine parsing and resolveBenchColumns now passes -engine h1/-engine h2c.
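
The fail-fast behaviour described above can be sketched in Go. This is a minimal illustration, not the adapters' actual code: only the `-engine` flag name and the `h1`/`h2c` values come from this PR, while `parseEngine`, `errUnsupportedEngine`, and the `supported` map are hypothetical names.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// errUnsupportedEngine marks an engine the adapter cannot serve (hypothetical name).
var errUnsupportedEngine = errors.New("unsupported engine")

// parseEngine reads -engine from args and checks it against the engines this
// adapter supports. The flag name and its h1/h2c values come from the PR text;
// the function itself is an illustrative assumption.
func parseEngine(args []string, supported map[string]bool) (string, error) {
	fs := flag.NewFlagSet("adapter", flag.ContinueOnError)
	engine := fs.String("engine", "h1", "protocol engine: h1 or h2c")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	if !supported[*engine] {
		return "", fmt.Errorf("%w: %q", errUnsupportedEngine, *engine)
	}
	return *engine, nil
}

func main() {
	// An h1-only adapter (the drogon case) rejects -engine h2c before binding a port.
	h1Only := map[string]bool{"h1": true}
	engine, err := parseEngine(os.Args[1:], h1Only)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println("serving with engine", engine)
}
```

Rejecting the flag at startup, rather than serving h1 on an h2c column, is what keeps an unsupported cell reported as N/A instead of producing misleading numbers.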

2. The 64k problem

  • Mid-size payloads get/post-json-8k/16k — under the 20G ceiling so fast servers stay CPU-bound and differentiate. Endpoints + byte-identical payloads (8286 / 16463 bytes) added to all 19 adapters.
  • NIC-bound flag + CPU efficiency (schema v5.5) — BuildDocument flags cells whose bandwidth sits at the fabric line rate while the loadgen has headroom, and a new markdown section ranks those cells by Gbps-per-CPU% instead of raw RPS.
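
As a rough sketch of the flag-then-rank idea in the second bullet, assuming a hypothetical trimmed-down `Cell` type: the 20G ceiling and the 80% threshold are the figures quoted in this PR and its first commit message, while the loadgen headroom cutoff (`loadgenCeilingPct`) and all field and function names are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// Cell is a hypothetical, trimmed-down benchmark cell.
type Cell struct {
	Name          string
	Gbps          float64 // measured bandwidth
	ServerCPUPct  float64 // server CPU utilisation, percent
	LoadgenCPUPct float64 // loadgen CPU utilisation, percent
}

const (
	fabricGbps        = 20.0 // fabric ceiling quoted in the PR
	lineRateFraction  = 0.80 // threshold quoted in the commit message
	loadgenCeilingPct = 90.0 // assumed definition of "loadgen has headroom"
)

// networkBound reports whether a cell sits at the line rate while the
// loadgen still has headroom, so the NIC rather than the server is the limit.
func networkBound(c Cell) bool {
	return c.Gbps >= lineRateFraction*fabricGbps && c.LoadgenCPUPct < loadgenCeilingPct
}

// gbpsPerCPUPct is the efficiency metric used for ranking.
func gbpsPerCPUPct(c Cell) float64 {
	if c.ServerCPUPct <= 0 {
		return 0
	}
	return c.Gbps / c.ServerCPUPct
}

// rankNetworkBound keeps only network-bound cells, most efficient first.
func rankNetworkBound(cells []Cell) []Cell {
	var out []Cell
	for _, c := range cells {
		if networkBound(c) {
			out = append(out, c)
		}
	}
	sort.SliceStable(out, func(i, j int) bool {
		return gbpsPerCPUPct(out[i]) > gbpsPerCPUPct(out[j])
	})
	return out
}

func main() {
	// Made-up illustrative values, not measurements.
	cells := []Cell{
		{Name: "a", Gbps: 19, ServerCPUPct: 400, LoadgenCPUPct: 50},
		{Name: "b", Gbps: 18, ServerCPUPct: 200, LoadgenCPUPct: 40},
		{Name: "c", Gbps: 8, ServerCPUPct: 300, LoadgenCPUPct: 30},
	}
	for _, c := range rankNetworkBound(cells) {
		fmt.Printf("%s %.4f Gbps per CPU%%\n", c.Name, gbpsPerCPUPct(c))
	}
}
```

Ranking by bandwidth per CPU percent is what lets two servers that both hit the line rate still be told apart.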

3. Resources — the real bug

Server CPU/RSS (observer.sqlite + cpu.log) was captured and fetched, but the merge→aggregate→document path silently dropped it (CellResult/CellAggregate had no Resources field). Threaded end-to-end; benchmarks[].resources now populates — exactly what makes the NIC-bound 64k cells interpretable.
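
The shape of the fix can be illustrated as follows. `CellResult`, `CellAggregate`, and the `Resources` field name come from this PR; the fields inside them and the aggregation policy (mean CPU, peak RSS) are assumptions made for the sketch, not the harness's actual schema.

```go
package main

import "fmt"

// Resources holds server-side samples for one run (field names are assumed).
type Resources struct {
	CPUPct   float64
	RSSBytes uint64
}

// CellResult is one run of one cell. Before the fix there was no Resources
// field here, so the captured data had nowhere to travel.
type CellResult struct {
	RPS       float64
	Resources *Resources // nil when nothing was captured
}

// CellAggregate is the per-cell summary that feeds the document.
type CellAggregate struct {
	MeanRPS   float64
	Resources *Resources
}

// aggregate averages RPS and carries resources through: mean CPU and peak
// RSS over the runs that have samples (an assumed policy).
func aggregate(runs []CellResult) CellAggregate {
	var agg CellAggregate
	if len(runs) == 0 {
		return agg
	}
	var cpuSum float64
	var peakRSS uint64
	sampled := 0
	for _, r := range runs {
		agg.MeanRPS += r.RPS
		if r.Resources == nil {
			continue
		}
		cpuSum += r.Resources.CPUPct
		if r.Resources.RSSBytes > peakRSS {
			peakRSS = r.Resources.RSSBytes
		}
		sampled++
	}
	agg.MeanRPS /= float64(len(runs))
	if sampled > 0 {
		agg.Resources = &Resources{CPUPct: cpuSum / float64(sampled), RSSBytes: peakRSS}
	}
	return agg
}

func main() {
	// Made-up illustrative values, not measurements.
	agg := aggregate([]CellResult{
		{RPS: 100, Resources: &Resources{CPUPct: 40, RSSBytes: 10}},
		{RPS: 300, Resources: &Resources{CPUPct: 60, RSSBytes: 20}},
	})
	fmt.Println(agg.MeanRPS, agg.Resources.CPUPct, agg.Resources.RSSBytes)
}
```

Keeping `Resources` a pointer lets a cell with no samples stay distinguishable from a cell that measured zero.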

4. Weekly = full grid

HeadlineWeekly() now runs the full grid (Globs: ["*/*"]), same coverage as Full(), differing only by per-cell window:

| | grid | window | projected (1 arch) |
| --- | --- | --- | --- |
| headline (weekly) | full (~799 cells) | 60s/15s | ~21.6h, fits 24h |
| full (exhaustive) | full (~799 cells) | 90s/20s | ~30h, manual, BENCH_BUDGET=70h |

Both profiles set Arches: 1 (amd64-only today; BenchTier overrides per BENCH_TARGET). Rated sweep stays curated (expanding it would blow the budget). Removed HeadlineServers/HeadlineScenarios/headlineGlobs().

Budget

FullRealizedCells 520→820 (realized grid ~799). Full() projects ~30h, headline ~21.6h, both under the 70h manual budget.

Verification

  • Full Go suite + mage-tagged tests green; gofmt/vet clean.
  • New tests: resources flow (aggregate→document), NIC-bound flag, native h2c column gating, native binary-sharing, weekly==full coverage.
  • Polyglot adapters built locally (Rust cargo, dotnet dotnet build, Bun conformance, Python venv) with h2c confirmed by curl.
  • Weekly change adversarially verified (budget arithmetic exact, coverage 799 cells weekly==full, no regressions).

Not yet run on the cluster — staged for the next round.

…ull-grid weekly

Prepares the next bench round. Four changes, all harness-only (no celeris/loadgen edits).

1. h2c coverage (+7 framework columns). Adds prior-knowledge h2c-noupg columns
   that run only the H2 scenarios: axum-h2/ntex-h2/hyper-h2 (Rust), aspnet-h2
   (Kestrel), fastapi-h2 (hypercorn), hono-h2/elysia-h2 (node:http2 bridge).
   All verified locally with curl --http2-prior-knowledge. drogon gets NO h2
   column — drogon 1.9.x has no server-side HTTP/2 (its -engine h2c fails fast).
   Native h2 columns share their h1 sibling's build via NativeBinary.BinName;
   every native adapter gained -engine parsing and resolveBenchColumns now
   passes -engine h1/h2c.

2. 64k network-bound fix. (a) Mid-size payload rows get/post-json-8k/16k stay
   under the 20G fabric ceiling so fast servers stay CPU-bound and differentiate
   (byte-identical across all 19 adapters; 8286/16463 bytes). (b) schema v5.5
   adds a per-cell network_bound flag (BuildDocument flags cells at >=80% of
   fabric line rate with loadgen headroom) + a markdown "Network-bound cells
   ranked by CPU efficiency" section.

3. Resources. Server CPU/RSS (observer.sqlite + cpu.log) was captured + fetched
   but the merge->Aggregate->BuildDocument path dropped it (CellResult/
   CellAggregate had no Resources field). Threaded end-to-end; benchmarks[].
   resources now populates — the lever that makes the NIC-bound 64k cells
   interpretable.

4. Weekly = full grid. HeadlineWeekly() now runs the full grid (Globs "*/*"),
   same coverage as Full(), differing only by per-cell window (60s/15s ~21.6h
   fits 24h single-arch vs 90s/20s). Both profiles set Arches:1 (amd64-only;
   BenchTier overrides per BENCH_TARGET). Rated sweep stays curated. Removed
   HeadlineServers/HeadlineScenarios/headlineGlobs.

Budget: FullRealizedCells 520->820 (realized grid ~799). Full Go suite green;
all polyglot adapters build + h2c-smoke-tested locally.
…lumns + h2o/lithium fixes

Adapters (all contract-verified on the cluster, amd64):
- rust/actix, cpp/lithium, c/libreactor+h2o, zig/httpzig, bun/bunraw,
  python/starlette, node/{uws,fastify,express}, java/{vertx,netty}, go/nbio
- new ansible roles: node (Node 24 LTS), java (Temurin 21 + Maven); cpp +Boost;
  c role extended to build libh2o; deploy.yml + build_native_competitor.yml node/java wiring

Registry / matrix:
- 5 new celeris engine columns (adaptive x2, epoll-async x2, iouring-h1-sync)
- 256c / 512c concurrency points; removed auto-mix-111 (synthetic mixed-proto)
- removed ntex-h2: ntex's h2c DNFs under the loadgen's concurrent dial pool
  (works single-conn; guaranteed-DNF cell, same policy as drogon-h2)

Fixes found via the full smoke sweep:
- h2o: single-evloop -> fork-per-core + SO_REUSEPORT (190k -> 1.3M RPS)
- vertx: pin netty 4.2.0 epoll classes (EpollIoHandler) for vert.x 5.0.0
- libreactor: build from master (the v1.0.1 tag's dynamic.h dep is broken)

celeris adapter pinned to milestone/v1.5.3 (v1.5.3-0.20260619092256-90839abb91cb).
…y schedule

The full grid is 1257 saturation cells (not the stale 820 pin), and rated mode
runs on EVERY cell (run_bench_cell.yml gates -rated on the global bench_rated,
ignoring the curated RatedServers×RatedScenarios subset) — 4 closed-loop passes
× ~50s = ~196s/cell, 63% of the run. Together that's ~80–100h, not the ~21.6h
the projection (820 cells + 24 rated cells) claimed.

New "fast" profile = the FULL grid (every server × every scenario, capability-
gated, */*) in SATURATION ONLY (rated off) at 35s/10s → 1257 cells × 62s = 21h39m,
fits 24h with 2h21m headroom (FitWithin asserts it). RatedPasses=0 makes BenchTier
skip the -rated flag entirely. ForProfile default + unknown → fast (full coverage,
never a curated subset). full/headline kept for rare rated deep-dives.
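
The budget arithmetic above can be reproduced with a few lines of Go. This is only a worked check of the quoted numbers: `projected` and `fitsWithin` are hypothetical helpers, not the harness's `FitWithin`, and the 62s per cell is taken as given from the commit message (the 17s beyond the 35s/10s window is presumably per-cell overhead).

```go
package main

import (
	"fmt"
	"time"
)

// projected returns the wall-clock time for a grid at a fixed per-cell cost.
func projected(cells int, perCell time.Duration) time.Duration {
	return time.Duration(cells) * perCell
}

// fitsWithin returns the remaining headroom and whether the projection
// stays inside the budget.
func fitsWithin(total, budget time.Duration) (time.Duration, bool) {
	headroom := budget - total
	return headroom, headroom >= 0
}

func main() {
	total := projected(1257, 62*time.Second)
	headroom, ok := fitsWithin(total, 24*time.Hour)
	fmt.Println(total, headroom, ok) // 21h38m54s 2h21m6s true
}
```

The result rounds to the 21h39m and 2h21m quoted in the commit message.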

benchmark-tier.yml: default dispatch profile=fast; re-activate the weekly cadence
(cron Wed 04:00 UTC) running fast; BENCH_BUDGET asserts 24h for fast, raises to 70h
only for an explicit full dispatch.

Also removes the ntex-h2 column (ntex h2c DNFs under the loadgen's concurrent dial
pool; ntex-h1 unaffected).
…/loadgen hosts)

The matrix job's ansible control loop + result aggregation must not share a host
with the SUT (msa2-server) or loadgen (msa2-client) — otherwise it steals CPU/IO
from the measurement. Pin runs-on to the otherwise-idle msr1 (safe as a conductor;
only unusable as a bench TARGET under NIC load, celeris#312). Applies to future
dispatches incl. the weekly cron; the in-flight run is unaffected.
… respawn

Three fixes from the full-run failure analysis (httpzig was the sole publish blocker):

- httpzig: build ReleaseSafe (not ReleaseFast) at BOTH sites (servers.go + the
  httpzig nativeBuildSpec in mage_cluster.go). http.zig's NonBlocking worker hits
  an `unreachable` under connection churn (churn-close); ReleaseFast makes it a
  silent process-killing UB, ReleaseSafe makes it a recoverable panic.
- lithium: pin the upstream SHA + sed-patch the vendored single-header's input
  buffer from 50KiB -> 2MiB (with a FATAL_ERROR assert if the literal drifts).
  post-64k/post-1m bodies overflowed the 50KiB cap -> zero successful requests.
- harness: wrap the SUT launch in a bounded (max 5) respawn supervisor so a crash
  mid-column self-heals in ~0.5s instead of cascading into N server-down cells
  (httpzig lost 16 cells this way). Teardown kills the supervisor (fixed-path pid)
  FIRST so the loop can't re-bind the port.
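
The bounded-respawn idea in the third fix can be sketched in Go. This illustrates only the bound and the pause; the harness's real supervisor (launched and torn down through ansible) is not shown in this PR, and `supervise`, `errCrash`, and the `launch` callback are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errCrash stands in for an abnormal SUT exit (hypothetical).
var errCrash = errors.New("sut crashed")

// supervise runs launch until it exits cleanly or maxRespawns restarts have
// been used. It returns how many respawns happened and the final error.
func supervise(launch func() error, maxRespawns int, pause time.Duration) (int, error) {
	respawns := 0
	for {
		err := launch()
		if err == nil {
			return respawns, nil
		}
		if respawns == maxRespawns {
			return respawns, fmt.Errorf("giving up after %d respawns: %w", respawns, err)
		}
		respawns++
		time.Sleep(pause) // brief pause before re-binding the port
	}
}

func main() {
	attempts := 0
	// Simulated SUT: crashes twice, then exits cleanly.
	launch := func() error {
		attempts++
		if attempts <= 2 {
			return errCrash
		}
		return nil
	}
	// The PR quotes ~0.5s recovery and a cap of 5; the pause is shortened here.
	respawns, err := supervise(launch, 5, 10*time.Millisecond)
	fmt.Println(respawns, err) // 2 <nil>
}
```

The cap matters as much as the restart: without it a server that crashes on every request would loop forever, and teardown would race the loop for the port.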

Still pending for a fully clean re-run: loadgen-side drogon churn-backoff +
h2 large-body frame/window handling; the audit confirmed the other 49 servers
are crash/dead-SUT-safe.
…ol + h1 read-EOF backoff)

v1.4.9 fixes the loadgen-side DNFs from the last full matrix run:
drogon churn-close (h1 read-EOF backoff), post-64k-h2 across the 5 h2
columns (frame-size + send-window flow control), and fastapi-h2 under
hypercorn (re-dial connections the server GOAWAYs mid-cell — was a
1.1B-error hot loop). Pulls x/net 0.56.0 transitively.
… for targeted re-runs

A column that DNF'd in a full run previously forced re-running the entire
~21.6h grid. Expose three workflow_dispatch inputs so a subset can be
re-run cheaply:
  - competitors        -> BENCH_COMPETITORS (which columns to bench)
  - deploy_competitors -> DEPLOY_COMPETITORS (which binaries to build; h2
                          columns reuse their h1 sibling's binary, so the
                          deploy set differs from the bench set)
  - publish=false      -> BENCH_PUBLISH=0 (data-only; no docs push / no
                          integrity gate) for a re-run you compose + publish
                          manually
All default to the full-grid/publishing behaviour, so the weekly schedule
and a no-arg dispatch are unchanged.
… inherits it

A subset re-run (run 27901600674) failed: the explicit Deploy step staged
the right binaries, but BenchTier found no manifest (the prior run's Cleanup
wiped it) and ran its internal auto-deploy, which falls back to
BENCH_COMPETITORS for DEPLOY_COMPETITORS when the latter is unset in its env.
BENCH_COMPETITORS is COLUMN slugs (aspnet-h2, ...) and Deploy only accepts
MODULE slugs (aspnet, ...), so it aborted: '"aspnet-h2" resolves to neither
a Go competitor nor a native one'. Scoping DEPLOY_COMPETITORS to the Deploy
step left the BenchTier step's env without it. Move it to the job env so
both the explicit Deploy and the auto-deploy use the module names.
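
To make the column-versus-module distinction concrete, here is a minimal Go sketch that derives a deploy set from a bench set. It assumes an h2 column's module is simply its slug with the `-h2` suffix removed, which matches the `aspnet-h2`/`aspnet` example above but is a simplification: the harness itself resolves shared binaries through `NativeBinary.BinName`. `deployModules` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// deployModules maps column slugs to the de-duplicated, sorted module slugs
// whose binaries must be built. Assumes "<module>-h2" naming for h2 columns.
func deployModules(columns []string) []string {
	seen := make(map[string]bool)
	var modules []string
	for _, col := range columns {
		mod := strings.TrimSuffix(col, "-h2")
		if !seen[mod] {
			seen[mod] = true
			modules = append(modules, mod)
		}
	}
	sort.Strings(modules)
	return modules
}

func main() {
	// The bench set names columns; the deploy set names modules.
	fmt.Println(deployModules([]string{"aspnet-h2", "aspnet", "axum-h2"})) // [aspnet axum]
}
```
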
… split_args)

The mid-column respawn change added a comment 'the prior column's respawn
supervisor' to the free-form (shell: |) stop task. ansible's ModuleArgsParser
runs split_args over the ENTIRE free-form string to extract trailing k=v
params, and it does not treat a leading # as a shell comment — so the lone
apostrophe in "column's" was read as an unterminated single quote and aborted task
loading for the whole file: 'failed at splitting arguments, either an
unbalanced jinja2 block or quotes: bp={{ bench_port | default(8080) }}'. The
bench step exited 4 before running a single cell (run 27902033388). Reworded
the comment with no apostrophes/unpaired quotes and added an NB warning.
Verified: all 93 free-form shell/command tasks across ansible/ pass
split_args, and every shell body passes bash -n.
… go.mod tidy)

Make feat/h2c-64k-resources pass the Lint + Test merge gates:
- .github/actionlint.yaml: declare the 'msr1' self-hosted-runner label so
  benchmark-tier.yml's 'runs-on: [self-hosted, celeris-cluster, msr1]'
  validates (was the sole Lint failure).
- cmd/runner/main_test.go: two stale assertions vs intentional changes —
  TestParseArgs_Defaults expected Duration=120s but DefaultConfig is now 45s
  (rated-off fast profile, 2d40f6d); TestNativeH2cColumnsAreH2cOnly listed
  'ntex-h2' which was deliberately removed from the registry.
- servers/{gin,hertz,iris}/go.mod+go.sum: go mod tidy (x/net 0.55->0.56 drift)
  so their Test jobs stop failing the tidy check.
- root: bradfitz/gomemcache -> latest; modernc.org/sqlite v1.52.0 -> v1.53.0
  (pulls modernc.org/libc v1.73.4).
- servers/{chi,fasthttp,gin}: bradfitz/gomemcache -> latest.
- CI: actions/checkout v6 -> v7 across all six workflows (supersedes
  dependabot #198).
The goceleris/probatorium pseudo-version 'updates' are replace-backed
local placeholders (untouched); the validation/refapp celeris pins are
version-syncs handled at the v1.5.3 re-pin. root build+vet+runner tests
green; servers build; actionlint clean.
celeris v1.5.3 is now released, so replace the smoke-only pseudo-version
and the stale 1.4.15 conformance pins with the clean tag:
- servers/celeris: v1.5.3-0.20260619... pseudo-version -> v1.5.3.
- validation/refapp/*: celeris 1.4.15 -> v1.5.3 (all 8 SDD-conformance
  refapps; observability also middleware/metrics+otel 1.4.15 -> v1.5.3).
All rebuild clean against the released v1.5.3 (+ middleware v1.5.3 tags).
@FumingPower3925 merged commit 7ed0f6a into main on Jun 21, 2026
11 checks passed
@FumingPower3925 deleted the feat/h2c-64k-resources branch on June 21, 2026, 15:43