feat(bench): h2c expansion, mid-size payloads, resources/NIC-bound, full-grid weekly #197
Merged
…ull-grid weekly

Prepares the next bench round. Four changes, all harness-only (no celeris/loadgen edits).

1. h2c coverage (+7 framework columns). Adds prior-knowledge h2c-noupg columns that run only the H2 scenarios: axum-h2/ntex-h2/hyper-h2 (Rust), aspnet-h2 (Kestrel), fastapi-h2 (hypercorn), hono-h2/elysia-h2 (node:http2 bridge). All verified locally with curl --http2-prior-knowledge. drogon gets NO h2 column: drogon 1.9.x has no server-side HTTP/2 (its -engine h2c fails fast). Native h2 columns share their h1 sibling's build via NativeBinary.BinName; every native adapter gained -engine parsing and resolveBenchColumns now passes -engine h1/h2c.

2. 64k network-bound fix.
(a) Mid-size payload rows get/post-json-8k/16k stay under the 20G fabric ceiling so fast servers stay CPU-bound and differentiate (byte-identical across all 19 adapters; 8286/16463 bytes).
(b) schema v5.5 adds a per-cell network_bound flag (BuildDocument flags cells at >=80% of fabric line rate with loadgen headroom) + a markdown "Network-bound cells ranked by CPU efficiency" section.

3. Resources. Server CPU/RSS (observer.sqlite + cpu.log) was captured + fetched but the merge->Aggregate->BuildDocument path dropped it (CellResult/CellAggregate had no Resources field). Threaded end-to-end; benchmarks[].resources now populates, which is the lever that makes the NIC-bound 64k cells interpretable.

4. Weekly = full grid. HeadlineWeekly() now runs the full grid (Globs "*/*"), same coverage as Full(), differing only by per-cell window (60s/15s ~21.6h fits 24h single-arch vs 90s/20s). Both profiles set Arches:1 (amd64-only; BenchTier overrides per BENCH_TARGET). Rated sweep stays curated. Removed HeadlineServers/HeadlineScenarios/headlineGlobs.

Budget: FullRealizedCells 520->820 (realized grid ~799). Full Go suite green; all polyglot adapters build + h2c-smoke-tested locally.
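The network_bound rule in item 2(b) can be sketched roughly as follows. This is a minimal illustration, not the harness's BuildDocument code: the `cell` type, its field names, the helper names, and the 0.90 loadgen-headroom cutoff are all assumptions made for the example; only the 20G fabric ceiling, the >=80% threshold, and the idea of ranking by Gbps per CPU% come from the commit message.

```go
package main

import (
	"fmt"
	"sort"
)

const (
	fabricGbps       = 20.0 // fabric ceiling stated in the commit message
	lineRateFraction = 0.80 // ">=80% of fabric line rate"
	loadgenBusyLimit = 0.90 // ASSUMPTION: what counts as "loadgen headroom"
)

// cell is a hypothetical, trimmed-down view of one benchmark cell.
type cell struct {
	Name        string
	Gbps        float64 // measured payload bandwidth
	ServerCPU   float64 // server CPU utilisation in percent
	LoadgenBusy float64 // loadgen CPU utilisation, 0..1
}

// networkBound reports whether the cell sits near line rate while the
// loadgen still has headroom, i.e. the NIC rather than the server limits it.
func networkBound(c cell) bool {
	return c.Gbps >= lineRateFraction*fabricGbps && c.LoadgenBusy < loadgenBusyLimit
}

// gbpsPerCPU is the efficiency metric used for ranking flagged cells.
func gbpsPerCPU(c cell) float64 {
	if c.ServerCPU <= 0 {
		return 0
	}
	return c.Gbps / c.ServerCPU
}

// rankNetworkBound keeps only network-bound cells, most efficient first.
func rankNetworkBound(cells []cell) []cell {
	var out []cell
	for _, c := range cells {
		if networkBound(c) {
			out = append(out, c)
		}
	}
	sort.SliceStable(out, func(i, j int) bool {
		return gbpsPerCPU(out[i]) > gbpsPerCPU(out[j])
	})
	return out
}

func main() {
	// Illustrative numbers only; not measured results.
	cells := []cell{
		{"server-a/get-64k", 18.5, 400, 0.40},
		{"server-b/get-64k", 18.4, 250, 0.45},
		{"server-c/get-json-8k", 9.0, 700, 0.50},
	}
	for _, c := range rankNetworkBound(cells) {
		fmt.Printf("%s: %.4f Gbps per CPU%%\n", c.Name, gbpsPerCPU(c))
	}
}
```

Two cells that both saturate the link have nearly identical RPS, so ranking by bandwidth per unit of server CPU is what separates them; the 8k cell is left out because it stays below the threshold and remains comparable by raw RPS.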
…lumns + h2o/lithium fixes
Adapters (all contract-verified on the cluster, amd64):
- rust/actix, cpp/lithium, c/libreactor+h2o, zig/httpzig, bun/bunraw,
python/starlette, node/{uws,fastify,express}, java/{vertx,netty}, go/nbio
- new ansible roles: node (Node 24 LTS), java (Temurin 21 + Maven); cpp +Boost;
c role extended to build libh2o; deploy.yml + build_native_competitor.yml node/java wiring
Registry / matrix:
- 5 new celeris engine columns (adaptive x2, epoll-async x2, iouring-h1-sync)
- 256c / 512c concurrency points; removed auto-mix-111 (synthetic mixed-proto)
- removed ntex-h2: ntex's h2c DNFs under the loadgen's concurrent dial pool
(works single-conn; guaranteed-DNF cell, same policy as drogon-h2)
Fixes found via the full smoke sweep:
- h2o: single-evloop -> fork-per-core + SO_REUSEPORT (190k -> 1.3M RPS)
- vertx: pin netty 4.2.0 epoll classes (EpollIoHandler) for vert.x 5.0.0
- libreactor: build from master (the v1.0.1 tag's dynamic.h dep is broken)
celeris adapter pinned to milestone/v1.5.3 (v1.5.3-0.20260619092256-90839abb91cb).
…y schedule

The full grid is 1257 saturation cells (not the stale 820 pin), and rated mode runs on EVERY cell (run_bench_cell.yml gates -rated on the global bench_rated, ignoring the curated RatedServers×RatedScenarios subset): 4 closed-loop passes × ~50s = ~196s/cell, 63% of the run. Together that's ~80–100h, not the ~21.6h the projection (820 cells + 24 rated cells) claimed.

New "fast" profile = the FULL grid (every server × every scenario, capability-gated, */*) in SATURATION ONLY (rated off) at 35s/10s → 1257 cells × 62s = 21h39m, fits 24h with 2h21m headroom (FitWithin asserts it). RatedPasses=0 makes BenchTier skip the -rated flag entirely. ForProfile default + unknown → fast (full coverage, never a curated subset). full/headline kept for rare rated deep-dives.

benchmark-tier.yml: default dispatch profile=fast; re-activate the weekly cadence (cron Wed 04:00 UTC) running fast; BENCH_BUDGET asserts 24h for fast, raises to 70h only for an explicit full dispatch.

Also removes the ntex-h2 column (ntex h2c DNFs under the loadgen's concurrent dial pool; ntex-h1 unaffected).
…/loadgen hosts)

The matrix job's ansible control loop + result aggregation must not share a host with the SUT (msa2-server) or loadgen (msa2-client); otherwise it steals CPU/IO from the measurement. Pin runs-on to the otherwise-idle msr1 (safe as a conductor; only unusable as a bench TARGET under NIC load, celeris#312). Applies to future dispatches incl. the weekly cron; the in-flight run is unaffected.
… respawn

Three fixes from the full-run failure analysis (httpzig was the sole publish blocker):
- httpzig: build ReleaseSafe (not ReleaseFast) at BOTH sites (servers.go + the httpzig nativeBuildSpec in mage_cluster.go). http.zig's NonBlocking worker hits an `unreachable` under connection churn (churn-close); ReleaseFast makes it a silent process-killing UB, ReleaseSafe makes it a recoverable panic.
- lithium: pin the upstream SHA + sed-patch the vendored single-header's input buffer from 50KiB -> 2MiB (with a FATAL_ERROR assert if the literal drifts). post-64k/post-1m bodies overflowed the 50KiB cap -> zero successful requests.
- harness: wrap the SUT launch in a bounded (max 5) respawn supervisor so a crash mid-column self-heals in ~0.5s instead of cascading into N server-down cells (httpzig lost 16 cells this way). Teardown kills the supervisor (fixed-path pid) FIRST so the loop can't re-bind the port.

Still pending for a fully clean re-run: loadgen-side drogon churn-backoff + h2 large-body frame/window handling; the audit confirmed the other 49 servers are crash/dead-SUT-safe.
…ol + h1 read-EOF backoff)

v1.4.9 fixes the loadgen-side DNFs from the last full matrix run: drogon churn-close (h1 read-EOF backoff), post-64k-h2 across the 5 h2 columns (frame-size + send-window flow control), and fastapi-h2 under hypercorn (re-dial connections the server GOAWAYs mid-cell; this was a 1.1B-error hot loop). Pulls x/net 0.56.0 transitively.
… for targeted re-runs
A column that DNF'd in a full run previously forced re-running the entire
~21.6h grid. Expose three workflow_dispatch inputs so a subset can be
re-run cheaply:
- competitors -> BENCH_COMPETITORS (which columns to bench)
- deploy_competitors -> DEPLOY_COMPETITORS (which binaries to build; h2
columns reuse their h1 sibling's binary, so the
deploy set differs from the bench set)
- publish=false -> BENCH_PUBLISH=0 (data-only; no docs push / no
integrity gate) for a re-run you compose + publish
manually
All default to the full-grid/publishing behaviour, so the weekly schedule
and a no-arg dispatch are unchanged.
… inherits it

A subset re-run (run 27901600674) failed: the explicit Deploy step staged the right binaries, but BenchTier found no manifest (the prior run's Cleanup wiped it) and ran its internal auto-deploy, which falls back to BENCH_COMPETITORS for DEPLOY_COMPETITORS when the latter is unset in its env. BENCH_COMPETITORS is COLUMN slugs (aspnet-h2, ...) and Deploy only accepts MODULE slugs (aspnet, ...), so it aborted: '"aspnet-h2" resolves to neither a Go competitor nor a native one'.

Scoping DEPLOY_COMPETITORS to the Deploy step left the BenchTier step's env without it. Move it to the job env so both the explicit Deploy and the auto-deploy use the module names.
… split_args)
The mid-column respawn change added a comment 'the prior column's respawn
supervisor' to the free-form (shell: |) stop task. ansible's ModuleArgsParser
runs split_args over the ENTIRE free-form string to extract trailing k=v
params, and it does not treat a leading # as a shell comment — so the lone
apostrophe in column's read as an unterminated single-quote and aborted task
loading for the whole file: 'failed at splitting arguments, either an
unbalanced jinja2 block or quotes: bp={{ bench_port | default(8080) }}'. The
bench step exited 4 before running a single cell (run 27902033388). Reworded
the comment with no apostrophes/unpaired quotes and added an NB warning.
Verified: all 93 free-form shell/command tasks across ansible/ pass
split_args, and every shell body passes bash -n.
… go.mod tidy)

Make feat/h2c-64k-resources pass the Lint + Test merge gates:
- .github/actionlint.yaml: declare the 'msr1' self-hosted-runner label so benchmark-tier.yml's 'runs-on: [self-hosted, celeris-cluster, msr1]' validates (was the sole Lint failure).
- cmd/runner/main_test.go: two stale assertions vs intentional changes: TestParseArgs_Defaults expected Duration=120s but DefaultConfig is now 45s (rated-off fast profile, 2d40f6d); TestNativeH2cColumnsAreH2cOnly listed 'ntex-h2' which was deliberately removed from the registry.
- servers/{gin,hertz,iris}/go.mod+go.sum: go mod tidy (x/net 0.55->0.56 drift) so their Test jobs stop failing the tidy check.
- root: bradfitz/gomemcache -> latest; modernc.org/sqlite v1.52.0 -> v1.53.0
(pulls modernc.org/libc v1.73.4).
- servers/{chi,fasthttp,gin}: bradfitz/gomemcache -> latest.
- CI: actions/checkout v6 -> v7 across all six workflows (supersedes
dependabot #198).
The goceleris/probatorium pseudo-version 'updates' are replace-backed
local placeholders (untouched); the validation/refapp celeris pins are
version-syncs handled at the v1.5.3 re-pin. root build+vet+runner tests
green; servers build; actionlint clean.
celeris v1.5.3 is now released, so replace the smoke-only pseudo-version and the stale 1.4.15 conformance pins with the clean tag:
- servers/celeris: v1.5.3-0.20260619... pseudo-version -> v1.5.3.
- validation/refapp/*: celeris 1.4.15 -> v1.5.3 (all 8 SDD-conformance refapps; observability also middleware/metrics+otel 1.4.15 -> v1.5.3).
All rebuild clean against the released v1.5.3 (+ middleware v1.5.3 tags).
Prepares the next bench round. All changes are harness-only (no celeris/loadgen edits). Everything compiles and the full Go suite is green; every polyglot adapter was built and h2c-smoke-tested locally (curl --http2-prior-knowledge).

1. h2c coverage (+7 framework columns)

The h2 scenarios were N/A for every non-Go framework. Added prior-knowledge h2c columns (-h2, h2c-noupg so they run only the h2 rows):
- axum-h2, ntex-h2, hyper-h2 (Rust): http2 builder / ntex HttpService::h2
- aspnet-h2 (Kestrel): HttpProtocols.Http2
- fastapi-h2: hypercorn
- hono-h2, elysia-h2: node:http2 cleartext bridge to app.fetch

drogon got no h2 column: drogon 1.9.x has zero server-side HTTP/2 (verified against its headers); its -engine h2c fails fast and its N/A is correct, not a gap. Native h2 columns share their h1 sibling's build via NativeBinary.BinName; every native adapter gained -engine parsing and resolveBenchColumns now passes -engine h1 / -engine h2c.

2. The 64k problem

- Mid-size payload rows get/post-json-8k/16k: under the 20G ceiling so fast servers stay CPU-bound and differentiate. Endpoints + byte-identical payloads (8286 / 16463 bytes) added to all 19 adapters.
- BuildDocument flags cells whose bandwidth sits at the fabric line rate while the loadgen has headroom, and a new markdown section ranks those cells by Gbps-per-CPU% instead of raw RPS.

3. Resources: the real bug

Server CPU/RSS (observer.sqlite + cpu.log) was captured and fetched, but the merge→aggregate→document path silently dropped it (CellResult/CellAggregate had no Resources field). Threaded end-to-end; benchmarks[].resources now populates, which is exactly what makes the NIC-bound 64k cells interpretable.

4. Weekly = full grid

HeadlineWeekly() now runs the full grid (Globs: ["*/*"]), same coverage as Full(), differing only by per-cell window:
- HeadlineWeekly(): 60s/15s, ~21.6h, fits 24h single-arch
- Full(): 90s/20s, ~30h, BENCH_BUDGET=70h

Both profiles set Arches: 1 (amd64-only today; BenchTier overrides per BENCH_TARGET). Rated sweep stays curated (expanding it would blow the budget). Removed HeadlineServers / HeadlineScenarios / headlineGlobs().

Budget

FullRealizedCells 520→820 (realized grid ~799). Full() projects ~30h, headline ~21.6h, both under the 70h manual budget.

Verification

… cargo, dotnet build, Bun conformance, Python venv) with h2c confirmed by curl.