diff --git a/README.md b/README.md
index 1040f61..a583eb9 100644
--- a/README.md
+++ b/README.md
@@ -1,174 +1,43 @@
-# Celeris Benchmarks
+# Celeris Benchmarks — archived
 
-Reproducible HTTP server benchmarks on dedicated bare-metal hardware with 10GbE point-to-point networking. Compares production Go frameworks against theoretical maximum performance using raw Linux syscalls.
+This repository is **archived as of celeris v1.4.3**. All benchmark + validation work has moved to:
 
-## Why This Exists
+**[github.com/goceleris/probatorium](https://github.com/goceleris/probatorium)**
 
-Most HTTP benchmarks run on shared VMs with noisy neighbors, variable network hops, and throttled I/O — making results unreliable and non-reproducible. This suite runs on dedicated bare-metal machines with direct 10GbE links, automated kernel tuning, and CPU pinning, so every release gets consistent, comparable numbers.
+## Why the move
 
-We measure three categories of servers:
-- **Baseline**: Production Go frameworks (Gin, Fiber, Echo, Chi, Iris, Hertz, FastHTTP, stdlib)
-- **Celeris**: The [Celeris](https://github.com/goceleris/celeris) HTTP engine with io_uring, epoll, and adaptive backends
-- **Theoretical**: Raw epoll/io_uring implementations showing the syscall performance ceiling
+`goceleris/benchmarks` predated:
 
-## Hardware
+- `goceleris/loadgen` — the dedicated, HdrHistogram-aware load generator. The bench
+  loop here forked its own minimal client and could not measure latency under coordinated
+  omission correction.
+- `celeris/test/perfmatrix` — the in-tree scenario / interleave / report scaffolding that
+  the celeris team uses for release-gate matrix runs.
+- The 3-host LACP cluster + ansible orchestration. This repo was structurally
+  single-host: it ran the loadgen and the server in the same process and could not split
+  them across machines.
+- The validation tier (TigerBeetle-VOPR-inspired property soak + RESTler-style fuzzing
+  + deterministic-seed fault injection). Probatorium folds bench and validation into one
+  pipeline so every celeris release is gated on **both** "no regression vs baseline" AND
+  "no invariant violation under 10-day soak."
 
-Three dedicated Minisforum mini PCs connected via 10GbE point-to-point links:
+Probatorium subsumes everything this repo did and adds the validation tier, the cluster
+fabric, the per-arch matrix, and the publish-to-docs cascade.
 
-| Machine | Role | CPU | Cores/Threads | RAM | Network |
-|---------|------|-----|---------------|-----|---------|
-| MS-A2 | Client (self-hosted runner) | AMD Ryzen 9 9955HX (Zen 5) | 16C/32T | 32 GB DDR5 | 10GbE SFP+ |
-| MS-A2 | x86 Server | AMD Ryzen 7 7745HX (Zen 4) | 8C/16T | 32 GB DDR5 | 10GbE SFP+ |
-| MS-R1 | ARM64 Server | CIX CP8180 | 12C/12T | 64 GB LPDDR5 | Dual 10GbE RJ45 (RTL8127) |
+## Where each piece went
 
-All machines run Debian 13 (Trixie) with kernel 6.12+ for full io_uring support. The client machine is the GitHub Actions self-hosted runner that orchestrates everything via SSH.
+| In this repo                               | In probatorium                                  |
+| ------------------------------------------ | ----------------------------------------------- |
+| `servers/baseline/{gin,echo,chi,...}/`     | `servers//` (one go.mod per adapter)            |
+| `cmd/bench/`, `internal/runner/`           | `cmd/runner/` + scenario/interleave packages    |
+| `internal/dashboard/`                      | `report/` (v5.0 schema, HdrHistogram-aware)     |
+| `magefile.go` cluster targets              | `mage_cluster.go` + `ansible/` (cluster-driven) |
+| Result schema v4.0                         | Result schema v5.0 (additive over v4)           |
+| (no validation tier)                       | `validation/`, `cmd/validator{,-checker,-replay}/` |
 
-## Benchmark Types
-
-### Standard Level (7 types, ~66 min per architecture)
-
-| Type | Endpoint | What It Tests |
-|------|----------|---------------|
-| `simple` | `GET /` | Plain text — pure framework overhead |
-| `json` | `GET /json` | JSON serialization |
-| `path` | `GET /users/:id` | Path parameter extraction + routing |
-| `body` | `POST /upload` | 2 KB request body read |
-| `headers` | `GET /users/:id` | Realistic API headers (~850 bytes: JWT, cookies, tracing) |
-| `json-64k` | `GET /json-64k` | 64 KB JSON response — I/O throughput, efficiency metric |
-| `churn` | `GET /` | New TCP connection per request — tests `accept()`, `SO_REUSEPORT` |
-
-### Full Level (15 types, ~142 min per architecture)
-
-Adds a **concurrency sweep** that scales connections from 1 to 10,000 on the `simple` endpoint:
-
-```
-simple@1 simple@10 simple@50 simple@100 simple@500 simple@1000 simple@5000 simple@10000
-```
-
-This produces scaling curves that show where goroutine-based frameworks plateau and where event-loop servers keep climbing.
-
-## Servers Tested
-
-### Production Frameworks (Baseline)
-
-| Server | Protocols | Framework |
-|--------|-----------|-----------|
-| stdhttp | H1, H2C, Hybrid | Go stdlib `net/http` |
-| gin | H1, H2C, Hybrid | [Gin](https://github.com/gin-gonic/gin) |
-| echo | H1, H2C, Hybrid | [Echo](https://github.com/labstack/echo) |
-| chi | H1, H2C, Hybrid | [Chi](https://github.com/go-chi/chi) |
-| iris | H1, H2C, Hybrid | [Iris](https://github.com/kataras/iris) |
-| hertz | H1, H2C, Hybrid | [Hertz](https://github.com/cloudwego/hertz) |
-| fiber | H1 | [Fiber](https://github.com/gofiber/fiber) (fasthttp-based) |
-| fasthttp | H1 | [FastHTTP](https://github.com/valyala/fasthttp) |
-
-### Celeris
-
-| Server | Protocols | Engine |
-|--------|-----------|--------|
-| celeris-iouring | H1, H2C, Hybrid | io_uring (Linux 5.10+) |
-| celeris-epoll | H1, H2C, Hybrid | epoll (Linux 2.6+) |
-| celeris-adaptive | H1, H2C, Hybrid | Runtime engine selection |
-
-Each engine runs with three resource profiles: `latency`, `throughput`, and `balanced`.
-
-### Theoretical Maximum
-
-| Server | Protocols | Implementation |
-|--------|-----------|----------------|
-| epoll | H1, H2C, Hybrid | Raw epoll with SO_REUSEPORT, SIMD header parsing, zero-alloc response path |
-| iouring | H1, H2C, Hybrid | io_uring with SQPOLL, multishot accept, linked SQEs |
-
-## Dashboard & Results
-
-Results are published to [goceleris/docs](https://github.com/goceleris/docs) as dashboard-format JSON (schema v4.0), keyed by Celeris version:
-
-- `results/latest/{arch}.json` — most recent run
-- `results/{version}/{arch}.json` — per-version archive
-
-Dashboard data includes:
-- **RPS and latency percentiles** (P50, P75, P90, P99, P999, P9999) per server per benchmark type
-- **Concurrency scaling curves** — RPS at each concurrency level (full level only)
-- **Efficiency metric** — RPS / Server CPU% per server, normalizing across core counts
-- **System metrics** — server CPU, memory RSS, GC pauses (Go servers only)
-- **Timeseries** — per-second RPS and P99 latency snapshots
-
-## Running Benchmarks
-
-Benchmarks are designed to run through GitHub Actions workflows. The self-hosted runner on the client machine handles everything: SSH into servers, deploy binaries, tune kernels, run benchmarks, collect results.
-
-### Via GitHub Actions (Primary Method)
-
-- **Release benchmarks**: Trigger automatically on every release, or manually via the `benchmark.yml` workflow dispatch. Releases run at `full` level (includes concurrency sweep).
-- **PR benchmarks**: Add the `benchmark` label to a pull request. Runs at `standard` level.
-
-### Local Development
-
-For local development and testing (not full benchmarks):
-
-```bash
-# Build server and bench binaries
-mage build
-
-# Run a quick local smoke test (5s per server, localhost)
-mage benchmarkQuick
-```
-
-## CI/CD
-
-| Workflow | Trigger | Level | Timeout |
-|----------|---------|-------|---------|
-| `benchmark.yml` | Release (auto) or manual dispatch | `full` on release, configurable on manual | 480 min |
-| `benchmark-pr.yml` | PR with `benchmark` label | `standard` | 240 min |
-
-Both workflows SSH to the bare-metal servers, deploy the server binary, run benchmarks, and collect results. Release runs also publish to the docs repository and trigger a site rebuild.
-
-## Project Structure
-
-```
-cmd/bench/              Benchmark runner CLI (specs, runner, checkpoint)
-cmd/server/             Server binary (all implementations + control daemon)
-servers/
-  baseline/             Production frameworks (gin, echo, chi, iris, etc.)
-  celeris/              Celeris HTTP engine
-  theoretical/          Raw epoll/iouring implementations
-  common/               Shared types, payload generators, SIMD helpers
-internal/
-  dashboard/            Dashboard JSON format (schema v4.0)
-  metrics/              Prometheus metrics definitions
-  version/              Version info
-config/
-  hosts.json            Machine addresses and hardware metadata
-```
-
-## Contributing
-
-### Requirements
-
-- **Go 1.24+**: [Download](https://go.dev/dl/)
-- **Mage**: `go install github.com/magefile/mage@latest`
-
-### Development
-
-```bash
-mage check    # deps + lint + vet + build
-mage test     # run tests
-mage fmt      # format code
-```
-
-### Adding a Server
-
-1. Create a package under `servers/baseline/` (or `servers/theoretical/`)
-2. Implement all benchmark endpoints: `GET /`, `GET /json`, `GET /json-1k`, `GET /json-64k`, `GET /users/:id`, `POST /upload`
-3. Register the server type in `cmd/server/main.go`
-4. Add to the server list in `cmd/bench/main.go`
-
-### Adding a Benchmark Type
-
-1. Add the endpoint to all server implementations
-2. Add a `BenchmarkSpec` entry in `cmd/bench/main.go`
-3. Update dashboard format if new fields are needed (`internal/dashboard/format.go`)
+The v4.0 result JSONs published from this repo remain readable by probatorium's v5.0
+parser — the schema bump was additive.
 
 ## License
 
-Apache 2.0
+[Apache 2.0](LICENSE), unchanged. Use these snapshots for historical reference.