This repository was archived by the owner on May 10, 2026. It is now read-only.
Merged
193 changes: 31 additions & 162 deletions README.md
@@ -1,174 +1,43 @@
# Celeris Benchmarks
# Celeris Benchmarks — archived

Reproducible HTTP server benchmarks on dedicated bare-metal hardware with 10GbE point-to-point networking. Compares production Go frameworks against theoretical maximum performance using raw Linux syscalls.
This repository is **archived as of celeris v1.4.3**. All benchmark and validation work has moved to:

## Why This Exists
**[github.com/goceleris/probatorium](https://github.com/goceleris/probatorium)**

Most HTTP benchmarks run on shared VMs with noisy neighbors, variable network hops, and throttled I/O — making results unreliable and non-reproducible. This suite runs on dedicated bare-metal machines with direct 10GbE links, automated kernel tuning, and CPU pinning, so every release gets consistent, comparable numbers.
## Why the move

We measure three categories of servers:
- **Baseline**: Production Go frameworks (Gin, Fiber, Echo, Chi, Iris, Hertz, FastHTTP, stdlib)
- **Celeris**: The [Celeris](https://github.com/goceleris/celeris) HTTP engine with io_uring, epoll, and adaptive backends
- **Theoretical**: Raw epoll/io_uring implementations showing the syscall performance ceiling
`goceleris/benchmarks` predated:

## Hardware
- `goceleris/loadgen` — the dedicated, HdrHistogram-aware load generator. The bench
loop here forked its own minimal client and could not measure latency under coordinated
omission correction.
- `celeris/test/perfmatrix` — the in-tree scenario / interleave / report scaffolding that
the celeris team uses for release-gate matrix runs.
- The 3-host LACP cluster + ansible orchestration. This repo was structurally
single-host: it ran the loadgen and the server in the same process and could not split
them across machines.
- The validation tier (TigerBeetle-VOPR-inspired property soak + RESTler-style fuzzing +
deterministic-seed fault injection). Probatorium folds bench and validation into one
pipeline so every celeris release is gated on **both** "no regression vs baseline" AND
  "no invariant violation under a 10-day soak."

Three dedicated Minisforum mini PCs connected via 10GbE point-to-point links:
Probatorium subsumes everything this repo did and adds the validation tier, the cluster
fabric, the per-arch matrix, and the publish-to-docs cascade.

| Machine | Role | CPU | Cores/Threads | RAM | Network |
|---------|------|-----|---------------|-----|---------|
| MS-A2 | Client (self-hosted runner) | AMD Ryzen 9 9955HX (Zen 5) | 16C/32T | 32 GB DDR5 | 10GbE SFP+ |
| MS-A2 | x86 Server | AMD Ryzen 7 7745HX (Zen 4) | 8C/16T | 32 GB DDR5 | 10GbE SFP+ |
| MS-R1 | ARM64 Server | CIX CP8180 | 12C/12T | 64 GB LPDDR5 | Dual 10GbE RJ45 (RTL8127) |
## Where each piece went

All machines run Debian 13 (Trixie) with kernel 6.12+ for full io_uring support. The client machine is the GitHub Actions self-hosted runner that orchestrates everything via SSH.
| In this repo | In probatorium |
| ------------------------------------------ | ----------------------------------------------- |
| `servers/baseline/{gin,echo,chi,...}/` | `servers/<name>/` (one go.mod per adapter) |
| `cmd/bench/`, `internal/runner/` | `cmd/runner/` + scenario/interleave packages |
| `internal/dashboard/` | `report/` (v5.0 schema, HdrHistogram-aware) |
| `magefile.go` cluster targets | `mage_cluster.go` + `ansible/` (cluster-driven) |
| Result schema v4.0 | Result schema v5.0 (additive over v4) |
| (no validation tier) | `validation/`, `cmd/validator{,-checker,-replay}/` |

## Benchmark Types

### Standard Level (7 types, ~66 min per architecture)

| Type | Endpoint | What It Tests |
|------|----------|---------------|
| `simple` | `GET /` | Plain text — pure framework overhead |
| `json` | `GET /json` | JSON serialization |
| `path` | `GET /users/:id` | Path parameter extraction + routing |
| `body` | `POST /upload` | 2 KB request body read |
| `headers` | `GET /users/:id` | Realistic API headers (~850 bytes: JWT, cookies, tracing) |
| `json-64k` | `GET /json-64k` | 64 KB JSON response — I/O throughput, efficiency metric |
| `churn` | `GET /` | New TCP connection per request — tests `accept()`, `SO_REUSEPORT` |

### Full Level (15 types, ~142 min per architecture)

Adds a **concurrency sweep** that scales connections from 1 to 10,000 on the `simple` endpoint:

```
simple@1 simple@10 simple@50 simple@100 simple@500 simple@1000 simple@5000 simple@10000
```

This produces scaling curves that show where goroutine-based frameworks plateau and where event-loop servers keep climbing.

## Servers Tested

### Production Frameworks (Baseline)

| Server | Protocols | Framework |
|--------|-----------|-----------|
| stdhttp | H1, H2C, Hybrid | Go stdlib `net/http` |
| gin | H1, H2C, Hybrid | [Gin](https://github.com/gin-gonic/gin) |
| echo | H1, H2C, Hybrid | [Echo](https://github.com/labstack/echo) |
| chi | H1, H2C, Hybrid | [Chi](https://github.com/go-chi/chi) |
| iris | H1, H2C, Hybrid | [Iris](https://github.com/kataras/iris) |
| hertz | H1, H2C, Hybrid | [Hertz](https://github.com/cloudwego/hertz) |
| fiber | H1 | [Fiber](https://github.com/gofiber/fiber) (fasthttp-based) |
| fasthttp | H1 | [FastHTTP](https://github.com/valyala/fasthttp) |

### Celeris

| Server | Protocols | Engine |
|--------|-----------|--------|
| celeris-iouring | H1, H2C, Hybrid | io_uring (Linux 5.10+) |
| celeris-epoll | H1, H2C, Hybrid | epoll (Linux 2.6+) |
| celeris-adaptive | H1, H2C, Hybrid | Runtime engine selection |

Each engine runs with three resource profiles: `latency`, `throughput`, and `balanced`.

### Theoretical Maximum

| Server | Protocols | Implementation |
|--------|-----------|----------------|
| epoll | H1, H2C, Hybrid | Raw epoll with SO_REUSEPORT, SIMD header parsing, zero-alloc response path |
| iouring | H1, H2C, Hybrid | io_uring with SQPOLL, multishot accept, linked SQEs |

## Dashboard & Results

Results are published to [goceleris/docs](https://github.com/goceleris/docs) as dashboard-format JSON (schema v4.0), keyed by Celeris version:

- `results/latest/{arch}.json` — most recent run
- `results/{version}/{arch}.json` — per-version archive

Dashboard data includes:
- **RPS and latency percentiles** (P50, P75, P90, P99, P999, P9999) per server per benchmark type
- **Concurrency scaling curves** — RPS at each concurrency level (full level only)
- **Efficiency metric** — RPS / Server CPU% per server, normalizing across core counts
- **System metrics** — server CPU, memory RSS, GC pauses (Go servers only)
- **Timeseries** — per-second RPS and P99 latency snapshots
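
As a rough illustration of the efficiency metric listed above (RPS divided by server CPU%), here is a minimal Go sketch. The `efficiency` function, the convention that CPU% is whole-machine utilisation in the range (0, 100], and the sample numbers are all assumptions made for illustration; they are not taken from the dashboard code.

```go
package main

import "fmt"

// efficiency returns requests per second per percentage point of server CPU.
// Assumption: cpuPercent is whole-machine utilisation in (0, 100].
// The boolean is false when cpuPercent is not positive, where the ratio is undefined.
func efficiency(rps, cpuPercent float64) (float64, bool) {
	if cpuPercent <= 0 {
		return 0, false
	}
	return rps / cpuPercent, true
}

func main() {
	// Made-up measurements, for illustration only.
	runs := []struct {
		name string
		rps  float64
		cpu  float64
	}{
		{"server-a", 500000, 50},
		{"server-b", 600000, 80},
	}
	for _, r := range runs {
		if e, ok := efficiency(r.rps, r.cpu); ok {
			fmt.Printf("%s: %.0f RPS per CPU%%\n", r.name, e)
		}
	}
}
```

In this made-up example the server with the lower raw RPS is the more efficient one, which is the kind of difference the metric is meant to surface.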

## Running Benchmarks

Benchmarks are designed to run through GitHub Actions workflows. The self-hosted runner on the client machine handles everything: SSH into servers, deploy binaries, tune kernels, run benchmarks, collect results.

### Via GitHub Actions (Primary Method)

- **Release benchmarks**: Trigger automatically on every release, or manually via the `benchmark.yml` workflow dispatch. Releases run at `full` level (includes concurrency sweep).
- **PR benchmarks**: Add the `benchmark` label to a pull request. Runs at `standard` level.

### Local Development

For local development and testing (not full benchmarks):

```bash
# Build server and bench binaries
mage build

# Run a quick local smoke test (5s per server, localhost)
mage benchmarkQuick
```

## CI/CD

| Workflow | Trigger | Level | Timeout |
|----------|---------|-------|---------|
| `benchmark.yml` | Release (auto) or manual dispatch | `full` on release, configurable on manual | 480 min |
| `benchmark-pr.yml` | PR with `benchmark` label | `standard` | 240 min |

Both workflows SSH to the bare-metal servers, deploy the server binary, run benchmarks, and collect results. Release runs also publish to the docs repository and trigger a site rebuild.

## Project Structure

```
cmd/bench/          Benchmark runner CLI (specs, runner, checkpoint)
cmd/server/         Server binary (all implementations + control daemon)
servers/
  baseline/         Production frameworks (gin, echo, chi, iris, etc.)
  celeris/          Celeris HTTP engine
  theoretical/      Raw epoll/iouring implementations
  common/           Shared types, payload generators, SIMD helpers
internal/
  dashboard/        Dashboard JSON format (schema v4.0)
  metrics/          Prometheus metrics definitions
  version/          Version info
config/
  hosts.json        Machine addresses and hardware metadata
```

## Contributing

### Requirements

- **Go 1.24+**: [Download](https://go.dev/dl/)
- **Mage**: `go install github.com/magefile/mage@latest`

### Development

```bash
mage check # deps + lint + vet + build
mage test # run tests
mage fmt # format code
```

### Adding a Server

1. Create a package under `servers/baseline/` (or `servers/theoretical/`)
2. Implement all benchmark endpoints: `GET /`, `GET /json`, `GET /json-1k`, `GET /json-64k`, `GET /users/:id`, `POST /upload`
3. Register the server type in `cmd/server/main.go`
4. Add to the server list in `cmd/bench/main.go`

### Adding a Benchmark Type

1. Add the endpoint to all server implementations
2. Add a `BenchmarkSpec` entry in `cmd/bench/main.go`
3. Update dashboard format if new fields are needed (`internal/dashboard/format.go`)
The v4.0 result JSONs published from this repo remain readable by probatorium's v5.0
parser — the schema bump was additive.
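
To show why an additive schema bump keeps older files readable, here is a minimal Go sketch. The `result` struct and every field name in it are hypothetical stand-ins and do not reflect the actual v4.0 or v5.0 schemas. The only point is that a field added in the newer version is optional, so a document that lacks it still decodes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// result is a hypothetical, heavily reduced result document.
type result struct {
	SchemaVersion string  `json:"schema_version"`
	Server        string  `json:"server"`
	RPS           float64 `json:"rps"`
	// Histogram stands in for a field introduced by the newer schema.
	// It is a pointer so that older documents decode with it left nil.
	Histogram *string `json:"histogram,omitempty"`
}

// parseResult decodes either schema version into the same struct.
func parseResult(data []byte) (result, error) {
	var r result
	err := json.Unmarshal(data, &r)
	return r, err
}

func main() {
	// Made-up documents, for illustration only.
	v4 := []byte(`{"schema_version":"4.0","server":"gin","rps":1000}`)
	v5 := []byte(`{"schema_version":"5.0","server":"gin","rps":1000,"histogram":"encoded"}`)
	for _, doc := range [][]byte{v4, v5} {
		r, err := parseResult(doc)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("schema %s: histogram present = %t\n", r.SchemaVersion, r.Histogram != nil)
	}
}
```

Consumers then branch on whether the optional field is present rather than on the version string.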

## License

Apache 2.0
[Apache 2.0](LICENSE), unchanged. Use these snapshots for historical reference.