Spike: transplant established connections between epoll and io_uring
Context
The adaptive engine's new start-engine algorithm works around connection pinning:
an established connection cannot migrate between the epoll and io_uring event
loops, so the start engine decides keep-alive throughput and the runtime switch
can only route new connections. That is fine for ramps and churn, but the switch
is inert for a pure keep-alive burst: those connections stay on whichever engine
they landed on.
This spike explores lifting that limitation: can we transplant an already
established connection's fd + state across engines, so a sustained high-load
ramp can migrate its existing keep-alives onto io_uring (not just new arrivals)?
Feasibility verdict (from the design investigation)
The two engines share no connection abstraction: engine/iouring/conn.go
and engine/epoll/conn.go are disjoint structs with incompatible lifecycles
(epoll's level-triggered dirty-list vs. io_uring's kernel-op
generation/kernelInflight tracking). The only existing detach (hijackConn)
hands off the bare fd and discards all parser and engine state.
| Case | Verdict | Why |
| --- | --- | --- |
| H1, idle between requests, epoll→io_uring, real fd | HARD (the only feasible case) | At a request boundary the migratable state is minimal (parser at position 0, response flushed, no handler running). The fd is a plain TCP socket. |
| H1, mid-request | IMPOSSIBLE | Opaque parser cursor / header-machine / chunked state; handler not yet run. |
| H2 (any time) | IMPOSSIBLE | HPACK dynamic tables, live streams + running handler goroutines, flow-control/continuation state; none copyable. Zero-stream H2 == close+reopen. |
io_uring→epoll is additionally blocked for fixed-file conns (fd is a table
index, not a real fd). So the only slice worth prototyping is H1-idle,
epoll→io_uring, real-fd.
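The verdict table can be read as a single eligibility predicate. The following is a minimal sketch of that reading, not the engine's code: connSnapshot, its fields, and transplantVerdict are hypothetical names invented for illustration and do not mirror the real conn structs.

```go
package main

import "fmt"

// Hypothetical enums; the real engines may model these differently.
type Engine int

const (
	Epoll Engine = iota
	IOUring
)

type Proto int

const (
	H1 Proto = iota
	H2
)

// connSnapshot is an illustrative view of the state the verdict table depends on.
type connSnapshot struct {
	Engine          Engine
	Proto           Proto
	AtBoundary      bool // parser at position 0, no partial request buffered as state
	ResponseFlushed bool
	HandlerRunning  bool
	Detached        bool
	FixedFile       bool // fd is an io_uring fixed-file table index, not a real fd
}

// transplantVerdict returns "" when the conn falls in the one feasible slice
// (H1-idle, epoll to io_uring, real fd), otherwise a short reason.
func transplantVerdict(c connSnapshot, target Engine) string {
	switch {
	case c.Proto != H1:
		return "H2 state is not copyable"
	case c.Engine != Epoll || target != IOUring:
		return "only epoll to io_uring is in scope"
	case c.FixedFile:
		return "fixed-file conn has no real fd"
	case c.Detached:
		return "already detached"
	case !c.AtBoundary:
		return "mid-request"
	case !c.ResponseFlushed:
		return "response not flushed"
	case c.HandlerRunning:
		return "handler in flight"
	}
	return ""
}

func main() {
	idle := connSnapshot{Engine: Epoll, Proto: H1, AtBoundary: true, ResponseFlushed: true}
	fmt.Println("idle eligible:", transplantVerdict(idle, IOUring) == "")

	busy := idle
	busy.AtBoundary = false
	fmt.Println("busy reason:", transplantVerdict(busy, IOUring))
}
```

Returning a reason string rather than a bool is a choice made for this sketch; it would let a prototype count why candidates were skipped.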
Proposed prototype (behind a feature flag, off by default)
- Quiesce on source (epoll loop thread): reach an H1 request boundary;
  assert response flushed, no async handler in flight, not detached.
- Detach fd from epoll: EPOLL_CTL_DEL + remove from the loop's conn map,
  WITHOUT wrapping in net.Conn; capture carry-over state (any pipelined
  buffered bytes, KeepAlive, remoteAddr, ctx); discard epoll-specific fields.
- Attach to an io_uring worker (worker thread): mirror onAcceptedFD, that is,
  track the fd, arm multishot recv with a provided buffer, install the H1
  parser at a boundary, re-inject buffered bytes so a pipelined request re-parses.
- Cross-engine handoff signal: detach must run on the epoll loop thread and
  attach on the io_uring worker thread; this needs eventfd/queue coordination
  that does not exist today.
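One possible shape for the missing handoff is a small mailbox that the epoll loop pushes into and the io_uring worker drains. The sketch below assumes that shape; carryOver, handoff, Push, Drain, and the wake callback (standing in for an eventfd write) are all hypothetical, and ctx is left out to keep the example short.

```go
package main

import (
	"fmt"
	"sync"
)

// carryOver is a hypothetical bundle of the state captured in the detach step.
type carryOver struct {
	FD         int
	Buffered   []byte // pipelined bytes already read on the epoll side
	KeepAlive  bool
	RemoteAddr string
}

// handoff is a hypothetical mailbox between one epoll loop and one io_uring
// worker. wake stands in for an eventfd write that interrupts the worker's wait.
type handoff struct {
	mu      sync.Mutex
	pending []carryOver
	wake    func()
}

// Push would be called on the epoll loop thread, after EPOLL_CTL_DEL and
// removal from the loop's conn map, so the source no longer touches the fd.
func (h *handoff) Push(c carryOver) {
	h.mu.Lock()
	h.pending = append(h.pending, c)
	h.mu.Unlock()
	h.wake()
}

// Drain would be called on the io_uring worker thread; it takes ownership of
// every queued conn in one step.
func (h *handoff) Drain() []carryOver {
	h.mu.Lock()
	out := h.pending
	h.pending = nil
	h.mu.Unlock()
	return out
}

func main() {
	wakeups := 0
	h := &handoff{wake: func() { wakeups++ }}

	h.Push(carryOver{
		FD:         7,
		Buffered:   []byte("GET /next HTTP/1.1\r\n"),
		KeepAlive:  true,
		RemoteAddr: "10.0.0.1:5000",
	})

	for _, c := range h.Drain() {
		// A real worker would track the fd, arm recv, and feed Buffered to a
		// fresh H1 parser before any new bytes from the socket.
		fmt.Printf("attach fd=%d buffered=%d bytes keepalive=%v\n", c.FD, len(c.Buffered), c.KeepAlive)
	}
	fmt.Println("wakeups:", wakeups, "left in mailbox:", len(h.Drain()))
}
```

The mutex gives a single ownership transfer point: once Push returns, only the worker side should touch the fd. Whether a lock, a lock-free queue, or a ring message fits the engines better is an open question for the spike.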
Known blockers / risks
- io_uring in-flight ops must drain (kernelInflight==0, -ECANCELED terminal
  CQE) before the fd is clean; this adds a CQE round-trip of latency.
- Provided-buffer ownership must be returned before release.
- Cross-thread atomicity (no cross-engine channel exists yet).
- Race-prone subsystem; must be validated with -race and the strict matrix.
Acceptance criteria for the spike
- […] flag, with a test proving the next request on that conn is served by io_uring.
- […] "route new conns only" baseline on a ramping/keep-alive workload.
- […] "route new conns only" and rely on WorkloadHint for high-conc deployments?
References
- Pinning + new adaptive algorithm: adaptive/engine.go (chooseStartEngine,
  performSwitch), adaptive/controller.go.
- Disjoint conn structs: engine/iouring/conn.go, engine/epoll/conn.go.
- Existing fd handoff: hijackConn in engine/epoll/loop.go, engine/iouring/worker.go.