Spike: transplant established connections between epoll ↔ io_uring (lift connection pinning) #383

@FumingPower3925

Spike: transplant established connections between epoll ↔ io_uring

Context

The adaptive engine's new start-engine algorithm works around connection pinning:
an established connection cannot migrate between the epoll and io_uring event
loops, so the start engine decides keep-alive throughput and the runtime switch
can only route new connections. This is fine for ramps/churn but inert for a
pure keep-alive burst (it stays on whatever engine it landed on).

This spike explores lifting that limitation: can we transplant an already
established connection's fd + state across engines, so a sustained high-load
ramp can migrate its existing keep-alives onto io_uring (not just new arrivals)?

Feasibility verdict (from the design investigation)

The two engines share no connection abstraction: engine/iouring/conn.go
and engine/epoll/conn.go are disjoint structs with incompatible lifecycles
(epoll's level-triggered dirty-list vs. io_uring's kernel-op
generation/kernelInflight tracking). The only existing detach (hijackConn)
hands off the bare fd and discards all parser/engine state.

| Case | Verdict | Why |
|------|---------|-----|
| H1, idle between requests, epoll→io_uring, real fd | HARD (the only feasible case) | At a request boundary the migratable state is minimal (parser at position 0, response flushed, no handler running). The fd is a plain TCP socket. |
| H1, mid-request | IMPOSSIBLE | Opaque parser cursor / header-machine / chunked state; handler not yet run. |
| H2 (any time) | IMPOSSIBLE | HPACK dynamic tables, live streams + running handler goroutines, flow-control/continuation state: none of it is copyable; zero-stream H2 == close+reopen. |

io_uring→epoll is additionally blocked for fixed-file conns (fd is a table
index, not a real fd). So the only slice worth prototyping is H1-idle,
epoll→io_uring, real-fd.
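
To make the verdict table concrete, here is a minimal sketch of an eligibility check for the one feasible slice. Everything in it (ConnSnapshot, CanTransplant, the enums and field names) is hypothetical and invented for illustration; neither engine exposes such a type today.

```go
package main

// Protocol and Engine are hypothetical enums used only by this sketch.
type Protocol int

const (
	ProtoH1 Protocol = iota
	ProtoH2
)

type Engine int

const (
	EngineEpoll Engine = iota
	EngineIOUring
)

// ConnSnapshot is a hypothetical engine-neutral view of one connection.
// The real conn structs are disjoint and have no such shared shape.
type ConnSnapshot struct {
	Proto             Protocol
	Source            Engine
	AtRequestBoundary bool // parser at position 0
	ResponseFlushed   bool
	HandlerRunning    bool
	Detached          bool // already handed off, e.g. via hijackConn
	FixedFile         bool // fd is an io_uring table index, not a real fd
}

// CanTransplant reports whether c is in the H1-idle, epoll→io_uring,
// real-fd slice. When it is not, the string says which rule excluded it.
func CanTransplant(c ConnSnapshot, target Engine) (bool, string) {
	switch {
	case c.Proto != ProtoH1:
		return false, "H2 state (HPACK, streams, flow control) is not copyable"
	case c.FixedFile:
		return false, "fixed-file conn: fd is a table index, not a real fd"
	case c.Source != EngineEpoll || target != EngineIOUring:
		return false, "only epoll→io_uring is in scope"
	case c.Detached:
		return false, "connection is already detached"
	case !c.AtRequestBoundary || !c.ResponseFlushed || c.HandlerRunning:
		return false, "connection is mid-request"
	}
	return true, ""
}
```

Returning the reason alongside the boolean is a design choice for the spike: it would let a test or a debug counter show how many keep-alives were skipped per rule.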

Proposed prototype (behind a feature flag, off by default)

  1. Quiesce on source (epoll loop thread): reach an H1 request boundary;
    assert response flushed, no async handler in flight, not detached.
  2. Detach fd from epoll: EPOLL_CTL_DEL + remove from the loop's conn map,
    WITHOUT wrapping in net.Conn; capture carry-over state (any pipelined
    buffered bytes, KeepAlive, remoteAddr, ctx); discard epoll-specific fields.
  3. Attach to an io_uring worker (worker thread): mirror onAcceptedFD, i.e.
    track the fd, arm multishot recv with a provided buffer, install the H1
    parser at a boundary, re-inject buffered bytes so a pipelined request re-parses.
  4. Cross-engine handoff signal: detach must run on the epoll loop thread and
    attach on the io_uring worker thread. This needs eventfd/queue coordination
    that does not exist today.
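
Steps 2 to 4 could be sketched as a small handoff queue: the epoll loop pushes the captured carry-over state, and the io_uring worker drains it on its own thread. Migration, HandoffQueue and the wake callback are all hypothetical names. The wake callback stands in for an eventfd write that the worker would be polling, and the attach callback stands in for an onAcceptedFD-like path; neither is wired to real engine code here.

```go
package main

import (
	"context"
	"net"
	"sync"
)

// Migration is the hypothetical carry-over state captured at detach
// (step 2). Epoll-specific fields are deliberately absent.
type Migration struct {
	FD         int
	Buffered   []byte // pipelined bytes read but not yet parsed
	KeepAlive  bool
	RemoteAddr net.Addr
	Ctx        context.Context
}

// HandoffQueue is a hypothetical cross-engine queue (step 4).
type HandoffQueue struct {
	mu      sync.Mutex
	pending []Migration
	wake    func() // assumed to signal the worker, e.g. an eventfd write
}

func NewHandoffQueue(wake func()) *HandoffQueue {
	return &HandoffQueue{wake: wake}
}

// Push is meant to run on the epoll loop thread, after EPOLL_CTL_DEL and
// removal from the loop's conn map. It copies Buffered so the epoll read
// buffer can be reused immediately.
func (q *HandoffQueue) Push(m Migration) {
	m.Buffered = append([]byte(nil), m.Buffered...)
	q.mu.Lock()
	q.pending = append(q.pending, m)
	q.mu.Unlock()
	if q.wake != nil {
		q.wake()
	}
}

// Drain is meant to run on the io_uring worker thread once woken. It calls
// attach for each queued migration (step 3) and returns how many it handled.
func (q *HandoffQueue) Drain(attach func(Migration)) int {
	q.mu.Lock()
	batch := q.pending
	q.pending = nil
	q.mu.Unlock()
	for _, m := range batch {
		attach(m)
	}
	return len(batch)
}
```

Swapping the slice out under the lock keeps attach callbacks off the critical section, so a slow attach would not stall the epoll loop's next Push.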

Known blockers / risks

  • io_uring in-flight ops must drain (kernelInflight==0, -ECANCELED terminal
    CQE) before the fd is clean, which adds one CQE round-trip of latency.
  • Provided-buffer ownership must be returned before release.
  • Cross-thread atomicity (no cross-engine channel exists yet).
  • Race-prone subsystem; must be validated with -race and the strict matrix.
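
The first two blockers amount to a release gate on the io_uring side: the fd is only clean once every in-flight op has posted its terminal CQE and every provided buffer has been returned. A minimal sketch of that bookkeeping follows. DrainGate and its methods are hypothetical and only mirror the kernelInflight idea named above. The `more` flag is assumed to reflect IORING_CQE_F_MORE on the completion, and 125 is the Linux value of ECANCELED.

```go
package main

// eCanceled is ECANCELED on Linux; a cancelled op completes with -eCanceled.
const eCanceled int32 = 125

// DrainGate is a hypothetical per-connection gate; it is not the real
// engine/iouring bookkeeping.
type DrainGate struct {
	inflight    int  // ops submitted without a terminal CQE yet
	buffersHeld int  // provided buffers not yet returned to the ring
	sawCancel   bool // a terminal CQE carried -ECANCELED
}

// Submitted records one op handed to the kernel.
func (g *DrainGate) Submitted() { g.inflight++ }

// BufferTaken records a provided buffer now owned by this connection.
func (g *DrainGate) BufferTaken() { g.buffersHeld++ }

// BufferReturned records a provided buffer handed back.
func (g *DrainGate) BufferReturned() {
	if g.buffersHeld > 0 {
		g.buffersHeld--
	}
}

// OnCQE records one completion. A CQE with more set belongs to a multishot
// op that is still armed, so it does not count as terminal.
func (g *DrainGate) OnCQE(res int32, more bool) {
	if more {
		return
	}
	if g.inflight > 0 {
		g.inflight--
	}
	if res == -eCanceled {
		g.sawCancel = true
	}
}

// Clean reports whether the fd can be released: nothing in flight and no
// provided buffer still owned by this connection.
func (g *DrainGate) Clean() bool {
	return g.inflight == 0 && g.buffersHeld == 0
}
```

The gate is single-threaded on purpose: it is assumed to be touched only from the owning worker thread, which sidesteps the cross-thread atomicity blocker for this piece of state.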

Acceptance criteria for the spike

  • A prototype that migrates an idle H1 keep-alive conn epoll→io_uring behind a
    flag, with a test proving the next request on that conn is served by io_uring.
  • Measured switch + migration latency, and the throughput delta vs. the
    "route new conns only" baseline on a ramping/keep-alive workload.
  • A go/no-go recommendation: is the complexity worth the gain, or do we keep
    "route new conns only" and rely on WorkloadHint for high-conc deployments?

References

  • Pinning + new adaptive algorithm: adaptive/engine.go (chooseStartEngine,
    performSwitch), adaptive/controller.go.
  • Disjoint conn structs: engine/iouring/conn.go, engine/epoll/conn.go.
  • Existing fd handoff: hijackConn in engine/epoll/loop.go, engine/iouring/worker.go.
