
test(uffd): switch cross-process helper to page-state snapshots #2472

Merged
ValentaTomas merged 5 commits into main from feat/uffd-page-state-test-harness
Apr 21, 2026


@ValentaTomas ValentaTomas commented Apr 21, 2026

Summary

Splits out the pure test-infra slice from #1896 so the REMOVE-event work
can land in smaller, reviewable pieces.

The cross-process userfaultfd test harness currently reports a flat list
of faulted offsets. REMOVE-event tests need to distinguish between
pages that were faulted, pages that were removed, and pages that were
never touched. This PR migrates the wire format and helper API to a
per-state snapshot of the handler's pageTracker so the follow-up PR
can plug in REMOVE handling without further harness changes.

  • cross_process_helpers_test.go:
    • New pageStateEntries() on Userfaultfd snapshots every tracked
      page + its state under settleRequests.
    • Emit one pageStateEntry{State uint8, Offset uint64} per page
      over the existing offsets pipe (binary-friendly, fixed-size
      records).
    • Replace offsetsOnce with pageStatesOnce, which decodes the new
      wire format into handlerPageStates.
  • helpers_test.go: introduce handlerPageStates with
    allAccessed(), which returns the sorted union of offsets the
    handler touched in any non-missing state. Only faulted is exposed
    for now; the follow-up adds more state-specific fields without
    touching call sites.
  • missing_test.go / missing_write_test.go: switch from
    h.offsetsOnce() + assert.ElementsMatch to
    h.pageStatesOnce().allAccessed() + assert.Equal (both sides are
    sorted, so the comparison is stricter).
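
A minimal sketch of the fixed-size record round trip described above, for readers who have not seen the harness. Only `pageStateEntry{State uint8, Offset uint64}` comes from this PR; the byte order, the `stateFaulted` value, and the helper names (`encodeEntries`, `decodeFaulted`, `roundTrip`, `wireSize`) are assumptions for illustration, and a `bytes.Buffer` stands in for the offsets pipe.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
	"slices"
)

// pageStateEntry mirrors the record shape named in the PR description.
// It holds only fixed-size fields, so encoding/binary can handle it directly.
type pageStateEntry struct {
	State  uint8
	Offset uint64
}

// stateFaulted is a hypothetical value; the real constants live in pageTracker.
const stateFaulted uint8 = 1

// wireSize derives the record size from the type instead of hard-coding it.
func wireSize() int { return binary.Size(pageStateEntry{}) }

// encodeEntries writes one fixed-size record per entry.
// Little-endian is an assumption; both ends only need to agree.
func encodeEntries(w io.Writer, entries []pageStateEntry) error {
	for _, e := range entries {
		if err := binary.Write(w, binary.LittleEndian, e); err != nil {
			return err
		}
	}
	return nil
}

// decodeFaulted reads records until EOF and returns the sorted offsets
// of the entries whose state is stateFaulted.
func decodeFaulted(r io.Reader) ([]uint64, error) {
	var faulted []uint64
	for {
		var e pageStateEntry
		err := binary.Read(r, binary.LittleEndian, &e)
		if errors.Is(err, io.EOF) {
			break // clean end of stream
		}
		if err != nil {
			return nil, err // includes io.ErrUnexpectedEOF on a torn record
		}
		if e.State == stateFaulted {
			faulted = append(faulted, e.Offset)
		}
	}
	slices.Sort(faulted)
	return faulted, nil
}

// roundTrip pushes entries through an in-memory stand-in for the pipe.
func roundTrip(entries []pageStateEntry) ([]uint64, error) {
	var pipe bytes.Buffer
	if err := encodeEntries(&pipe, entries); err != nil {
		return nil, err
	}
	return decodeFaulted(&pipe)
}

func main() {
	got, err := roundTrip([]pageStateEntry{
		{State: stateFaulted, Offset: 8192},
		{State: 0, Offset: 4096}, // some other state, skipped by the decoder
		{State: stateFaulted, Offset: 0},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(wireSize(), got) // 9 [0 8192]
}
```

Because `encoding/binary` packs struct fields without padding, each record occupies 9 bytes on the wire even though the in-memory struct is larger.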

No behavior change for MISSING-only handling. This is preparation only.

Follow-ups

  • feat(uffd): handle UFFD_EVENT_REMOVE in Serve — adds
    pageState removed, pageTracker.get(addr), the REMOVE batch
    handling in Serve, deferred-fault retry, gated pause/resume test
    knobs, and remove_test.go.
  • feat: wire freePageReporting through template / build / sandbox / FC
    (capstone that turns the feature on).

Splits #1896 as discussed there.

Test plan

  • go build ./...
  • go vet ./pkg/sandbox/uffd/userfaultfd/...
  • golangci-lint run ./pkg/sandbox/uffd/userfaultfd/...
  • CI: TestMissing* and TestParallelMissing* pass on the runner
    (userfaultfd requires privileged execution which isn't available
    on a dev laptop).

Prepare the userfaultfd test harness for upcoming REMOVE-event support
(#1896) by changing the cross-process wire format from a flat list of
faulted offsets to a per-state snapshot of the handler's pageTracker.

- `pageTracker`: add a `removed` state constant and a `get(addr)` helper.
  The new state is not produced yet; the follow-up PR wires it up when
  we start handling `UFFD_EVENT_REMOVE`.
- `cross_process_helpers_test.go`:
  - Add `pageStateEntries()` on `Userfaultfd` that snapshots every
    tracked page + its state under `settleRequests`.
  - Emit one `pageStateEntry{State uint8, Offset uint64}` per page over
    the existing offsets pipe (binary-friendly, fixed-size records).
  - Replace `offsetsOnce` with `pageStatesOnce`, which decodes the new
    wire format into `handlerPageStates{faulted, removed}`.
- `helpers_test.go`: introduce `handlerPageStates` with an
  `allAccessed()` helper that returns the sorted union of offsets the
  handler touched in any non-missing state; swap the `testHandler` field
  accordingly.
- `missing_test.go` / `missing_write_test.go`: switch from
  `h.offsetsOnce()` + `assert.ElementsMatch` to
  `h.pageStatesOnce().allAccessed()` + `assert.Equal` (both sides are
  sorted, so the comparison is stricter).
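
A rough illustration of why comparing two sorted slices for exact equality is stricter than an order-insensitive match. This is a stdlib-only sketch, not the harness code: `handlerPageStates` and `allAccessed()` are named in this PR, but the field layout and the helpers `strictEqual` and `sameElements` (stand-ins for `assert.Equal` and `assert.ElementsMatch`) are assumptions.

```go
package main

import (
	"fmt"
	"slices"
)

// handlerPageStates groups offsets by page state.
// Only a faulted field is assumed here, as in this PR.
type handlerPageStates struct {
	faulted []uint64
}

// allAccessed returns a sorted copy of every offset in a non-missing state.
func (s handlerPageStates) allAccessed() []uint64 {
	out := slices.Clone(s.faulted)
	slices.Sort(out)
	return out
}

// strictEqual checks same length, same elements, same order.
func strictEqual(a, b []uint64) bool { return slices.Equal(a, b) }

// sameElements ignores order: it compares the two slices as multisets.
func sameElements(a, b []uint64) bool {
	a, b = slices.Clone(a), slices.Clone(b)
	slices.Sort(a)
	slices.Sort(b)
	return slices.Equal(a, b)
}

func main() {
	states := handlerPageStates{faulted: []uint64{0, 4096, 8192}}
	expected := []uint64{0, 4096, 8192}
	shuffled := []uint64{8192, 0, 4096}

	fmt.Println(strictEqual(states.allAccessed(), expected))  // true
	fmt.Println(sameElements(states.allAccessed(), shuffled)) // true
	fmt.Println(strictEqual(states.allAccessed(), shuffled))  // false
}
```

With the strict comparison, a helper that stopped returning sorted output would fail the test instead of passing unnoticed.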

No behavior change for MISSING-only handling; the refactor just lets
REMOVE tests reuse the same snapshot helper in the follow-up PR.
@cursor
Copy link
Copy Markdown

cursor Bot commented Apr 21, 2026

PR Summary

Low Risk
Low risk: changes are confined to the userfaultfd cross-process test harness and assertions, with no impact on production fault-handling logic. Main risk is test flakiness/hangs if the new binary stream decoding blocks or if helper output ordering differs across runs.

Overview
Updates the cross-process userfaultfd test harness to request and decode a snapshot of the helper process’ tracked pages as (state, offset) entries instead of a flat list of offsets. The snapshot is exposed via pageStatesOnce()/handlerPageStates, and the MISSING read/write tests now assert against the sorted union of accessed offsets derived from it.

Reviewed by Cursor Bugbot for commit ca80f8f.

It's only needed by the REMOVE-event handler in the follow-up PR, so
keep it there to avoid the `unused` linter warning in this test-infra
slice.

Comment thread packages/orchestrator/pkg/sandbox/uffd/userfaultfd/helpers_test.go Outdated

It belongs with the REMOVE-event handler in the follow-up PR. handlerPageStates
keeps room for additional state-specific fields without touching call sites.

Drop the hand-rolled `1 + 8` frame size in `pageStatesOnce`. Reading
one `pageStateEntry` at a time with binary.Read keeps the decoder in
lockstep with the producer-side binary.Write; both use the same
fixed-size field layout, so the wire size is never hard-coded.

bitset.Set uses raw byte offsets as bit positions, so a single hugepage
offset (e.g. 7 * 2 MiB) would allocate ~1.8 MB of backing storage in a
test helper. pageStatesOnce already returns faulted sorted and each page
has a single state in pageTracker, so no dedup is needed. Follow-up PRs
adding more per-state fields should sorted-merge them here instead.
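
The ~1.8 MB figure can be checked with a little arithmetic. The sketch below assumes a dense bitset backed by 64-bit words (typical of Go bitset packages, but an assumption about the one used here); `bitsetBytes` is a hypothetical helper, not part of the harness.

```go
package main

import "fmt"

const hugepageSize = 2 << 20 // 2 MiB

// bitsetBytes estimates the backing storage a dense bitset needs to
// address bit index maxBit, assuming it grows in 64-bit words.
func bitsetBytes(maxBit uint64) uint64 {
	words := maxBit/64 + 1
	return words * 8
}

func main() {
	offset := uint64(7 * hugepageSize) // raw byte offset used as a bit index
	fmt.Println(offset)                // 14680064
	fmt.Println(bitsetBytes(offset))   // 1835016, i.e. roughly 1.8 MB
}
```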
@ValentaTomas ValentaTomas enabled auto-merge (squash) April 21, 2026 20:18
@ValentaTomas ValentaTomas merged commit 17aca3c into main Apr 21, 2026
45 checks passed
@ValentaTomas ValentaTomas deleted the feat/uffd-page-state-test-harness branch April 21, 2026 21:49
ValentaTomas added a commit that referenced this pull request Apr 21, 2026
Resolves conflicts after #2472 (test harness split-out) merged into main:
- helpers_test.go: keep handlerPageStates with both faulted and removed
  fields, but adopt main's no-bitset approach via a sorted merge of the
  two already-sorted per-state slices. Carry forward servePause/serveResume
  hooks needed by the REMOVE event tests.
- cross_process_helpers_test.go: keep main's binary.Read decoding loop and
  re-add the case removed branch. Restore HEAD's gated pause/resume
  plumbing on the testHandler.
- userfaultfd.go / prefault.go: convert plain-string fmt.Errorf calls to
  errors.New (perfsprint lint introduced by main).
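
For reference, a sorted merge of two already-sorted per-state slices might look like the sketch below. It is not the code from the referenced commit: `mergeSorted` is a hypothetical name, and the inputs are assumed to be disjoint because each page has a single state in pageTracker.

```go
package main

import "fmt"

// mergeSorted merges two ascending slices into one ascending slice in
// O(len(a)+len(b)) time. It does not deduplicate; the inputs are assumed
// disjoint, so no offset appears in both.
func mergeSorted(a, b []uint64) []uint64 {
	out := make([]uint64, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		if a[i] <= b[j] {
			out = append(out, a[i])
			i++
		} else {
			out = append(out, b[j])
			j++
		}
	}
	// At most one of these tails is non-empty.
	out = append(out, a[i:]...)
	out = append(out, b[j:]...)
	return out
}

func main() {
	faulted := []uint64{0, 8192, 16384}
	removed := []uint64{4096, 12288}
	fmt.Println(mergeSorted(faulted, removed)) // [0 4096 8192 12288 16384]
}
```

Memory use scales with the number of tracked pages rather than with the largest offset, which avoids the bitset sizing problem noted earlier in this thread.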