feat: skip unchanged checkpoints via in-guest eBPF inspector #2586
Open
void-main wants to merge 5 commits into e2b-dev:main from
Lands the plumbing for issue e2b-dev#2580: an in-guest InspectorService
that the orchestrator will consult on POST /sandboxes/{id}/snapshots
when the request carries skip_if_unchanged=true. Trackers (filesystem +
process) ship in follow-ups.
This change is fully no-op at runtime:
- QueryChanges always returns degraded=true, so any caller falls
through to the existing always-pause checkpoint path.
- The orchestrator does not yet consult the inspector; that wiring
lands with the short-circuit branch.
What's in:
- proto: InspectorService { QueryChanges, ResetEpoch, Status }.
- envd: scaffold service mounted alongside process / filesystem.
- envd version bump 0.5.15 -> 0.5.16; gated via new
CheckEnvdVersionForInspector helper.
- featureflags: InspectorSkipUnchangedFlag (default false).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
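The version gate mentioned in this commit can be pictured as a plain dotted-version comparison against 0.5.16. The sketch below is illustrative only: `envdSupportsInspector` and `parseVersion` are hypothetical names, and the real `CheckEnvdVersionForInspector` helper may have a different signature and use a semver library.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minInspectorVersion is the envd version this PR bumps to.
const minInspectorVersion = "0.5.16"

// parseVersion splits "MAJOR.MINOR.PATCH" (optionally prefixed with "v")
// into three integers.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("unexpected version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("bad component %q in %q: %w", p, v, err)
		}
		out[i] = n
	}
	return out, nil
}

// envdSupportsInspector (hypothetical) reports whether envd is new enough
// to expose the inspector. Unparseable versions count as unsupported so
// the caller falls through to a full checkpoint.
func envdSupportsInspector(envdVersion string) bool {
	got, err := parseVersion(envdVersion)
	if err != nil {
		return false
	}
	want, _ := parseVersion(minInspectorVersion)
	for i := range got {
		if got[i] != want[i] {
			return got[i] > want[i]
		}
	}
	return true
}
```

Treating a parse failure as "too old" keeps the gate on the safe side: the worst outcome is an unnecessary full checkpoint.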
Lands the filesystem half of the in-guest Inspector. The BPF program
counts modifying syscalls (openat with write/create flags, unlinkat,
renameat2, write, pwrite64, writev, pwritev2, truncate, ftruncate,
mkdirat, linkat, symlinkat, fallocate, fchmodat, fchownat, openat2)
issued by processes in user-spawned cgroups. The userspace tracker
exposes Query / Reset and feeds the Inspector's QueryChanges RPC.
Build is gated by the `inspector_bpf` build tag; without it (or on
non-linux) the package compiles with a stub tracker that always reports
degraded=true. With the tag, the BPF object embedded by bpf2go is loaded
at envd startup; failures (kernel too old, BPF denied) leave the tracker
in degraded mode, which the orchestrator already treats as "fall through
to a full checkpoint". No correctness regression.
Notes:
- libbpf headers v1.4 vendored under bpf/headers/; CO-RE not used to
keep the kernel-config bar low (raw syscall tracepoints have stable
argument layout per Documentation/trace/ftrace.rst).
- bpf2go output is checked in. `go generate` requires clang>=10 + the
vendored headers; the envd binary build itself stays pure-Go.
- Process / memory tracking lands separately (PR 3).
- Cgroup filter starts empty; envd's process service will populate it
via Inspector.AddTrackedCgroup once the wiring lands in PR 3.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
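The Query / Reset contract and the degraded stub described in this commit could look roughly like the sketch below. All type and function names are hypothetical, and `readCounter` is only a stand-in for reading the BPF counter map; the actual tracker in the PR may be structured differently.

```go
package main

import "sync"

// fsTracker mirrors the Query / Reset shape described above. ok=false
// means the tracker cannot answer, which the RPC layer would report as
// degraded=true.
type fsTracker interface {
	Query() (changed bool, ok bool)
	Reset()
}

// stubTracker stands in for builds without the inspector_bpf tag:
// it never claims to know anything.
type stubTracker struct{}

func (stubTracker) Query() (bool, bool) { return false, false }
func (stubTracker) Reset()              {}

// counterTracker compares a monotonically increasing counter with the
// value captured at the last Reset.
type counterTracker struct {
	mu          sync.Mutex
	readCounter func() (uint64, error) // hypothetical BPF map read
	baseline    uint64
	healthy     bool
}

func newCounterTracker(read func() (uint64, error)) *counterTracker {
	t := &counterTracker{readCounter: read}
	t.Reset()
	return t
}

// Query reports changed=true if the counter moved since the last Reset.
func (t *counterTracker) Query() (bool, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if !t.healthy {
		return false, false
	}
	cur, err := t.readCounter()
	if err != nil {
		return false, false
	}
	return cur != t.baseline, true
}

// Reset starts a new epoch by recording the current counter value.
func (t *counterTracker) Reset() {
	t.mu.Lock()
	defer t.mu.Unlock()
	cur, err := t.readCounter()
	t.baseline, t.healthy = cur, err == nil
}
```

Comparing against a baseline rather than zeroing the counter avoids a race between the BPF program incrementing and userspace resetting, which is one reason this sketch takes that route.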
Lands the second half of the in-guest Inspector. The new procTracker:
- Reads cgroup.procs from envd's three v2 cgroups
(user/ptys/socats) for membership delta detection.
- Resets soft-dirty bits via /proc/PID/clear_refs at every epoch.
- Walks /proc/PID/pagemap on survivors, looking for any bit-55 set
(per Documentation/admin-guide/mm/soft-dirty.rst); short-circuits
on the first dirty page.
- Excludes envd's own PID, otherwise its always-dirty heap forces
every turn to be reported as changed.
Wiring:
- New Config struct passed via Handle(...). main.go computes the
three cgroup paths from cgroupRoot and the existing
cgroups.ProcessType{User,...} constants.
- QueryChanges now combines fs and process signals; either reporting
changed surfaces in the response.
- Status RPC reports SoftDirtySupported and BTFPresent for telemetry.
Tests:
- Membership delta path uses a hand-rolled cgroup.procs file in a
temp dir. No root needed.
- Soft-dirty path spawns a real child that writes 512 KiB after the
reset and asserts the tracker fires. Skipped if the kernel lacks
CONFIG_MEM_SOFT_DIRTY.
Correctness contract preserved: any tracker returning ok=false
propagates as degraded=true at the RPC layer, and the orchestrator
falls through to a full checkpoint.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
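The two process-tracker signals described in this commit, the cgroup.procs membership delta and the soft-dirty bit in pagemap entries, can be sketched as pure functions. The function names are hypothetical and the sketch works on already-read data (file contents and 64-bit pagemap entries) rather than on /proc directly; only the bit-55 position and the one-PID-per-line format come from the text above.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// softDirtyBit is bit 55 of a /proc/PID/pagemap entry.
const softDirtyBit = uint64(1) << 55

// parseCgroupProcs parses the contents of a cgroup.procs file,
// which lists one PID per line.
func parseCgroupProcs(content string) (map[int]struct{}, error) {
	pids := make(map[int]struct{})
	for _, line := range strings.Split(content, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		pid, err := strconv.Atoi(line)
		if err != nil {
			return nil, fmt.Errorf("bad pid %q: %w", line, err)
		}
		pids[pid] = struct{}{}
	}
	return pids, nil
}

// membershipChanged reports whether any PID appeared or disappeared.
func membershipChanged(prev, cur map[int]struct{}) bool {
	if len(prev) != len(cur) {
		return true
	}
	for pid := range prev {
		if _, ok := cur[pid]; !ok {
			return true
		}
	}
	return false
}

// survivors returns the sorted PIDs present in both epochs, minus
// selfPID, so the tracker's own heap never counts as a change.
func survivors(prev, cur map[int]struct{}, selfPID int) []int {
	var out []int
	for pid := range cur {
		if _, ok := prev[pid]; ok && pid != selfPID {
			out = append(out, pid)
		}
	}
	sort.Ints(out)
	return out
}

// anySoftDirty stops at the first entry with the soft-dirty bit set.
func anySoftDirty(entries []uint64) bool {
	for _, e := range entries {
		if e&softDirtyBit != 0 {
			return true
		}
	}
	return false
}
```

Keeping the parsing and bit tests free of I/O is also what makes a rootless test like the hand-rolled cgroup.procs case possible.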
Lands the orchestrator-side short-circuit and the public API plumbing.
With InspectorSkipUnchangedFlag on, a snapshots POST that carries
skipIfUnchanged=true now consults the in-guest InspectorService and
returns the previous snapshot's IDs (with unchanged=true) when nothing
recovery-relevant has changed since then.
orchestrator.proto:
- SandboxCheckpointRequest.skip_if_unchanged
- SandboxCheckpointResponse.{unchanged, published_build_id}
Server.Checkpoint:
- Branch BEFORE MarkStopping: open Connect-RPC client to envd's
InspectorService, call QueryChanges (750ms cap), short-circuit on
ok && !degraded && !filesystem_changed && !processes_changed.
- After successful full path: record buildID in lastPublishedSnapshot
and call ResetEpoch on the resumed sandbox (best-effort, async).
- New per-Server lastPublishedSnapshot map (in-memory; restart forces
the next call to fall through).
API:
- spec/openapi.yml: skipIfUnchanged on the request, unchanged on the
response. oapi-codegen output regenerated.
- snapshot_template handler forwards body.SkipIfUnchanged into opts
and surfaces result.Unchanged in the SnapshotInfo response.
- orchestrator.CreateSnapshotTemplate: when the orchestrator returns
unchanged=true, mark the speculative new build as failed, look up
the env_id that owns the prior build via a new sqlc query
(GetTemplateIDByBuildID), and return that prior template+build to
the caller.
Failure modes preserve correctness:
- feature flag off, envd version too old, inspector RPC error or
timeout, no prior snapshot recorded, degraded inspector — all fall
through to the existing always-pause path.
- cilium/ebpf was already an indirect dep (v0.21.0); no new tools
required for the build pipeline.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
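The short-circuit condition and its fall-through cases from this commit can be condensed into one decision function. This is a sketch under assumptions: `canSkip`, `queryFunc`, `fixedQuery`, and `blockingQuery` are hypothetical names, and the real `Server.Checkpoint` talks to envd through a generated Connect-RPC client rather than a plain function value.

```go
package main

import (
	"context"
	"errors"
	"time"
)

// inspectorTimeout is the 750ms cap named in the commit message.
const inspectorTimeout = 750 * time.Millisecond

// queryResult mirrors the response fields named in the commit message.
type queryResult struct {
	Degraded          bool
	FilesystemChanged bool
	ProcessesChanged  bool
}

// queryFunc is a hypothetical stand-in for the QueryChanges RPC.
type queryFunc func(ctx context.Context) (queryResult, error)

var errInspector = errors.New("inspector unavailable")

// canSkip returns true only when every precondition holds; any doubt
// falls through to the full checkpoint path.
func canSkip(flagOn, requested, hasPrior bool, query queryFunc) bool {
	if !flagOn || !requested || !hasPrior {
		return false
	}
	ctx, cancel := context.WithTimeout(context.Background(), inspectorTimeout)
	defer cancel()
	res, err := query(ctx)
	if err != nil {
		return false // RPC error or timeout
	}
	return !res.Degraded && !res.FilesystemChanged && !res.ProcessesChanged
}

// fixedQuery returns a queryFunc that always yields the given result.
func fixedQuery(res queryResult, err error) queryFunc {
	return func(context.Context) (queryResult, error) { return res, err }
}

// blockingQuery simulates a hung inspector: it waits for the deadline.
func blockingQuery(ctx context.Context) (queryResult, error) {
	<-ctx.Done()
	return queryResult{}, ctx.Err()
}
```

Every branch that is not a clean, timely, non-degraded answer returns false, which matches the "failure modes preserve correctness" list above.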
Adds observability and end-to-end coverage for the skipIfUnchanged
checkpoint path landed in the previous PRs.
Metric:
- New CounterType InspectorCheckpointDecisions
("orchestrator.inspector.checkpoint.decisions"), sliced by
decision={skipped,fallthrough,full}. Wired into Server.Checkpoint:
every call increments "full" exactly once; skip-eligible calls
also bump "skipped" or "fallthrough" depending on the inspector
result. Builds the skip-rate dashboard out of the box.
Spans:
- "inspector.query" wraps the QueryChanges RPC.
- The parent "sandbox-checkpoint" span gets attribute
"checkpoint.short_circuit=true" when the request is answered
without pausing.
Integration tests (tests/integration/.../snapshot_skip_unchanged_test.go):
- DefaultOmittedField — regression guard: omitting skipIfUnchanged
preserves historical behavior.
- FirstCallNeverSkips — orchestrator logic property, holds even
with the inspector loaded.
- TwoCallsAfterWrite — write between calls -> never unchanged.
- RoundTripFieldShape — locks SnapshotInfo.unchanged as *bool
and skipIfUnchanged=false ≡ omitted.
Tests are permissive on the *outcome* of the inspector decision
(unchanged=true vs false) because the in-guest tracker is only
active when envd was built with -tags inspector_bpf and the guest
kernel supports BPF + soft-dirty. We only assert structural
guarantees that hold regardless of tracker state.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
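The "skipIfUnchanged=false ≡ omitted" property that RoundTripFieldShape locks in follows naturally from an optional pointer field. A minimal sketch, assuming the request body decodes into a `*bool` the way oapi-codegen typically models optional booleans; `snapshotRequest` and `wantsSkip` are hypothetical names, not the generated types.

```go
package main

import "encoding/json"

// snapshotRequest is a hypothetical request body. Only the JSON field
// name comes from the PR.
type snapshotRequest struct {
	SkipIfUnchanged *bool `json:"skipIfUnchanged,omitempty"`
}

// wantsSkip decodes a request body and reports whether the caller opted
// in. A missing field, an explicit null, and an explicit false all mean
// "no".
func wantsSkip(body string) (bool, error) {
	var req snapshotRequest
	if err := json.Unmarshal([]byte(body), &req); err != nil {
		return false, err
	}
	return req.SkipIfUnchanged != nil && *req.SkipIfUnchanged, nil
}
```

Collapsing nil and false into the same answer at one place keeps the historical behavior for clients that never send the field.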
We require contributors to sign our Contributor License Agreement, and we don't have @void-main on file. You can sign our CLA at https://e2b.dev/docs/cla . Once you've signed, post a comment here that says '@cla-bot check'
Code Review
The scanVMASoftDirty function in proc_tracker_linux.go hardcodes pageSize to 4096 bytes. This will fail on Linux kernels configured with larger page sizes, such as the 16 KiB or 64 KiB pages common on ARM64, producing incorrect pagemap offsets and potentially missing dirty pages. The system page size should be retrieved at runtime with os.Getpagesize() to ensure correctness across all supported architectures.
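To make the failure concrete: each pagemap entry is 8 bytes and is indexed by virtual page number, so the file offset depends directly on the page size. The sketch below shows the offset arithmetic with the page size passed in; `pagemapOffset` and `runtimePagemapOffset` are hypothetical helpers, not the PR's `scanVMASoftDirty`.

```go
package main

import "os"

// pagemapEntrySize is the size in bytes of one /proc/PID/pagemap entry.
const pagemapEntrySize = 8

// pagemapOffset returns the byte offset in /proc/PID/pagemap of the
// entry that describes the page containing vaddr.
func pagemapOffset(vaddr uint64, pageSize int) int64 {
	return int64(vaddr/uint64(pageSize)) * pagemapEntrySize
}

// runtimePagemapOffset uses the page size reported by the running
// kernel instead of assuming 4096.
func runtimePagemapOffset(vaddr uint64) int64 {
	return pagemapOffset(vaddr, os.Getpagesize())
}
```

For address 0x10000 the offset is 128 with 4 KiB pages but 8 with 64 KiB pages, so a hardcoded 4096 on a 64 KiB kernel would read entries for entirely different pages.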
Implements the proposal in #2580.
Adapts the Crab paper's "skip unchanged checkpoints" idea (arXiv 2604.28138) to e2b's Firecracker-microVM model. When the SDK calls `POST /sandboxes/{id}/snapshots` with the new opt-in `skipIfUnchanged: true`, the orchestrator consults an in-guest eBPF inspector inside `envd` and short-circuits to the prior snapshot's IDs (zero pause cost) when nothing recovery-relevant has changed. Default behavior is unchanged.
Why microVMs need a different approach than the paper
Crab's prototype uses runc + CRIU + ZFS containers with eBPF on the host kernel. Firecracker microVMs trap guest syscalls to the guest kernel, so host eBPF cannot see guest VFS, file paths, or PIDs. The paper itself cites e2b/Firecracker only as an expensive baseline in Table 1, not as a supported target.
This PR therefore:
- […] `envd`).
Architecture
Three slices, all gated behind the `inspector-skip-unchanged` LaunchDarkly flag and a coordinated envd version bump (0.5.15 → 0.5.16):
Inspector (in-guest, in `envd`)
- `InspectorService` with `QueryChanges`, `ResetEpoch`, `Status`.
- […] `openat` w/ write flags, `unlinkat`, `renameat2`, `write*`, `truncate*`, `mkdirat`, `linkat`, `symlinkat`, `fallocate`, `fchmodat`, `fchownat`, `openat2`). Counter is incremented only when `bpf_get_current_cgroup_id()` matches a userspace-populated allowlist of envd's user-spawned cgroups.
- […] `/proc/PID/pagemap` soft-dirty bits on survivors, with `/proc/PID/clear_refs` reset at each epoch. Excludes envd's own PID.
- […] `QueryChanges` reports `degraded=true` and the orchestrator falls through to a full pause. Correctness preserved at every step.
- […] `inspector_bpf` build tag so the envd binary still compiles cleanly on hosts without BPF tooling. Pre-compiled BPF objects are embedded via `bpf2go`; the envd binary itself stays pure-Go.
Orchestrator
- `SandboxCheckpointRequest.skip_if_unchanged` and `SandboxCheckpointResponse.{unchanged, published_build_id}`.
- `Server.Checkpoint` short-circuit branch before `MarkStopping`: opens a Connect-RPC client to the in-guest inspector (750ms cap), and on `!degraded && !filesystem_changed && !processes_changed` returns `Unchanged: true, PublishedBuildId: <prior>` immediately.
- `lastPublishedSnapshot` map (in-memory, best-effort; restart forces the next call to fall through).
- […] `Inspector.ResetEpoch` so subsequent queries measure changes relative to the new baseline.
- `OrchestratorInspectorCheckpointDecisions` counter sliced by `decision={skipped,fallthrough,full}` for skip-rate dashboards.
API
- `POST /sandboxes/{sandboxID}/snapshots` accepts `skipIfUnchanged: bool` and returns `unchanged: bool` on `SnapshotInfo`.
- `GetTemplateIDByBuildID` looks up the env_id for the prior build when the orchestrator short-circuits, so the speculative new build can be marked failed and the prior template returned to the caller.
Failure modes (all preserve correctness)
- […] `degraded=true` → falls through.
Rollout
- […] `read_file`-heavy workload to measure actual skip rate. Crab reports 87% on Claude-code/Terminal-Bench; ours will differ because Firecracker pause cost and per-snapshot overhead are different.
- […] Snapshot/upload paths; deferred.
Verification
- `go vet` clean across envd, orchestrator, api, shared, db, integration tests, with and without `-tags inspector_bpf`.
- […] `CONFIG_MEM_SOFT_DIRTY` is missing).
- […] `lastSnapshotMap` test.
- `tests/integration/.../snapshot_skip_unchanged_test.go`:
  - […] `unchanged=true` and produces a fresh template id.
  - `skipIfUnchanged=false` is identical to omitting it.
Tests are permissive on the outcome of the inspector decision (`unchanged=true` vs `false`) because the in-guest tracker is only active when envd was built with `-tags inspector_bpf` and the guest kernel supports BPF + soft-dirty. We assert structural guarantees that hold regardless of tracker state.
Asks for the maintainers
- […] `CONFIG_DEBUG_INFO_BTF`, `CONFIG_KPROBES`, `CONFIG_MEM_SOFT_DIRTY`, `CONFIG_BPF_SYSCALL`. If not, the inspector ships in degraded mode (no win, no regression) until the kernel is rebuilt.
- […] `make generate/envd` need clang≥10 + the vendored libbpf headers in this PR. The compiled `.o` is checked in, so CI building the envd binary doesn't need clang.
- […] `inspector-skip-unchanged`) and the envd version-bump pattern.
Closes #2580.
Test plan
- […] `-tags inspector_bpf`, confirm `Status` RPC reports `bpf_loaded=true, soft_dirty_supported=true`.
- […] `Status` reports `degraded_reason="fs tracker disabled..."`.
- […] `tests/integration/.../snapshot_skip_unchanged_test.go` against a self-hosted stack.
- […] e2b dashboard should show `orchestrator_inspector_checkpoint_decisions_total{decision=...}` after running the integration tests.
- `make test` and `make lint` are clean across all packages.
🤖 Generated with Claude Code