feat: skip unchanged checkpoints via in-guest eBPF inspector #2586

Open
void-main wants to merge 5 commits into e2b-dev:main from void-main:feat/inspector-skip-unchanged

@void-main

Implements the proposal in #2580.

Adapts the Crab paper's "skip unchanged checkpoints" idea (arXiv 2604.28138) to e2b's Firecracker-microVM model. When the SDK calls POST /sandboxes/{id}/snapshots with the new opt-in skipIfUnchanged: true, the orchestrator consults an in-guest eBPF inspector inside envd and short-circuits to the prior snapshot's IDs (zero pause cost) when nothing recovery-relevant has changed. Default behavior is unchanged.

Why microVMs need a different approach than the paper

Crab's prototype uses runc + CRIU + ZFS containers with eBPF on the host kernel. In a Firecracker microVM, guest syscalls are handled by the guest kernel and never reach the host kernel, so host eBPF cannot see guest VFS activity, file paths, or PIDs. The paper itself cites e2b/Firecracker only as an expensive baseline in Table 1, not as a supported target.

This PR therefore:

  • Runs the eBPF inspector inside the guest (in envd).
  • Drops everything else from the paper that depends on host visibility (partial-component checkpoints, host-scoped scheduler, fast-forward replay, HTTPS MITM).
  • Keeps the part that pays for itself: the skip-vs-full decision.

Architecture

Three slices, all gated behind the inspector-skip-unchanged LaunchDarkly flag and a coordinated envd version bump (0.5.15 → 0.5.16):

Inspector (in-guest, in envd)

  • New Connect-RPC InspectorService with QueryChanges, ResetEpoch, Status.
  • Filesystem net-change tracker. eBPF program attached to 16 raw syscall tracepoints (openat with write/create flags, unlinkat, renameat2, write, pwrite64, writev, pwritev2, truncate, ftruncate, mkdirat, linkat, symlinkat, fallocate, fchmodat, fchownat, openat2). The counter is incremented only when bpf_get_current_cgroup_id() matches a userspace-populated allowlist of envd's user-spawned cgroups.
  • Process / memory net-change tracker. Pure /proc reader: cgroup.procs membership delta + /proc/PID/pagemap soft-dirty bits on survivors, with /proc/PID/clear_refs reset at each epoch. Excludes envd's own PID.
  • Graceful degradation. If the BPF object can't load or soft-dirty isn't supported, QueryChanges reports degraded=true and the orchestrator falls through to a full pause. Correctness preserved at every step.
  • Build is gated by the inspector_bpf build tag so the envd binary still compiles cleanly on hosts without BPF tooling. Pre-compiled BPF objects are embedded via bpf2go; the envd binary itself stays pure-Go.
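
The membership-delta half of the process tracker can be sketched as a set comparison between two reads of cgroup.procs. This is a minimal illustration, not the code in this PR: the function names, the excluded PID, and the sample file contents are all assumptions made for the example.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parseCgroupProcs parses the contents of a cgroup.procs file (one PID per
// line) into a set, skipping the excluded PID (for example envd itself).
// Hypothetical helper, not taken from the PR.
func parseCgroupProcs(contents string, exclude int) (map[int]struct{}, error) {
	pids := make(map[int]struct{})
	sc := bufio.NewScanner(strings.NewReader(contents))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		pid, err := strconv.Atoi(line)
		if err != nil {
			return nil, fmt.Errorf("bad pid %q: %w", line, err)
		}
		if pid != exclude {
			pids[pid] = struct{}{}
		}
	}
	return pids, sc.Err()
}

// membershipDelta compares the previous epoch's PID set with the current one.
// Any added or removed PID counts as a process change; survivors are the PIDs
// whose soft-dirty bits would still need to be inspected.
func membershipDelta(prev, cur map[int]struct{}) (added, removed, survivors []int) {
	for pid := range cur {
		if _, ok := prev[pid]; ok {
			survivors = append(survivors, pid)
		} else {
			added = append(added, pid)
		}
	}
	for pid := range prev {
		if _, ok := cur[pid]; !ok {
			removed = append(removed, pid)
		}
	}
	sort.Ints(added)
	sort.Ints(removed)
	sort.Ints(survivors)
	return added, removed, survivors
}

func main() {
	const envdPID = 100 // assumed PID, for illustration only
	prev, _ := parseCgroupProcs("100\n200\n300\n", envdPID)
	cur, _ := parseCgroupProcs("100\n200\n400\n", envdPID)
	added, removed, survivors := membershipDelta(prev, cur)
	fmt.Println(added, removed, survivors) // [400] [300] [200]
}
```

A non-empty added or removed slice is enough to report a process change, so the more expensive pagemap walk only has to run when membership is identical.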

Orchestrator

  • SandboxCheckpointRequest.skip_if_unchanged and SandboxCheckpointResponse.{unchanged, published_build_id}.
  • Server.Checkpoint short-circuit branch before MarkStopping: opens a Connect-RPC client to the in-guest inspector (750ms cap), and on !degraded && !filesystem_changed && !processes_changed returns Unchanged: true, PublishedBuildId: <prior> immediately.
  • New per-server lastPublishedSnapshot map (in-memory, best-effort; restart forces the next call to fall through).
  • After successful full-path checkpoints, async Inspector.ResetEpoch so subsequent queries measure changes relative to the new baseline.
  • New OrchestratorInspectorCheckpointDecisions counter sliced by decision={skipped,fallthrough,full} for skip-rate dashboards.

API

  • POST /sandboxes/{sandboxID}/snapshots accepts skipIfUnchanged: bool and returns unchanged: bool on SnapshotInfo.
  • New sqlc query GetTemplateIDByBuildID looks up the env_id for the prior build when the orchestrator short-circuits, so the speculative new build can be marked failed and the prior template returned to the caller.

Failure modes (all preserve correctness)

  • Feature flag off → falls through to a full checkpoint.
  • envd version < 0.5.16 → falls through.
  • Inspector RPC error/timeout → falls through.
  • Inspector reports degraded=true → falls through.
  • No prior snapshot recorded for the sandbox → falls through.
  • Orchestrator restart → in-memory map cleared → next call falls through.
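
The failure modes above amount to an ordered gate in which every doubtful case resolves to a full checkpoint. The sketch below is one way to express that; `gateInput`, `fallThroughReason`, and `atLeast` are hypothetical names, and the version comparison is a simplified stand-in rather than the helper added in this PR.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

const minInspectorVersion = "0.5.16"

// parseVersion parses "MAJOR.MINOR.PATCH" (optionally prefixed with "v").
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("unexpected version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("unexpected version %q: %w", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// atLeast reports whether v >= floor, comparing components numerically so
// that 0.5.9 sorts below 0.5.16. Unparsable versions count as too old.
func atLeast(v, floor string) bool {
	a, err := parseVersion(v)
	if err != nil {
		return false
	}
	b, err := parseVersion(floor)
	if err != nil {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return a[i] > b[i]
		}
	}
	return true
}

// gateInput collects everything the decision depends on (hypothetical type).
type gateInput struct {
	Requested        bool
	FlagOn           bool
	EnvdVersion      string
	HasPriorSnapshot bool
	InspectorErr     error
	Degraded         bool
	Changed          bool // filesystem_changed || processes_changed
}

// fallThroughReason returns "" when the short-circuit is allowed, otherwise
// the reason a full checkpoint must run.
func fallThroughReason(in gateInput) string {
	switch {
	case !in.Requested:
		return "skipIfUnchanged not set"
	case !in.FlagOn:
		return "feature flag off"
	case !atLeast(in.EnvdVersion, minInspectorVersion):
		return "envd too old"
	case !in.HasPriorSnapshot:
		return "no prior snapshot"
	case in.InspectorErr != nil:
		return "inspector RPC error or timeout"
	case in.Degraded:
		return "inspector degraded"
	case in.Changed:
		return "changes detected"
	}
	return ""
}

func main() {
	ok := gateInput{Requested: true, FlagOn: true, EnvdVersion: "0.5.16", HasPriorSnapshot: true}
	fmt.Printf("%q\n", fallThroughReason(ok))

	old := ok
	old.EnvdVersion = "0.5.9"
	fmt.Printf("%q\n", fallThroughReason(old))

	failed := ok
	failed.InspectorErr = errors.New("timeout")
	fmt.Printf("%q\n", fallThroughReason(failed))
}
```

Returning a reason string rather than a bare boolean makes it straightforward to attach the cause to a span or log line when a call falls through.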

Rollout

  1. Feature flag default-off. Internal benchmarks on a read_file-heavy workload to measure actual skip rate. Crab reports 87% on Claude-code/Terminal-Bench; ours will differ because Firecracker pause cost and per-snapshot overhead are different.
  2. Default-on for opted-in templates after we observe meaningful skip rates with no recovery regressions.
  3. (Separate RFC, not in this PR) Partial checkpoints — memory-only/rootfs-only artifacts paired into a versioned manifest. That's significantly bigger surgery in Snapshot/upload paths; deferred.

Verification

  • go vet clean across envd, orchestrator, api, shared, db, integration tests, with and without -tags inspector_bpf.
  • envd unit tests: scaffold contract tests + a real-kernel soft-dirty test (skipped if CONFIG_MEM_SOFT_DIRTY is missing).
  • orchestrator: lastSnapshotMap test.
  • Integration tests in tests/integration/.../snapshot_skip_unchanged_test.go:
    • Default omitted field preserves historical behavior.
    • First call against a fresh sandbox cannot short-circuit (orchestrator-logic property; holds even with the inspector loaded).
    • After a real file write between two skip-eligible calls, the second never returns unchanged=true and produces a fresh template id.
    • Wire-shape lock: skipIfUnchanged=false is identical to omitting it.

Tests are permissive on the outcome of the inspector decision (unchanged=true vs false) because the in-guest tracker is only active when envd was built with -tags inspector_bpf and the guest kernel supports BPF + soft-dirty. We assert structural guarantees that hold regardless of tracker state.

Asks for the maintainers

  • Confirmation that e2b's microVM kernel ships with CONFIG_DEBUG_INFO_BTF, CONFIG_KPROBES, CONFIG_MEM_SOFT_DIRTY, CONFIG_BPF_SYSCALL. If not, the inspector ships in degraded mode (no win, no regression) until the kernel is rebuilt.
  • Greenlight on the build pipeline change: developers running make generate/envd need clang≥10 + the vendored libbpf headers in this PR. The compiled .o is checked in, so CI building the envd binary doesn't need clang.
  • Greenlight on the LaunchDarkly flag name (inspector-skip-unchanged) and the envd version-bump pattern.

Closes #2580.

Test plan

  • Build envd with -tags inspector_bpf, confirm Status RPC reports bpf_loaded=true, soft_dirty_supported=true.
  • Build envd without the tag, confirm Status reports degraded_reason="fs tracker disabled...".
  • Run tests/integration/.../snapshot_skip_unchanged_test.go against a self-hosted stack.
  • Sanity-check the new metric: e2b dashboard should show orchestrator_inspector_checkpoint_decisions_total{decision=...} after running the integration tests.
  • Verify make test and make lint are clean across all packages.

🤖 Generated with Claude Code

void-main and others added 5 commits May 6, 2026 14:40
Lands the plumbing for issue e2b-dev#2580: an in-guest InspectorService that
the orchestrator will consult on POST /sandboxes/{id}/snapshots when
the request carries skip_if_unchanged=true. Trackers (filesystem +
process) ship in follow-ups.

This change is fully no-op at runtime:
- QueryChanges always returns degraded=true, so any caller falls
  through to the existing always-pause checkpoint path.
- The orchestrator does not yet consult the inspector; that wiring
  lands with the short-circuit branch.

What's in:
- proto: InspectorService { QueryChanges, ResetEpoch, Status }.
- envd: scaffold service mounted alongside process / filesystem.
- envd version bump 0.5.15 -> 0.5.16; gated via new
  CheckEnvdVersionForInspector helper.
- featureflags: InspectorSkipUnchangedFlag (default false).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lands the filesystem half of the in-guest Inspector. The BPF program
counts modifying syscalls (openat with write/create flags, unlinkat,
renameat2, write, pwrite64, writev, pwritev2, truncate, ftruncate,
mkdirat, linkat, symlinkat, fallocate, fchmodat, fchownat, openat2)
issued by processes in user-spawned cgroups. The userspace tracker
exposes Query / Reset and feeds the Inspector's QueryChanges RPC.

Build is gated by the `inspector_bpf` build tag — without it (or on
non-linux) the package compiles with a stub tracker that always
reports degraded=true. With the tag, the BPF object embedded by
bpf2go is loaded at envd startup; failures (kernel too old, BPF
denied) leave the tracker in degraded mode, which the orchestrator
already treats as "fall through to a full checkpoint". No correctness
regression.

Notes:
- libbpf headers v1.4 vendored under bpf/headers/; CO-RE not used to
  keep the kernel-config bar low (raw syscall tracepoints have stable
  argument layout per Documentation/trace/ftrace.rst).
- bpf2go output is checked in. `go generate` requires clang>=10 +
  the vendored headers; the envd binary build itself stays pure-Go.
- Process / memory tracking lands separately (PR 3).
- Cgroup filter starts empty; envd's process service will populate
  it via Inspector.AddTrackedCgroup once the wiring lands in PR 3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lands the second half of the in-guest Inspector. The new procTracker:
- Reads cgroup.procs from envd's three v2 cgroups
  (user/ptys/socats) for membership delta detection.
- Resets soft-dirty bits via /proc/PID/clear_refs at every epoch.
- Walks /proc/PID/pagemap on survivors, looking for any entry with
  bit 55 (soft-dirty) set, per
  Documentation/admin-guide/mm/soft-dirty.rst; short-circuits on the
  first dirty page.
- Excludes envd's own PID, otherwise its always-dirty heap forces
  every turn to be reported as changed.

Wiring:
- New Config struct passed via Handle(...). main.go computes the
  three cgroup paths from cgroupRoot and the existing
  cgroups.ProcessType{User,...} constants.
- QueryChanges now combines fs and process signals; either reporting
  changed surfaces in the response.
- Status RPC reports SoftDirtySupported and BTFPresent for telemetry.

Tests:
- Membership delta path uses a hand-rolled cgroup.procs file in a
  temp dir. No root needed.
- Soft-dirty path spawns a real child that writes 512 KiB after the
  reset and asserts the tracker fires. Skipped if the kernel lacks
  CONFIG_MEM_SOFT_DIRTY.

Correctness contract preserved: any tracker returning ok=false
propagates as degraded=true at the RPC layer, and the orchestrator
falls through to a full checkpoint.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lands the orchestrator-side short-circuit and the public API plumbing.
With InspectorSkipUnchangedFlag on, a snapshots POST that carries
skipIfUnchanged=true now consults the in-guest InspectorService and
returns the previous snapshot's IDs (with unchanged=true) when nothing
recovery-relevant has changed since then.

orchestrator.proto:
- SandboxCheckpointRequest.skip_if_unchanged
- SandboxCheckpointResponse.{unchanged, published_build_id}

Server.Checkpoint:
- Branch BEFORE MarkStopping: open Connect-RPC client to envd's
  InspectorService, call QueryChanges (750ms cap), short-circuit on
  ok && !degraded && !filesystem_changed && !processes_changed.
- After successful full path: record buildID in lastPublishedSnapshot
  and call ResetEpoch on the resumed sandbox (best-effort, async).
- New per-Server lastPublishedSnapshot map (in-memory; restart forces
  the next call to fall through).

API:
- spec/openapi.yml: skipIfUnchanged on the request, unchanged on the
  response. oapi-codegen output regenerated.
- snapshot_template handler forwards body.SkipIfUnchanged into opts
  and surfaces result.Unchanged in the SnapshotInfo response.
- orchestrator.CreateSnapshotTemplate: when the orchestrator returns
  unchanged=true, mark the speculative new build as failed, look up
  the env_id that owns the prior build via a new sqlc query
  (GetTemplateIDByBuildID), and return that prior template+build to
  the caller.

Failure modes preserve correctness:
- feature flag off, envd version too old, inspector RPC error or
  timeout, no prior snapshot recorded, degraded inspector — all fall
  through to the existing always-pause path.
- cilium/ebpf was already an indirect dep (v0.21.0); no new tools
  required for the build pipeline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds observability and end-to-end coverage for the skipIfUnchanged
checkpoint path landed in the previous PRs.

Metric:
- New CounterType InspectorCheckpointDecisions
  ("orchestrator.inspector.checkpoint.decisions"), sliced by
  decision={skipped,fallthrough,full}. Wired into Server.Checkpoint:
  every call increments "full" exactly once; skip-eligible calls
  also bump "skipped" or "fallthrough" depending on the inspector
  result. Builds the skip-rate dashboard out of the box.
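
A sketch of the counter semantics, reading the text above literally: "full" is bumped on every call, and skip-eligible calls additionally bump one of the other two. The type and method names are hypothetical, and the skip-rate formula (skipped over skip-eligible calls) is an assumption about how a dashboard might use these counters, not something defined in this commit.

```go
package main

import "fmt"

// decisionCounters is a hypothetical in-process stand-in for the
// decision={skipped,fallthrough,full} counter.
type decisionCounters struct {
	full         int
	skipped      int
	fallthroughs int
}

// record applies the increments described in the commit message for one call.
func (c *decisionCounters) record(skipEligible, shortCircuited bool) {
	c.full++ // every call increments "full" exactly once
	if !skipEligible {
		return
	}
	if shortCircuited {
		c.skipped++
	} else {
		c.fallthroughs++
	}
}

// skipRate is an assumed dashboard formula: skipped / skip-eligible calls.
func (c *decisionCounters) skipRate() float64 {
	eligible := c.skipped + c.fallthroughs
	if eligible == 0 {
		return 0
	}
	return float64(c.skipped) / float64(eligible)
}

func main() {
	var c decisionCounters
	c.record(false, false) // plain checkpoint, not skip-eligible
	c.record(true, true)   // skip-eligible, answered without pausing
	c.record(true, false)  // skip-eligible, fell through
	c.record(true, true)   // skip-eligible, answered without pausing
	fmt.Println(c.full, c.skipped, c.fallthroughs) // 4 2 1
	fmt.Printf("%.3f\n", c.skipRate())
}
```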

Spans:
- "inspector.query" wraps the QueryChanges RPC.
- The parent "sandbox-checkpoint" span gets attribute
  "checkpoint.short_circuit=true" when the request is answered
  without pausing.

Integration tests (tests/integration/.../snapshot_skip_unchanged_test.go):
- DefaultOmittedField    — regression guard: omitting skipIfUnchanged
                            preserves historical behavior.
- FirstCallNeverSkips     — orchestrator logic property, holds even
                            with the inspector loaded.
- TwoCallsAfterWrite      — write between calls -> never unchanged.
- RoundTripFieldShape     — locks SnapshotInfo.unchanged as *bool
                            and skipIfUnchanged=false ≡ omitted.

Tests are permissive on the *outcome* of the inspector decision
(unchanged=true vs false) because the in-guest tracker is only
active when envd was built with -tags inspector_bpf and the guest
kernel supports BPF + soft-dirty. We only assert structural
guarantees that hold regardless of tracker state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cla-bot

cla-bot Bot commented May 7, 2026

We require contributors to sign our Contributor License Agreement, and we don't have @void-main on file. You can sign our CLA at https://e2b.dev/docs/cla . Once you've signed, post a comment here that says '@cla-bot check'


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

The scanVMASoftDirty function in proc_tracker_linux.go hardcodes pageSize to 4096 bytes. This logic will fail on Linux kernels configured with larger page sizes, such as the 16 KiB or 64 KiB pages common on ARM64: pagemap offsets are computed incorrectly, and dirty pages can be missed. The system page size should be retrieved at runtime with os.Getpagesize() to ensure correctness across all supported architectures.

Development

Successfully merging this pull request may close these issues.

RFC: skip unchanged checkpoints via in-guest eBPF inspector