Skip to content

docs: correct memory-sharing model from #632 measurements#666

Merged
ejc3 merged 2 commits into
mainfrom
work-632-sharing-docs
Jun 12, 2026
Merged

docs: correct memory-sharing model from #632 measurements#666
ejc3 merged 2 commits into
mainfrom
work-632-sharing-docs

Conversation

@ejc3

@ejc3 ejc3 commented Jun 12, 2026

Copy link
Copy Markdown
Owner

Doc/comment-only corrections driven by the measured experiments in #632 (no code behavior change).

Summary

  • track_dirty_pages does NOT erode page-cache sharing — measured in [epic] Hypervisor-agnostic abstraction: support Firecracker AND Cloud Hypervisor (the two microVMs) behind a pluggable trait #632: 3× 1GiB File-backend clones ≈ 230MiB total PSS with tracking ON or OFF (<1MiB apart). Corrected the RestoreParams comment (src/commands/common.rs) and the --no-dirty-tracking help text (src/cli/args.rs), both of which claimed tracking defeats sharing. The flag's real costs: KVM logging overhead + 2MB→4K hugepage split.
  • The UFFD serve/clone path is lazy population, not cross-VM sharing — guest RAM is MAP_ANONYMOUS and UFFDIO_COPY makes a private per-VM copy of each faulted page (verified against the firecracker fork's persist.rs/vstate/memory.rs). Fixed .claude/CLAUDE.md UFFD sections, the --pid help text, and cmd_snapshot_run doc/output wording that said "memory sharing".
  • docs/hypervisor-abstraction.md refreshed with measured results: split shared_memory_clones into file_backed_cow_restore / external_uffd_lazy_restore / internal_uffd_lazy_restore; recorded the CH v52 ARM64 snapshot blocker (GetAarchCoreRegister EINVAL) and CH boot requirements (console=ttyAMA0, image_type=raw, --cpus boot=, P0.5 as guest-channel prerequisite); softened the "byte-for-byte identical vsock / one GuestChannel impl" claim to "same design, live CONNECT parity unverified pending P0.5".

Second commit addresses local codex review findings (stale wording that contradicted the corrected model).

Test Results

$ cargo fmt --check          # clean
$ cargo clippy --all-targets -- -D warnings   # clean

Comment/doc-only; measurements backing the claims: #632 (comment)

Measurements in #632 (3x 1GiB File-backend clones, smaps_rollup) disproved
two long-standing claims:

- track_dirty_pages does NOT erode page-cache sharing: ON vs OFF differed
  by <1MiB total PSS (230MiB for 3 clones either way). The mapping is
  MAP_PRIVATE in both modes; tracking's real costs are KVM logging overhead
  and the 2MB->4K hugepage split. Fixed the RestoreParams comment in
  commands/common.rs and the --no-dirty-tracking help text in cli/args.rs,
  which both claimed tracking defeats sharing.

- The UFFD serve/clone path is lazy population, NOT cross-VM page sharing:
  guest RAM is MAP_ANONYMOUS and UFFDIO_COPY makes a private per-VM copy of
  each faulted page. Fixed the .claude/CLAUDE.md UFFD sections that claimed
  kernel page-cache sharing across UFFD clones.

Also refreshed docs/hypervisor-abstraction.md with the measured results:
split shared_memory_clones into file_backed_cow_restore /
external_uffd_lazy_restore / internal_uffd_lazy_restore, recorded the CH
v52 ARM64 snapshot blocker (GetAarchCoreRegister EINVAL) and the CH boot
requirements (ttyAMA0, image_type=raw, --cpus boot=, P0.5 prerequisite for
guest channels).

Comment/doc-only change; no code behavior altered.

Tested: cargo fmt --check, cargo clippy --all-targets -D warnings (clean)
@claude-claude

claude-claude Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

🔍 Claude Review

SEVERITY: none

Findings

No issues found. This is a well-crafted documentation-only PR that corrects two previously incorrect claims based on actual measurements from #632:

  1. UFFD sharing correction (.claude/CLAUDE.md): Correctly updates the UFFD clone description from "kernel page cache sharing across UFFD clones" to "lazy population with per-VM copies (UFFDIO_COPY)", and distinguishes File-backend CoW sharing (which does work via MAP_PRIVATE page cache) from UFFD behavior.

  2. Dirty tracking correction (src/cli/args.rs, src/commands/common.rs): Removes the incorrect claim that track_dirty_pages defeats page-cache sharing. The updated comments correctly note sharing works in both modes (measured: <1MiB PSS difference).

  3. Design doc refresh (docs/hypervisor-abstraction.md): Splits the monolithic shared_memory_clones capability into three precise capabilities (file_backed_cow_restore / external_uffd_lazy_restore / internal_uffd_lazy_restore), adds measured results, records CH v52 ARM64 snapshot blocker, and documents P1 boot requirements. The capability table, struct definition, hard-problems section, phased plan, and open research items are all internally consistent.

All four changed files are comment/doc-only — no code behavior altered, confirmed by diff inspection.

Summary

Clean documentation correction grounded in empirical measurements. No code, consistency, or accuracy issues found.


Review by Claude | Run

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 767ff6bd05

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

pub diff_snapshots: bool, // FC: true, CH: false
pub file_backed_cow_restore: bool, // FC: true (measured, #632), CH: unverified (snapshot blocked)
pub external_uffd_lazy_restore: bool, // FC: true (fork handshake w/ page_size), CH: false
pub internal_uffd_lazy_restore: bool, // FC: n/a, CH: ondemand (v52; unverified on ARM64)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Gate CH ondemand restore until ARM64 snapshots work

For the ARM64 scope this design is documenting, CH v52 cannot create snapshots (GetAarchCoreRegister EINVAL is recorded later in this same file), so advertising internal_uffd_lazy_restore as a CH capability here can lead the implementation/test matrix to enable a restore path that is known to fail before restore even starts. Please make this capability false or explicitly arch/version-gated until snapshot creation works on ARM64.

Useful? React with 👍 / 👎.

Comment thread .claude/CLAUDE.md
Comment on lines +1214 to +1215
- UFFD clones populate lazily (faulted pages are per-VM copies); File-backend
clones share clean pages via the page cache (measured in #632)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Remove the stale one-copy RAM claim

This section now correctly says UFFD faults become per-VM copies, but the next bullet still claims an original plus two clones use only one VM's RAM (~512MB). For UFFD clones that fault the same working set, memory can grow toward one copy per VM (plus the server's cached snapshot), so agents reading this status note can still repeat the old sharing assumption and under-size clone experiments; scope that number to an unfaulted/lazy case or replace it with the measured File-backend result.

Useful? React with 👍 / 👎.

Local codex review of the previous commit flagged wording that still
contradicted the corrected model:
- args.rs --pid help and snapshot.rs doc/output called the UFFD path
  'memory sharing' — now 'lazy on-demand paging' / 'served on-demand'
- CLAUDE.md clone RAM claim qualified as idle/working-set behavior
- design doc no longer claims byte-for-byte vsock identity / a proven
  single GuestChannel impl — live CONNECT parity is unverified pending
  P0.5 (matches the open research items)

Tested: cargo fmt --check, cargo clippy --all-targets -D warnings (clean)
@ejc3 ejc3 merged commit bc6da9a into main Jun 12, 2026
6 of 17 checks passed
@ejc3 ejc3 deleted the work-632-sharing-docs branch June 12, 2026 06:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant