docs: correct memory-sharing model from #632 measurements#666
Conversation
Measurements in #632 (3x 1GiB File-backend clones, smaps_rollup) disproved two long-standing claims: - track_dirty_pages does NOT erode page-cache sharing: ON vs OFF differed by <1MiB total PSS (230MiB for 3 clones either way). The mapping is MAP_PRIVATE in both modes; tracking's real costs are KVM logging overhead and the 2MB->4K hugepage split. Fixed the RestoreParams comment in commands/common.rs and the --no-dirty-tracking help text in cli/args.rs, which both claimed tracking defeats sharing. - The UFFD serve/clone path is lazy population, NOT cross-VM page sharing: guest RAM is MAP_ANONYMOUS and UFFDIO_COPY makes a private per-VM copy of each faulted page. Fixed the .claude/CLAUDE.md UFFD sections that claimed kernel page-cache sharing across UFFD clones. Also refreshed docs/hypervisor-abstraction.md with the measured results: split shared_memory_clones into file_backed_cow_restore / external_uffd_lazy_restore / internal_uffd_lazy_restore, recorded the CH v52 ARM64 snapshot blocker (GetAarchCoreRegister EINVAL) and the CH boot requirements (ttyAMA0, image_type=raw, --cpus boot=, P0.5 prerequisite for guest channels). Comment/doc-only change; no code behavior altered. Tested: cargo fmt --check, cargo clippy --all-targets -D warnings (clean)
🔍 Claude ReviewSEVERITY: none FindingsNo issues found. This is a well-crafted documentation-only PR that corrects two previously incorrect claims based on actual measurements from #632:
All four changed files are comment/doc-only — no code behavior altered, confirmed by diff inspection. SummaryClean documentation correction grounded in empirical measurements. No code, consistency, or accuracy issues found. Review by Claude | Run |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 767ff6bd05
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| pub diff_snapshots: bool, // FC: true, CH: false | ||
| pub file_backed_cow_restore: bool, // FC: true (measured, #632), CH: unverified (snapshot blocked) | ||
| pub external_uffd_lazy_restore: bool, // FC: true (fork handshake w/ page_size), CH: false | ||
| pub internal_uffd_lazy_restore: bool, // FC: n/a, CH: ondemand (v52; unverified on ARM64) |
There was a problem hiding this comment.
Gate CH ondemand restore until ARM64 snapshots work
For the ARM64 scope this design is documenting, CH v52 cannot create snapshots (GetAarchCoreRegister EINVAL is recorded later in this same file), so advertising internal_uffd_lazy_restore as a CH capability here can lead the implementation/test matrix to enable a restore path that is known to fail before restore even starts. Please make this capability false or explicitly arch/version-gated until snapshot creation works on ARM64.
Useful? React with 👍 / 👎.
| - UFFD clones populate lazily (faulted pages are per-VM copies); File-backend | ||
| clones share clean pages via the page cache (measured in #632) |
There was a problem hiding this comment.
Remove the stale one-copy RAM claim
This section now correctly says UFFD faults become per-VM copies, but the next bullet still claims an original plus two clones use only one VM's RAM (~512MB). For UFFD clones that fault the same working set, memory can grow toward one copy per VM (plus the server's cached snapshot), so agents reading this status note can still repeat the old sharing assumption and under-size clone experiments; scope that number to an unfaulted/lazy case or replace it with the measured File-backend result.
Useful? React with 👍 / 👎.
Local codex review of the previous commit flagged wording that still contradicted the corrected model: - args.rs --pid help and snapshot.rs doc/output called the UFFD path 'memory sharing' — now 'lazy on-demand paging' / 'served on-demand' - CLAUDE.md clone RAM claim qualified as idle/working-set behavior - design doc no longer claims byte-for-byte vsock identity / a proven single GuestChannel impl — live CONNECT parity is unverified pending P0.5 (matches the open research items) Tested: cargo fmt --check, cargo clippy --all-targets -D warnings (clean)
Doc/comment-only corrections driven by the measured experiments in #632 (no code behavior change).
Summary
track_dirty_pagesdoes NOT erode page-cache sharing — measured in [epic] Hypervisor-agnostic abstraction: support Firecracker AND Cloud Hypervisor (the two microVMs) behind a pluggable trait #632: 3× 1GiB File-backend clones ≈ 230MiB total PSS with tracking ON or OFF (<1MiB apart). Corrected theRestoreParamscomment (src/commands/common.rs) and the--no-dirty-trackinghelp text (src/cli/args.rs), both of which claimed tracking defeats sharing. The flag's real costs: KVM logging overhead + 2MB→4K hugepage split.persist.rs/vstate/memory.rs). Fixed.claude/CLAUDE.mdUFFD sections, the--pidhelp text, andcmd_snapshot_rundoc/output wording that said "memory sharing".docs/hypervisor-abstraction.mdrefreshed with measured results: splitshared_memory_clonesintofile_backed_cow_restore/external_uffd_lazy_restore/internal_uffd_lazy_restore; recorded the CH v52 ARM64 snapshot blocker (GetAarchCoreRegisterEINVAL) and CH boot requirements (console=ttyAMA0,image_type=raw,--cpus boot=, P0.5 as guest-channel prerequisite); softened the "byte-for-byte identical vsock / one GuestChannel impl" claim to "same design, live CONNECT parity unverified pending P0.5".Second commit addresses local codex review findings (stale wording that contradicted the corrected model).
Test Results
Comment/doc-only; measurements backing the claims: #632 (comment)