Address open review feedback from #7942#1
Closed
Talador12 wants to merge 18 commits into
Closed
Conversation
Add support for guest_memfd (available in Linux kernel v6.8+), which enables private memory for confidential VMs. Key changes: - Introduce UserMemoryRegion abstraction with guest_memfd fields - Add From impls between kvm_userspace_memory_region2 and UserMemoryRegion - Convert all KVM memory region operations from kvm_userspace_memory_region to kvm_userspace_memory_region2, with automatic fallback to v1 when guest_memfd is not supported - Add set_user_memory_region() wrapper that dispatches to v1/v2 based on kvm_guest_memfd_supported capability - Create guest_memfd via KVM_CREATE_GUEST_MEMFD ioctl when supported - Extend KvmDirtyLogSlot to preserve region2 fields across dirty log start/stop cycles This is prerequisite infrastructure for KVM-based confidential computing (SEV-SNP, TDX) that requires private guest memory backed by guest_memfd. Co-authored-by: Alex Orozco <aorozco@google.com> Signed-off-by: Keith Adler <kadler@cloudflare.com> Signed-off-by: Ruben Hakobyan <hruben@meta.com>
Previously, the payload validation rejected an IGVM file combined with a kernel or firmware. Relax this constraint to allow an IGVM carrying a firmware (e.g Oak stage0) to be paired with a separate kernel image. This enables fw_cfg-style boot where stage0 loads a kernel provided through fw_cfg rather than embedded in the IGVM file itself. Signed-off-by: Ruben Hakobyan <hruben@meta.com>
When we use igvm + kvm, we setup the regs and sregs using the cpuid page. We still need to setup the fpu in configure_vcpu. Co-authored-by: Keith Adler <kadler@cloudflare.com> Signed-off-by: Keith Adler <kadler@cloudflare.com> Co-authored-by: Alex Orozco <aorozco@google.com> Signed-off-by: Alex Orozco <aorozco@google.com> Signed-off-by: Ruben Hakobyan <hruben@meta.com>
The SNP guest policy (AMD SEV-SNP ABI bits controlling SMT, migration, debug, etc.) was previously hardcoded inside the MSHV implementation. Widen Vm::sev_snp_init() to accept an SnpPolicy parameter so each hypervisor backend receives the policy at init time. Add get_default_sev_snp_guest_policy() in the VMM to construct the default policy. Co-authored-by: Keith Adler <kadler@cloudflare.com> Signed-off-by: Keith Adler <kadler@cloudflare.com> Co-authored-by: Alex Orozco <aorozco@google.com> Signed-off-by: Alex Orozco <aorozco@google.com> Signed-off-by: Ruben Hakobyan <hruben@meta.com>
Change configure_system to take an option<GuestAddress> since rsdp is wrapped into an option anyways (we use configure system to setup the mptables). Co-authored-by: Keith Adler <kadler@cloudflare.com> Signed-off-by: Keith Adler <kadler@cloudflare.com> Co-authored-by: Alex Orozco <aorozco@google.com> Signed-off-by: Alex Orozco <aorozco@google.com> Signed-off-by: Ruben Hakobyan <hruben@meta.com>
The load_payload and load_payload_async functions previously received a sev_snp_enabled flag to decide whether to call load_igvm with or without the host_data parameter. Replace this with a single code path that always passes host_data behind a cfg(feature = "sev_snp") gate, removing the runtime branch and the extra parameter threaded through three call sites. Co-authored-by: Keith Adler <kadler@cloudflare.com> Signed-off-by: Keith Adler <kadler@cloudflare.com> Co-authored-by: Alex Orozco <aorozco@google.com> Signed-off-by: Alex Orozco <aorozco@google.com> Signed-off-by: Ruben Hakobyan <hruben@meta.com>
Move IGVM file parsing from load_igvm() into a dedicated parse_igvm() helper in igvm/mod.rs, and parse the file upfront in Vm::new() so the resulting IgvmFile struct is available throughout VM initialization. This is a prerequisite for extracting VMSA SEV features from the parsed IGVM before issuing KVM_SEV_INIT2, which needs sev_features. Signed-off-by: Ruben Hakobyan <hruben@meta.com>
Introduce the SevFd abstraction that wraps /dev/sev and implements the KVM_SEV_INIT2 and KVM_SEV_SNP_LAUNCH_START ioctls for SEV-SNP VM initialization on KVM. Key changes: - Add sev.rs with KvmSevInit and KvmSevSnpLaunchStart ioctl structs matching the kernel layout (linux/arch/x86/include/uapi/asm/kvm.h) - Implement KVM_SEV_INIT2 and KVM_SEV_SNP_LAUNCH_START ioctls - Set KVM_MEMORY_ATTRIBUTE_PRIVATE on newly created memory regions when guest_memfd is supported - Widen SevSnpPageAccessProxy cfg gates from mshv-only to all sev_snp-enabled builds - Make sev_snp_init a required trait method (remove default impl) - Include KVM_SEV_SNP_LAUNCH_START in the seccomp allowlist - Parse VMSA SEV features from IGVM and include them in the KVM_SEV_INIT2 ioctl Co-authored-by: Keith Adler <kadler@cloudflare.com> Signed-off-by: Keith Adler <kadler@cloudflare.com> Co-authored-by: Alex Orozco <aorozco@google.com> Signed-off-by: Alex Orozco <aorozco@google.com> Co-developed-by: Rob Bradford <rbradford@meta.com> Signed-off-by: Rob Bradford <rbradford@meta.com> Signed-off-by: Ruben Hakobyan <hruben@meta.com>
Implement the KVM_SEV_SNP_LAUNCH_UPDATE ioctl. Extend Vm::import_isolated_pages() with a uaddrs parameter carrying host virtual addresses, which KVM needs, unlike MSHV. Compute uaddrs from guest memory mappings in the IGVM loader. Add KVM_SEV_SNP_LAUNCH_UPDATE to the seccomp allowlist. Co-authored-by: Keith Adler <kadler@cloudflare.com> Signed-off-by: Keith Adler <kadler@cloudflare.com> Co-authored-by: Alex Orozco <aorozco@google.com> Signed-off-by: Alex Orozco <aorozco@google.com> Signed-off-by: Ruben Hakobyan <hruben@meta.com>
Add the KVM_SEV_SNP_LAUNCH_FINISH ioctl, which finalizes the SNP launch sequence and transitions the VM into a runnable encrypted state. Additionally, add KVM_SEV_SNP_LAUNCH_FINISH to the seccomp allowlist. Co-authored-by: Keith Adler <kadler@cloudflare.com> Signed-off-by: Keith Adler <kadler@cloudflare.com> Co-authored-by: Alex Orozco <aorozco@google.com> Signed-off-by: Alex Orozco <aorozco@google.com> Signed-off-by: Ruben Hakobyan <hruben@meta.com>
SEV-SNP guests will issue this hypercall to signal a change in the page encryption status to the hypervisor. Handle VcpuExit::Hypercall in the KVM vCPU run loop: decode the GPA, page count, and private/shared attribute from the hypercall arguments, then call KVM_SET_MEMORY_ATTRIBUTES to update the page state. Co-authored-by: Keith Adler <kadler@cloudflare.com> Signed-off-by: Keith Adler <kadler@cloudflare.com> Co-authored-by: Alex Orozco <aorozco@google.com> Signed-off-by: Alex Orozco <aorozco@google.com> Signed-off-by: Ruben Hakobyan <hruben@meta.com>
During SNP boot all guest RAM is initially marked KVM_MEMORY_ATTRIBUTE_PRIVATE. Pages imported via SNP_LAUNCH_UPDATE are properly accepted by the guest, but generic RAM pages (e.g. the AP trampoline at GPA 0xD000) are not. When stage0 on the BSP starts secondary vCPUs via x2APIC, the APs try to execute from the trampoline page through the shared mapping while KVM still has it marked private, causing a KVM_EXIT_MEMORY_FAULT (flags=KVM_MEMORY_EXIT_FLAG_PRIVATE) that previously fell through to the catch-all error, killing the VM. Handle VcpuExit::MemoryFault by toggling the page's memory attribute between private and shared based on the exit flags, allowing the vCPU to retry the access. Signed-off-by: Ruben Hakobyan <hruben@meta.com>
Adapt the IGVM loader to work with both MSHV and KVM backends, which differ in page type constants, CPUID page layout, and VMSA handling. Abstract page types into a PageTypeConfig struct populated at runtime from the detected hypervisor, replacing hardcoded mshv_bindings constants. Apply the VMSA register state to each vCPU via setup_sev_snp_regs(), translating SevSelector attributes to KVM segment format using a bitfield decoder. KVM's SNP launch path sanitizes certain CPUID bits that could lead to an insecure guest. If the VMM sets these bits, KVM rejects the CPUID page import on the first attempt, requiring a retry with the firmware-corrected values. Pre-clear the known problematic bits before import to avoid the reject-and-retry cycle: - Leaf 0x1, ECX bit 24: TSC_DEADLINE (filtered by KVM) - Leaf 0x7, EBX bit 1: SGX (filtered by KVM) - Leaf 0x7, EDX: clear entirely (contains speculative features) - Leaf 0x80000008, EBX bit 25: filtered by KVM - Leaf 0x80000021, ECX: clear entirely This keeps the CPUID page stable across launch updates and avoids noisy error logs from the retry path. Co-authored-by: Keith Adler <kadler@cloudflare.com> Signed-off-by: Keith Adler <kadler@cloudflare.com> Co-authored-by: Alex Orozco <aorozco@google.com> Signed-off-by: Alex Orozco <aorozco@google.com> Co-authored-by: Dylan Reid <dgreid@fb.com> Signed-off-by: Dylan Reid <dgreid@fb.com> Signed-off-by: Ruben Hakobyan <hruben@meta.com>
A bootloader/firmware (e.g. stage0) and the VMSA page require dedicated memory regions at fixed GPAs. Add reserve_region_for_stage0() to allocate these regions before IGVM loading begins: - Stage0 at GPA 0xffc0_0000 (4 MB) - VMSA page at GPA 0xffff_ffff_f000 (4 KB) These reservations are KVM-only; MSHV handles stage0/VMSA placement through its own isolated import path. Also add fw_cfg device creation and SYS_statx to the vCPU seccomp allowlist (needed by stage0's file access pattern). Co-authored-by: Keith Adler <kadler@cloudflare.com> Signed-off-by: Keith Adler <kadler@cloudflare.com> Co-authored-by: Alex Orozco <aorozco@google.com> Signed-off-by: Alex Orozco <aorozco@google.com> Signed-off-by: Ruben Hakobyan <hruben@meta.com>
The Linux x86 boot protocol defines the setup area as (setup_sects + 1) * 512 bytes. Previously we exported only the boot_params buffer (4096 bytes), which is wrong for kernels with setup_sects >= 8 where the actual setup area exceeds boot_params. Truncate or extend the existing buffer to the correct setup_sects- derived length, reading any extra bytes directly from the kernel file. This avoids an extra allocation in the common case (setup_sects <= 7) and matches QEMU's fw_cfg_add_kernel() behavior in hw/i386/x86-common.c. Signed-off-by: Dylan Reid <dgreid@fb.com>
Boot-time block devices on PCI segment 0 use 32-bit BARs so early firmware can access them without additional identity mapping in the firmware page tables. However, hot-plugged block devices are only ever seen by the OS kernel which handles 64-bit BARs natively. Switch hot-plugged block devices to 64-bit BARs to avoid exhausting the scarce 32-bit MMIO window (typically 2-3 GB between RAM and 4 GB) when many devices are hot-plugged. Extract the BAR sizing decision into use_64bit_bar_for_virtio_device() and thread an is_hotplug flag through add_virtio_pci_device(). Add unit tests covering all relevant combinations. Signed-off-by: Dylan Reid <dgreid@fb.com>
Add build and clippy jobs for kvm+sev_snp+igvm+fw_cfg feature combination Signed-off-by: Keith Adler <kadler@cloudflare.com> Signed-off-by: Ruben Hakobyan <hruben@meta.com>
- igvm_loader: add comment explaining the hypervisor match is exhaustive at compile time via #[cfg]-gated HypervisorType variants - igvm/mod.rs: add TODO noting IsolationType::Snp is hardcoded and should be parameterized when TDX IGVM support lands - quality.yaml: add mshv + kvm + igvm clippy target to cover the #[cfg(all(feature = "igvm", feature = "mshv"))] code paths Signed-off-by: Keith Adler <kadler@cloudflare.com>
59d73c9 to
8ab8332
Compare
Owner
|
Hi @Talador12, thanks a lot for making the changes -- I have squashed the relevant parts into the upstream PR commits and they're now all been merged. Closing this PR. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Small fixes for open review threads on cloud-hypervisor#7942.
Changes
igvm_loader.rs: comment explaining the hypervisor match is exhaustive via#[cfg]-gated variants (CookieComputing thread)igvm/mod.rs: TODO for hardcodedIsolationType::Snp(CookieComputing thread)quality.yaml: addmshv + kvm + igvmclippy target for untested cfg paths (CookieComputing thread)