Skip to content

Address open review feedback from #7942#1

Closed
Talador12 wants to merge 18 commits into
rhakobyan:kvm_sev_snp_revampedfrom
Talador12:7942-review-fixes
Closed

Address open review feedback from #7942#1
Talador12 wants to merge 18 commits into
rhakobyan:kvm_sev_snp_revampedfrom
Talador12:7942-review-fixes

Conversation

@Talador12
Copy link
Copy Markdown

Summary

Small fixes for open review threads on cloud-hypervisor#7942.

Changes

  • igvm_loader.rs: comment explaining the hypervisor match is exhaustive via #[cfg]-gated variants (CookieComputing thread)
  • igvm/mod.rs: TODO for hardcoded IsolationType::Snp (CookieComputing thread)
  • quality.yaml: add mshv + kvm + igvm clippy target for untested cfg paths (CookieComputing thread)

Talador12 and others added 18 commits April 10, 2026 12:16
Add support for guest_memfd (available in Linux kernel v6.8+), which
enables private memory for confidential VMs.

Key changes:
- Introduce UserMemoryRegion abstraction with guest_memfd fields
- Add From impls between kvm_userspace_memory_region2 and UserMemoryRegion
- Convert all KVM memory region operations from kvm_userspace_memory_region
  to kvm_userspace_memory_region2, with automatic fallback to v1 when
  guest_memfd is not supported
- Add set_user_memory_region() wrapper that dispatches to v1/v2 based on
  kvm_guest_memfd_supported capability
- Create guest_memfd via KVM_CREATE_GUEST_MEMFD ioctl when supported
- Extend KvmDirtyLogSlot to preserve region2 fields across dirty log
  start/stop cycles

This is prerequisite infrastructure for KVM-based confidential computing
(SEV-SNP, TDX) that requires private guest memory backed by guest_memfd.

Co-authored-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Keith Adler <kadler@cloudflare.com>
Signed-off-by: Ruben Hakobyan <hruben@meta.com>
Previously, the payload validation rejected an IGVM file combined with
a kernel or firmware.

Relax this constraint to allow an IGVM carrying a firmware (e.g Oak
stage0) to be paired with a separate kernel image.

This enables fw_cfg-style boot where stage0 loads a kernel provided
through fw_cfg rather than embedded in the IGVM file itself.

Signed-off-by: Ruben Hakobyan <hruben@meta.com>
When we use igvm + kvm, we setup the regs and sregs using the cpuid
page. We still need to setup the fpu in configure_vcpu.

Co-authored-by: Keith Adler <kadler@cloudflare.com>
Signed-off-by: Keith Adler <kadler@cloudflare.com>
Co-authored-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Ruben Hakobyan <hruben@meta.com>
The SNP guest policy (AMD SEV-SNP ABI bits controlling SMT, migration,
debug, etc.) was previously hardcoded inside the MSHV implementation.
Widen Vm::sev_snp_init() to accept an SnpPolicy parameter so each
hypervisor backend receives the policy at init time.

Add get_default_sev_snp_guest_policy() in the VMM to construct the
default policy.

Co-authored-by: Keith Adler <kadler@cloudflare.com>
Signed-off-by: Keith Adler <kadler@cloudflare.com>
Co-authored-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Ruben Hakobyan <hruben@meta.com>
Change configure_system to take an option<GuestAddress>
since rsdp is wrapped into an option anyways (we use configure
system to setup the mptables).

Co-authored-by: Keith Adler <kadler@cloudflare.com>
Signed-off-by: Keith Adler <kadler@cloudflare.com>
Co-authored-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Ruben Hakobyan <hruben@meta.com>
The load_payload and load_payload_async functions previously received a
sev_snp_enabled flag to decide whether to call load_igvm with or
without the host_data parameter. Replace this with a single code path
that always passes host_data behind a cfg(feature = "sev_snp") gate,
removing the runtime branch and the extra parameter threaded through
three call sites.

Co-authored-by: Keith Adler <kadler@cloudflare.com>
Signed-off-by: Keith Adler <kadler@cloudflare.com>
Co-authored-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Ruben Hakobyan <hruben@meta.com>
Move IGVM file parsing from load_igvm() into a dedicated parse_igvm()
helper in igvm/mod.rs, and parse the file upfront in Vm::new() so the
resulting IgvmFile struct is available throughout VM initialization.

This is a prerequisite for extracting VMSA SEV features from the parsed
IGVM before issuing KVM_SEV_INIT2, which needs sev_features.

Signed-off-by: Ruben Hakobyan <hruben@meta.com>
Introduce the SevFd abstraction that wraps /dev/sev and implements the
KVM_SEV_INIT2 and KVM_SEV_SNP_LAUNCH_START ioctls for SEV-SNP VM
initialization on KVM.

Key changes:
- Add sev.rs with KvmSevInit and KvmSevSnpLaunchStart ioctl structs
  matching the kernel layout (linux/arch/x86/include/uapi/asm/kvm.h)
- Implement KVM_SEV_INIT2 and KVM_SEV_SNP_LAUNCH_START ioctls
- Set KVM_MEMORY_ATTRIBUTE_PRIVATE on newly created memory regions
  when guest_memfd is supported
- Widen SevSnpPageAccessProxy cfg gates from mshv-only to all
  sev_snp-enabled builds
- Make sev_snp_init a required trait method (remove default impl)
- Include KVM_SEV_SNP_LAUNCH_START in the seccomp allowlist
- Parse VMSA SEV features from IGVM and include them in the
  KVM_SEV_INIT2 ioctl

Co-authored-by: Keith Adler <kadler@cloudflare.com>
Signed-off-by: Keith Adler <kadler@cloudflare.com>
Co-authored-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Alex Orozco <aorozco@google.com>
Co-developed-by: Rob Bradford <rbradford@meta.com>
Signed-off-by: Rob Bradford <rbradford@meta.com>
Signed-off-by: Ruben Hakobyan <hruben@meta.com>
Implement the KVM_SEV_SNP_LAUNCH_UPDATE ioctl.

Extend Vm::import_isolated_pages() with a uaddrs parameter carrying
host virtual addresses, which KVM needs, unlike MSHV. Compute uaddrs
from guest memory mappings in the IGVM loader.

Add KVM_SEV_SNP_LAUNCH_UPDATE to the seccomp allowlist.

Co-authored-by: Keith Adler <kadler@cloudflare.com>
Signed-off-by: Keith Adler <kadler@cloudflare.com>
Co-authored-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Ruben Hakobyan <hruben@meta.com>
Add the KVM_SEV_SNP_LAUNCH_FINISH ioctl, which finalizes the SNP
launch sequence and transitions the VM into a runnable encrypted
state.

Additionally, add KVM_SEV_SNP_LAUNCH_FINISH to the seccomp allowlist.

Co-authored-by: Keith Adler <kadler@cloudflare.com>
Signed-off-by: Keith Adler <kadler@cloudflare.com>
Co-authored-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Ruben Hakobyan <hruben@meta.com>
SEV-SNP guests will issue this hypercall to signal a change in the page
encryption status to the hypervisor.

Handle VcpuExit::Hypercall in the KVM vCPU run loop: decode the GPA,
page count, and private/shared attribute from the hypercall arguments,
then call KVM_SET_MEMORY_ATTRIBUTES to update the page state.

Co-authored-by: Keith Adler <kadler@cloudflare.com>
Signed-off-by: Keith Adler <kadler@cloudflare.com>
Co-authored-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Ruben Hakobyan <hruben@meta.com>
During SNP boot all guest RAM is initially marked
KVM_MEMORY_ATTRIBUTE_PRIVATE. Pages imported via SNP_LAUNCH_UPDATE are
properly accepted by the guest, but generic RAM pages (e.g. the AP
trampoline at GPA 0xD000) are not. When stage0 on the BSP starts
secondary vCPUs via x2APIC, the APs try to execute from the trampoline
page through the shared mapping while KVM still has it marked private,
causing a KVM_EXIT_MEMORY_FAULT (flags=KVM_MEMORY_EXIT_FLAG_PRIVATE)
that previously fell through to the catch-all error, killing the VM.

Handle VcpuExit::MemoryFault by toggling the page's memory attribute
between private and shared based on the exit flags, allowing the vCPU
to retry the access.

Signed-off-by: Ruben Hakobyan <hruben@meta.com>
Adapt the IGVM loader to work with both MSHV and KVM backends, which
differ in page type constants, CPUID page layout, and VMSA handling.

Abstract page types into a PageTypeConfig struct populated at runtime
from the detected hypervisor, replacing hardcoded mshv_bindings constants.

Apply the VMSA register state to each vCPU via setup_sev_snp_regs(),
translating SevSelector attributes to KVM segment format using a bitfield
decoder.

KVM's SNP launch path sanitizes certain CPUID bits that could lead to
an insecure guest. If the VMM sets these bits, KVM rejects the CPUID
page import on the first attempt, requiring a retry with the
firmware-corrected values.

Pre-clear the known problematic bits before import to avoid the
reject-and-retry cycle:

- Leaf 0x1, ECX bit 24: TSC_DEADLINE (filtered by KVM)
- Leaf 0x7, EBX bit 1: SGX (filtered by KVM)
- Leaf 0x7, EDX: clear entirely (contains speculative features)
- Leaf 0x80000008, EBX bit 25: filtered by KVM
- Leaf 0x80000021, ECX: clear entirely

This keeps the CPUID page stable across launch updates and avoids
noisy error logs from the retry path.

Co-authored-by: Keith Adler <kadler@cloudflare.com>
Signed-off-by: Keith Adler <kadler@cloudflare.com>
Co-authored-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Alex Orozco <aorozco@google.com>
Co-authored-by: Dylan Reid <dgreid@fb.com>
Signed-off-by: Dylan Reid <dgreid@fb.com>
Signed-off-by: Ruben Hakobyan <hruben@meta.com>
A bootloader/firmware (e.g. stage0) and the VMSA page require dedicated
memory regions at fixed GPAs.

Add reserve_region_for_stage0() to allocate these regions before IGVM
loading begins:
- Stage0 at GPA 0xffc0_0000 (4 MB)
- VMSA page at GPA 0xffff_ffff_f000 (4 KB)

These reservations are KVM-only; MSHV handles stage0/VMSA placement
through its own isolated import path.

Also add fw_cfg device creation and SYS_statx to the vCPU seccomp
allowlist (needed by stage0's file access pattern).

Co-authored-by: Keith Adler <kadler@cloudflare.com>
Signed-off-by: Keith Adler <kadler@cloudflare.com>
Co-authored-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Ruben Hakobyan <hruben@meta.com>
The Linux x86 boot protocol defines the setup area as
(setup_sects + 1) * 512 bytes. Previously we exported only the
boot_params buffer (4096 bytes), which is wrong for kernels with
setup_sects >= 8 where the actual setup area exceeds boot_params.

Truncate or extend the existing buffer to the correct setup_sects-
derived length, reading any extra bytes directly from the kernel file.
This avoids an extra allocation in the common case (setup_sects <= 7)
and matches QEMU's fw_cfg_add_kernel() behavior in
hw/i386/x86-common.c.

Signed-off-by: Dylan Reid <dgreid@fb.com>
Boot-time block devices on PCI segment 0 use 32-bit BARs so early
firmware can access them without additional identity mapping in the
firmware page tables. However, hot-plugged block devices are only ever
seen by the OS kernel which handles 64-bit BARs natively.

Switch hot-plugged block devices to 64-bit BARs to avoid exhausting the
scarce 32-bit MMIO window (typically 2-3 GB between RAM and 4 GB) when
many devices are hot-plugged.

Extract the BAR sizing decision into use_64bit_bar_for_virtio_device()
and thread an is_hotplug flag through add_virtio_pci_device(). Add unit
tests covering all relevant combinations.

Signed-off-by: Dylan Reid <dgreid@fb.com>
Add build and clippy jobs for kvm+sev_snp+igvm+fw_cfg feature combination

Signed-off-by: Keith Adler <kadler@cloudflare.com>
Signed-off-by: Ruben Hakobyan <hruben@meta.com>
- igvm_loader: add comment explaining the hypervisor match is exhaustive
  at compile time via #[cfg]-gated HypervisorType variants
- igvm/mod.rs: add TODO noting IsolationType::Snp is hardcoded and
  should be parameterized when TDX IGVM support lands
- quality.yaml: add mshv + kvm + igvm clippy target to cover the
  #[cfg(all(feature = "igvm", feature = "mshv"))] code paths

Signed-off-by: Keith Adler <kadler@cloudflare.com>
@rhakobyan
Copy link
Copy Markdown
Owner

Hi @Talador12, thanks a lot for making the changes -- I have squashed the relevant parts into the upstream PR commits and they're now all been merged. Closing this PR.

@rhakobyan rhakobyan closed this Apr 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants