[wip] emulated SMMU#3458

Draft
jstarks wants to merge 28 commits into microsoft:main from jstarks:smmu

@jstarks
Member

@jstarks jstarks commented May 11, 2026

No description provided.

jstarks added 28 commits May 8, 2026 09:44
Replace the GICv2m MSI controller with KVM's in-kernel GICv3 ITS for
aarch64 PCIe MSI/MSI-X delivery. GICv2m maps MSI writes to a fixed pool
of 64 SPIs, which doesn't scale (a single NVMe device with 128 queues
exhausts it) and is incompatible with the ITS-based device ID model
needed for future SMMU support. The ITS routes MSIs via LPIs using
(DeviceID, EventID) lookup, supporting thousands of interrupt vectors
across all devices.

KVM provides a complete in-kernel ITS (KVM_DEV_TYPE_ARM_VGIC_ITS) that
handles all guest MMIO and command queue processing. The VMM creates the
device, sets its base address, and initializes it. For emulated devices,
MSIs are injected via KVM_SIGNAL_MSI with KVM_MSI_VALID_DEVID. For irqfd
(VFIO passthrough), the kvm_irq_routing_msi entry carries the devid so
the kernel signals the ITS directly.

The main design challenge is that PCIe devices don't know their own
requester ID (bus/device/function), since bus numbers are assigned
dynamically by guest firmware. This is solved with a per-device
AssignedBusRange that the PCIe port updates atomically when the guest
programs secondary/subordinate bus numbers. ITS wrappers (ItsSignalMsi,
ItsIrqFd) compose the full 32-bit device ID as (segment << 16 | BDF) at
interrupt delivery time, transparent to the devices themselves.
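The device ID composition described above can be sketched as follows. This is a minimal illustration, not the PR's code: `BusRangeSketch`, the packed layout, and the `ASSIGNED` flag are hypothetical stand-ins for `AssignedBusRange`, and only the `(segment << 16 | BDF)` formula comes from the commit message.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical stand-in for `AssignedBusRange`: secondary and subordinate
/// bus numbers packed into one atomic word so the port can publish both
/// with a single store.
pub struct BusRangeSketch(AtomicU32);

/// Assumed flag bit marking that the guest has programmed the bus numbers.
const ASSIGNED: u32 = 1 << 16;

impl BusRangeSketch {
    pub fn new() -> Self {
        Self(AtomicU32::new(0))
    }

    /// Called by the port when the guest writes secondary/subordinate.
    pub fn set(&self, secondary: u8, subordinate: u8) {
        let packed = ASSIGNED | (u32::from(secondary) << 8) | u32::from(subordinate);
        self.0.store(packed, Ordering::Release);
    }

    /// Returns the secondary bus, or `None` if nothing is assigned yet.
    fn secondary(&self) -> Option<u8> {
        let v = self.0.load(Ordering::Acquire);
        ((v & ASSIGNED) != 0).then(|| (v >> 8) as u8)
    }

    /// Composes (segment << 16 | BDF) for a function on the secondary bus.
    pub fn compose_device_id(&self, segment: u16, devfn: u8) -> Option<u32> {
        let bus = self.secondary()?;
        let bdf = (u32::from(bus) << 8) | u32::from(devfn);
        Some((u32::from(segment) << 16) | bdf)
    }
}
```

Because the ID is computed at delivery time, a device that signals before the guest assigns bus numbers gets `None` in this sketch; how the real wrappers treat that case is not stated in the commit message.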

The SignalMsi trait changes from `rid: u32` (always passed as 0) to
`devid: Option<u32>`, and IrqFdRoute::enable gains a matching parameter.
This is a mechanical change across all backends (KVM, WHP, MSHV, HVF).

Also adds ACPI IORT (IO Remapping Table) generation for aarch64, with
ITS Group and PCI Root Complex nodes with ID mappings. The MADT gains a
GIC ITS entry. DeviceTree generation emits an ITS child node under the
GIC when ITS is configured, with msi-parent on PCIe host bridges
pointing to the ITS phandle instead of v2m.

ITS support is probed at KVM init time via KVM_CREATE_DEVICE_TEST,
falling back to GICv2m on kernels or hardware without ITS. A --gic-msi
CLI option (auto/its/v2m) allows overriding the default selection.
GICv2m remains available for GICv2-only configurations.
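A rough sketch of the selection logic implied by the `--gic-msi` option follows. The type and function names are hypothetical, `its_supported` stands for the result of the KVM_CREATE_DEVICE_TEST probe, and returning an error for an explicit `its` request on a host without ITS is an assumption rather than something the commit message states.

```rust
/// Value of the `--gic-msi` option (auto/its/v2m).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum GicMsiArg {
    Auto,
    Its,
    V2m,
}

/// MSI controller that ends up in the VM configuration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum GicMsi {
    Its,
    V2m,
}

/// Picks the MSI controller from the CLI choice and the probe result.
pub fn select_gic_msi(arg: GicMsiArg, its_supported: bool) -> Result<GicMsi, String> {
    match (arg, its_supported) {
        // Prefer ITS whenever it is available and not overridden.
        (GicMsiArg::Auto, true) | (GicMsiArg::Its, true) => Ok(GicMsi::Its),
        // Fall back to GICv2m on hosts without ITS, or when asked to.
        (GicMsiArg::Auto, false) | (GicMsiArg::V2m, _) => Ok(GicMsi::V2m),
        // Assumption: an explicit request that cannot be met is an error.
        (GicMsiArg::Its, false) => Err("ITS requested but not supported".to_string()),
    }
}
```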
Add the smmu crate with spec-derived type definitions for the Arm SMMUv3
architecture (IHI 0070). This is the foundation for the SMMU emulator —
all subsequent sub-phases import these types.

The spec module contains:
- registers: MMIO register offsets and bitfield types (IDR0-5, CR0/CR0ACK,
  STRTAB_BASE, CMDQ/EVTQ base/prod/cons, IRQ_CTRL, GERROR, MSI config)
- commands: command queue entry format and per-command bitfield types
  (CFGI_STE, CFGI_CD, TLBI_*, CMD_SYNC with MSI completion)
- events: event queue entry with typed header/flags/address fields and
  convenience constructors for fault events
- ste: stream table entry with typed quadword fields (SteDw0, SteDw1)
  and SteConfig/S1Fmt/Strw enums
- cd: context descriptor with typed quadword fields (CdDw0, CdDw1)
  and Tg0/Ips enums with helper methods
- pt: AArch64 VMSAv8 stage 1 page table descriptor with AP/shareability
  enums and output address extraction for all granule sizes

All types use zerocopy derives for safe guest memory access and
bitfield_struct for field-level access. 65 unit tests verify bitfield
round-trips, spec constant values, and address encoding.

Unknown IPS or TG0 values from the guest are malformed CD entries.
Returning a silent default would cause incorrect translation. Return
Option instead so callers can generate C_BAD_CD fault events.
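The Option-returning decode might look roughly like this. The names (`Ips::from_bits`, `CdFault`, `decode_ips`) are hypothetical; the field encodings follow the standard AArch64 IPS encoding, and treating `0b111` as malformed follows the commit message.

```rust
/// Intermediate physical address size from the context descriptor.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Ips {
    Bits32,
    Bits36,
    Bits40,
    Bits42,
    Bits44,
    Bits48,
    Bits52,
}

impl Ips {
    /// Decodes the 3-bit field; unknown encodings yield `None` rather
    /// than a silent default.
    pub fn from_bits(bits: u8) -> Option<Self> {
        Some(match bits {
            0b000 => Ips::Bits32,
            0b001 => Ips::Bits36,
            0b010 => Ips::Bits40,
            0b011 => Ips::Bits42,
            0b100 => Ips::Bits44,
            0b101 => Ips::Bits48,
            0b110 => Ips::Bits52,
            _ => return None,
        })
    }
}

/// Hypothetical fault type standing in for a C_BAD_CD event.
#[derive(Debug, PartialEq, Eq)]
pub enum CdFault {
    BadCd,
}

/// Caller-side mapping from a failed decode to a fault.
pub fn decode_ips(bits: u8) -> Result<Ips, CdFault> {
    Ips::from_bits(bits).ok_or(CdFault::BadCd)
}
```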
Implement SmmuDevice as a ChipsetDevice with MmioIntercept for the SMMUv3
register file. The device emulates a 128KB MMIO region (page 0 + page 1)
with all registers needed for the Linux SMMUv3 driver's hw_probe and
enable sequence.

IDR registers report: S1P, AArch64 TTF, COHACC, ASID16, MSI, LE endian,
linear stream table, 16-bit SIDSIZE, 40-bit OAS, 4K granule, SMMUv3.3.

CR0/CR0ACK echo protocol with immediate acknowledge. IRQ_CTRL/IRQ_CTRLACK
echo. GBPA with UPDATE bit auto-clear. GERROR/GERRORN toggle protocol.
Stream table base, command queue base/prod/cons, event queue base/prod/cons,
and per-queue MSI config registers all readable and writable.
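For readers unfamiliar with the GERROR/GERRORN toggle protocol, here is a minimal sketch. An error is active while the corresponding bits of the two registers differ: the SMMU toggles its GERROR bit to raise an error, and the guest acknowledges by writing GERRORN so the bits match again. `GerrorSketch` is a hypothetical name and not the PR's register model.

```rust
pub const GERROR_CMDQ_ERR: u32 = 1 << 0;
pub const GERROR_EVTQ_ABT_ERR: u32 = 1 << 2;

#[derive(Default)]
pub struct GerrorSketch {
    gerror: u32,
    gerrorn: u32,
}

impl GerrorSketch {
    /// Current GERROR register value as the guest would read it.
    pub fn gerror(&self) -> u32 {
        self.gerror
    }

    /// Bits that differ between GERROR and GERRORN are active errors.
    pub fn active(&self) -> u32 {
        self.gerror ^ self.gerrorn
    }

    /// Device side: raise an error by toggling its GERROR bit. Returns
    /// false if that error is already active and unacknowledged.
    pub fn raise(&mut self, bit: u32) -> bool {
        if self.active() & bit != 0 {
            return false;
        }
        self.gerror ^= bit;
        true
    }

    /// Guest side: a GERRORN write acknowledges errors whose bits now match.
    pub fn write_gerrorn(&mut self, value: u32) {
        self.gerrorn = value;
    }
}
```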

15 unit tests covering IDR readback, CR0 enable sequence, STRTAB_BASE
round-trip, IRQ_CTRL ACK, GBPA update bit, page 1 access, read-only
register write rejection, MSI config registers, and invalid access sizes.

Implement CMDQ consumption: when the guest writes CMDQ_PROD, the emulator
processes all pending commands up to PROD, advancing CONS. Supported
commands: PREFETCH_CFG, CFGI_STE, CFGI_STE_RANGE, CFGI_CD, CFGI_CD_ALL,
TLBI_NH_ALL, TLBI_NH_ASID, TLBI_NH_VA, TLBI_NH_VAA, TLBI_S12_VMALL,
TLBI_NSNH_ALL (all no-ops for now), and CMD_SYNC.

CMD_SYNC with CS=SIG_IRQ writes MSI data to the MSI address in guest
memory, which is how Linux detects command completion. Unknown opcodes
trigger CERROR_ILL in CMDQ_CONS and toggle GERROR.CMDQ_ERR, halting
further processing until the guest acknowledges the error.
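The consumption loop can be illustrated with a simplified model. This is a sketch under stated assumptions: `CmdqSketch` is hypothetical, a slice of opcodes stands in for the queue in guest memory, only three of the supported opcodes are listed, and CMD_SYNC completion signaling is omitted. PROD and CONS carry an index in the low `log2size` bits plus one wrap bit above it, and the error code is reported in CMDQ_CONS bits [30:24].

```rust
pub const OP_CFGI_STE: u8 = 0x03;
pub const OP_TLBI_NSNH_ALL: u8 = 0x30;
pub const OP_CMD_SYNC: u8 = 0x46;
/// Illegal-command error code reported in CMDQ_CONS.ERR.
pub const CERROR_ILL: u32 = 1;
const CONS_ERR_SHIFT: u32 = 24;

pub struct CmdqSketch {
    log2size: u32,
    prod: u32,
    cons: u32,
    err: u32,
    halted: bool,
}

impl CmdqSketch {
    pub fn new(log2size: u32) -> Self {
        Self { log2size, prod: 0, cons: 0, err: 0, halted: false }
    }

    /// Mask covering the index bits plus the wrap bit.
    fn ptr_mask(&self) -> u32 {
        (1 << (self.log2size + 1)) - 1
    }

    fn idx_mask(&self) -> u32 {
        (1 << self.log2size) - 1
    }

    /// CMDQ_CONS as the guest would read it.
    pub fn cons_register(&self) -> u32 {
        self.cons | (self.err << CONS_ERR_SHIFT)
    }

    /// Guest write to CMDQ_PROD: consume entries until CONS catches up,
    /// or stop at the first unknown opcode.
    pub fn write_prod(&mut self, prod: u32, opcodes: &[u8]) {
        self.prod = prod & self.ptr_mask();
        while !self.halted && self.cons != self.prod {
            let idx = (self.cons & self.idx_mask()) as usize;
            match opcodes[idx] {
                OP_CFGI_STE | OP_TLBI_NSNH_ALL | OP_CMD_SYNC => {
                    self.cons = (self.cons + 1) & self.ptr_mask();
                }
                _ => {
                    // CONS stays on the offending entry until the guest
                    // acknowledges the error (not modeled here).
                    self.err = CERROR_ILL;
                    self.halted = true;
                }
            }
        }
    }
}
```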

7 new unit tests (87 total) covering basic consumption, MSI sync writes,
queue wrapping, unknown opcode error handling, the Linux reset sequence,
error-stops-processing semantics, and disabled-CMDQ behavior.

Implement EVTQ write logic: write_event() appends a 32-byte event record
to the guest's event queue, advances EVTQ_PROD, and fires the EVTQ MSI
interrupt if enabled via IRQ_CTRL.EVENTQ_IRQEN. When the queue is full
(PROD and CONS differ only in the wrap bit), toggles GERROR.EVTQ_ABT_ERR
and drops the event.
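The full condition ("PROD and CONS differ only in the wrap bit") can be written out as a small helper. The function names here are illustrative and not taken from the PR.

```rust
/// Full: same index, different wrap bit.
pub fn queue_full(prod: u32, cons: u32, log2size: u32) -> bool {
    let idx_mask = (1u32 << log2size) - 1;
    let wrap = 1u32 << log2size;
    (prod & idx_mask) == (cons & idx_mask) && (prod & wrap) != (cons & wrap)
}

/// Empty: same index and same wrap bit.
pub fn queue_empty(prod: u32, cons: u32, log2size: u32) -> bool {
    let ptr_mask = (1u32 << (log2size + 1)) - 1;
    (prod & ptr_mask) == (cons & ptr_mask)
}

/// Advances a queue pointer by one entry, carrying into the wrap bit.
pub fn advance(ptr: u32, log2size: u32) -> u32 {
    (ptr + 1) & ((1u32 << (log2size + 1)) - 1)
}
```

The extra wrap bit is what lets a queue of 2^n entries distinguish full from empty without sacrificing a slot.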

EVTQ_CONS on page 1 is updated by the guest to signal consumed events,
freeing queue space.

4 new unit tests (91 total) covering event write and read-back, MSI
signaling, queue full behavior, and CONS freeing space.

Implement sub-phases 1E and 1F of the SMMU emulator plan:

1E: Stream table and context descriptor parsing
- lookup_ste(): reads STE from guest memory, validates V bit, checks SID range
- ste_config_action(): dispatches on STE.Config (abort/bypass/S1 translate)
- lookup_cd(): reads CD from guest memory, validates V and AA64 bits
- translation_context(): extracts page table parameters (TTB0, T0SZ, TG0, IPS)

1F: AArch64 VMSAv8 stage 1 page table walker
- walk_s1(): walks 4K-granule page tables from TTB0 through up to 4 levels
- Supports block descriptors (1GB at L1, 2MB at L2) and page descriptors (L3)
- Permission checking (AP bits for write access)
- Access flag checking (AF=0 produces F_ACCESS fault)
- Output address size checking against IPS/OAS
- Input IOVA range checking against T0SZ
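The address arithmetic behind a 4K-granule walk can be sketched as below. These helpers are hypothetical and not the PR's `walk_s1()`; they only show how an IOVA is split into 9-bit table indices, how the T0SZ range check works, and how a block descriptor's output address combines with the low IOVA bits. The accepted T0SZ range of 16..=39 is an assumption for this sketch.

```rust
/// Index into the table at `level` (0..=3) for a 4K granule:
/// 12 offset bits, then 9 bits per level.
pub fn level_index(iova: u64, level: u8) -> usize {
    assert!(level <= 3);
    let shift = 12 + 9 * (3 - u32::from(level));
    ((iova >> shift) & 0x1ff) as usize
}

/// Input range check: the IOVA must fit in 64 - T0SZ bits.
pub fn iova_in_range(iova: u64, t0sz: u8) -> bool {
    // Assumption: only T0SZ values 16..=39 are accepted here.
    if !(16..=39).contains(&t0sz) {
        return false;
    }
    let va_bits = 64 - u32::from(t0sz);
    (iova >> va_bits) == 0
}

/// Output address for a block or page descriptor found at `level`:
/// descriptor address bits above the block size, IOVA bits below it.
pub fn block_output(desc_addr: u64, iova: u64, level: u8) -> u64 {
    assert!(level <= 3);
    let shift = 12 + 9 * (3 - u32::from(level));
    let offset_mask = (1u64 << shift) - 1;
    (desc_addr & !offset_mask) | (iova & offset_mask)
}
```

With this arithmetic, a level 1 block covers 2^30 bytes (1GB) and a level 2 block covers 2^21 bytes (2MB), matching the sizes listed above.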

14 new unit tests (120 total).

Add FromBytes, IntoBytes, Immutable, KnownLayout derives to PtDesc so
it can be used directly with GuestMemory::read_plain() instead of reading
a raw u64 and converting. Also fix a manual_div_ceil clippy lint.

Store IDR0, IDR1, IDR5 as their proper bitfield types instead of raw u32,
and add Inspect derives to those types. Replace the bespoke
inspect_u32_hex/inspect_u64_hex helpers with #[inspect(hex)] for the
remaining raw integer fields.

Add SmmuSharedState, SmmuTranslatingMemory, and SmmuSignalMsi for
per-device IOVA translation through the SMMUv3 emulator.

SmmuSharedState holds the SMMU configuration (stream table base, enable
state) behind an RwLock, allowing concurrent translations (read path)
while register writes are exclusive. SmmuDevice creates and owns the
shared state, syncing CR0.SMMUEN and STRTAB_BASE/CFG changes to it.

SmmuTranslatingMemory implements GuestMemoryAccess with mapping()=None,
routing all reads/writes through the SMMU page table walker. It derives
the stream ID at translation time from AssignedBusRange (shared with the
ITS wrappers and PCIe port). Page-crossing accesses are split at 4K
boundaries with independent translations per page.
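Splitting an access at 4K boundaries amounts to the following. `split_at_pages` is an illustrative helper, not a function from the PR, and it assumes `iova + len` does not overflow.

```rust
const PAGE_SIZE: u64 = 4096;

/// Splits an access into (address, length) chunks that each stay within
/// one 4K page, so every chunk can be translated independently.
pub fn split_at_pages(iova: u64, len: usize) -> Vec<(u64, usize)> {
    let mut chunks = Vec::new();
    let mut addr = iova;
    let mut remaining = len;
    while remaining > 0 {
        // Bytes left before the next page boundary.
        let room = (PAGE_SIZE - (addr % PAGE_SIZE)) as usize;
        let n = remaining.min(room);
        chunks.push((addr, n));
        addr += n as u64;
        remaining -= n;
    }
    chunks
}
```

Each chunk needs its own translation because adjacent IOVA pages may map to unrelated physical pages, or one of them may fault.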

SmmuSignalMsi translates the MSI address (which may be an IOVA when Linux
maps doorbells via iommu_dma_prepare_msi) before forwarding to the inner
SignalMsi target. Device identity (devid) is passed through unchanged.

The factory method create_device_context() produces paired (GuestMemory,
SmmuSignalMsi) wrappers for each PCI device behind the SMMU.

14 new tests covering translated read/write, page crossing, bypass,
abort, unmapped fault, unassigned bus, disabled SMMU, MSI translation,
MSI bypass, MSI fault, and devid passthrough. 134 total tests pass.

Add IortSmmuV3 struct to acpi_spec for the IORT SMMUv3 node (type 0x04,
revision 4). Extend the IORT builder to optionally insert an SMMUv3 node
between PCI root complexes and the ITS group when smmu_base is configured
on AcpiTablesBuilder.

When SMMU is present: RC → SMMUv3 → ITS Group. The SMMU node has COHACC
set, generic model, zero GSIVs (MSI mode). Each RC's ID mapping targets
the SMMUv3 node with (segment << 16) output_base. The SMMUv3 node's ID
mapping targets the ITS group with a full identity map.

When SMMU is not configured, the existing RC → ITS Group path is unchanged.

Add DEFAULT_SMMU_BASE (0xEFFA_0000) address constant below the ITS region.

Four new IORT tests: smmu+its topology, multi-RC with SMMU, no-SMMU
regression, and SMMUv3 node field verification.

Move smmu_base from AcpiTablesBuilder (arch-neutral) into
AcpiArchConfig::Aarch64 as smmu_bases: Vec<u64>. This correctly
scopes SMMU configuration to aarch64 only, and supports multiple
SMMU instances (each with its own MMIO base). The IORT builder
creates one SMMUv3 node per entry.

Currently all root complexes map to the first SMMU. Per-RC SMMU
assignment can be added when needed.

x86 construction sites no longer need to specify smmu_base: None.

Replace smmu_bases: Vec<u64> with smmu_base: Option<u64> in
AcpiArchConfig::Aarch64. Multiple SMMUs can be added when per-RC
SMMU assignment is actually needed.

Add --smmu CLI flag and wire SmmuDevice into the aarch64 chipset.
When enabled, each PCIe device gets SmmuTranslatingMemory for DMA
and SmmuSignalMsi for MSI address translation. The IORT table
includes the SMMUv3 node in the RC→SMMUv3→ITS chain.

SmmuDevice gets ChangeDeviceState and SaveRestore (not-supported)
impls required by VmmChipsetDevice.

…ase 1J.1)

Add a comprehensive integration-style unit test that exercises the complete
SMMU stack through the MMIO interface, mimicking the Linux SMMUv3 driver
initialization sequence:

1. Probe: read IDR registers, verify feature bits
2. Reset: disable SMMU, program CR1, stream table, CMDQ/EVTQ, enable
   each subsystem in sequence (CMDQEN → EVTQEN → SMMUEN)
3. Command queue: issue CFGI_ALL, TLBI_NSNH_ALL, CFGI_STE, CFGI_CD,
   each followed by CMD_SYNC with MSI completion signaling
4. Attach: configure STE (S1_TRANS) and CD in guest memory, build a
   3-level AArch64 4K page table hierarchy
5. DMA: read/write through SmmuTranslatingMemory at translated IOVAs
6. MSI: fire MSI through SmmuSignalMsi with IOVA-mapped doorbell page,
   verify address translation with intra-page offset
7. Fault: access unmapped IOVA, verify translation fault event in EVTQ
   with correct event type, stream ID, and faulting address

A guest can program CMDQ_BASE.LOG2SIZE and EVTQ_BASE.LOG2SIZE with
values larger than what IDR1.CMDQS/EVENTQS advertise. Without bounds
checking, this allows a malicious guest to force the SMMU to iterate
over an excessively large command queue on each CMDQ_PROD write.

Clamp the effective log2size to the IDR1-advertised maximum. This
matches the CONSTRAINED UNPREDICTABLE behavior real hardware uses
for out-of-range queue sizes.
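The clamp itself is a one-liner; a sketch with hypothetical function names:

```rust
/// Clamps the guest-programmed LOG2SIZE to the maximum advertised in IDR1.
pub fn effective_log2size(programmed: u8, idr1_max: u8) -> u8 {
    programmed.min(idr1_max)
}

/// Number of queue entries the emulator will actually iterate over.
/// Assumes `idr1_max` is below 32 so the shift cannot overflow.
pub fn queue_entries(programmed: u8, idr1_max: u8) -> u32 {
    1u32 << effective_log2size(programmed, idr1_max)
}
```

With an advertised maximum of 8, a guest that programs the largest 5-bit value (31) still gets a 256-entry queue, which bounds the work done per CMDQ_PROD write.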
Hot-plugged PCIe devices were bypassing the SMMU, receiving raw
GuestMemory and only ITS-wrapped SignalMsi. Since the IORT advertises
the root complex behind the SMMU, the guest programs IOVA mappings for
hot-plugged devices too, and DMA with untranslated IOVAs would fail.

Store the Arc<SmmuSharedState> in LoadedVmInner (previously it was a
local variable used only during boot) and use it in the AddPcieDevice
handler to wrap the device's GuestMemory and SignalMsi with
SmmuTranslatingMemory and SmmuSignalMsi, matching the static device
path.

Switch the SMMU from MSI-based interrupt delivery (IDR0.MSI=1) to wired
SPI interrupts (IDR0.MSI=0), matching QEMU's approach. The previous MSI
implementation used guest_memory.write_at(), which silently fails for MMIO
addresses such as ITS doorbells, so EVTQ and GERROR interrupts were never
delivered.

With wired SPIs, the SMMU device gets LineInterrupt objects for EVTQ
(SPI 35) and GERROR (SPI 36) from the chipset builder, and pulses them
directly. CMD_SYNC completion continues to use the guest RAM polling
path (MSIWrite), which is the mechanism Linux uses when IDR0.MSI=0.

The IORT SMMUv3 node now carries populated GSIVs for the event and
gerror interrupts. The device_id_mapping with DEVICEID_VALID is retained
for ITS configurations because Linux's IORT MSI domain resolution
requires it for the RC-to-SMMUv3-to-ITS node traversal.

Also add SMMUv3 to the aarch64 device tree (FDT) for non-ACPI boot.
The DT node uses the arm,smmu-v3 compatible string with eventq/gerror
interrupt entries, and PCIe host bridge nodes get iommu-map properties
linking RIDs to SMMU stream IDs.

When a device behind the SMMU programs its MSI-X table, the MSI address
is an IOVA (the guest's IOMMU driver maps the doorbell page into the
device's IOVA space). The irqfd path programs this address directly into
the kernel's MSI routing table, bypassing SMMU translation entirely.
This means the kernel route would be configured with an untranslated
IOVA instead of the physical GIC/ITS address.

Add SmmuIrqFd and SmmuIrqFdRoute wrappers that translate the MSI address
through the SMMU page tables on IrqFdRoute::enable(), before forwarding
to the inner irqfd route (which may itself be an ITS wrapper). The
composition order is SmmuIrqFd(ItsIrqFd(partition.irqfd())), matching
the userspace SmmuSignalMsi(ItsSignalMsi(...)) chain.
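The wrapper composition can be illustrated with a toy version of the userspace chain. Everything here is a sketch under assumptions: the trait shape (`devid: Option<u32>` first) is guessed from the commit messages, `ItsWrapper` and `SmmuWrapper` are hypothetical stand-ins, the "page table" is a single fixed 4K mapping, and dropping the MSI on a translation fault is an assumption.

```rust
use std::sync::Mutex;

pub trait SignalMsi {
    fn signal_msi(&self, devid: Option<u32>, address: u64, data: u32);
}

/// Innermost target; records what it receives.
pub struct Recorder(pub Mutex<Vec<(Option<u32>, u64, u32)>>);

impl SignalMsi for Recorder {
    fn signal_msi(&self, devid: Option<u32>, address: u64, data: u32) {
        self.0.lock().unwrap().push((devid, address, data));
    }
}

/// ITS layer: fills in the device ID, leaves the address alone.
pub struct ItsWrapper<T> {
    pub inner: T,
    pub device_id: u32,
}

impl<T: SignalMsi> SignalMsi for ItsWrapper<T> {
    fn signal_msi(&self, _devid: Option<u32>, address: u64, data: u32) {
        self.inner.signal_msi(Some(self.device_id), address, data);
    }
}

/// SMMU layer: translates the MSI address, passes devid through.
pub struct SmmuWrapper<T> {
    pub inner: T,
    pub iova_page: u64,
    pub pa_page: u64,
}

impl<T: SignalMsi> SignalMsi for SmmuWrapper<T> {
    fn signal_msi(&self, devid: Option<u32>, address: u64, data: u32) {
        // Toy translation: one 4K page; anything else faults.
        if address & !0xfff == self.iova_page {
            let pa = self.pa_page | (address & 0xfff);
            self.inner.signal_msi(devid, pa, data);
        }
        // Assumption: a faulting MSI is dropped in this sketch.
    }
}
```

Putting the SMMU layer outermost means the address is translated before the ITS layer sees it, so the inner layers only ever deal with physical doorbell addresses.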
…pping

Phase 2 foundation: decouple SMMU stream IDs from PCI segment numbers
and make translation stage support configurable.

- Add SmmuFeatures struct with s1_supported/s2_supported flags that
  control IDR0.S1P and IDR0.S2P advertisement to the guest. This
  allows creating S2-only SMMUs for VFIO scenarios.

- Replace segment: u16 with stream_id_base: u32 in all per-device
  SMMU wrappers (SmmuTranslatingMemory, SmmuSignalMsi, SmmuIrqFd,
  SmmuIrqFdRoute). The SMMU-local stream ID is now computed as
  stream_id_base + bdf, where stream_id_base comes from the IORT
  ID_MAPPING output_base for the root complex. This decouples SMMU
  stream table indexing from PCI segment numbers, enabling multiple
  root complexes to map into different regions of a single SMMU's
  stream table.

- Add AssignedBusRange::compose_stream_id() method for SMMU-specific
  stream ID composition (parallel to compose_device_id for ITS).

- Remove the segment==0 restriction in dispatch.rs — all segments
  behind the SMMU now get translation wrappers, not just segment 0.

- Add tests for non-zero stream_id_base translation, IDR0 feature
  bit configuration (S1-only, S1+S2, S2-only), and compose_stream_id.
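A sketch of the stream ID arithmetic follows. The free function `compose_stream_id` mirrors the name of the method mentioned above, but its signature is hypothetical; `ste_address` is an illustrative helper that assumes a linear stream table with 64-byte entries.

```rust
/// SMMU-local stream ID: stream_id_base + BDF.
pub fn compose_stream_id(stream_id_base: u32, bus: u8, devfn: u8) -> Option<u32> {
    let bdf = (u32::from(bus) << 8) | u32::from(devfn);
    stream_id_base.checked_add(bdf)
}

/// Address of the STE for `sid` in a linear stream table with
/// 2^log2size entries of 64 bytes each.
pub fn ste_address(strtab_base: u64, sid: u32, log2size: u8) -> Option<u64> {
    let limit = 1u64.checked_shl(u32::from(log2size))?;
    if u64::from(sid) >= limit {
        return None; // stream ID outside the table
    }
    Some(strtab_base + u64::from(sid) * 64)
}
```

For example, two root complexes could use bases 0 and 0x10000 so that their devices index disjoint regions of one stream table.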
Replace the single --smmu bool with a repeatable --smmu <rc-name> CLI
arg. Each invocation creates an SMMU instance covering the named PCIe
root complex. The internal config uses SmmuInstanceConfig with rc_name
instead of segment, and the dispatch wiring builds per-port SMMU lookup
maps from the root complex topology.

IORT generation emits one SMMUv3 node per instance. Each root complex's
ID mapping points to its SMMU (if configured) or directly to the ITS.
Device tree generation similarly emits per-SMMU nodes with per-RC
iommu-map entries.

The hotplug path uses pcie_rc_names (parallel to pcie_host_bridges) to
look up the SMMU shared state for dynamically added devices.

SPI allocation for SMMU interrupts: instance N uses vectors (3+N*2) and
(4+N*2), giving each SMMU its own event queue and global error SPIs.
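The vector arithmetic above, written out as a hypothetical helper:

```rust
/// Returns (event queue vector, global error vector) for SMMU instance `n`,
/// following the (3+N*2) and (4+N*2) scheme.
pub fn smmu_irq_vectors(n: u32) -> (u32, u32) {
    (3 + n * 2, 4 + n * 2)
}
```

Consecutive instances therefore get adjacent, non-overlapping pairs: (3, 4), (5, 6), (7, 8), and so on.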
VFIO assigned devices bypass the emulated SMMU's stage 1 page tables
because the host IOMMU uses its own tables (VFIO type1 identity mapping).
Until iommufd nested translation is available, block VFIO devices on root
complexes covered by an S1-capable SMMU at configuration time with a
clear diagnostic.

A set of port names behind S1-capable SMMUs is built during chipset
construction. Before resolving each PCIe device, the resource ID is
checked — if it is "vfio" and the port is in the S1 set, an error is
returned advising the user to move the device, use S2-only mode, or
enable iommufd.
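A minimal sketch of the configuration-time check, assuming the S1 port set is a `HashSet<String>`. The function name and the error text are illustrative only.

```rust
use std::collections::HashSet;

/// Rejects VFIO devices placed on ports behind an S1-capable SMMU.
pub fn check_vfio_placement(
    resource_id: &str,
    port_name: &str,
    s1_ports: &HashSet<String>,
) -> Result<(), String> {
    if resource_id == "vfio" && s1_ports.contains(port_name) {
        return Err(format!(
            "VFIO device on port {port_name} is behind an S1-capable SMMU; \
             move the device, use S2-only mode, or enable iommufd"
        ));
    }
    Ok(())
}
```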
Copilot AI review requested due to automatic review settings May 11, 2026 21:36
Contributor

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR adds support for an emulated SMMUv3 on aarch64 and updates PCIe MSI routing to support GICv3 ITS (device-id based routing) in addition to GICv2m.

Changes:

  • Introduces SMMUv3 emulation (spec types + translation logic) and plumbs per-device bus-range identity to support ITS/SMMU requester/device ID composition.
  • Adds ACPI IORT generation (and DT iommu-map) for PCIe interrupt/DMA remapping; adds MADT ITS entry and backend ITS capability detection (KVM).
  • Updates MSI/irqfd plumbing to carry an optional device identity (devid) end-to-end.

Reviewed changes

Copilot reviewed 70 out of 71 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| vmm_core/vmotherboard/src/lib.rs | Re-exports PCIe bus-range identity type for consumers. |
| vmm_core/vmotherboard/src/chipset/builder/mod.rs | Threads optional PCIe device identity through builder registrations. |
| vmm_core/vmotherboard/src/chipset/backing/arc_mutex/services.rs | Extends PCIe registration API to accept optional device identity. |
| vmm_core/vmotherboard/src/chipset/backing/arc_mutex/pci.rs | Stores and forwards optional device identity during PCIe bus resolution. |
| vmm_core/vmotherboard/src/chipset/backing/arc_mutex/device.rs | Adds builder hook to attach PCIe bus-range identity to devices. |
| vmm_core/vmotherboard/src/base_chipset.rs | Forwards optional device identity into PCIe enumerator device attach. |
| vmm_core/virt_whp/src/synic.rs | Adapts SignalMsi signature to optional device identity. |
| vmm_core/virt_whp/src/lib.rs | Switches to topology-provided MSI controller config; advertises ITS support=false. |
| vmm_core/virt_whp/src/device.rs | Adapts SignalMsi signature to optional device identity. |
| vmm_core/virt_mshv/src/x86_64/mod.rs | Adapts SignalMsi signature to optional device identity. |
| vmm_core/virt_mshv/src/irqfd.rs | Adapts IrqFdRoute::enable signature to accept optional device identity. |
| vmm_core/virt_mshv/src/aarch64/mod.rs | Adapts MSI signaling for new SignalMsi API; advertises ITS support=false. |
| vmm_core/virt_kvm/src/lib.rs | Stores MSI controller config and ITS device FD; prepares KVM backend for ITS. |
| vmm_core/virt_kvm/src/gsi.rs | Plumbs optional devid into KVM irq routing builder path. |
| vmm_core/virt_kvm/src/arch/x86_64/mod.rs | Sets devid=None for x86 MSI routes; adapts SignalMsi signature. |
| vmm_core/virt_kvm/src/arch/aarch64/mod.rs | Probes ITS support, creates in-kernel ITS, adds ITS irqfd/MSI routing support. |
| vmm_core/virt_hvf/src/lib.rs | Advertises ITS support=false. |
| vmm_core/virt/src/x86/apic_software_device.rs | Adapts MSI forwarding to new SignalMsi API. |
| vmm_core/virt/src/generic.rs | Extends PlatformInfo with ITS capability and adapts SignalMsi signature. |
| vmm_core/virt/src/aarch64/gic_v2m.rs | Adapts SignalMsi signature to optional device identity. |
| vmm_core/virt/src/aarch64/gic_software_device.rs | Adapts SignalMsi signature to optional device identity. |
| vmm_core/src/device_builder.rs | Accepts per-device bus-range identity and passes into PCIe device builder. |
| vmm_core/src/acpi_builder.rs | Adds IORT construction + SMMU config, MADT ITS entries, and extensive tests. |
| vm/vmcore/vm_topology/src/processor/aarch64.rs | Replaces gic_v2m with gic_msi controller enum (None/V2m/Its). |
| vm/vmcore/src/irqfd.rs | Extends irqfd route enable API with optional device identity. |
| vm/kvm/src/lib.rs | Adds MSI route devid support and propagates flags into KVM irq routing. |
| vm/devices/virtio/virtio/src/transport/core.rs | Forces access_platform feature bit for virtio devices behind an IOMMU. |
| vm/devices/user_driver_emulated_mock/src/lib.rs | Updates MSI controller mock to ignore device identity. |
| vm/devices/storage/nvme_test/src/tests/test_helpers.rs | Updates MSI test helper to new SignalMsi signature. |
| vm/devices/storage/nvme/src/tests/test_helpers.rs | Updates MSI test helper to new SignalMsi signature. |
| vm/devices/pci/vpci/src/test_helpers/mod.rs | Updates MSI test helper to new SignalMsi signature. |
| vm/devices/pci/pcie/src/switch.rs | Uses port-side-effecting cfg write path; plumbs optional bus-range identity. |
| vm/devices/pci/pcie/src/root.rs | Plumbs optional bus-range identity, ensures port tracks bus-range on cfg writes, adds tests. |
| vm/devices/pci/pcie/src/port.rs | Adds shared assigned-bus-range tracking and cfg-write side effects. |
| vm/devices/pci/pcie/src/lib.rs | Exposes new bus_range + its modules. |
| vm/devices/pci/pcie/src/its.rs | Adds ITS wrappers for SignalMsi and IrqFd that inject device IDs. |
| vm/devices/pci/pcie/src/bus_range.rs | Adds shared atomic bus-range tracking and device/stream ID composition helpers. |
| vm/devices/pci/pcie/fuzz/fuzz_pcie.rs | Updates fuzz harness for new PCIe add-device signature. |
| vm/devices/pci/pcie/Cargo.toml | Adds pal_event dependency for irqfd route wrapper event access. |
| vm/devices/pci/pci_core/src/test_helpers/mod.rs | Updates MSI test helper to new SignalMsi signature. |
| vm/devices/pci/pci_core/src/msi.rs | Updates SignalMsi API; adds route/target helpers to pass optional device identity. |
| vm/devices/pci/pci_core/src/capabilities/msix.rs | Updates MSI-X delivery to new MsiTarget API. |
| vm/devices/iommu/smmu/src/translate.rs | Adds SMMUv3 STE/CD lookup and stage-1 page table walker + tests. |
| vm/devices/iommu/smmu/src/spec/ste.rs | Adds SMMUv3 STE layout/types + tests. |
| vm/devices/iommu/smmu/src/spec/registers.rs | Adds SMMUv3 register offsets/bitfields + tests. |
| vm/devices/iommu/smmu/src/spec/pt.rs | Adds AArch64 stage-1 page table descriptor helpers + tests. |
| vm/devices/iommu/smmu/src/spec/mod.rs | Exposes SMMU spec modules. |
| vm/devices/iommu/smmu/src/spec/events.rs | Adds SMMU event queue entry types + constructors + tests. |
| vm/devices/iommu/smmu/src/spec/commands.rs | Adds SMMU command queue entry types + helpers + tests. |
| vm/devices/iommu/smmu/src/spec/cd.rs | Adds SMMU context descriptor layout/types + tests. |
| vm/devices/iommu/smmu/src/lib.rs | Introduces new smmu crate module surface. |
| vm/devices/iommu/smmu/Cargo.toml | Adds new smmu crate definition + dependencies. |
| vm/acpi_spec/src/madt.rs | Adds MADT GIC ITS structure support. |
| vm/acpi_spec/src/lib.rs | Exposes new ACPI IORT module. |
| vm/acpi_spec/src/iort.rs | Adds IORT node/mapping structures used by ACPI builder. |
| tmk/tmk_vmm/src/run.rs | Updates aarch64 platform config to use gic_msi. |
| openvmm/openvmm_entry/src/lib.rs | Adds CLI/config wiring for GIC MSI controller selection and SMMU instances. |
| openvmm/openvmm_entry/src/cli_args.rs | Adds --gic-msi and --smmu CLI flags for aarch64. |
| openvmm/openvmm_defs/src/config.rs | Adds defaults for ITS/SMMU MMIO layout and SMMU/GIC MSI config structs. |
| openvmm/openvmm_core/src/worker/vm_loaders/linux.rs | Builds DT with ITS and SMMU nodes + iommu-map; passes SMMU configs. |
| openvmm/openvmm_core/src/worker/dispatch.rs | Selects ITS vs v2m, instantiates SMMU devices, wraps per-device MSI/irqfd/memory. |
| openvmm/openvmm_core/Cargo.toml | Adds smmu dependency to OpenVMM core. |
| openhcl/virt_mshv_vtl/src/lib.rs | Updates SignalMsi implementation signature. |
| openhcl/underhill_core/src/loader/mod.rs | Extends loader config to include (placeholder) SMMU base field. |
| openhcl/bootloader_fdt_parser/src/lib.rs | Updates parsed platform config to use gic_msi. |
| Guide/src/reference/emulated/pcie/overview.md | Documents aarch64 MSI routing via ITS vs v2m and the new CLI flag. |
| Guide/src/reference/devices/firmware/linux_direct.md | Updates docs to mention ITS/IORT in ACPI mode for PCIe routing. |
| Cargo.toml | Adds new workspace crate smmu. |
Comments suppressed due to low confidence (4)

vmm_core/src/acpi_builder.rs:1

  • The IORT RC mapping logic uses a global rc_mapping_count and defaults an unmapped RC to its_group_offset even when there is no ITS. If has_smmu == true and has_its == false (and not every RC is covered by an SMMU), RCs without an SMMU will incorrectly map to offset IORT_NODE_OFFSET (which will be the first SMMU node), effectively claiming they are behind the wrong SMMU. Fix by computing the mapping count and target per root complex: emit an RC ID mapping only if that RC has an SMMU offset, or if an ITS is actually present; otherwise set that RC node’s mapping_count to 0 and append no IortIdMapping entry.
vmm_core/src/acpi_builder.rs:1
  • The IORT RC mapping logic uses a global rc_mapping_count and defaults an unmapped RC to its_group_offset even when there is no ITS. If has_smmu == true and has_its == false (and not every RC is covered by an SMMU), RCs without an SMMU will incorrectly map to offset IORT_NODE_OFFSET (which will be the first SMMU node), effectively claiming they are behind the wrong SMMU. Fix by computing the mapping count and target per root complex: emit an RC ID mapping only if that RC has an SMMU offset, or if an ITS is actually present; otherwise set that RC node’s mapping_count to 0 and append no IortIdMapping entry.
vmm_core/src/acpi_builder.rs:1
  • The test suite exercises IORT generation with ITS and with SMMU+ITS, but doesn’t cover the important configuration where has_smmu == true and has_its == false (including the case where only a subset of RCs are covered by SMMUs). Adding tests for “SMMU without ITS” and “partial RC coverage” would catch incorrect RC mapping counts/targets (and would have exposed the current incorrect unwrap_or(its_group_offset) fallback when no ITS exists).
vm/devices/pci/pci_core/src/capabilities/msix.rs:217
  • With the new optional devid plumbing intended for ITS routing, this MSI-X delivery path always signals with devid=None, which prevents identifying the correct PCI function for multi-function devices (where ITS device ID must include the function number). If multi-function endpoints are in scope for ITS mode, consider extending the MSI-X interrupt target state to carry the function’s BDF (or RID) and signaling with signal_msi_with_rid(...) (or passing Some(bdf) down to the ITS wrapper) so the composed ITS device ID is accurate.
    fn deliver(&self) {
        let mut state = self.0.lock();
        if state.enabled {
            state.target.signal_msi(state.address, state.data);
        } else {
            state.pending = true;
        }
    }

Comment on lines +502 to +503
// through its SMMU instance.
node = node.add_u32_array(p_iommu_map, &[0, *phandle, 0, 0x10000])?;
Comment on lines +206 to +225
fn compute_start_level(tg0: Tg0, t0sz: u8) -> Option<(u8, u8)> {
    let va_bits = 64u8.checked_sub(t0sz)?;
    let bits_per_level = tg0.bits_per_level()?;
    let page_shift = tg0.page_shift()?;

    // Number of address bits resolved by the page table walk (excluding page
    // offset). For 4K/9 bits per level: va_bits - 12 bits are resolved by
    // the walk.
    let resolve_bits = va_bits.checked_sub(page_shift)?;

    // Number of full levels needed = ceil(resolve_bits / bits_per_level).
    // Start level = 4 - num_levels (levels are numbered 0..3).
    let num_levels = resolve_bits.div_ceil(bits_per_level);
    if num_levels > 4 {
        return None;
    }
    let start_level = 4 - num_levels;

    Some((start_level, va_bits))
}
Comment on lines 176 to 179
    if state.pending {
-       state.target.signal_msi(0, address, data);
+       state.target.signal_msi(address, data);
        state.pending = false;
    }
@github-actions github-actions Bot added the unsafe Related to unsafe code label May 11, 2026
@github-actions

⚠️ Unsafe Code Detected

This PR modifies files containing unsafe Rust code. Extra scrutiny is required during review.

For more on why we check whole files, instead of just diffs, check out the Rustonomicon

@github-actions github-actions Bot added the Guide label May 11, 2026
