nvme_driver: disable keepalive for a specific class of devices #3438
Merged
gurasinghMS merged 5 commits on May 8, 2026
Pull request overview
This PR adds a per-device NVMe keepalive compatibility gate in OpenHCL so that “nd2” (incompatible) NVMe devices are forced to reset across servicing even when NVMe keepalive is enabled VM-wide. It also extends the NVMe fault controller to override PCI vendor/device IDs, enabling an integration test that validates mixed keepalive behavior within a single VM.
Changes:
- Gate NVMe keepalive/save-restore per device by reading PCI vendor/device IDs from sysfs and disabling keepalive for the incompatible ID pair.
- Extend the NVMe fault controller fault configuration to allow overriding PCI vendor/device IDs reported via config space/sysfs.
- Add a VMM test that attaches two NVMe controllers and asserts keepalive is honored for one while downgraded for the other after servicing.
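A minimal sketch of the per-device gate described above follows. It assumes the VM-wide keepalive setting and the cached per-device flag are both plain booleans. `ServicingAction`, `DeviceEntry`, and `servicing_action` are hypothetical names for illustration, not the OpenHCL API.

```rust
/// How a device is treated when OpenHCL is serviced (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ServicingAction {
    /// Save driver state and leave the controller running across servicing.
    KeepAlive,
    /// Shut the controller down so it is reset and re-initialized afterwards.
    Reset,
}

/// Hypothetical per-device entry holding the cached compatibility flag.
pub struct DeviceEntry {
    pub keepalive_compatible: bool,
}

/// Keepalive is honored only when it is enabled VM-wide *and* the device
/// is compatible; every other combination falls back to a reset.
pub fn servicing_action(vm_keepalive_enabled: bool, device: &DeviceEntry) -> ServicingAction {
    if vm_keepalive_enabled && device.keepalive_compatible {
        ServicingAction::KeepAlive
    } else {
        ServicingAction::Reset
    }
}
```

With keepalive enabled VM-wide, a compatible device maps to `KeepAlive` while an incompatible one maps to `Reset`. This is the mixed outcome that the new servicing test checks for.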
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| vmm_tests/vmm_tests/tests/tests/multiarch/openhcl_servicing.rs | Adds a servicing test to validate per-device NVMe keepalive behavior with two controllers. |
| vm/devices/storage/nvme_test/src/pci.rs | Applies hardware-ID overrides (vendor/device) to the emulated PCI config space for NVMe fault controller instances. |
| vm/devices/storage/nvme_resources/src/fault.rs | Introduces HardwareConfigFaultConfig and wires it into FaultConfiguration. |
| openhcl/underhill_core/src/nvme_manager/mod.rs | Adds sysfs vendor/device ID reading and a keepalive compatibility helper. |
| openhcl/underhill_core/src/nvme_manager/manager.rs | Uses per-device compatibility to gate keepalive shutdown behavior and save/restore. |
| openhcl/underhill_core/src/nvme_manager/device.rs | Stores and exposes a keepalive_compatible flag on NvmeDriverManager. |
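In a standard PCI configuration header, the Vendor ID occupies bits 15:0 and the Device ID bits 31:16 of the dword at offset 0x00. The sketch below shows one way an emulated controller could substitute overridden IDs into that dword. `HardwareIdOverride` and `read_config_dword` are hypothetical and do not reflect the actual fields of `HardwareConfigFaultConfig` or the code in `pci.rs`.

```rust
/// Hypothetical stand-in for a hardware-ID override; the real
/// HardwareConfigFaultConfig fields may differ.
#[derive(Debug, Default, Clone, Copy)]
pub struct HardwareIdOverride {
    pub vendor_id: Option<u16>,
    pub device_id: Option<u16>,
}

/// Offset of the config-space dword holding Vendor ID (bits 15:0)
/// and Device ID (bits 31:16).
pub const PCI_ID_OFFSET: u16 = 0x00;

/// Returns what the guest reads from config-space dword `offset`,
/// replacing either half of the ID dword when an override is set.
pub fn read_config_dword(offset: u16, real_value: u32, ov: &HardwareIdOverride) -> u32 {
    if offset != PCI_ID_OFFSET {
        return real_value;
    }
    // Truncating casts extract the low (vendor) and high (device) halves.
    let vendor = ov.vendor_id.unwrap_or(real_value as u16);
    let device = ov.device_id.unwrap_or((real_value >> 16) as u16);
    ((device as u32) << 16) | vendor as u32
}
```

Because sysfs exposes these same config-space values, an override applied here is what the guest-side `vendor` and `device` reads would observe.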
Branch updated from 0b14cdf to 36e0c22
chris-oo approved these changes on May 8, 2026
…ule out a perf hit
chris-oo approved these changes on May 8, 2026
amy-microsoft approved these changes on May 8, 2026
This change is intended to disable keepalive entirely for any device with VendorId = 0x1414 and DeviceId = 0xb111. The check happens in the nvme_manager when the device is first loaded: it reads the vendor ID and device ID from the device's config space, and the manager stores a flag that records whether the device is keepalive compatible, so that the config space is not read repeatedly.
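The load-time check can be sketched as follows. It assumes the Linux sysfs layout, where `/sys/bus/pci/devices/<address>/vendor` and `.../device` each contain a hex string such as `0x1414`. The helper names (`parse_sysfs_id`, `read_pci_ids`, `is_keepalive_compatible`, `DeviceEntry`) and the fail-safe choice to treat unreadable IDs as incompatible are assumptions, not taken from the PR.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Vendor/device ID pair for which keepalive is disabled (per the PR description).
const INCOMPATIBLE_VENDOR_ID: u16 = 0x1414;
const INCOMPATIBLE_DEVICE_ID: u16 = 0xb111;

/// Parses a sysfs ID attribute such as "0x1414\n" (hypothetical helper).
fn parse_sysfs_id(contents: &str) -> Option<u16> {
    let hex = contents.trim();
    let hex = hex.strip_prefix("0x").unwrap_or(hex);
    u16::from_str_radix(hex, 16).ok()
}

/// Reads `vendor` and `device` from a PCI device's sysfs directory.
fn read_pci_ids(dev_dir: &Path) -> io::Result<(u16, u16)> {
    let read = |name: &str| -> io::Result<u16> {
        let s = fs::read_to_string(dev_dir.join(name))?;
        parse_sysfs_id(&s).ok_or_else(|| {
            io::Error::new(io::ErrorKind::InvalidData, format!("bad {name}: {s:?}"))
        })
    };
    Ok((read("vendor")?, read("device")?))
}

/// Only the single incompatible ID pair is excluded.
fn is_keepalive_compatible(vendor_id: u16, device_id: u16) -> bool {
    !(vendor_id == INCOMPATIBLE_VENDOR_ID && device_id == INCOMPATIBLE_DEVICE_ID)
}

/// Hypothetical per-device entry: the flag is computed once at load and cached.
struct DeviceEntry {
    keepalive_compatible: bool,
}

impl DeviceEntry {
    fn load(dev_dir: &Path) -> DeviceEntry {
        // Assumption: if the IDs cannot be read, fail safe and treat the
        // device as incompatible so it is reset rather than kept alive.
        let keepalive_compatible = read_pci_ids(dev_dir)
            .map(|(v, d)| is_keepalive_compatible(v, d))
            .unwrap_or(false);
        DeviceEntry { keepalive_compatible }
    }
}
```

Caching the flag in the entry means later servicing decisions consult a field instead of going back to sysfs.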
It also ensures that a device being restored is not automatically assumed to be keepalive compatible.
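The restore-side behavior can be sketched as below, assuming compatibility is recomputed from the hardware IDs on restore rather than defaulted to `true` because saved state exists. `SavedDevice`, `DeviceEntry`, and the treatment of missing IDs as incompatible are hypothetical.

```rust
/// ID pair for which keepalive is disabled (per the PR description).
const INCOMPATIBLE_IDS: (u16, u16) = (0x1414, 0xb111);

/// Assumption: unreadable IDs (`None`) are treated as incompatible.
fn is_keepalive_compatible(ids: Option<(u16, u16)>) -> bool {
    matches!(ids, Some(pair) if pair != INCOMPATIBLE_IDS)
}

/// Hypothetical saved state; the real OpenHCL saved-state types differ.
#[derive(Debug, Clone, PartialEq)]
struct SavedDevice {
    pci_id: String,
}

#[derive(Debug)]
struct DeviceEntry {
    pci_id: String,
    keepalive_compatible: bool,
}

impl DeviceEntry {
    fn new(pci_id: &str, ids: Option<(u16, u16)>) -> Self {
        Self { pci_id: pci_id.to_string(), keepalive_compatible: is_keepalive_compatible(ids) }
    }

    /// Incompatible devices contribute no saved state; they are reset instead.
    fn save(&self) -> Option<SavedDevice> {
        self.keepalive_compatible
            .then(|| SavedDevice { pci_id: self.pci_id.clone() })
    }

    /// Re-check compatibility instead of hard-coding `true` on restore.
    fn restore(saved: &SavedDevice, ids: Option<(u16, u16)>) -> Self {
        Self {
            pci_id: saved.pci_id.clone(),
            keepalive_compatible: is_keepalive_compatible(ids),
        }
    }
}
```

Routing both the fresh-load and restore paths through the same check keeps the flag consistent regardless of how the device entry was created.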