nvme_driver: disable keepalive for a specific class of devices #3438

Merged: gurasinghMS merged 5 commits into microsoft:main from gurasinghMS:no-keepalive-for-nd2-devices on May 8, 2026

@gurasinghMS gurasinghMS commented May 7, 2026

This change disables keepalive entirely for any device with VendorId = 0x1414 and DeviceId = 0xb111. The check happens in the nvme_manager when the device is first loaded: the manager reads the VendorId and DeviceId from the config space and stores a flag that determines keepalive compatibility for the device, so the config space is not read repeatedly.
It also ensures that a device being restored is not automatically assumed to be keepalive compatible.
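
A minimal sketch of the gating described above, not the code from this PR. The names `is_keepalive_compatible`, `DeviceState`, `on_first_load`, and `should_keep_alive` are hypothetical and chosen for illustration; only the ID pair 0x1414 / 0xb111 comes from the description. It assumes the IDs have already been read and shows the flag being computed once and then combined with a VM-wide setting.

```rust
// Hypothetical sketch: names are illustrative and not taken from the PR diff.

/// PCI ID pair that the PR description marks as keepalive-incompatible.
const INCOMPATIBLE_VENDOR_ID: u16 = 0x1414;
const INCOMPATIBLE_DEVICE_ID: u16 = 0xb111;

/// Returns false only when both IDs match the incompatible pair.
pub fn is_keepalive_compatible(vendor_id: u16, device_id: u16) -> bool {
    !(vendor_id == INCOMPATIBLE_VENDOR_ID && device_id == INCOMPATIBLE_DEVICE_ID)
}

/// Per-device state that caches the compatibility result so the IDs
/// are only inspected once, when the device is first loaded.
pub struct DeviceState {
    keepalive_compatible: bool,
}

impl DeviceState {
    /// Computes and stores the flag from the IDs read at load time.
    pub fn on_first_load(vendor_id: u16, device_id: u16) -> Self {
        Self {
            keepalive_compatible: is_keepalive_compatible(vendor_id, device_id),
        }
    }

    /// Keepalive is used only if it is enabled VM-wide AND this device
    /// is compatible; otherwise the device falls back to a reset.
    pub fn should_keep_alive(&self, vm_wide_keepalive: bool) -> bool {
        vm_wide_keepalive && self.keepalive_compatible
    }
}
```

Under these assumptions, a restore path would build the state through the same constructor rather than defaulting the flag to `true`, which matches the intent stated above.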

Copilot AI review requested due to automatic review settings May 7, 2026 23:00
@gurasinghMS gurasinghMS changed the title from "No keepalive for nd2 devices" to "nvme_driver: disable keepalive for a specific class of devices" May 7, 2026
@gurasinghMS gurasinghMS marked this pull request as ready for review May 7, 2026 23:03
@gurasinghMS gurasinghMS requested review from a team as code owners May 7, 2026 23:03
Copilot AI left a comment

Pull request overview

This PR adds a per-device NVMe keepalive compatibility gate in OpenHCL so that “nd2” (incompatible) NVMe devices are forced to reset across servicing even when NVMe keepalive is enabled VM-wide. It also extends the NVMe fault controller to override PCI vendor/device IDs, enabling an integration test that validates mixed keepalive behavior within a single VM.

Changes:

  • Gate NVMe keepalive/save-restore per device by reading PCI vendor/device IDs from sysfs and disabling keepalive for the incompatible ID pair.
  • Extend the NVMe fault controller fault configuration to allow overriding PCI vendor/device IDs reported via config space/sysfs.
  • Add a VMM test that attaches two NVMe controllers and asserts keepalive is honored for one while downgraded for the other after servicing.
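
As a rough illustration of the sysfs read mentioned in the first bullet, the sketch below parses the `vendor` and `device` files that Linux exposes under a PCI device directory (for example `/sys/bus/pci/devices/<address>/`), which contain hex text such as `0x1414`. The function names `parse_pci_id` and `read_pci_ids` are hypothetical, and the PR's own helper may be structured differently.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Parses sysfs-style hex ID text such as "0x1414\n" into a u16.
/// Hypothetical helper; returns None for malformed or out-of-range input.
pub fn parse_pci_id(text: &str) -> Option<u16> {
    let trimmed = text.trim();
    // Accept the value with or without the "0x" prefix.
    let digits = trimmed.strip_prefix("0x").unwrap_or(trimmed);
    u16::from_str_radix(digits, 16).ok()
}

/// Reads (vendor, device) from a PCI device directory in sysfs.
/// Hypothetical helper; the directory is supplied by the caller.
pub fn read_pci_ids(device_dir: &Path) -> io::Result<(u16, u16)> {
    let read_one = |name: &str| -> io::Result<u16> {
        let text = fs::read_to_string(device_dir.join(name))?;
        parse_pci_id(&text).ok_or_else(|| {
            io::Error::new(
                io::ErrorKind::InvalidData,
                format!("unparseable PCI id in {}", name),
            )
        })
    };
    Ok((read_one("vendor")?, read_one("device")?))
}
```

Keeping the parser separate from the file read lets the parsing be unit tested without a real device.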

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| vmm_tests/vmm_tests/tests/tests/multiarch/openhcl_servicing.rs | Adds a servicing test to validate per-device NVMe keepalive behavior with two controllers. |
| vm/devices/storage/nvme_test/src/pci.rs | Applies hardware-ID overrides (vendor/device) to the emulated PCI config space for NVMe fault controller instances. |
| vm/devices/storage/nvme_resources/src/fault.rs | Introduces HardwareConfigFaultConfig and wires it into FaultConfiguration. |
| openhcl/underhill_core/src/nvme_manager/mod.rs | Adds sysfs vendor/device ID reading and a keepalive compatibility helper. |
| openhcl/underhill_core/src/nvme_manager/manager.rs | Uses per-device compatibility to gate keepalive shutdown behavior and save/restore. |
| openhcl/underhill_core/src/nvme_manager/device.rs | Stores and exposes a keepalive_compatible flag on NvmeDriverManager. |

Comment thread openhcl/underhill_core/src/nvme_manager/mod.rs
Comment thread vmm_tests/vmm_tests/tests/tests/multiarch/openhcl_servicing.rs Outdated
Copilot AI review requested due to automatic review settings May 7, 2026 23:22
Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Comment thread openhcl/underhill_core/src/nvme_manager/manager.rs Outdated
Comment thread openhcl/underhill_core/src/nvme_manager/manager.rs
Comment thread openhcl/underhill_core/src/nvme_manager/manager.rs Outdated
Copilot AI review requested due to automatic review settings May 7, 2026 23:39
@gurasinghMS gurasinghMS force-pushed the no-keepalive-for-nd2-devices branch from 0b14cdf to 36e0c22 May 7, 2026 23:42
Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Comment thread openhcl/underhill_core/src/nvme_manager/manager.rs
Comment thread openhcl/underhill_core/src/nvme_manager/manager.rs Outdated
Comment thread openhcl/underhill_core/src/nvme_manager/manager.rs Outdated
github-actions Bot commented May 8, 2026

Copilot AI review requested due to automatic review settings May 8, 2026 21:51
Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Comment thread openhcl/underhill_core/src/nvme_manager/mod.rs
Comment thread openhcl/underhill_core/src/nvme_manager/manager.rs
@gurasinghMS gurasinghMS enabled auto-merge (squash) May 8, 2026 22:08
@gurasinghMS gurasinghMS merged commit 47303dc into microsoft:main May 8, 2026
69 checks passed
4 participants