[linux-nvidia-6.17-next] CXL VFIO: Add CXL Type-2 device passthrough support #407
JiandiAnNVIDIA wants to merge 51 commits
On platforms newer than QM_HW_V3, the configuration region for the live migration function of the accelerator device is no longer placed in the VF; it is placed in the PF instead. The configuration region of the live migration function therefore needs to be opened when the QM driver is loaded, and the driver needs to clear this configuration when it is unloaded.

Signed-off-by: Longfang Liu <liulongfang@huawei.com>
Reviewed-by: Shameer Kolothum <shameerkolothum@gmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/20251030015744.131771-2-liulongfang@huawei.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 4868d2d)
Signed-off-by: Jiandi An <jan@nvidia.com>
On platforms newer than QM_HW_V3, the migration region has been relocated from the VF to the PF. The VF's own configuration space is restored to the complete 64KB, so there is no longer any need to split the BAR configuration space into two equal halves. The driver should be modified accordingly to adapt to the new hardware.

On the older hardware platform QM_HW_V3, the live migration configuration region is placed in the latter 32K portion of the VF's BAR2 configuration space.

On the new hardware platform QM_HW_V4, the live migration configuration region also exists in the same 32K area immediately following the VF's BAR2, just like on QM_HW_V3. However, access to this region is now controlled by hardware. Additionally, a copy of the live migration configuration region is present in the PF's BAR2 configuration space.

On QM_HW_V4, when an older version of the driver is loaded, it behaves as on QM_HW_V3 and uses the configuration region in the VF, so the live migration function continues to work normally. When the new version of the driver is loaded, it directly uses the configuration region in the PF. Meanwhile, hardware configuration disables the live migration configuration region in the VF's BAR2: reads return all 0xF values, and writes are silently ignored.

Signed-off-by: Longfang Liu <liulongfang@huawei.com>
Reviewed-by: Shameer Kolothum <shameerkolothum@gmail.com>
Link: https://lore.kernel.org/r/20251030015744.131771-3-liulongfang@huawei.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 2131c15)
Signed-off-by: Jiandi An <jan@nvidia.com>
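The layout rules in this commit message can be summarised in a small, standalone C sketch. This is only an illustration under stated assumptions: the `demo_*` names, the enum values, and the `new_driver` flag are hypothetical and are not taken from the hisi_acc_vfio_pci driver. Only the 64KB and 32K sizes and the VF/PF placement come from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real driver defines its own version constants. */
enum demo_hw_ver { DEMO_QM_HW_V3, DEMO_QM_HW_V4 };

#define DEMO_VF_BAR2_SIZE  (64u * 1024u) /* complete VF BAR2 configuration space */
#define DEMO_MIG_REGION_SZ (32u * 1024u) /* live migration configuration region */

struct demo_bar2_layout {
	size_t vf_usable_size;   /* bytes of VF BAR2 left for the VF itself */
	bool   mig_region_in_pf; /* true when the driver uses the PF copy */
};

/*
 * Pick the layout for a given hardware version and driver generation.
 * An old driver on V4 hardware behaves exactly as on V3.
 */
struct demo_bar2_layout demo_bar2_layout(enum demo_hw_ver ver, bool new_driver)
{
	struct demo_bar2_layout l;

	assert(ver == DEMO_QM_HW_V3 || ver == DEMO_QM_HW_V4);

	if (ver == DEMO_QM_HW_V4 && new_driver) {
		/* Migration region lives in the PF; the VF keeps all 64KB. */
		l.vf_usable_size = DEMO_VF_BAR2_SIZE;
		l.mig_region_in_pf = true;
	} else {
		/* Migration region occupies the latter 32K of the VF's BAR2. */
		l.vf_usable_size = DEMO_VF_BAR2_SIZE - DEMO_MIG_REGION_SZ;
		l.mig_region_in_pf = false;
	}
	return l;
}
```

For example, `demo_bar2_layout(DEMO_QM_HW_V4, false)` models the old driver on new hardware and reports the same split layout as QM_HW_V3.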
PR Validation Report

Patchscan ✅ No Missing Fixes
All cherry-picked commits checked; no missing upstream fixes found.

PR Lint ✅ All checks passed
Checking 51 commits...

Cherry-pick digest:

| Local | Referenced upstream / Patch subject | Patch-ID | Subject | SoB chain |
|---|---|---|---|---|
| aef7e33d0e74 | [SAUCE] config: enable config_vfio_cxl_core for cxl type-2 passt | N/A | N/A | jan |
| e8c83314f4be | [SAUCE] vfio/cxl: implement vfio_cxl_reset() | N/A | N/A | mhonap, jan |
| 37fca8563f93 | [SAUCE] vfio/cxl: virtualize dvsec status2 register in vconfig s | N/A | N/A | mhonap, jan |
| 15ef3e96a36c | [SAUCE] vfio/cxl: preserve hdm decoder base addresses across res | N/A | N/A | mhonap, jan |
| 8c92d195d1d0 | [SAUCE] vfio/cxl: ensure pci memory space is enabled before post | N/A | N/A | mhonap, jan |
| e5183b49c784 | [SAUCE] vfio/pci: wire cxl dpa reset handling | N/A | N/A | mhonap, jan |
| d53532800534 | [SAUCE] cxl: export the cxl reset helpers for vfio users | N/A | N/A | mhonap, jan |
| 646f12a6a8f8 | docs: vfio-pci: document cxl type-2 device passthrough | match | found | ok, backporter: jan |
| 5bc0b3ea82af | vfio/cxl: provide opt-out for cxl feature | match | found | ok, backporter: jan |
| 534faac8aa63 | vfio/pci: advertise cxl cap and sparse component bar to userspac | match | found | ok, backporter: jan |
| 24dd6678d476 | vfio/cxl: register regions with vfio layer | match | found | ok, backporter: jan |
| 1447b99bccc3 | vfio/cxl: virtualize cxl dvsec config writes | noted | found | ok, backporter: jan |
| 05b9195202cc | vfio/cxl: dpa vfio region with demand fault mmap and reset zap | match | found | ok, backporter: jan |
| fb580ac046d5 | vfio/cxl: cxl region management support | match | found | ok, backporter: jan |
| d64c61c2ba91 | vfio/cxl: wait for hdm ranges and create memdev | match | found | ok, backporter: jan |
| ad3979839aba | vfio/cxl: introduce hdm decoder register emulation framework | match | found | ok, backporter: jan |
| 0fbd7b2effd7 | vfio/pci: export config access helpers | match | found | ok, backporter: jan |
| 84fbfbcead31 | vfio/cxl: detect cxl dvsec and probe hdm block | match | found | ok, backporter: jan |
| cb87876e8e1d | vfio/pci: add config_vfio_cxl_core and stub cxl hooks | match | found | ok, backporter: jan |
| de3e1a60a995 | vfio/pci: add cxl state to vfio_pci_core_device | noted | found | ok, backporter: jan |
| 05c1da9d786a | vfio: uapi for cxl-capable pci device assignment | match | found | ok, backporter: jan |
| d3141453f48b | cxl: record bir and bar offset in cxl_register_map | noted | found | ok, backporter: jan |
| d0fde9879972 | cxl: split cxl_await_range_active() from media-ready wait | noted | found | ok, backporter: jan |
| 199d5d2f2ca4 | cxl: move component/hdm register defines to uapi/cxl/cxl_regs.h | noted | found | ok, backporter: jan |
| e02c1b7ac02a | cxl: declare cxl_find_regblock and cxl_probe_component_regs in p | noted | found | ok, backporter: jan |
| fd317b86093e | cxl: add cxl_get_hdm_info() for hdm decoder metadata | match | found | ok, backporter: jan |
| 54d50bbc6111 | 56c069307dfd vfio: Remove the get_region_info op | match | match | preserved + jan added |
| 21085759fbcd | dc10734610e2 vfio: Move the remaining drivers to get_region_info | match | match | preserved + jan added |
| c0ad388ba741 | 182c62861ba5 vfio/platform: Convert to get_region_info_caps | match | match | preserved + jan added |
| 2bf5a2cbb154 | 1b0ecb5baf4a vfio/pci: Convert all PCI drivers to get_region_inf | match | match | preserved + jan added |
| bc1c993e783d | 973af0c40eaf vfio/ccw: Convert to get_region_info_caps | match | match | preserved + jan added |
| 0282af066b10 | 93165757c023 vfio/gvt: Convert to get_region_info_caps | match | match | preserved + jan added |
| 29e1217fd909 | 45f9fa18109d vfio/mbochs: Convert mbochs to use vfio_info_add_ca | match | match | preserved + jan added |
| 7dd77b841190 | 775f726a742a vfio: Add get_region_info_caps op | match | match | preserved + jan added |
| e7da10685f7f | f97859503859 vfio: Require drivers to implement get_region_info | match | match | preserved + jan added |
| 6c250ce18f9e | e664067b6035 vfio/gvt: Provide a get_region_info op | match | match | preserved + jan added |
| 76b5171d117d | 61b3f7b5a729 vfio/ccw: Provide a get_region_info op | match | match | preserved + jan added |
| 619333df0ce8 | b9827eff6b4a vfio/cdx: Provide a get_region_info op | match | match | preserved + jan added |
| 8ba94bf6a94e | 6cdae5d0c326 vfio/fsl: Provide a get_region_info op | match | match | preserved + jan added |
| 073f13c17982 | d4635df279f5 vfio/platform: Provide a get_region_info op | match | match | preserved + jan added |
| 554dca9a1de1 | 8339fccda837 vfio/mbochs: Provide a get_region_info op | match | match | preserved + jan added |
| 0fbfd736592c | cf16acc0af09 vfio/mdpy: Provide a get_region_info op | match | match | preserved + jan added |
| 4df20815cb64 | 078775527109 vfio/mtty: Provide a get_region_info op | match | match | preserved + jan added |
| e54b8e086acd | f3fddb71dd50 vfio/pci: Fill in the missing get_region_info ops | match | match | preserved + jan added |
| 702622746ce4 | 5ac720647477 vfio/nvgrace: Convert to the get_region_info op | match | match | preserved + jan added |
| fad0d0d38ca4 | c044eefa4786 vfio/virtio: Convert to the get_region_info op | match | match | preserved + jan added |
| 6b97c1b33bef | e238f147d517 vfio/hisi: Convert to the get_region_info op | match | match | preserved + jan added |
| 897cefa739f7 | 113557b04068 vfio: Provide a get_region_info op | match | match | preserved + jan added |
| 449e051b54c2 | 767b1ed8b980 vfio/nvgrace-gpu: fix grammatical error | match | match | preserved + jan added |
| 0c7d38232410 | 2131c1517f30 hisi_acc_vfio_pci: adapt to new migration configura | match | match | preserved + jan added |
| 38c6eb3eed52 | 4868d2d52df6 crypto: hisilicon - qm updates BAR configuration | match | match | preserved + jan added |

Lint: all checks passed.
The word "as" in the comment should be replaced with "is", and there is an extra space in the comment.

Signed-off-by: Morduan Zang <zhangdandan@uniontech.com>
Reviewed-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/54E1ED6C5A2682C8+20250814110358.285412-1-zhangdandan@uniontech.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
(cherry picked from commit 767b1ed)
Signed-off-by: Jiandi An <jan@nvidia.com>
Instead of hooking the general ioctl op, have the core code directly decode VFIO_DEVICE_GET_REGION_INFO and call an op just for it.

This is intended to allow mechanical changes to the drivers to pull their VFIO_DEVICE_GET_REGION_INFO handling into a function. Later patches will improve the function signature to consolidate more code.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 113557b)
Signed-off-by: Jiandi An <jan@nvidia.com>
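The dispatch change described in this commit can be pictured with a userspace sketch: the core recognises one command and routes it to a dedicated op, and otherwise falls back to the driver's generic ioctl callback. Everything here is a hypothetical model. The `demo_*` types, the command numbers, and the op signatures are assumptions for illustration and do not reproduce the kernel's `struct vfio_device_ops`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_GET_REGION_INFO 1u /* hypothetical command number */
#define DEMO_OTHER_CMD       2u /* hypothetical command number */

struct demo_device;

struct demo_ops {
	long (*ioctl)(struct demo_device *dev, unsigned int cmd, unsigned long arg);
	long (*get_region_info)(struct demo_device *dev, unsigned long arg);
};

struct demo_device {
	const struct demo_ops *ops;
	int region_info_calls; /* counters so callers can observe the routing */
	int ioctl_calls;
};

/* Core entry point: decode the one command, else use the generic callback. */
long demo_core_ioctl(struct demo_device *dev, unsigned int cmd, unsigned long arg)
{
	assert(dev && dev->ops);

	if (cmd == DEMO_GET_REGION_INFO && dev->ops->get_region_info)
		return dev->ops->get_region_info(dev, arg);
	if (dev->ops->ioctl)
		return dev->ops->ioctl(dev, cmd, arg);
	return -ENOTTY;
}

static long demo_region_info(struct demo_device *dev, unsigned long arg)
{
	(void)arg;
	dev->region_info_calls++;
	return 0;
}

static long demo_ioctl(struct demo_device *dev, unsigned int cmd, unsigned long arg)
{
	(void)cmd;
	(void)arg;
	dev->ioctl_calls++;
	return 0;
}

/* A driver that has been converted, and one that has not yet been. */
const struct demo_ops demo_converted_ops = {
	.ioctl = demo_ioctl,
	.get_region_info = demo_region_info,
};
const struct demo_ops demo_legacy_ops = {
	.ioctl = demo_ioctl,
};
```

Keeping the fallback in the sketch mirrors the transitional state of the series, in which drivers are converted one at a time.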
Change the function signature of hisi_acc_vfio_pci_ioctl() and re-indent it.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/2-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(backported from commit e238f14)
[jan: resolve minor conflict in hisi_acc_vfio_pci_ioctl()]
Signed-off-by: Jiandi An <jan@nvidia.com>
Remove virtiovf_vfio_pci_core_ioctl() and change the signature of virtiovf_pci_ioctl_get_region_info().

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/3-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit c044eef)
Signed-off-by: Jiandi An <jan@nvidia.com>
Change the signature of nvgrace_gpu_ioctl_get_region_info().

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Ankit Agrawal <ankita@nvidia.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/4-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 5ac7206)
Signed-off-by: Jiandi An <jan@nvidia.com>
Now that every variant driver provides a get_region_info op, remove the ioctl-based dispatch from vfio_pci_core_ioctl().

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/5-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit f3fddb7)
Signed-off-by: Jiandi An <jan@nvidia.com>
Move it out of mtty_ioctl() and re-indent it.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/6-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 0787755)
Signed-off-by: Jiandi An <jan@nvidia.com>
Move it out of mdpy_ioctl() and re-indent it.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/7-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit cf16acc)
Signed-off-by: Jiandi An <jan@nvidia.com>
Move it out of mbochs_ioctl() and re-indent it.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/8-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 8339fcc)
Signed-off-by: Jiandi An <jan@nvidia.com>
Move it out of vfio_platform_ioctl() and re-indent it. Add it to all platform drivers.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/9-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit d4635df)
Signed-off-by: Jiandi An <jan@nvidia.com>
Move it out of vfio_fsl_mc_ioctl() and re-indent it.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/10-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 6cdae5d)
Signed-off-by: Jiandi An <jan@nvidia.com>
Change the signature of vfio_cdx_ioctl_get_region_info() and hook it to the op.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/11-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit b9827ef)
Signed-off-by: Jiandi An <jan@nvidia.com>
Move it out of vfio_ccw_mdev_ioctl() and re-indent it.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Link: https://lore.kernel.org/r/12-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 61b3f7b)
Signed-off-by: Jiandi An <jan@nvidia.com>
Move it out of intel_vgpu_ioctl() and re-indent it.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/13-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit e664067)
Signed-off-by: Jiandi An <jan@nvidia.com>
Remove the fallback through the ioctl callback; no drivers use it now.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/14-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit f978595)
Signed-off-by: Jiandi An <jan@nvidia.com>
This op does the copy to/from user for the info and can return back a cap chain through a vfio_info_cap * result.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/15-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 775f726)
Signed-off-by: Jiandi An <jan@nvidia.com>
This driver open-codes the cap chain manipulations. Instead, use vfio_info_add_capability() and the get_region_info_caps() op.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/16-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 45f9fa1)
Signed-off-by: Jiandi An <jan@nvidia.com>
Remove the duplicate code and change info to a pointer.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/17-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 9316575)
Signed-off-by: Jiandi An <jan@nvidia.com>
Remove the duplicate code and flatten the call chain.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Link: https://lore.kernel.org/r/18-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 973af0c)
Signed-off-by: Jiandi An <jan@nvidia.com>
Since the core function signature changes, the change has to flow up to all drivers.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/19-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 1b0ecb5)
Signed-off-by: Jiandi An <jan@nvidia.com>
Remove the duplicate code and change info to a pointer. caps are not used.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 182c628)
Signed-off-by: Jiandi An <jan@nvidia.com>
Remove the duplicate code and change info to a pointer. caps are not used.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/21-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit dc10734)
Signed-off-by: Jiandi An <jan@nvidia.com>
No driver uses it now; all are using get_region_info_caps().

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/22-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 56c0693)
Signed-off-by: Jiandi An <jan@nvidia.com>
cxl_probe_component_regs() finds the HDM decoder block during device probe and caches its location, but does not record the decoder count and does not expose the result outside drivers/cxl/. vfio-cxl needs the decoder count and the byte offset and size of the HDM block without re-running the probe sequence.

Record decoder_cnt in rmap->count when parsing the HDM capability in cxl_probe_component_regs(), extend struct cxl_reg_map with a count member, and add cxl_get_hdm_info() to return offset, size, and count from the cached map.

Export under the CXL namespace; stub to -EOPNOTSUPP when CONFIG_CXL_BUS is off.

Co-developed-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Manish Honap <mhonap@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
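The idea of answering from a cached map instead of re-probing can be sketched in plain C as follows. This is not the signature of cxl_get_hdm_info(), which the text above does not spell out. The `demo_*` struct, the parameter list, and the -ENODEV return for an unpopulated map are assumptions chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a cached register-map entry filled in at probe time. */
struct demo_reg_map {
	bool     valid;  /* set once the probe has located the HDM block */
	uint32_t offset; /* byte offset of the HDM block */
	uint32_t size;   /* byte size of the HDM block */
	uint8_t  count;  /* decoder count recorded during the probe */
};

/*
 * Return the cached HDM metadata without touching hardware.
 * Assumed convention: 0 on success, -ENODEV if nothing was cached.
 */
int demo_get_hdm_info(const struct demo_reg_map *map, uint32_t *offset,
		      uint32_t *size, uint8_t *count)
{
	assert(offset && size && count);

	if (!map || !map->valid)
		return -ENODEV;

	*offset = map->offset;
	*size = map->size;
	*count = map->count;
	return 0;
}
```

A caller outside the probing code only needs the three output values, which is why the sketch hides the map layout behind one accessor.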
…nent_regs in public header

vfio-cxl lives outside drivers/cxl/ but still needs to locate the component register block and fill cxl_component_reg_map. Those prototypes were stuck in the internal drivers/cxl/cxl.h. Move the declarations to include/cxl/cxl.h next to the other vfio-facing hooks, with stubs when CXL bus support is disabled. Drop the duplicate prototypes from the private header.

Signed-off-by: Manish Honap <mhonap@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/)
[jan: Move cxl_probe_component_regs() to include/cxl/pci.h instead of include/cxl/cxl.h to align with existing Srirangan/Alejandro convention; skip cxl_find_regblock() move as it is already in include/cxl/pci.h; add struct cxl_component_reg_map forward declaration]
Signed-off-by: Jiandi An <jan@nvidia.com>
…xl/cxl_regs.h

VFIO and other code outside the CXL core needs the same offset/mask constants the core uses for the component register block and HDM decoders. Pull them into a new include/uapi/cxl/cxl_regs.h (GPL-2.0 WITH Linux-syscall-note) and include it from include/cxl/cxl.h. Use the uapi-friendly __GENMASK helpers where needed. Section comments in the new file reference CXL spec r4.0 numbering.

For the UAPI change, replace SZ_64K with the actual size, as the macro is not available to userspace programs.

Signed-off-by: Manish Honap <mhonap@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/)
[jan: Remove defines from include/cxl/cxl.h instead of drivers/cxl/cxl.h as they were already moved there by Srirangan's SAUCE commit]
Signed-off-by: Jiandi An <jan@nvidia.com>
Introduce the Kconfig option CONFIG_VFIO_CXL_CORE and the necessary build rules to compile CXL.mem passthrough infrastructure for vendor-specific CXL devices into the vfio-pci-core module. The new option depends on VFIO_PCI_CORE, CXL_BUS and CXL_MEM.

Wire up the detection and cleanup entry-point stubs in vfio_pci_core_register_device() and vfio_pci_core_unregister_device() so that subsequent patches can fill in the CXL-specific logic without touching the vfio-pci-core flow again.

The vfio_cxl_core.c file added here is an empty skeleton; the actual CXL detection and initialisation code is introduced in the following patch to keep this build-system patch reviewable on its own.

Signed-off-by: Manish Honap <mhonap@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/)
[jan: Resolve context mismatches in Kconfig, Makefile, and vfio_pci_priv.h due to missing upstream xe/dmabuf support in NV-Kernels base]
Signed-off-by: Jiandi An <jan@nvidia.com>
Detect a vendor-specific CXL device at vfio-pci bind time and probe its HDM decoder register block.

vfio_cxl_create_device_state() allocates per-device state via devm and reads MEM_CAPABLE and CACHE_CAPABLE from the CXL DVSEC. vfio_cxl_setup_regs() locates the component register block, temporarily maps it, calls cxl_probe_component_regs() to find the HDM block, then releases the mapping.

vfio_pci_cxl_detect_and_init() chains these two steps. If either fails, vdev->cxl stays NULL and the device falls back to plain vfio-pci.

Signed-off-by: Manish Honap <mhonap@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
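The "chain two steps, fall back on any failure" behaviour can be modelled in a few lines of userspace C. This is a sketch only: the `demo_*` types and the boolean fields that simulate DVSEC presence and probe success are hypothetical, and plain `calloc()`/`free()` stands in for the devm allocation mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct demo_cxl_state {
	bool mem_capable;
	bool cache_capable;
};

/* Hypothetical device model; the bool inputs simulate what hardware reports. */
struct demo_vdev {
	bool has_cxl_dvsec;
	bool dvsec_mem_capable;
	bool dvsec_cache_capable;
	bool hdm_probe_ok;
	struct demo_cxl_state *cxl; /* stays NULL for plain vfio-pci behaviour */
};

/* Step 1: allocate state and read the capability bits. */
static struct demo_cxl_state *demo_create_device_state(const struct demo_vdev *v)
{
	struct demo_cxl_state *st;

	if (!v->has_cxl_dvsec)
		return NULL;
	st = calloc(1, sizeof(*st));
	if (!st)
		return NULL;
	st->mem_capable = v->dvsec_mem_capable;
	st->cache_capable = v->dvsec_cache_capable;
	return st;
}

/* Step 2: stand-in for locating and probing the HDM decoder block. */
static int demo_setup_regs(const struct demo_vdev *v)
{
	return v->hdm_probe_ok ? 0 : -1;
}

/* Chain both steps; on any failure leave v->cxl NULL so the caller falls back. */
void demo_detect_and_init(struct demo_vdev *v)
{
	struct demo_cxl_state *st;

	assert(v->cxl == NULL);

	st = demo_create_device_state(v);
	if (!st)
		return;
	if (demo_setup_regs(v) != 0) {
		free(st);
		return;
	}
	v->cxl = st;
}
```

Publishing the pointer only after both steps succeed means the rest of the code can test `v->cxl` alone to decide whether the CXL paths apply.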
Promote vfio_raw_config_write() and vfio_raw_config_read() to non-static so that the CXL DVSEC write handler in the next patch can call them.

Signed-off-by: Manish Honap <mhonap@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
… framework

Add HDM decoder register emulation for CXL devices assigned to a guest.

New file vfio_cxl_emu.c allocates comp_reg_virt[] covering the full component register block (CXL_COMPONENT_REG_BLOCK_SIZE), snapshots it from MMIO after probe, and registers a VFIO device region (VFIO_REGION_SUBTYPE_CXL_COMP_REGS) with read/write ops but no mmap, so every access hits the emulated buffer and write dispatchers.

vfio_cxl_setup_virt_regs() is called from the tail of vfio_cxl_setup_regs(); vfio_cxl_clean_virt_regs() runs on cleanup.

HDM decoder register defines come from include/uapi/cxl/cxl_regs.h. Bits with no hardware equivalent stay in vfio_cxl_priv.h.

hdm_decoder_n_ctrl_write() allows the guest to clear the LOCK bit. A firmware-committed decoder arrives with LOCK=1; the guest driver must clear it before reprogramming BASE and SIZE with the VM's GPA. Such a write clears the bit in the shadow while preserving all other fields.

Co-developed-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Manish Honap <mhonap@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/)
[jan: Resolve Makefile context mismatch due to missing upstream dmabuf support in NV-Kernels base]
Signed-off-by: Jiandi An <jan@nvidia.com>
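The LOCK-clearing rule for the shadow control register can be illustrated with a pure function. This is a sketch under assumptions, not the body of hdm_decoder_n_ctrl_write(): the bit position used for LOCK, the treatment of a write that leaves LOCK set, and the unlocked path that accepts the guest value verbatim are all simplifications introduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit position for the LOCK flag; check the register defines in use. */
#define DEMO_CTRL_LOCK (1u << 8)

/*
 * Compute the new shadow value for a guest write to a decoder CTRL register.
 *
 * - Shadow locked, guest clears LOCK: drop only the LOCK bit and keep every
 *   other field as it was (the case described in the commit message).
 * - Shadow locked, guest leaves LOCK set: ignore the write (assumption).
 * - Shadow unlocked: take the guest value as-is (simplification; a real
 *   handler would also mask read-only fields).
 */
uint32_t demo_ctrl_write(uint32_t shadow, uint32_t guest_val)
{
	if (shadow & DEMO_CTRL_LOCK) {
		if (!(guest_val & DEMO_CTRL_LOCK))
			return shadow & ~DEMO_CTRL_LOCK;
		return shadow;
	}
	return guest_val;
}
```

Because the function touches only the shadow value, the hardware register is never rewritten by the unlock itself in this model.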
After HDM registers are mapped, call cxl_await_range_active() so we only proceed when the DVSEC ranges report active, without touching the memdev register group that Type-2 devices may lack. Re-snapshot the component registers (vfio_cxl_reinit_comp_regs) once MEM_ACTIVE is set so that the firmware's final values (SIZE_HIGH etc.) land in comp_reg_virt. Read the committed decoder size from hardware, set capacity via cxl_set_capacity(), and call devm_cxl_add_memdev(). Signed-off-by: Manish Honap <mhonap@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/) Signed-off-by: Jiandi An <jan@nvidia.com>
Region Management makes use of APIs provided by CXL_CORE as below:

CREATE_REGION flow:
1. Validate request (size, decoder availability)
2. Allocate HPA via cxl_get_hpa_freespace()
3. Allocate DPA via cxl_request_dpa()
4. Create region via cxl_create_region() - commits HDM decoder
5. Get HPA range via cxl_get_region_range()

DESTROY_REGION flow:
1. Detach decoder via cxl_decoder_detach()
2. Free DPA via cxl_dpa_free()
3. Release root decoder via cxl_put_root_decoder()

Use DEFINE_FREE scope helpers so error paths unwind cleanly.

Signed-off-by: Manish Honap <mhonap@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
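The "error paths unwind cleanly" point can be illustrated outside the kernel. The following is a user-space sketch only, assuming GCC or Clang: the kernel's DEFINE_FREE/__free helpers are built on the compiler's cleanup attribute, and this example uses that attribute directly. `struct resv`, `resv_get()`, `resv_put()`, `resv_steal()` and `create_region()` are hypothetical stand-ins for the HPA/DPA reservations, not the functions from this patch.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reservation object; *live counts outstanding reservations. */
struct resv { int *live; };

static struct resv *resv_get(int *live)
{
	struct resv *r = malloc(sizeof(*r));

	if (!r)
		return NULL;
	r->live = live;
	(*live)++;
	return r;
}

/* Cleanup callback: receives a pointer to the scoped variable. */
static void resv_put(struct resv **rp)
{
	struct resv *r = *rp;

	if (!r)
		return;
	(*r->live)--;
	free(r);
	*rp = NULL;
}

#define SCOPED_RESV __attribute__((cleanup(resv_put)))

/* Hand ownership out of a scoped pointer (similar in spirit to no_free_ptr()). */
static struct resv *resv_steal(struct resv **rp)
{
	struct resv *r = *rp;

	*rp = NULL;
	return r;
}

struct region { struct resv *hpa, *dpa; };

/* Returns 0 on success; on failure everything acquired so far is released. */
static int create_region(int *live, bool fail_commit, struct region *out)
{
	struct resv *hpa SCOPED_RESV = resv_get(live);
	if (!hpa)
		return -1;

	struct resv *dpa SCOPED_RESV = resv_get(live);
	if (!dpa)
		return -1;		/* hpa released automatically */

	if (fail_commit)
		return -1;		/* dpa, then hpa released automatically */

	out->hpa = resv_steal(&hpa);	/* success: transfer ownership */
	out->dpa = resv_steal(&dpa);
	return 0;
}

/* Teardown mirrors creation in reverse order. */
static void destroy_region(struct region *r)
{
	resv_put(&r->dpa);
	resv_put(&r->hpa);
}
```

Every early return releases what was acquired without explicit goto labels, which is the property the commit message relies on.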
…nd reset zap

Wire the CXL DPA range up as a VFIO demand-paged region so QEMU can mmap guest device memory directly. Faults call vmf_insert_pfn() to insert one PFN at a time rather than mapping the full range upfront.

CXL region lifecycle:
- The CXL memory region is registered with VFIO layer during vfio_pci_open_device
- mmap() establishes the VMA with vm_ops but inserts no PTEs
- Each guest page fault calls vfio_cxl_region_page_fault() which inserts a single PFN under the memory_lock read side
- On device reset, vfio_cxl_zap_region_locked() sets region_active=false and calls unmap_mapping_range() to invalidate all DPA PTEs atomically while holding memory_lock for writing
- Faults racing with reset see region_active==false and return VM_FAULT_SIGBUS
- vfio_cxl_reactivate_region() restores region_active after successful hardware reset

Also integrate the zap/reactivate calls into vfio_pci_ioctl_reset() so that FLR correctly invalidates DPA mappings and restores them on success.

Co-developed-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Manish Honap <mhonap@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/)
[jan: Resolve context mismatches in vfio_pci_core.c and vfio_pci_priv.h due to missing upstream dmabuf support in NV-Kernels base]
Signed-off-by: Jiandi An <jan@nvidia.com>
CXL devices have CXL DVSEC registers in the configuration space.
Many of them affect the behaviors of the devices, e.g. enabling
CXL.io/CXL.mem/CXL.cache. However, these configurations are owned by
the host and a virtualization policy should be applied when handling
the access from the guest.
Introduce the emulation of CXL configuration space to handle the access
of the virtual CXL configuration space from the guest.
vfio-pci-core already allocates vdev->vconfig as the authoritative
virtual config space shadow. Directly use vdev->vconfig:
- DVSEC reads return data from vdev->vconfig (already populated by
vfio_config_init() via vfio_ecap_init())
- DVSEC writes go through new CXL-aware write handlers that update
vdev->vconfig in place
- The writable DVSEC registers are marked virtual in vdev->pci_config_map
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Manish Honap <mhonap@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/)
[jan: Resolve context mismatches in Makefile and vfio_pci_core.h due to missing upstream dmabuf/p2pdma forward declarations in NV-Kernels base]
Signed-off-by: Jiandi An <jan@nvidia.com>
Register the DPA and component register region with VFIO layer. Region indices for both these regions are cached for quick lookup. vfio_cxl_register_cxl_region() - memremap(WB) the region HPA (treat CXL.mem as RAM, not MMIO) - Register VFIO_REGION_SUBTYPE_CXL - Records dpa_region_idx. vfio_cxl_register_comp_regs_region() - Registers VFIO_REGION_SUBTYPE_CXL_COMP_REGS with size hdm_reg_offset + hdm_reg_size - Records comp_reg_region_idx. Signed-off-by: Manish Honap <mhonap@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/) Signed-off-by: Jiandi An <jan@nvidia.com>
…AR to userspace

Expose CXL device capability through the VFIO device info ioctl and give userspace access to the GPU/accelerator register windows in the component BAR while protecting the CXL component register block.

vfio_cxl_get_info() fills VFIO_DEVICE_INFO_CAP_CXL with the HDM register BAR index and byte offset, commit flags, and VFIO region indices for the DPA and COMP_REGS regions. HDM decoder count and the HDM block offset within COMP_REGS are not populated; both are derivable from the CXL Capability Array in the COMP_REGS region itself.

vfio_cxl_get_region_info() handles VFIO_DEVICE_GET_REGION_INFO for the component register BAR. It builds a sparse-mmap capability that advertises only the GPU/accelerator register windows, carving out the CXL component register block. Three physical layouts are handled:
Topology A  comp block at BAR end: one area [0, comp_reg_offset)
Topology B  comp block at BAR start: one area [comp_end, bar_len)
Topology C  comp block in the middle: two areas, one on each side

vfio_cxl_mmap_overlaps_comp_regs() checks whether an mmap request overlaps [comp_reg_offset, comp_reg_offset + comp_reg_size). vfio_pci_core_mmap() calls it to reject access to the component register block while allowing mmap of the GPU register windows in the sparse capability. This replaces the earlier blanket rejection of any mmap on the component BAR index.

Hook both helpers into vfio_pci_ioctl_get_info() and vfio_pci_ioctl_get_region_info() in vfio_pci_core.c.

The component BAR cannot be claimed exclusively since the CXL subsystem holds persistent sub-range iomem claims during HDM decoder setup. pci_request_selected_regions() returns EBUSY; pass bars=0 to skip the request and map directly via pci_iomap(). Physical ownership is assured by driver binding.
Signed-off-by: Zhi Wang <zhiw@nvidia.com> Signed-off-by: Manish Honap <mhonap@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/) Signed-off-by: Jiandi An <jan@nvidia.com>
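The three topologies and the overlap check reduce to simple interval arithmetic. The sketch below is a user-space model, not the code from this patch: `struct sparse_area`, `sparse_areas()` and `overlaps_comp_regs()` are hypothetical names, and it assumes the component block lies entirely inside the BAR and that no sum overflows 64 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* One mmap-able window, in the spirit of a sparse-mmap area (offset, size). */
struct sparse_area {
	uint64_t offset;
	uint64_t size;
};

/*
 * Fill areas[] with the parts of [0, bar_len) that lie outside the
 * component block [comp_off, comp_off + comp_size).
 * Returns the number of areas written (0, 1 or 2).
 */
static int sparse_areas(uint64_t bar_len, uint64_t comp_off,
			uint64_t comp_size, struct sparse_area areas[2])
{
	uint64_t comp_end = comp_off + comp_size;
	int n = 0;

	if (comp_off > 0)		/* window below the comp block */
		areas[n++] = (struct sparse_area){ 0, comp_off };
	if (comp_end < bar_len)		/* window above the comp block */
		areas[n++] = (struct sparse_area){ comp_end, bar_len - comp_end };
	return n;
}

/* True when [start, start + len) intersects the component block. */
static bool overlaps_comp_regs(uint64_t start, uint64_t len,
			       uint64_t comp_off, uint64_t comp_size)
{
	if (!len || !comp_size)
		return false;
	return start < comp_off + comp_size && comp_off < start + len;
}
```

Topology A yields only the lower window, topology B only the upper one, and topology C both. An mmap request that passes `overlaps_comp_regs() == false` falls entirely inside one of the advertised windows under the stated assumptions.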
This commit provides an opt-out mechanism to disable CXL support in the vfio module. The opt-out is available at both build time and module load time. The build-time option CONFIG_VFIO_CXL_CORE enables or disables CXL support in the vfio-pci module. To disable CXL support at runtime, use the module parameter disable_cxl. This is a per-device opt-out on the core device, set by the driver before registration. Signed-off-by: Manish Honap <mhonap@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/) [jan: Resolve context mismatch in vfio_pci.c probe function due to missing upstream pci_ops assignment in NV-Kernels base] Signed-off-by: Jiandi An <jan@nvidia.com>
…ough

Add Documentation/driver-api/vfio-pci-cxl.rst describing the architecture, VFIO interfaces, and operational constraints for CXL Type-2 (cache-coherent accelerator) passthrough via vfio-pci-core, and link it from the driver-api index.

The document covers:
- VFIO_DEVICE_FLAGS_CXL and VFIO_DEVICE_INFO_CAP_CXL: what the capability struct contains and what the FIRMWARE_COMMITTED and CACHE_CAPABLE flags mean
- How to derive hdm_decoder_offset and hdm_count from the COMP_REGS region by traversing the CXL Capability Array to find cap ID 0x5 and reading the HDM Decoder Capability register
- Topology-aware sparse mmap on the component BAR (topologies A, B, C covering comp block at end, start, or middle of the BAR)
- Two extra VFIO device regions: COMP_REGS for the emulated HDM register state and the DPA memory window
- DVSEC config write virtualization: what the guest sees vs. hardware
- FLR coordination: DPA PTEs zapped before reset, restored after

Signed-off-by: Manish Honap <mhonap@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
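For readers who want to see what "traversing the CXL Capability Array to find cap ID 0x5" might look like from userspace, here is a hedged sketch. It assumes the caller has already read the capability array into a host-endian array of 32-bit words whose first word is the CXL Capability Header; where that header sits inside the COMP_REGS region is not stated here and is left to the caller. The field layout (ID in bits 15:0, array size in bits 31:24 of the header, pointer in bits 31:20 of each entry) follows my reading of the CXL specification, and `find_hdm_offset()` is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

#define CAP_ID(w)		((w) & 0xffffu)		/* bits 15:0 */
#define CAP_ARRAY_SIZE(h)	(((h) >> 24) & 0xffu)	/* header bits 31:24 */
#define CAP_PTR(e)		(((e) >> 20) & 0xfffu)	/* entry bits 31:20 */

#define CAP_ID_ARRAY_HDR	0x1u	/* CXL Capability Header ID */
#define CAP_ID_HDM		0x5u	/* HDM Decoder capability ID */

/*
 * regs:   capability array words, regs[0] being the CXL Capability Header
 * nwords: number of valid words in regs[] (bounds the walk)
 * offset: on success, byte offset of the HDM block relative to regs[0]
 * Returns 0 if an HDM Decoder capability was found, -1 otherwise.
 */
static int find_hdm_offset(const uint32_t *regs, size_t nwords,
			   uint32_t *offset)
{
	uint32_t count, i;

	if (nwords == 0 || CAP_ID(regs[0]) != CAP_ID_ARRAY_HDR)
		return -1;

	count = CAP_ARRAY_SIZE(regs[0]);
	for (i = 1; i <= count && i < nwords; i++) {
		if (CAP_ID(regs[i]) == CAP_ID_HDM) {
			*offset = CAP_PTR(regs[i]);
			return 0;
		}
	}
	return -1;
}
```

The HDM Decoder Capability register at the returned offset then carries the decoder count; decoding that field is left out because its encoding differs between specification revisions.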
3eede80 to 502020b
|
502020b (HEAD, jiandi/cxl-vfio_2026-04-23) NVIDIA: VR: SAUCE: config: Enable CONFIG_VFIO_CXL_CORE for CXL Type-2 passthrough

For these new patches that have not been posted to LKML yet, I reviewed with Codex, specifically focusing on if there are any potential regressions introduced. Here are the 5 findings with their related commits and the concern.

1. STATUS2 shadowing can break Cache WBI polling
Related commit:
The patch makes CXL_DVSEC_STATUS2_OFFSET reads return from vdev->vconfig instead of hardware in drivers/vfio/pci/cxl/vfio_cxl_config.c:282. That is needed for virtualized reset result bits, but STATUS2 also contains live hardware status, especially:
CXL_DVSEC_STATUS2_CACHE_INVALID
INITIATE_CACHE_WBI is still forwarded to hardware in drivers/vfio/pci/cxl/vfio_cxl_config.c:178, but there is no refresh of STATUS2.Cache_Invalid into the shadow. A guest that triggers Cache WBI and polls
Suggested fix: make STATUS2 reads merge hardware and shadow state. For example, read hardware STATUS2, preserve live hardware bits like CACHE_INVALID, and override only the virtualized reset outcome bits from shadow.

2. Existing VFIO reset paths can leave PCI_COMMAND_MEMORY enabled
Related commits:
979f0aa wires vfio_cxl_finish_reset() into existing reset paths, including VFIO_DEVICE_RESET in drivers/vfio/pci/vfio_pci_core.c:1238 and guest FLR config writes in drivers/vfio/pci/vfio_pci_config.c:902.
f5c8879 adds vfio_cxl_enable_memory_space() in drivers/vfio/pci/cxl/vfio_cxl_core.c:704, which sets PCI_COMMAND_MEMORY before reading BAR-backed HDM registers. It does not restore the previous command value.
So if the guest had Memory Space disabled before reset, the reset path can return with Memory Space enabled. That changes guest-visible PCI config behavior.
Suggested fix: save the original command value, temporarily enable Memory Space for HDM register reads, then restore the original command value before returning.

3. Guest CXL reset bypasses CXL core reset coordination
Related commits:
acb85c3 implements guest CXL reset by directly calling cxl_dev_reset() in drivers/vfio/pci/cxl/vfio_cxl_config.c:131.
The CXL sysfs reset path does more than call cxl_dev_reset(): it takes cxl_reset_mutex, saves/disables sibling CXL functions, runs reset, then restores siblings in drivers/cxl/core/pci.c:1385.
The VFIO path skips that coordination. On multi-function CXL devices, or with concurrent reset paths, a guest-initiated reset can run without quiescing sibling functions or serializing against another CXL reset.
Suggested fix: expose a CXL helper that performs the same global reset serialization and sibling save/disable/restore, while allowing VFIO to skip host memory offlining.

4. HDM BASE preservation only handles 16 decoders
Related commit:
vfio_cxl_reinit_hdm_shadow() snapshots BASE registers into fixed arrays of 16 entries in drivers/vfio/pci/cxl/vfio_cxl_core.c:728:
__le32 saved_lo[16] = {}, saved_hi[16] = {};
But hdm_count comes from the HDM decoder count field, and local CXL code allows more than 16 decoders. If cxl->hdm_count > 16, decoders 16+ lose preserved guest GPA BASE values across reset.
Suggested fix: allocate based on cxl->hdm_count, or explicitly reject/cap devices above 16 decoders if that is the intended support limit.

5. Post-reset STATUS2 read ignores config access failure
Related commit:
After reset, vfio_cxl_reset() reads hardware STATUS2 in drivers/vfio/pci/cxl/vfio_cxl_config.c:147:
pci_read_config_word(pdev, dvsec + CXL_DVSEC_STATUS2_OFFSET, &hw_status2);
The return value is ignored, and hw_status2 is used immediately afterward to stamp reset result bits into vconfig. If config access fails after reset, the guest may see undefined or misleading reset status.
Suggested fix: check the return value. On failure, initialize hw_status2 deterministically and report/stamp CXL_RESET_ERROR, or return the config error from vfio_cxl_reset().
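The suggested fix for finding 1 (merge hardware and shadow state on STATUS2 reads) might be sketched roughly as follows. This is an illustration only, not code from the series: `status2_merge()` is a hypothetical helper, the reset outcome bits are taken as bits 1-2, and the Cache Invalid position at bit 0 is my assumption from the CXL DVSEC layout and should be checked against the real defines.

```c
#include <stdint.h>

#define STATUS2_CACHE_INVALID	(1u << 0)	/* assumed position; live hardware state */
#define STATUS2_RESET_COMPLETE	(1u << 1)	/* virtualized outcome bit */
#define STATUS2_RESET_ERROR	(1u << 2)	/* virtualized outcome bit */

/* Bits whose guest-visible value comes from the vconfig shadow. */
#define STATUS2_VIRT_MASK	(STATUS2_RESET_COMPLETE | STATUS2_RESET_ERROR)

/*
 * Build the value returned to the guest: live bits (such as Cache Invalid)
 * come from hardware, reset outcome bits come from the shadow.
 */
static uint16_t status2_merge(uint16_t hw, uint16_t shadow)
{
	return (uint16_t)((hw & ~STATUS2_VIRT_MASK) |
			  (shadow & STATUS2_VIRT_MASK));
}
```

With this shape, a guest polling Cache Invalid after a write-back-invalidate sees the hardware bit change, while the reset result stamped into the shadow stays visible.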
For the patches picked from LKML, I verified they match the LKML posting exactly and for those that did not match, that your backport was correct and matched the context notes in the commit message:
Note: By “identical”, I mean identical patch content by git patch-id --verbatim against the LKML v2 patch, not byte-identical commit metadata. 54d50bb vfio: Remove the get_region_info op |
I think I'm going to let Manish review this then. This RFC series is going through internal review, with Vikram, Alex, etc. continuing to post comments. I'll ask them whether they need this in now, so they have a story for the QS kernel supporting vfio cxl, or whether they are okay waiting until this is posted to LKML or until all these AI / Cursor identified concerns are fixed in their series. Maybe Manish should run Cursor/AI on his patch series while developing the patches. |
These two commits are not among the commits I included in this PR, right? My PR commits started after Starting at |
My mistake, you are correct. |
Export two helpers for VFIO: - pci_cxl_reset_capable() - cxl_dev_reset() The change does not alter the reset flow itself, the capability checks, or the sysfs ABI. It only lifts the helper out of the private path so later VFIO patches can call the same code. Signed-off-by: Manish Honap <mhonap@nvidia.com> Signed-off-by: Jiandi An <jan@nvidia.com>
This change adds and renames vfio-cxl helpers to better suit the CXL reset handling introduced in later patches.
- Rename the CXL DPA region helpers to prepare_reset() and finish_reset() so call sites read as a matched pair around pci_try_reset_function(). Also call prepare_reset()/finish_reset() around pci_try_reset_function() in both the PCIe BCR FLR path and the Function FLR path, matching the logic already used on the VFIO_DEVICE_RESET ioctl path.
- When pci_try_reset_function() fails, finish_reset() consults the hardware COMMITTED state before re-enabling the DPA mapping, so it is safe on error and avoids leaving the DPA region wedged off after a transient reset failure.
- Add vfio_cxl_reset_capable(), a small wrapper over pci_cxl_reset_capable().
Signed-off-by: Manish Honap <mhonap@nvidia.com>
Signed-off-by: Jiandi An <jan@nvidia.com>
…e post-reset BAR access A reset caller may disable Memory Space to quiesce device DMA before issuing the reset. pci_try_reset_function() saves and restores PCI_COMMAND around the FLR. If the memory space was disabled before FLR, it will be restored in disabled state. vfio_cxl_finish_reset() reads HDM decoder registers through the component register BAR immediately after reset. Accessing a BAR with Memory Space disabled produces an Unsupported Request completion; on platforms that promote UR to a fatal error this triggers DPC. Add vfio_cxl_enable_memory_space() and call it at the start of vfio_cxl_finish_reset() before touching any BAR. Signed-off-by: Manish Honap <mhonap@nvidia.com> Signed-off-by: Jiandi An <jan@nvidia.com>
…ss reset reinit_comp_regs() mirrors post-reset hardware state (all-zeros) into comp_reg_virt[], including HDM decoder BASE registers. For decoders that the device manager committed with a guest-physical address before the reset, pci_dev_restore() re-commits the hardware decoders with the host-physical base. The kernel provides no notification that BASE was cleared during reinit, so the emulated GPA bases are silently lost. Add vfio_cxl_reinit_hdm_shadow() which snapshots the GPA decoder bases before calling reinit_comp_regs() and restores them after, keeping the emulated decoder consistent with what the device manager set. Signed-off-by: Manish Honap <mhonap@nvidia.com> Signed-off-by: Jiandi An <jan@nvidia.com>
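The snapshot/restore idea can be modeled in user space. The sketch below is a simplification under stated assumptions: it restores the BASE registers of every decoder rather than only those committed with a guest-physical address, it sizes the scratch buffer from the decoder count instead of using a fixed array, and it assumes the usual HDM layout where decoder n has BASE_LOW at 0x10 + 0x20 * n and BASE_HIGH four bytes later. `reinit_keep_bases()`, `rd32()` and `wr32()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed HDM decoder register layout: 0x20-byte stride starting at 0x10. */
#define HDM_BASE_LOW(i)		(0x10u + 0x20u * (i))
#define HDM_BASE_HIGH(i)	(0x14u + 0x20u * (i))

static uint32_t rd32(const uint8_t *blk, uint32_t off)
{
	uint32_t v;

	memcpy(&v, blk + off, sizeof(v));
	return v;
}

static void wr32(uint8_t *blk, uint32_t off, uint32_t v)
{
	memcpy(blk + off, &v, sizeof(v));
}

/*
 * Re-snapshot the emulated HDM block from hardware state while keeping the
 * emulated BASE registers. The scratch buffer scales with 'count'.
 * Returns 0 on success, -1 on bad arguments or allocation failure.
 */
static int reinit_keep_bases(uint8_t *shadow, const uint8_t *hw, size_t len,
			     unsigned int count)
{
	uint32_t *saved;
	unsigned int i;

	if (count == 0 || len < (size_t)HDM_BASE_HIGH(count - 1) + 4)
		return -1;

	saved = calloc(2 * (size_t)count, sizeof(*saved));
	if (!saved)
		return -1;

	for (i = 0; i < count; i++) {		/* 1. save emulated bases */
		saved[2 * i] = rd32(shadow, HDM_BASE_LOW(i));
		saved[2 * i + 1] = rd32(shadow, HDM_BASE_HIGH(i));
	}

	memcpy(shadow, hw, len);		/* 2. mirror post-reset hardware */

	for (i = 0; i < count; i++) {		/* 3. put the bases back */
		wr32(shadow, HDM_BASE_LOW(i), saved[2 * i]);
		wr32(shadow, HDM_BASE_HIGH(i), saved[2 * i + 1]);
	}

	free(saved);
	return 0;
}
```

Sizing the buffer from the decoder count keeps decoders beyond any fixed limit covered; whether a real device can expose that many is a separate question.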
…nfig shadow STATUS2 was read directly from hardware while all other DVSEC registers were served from the vconfig shadow. This created two problems: 1. VOLATILE_HDM_PRES_ERROR (RW1CS, bit 3): guest writes cleared the hardware bit but the shadow was not updated, so subsequent reads still returned the set bit from hardware (which the hardware had cleared). 2. CXL_RESET_COMPLETE and CXL_RESET_ERROR (bits 1-2): these outcome bits will be written by vfio_cxl_reset() into the shadow after a protocol reset. Hardware does not update them on its own; serving reads from hardware would hide the outcome from the guest. Add STATUS2 to the read switch so reads come from the shadow, and update cxl_dvsec_status2_write() to mirror VOLATILE_HDM_PRES_ERROR clears into the shadow after forwarding to hardware. Signed-off-by: Manish Honap <mhonap@nvidia.com> Signed-off-by: Jiandi An <jan@nvidia.com>
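The write-1-to-clear mirroring described above can be expressed in a few lines. This is a user-space model with a hypothetical helper name (`status2_write()`); it takes VOLATILE_HDM_PRES_ERROR as bit 3, as stated in the commit message, and treats it as the only guest-writable bit, which is an assumption.

```c
#include <stdint.h>

#define STATUS2_VOLATILE_HDM_PRES_ERROR	(1u << 3)	/* RW1CS */

/*
 * Handle a guest write to STATUS2.
 * Returns the value to forward to hardware (only the RW1C bit survives)
 * and clears the same bit in the shadow so later shadow reads agree.
 */
static uint16_t status2_write(uint16_t *shadow, uint16_t guest_val)
{
	uint16_t to_hw = guest_val & STATUS2_VOLATILE_HDM_PRES_ERROR;

	*shadow = (uint16_t)(*shadow & ~to_hw);	/* writing 1 clears the bit */
	return to_hw;
}
```

Writing 0 to the bit leaves both hardware and shadow untouched, matching RW1C semantics.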
Add vfio_cxl_reset() to drive a CXL protocol reset on behalf of a guest. Unlike cxl_do_reset(), this path skips host memory offlining since the DPA region is guest memory. The function takes memory_lock for the full sequence, calls vfio_cxl_prepare_reset() to zap DPA region PTEs, drives the hardware via pci_dev_save_and_disable() + cxl_dev_reset() + pci_dev_restore(), then calls vfio_cxl_finish_reset() to reinitialise emulated state. STATUS2 outcome bits (CXL_RESET_COMPLETE / CXL_RESET_ERROR) are written back to vconfig after the reset so the guest can poll for result without reading hardware. pci_dev_restore() overwrites the saved pre-reset state, so the hardware value is re-read after restore before the outcome is stamped. When the guest writes INIT_CXL_RST into DVSEC CONTROL2, invoke vfio_cxl_reset() to perform a CXL protocol reset. The bit is not forwarded to hardware; cxl_dev_reset() drives the reset sequence directly. Silently drop writes on devices that do not advertise RST_CAPABLE to avoid log noise for the reserved-bit case. Signed-off-by: Manish Honap <mhonap@nvidia.com> Signed-off-by: Jiandi An <jan@nvidia.com>
… passthrough Enable VFIO CXL core support on amd64 and arm64 to allow CXL Type-2 device passthrough via vfio-pci. Signed-off-by: Jiandi An <jan@nvidia.com>
502020b to aef7e33
|
Latest CI is clean, just missing the LP link: nirmoy#14 (comment) Update: @JiandiAnNVIDIA you can edit the PR description to retrigger the GitHub action |
|
I re-reviewed the latest updates with Codex and it confirmed that the updates address findings 1 and 5 from my previous review. After providing more data from a VR system with CXL devices, I was able to conclude that issues 3 and 4 are unfounded with Strata. Finding 2 remains, but it is scoped to the CXL VFIO devices and reset paths (non-CXL VFIO devices are unaffected because vfio_cxl_finish_reset() returns when vdev->cxl == NULL), so I am not concerned about a regression.
|
Description
This patch series adds VFIO CXL Type-2 device passthrough support to the nvidia-6.17 kernel, enabling CXL-capable accelerator devices to be assigned to virtual machines via VFIO. It includes:
get_region_info refactoring - Upstream series that splits VFIO_DEVICE_GET_REGION_INFO into its own driver op and introduces get_region_info_caps, which is a prerequisite for the CXL VFIO region implementation
Key Features Added:
cxl_dev_reset)
disable_cxl for per-device opt-out
include/uapi/cxl/cxl_regs.h) for CXL register defines
LP: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.17/+bug/2152222
Justification
VFIO CXL passthrough is required for assigning CXL Type-2 accelerator devices (GPUs, SmartNICs) to virtual machines:
Source
Patch Breakdown (51 patches):
torvalds/master (merged)
get_region_info series
torvalds/master (merged in v6.19)
Notes on upstream prerequisites (item 1):
Three upstream commits cherry-picked:
4868d2d52df6 ("crypto: hisilicon - qm updates BAR configuration")
2131c1517f30 ("hisi_acc_vfio_pci: adapt to new migration configuration")
767b1ed8b980 ("vfio/nvgrace-gpu: fix grammatical error")
The first two resolve a dependency for
e238f147d517 ("vfio/hisi: Convert to the get_region_info op"). The third
fixes a pre-existing comment typo in the nvgrace-gpu driver that would
otherwise cause a patch-ID mismatch with upstream
1b0ecb5baf4a ("vfio/pci: Convert all PCI drivers to get_region_info_caps").
Notes on the VFIO get_region_info series (item 2):
22 upstream commits from Jason Gunthorpe's series, already merged in v6.19:
https://lore.kernel.org/all/0-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com/
These refactor the VFIO region info infrastructure that the CXL VFIO
passthrough series depends on.
Notes on Manish's VFIO CXL series (item 3):
19 out of 20 patches ported from:
https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/
Patch 20/20 (selftests) was skipped as the upstream VFIO selftest
infrastructure (tools/testing/selftests/vfio/) is not present in
the NV-Kernels base.
Conflict resolutions were required for 10 of 19 patches due to the
NV-Kernels base diverging from upstream in two ways:
cxl_find_regblock, cxl_probe_component_regs, cxl_await_range_active, cxl_regblock_get_bar_info) are in include/cxl/pci.h unconditionally (per Srirangan/Alejandro convention from PR [linux-nvidia-6.17-next] Add CXL Type-2 device support, RAS error handling, reset, state save/restore, and interleaving support #342),
rather than in include/cxl/cxl.h with CONFIG_CXL_BUS guards
as Manish's patches expect.
xe driver, dmabuf, and p2pdma support causes
context mismatches in Kconfig, Makefiles, and VFIO headers.
Notes on Manish's CXL reset series (item 4):
6 patches from internal RFC-v2 posting:
Patch 1/6 had a conflict resolution identical to item 3 (declarations
added to include/cxl/pci.h instead of include/cxl/cxl.h).
Lore Links:
Jason Gunthorpe's VFIO get_region_info series (v2, merged in v6.19):
https://lore.kernel.org/all/0-v2-2a9e24d62f1b+e10a-vfio_get_region_info_op_jgg@nvidia.com/
Manish Honap's VFIO CXL Type-2 passthrough series (v2):
https://lore.kernel.org/linux-cxl/20260401143917.108413-1-mhonap@nvidia.com/
Upstream Status:
torvalds/master
torvalds/master (v6.19)
Testing
Build Validation:
Config Verification:
CXL VFIO config enabled:
Runtime Testing:
Notes
CONFIG_VFIO_CXL_CORE is a new bool config enabled for both amd64 and
arm64. It depends on VFIO_PCI_CORE (module), CXL_BUS (built-in), and
CXL_MEM (built-in). As a bool, it compiles into the vfio-pci-core module.
(Alejandro's v23, Srirangan's save/restore and reset series).
include/uapi/cxl/cxl_regs.h is introduced for CXL
component and HDM register defines, using UAPI-safe macros (__GENMASK,
_BITUL) and raw hex sizes instead of kernel-internal SZ_* macros.
intentionally skipped as the upstream VFIO selftest infrastructure is not
present in the NV-Kernels base.