diff --git a/.github/pre-commit/spelling_allowlist.txt b/.github/pre-commit/spelling_allowlist.txt index a8c671271a1..4ab50dbf21a 100644 --- a/.github/pre-commit/spelling_allowlist.txt +++ b/.github/pre-commit/spelling_allowlist.txt @@ -100,6 +100,7 @@ Ou POSIX PSIRT PTSBE +PYNQ Pasqal Pauli Paulis @@ -131,10 +132,12 @@ Quake Quantinuum RDMA REPL +RFSoC RHEL RPC RSA RSH +RTL Realtime RoCE SDK @@ -157,6 +160,7 @@ UCCSD VQE Vazirani Verilog +Vivado WSL Xcode Zener diff --git a/docs/sphinx/using/realtime/host.md b/docs/sphinx/using/realtime/host.md index ab5ea40871a..8495e900cd7 100644 --- a/docs/sphinx/using/realtime/host.md +++ b/docs/sphinx/using/realtime/host.md @@ -18,7 +18,7 @@ using RDMA (Remote Direct Memory Access) via ConnectX NIC's. In the context of quantum error correction, HSB is one example of a transport mechanism that connects the quantum control system (typically an FPGA) to GPU-based decoders. -**Repository**: [`nvidia-holoscan`/`holoscan-sensor-bridge` (`nvqlink` branch)](https://github.com/nvidia-holoscan/holoscan-sensor-bridge/tree/nvqlink) +**Repository**: [`nvidia-holoscan`/`holoscan-sensor-bridge` (`2.6.0-EA2` tag)](https://github.com/nvidia-holoscan/holoscan-sensor-bridge/tree/2.6.0-EA2) HSB handles: @@ -392,7 +392,7 @@ struct RPCResponse { }; ``` -Both structs are 24 bytes, packed with no padding. See `cudaq_realtime_message_protocol.bs` +Both structs are 24 bytes, packed with no padding. See `cudaq_realtime_message_protocol.md` for `request_id` and `ptp_timestamp` semantics. Payload conventions: diff --git a/docs/sphinx/using/realtime/installation.rst b/docs/sphinx/using/realtime/installation.rst index 1a3319462ee..c8beec5d2d6 100644 --- a/docs/sphinx/using/realtime/installation.rst +++ b/docs/sphinx/using/realtime/installation.rst @@ -26,6 +26,22 @@ Prerequisites - CUDA Runtime with version 12.6+ or 13.x +.. 
_realtime-hsb-fpga-artifacts: + +HSB FPGA IP core and RFSoC bit-file +----------------------------------- + +The primary FPGA deliverable is the open-source HSB FPGA IP core, ``nv_hsb_ip``. +Integrate this RTL source into your FPGA design when you want to use HSB with your FPGA target. + +The HSB 2.6.0-EA release also provides a fully packaged RFSoC example for the Real Digital RFSoC 4x2 evaluation board, using Vivado part ``xczu48dr-ffvg1517-2-e``. +The `HSB 2.6.0-EA artifact directory `__ contains the pre-built ``nvqlink_rfsoc_v2603.bit`` bit-file and the ``pynq_rfsoc_2603_EA_release.zip`` RFSoC PYNQ reference-design archive. +The matching ``nv_hsb_ip`` source directory is in the `Holoscan Sensor Bridge release-2.6.0-EA branch `__. + +When building the RFSoC project from the PYNQ archive, place the ``nv_hsb_ip`` directory from that release branch at the same level as the archive's ``pynq`` directory. +Do not mix ``nv_hsb_ip`` from an older HSB release with the HSB 2.6.0-EA RFSoC files. +For another RFSoC part or board, update the Vivado part and constraints in ``pynq/rfsoc-pynq/build/build.tcl`` and rebuild the bit-file. + Setup --------------------- @@ -39,8 +55,8 @@ Setup - Follow the instructions given by the installer for post-installation steps to set environment variables. - - Load HSB IP bit-file to the FPGA. - The bit-file for supported FPGA vendors can be found `here `__. + - Program the FPGA with HSB. + See :ref:`realtime-hsb-fpga-artifacts` for the reusable ``nv_hsb_ip`` RTL source and the packaged RFSoC example bit-file. .. note:: @@ -114,7 +130,7 @@ The validation includes checking the data correctness and measuring the round-tr .. tab:: Using Custom Networking Layer - To measure the latency with a custom networking implementation, a stimulus (data generation) tool must the implemented that sends data to CUDA-Q realtime according to the custom networking protocol. 
+ To measure latency with a custom networking implementation, implement a stimulus (data generation) tool that sends data to CUDA-Q Realtime according to the custom networking protocol. For example, in the HSB-based implementation, we use the `ptp_timestamp` field in the `RPCHeader` / `RPCResponse` (see the message protocol documentation) to capture the timestamp for latency analysis. Specifically, the stimulus tool (FPGA) stores the 'send' timestamp in the `RPCHeader` (incoming message), which will be echoed by the GPU in the outgoing `RPCResponse` after processing it (e.g., with the RPC handler). Using the Integrated Logic Analyzer timestamp when the FPGA receives the response from the GPU, we can compute the round-trip latency. `This file `__ contains an example of such a data generation tool. diff --git a/realtime/docs/building.md b/realtime/docs/building.md index 4d97eb0171d..c7b10e1de84 100644 --- a/realtime/docs/building.md +++ b/realtime/docs/building.md @@ -111,8 +111,9 @@ sub-directory in CUDA-Q source tree. To run the end-to-end RPC dispatch testing between FPGA and GPU using CUDA-Q Realtime and Holoscan Sensor Bridge, -- Load the `HSB` bit-file into the FPGA. -The bit-file can be obtained from [here](https://github.com/nvidia-holoscan/holoscan-sensor-bridge/tree/release-2.6.0-EA). +- Program the FPGA with `HSB`. + See the [CUDA-Q Realtime installation docs](https://nvidia.github.io/cuda-quantum/latest/using/realtime/installation.html#realtime-hsb-fpga-artifacts) + for the reusable `nv_hsb_ip` RTL source and the packaged RFSoC example bit-file. - Run the test script (at `cuda-quantum/realtime/unittests/utils/hololink_test.sh`). 
For example, diff --git a/realtime/docs/cudaq_realtime_host_api.md b/realtime/docs/cudaq_realtime_host_api.md index b249ed7ef9f..46fa5e2d723 100644 --- a/realtime/docs/cudaq_realtime_host_api.md +++ b/realtime/docs/cudaq_realtime_host_api.md @@ -355,7 +355,7 @@ TX Slot: | RPCResponse | response payload bytes | ``` Payload encoding details (type system, multi-argument encoding, bit-packing, -and QEC-specific examples) are defined in `cudaq_realtime_message_protocol.bs`. +and QEC-specific examples) are defined in `cudaq_realtime_message_protocol.md`. Magic values (little-endian 32-bit): @@ -381,7 +381,7 @@ struct RPCResponse { }; ``` -Both structs are 24 bytes, packed with no padding. See `cudaq_realtime_message_protocol.bs` +Both structs are 24 bytes, packed with no padding. See `cudaq_realtime_message_protocol.md` for `request_id` and `ptp_timestamp` semantics. Payload conventions: diff --git a/realtime/docs/nvqlink_latency_demo.md b/realtime/docs/nvqlink_latency_demo.md index 2637241cb61..249d2912519 100644 --- a/realtime/docs/nvqlink_latency_demo.md +++ b/realtime/docs/nvqlink_latency_demo.md @@ -1,6 +1,13 @@ # Steps to execute the NVQLink latency demo -The source Verilog code can be found [here](https://edge.urm.nvidia.com/artifactory/sw-holoscan-thirdparty-generic-local/QEC/HSB-2.6.0-EA/). +Start from the HSB FPGA artifacts described in the +[CUDA-Q Realtime installation docs](https://nvidia.github.io/cuda-quantum/latest/using/realtime/installation.html#realtime-hsb-fpga-artifacts). +For this demo, use the packaged RFSoC PYNQ reference design and place the +matching `nv_hsb_ip` RTL source as described there. + +The RFSoC PYNQ archive contains the top-level RTL, Vivado build scripts, +Integrated Logic Analyzer (`ILA`) and latency scripts, and the pre-built +bit-file for the packaged RFSoC example. 
More details about how the `Holoscan Sensor Bridge` (`HSB`) IP can be incorporated can be found [here](https://docs.nvidia.com/holoscan/sensor-bridge/latest/fpga_index.html) @@ -15,7 +22,7 @@ the capabilities required. ## Steps to do the experiment -1. Load the bit-file into the FPGA. +1. Load the packaged RFSoC example bit-file into the FPGA. 2. Setup the host to run the experiment. Mainly the IP address of the NIC needs to be set to `192.168.0.101`. More details can be found at the diff --git a/realtime/docs/user_guide.md b/realtime/docs/user_guide.md index 98dd1abf1e4..cfc91acd322 100644 --- a/realtime/docs/user_guide.md +++ b/realtime/docs/user_guide.md @@ -11,9 +11,9 @@ CUDA-Q Realtime, including connectivity to a - A host system with NVIDIA GPU and ConnectX-7/BlueField NIC. -- A FPGA, programmed with `HSB` IP and connected to the NIC. +- An FPGA, programmed with `HSB` IP and connected to the NIC. -> **_NOTE:_** We recommended using NVIDIA ConnectX-7 as prior generations +> **_NOTE:_** We recommend using NVIDIA ConnectX-7 as prior generations may not have all the required capabilities. ### Software Components @@ -26,7 +26,7 @@ may not have all the required capabilities. with `gpunetio` support. > **_NOTE:_** `DOCA` is required to run the end-to-end validation with FPGA -using the builtin `HSB` support of CUDA-Q realtime. +using the built-in `HSB` support of CUDA-Q Realtime. @@ -59,12 +59,12 @@ Please refer to this [section](#using-docker) for instructions. > export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/nvidia/cudaq/realtime/lib > ``` -2. Load `HSB` IP bit-file to the FPGA +2. Program the FPGA with `HSB` - The bit-file for supported FPGA vendors - can be found [here](https://edge.urm.nvidia.com/artifactory/sw-holoscan-thirdparty-generic-local/QEC/HSB-2.6.0-EA/). 
+ See the [CUDA-Q Realtime installation docs](https://nvidia.github.io/cuda-quantum/latest/using/realtime/installation.html#realtime-hsb-fpga-artifacts) + for the reusable `nv_hsb_ip` RTL source and the packaged RFSoC example bit-file. - > **_NOTE:_** Please make sure set up the [host system](https://docs.nvidia.com/holoscan/sensor-bridge/latest/setup.html) + > **_NOTE:_** Please make sure to set up the [host system](https://docs.nvidia.com/holoscan/sensor-bridge/latest/setup.html) and the `HSB` FPGA device [IP address](https://docs.nvidia.com/holoscan/sensor-bridge/latest/architecture.html#datachannel-enumeration-and-ip-address-configuration) (if not already done so).
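
The latency measurement described in the `installation.rst` hunk above (the FPGA stores a send timestamp in `ptp_timestamp`, the GPU echoes it in `RPCResponse`, and the Integrated Logic Analyzer timestamp at receipt gives the other endpoint) reduces to a per-message subtraction. The following is a minimal sketch of that arithmetic only, not the project's tooling. It assumes both timestamps have already been exported as integers in the same clock domain and unit (nanoseconds is assumed here), and the function names and sample values are hypothetical.

```python
from statistics import mean


def round_trip_latencies_ns(samples):
    """Return one round-trip latency per (send_ts_ns, receive_ts_ns) pair.

    send_ts_ns:    timestamp echoed back in the response (assumed nanoseconds).
    receive_ts_ns: timestamp captured when the response arrives (assumed
                   to come from the same clock as send_ts_ns).
    """
    latencies = []
    for send_ts_ns, receive_ts_ns in samples:
        if receive_ts_ns < send_ts_ns:
            # A negative latency means the two timestamps are not comparable
            # (different clock domains, wraparound, or mismatched messages).
            raise ValueError(
                f"receive timestamp {receive_ts_ns} precedes send timestamp {send_ts_ns}"
            )
        latencies.append(receive_ts_ns - send_ts_ns)
    return latencies


def summarize(latencies):
    """Return count, minimum, maximum, and mean of a non-empty latency list."""
    if not latencies:
        raise ValueError("no latency samples to summarize")
    return {
        "count": len(latencies),
        "min_ns": min(latencies),
        "max_ns": max(latencies),
        "mean_ns": mean(latencies),
    }
```

Calling `summarize(round_trip_latencies_ns(pairs))` on the exported timestamp pairs gives the basic statistics. Rejecting negative differences rather than clamping them keeps clock-domain mistakes visible instead of folding them into the reported minimum.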