HBM Timing Violations on every sp=HBM design #132

I have been unable to achieve timing closure on any direct-HBM design. This affects every design that uses `sp=<x>:HBMn`, including example 05_perf. Any thoughts or insights would be greatly appreciated.

### Observations

Direct HBM (as opposed to VNOC/MEM) connects an AXI master directly to the HBM NMU. In _**every such design**_, the linking stage fails post-route timing on the platform's 400 MHz clock, and it is not possible to write user/RM logic that prevents this. The build still completes and writes a `.vbin` without any warning or error, and in many designs there is no evidence of data corruption. However, some designs that were validated in simulation have shown non-deterministic data corruption, so this timing violation cannot be dismissed as benign.
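Since the build writes a `.vbin` without flagging the violation, one option is to gate the flow on the post-route timing summary yourself. A minimal sketch, assuming the report follows the layout of Vivado's `report_timing_summary` "Design Timing Summary" table (a `WNS(ns)` header row, a dashed separator row, then the value row) and that this table is the first one containing `WNS(ns)`. The script and its report-path argument are hypothetical; adjust the parsing if your report differs.

```python
import sys


def design_wns_ns(report_text: str) -> float:
    """Return the design-level WNS from a timing-summary report.

    Assumption: the first row containing 'WNS(ns)' is a header, followed by a
    dashed separator row, then a row whose first token is the WNS value.
    """
    lines = report_text.splitlines()
    for i, line in enumerate(lines):
        if "WNS(ns)" in line and i + 2 < len(lines):
            return float(lines[i + 2].split()[0])
    raise ValueError("no 'WNS(ns)' table found in report")


def main(path: str) -> int:
    """Exit non-zero when the post-route design WNS is negative."""
    with open(path, encoding="utf-8") as f:
        wns = design_wns_ns(f.read())
    if wns < 0:
        print(f"ERROR: timing not met (WNS = {wns} ns)", file=sys.stderr)
        return 1
    print(f"timing met (WNS = {wns} ns)")
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

Run it on the post-route timing summary after linking and treat a non-zero exit code as a failed build.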

The data below is from a minimal direct-HBM vector-add kernel with three AXI4 master ports, each pinned to its own HBM channel.
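For reproducibility, the connectivity for a three-port kernel like this one can be generated as one `sp=` line per port, following the `sp=<x>:HBMn` pattern. This is a sketch: the compute-unit name `vadd_1`, the port names `in1`, `in2`, `out`, and the `<cu>.<port>` form of `<x>` are assumptions, so check them against your kernel and the toolchain's connectivity syntax.

```python
def hbm_connectivity(cu: str, ports: list[str], first_channel: int = 0) -> list[str]:
    """Pin each AXI master port of a compute unit to its own HBM channel.

    Emits one `sp=<cu>.<port>:HBM<n>` line per port, using consecutive
    channel numbers starting at `first_channel`.
    """
    return [f"sp={cu}.{port}:HBM{first_channel + i}" for i, port in enumerate(ports)]


if __name__ == "__main__":
    # Hypothetical CU and port names for a three-port vector-add kernel.
    for line in hbm_connectivity("vadd_1", ["in1", "in2", "out"]):
        print(line)
```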

### Timing report

<img width="1174" height="474" alt="Image" src="https://github.com/user-attachments/assets/a0c15084-f80e-405c-845a-fe620f6f3090" />

Every failing path is a register in the user region driving the AXI slave of the static HBM NoC Master Unit (NMU), or the same crossing in the opposite direction. Specifically, it is the per-channel HBM SmartConnect's master output driving `axi_noc_cips/HBMxx_AXI`. The SmartConnect crosses from the user clock domain in the user region into the fixed, non-reconfigurable 400 MHz static-shell clock that drives the HBM NMU (`clk_wizard_0/clk_out1`, exported as `static_region_clk`, timing name `clk_wizard_0_clk_out1_1`).
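For reference, a single-cycle path launched and captured on the 400 MHz static clock has a 2.5 ns budget. A simplified sketch of that budget (it ignores setup time, clock skew, and clock uncertainty, all of which Vivado includes; the function names are hypothetical):

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def same_clock_slack_ns(freq_mhz: float, data_path_ns: float) -> float:
    """Approximate setup slack of a single-cycle path launched and captured
    by the same clock. Setup time, skew and uncertainty are ignored."""
    return period_ns(freq_mhz) - data_path_ns
```

Note that the user clock frequency does not enter this calculation at all: once the SmartConnect has moved the signal into the 400 MHz domain, only that clock's period matters.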

The key detail is that there is no logic after the SmartConnect's output: routing accounts for 96.7% of the data-path delay, and the route crosses from the reconfigurable partition into the static shell.
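With routing at 96.7% of the data-path delay, removing every logic cell from the path would recover only about 3.3% of it. A minimal sketch of that best-case bound for a single-cycle path (setup time, skew, and clock uncertainty are ignored; the function name is hypothetical):

```python
def logic_free_slack_ns(freq_mhz: float, data_path_ns: float, route_fraction: float) -> float:
    """Best-case slack if all logic (cell) delay on a single-cycle path were
    removed, leaving only the routing share of the data-path delay.
    Setup time, skew and clock uncertainty are ignored."""
    period_ns = 1000.0 / freq_mhz
    return period_ns - data_path_ns * route_fraction
```

If `logic_free_slack_ns(400.0, d, 0.967)` is negative for the reported data-path delay `d`, no logic optimization can close the path; only a shorter route can.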

<img width="1920" height="1048" alt="Image" src="https://github.com/user-attachments/assets/a813b18a-072f-49dd-b765-72e2d3de73ce" />

### Why it can't be fixed from user logic

1. Lowering the user clock can't help, because the failing path is on the non-reconfigurable 400 MHz clock.
2. A pipeline register can't be placed closer to the NMU. First, the NMU's clock region has no general fabric (SLICE) sites. Second, the nearest fabric below it is outside the SLASH pblock, so registers cannot be placed there.
3. Even if the pblock were closer, the data shows that the route delay into the hardened NMU is very similar across the failing endpoints despite their differing distances, which suggests an intrinsic fixed delay for crossing out of the fabric into the hardened block.

<img width="966" height="567" alt="Image" src="https://github.com/user-attachments/assets/309b3684-c46d-4634-84f3-ef5a0a7870a2" />

The highlighted path is the worst timing violation.

<img width="1170" height="592" alt="Image" src="https://github.com/user-attachments/assets/8400e784-9189-4dd2-b4c0-fad476cda2cf" />
<img width="607" height="466" alt="Image" src="https://github.com/user-attachments/assets/6cac01f7-6874-4e56-a25d-702335dfdbc3" />
<img width="871" height="390" alt="Image" src="https://github.com/user-attachments/assets/3092004b-06eb-40f6-a409-40a9e763829b" />

### Notes

With the current floorplan, the static region clock would have to run at about 325 MHz to meet timing. Currently, the only way to meet timing on HBM designs is to use the `sp=MEM` allocator, which routes through the user-clock VNOC ingresses and performs the clock crossing in the hardened NoC. However, this costs a large amount of performance below 8 channels, and far more above 8, since the VNOC ingresses must then be multiplexed.
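The 325 MHz figure can be translated into an approximate worst negative slack using the usual estimate Fmax ≈ 1 / (T - WNS). This is a back-of-the-envelope sketch derived only from the 325 MHz number (not read from the timing report), and the function names are hypothetical.

```python
def fmax_mhz(period_ns: float, wns_ns: float) -> float:
    """Estimated maximum frequency: 1 / (period - WNS).
    A negative WNS lengthens the achievable period."""
    return 1000.0 / (period_ns - wns_ns)


def implied_wns_ns(clock_mhz: float, fmax_needed_mhz: float) -> float:
    """WNS implied at clock_mhz if the path would just close at fmax_needed_mhz."""
    return 1000.0 / clock_mhz - 1000.0 / fmax_needed_mhz
```

`implied_wns_ns(400.0, 325.0)` evaluates to about -0.58 ns, i.e. the crossing is roughly 0.58 ns too long for the 2.5 ns period.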
