Support HIP GiMMiK vector-width metadata by tomjen12 · Pull Request #567 · PyFR/PyFR

tomjen12 · 2026-06-18T05:51:14Z

Summary

Adds support for HIP GiMMiK kernels with vector-width metadata.
When GiMMiK reports meta['width'], PyFR now uses ceil(n / width) to compute the launch grid. This supports the width variants added in PyFR/GiMMiK#19.
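
As a rough illustration of the grid computation described above, the sketch below is hypothetical helper code, not the code in this PR. The names `ceil_div` and `launch_grid_x` are made up, and the final division by the block size is an assumption about how a 1D launch would be sized. The summary only states that `ceil(n / width)` is used when `meta['width']` is present.

```python
def ceil_div(a, b):
    """Integer ceiling division for positive integers."""
    return -(-a // b)


def launch_grid_x(n, block_x, meta=None):
    """Return the number of blocks along x for n columns.

    meta is a dict of kernel metadata; a missing 'width' entry is
    treated as a width of 1 (assumption made for this sketch).
    """
    width = (meta or {}).get('width', 1)

    # Each thread handles `width` columns, so fewer threads are needed
    nthreads = ceil_div(n, width)

    # Assumed: threads are then grouped into blocks of size block_x
    return ceil_div(nthreads, block_x)


print(launch_grid_x(1000, 128))                # no width metadata
print(launch_grid_x(1000, 128, {'width': 2}))  # two columns per thread
```

Using ceiling rather than floor division keeps the trailing columns covered when `n` is not a multiple of `width`.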

tomjen12 · 2026-06-18T06:19:53Z

Hi @FreddieWitherden, could you please review this PR when you have a chance?
I also noticed I cannot request reviewers from the sidebar. Is reviewer assignment limited to repository members?

FreddieWitherden · 2026-06-18T14:12:18Z

Overall looks good. Figuring out the best way for GiMMiK to tell callers what grid/block sizes its kernels want as a function of n has always been fiddly, and I think the addition of width here is sensible. Previously we've handled this by having HIP use the compile-time n kernel variants, since then GiMMiK can just provide the grid and block sizes up-front. However, this is bad for JIT times and did not appear to provide much of a speedup, which is why PyFR currently always asks GiMMiK for dynamic n kernels. We can change this if fixed n would allow for appreciably better performance (and this is indeed what CUDA does).

tomjen12 · 2026-06-24T01:35:32Z

Hi @FreddieWitherden,

I noticed that the 7 cases currently excluded by the PyFR-side GiMMiK suitability check (nuq > 128 and density > 0.15) are the p4 tet m132/m460 cases and all p5 tet cases. When I relax this condition locally and allow GiMMiK to be considered, 4 of these 7 cases perform better with GiMMiK than rocBLAS. Do you know the original motivation for this threshold, and whether it was intended to apply equally to the HIP path?

PyFR/pyfr/backends/hip/gimmik.py

Line 38 in 4d3ec09

if nuq > 128 and nnz / arr.size > 0.15:
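
For readers unfamiliar with the check quoted above, here is a minimal, dependency-free sketch of what such a suitability filter might look like. The function name `is_gimmik_candidate`, the use of plain nested lists, and the definition of `nuq` as the number of distinct absolute values are assumptions made for illustration. Only the condition `nuq > 128 and nnz / arr.size > 0.15` comes from the referenced line.

```python
def is_gimmik_candidate(rows, nuq_max=128, density_max=0.15):
    """Return False when a matrix is both value-rich and dense.

    rows is a list of equal-length lists of numbers.
    """
    values = [v for row in rows for v in row]

    # Number of non-zero entries
    nnz = sum(1 for v in values if v != 0)

    # Number of distinct magnitudes (assumed meaning of nuq)
    nuq = len({abs(v) for v in values})

    # Reject only when both thresholds are exceeded
    return not (nuq > nuq_max and nnz / len(values) > density_max)


# A fully dense 20x20 matrix with 400 distinct values is rejected
dense = [[i * 20 + j + 1 for j in range(20)] for i in range(20)]
print(is_gimmik_candidate(dense))

# Relaxing the thresholds, as tried locally, lets it through
print(is_gimmik_candidate(dense, nuq_max=1000))
```

Because the two conditions are joined with `and`, a sparse matrix with many distinct values or a dense matrix with few distinct values still passes the filter.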

FreddieWitherden · 2026-06-24T01:51:49Z

Originally it was put in place to keep compilation time down. Historically, BLAS libraries have done a better job than GiMMiK for dense operations, and the code size of these kernels has resulted in multi-minute compilation times (for kernels which will then be rejected by the auto-tuning). The thresholds themselves have not been tuned in a very long time and may not be appropriate for current hardware.

The tet cases, with the exception of M6, are almost entirely dense, so it feels like a BLAS kernel should do better. That said, recent experience on NVIDIA hardware suggests that getting dense capabilities into GiMMiK (and being able to specialize on our exact sizes) is worthwhile, so we are open to purely dense kernels.

tomjen12 · 2026-06-24T09:00:12Z

Thanks, that makes sense. I will measure the compile/autotune time for the 7 currently excluded tet cases, along with the GiMMiK vs rocBLAS performance, before deciding whether relaxing the threshold is worthwhile.

Support vector-width HIP GiMMiK kernels

3305a81

tomjen12 mentioned this pull request Jun 18, 2026

Add tuned HIP GiMMiK preload-C and width variants with non-temporal loads and stores PyFR/GiMMiK#19

Open

FreddieWitherden self-assigned this Jun 18, 2026

tomjen12 wants to merge 1 commit into PyFR:develop from tomjen12:hip-gimmik-width-support
