Support HIP GiMMiK vector-width metadata#567
Conversation
|
Hi @FreddieWitherden , could you please review this PR when you have a chance? |
|
Overall looks good. Figuring out the best way for GiMMiK to expose to callers what grid/block sizes its kernels want as a function of |
|
Hi @FreddieWitherden , I noticed that the 7 cases currently excluded by the PyFR-side GiMMiK suitability check (nuq > 128 and density > 0.15) are the p4 tet m132/m460 cases and all p5 tet cases. When I relax this condition locally and allow GiMMiK to be considered, 4 of these 7 cases perform better with GiMMiK than rocBLAS. Do you know the original motivation for this threshold, and whether it was intended to apply equally to the HIP path? PyFR/pyfr/backends/hip/gimmik.py Line 38 in 4d3ec09 |
Originally it was put in place for keeping compilation time down. Historically, for dense operations BLAS libraries have done a better job than GiMMiK and the code sizes for these kernels have resulted in multi-minute compilation times (for kernels which will be rejected by the auto-tuning). The thresholds themselves have not been tuned in a very long time and may not be appropriate for current hardware. The tet cases, with the exception of M6, are almost entirely dense and so it feels like a BLAS kernel should do better. With that said, recent experience on NVIDIA hardware has suggested that getting dense capabilities into GiMMiK (and being able to specialize on our exact sizes) is worthwhile and so are open to purely dense kernels. |
|
Thanks, that makes sense. I will measure the compile/autotune time for the 7 currently excluded tet cases, along with the GiMMiK vs rocBLAS performance, before deciding whether relaxing the threshold is worthwhile. |
Summary
Adds support for HIP GiMMiK kernels with vector-width metadata.
When GiMMiK reports
meta['width'], PyFR now usesceil(n / width)to compute the launch grid. This supports the width variants added in PyFR/GiMMiK#19.