Keep develop in sync with main after 1.24.0 release by Anerudhan · Pull Request #245 · NVIDIA/cudnn-frontend

Anerudhan · 2026-05-20T19:25:56Z

No description provided.

# cuDNN Frontend v1.24.0 Release Notes

cuDNN Frontend v1.24.0 is the recommended version for [cuDNN 9.22.0](https://docs.nvidia.com/deeplearning/cudnn/backend/latest/release-notes.html#cudnn-9-22-0) and later releases.

## General Improvements 🚀 🚀

### Updates to Graph API

- Rotary Position Embedding (RoPE) is now available as an NVRTC-compiled open-source kernel, usable both standalone and as a preprocessing stage for the SDPA engine. See the [sample](test/python/test_oss_rope.py) for usage. RoPE fusion with SDPA requires cuDNN 9.24.0.
- SDPA backward now supports hidden dimension `d=256`. Requires cuDNN 9.23.0 or later.

## Open-Source Kernels 🚀 🚀

- Introduced a DSA module featuring the following DSA/CSA kernels for DsV4:
  - **Indexer Forward**: CuTe-DSL score kernel (Q @ Kᵗ, ReLU, head reduce, ratio causal mask). Non-fused; pair with **Indexer Top-K** for the top-K stage.
  - **Indexer Top-K**: SM100 CuTe-DSL radix top-K kernel with per-row `seq_lens`.
  - **Sparse Attention Backward**: DSA backward (FlashMLA-shape, SM90/SM100).
  - **Sparse Indexer / Attention Score Recompute**: Sparse (top-K) recomputation of indexer and attention scores for training loss.
  - **Dense Indexer / Attention Score Recompute**: Dense (full-KV) analogues of the above.
  - **Indexer Backward**: Three-stage pipeline (score-grad, three GEMMs, dtype cast) for sparse top-K score tensors.
  - **Dense Indexer Backward**: Full-KV counterpart of Indexer Backward.
- Grouped GEMM GLU forward kernel with fused Hadamard transform.

## Skills

- Added a new Claude skill for converting cuteDSL kernels into experimental cuDNN APIs.

## Enhancements

- Noisy logging messages are now emitted only once per process.
- Convolution problems are now rejected when total filter size exceeds `INT32_MAX`.
- Support for ragged input order has been added for grouped GEMM weight gradients.

## Bug Fixes

- Fixed an issue in the reshape operator when called with 1D tensors.
- Fixed missing `square_alpha` scaling in dgeglu and dswiglu.
- Fixed a race condition in lazy variant-pack-template preparation observed in some single-threaded scenarios.

## New Samples

- Added new samples for [memory-bound fusions](samples/cpp/membound/boolean_fusion.cpp).

## Acknowledgements

The Native Sparse Attention forward-prop kernels, supporting head dim = 128 and optimized for the Blackwell architecture, were implemented in CuteDSL. These kernels were a collaborative effort, jointly developed by: Jie Feng, Akash Mehra, Vincent Zhang, Dominik Ernst, Xinbo Zhao, Aditya Vavre, Vedaanta Agarwalla, Mingyang Wang, Anerudhan Gopal, Paul Springer, Yang Xu, and Nima Tajbakhsh.

DSA README with acknowledgements

Added acknowledgements section and improved formatting.
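For readers unfamiliar with the RoPE operation named in the release notes above, the following is a minimal pure-Python reference sketch of rotary position embedding for a single head vector. It does not use the cuDNN Frontend API and is not the NVRTC kernel. The function names `rope_reference` and `dot`, the interleaved pairing of elements `(x[2i], x[2i+1])`, and the default `base=10000.0` are assumptions chosen for illustration; the layout the cuDNN kernel uses is not stated here.

```python
import math


def rope_reference(x, position, base=10000.0):
    """Rotate consecutive pairs of x by position-dependent angles.

    Hypothetical reference helper (assumed interleaved pair layout).
    Pair i is rotated by angle = position * base ** (-2 * i / d).
    """
    d = len(x)
    if d % 2 != 0:
        raise ValueError("head dimension must be even")
    out = [0.0] * d
    for i in range(d // 2):
        angle = position * base ** (-2.0 * i / d)
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = x[2 * i], x[2 * i + 1]
        # Standard 2D rotation applied to the pair (x0, x1).
        out[2 * i] = x0 * c - x1 * s
        out[2 * i + 1] = x0 * s + x1 * c
    return out


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))


if __name__ == "__main__":
    q = [0.5, -1.0, 2.0, 0.25]
    k = [1.5, 0.5, -0.75, 1.0]
    # Attention scores after RoPE depend only on the position offset,
    # so these two values agree up to floating-point rounding.
    print(dot(rope_reference(q, 3), rope_reference(k, 1)))
    print(dot(rope_reference(q, 5), rope_reference(k, 3)))
```

The property worth noting is that the rotated query-key dot product depends only on the relative offset between the two positions, which is why RoPE can be applied as a preprocessing stage ahead of attention.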

Anerudhan added 5 commits May 1, 2026 11:30

Revise contributing guidelines for clarity and engagement (#233)

107786d

Updated contributing guidelines to streamline contribution process and clarify expectations.

Clean up old benchmarks (#234)

50c8aeb

Add tech talks section to README (#238)

834c388

Added a section for tech talks with a link to a YouTube video.

Add acknowledgements section to sparse_attention.md (#240)

3929da2

Added acknowledgements for the Native Sparse Attention fprop kernels implementation.

Anerudhan merged commit 5c51a18 into develop May 20, 2026
2 checks passed

Anerudhan merged 5 commits into develop from main
