speed up nvte_multi_padding / nvte_multi_unpadding by matthiasdiener · Pull Request #592 · ROCm/TransformerEngine

matthiasdiener · 2026-05-20T15:25:03Z

Description

Please include a brief summary of the changes, relevant motivation and context.

Fixes https://github.com/ROCm/frameworks-internal/issues/16530

See https://github.com/ROCm/frameworks-internal/issues/16530#issuecomment-4502138388 for performance.

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Changes

Please list the changes introduced in this PR:

Change A
Change B

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

aris134

LGTM!

alextmagro · 2026-06-03T14:56:00Z

+// falls back to element-wise for partial/unaligned cases.
+// Note: NT loads were also benchmarked but hurt performance.
+template <uint32_t nvec, typename Type>
+__device__ __forceinline__ void nt_store_to_elts(const Vec<Type, nvec>& v,


What is the case where we hit non-aligned vectors? Isn't FP8/MXFP8 always padded to a multiple of 16 by default? Ideally we would template out the NT vs. element-wise stores.
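To make the "template out" suggestion concrete, here is a minimal host-side C++ sketch, not the PR's code. It assumes a simplified `Vec` stand-in and hypothetical helpers `is_vec_aligned`, `store_vec`, and `store_dispatch`; `std::memcpy` stands in for the wide non-temporal store that the device code would issue. The idea is to pick the vectorized or element-wise path once, as a compile-time template parameter, instead of branching inside the store helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the PR's Vec<Type, nvec>; not the real definition.
template <typename Type, uint32_t nvec>
struct Vec {
  Type elt[nvec];
};

// True when `ptr` is aligned to the full vector width in bytes.
template <typename Type, uint32_t nvec>
bool is_vec_aligned(const Type* ptr) {
  return reinterpret_cast<std::uintptr_t>(ptr) % (sizeof(Type) * nvec) == 0;
}

// kVectorized is fixed at compile time, so each instantiation contains
// only one of the two store paths.
template <bool kVectorized, uint32_t nvec, typename Type>
void store_vec(const Vec<Type, nvec>& v, Type* dst, uint32_t valid) {
  if constexpr (kVectorized) {
    // Full, aligned vector: one wide store. In device code this is where
    // a non-temporal store would go; memcpy is only a stand-in here.
    assert(valid == nvec && (is_vec_aligned<Type, nvec>(dst)));
    std::memcpy(dst, v.elt, sizeof(Type) * nvec);
  } else {
    // Partial or unaligned vector: element-wise fallback.
    for (uint32_t i = 0; i < valid; ++i) dst[i] = v.elt[i];
  }
}

// Runtime dispatch, done once by the caller rather than per store.
template <uint32_t nvec, typename Type>
void store_dispatch(const Vec<Type, nvec>& v, Type* dst, uint32_t valid) {
  if (valid == nvec && is_vec_aligned<Type, nvec>(dst)) {
    store_vec<true>(v, dst, valid);
  } else {
    store_vec<false>(v, dst, valid);
  }
}
```

In a kernel, the equivalent of `store_dispatch` would more likely live at launch time (selecting a kernel instantiation), so the per-element code never tests alignment. Whether that pays off depends on how often the unaligned case actually occurs, which is the question raised above.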

alextmagro · 2026-06-03T14:58:41Z


+#ifdef __HIP_PLATFORM_AMD__
+  // Process subtiles with vectorized loads/stores
+#pragma unroll


Did you try #pragma unroll 2 here? If we have the registers available that might help performance.

speed up nvte_multi_padding / nvte_multi_unpadding

ce6e865

matthiasdiener requested review from alextmagro and aris134 May 20, 2026 15:25

matthiasdiener self-assigned this May 20, 2026

matthiasdiener added the ci-level 1 (CI test level 1) label May 20, 2026

matthiasdiener added 3 commits May 20, 2026 18:37

factor out binary search

a470ecb

Merge branch 'dev' into mdiener/speedup-pad-unpad

45b996a

guard

5f011ae

matthiasdiener marked this pull request as ready for review May 20, 2026 20:01

matthiasdiener requested review from ipanfilo, wangye805 and wenchenvincent as code owners May 20, 2026 20:01

aris134 reviewed Jun 1, 2026

Comment thread transformer_engine/common/util/padding.cu Outdated

Merge remote-tracking branch 'origin/dev' into mdiener/speedup-pad-unpad

a35459c

aris134 reviewed Jun 1, 2026

Comment thread transformer_engine/common/util/padding.cu

aris134 reviewed Jun 1, 2026

Comment thread transformer_engine/common/util/padding.cu Outdated

aris134 requested changes Jun 1, 2026

matthiasdiener added 2 commits June 1, 2026 12:56

factor out cols

cb9221d

bump n_warps_per_tile

dc708c6

matthiasdiener requested a review from aris134 June 1, 2026 19:59

matthiasdiener added 2 commits June 1, 2026 15:46

use NT stores

84b7d09

Merge branch 'dev' into mdiener/speedup-pad-unpad

710d3c0

aris134 approved these changes Jun 3, 2026

alextmagro reviewed Jun 3, 2026

matthiasdiener wants to merge 9 commits into dev from mdiener/speedup-pad-unpad

3 participants
