Pull requests: NVIDIA/TransformerEngine

Pull requests list

[PyTorch] Avoid autograd's gradient accumulation in grouped MLP if possible
#2871 opened Apr 13, 2026 by ksivaman Member
6 of 14 tasks
Cute Dsl kernel for Wgrad for Fused MOE Layer
#2869 opened Apr 13, 2026 by vthumbe1503 Collaborator
13 tasks
Optimizations for MXFP8/NVFP4 dequantize kernels
#2865 opened Apr 10, 2026 by YigongQin Draft
2 of 13 tasks
Adds GEMM Profiling Guide to TE
#2863 opened Apr 9, 2026 by jomitchellnv Contributor
7 tasks
[DO NOT MERGE] Test CI
#2862 opened Apr 9, 2026 by cyanguwa Collaborator Draft
13 tasks
Strip local version labels from package version checks
#2858 opened Apr 8, 2026 by pstjohn Contributor
Add cpplint and ruff linter to pre-commit and fix lint violations
#2853 opened Apr 8, 2026 by pstjohn Contributor
Bump transformers from 4.55.0 to 5.0.0rc3 in /docs/examples/te_gemma (labels: dependencies, python)
#2851 opened Apr 8, 2026 by dependabot bot
Bump transformers from 4.57.0 to 5.0.0rc3 in /docs/examples/te_llama (labels: dependencies, python)
#2850 opened Apr 8, 2026 by dependabot bot
Skip activation kernels when tensor size is zero (label: bug)
#2848 opened Apr 8, 2026 by timmoon10 Collaborator
8 of 13 tasks
[Common] Multicast Fixes
#2847 opened Apr 8, 2026 by phu0ngng Collaborator Draft
13 tasks
[Core] Report CUDA versions when NVRTC compilation fails (label: enhancement)
#2842 opened Apr 7, 2026 by timmoon10 Collaborator
8 of 13 tasks
Add grouped unswizzle functionality for MXFP8 scaling factors
#2837 opened Apr 5, 2026 by int-smart Contributor
8 of 13 tasks
fix CUDA architectures cmake logic (label: community-contribution)
#2832 opened Apr 3, 2026 by GaetanLepage Contributor
2 of 13 tasks
Add capture_time_hooks to make_graphed_callables for non-capturable per-callable hooks
#2831 opened Apr 3, 2026 by buptzyb Contributor
1 of 13 tasks
Port softmax ops to libtorch stable ABI
#2830 opened Apr 3, 2026 by pstjohn Contributor
Cp thd swa with ag
#2829 opened Apr 3, 2026 by sudhakarsingh27 Collaborator Draft
13 tasks
[Common] Reduced padding kernel compilation time
#2827 opened Apr 2, 2026 by Oleg-Goncharov Collaborator
5 of 13 tasks
fix(CP, MLA): CP works fine with MLA in a2a cp_comm_type (label: community-contribution)
#2826 opened Apr 2, 2026 by zhujian19891203 Contributor
5 of 13 tasks
fix(CP, FA): the conditional logic in the FA version contains a vulnerability when processing the output of Flash Attn forward pass (label: community-contribution)
#2825 opened Apr 2, 2026 by zhujian19891203 Contributor
5 of 13 tasks
Parallel Test Execution to decrease CI run times
#2824 opened Apr 2, 2026 by sudhakarsingh27 Collaborator Draft