Fix MXFP4 triton kernel launch on triton>=3.x (seed=None -> int)#9
Open
kylefoxaustin wants to merge 1 commit into
The mxfp4 forward kernel declares `seed: int` (non-constexpr). The wrapper passes seed=None when stochastic_round is False, which triton>=3.x rejects at launch with "'NoneType' object cannot be interpreted as an integer". seed is only read inside the stochastic_round=True branch (dead-code-eliminated when False), so passing 0 is safe and behavior-preserving.
Problem
Pseudo-quantization training with `QuestMXFP4Quantizer` (the config in `main_setup.sh`) crashes at the first forward pass on recent triton with "'NoneType' object cannot be interpreted as an integer".
Root cause
In `mxfp4_forward_kernel_wrapper`, `seed` is set to `None` when `stochastic_round` is `False`. The kernel declares `seed: int` (non-constexpr), so triton tries to pack it into the launch arguments. Older triton tolerated an unused `None`; triton >= 3.x rejects it at launch time. Since `seed` is only read inside the `if stochastic_round:` branch (dead-code-eliminated when `False`), the value is never used in this path.
Fix
Pass `0` instead of `None`. This is behavior-preserving: the value is unread when `stochastic_round=False`.
Testing
With this one-line change, the full recipe (`QuestMXFP4` weights+activations + `AlbertTsengQuantizer` 4-bit gradients + the `Q(E)Q(Wt)t_Q(Et)Q(Xt)t` backward scheme) trains end-to-end. Verified on triton 3.6 / torch 2.11+cu128 / RTX 5090 (sm_120) with a 30M Llama on WikiText-103; FP4 val-loss tracks the BF16 baseline to within ~0.07 nats.
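
For reviewers without a GPU at hand, the idea behind the fix can be sketched in plain Python. This is only an illustration under stated assumptions: `resolve_seed` and `pack_int_arg` are hypothetical names that do not appear in this PR, and `pack_int_arg` merely imitates the integer conversion a launcher performs on a non-constexpr `int` argument; it is not triton's actual launch code.

```python
import operator
from typing import Optional


def resolve_seed(stochastic_round: bool, seed: Optional[int] = None) -> int:
    """Return a seed that is always safe to pass as an integer kernel argument.

    Hypothetical helper (not part of this PR). It mirrors the one-line fix:
    when stochastic rounding is off the seed is never read, so 0 is used
    as a placeholder instead of None.
    """
    if not stochastic_round:
        return 0  # placeholder; unread on the non-stochastic path
    if seed is None:
        # Assumption for this sketch: the caller must supply a seed here.
        raise ValueError("stochastic_round=True requires an integer seed")
    return operator.index(seed)


def pack_int_arg(value) -> int:
    """Stand-in for packing a non-constexpr `int` launch argument.

    Assumption: only the int conversion is imitated; None fails here with
    a TypeError, similar in wording to the error quoted in this PR.
    """
    return operator.index(value)
```

With this sketch, `pack_int_arg(None)` raises a `TypeError`, while `pack_int_arg(resolve_seed(False))` succeeds and yields `0`, which matches the behavior-preserving intent of the change.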