[Frontend] vcix: lower fused matmul whose operand is vector_store'd, fix exp chunking#265

Closed
YWHyuk wants to merge 1 commit into feature/tog-python-binding from fix/vcix-nchunk


YWHyuk commented Jun 18, 2026

Collaborator

Fixes SDPA, which was broken on this branch by the C++->Python vcix port (lower_to_vcix.py). Bisected decisively: C++ vcix passes SDPA, Python vcix does not; exp-chunking and dma-fine-grained were ruled out, leaving _lower_matmul.

1. Fused matmul left un-lowered (the numerical bug). `_lower_matmul` bailed on `ATag is None or BTag is None`, i.e. it required an MVIN `dma_start` tag for both operands. In SDPA's scores·V matmul, operand B is the softmax output produced in place by `affine.vector_store`, not DMAed, so `BTag` stayed `None`, the matmul was left in the IR, and the attention output was wrong. The C++ `MatmulOpLowering` marks an operand as initialized by either a `dma_start` or a preceding `affine.vector_store` into its root memref; the Python port omitted that branch. Restored it (`isAInit`/`isBInit`). `BTag`/`BAsync` stay `None`/`0` and are only read under `if BAsync:`, so the B `dma_wait` is correctly skipped, matching C++.

2. n>1 transcendental chunking crash. `_make_sf_vc_v_iv` called `vector.ExtractStridedSliceOp(offsets, sizes, strides, vec)`: wrong argument order, missing the result type and vector operand, which raises `TypeError` under these MLIR bindings. Fixed to `(result=legal_ty, vector=vec, offsets, sizes, strides)`. Only large transcendentals (n>1, e.g. SDPA softmax exp) reach this path, so CI's small-tile (n==1) tests never hit it.
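The n>1 path in item 2 can be illustrated with a small, MLIR-free sketch. It assumes a 1-D vector split into equal legal-width chunks with stride 1; the PR does not show how `_make_sf_vc_v_iv` actually derives its chunks, so `chunk_slices` and `build_slice_kwargs` are hypothetical helpers. Only the keyword names (`result`, `vector`, `offsets`, `sizes`, `strides`) are taken from the fix described above.

```python
from typing import Any, Dict, List, Tuple

Slice = Tuple[List[int], List[int], List[int]]


def chunk_slices(total_elems: int, legal_elems: int) -> List[Slice]:
    """Return (offsets, sizes, strides) for each legal-width chunk.

    Assumption: a 1-D vector whose length is a multiple of the legal width.
    """
    if legal_elems <= 0 or total_elems <= 0 or total_elems % legal_elems != 0:
        raise ValueError("total_elems must be a positive multiple of legal_elems")
    n = total_elems // legal_elems  # n == 1 means no chunking is needed
    return [([i * legal_elems], [legal_elems], [1]) for i in range(n)]


def build_slice_kwargs(vec: Any, legal_ty: Any,
                       total_elems: int, legal_elems: int) -> List[Dict[str, Any]]:
    """Build keyword arguments per chunk, named as in the fix.

    Passing by keyword avoids the positional mix-up described in item 2.
    """
    return [
        dict(result=legal_ty, vector=vec, offsets=o, sizes=s, strides=st)
        for o, s, st in chunk_slices(total_elems, legal_elems)
    ]
```

With a real binding, each dictionary would be splatted into the op constructor (for example `vector.ExtractStridedSliceOp(**kwargs)`), assuming the constructor accepts exactly these keyword names as the PR text indicates.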

Validated end-to-end (Spike+TOGSim allclose): 56 SDPA cases pass (previously they crashed or produced wrong output); matmul/bmm/conv2d show no regressions. Pass-level: after the fix, the Python vcix output is byte-identical to that of `mlir-opt -test-pytorchsim-to-vcix` for the SDPA kernel.
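The difference between the old and new gate in `_lower_matmul` (item 1) can be modeled with a toy sketch. This is not the code from `lower_to_vcix.py`: `FakeOp`, `operand_init`, `old_gate`, and `new_gate` are hypothetical names, and ops are reduced to a kind, the root memref they write, and an optional DMA tag.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FakeOp:
    kind: str                  # "dma_start" or "vector_store" in this toy model
    root: str                  # name of the root memref the op writes
    tag: Optional[str] = None  # only a dma_start carries a tag


def operand_init(preceding: List[FakeOp], root: str) -> Tuple[bool, Optional[str]]:
    """Return (is_initialized, dma_tag) for the operand rooted at `root`."""
    is_init, tag = False, None
    for op in preceding:
        if op.root != root:
            continue
        if op.kind == "dma_start":
            is_init, tag = True, op.tag
        elif op.kind == "vector_store":
            is_init = True  # initialized in place, so there is no tag to wait on
    return is_init, tag


def old_gate(preceding: List[FakeOp], a_root: str, b_root: str) -> bool:
    """Pre-fix behavior: lower only if both operands have a DMA tag."""
    _, a_tag = operand_init(preceding, a_root)
    _, b_tag = operand_init(preceding, b_root)
    return a_tag is not None and b_tag is not None


def new_gate(preceding: List[FakeOp], a_root: str, b_root: str) -> bool:
    """Post-fix behavior: lower if both operands are initialized by either route."""
    a_init, _ = operand_init(preceding, a_root)
    b_init, _ = operand_init(preceding, b_root)
    return a_init and b_init


# SDPA-like case: A arrives by DMA, B is written in place by a vector store.
sdpa = [FakeOp("dma_start", "A", "tagA"), FakeOp("vector_store", "B")]
print(old_gate(sdpa, "A", "B"), new_gate(sdpa, "A", "B"))
```

In this model the SDPA-like sequence is rejected by the old gate and accepted by the new one, while an operand with no preceding write is still rejected by both.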

🤖 Generated with Claude Code

…fix exp chunk

Two fixes to the C++->Python vcix port (lower_to_vcix.py) that SDPA exercises but
the gemm/bmm/conv tests do not:

- _lower_matmul bailed with 'if ATag is None or BTag is None: return False',
  gating on an MVIN dma_start tag for both operands. In SDPA's fused scores.V
  matmul, operand B is the softmax output produced in place by affine.vector_store,
  not DMAed, so BTag stayed None and the matmul was left un-lowered -> wrong
  attention output. Mirror the C++ MatmulOpLowering: an operand is initialized by
  either a dma_start OR a preceding affine.vector_store into its root memref; bail
  only when an operand is truly uninitialized. BTag/BAsync stay None/0 and are only
  read under 'if BAsync:', so the B dma_wait is correctly skipped (as in C++).

- _make_sf_vc_v_iv n>1 transcendental chunking called
  vector.ExtractStridedSliceOp(offsets, sizes, strides, vec) -- wrong arg order,
  missing the result type and vector operand, raising TypeError under these MLIR
  bindings. Pass (result=legal_ty, vector=vec, offsets, sizes, strides). Only
  reached by large transcendentals (n>1), e.g. SDPA softmax exp, so CI's small-tile
  (n==1) tests never hit it.

Validated end-to-end (Spike+TOGSim allclose): SDPA 56 cases pass (was crash/wrong);
matmul/bmm/conv2d regress clean. Bisected: C++ vcix passes SDPA, Python vcix did
not; exp chunking and fine-grained ruled out separately.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

YWHyuk commented Jun 18, 2026

Collaborator Author

Folded into #264 (cherry-picked, conflict resolved keeping both the C++-parity subtileK guard and the vector_store init branch). Closing this in favor of #264, which now carries the SDPA vcix fused-matmul + exp-chunk fix together with the other review fixes.
