[Frontend] Retire dead floor/mod recompile branches in codegen#261
Merged
YWHyuk merged 2 commits intoJun 18, 2026
Merged
Conversation
axis-split + graph-copy (on by default) linearize aligned floor/mod at the scheduling layer, so the index reaching get_dma_info is affine and the FloorDiv/ModularIndexing tile-divisibility branches there are never entered (measured: 0 entries across elementwise, gemm, bmm, conv, cat, floor/mod, reduce, attention). Remove those dead branches and their orphans: - the FloorDiv and ModularIndexing tile-forcing + RecompileSignal blocks - the implicit-ModularIndexing index rewrite and implicit_local_dims - the dead ModularIndexing branch in the dram_stride computation - is_modular_indexing, the write-only implicit_dim_size, unused import sys Kept: the non-floor/mod recompile paths (index-divisibility, indirect access, non-power-of-2 vec size), RecompileSignal, and the retry loop. The upstream implicit_dim_ops tile-forcing is left untouched (separate change). Validated end-to-end (Spike + TOGSim): elementwise, gemm, bmm, conv2d, group_conv, pool, cat, floor/mod suite, reduce, softmax, layernorm, batchnorm, gqa -- all pass, 0 recompiles. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…-split) implicit_dim_ops/extract_dividers/apply_constraints forced the initial tile size to match a view's floor/mod divider, up front in compute_tile_size. axis-split now linearizes those views at the scheduling layer, so the forcing is redundant: disabling it leaves every test allclose-correct and, on the affected kernels, slightly faster (the forced tile was over-constrained -- batchnorm 1189->1114, layernorm 4092->3947 cycles; non-floor/mod kernels unchanged). Remove the machinery and its now-unused imports (ModularIndexing, FloorDiv, Mod, MemoryDep, StarDep, WeakDep). Validated end-to-end (Spike + TOGSim): elementwise, gemm, bmm, conv2d, group_conv, pool, cat, floor/mod suite, reduce, softmax, layernorm, batchnorm, gqa -- all pass, 0 recompiles. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Removes the floor/mod tile-divisibility + RecompileSignal branches in
get_dma_info, which are dead now that axis-split + graph-copy (PR #259, on by default) linearize aligned floor/mod at the scheduling layer. Measured 0 block entries across the op suite.Removed: FloorDiv/ModularIndexing tile-forcing blocks, implicit-ModularIndexing index rewrite,
implicit_local_dims, the dead ModularIndexing branch in dram_stride,is_modular_indexing, write-onlyimplicit_dim_size, unusedimport sys.Kept: non-floor/mod recompile paths (index-divisibility, indirect, non-power-of-2 vec),
RecompileSignal, retry loop. Upstreamimplicit_dim_opstile-forcing untouched (separate follow-up).Validated end-to-end (Spike + TOGSim): elementwise, gemm, bmm, conv2d, group_conv, pool, cat, floor/mod suite, reduce, softmax, layernorm, batchnorm, gqa -- all pass, 0 recompiles.
NOTE: depends on #259 (axis-split). Stacked on top of it; rebase onto develop once #259 merges.
🤖 Generated with Claude Code