[Frontend] Retire dead floor/mod recompile branches in codegen #261

Merged
YWHyuk merged 2 commits into feature/tog-python-binding from feature/retire-floormod-recompile
Jun 18, 2026

@YWHyuk YWHyuk commented Jun 17, 2026

Collaborator

Removes the floor/mod tile-divisibility + RecompileSignal branches in get_dma_info, which are dead now that axis-split + graph-copy (PR #259, on by default) linearize aligned floor/mod at the scheduling layer. Measured 0 block entries across the op suite.
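
For readers unfamiliar with why an aligned floor/mod index becomes affine after an axis split, here is a minimal sketch in plain Python. It is not the scheduler code from #259. The helper names (`split_axis`, `original_index`, `split_index`) and the stride values are hypothetical and exist only for illustration. The sketch assumes the divider evenly divides the axis extent, which is the "aligned" case.

```python
from itertools import product


def split_axis(extent: int, divider: int):
    """Return (outer, inner) pairs covering range(extent).

    Hypothetical helper: models an aligned split, so divider must divide extent.
    """
    if extent % divider != 0:
        raise ValueError("aligned split requires extent % divider == 0")
    return product(range(extent // divider), range(divider))


def original_index(i: int, divider: int, outer_stride: int, inner_stride: int) -> int:
    # Not affine in i: the index uses floor division and modulo.
    return (i // divider) * outer_stride + (i % divider) * inner_stride


def split_index(outer: int, inner: int, outer_stride: int, inner_stride: int) -> int:
    # Affine in (outer, inner): no floor/mod remains.
    return outer * outer_stride + inner * inner_stride


extent, divider = 24, 4
outer_stride, inner_stride = 7, 3  # arbitrary illustrative strides

# With i = outer * divider + inner and 0 <= inner < divider,
# i // divider == outer and i % divider == inner, so both forms agree.
for outer, inner in split_axis(extent, divider):
    i = outer * divider + inner
    assert original_index(i, divider, outer_stride, inner_stride) == split_index(
        outer, inner, outer_stride, inner_stride
    )
```

Once every index reaching a later stage has the second form, a branch that only handles floor/mod terms has nothing left to match, which is consistent with the 0 measured entries.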

Removed: FloorDiv/ModularIndexing tile-forcing blocks, implicit-ModularIndexing index rewrite, implicit_local_dims, the dead ModularIndexing branch in dram_stride, is_modular_indexing, write-only implicit_dim_size, unused import sys.

Kept: non-floor/mod recompile paths (index-divisibility, indirect, non-power-of-2 vec), RecompileSignal, retry loop. Upstream implicit_dim_ops tile-forcing untouched (separate follow-up).
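
As a rough illustration of the kept mechanism, the sketch below shows a signal-and-retry pattern. The PR does not show the real `RecompileSignal` or the retry loop, so everything here is an assumption: the `suggested_tile` field, the `lower_kernel` and `compile_with_retry` helpers, and the tile-selection rule are all hypothetical stand-ins, with an index-divisibility check playing the role of one of the kept recompile paths.

```python
class RecompileSignal(Exception):
    """Hypothetical stand-in: the current tile size cannot be lowered as-is."""

    def __init__(self, suggested_tile: int):
        super().__init__(f"retry with tile={suggested_tile}")
        self.suggested_tile = suggested_tile


def lower_kernel(extent: int, tile: int) -> str:
    # Stand-in for an index-divisibility check: the tile must divide the extent.
    if extent % tile != 0:
        # Suggest the largest smaller tile that divides the extent (1 always does).
        new_tile = next(t for t in range(tile - 1, 0, -1) if extent % t == 0)
        raise RecompileSignal(new_tile)
    return f"kernel(extent={extent}, tile={tile})"


def compile_with_retry(extent: int, initial_tile: int, max_retries: int = 8):
    """Retry lowering with the suggested tile; return (kernel, recompile count)."""
    tile = initial_tile
    recompiles = 0
    for _ in range(max_retries + 1):
        try:
            return lower_kernel(extent, tile), recompiles
        except RecompileSignal as sig:
            tile = sig.suggested_tile
            recompiles += 1
    raise RuntimeError("tile search did not converge")


kernel, recompiles = compile_with_retry(extent=24, initial_tile=16)
print(kernel, recompiles)
```

In this toy model, "0 recompiles" means the first call to `lower_kernel` succeeds and the `except` branch is never taken.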

Validated end-to-end (Spike + TOGSim): elementwise, gemm, bmm, conv2d, group_conv, pool, cat, floor/mod suite, reduce, softmax, layernorm, batchnorm, gqa -- all pass, 0 recompiles.

NOTE: depends on #259 (axis-split). Stacked on top of it; rebase onto develop once #259 merges.

🤖 Generated with Claude Code

axis-split + graph-copy (on by default) linearize aligned floor/mod at the
scheduling layer, so the index reaching get_dma_info is affine and the
FloorDiv/ModularIndexing tile-divisibility branches there are never entered
(measured: 0 entries across elementwise, gemm, bmm, conv, cat, floor/mod,
reduce, attention). Remove those dead branches and their orphans:

  - the FloorDiv and ModularIndexing tile-forcing + RecompileSignal blocks
  - the implicit-ModularIndexing index rewrite and implicit_local_dims
  - the dead ModularIndexing branch in the dram_stride computation
  - is_modular_indexing, the write-only implicit_dim_size, unused import sys

Kept: the non-floor/mod recompile paths (index-divisibility, indirect access,
non-power-of-2 vec size), RecompileSignal, and the retry loop. The upstream
implicit_dim_ops tile-forcing is left untouched (separate change).

Validated end-to-end (Spike + TOGSim): elementwise, gemm, bmm, conv2d,
group_conv, pool, cat, floor/mod suite, reduce, softmax, layernorm, batchnorm,
gqa -- all pass, 0 recompiles.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@YWHyuk YWHyuk changed the base branch from develop to feature/tog-python-binding June 18, 2026 03:42
…-split)

implicit_dim_ops/extract_dividers/apply_constraints forced the initial tile size
to match a view's floor/mod divider, up front in compute_tile_size. axis-split now
linearizes those views at the scheduling layer, so the forcing is redundant:
disabling it leaves every test allclose-correct and, on the affected kernels,
slightly faster (the forced tile was over-constrained -- batchnorm 1189->1114,
layernorm 4092->3947 cycles; non-floor/mod kernels unchanged).
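
To put the cycle counts above in relative terms, this small sketch computes the fractional reduction from the numbers quoted in this commit message. The `relative_reduction` helper is a hypothetical name, not part of the codebase.

```python
def relative_reduction(before: int, after: int) -> float:
    """Fraction of cycles saved going from `before` to `after`."""
    return (before - after) / before


# Cycle counts copied from the commit message (forced tile -> unforced tile).
cycles = {"batchnorm": (1189, 1114), "layernorm": (4092, 3947)}

for name, (before, after) in cycles.items():
    saved = relative_reduction(before, after)
    print(f"{name}: {before} -> {after} cycles, {saved:.1%} fewer")
```

This works out to roughly 6.3% fewer cycles for batchnorm and 3.5% fewer for layernorm.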

Remove the machinery and its now-unused imports (ModularIndexing, FloorDiv, Mod,
MemoryDep, StarDep, WeakDep).

Validated end-to-end (Spike + TOGSim): elementwise, gemm, bmm, conv2d, group_conv,
pool, cat, floor/mod suite, reduce, softmax, layernorm, batchnorm, gqa -- all pass,
0 recompiles.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@YWHyuk YWHyuk merged commit 82d47f9 into feature/tog-python-binding Jun 18, 2026