[DNM][AMD] agentx-v0.4 by seungrokj · Pull Request #1654 · SemiAnalysisAI/InferenceX

seungrokj · 2026-06-03T05:51:36Z

Summary

Add minimaxm2.5-fp4-mi355x-vllm-agentic-lmcache and kimik2.5-fp4-mi355x-vllm-agentic-lmcache entries in amd-master.yaml
Add minimaxm2.5_fp4_mi355x.sh agentic benchmark script with LMCache support
Refactor kimik2.5_fp4_mi355x.sh: simplify env vars, build LMCache from source (ROCm HIP), tune LMCACHE_L1_SIZE_GB/TTL/chunk size
Refactor qwen3.5_fp8_mi355x.sh: add HiCache offloading support, add 256k trace corpus cap via WEKA_LOADER_OVERRIDE
Set LMCACHE_CHUNK_SIZE default to 32 for MiniMax agentic script

🤖 Generated with Claude Code

Note

Medium Risk
Changes how long-running Slurm agentic jobs install KV offload (git-built LMCache, larger DRAM knobs with TODOs) and alters CI sweep matrices; misconfiguration could waste cluster time or skew comparability, but there is no production serving or auth impact.

Overview
Extends AMD master CI with new agentic-coding targets: minimaxm2.5-fp4-mi355x-vllm-agentic-lmcache (TP=1, GPU-only vs lmcache) and kimik2.5-fp4-mi355x-vllm-agentic-lmcache (TP=4, same pair, pinned to vLLM 0.21.0). The existing Qwen3.5 agentic HiCache entry bumps the SGLang image, moves the sweep to TP=4, and aligns hicache concurrency with the none grid instead of a high-conc-only hicache slice.

kimik2.5_fp4_mi355x.sh is largely rewritten: it drops the large inline ROCm LMCache / vLLM monkey-patch bundle (demand pinning, chunked KV load, scheduler KV-xfer fix, NIXL/CuPy guards) in favor of cloning and HIP-building LMCache at run time, scales CPU/L1 pools by TP (3 TB node budget), lengthens L1 read TTL, caps traces via WEKA_LOADER_OVERRIDE, and tunes vLLM (fp8 KV, block size 1, no explicit max-model-len). A new minimaxm2.5_fp4_mi355x.sh follows the same LMCache/CPU-offload pattern with MiniMax-specific server flags (e.g. chunk size 32, ROCM_AITER_FA, block size 32).
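The "scale pools by TP under a 3 TB node budget" idea can be read as simple proportional arithmetic. The sketch below is one plausible reading, not the script's actual formula: the helper name `l1_pool_gb`, the 8-GPU node size, the 3072 GB interpretation of "3 TB", and the proportional split are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a node-wide DRAM budget across instances by TP.
# Assumptions (not taken from the PR): 8 GPUs per node, 3 TB counted as
# 3072 GB, and each instance gets a share proportional to the GPUs it uses.
l1_pool_gb() {
  local tp="$1"
  local node_budget_gb="${2:-3072}"
  local gpus_per_node="${3:-8}"
  # Integer arithmetic: budget * (tp / gpus_per_node), multiplied first
  # so that small TP values do not truncate to zero.
  echo $(( node_budget_gb * tp / gpus_per_node ))
}
```

Under these assumptions, a TP=4 instance would be given `l1_pool_gb 4` (1536 GB) and a TP=1 instance `l1_pool_gb 1` (384 GB); the result could then be exported as `LMCACHE_L1_SIZE_GB`.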

qwen3.5_fp8_mi355x.sh gains an OFFLOADING switch (none vs hicache with sized CPU hierarchical pools, warmup skip, graph cap) and the same 256k trace override, plus stricter required env vars for the agentic matrix.
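A minimal sketch of how such an OFFLOADING switch and required-variable check might be structured. Only `OFFLOADING`, `none`, and `hicache` come from the description above; the function names are hypothetical, and `--enable-hierarchical-cache` is assumed to be the SGLang flag involved, since the PR's actual server flags and pool sizes are not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not the PR's implementation.

# Fail fast when any named variable is unset or empty.
require_env() {
  local name
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      echo "missing required env var: ${name}" >&2
      return 1
    fi
  done
}

# Map an OFFLOADING mode to extra server arguments.
# The hicache flag below is an assumption about what the script passes.
offload_args() {
  case "$1" in
    none)    echo "" ;;
    hicache) echo "--enable-hierarchical-cache" ;;
    *)       echo "unknown OFFLOADING mode: $1" >&2; return 1 ;;
  esac
}
```

A script built this way would call `require_env OFFLOADING` early and append `$(offload_args "$OFFLOADING")` to the server command line, so an unknown mode aborts the job before any cluster time is spent.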

Reviewed by Cursor Bugbot for commit fc0d0d4. Bugbot is set up for automated code reviews on this repo.

github-actions · 2026-06-03T05:51:44Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please open a PR there first; only then can we merge your single-node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. Failures are often just flakes, and re-running the failed jobs will fix them. If failed jobs are re-run, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get PR approval from their respective company's CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

cursor

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Reviewed by Cursor Bugbot for commit dc999ef.

cursor · 2026-06-03T06:08:02Z

+#export WEKA_LOADER_OVERRIDE=semianalysis_cc_traces_weka_with_subagents_256k
+#060226
+export WEKA_LOADER_OVERRIDE=semianalysis_cc_traces_weka_with_subagents_060226_256k
+


Trace override on wrong script

Medium Severity

The 256k WEKA_LOADER_OVERRIDE was added only in qwen3.5_fp8_mi355x.sh, but MI355X SGLang jobs prefer qwen3.5_fp8_mi355x_sglang.sh when it exists. Standard launches therefore skip the new corpus cap the PR describes for Qwen agentic HiCache.
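The lookup behavior the reviewer describes (prefer a framework-suffixed script when it exists, otherwise fall back to the base script) can be sketched as follows. The function name `pick_script` and its argument layout are hypothetical; the repository's real launcher logic is not shown in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical resolver: prefer "<base>_<framework>.sh" when present,
# otherwise fall back to "<base>.sh".
pick_script() {
  local dir="$1" base="$2" framework="$3"
  if [ -f "${dir}/${base}_${framework}.sh" ]; then
    echo "${dir}/${base}_${framework}.sh"
  else
    echo "${dir}/${base}.sh"
  fi
}
```

With a resolver like this, an override exported only in `qwen3.5_fp8_mi355x.sh` never takes effect once `qwen3.5_fp8_mi355x_sglang.sh` is present, which is the failure mode flagged above.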

cursor · 2026-06-03T06:08:02Z

+fi
+
+export VLLM_ROCM_USE_AITER=1
+export VLLM_ROCM_QUICK_REDUCE_QUANTIZATION=INT4


Missing AITER RMSNorm disable

Medium Severity

The new MiniMax agentic script enables AITER but never disables AITER RMSNorm for TP below 8. The matrix runs tp: 1, and the Kimi MI355X agentic script explicitly turns off VLLM_ROCM_USE_AITER_RMSNORM for accuracy at low TP.
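A minimal sketch of the guard the reviewer is asking for, assuming the tensor-parallel size is available as a plain integer and that setting `VLLM_ROCM_USE_AITER_RMSNORM=0` is how the Kimi script disables the kernel. The function name `configure_aiter` is hypothetical, and the threshold of 8 is taken from the comment above rather than from the script itself.

```shell
#!/usr/bin/env bash
# Hypothetical guard: enable AITER, but turn off AITER RMSNorm below TP=8.
configure_aiter() {
  local tp="$1"
  export VLLM_ROCM_USE_AITER=1
  if [ "${tp}" -lt 8 ]; then
    # Assumed value; the comment above only says the Kimi script "turns off"
    # this variable at low TP.
    export VLLM_ROCM_USE_AITER_RMSNORM=0
  fi
}
```

Calling `configure_aiter 1` before launching the server would cover the `tp: 1` matrix entry mentioned in the review.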

seungrokj and others added 2 commits June 3, 2026 14:49

[AMD] agentx-v0.4: add MiniMax/Kimi lmcache agentic entries, refactor…

7a86001

… Kimi/Qwen scripts Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

[AMD] minimaxm2.5 agentic: change LMCACHE_CHUNK_SIZE default from 256…

19d1ca5

… to 32 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

seungrokj requested review from 1am9trash, billishyahao, chunfangamd and yctseng0211 as code owners June 3, 2026 05:51

github-project-automation Bot added this to InferenceMAX Board Jun 3, 2026

cursor Bot reviewed Jun 3, 2026

Comment thread .github/configs/amd-master.yaml Outdated

Comment thread .github/configs/amd-master.yaml Outdated

Comment thread benchmarks/single_node/agentic/qwen3.5_fp8_mi355x.sh

[AMD] agentx-v0.4: add MODEL_PATH support and --served-model-name for…

d184078

… Kimi/MiniMax scripts Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

claude Bot reviewed Jun 3, 2026

Comment thread .github/configs/amd-master.yaml Outdated

Comment thread benchmarks/single_node/agentic/kimik2.5_fp4_mi355x.sh

cursor Bot reviewed Jun 3, 2026

Comment thread benchmarks/single_node/agentic/minimaxm2.5_fp4_mi355x.sh

Comment thread benchmarks/single_node/agentic/minimaxm2.5_fp4_mi355x.sh

[AMD] kimik2.5-fp4-mi355x-vllm-agentic-lmcache: fix config, use v0.21…

d2b2826

….0, expand conc list Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor Bot reviewed Jun 3, 2026

Comment thread benchmarks/single_node/agentic/kimik2.5_fp4_mi355x.sh

[AMD] agentx-v0.4: fix configs for MiniMax/Kimi/Qwen agentic entries

dc999ef

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor Bot reviewed Jun 3, 2026

[AMD] qwen3.5-fp8-mi355x-sglang-agentic-hicache: fix runner to mi355x

fc0d0d4

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

seungrokj wants to merge 6 commits into chore/agentx-v0.4 from amd/agentx-v0.4
