[DNM][AMD] agentx-v0.4 #1654

Open
seungrokj wants to merge 6 commits into chore/agentx-v0.4 from amd/agentx-v0.4

seungrokj (Collaborator) commented Jun 3, 2026

Summary

  • Add minimaxm2.5-fp4-mi355x-vllm-agentic-lmcache and kimik2.5-fp4-mi355x-vllm-agentic-lmcache entries in amd-master.yaml
  • Add minimaxm2.5_fp4_mi355x.sh agentic benchmark script with LMCache support
  • Refactor kimik2.5_fp4_mi355x.sh: simplify env vars, build LMCache from source (ROCm HIP), tune LMCACHE_L1_SIZE_GB/TTL/chunk size
  • Refactor qwen3.5_fp8_mi355x.sh: add HiCache offloading support, add 256k trace corpus cap via WEKA_LOADER_OVERRIDE
  • Set LMCACHE_CHUNK_SIZE default to 32 for MiniMax agentic script

🤖 Generated with Claude Code


Note

Medium Risk
Changes how long-running Slurm agentic jobs install KV offload (git-built LMCache, larger DRAM knobs with TODOs) and alters CI sweep matrices; misconfiguration could waste cluster time or skew comparability, but there is no production serving or auth impact.

Overview
Extends AMD master CI with new agentic-coding targets: minimaxm2.5-fp4-mi355x-vllm-agentic-lmcache (TP=1, GPU-only vs lmcache) and kimik2.5-fp4-mi355x-vllm-agentic-lmcache (TP=4, same pair, pinned to vLLM 0.21.0). The existing Qwen3.5 agentic HiCache entry bumps the SGLang image, moves the sweep to TP=4, and aligns hicache concurrency with the none grid instead of a high-conc-only hicache slice.

kimik2.5_fp4_mi355x.sh is largely rewritten: it drops the large inline ROCm LMCache / vLLM monkey-patch bundle (demand pinning, chunked KV load, scheduler KV-xfer fix, NIXL/CuPy guards) in favor of cloning and HIP-building LMCache at run time, scales CPU/L1 pools by TP (3 TB node budget), lengthens L1 read TTL, caps traces via WEKA_LOADER_OVERRIDE, and tunes vLLM (fp8 KV, block size 1, no explicit max-model-len). A new minimaxm2.5_fp4_mi355x.sh follows the same LMCache/CPU-offload pattern with MiniMax-specific server flags (e.g. chunk size 32, ROCM_AITER_FA, block size 32).
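
To make the TP-based pool scaling concrete, here is a minimal sketch. It assumes the 3 TB node budget is split evenly across 8 GPUs and that a job receives the share of the GPUs it occupies. The helper name, the 8-GPU node size, and the even split are all assumptions; this conversation does not show the formula the script actually uses. Only `LMCACHE_L1_SIZE_GB` is a name taken from the PR summary.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from kimik2.5_fp4_mi355x.sh.
# Prints an L1 pool size in GB proportional to the tensor-parallel size.
l1_size_gb_for_tp() {
  tp="$1"
  node_budget_gb="${2:-3000}"  # assumption: 3 TB node budget expressed as 3000 GB
  gpus_per_node="${3:-8}"      # assumption: 8 GPUs per node
  echo $(( node_budget_gb * tp / gpus_per_node ))
}

# Illustrative usage: TP would normally come from the benchmark matrix.
TP="${TP:-4}"
export LMCACHE_L1_SIZE_GB="$(l1_size_gb_for_tp "$TP")"
echo "TP=${TP} LMCACHE_L1_SIZE_GB=${LMCACHE_L1_SIZE_GB}"
```

Scaling by TP in this way would keep several smaller jobs on one node from oversubscribing host DRAM, which may be the motivation for the change described above.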

qwen3.5_fp8_mi355x.sh gains an OFFLOADING switch (none vs hicache with sized CPU hierarchical pools, warmup skip, graph cap) and the same 256k trace override, plus stricter required env vars for the agentic matrix.
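
A switch like the one described could look roughly like the sketch below. `OFFLOADING` and its two values come from the overview; the helper name is hypothetical, and `--enable-hierarchical-cache` is assumed to be the SGLang flag involved. The pool sizing, warmup skip, and graph cap mentioned above are not reproduced here because their exact flags are not shown in this conversation.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; OFFLOADING is the only name taken from the PR.
# Prints the extra server arguments for a given offloading mode.
offload_args() {
  case "$1" in
    none)    echo "" ;;                               # GPU-only baseline
    hicache) echo "--enable-hierarchical-cache" ;;    # assumed SGLang HiCache flag
    *)       echo "unsupported OFFLOADING value: $1" >&2; return 1 ;;
  esac
}

# Illustrative usage (not executed here):
#   EXTRA_ARGS="$(offload_args "$OFFLOADING")" || exit 1
```

Rejecting unknown values up front fits the "stricter required env vars" direction: a typo in the matrix fails fast instead of silently running the baseline.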

Reviewed by Cursor Bugbot for commit fc0d0d4.

seungrokj and others added 2 commits June 3, 2026 14:49
… Kimi/Qwen scripts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… to 32

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
github-actions (Bot, Contributor) commented Jun 3, 2026

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Comment thread .github/configs/amd-master.yaml Outdated
Comment thread .github/configs/amd-master.yaml Outdated
Comment thread benchmarks/single_node/agentic/qwen3.5_fp8_mi355x.sh
… Kimi/MiniMax scripts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment thread .github/configs/amd-master.yaml Outdated
Comment thread benchmarks/single_node/agentic/kimik2.5_fp4_mi355x.sh
Comment thread benchmarks/single_node/agentic/minimaxm2.5_fp4_mi355x.sh
Comment thread benchmarks/single_node/agentic/minimaxm2.5_fp4_mi355x.sh
….0, expand conc list

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment thread benchmarks/single_node/agentic/kimik2.5_fp4_mi355x.sh
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Reviewed by Cursor Bugbot for commit dc999ef.

Comment thread .github/configs/amd-master.yaml Outdated
#export WEKA_LOADER_OVERRIDE=semianalysis_cc_traces_weka_with_subagents_256k
#060226
export WEKA_LOADER_OVERRIDE=semianalysis_cc_traces_weka_with_subagents_060226_256k


Trace override on wrong script

Medium Severity

The 256k WEKA_LOADER_OVERRIDE was added only in qwen3.5_fp8_mi355x.sh, but MI355X SGLang jobs prefer qwen3.5_fp8_mi355x_sglang.sh when it exists. Standard launches therefore skip the new corpus cap the PR describes for Qwen agentic HiCache.
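
The selection behaviour Bugbot describes can be sketched as follows. This is an illustration of the stated preference rule only: the function name and its arguments are hypothetical, and the real launcher logic is not shown in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical resolver mirroring the behaviour described in the comment:
# prefer <base>_<framework>.sh when it exists, otherwise fall back to <base>.sh.
resolve_benchmark_script() {
  dir="$1"; base="$2"; framework="$3"
  candidate="$dir/${base}_${framework}.sh"
  if [ -f "$candidate" ]; then
    echo "$candidate"
  else
    echo "$dir/${base}.sh"
  fi
}

# Illustrative usage (not executed here):
#   resolve_benchmark_script benchmarks/single_node/agentic qwen3.5_fp8_mi355x sglang
```

Under this rule, an export added only to the fallback script never takes effect while the framework-specific script is present, which is the gap the comment points out.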

fi

export VLLM_ROCM_USE_AITER=1
export VLLM_ROCM_QUICK_REDUCE_QUANTIZATION=INT4

Missing AITER RMSNorm disable

Medium Severity

The new MiniMax agentic script enables AITER but never disables AITER RMSNorm for TP below 8. The matrix runs tp: 1, and the Kimi MI355X agentic script explicitly turns off VLLM_ROCM_USE_AITER_RMSNORM for accuracy at low TP.
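
One possible shape for the guard, as a minimal sketch: the threshold of 8 and the two environment variable names come from the comment above, while the helper name is hypothetical and the exact form used in the Kimi script is not shown in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose the AITER RMSNorm setting from the TP size.
# Prints 0 (disabled) for TP below 8 and 1 (enabled) otherwise.
aiter_rmsnorm_for_tp() {
  tp="$1"
  if [ "$tp" -lt 8 ]; then
    echo 0
  else
    echo 1
  fi
}

# Illustrative usage: TP would normally come from the benchmark matrix.
TP="${TP:-1}"
export VLLM_ROCM_USE_AITER=1
export VLLM_ROCM_USE_AITER_RMSNORM="$(aiter_rmsnorm_for_tp "$TP")"
```

With the matrix running `tp: 1`, this would leave AITER enabled overall but turn its RMSNorm path off, matching what the comment says the Kimi script does at low TP.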

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>