[NV] Add MiniMax-M2.5 FP8 GB200 Dynamo vLLM recipes #1648
jasonlizhengjian wants to merge 3 commits into
Thanks for the contribution! For vLLM and SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook. If they are not, please create a PR there first, before we can merge your single-node PR into the master branch. Let's make the documentation first class so that the entire ML community can benefit from your hard work. Thank you!

PR authors are responsible for ensuring that, after merging, all GitHub Actions jobs fully pass. Failures are often just flakes, and simply re-running the failed jobs will fix them. If you re-run failed jobs, you are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get an approval from the respective company's CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 90e5193. Configure here.
```yaml
# model:
#   path: "minimax-m2.5-fp8"
#   container: "v0.18.1"
#   precision: "fp8"

# dynamo:
#   version: 1.0.1
#   install: true

model:
  path: "minimax-m2.5-fp8"
  container: "vllm/vllm-openai:v0.20.1"
  precision: "fp8"

dynamo:
  install: true
  wheel: "1.2.0.dev20260526"
```
🟡 Lines 3-10 of disagg-gb200-2p1d-dep8.yaml contain commented-out model: and dynamo: blocks referencing a stale container (v0.18.1) and dynamo (version: 1.0.1), immediately superseded by the real config on lines 12-19. There are also extra blank lines (20-21) and commented-out benchmark fields (warmup_prompts, use_chat_template, req_rate, random_range_ratio) at the bottom. None of the sibling recipes in this directory carry this clutter — please remove for consistency.
Extended reasoning...
This is a dead-configuration cleanup issue in a single recipe file. The file `benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-gb200-fp8/1k1k/disagg-gb200-2p1d-dep8.yaml` was added new in this PR, and it carries leftover commented-out blocks from an earlier iteration that the other eight sibling recipes (1p1d-tp4, 1p2d-tp4, 1p3d-tp4ep, 1p4d-dep2, 2p3d-dep4, and the three 8k1k variants) do not have.

What the bug is: Lines 3-10 of the file contain a commented-out `model:` block (`container: "v0.18.1"`) and `dynamo:` block (`version: 1.0.1`, no `wheel:` field), which look like an earlier draft of the schema. These are immediately followed on lines 12-19 by the real, current config (`container: "vllm/vllm-openai:v0.20.1"`, `wheel: "1.2.0.dev20260526"`). Lines 20-21 are two consecutive blank lines (every other recipe in the directory has a single blank-line separator). At the bottom, there are four commented-out `benchmark:` options (`# warmup_prompts: 1`, `# use_chat_template: false`, `# req_rate: "inf"`, `# random_range_ratio: 1.0`), which are also not present in the sibling recipes.

Step-by-step proof:

1. Diff the new file against any sibling, e.g. `diff disagg-gb200-2p1d-dep8.yaml disagg-gb200-2p3d-dep4.yaml`.
2. The first hunk shows that 2p1d-dep8.yaml has lines 3-10 (commented-out `model:`/`dynamo:` blocks with the stale v0.18.1 container reference) that 2p3d-dep4.yaml does not.
3. The second hunk shows an extra blank line between the `dynamo:` block and `setup_script:`.
4. The third hunk shows the four trailing `# warmup_prompts: …` / `# use_chat_template: …` / `# req_rate: …` / `# random_range_ratio: …` lines at the bottom that no other recipe carries.

Impact / why it doesn't affect functionality: YAML parsers strip comments, so the launcher and `srtctl` will read exactly the same fields as the other recipes. The container version that will actually be used is `vllm/vllm-openai:v0.20.1` (correct, and it matches `.github/configs/nvidia-master.yaml` and the other recipes), not the stale v0.18.1. So this is purely cosmetic: no behavior change.

Addressing the refutation: I agree with the refuting verifier that this has zero functional impact. I'm filing this as a nit rather than a normal bug because it is a style/cleanup issue, and I'm raising it at all only because the stale v0.18.1 container reference inside the dead comment could confuse a future maintainer trying to bisect or audit container versions across recipes. The fix is trivial: delete lines 3-10, the duplicate blank line, and the four trailing commented benchmark fields so this file matches the shape of its eight siblings. Feel free to ignore this if you'd rather keep the file as-is; it won't break anything.

Suggested fix: Remove lines 3-10 (the commented `model:`/`dynamo:` blocks), collapse lines 20-21 to a single blank line, and remove the trailing `# warmup_prompts: 1` / `# use_chat_template: false` / `# req_rate: "inf"` / `# random_range_ratio: 1.0` lines at the bottom.
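To catch this kind of leftover clutter before review, a small lint pass could be run over each recipe. The sketch below is hypothetical: `check_recipe_clutter` is not part of the repo, and it assumes a commented-out key looks like `# key:` (optionally indented). That heuristic will also flag prose comments such as `# Note:`, so hits are prompts to look rather than hard failures.

```bash
#!/usr/bin/env bash
# Hypothetical lint helper (not part of the repo): report leftover clutter in one recipe YAML.
# Returns 0 when the file looks clean, 1 when anything was flagged.
check_recipe_clutter() {
  local file=$1 status=0
  # Heuristic: a commented-out key looks like "# key:" (possibly indented).
  # This also matches prose comments such as "# Note:", so review the hits.
  if grep -HnE '^[[:space:]]*#[[:space:]]*[A-Za-z_][A-Za-z0-9_-]*:' "$file"; then
    status=1
  fi
  # Two or more consecutive blank (or whitespace-only) lines; report each run once.
  if awk 'NF == 0 { if (++run == 2) { print FILENAME ":" NR ": consecutive blank lines"; found = 1 }; next }
          { run = 0 }
          END { exit !found }' "$file"; then
    status=1
  fi
  return "$status"
}
```

Run over the whole recipe tree, e.g. `for f in benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-gb200-fp8/*/*.yaml; do check_recipe_clutter "$f"; done`, it would print each offending line prefixed with its file name.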
```bash
elif [[ $FRAMEWORK == "dynamo-vllm" && $MODEL_PREFIX == "minimaxm2.5" && $PRECISION == "fp8" ]]; then
    git clone https://github.com/NVIDIA/srt-slurm.git "$SRT_REPO_DIR" || exit 1
    cd "$SRT_REPO_DIR" || exit 1
    git checkout main || exit 1
    mkdir -p recipes/vllm/minimax-m2.5-gb200-fp8 || exit 1
    cp -rT "$GITHUB_WORKSPACE/benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-gb200-fp8" recipes/vllm/minimax-m2.5-gb200-fp8 || exit 1
```
🟡 The new minimaxm2.5 path does git checkout main on NVIDIA/srt-slurm.git, while every other branch in this same if/elif chain pins to a stable named ref (e.g. aflowers/vllm-gb200-v0.20.0, sa-submission-q2-2026, cam/sa-submission-q2-2026). Tracking main makes this recipe non-reproducible — any upstream commit to srt-slurm can silently change benchmark behavior at the next CI run. Please pin to a specific branch (e.g. sa-submission-q2-2026 like the other dynamo-vllm paths) or a commit SHA.
Extended reasoning...
What the bug is
In `runners/launch_gb200-nv.sh` the newly added minimaxm2.5 branch clones https://github.com/NVIDIA/srt-slurm.git and then runs `git checkout main`:

```bash
elif [[ $FRAMEWORK == "dynamo-vllm" && $MODEL_PREFIX == "minimaxm2.5" && $PRECISION == "fp8" ]]; then
    git clone https://github.com/NVIDIA/srt-slurm.git "$SRT_REPO_DIR" || exit 1
    cd "$SRT_REPO_DIR" || exit 1
    git checkout main || exit 1
    mkdir -p recipes/vllm/minimax-m2.5-gb200-fp8 || exit 1
    cp -rT "$GITHUB_WORKSPACE/benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-gb200-fp8" recipes/vllm/minimax-m2.5-gb200-fp8 || exit 1
```

Why this is inconsistent with the rest of the file
Every other branch of this same if/elif chain pins to a stable named ref:

- `IS_AGENTIC=1` → `cam/sa-submission-q2-2026` (`--single-branch`)
- dynamo-vllm + dsv4 → `aflowers/vllm-gb200-v0.20.0`
- dynamo-sglang + dsv4 → `sa-submission-q2-2026`
- generic dynamo-vllm fallback → `sa-submission-q2-2026`
- dynamo-trt + kimik2.5 → `sa-submission-q2-2026`
- default fallback → `cam/sa-submission-q2-2026`
The new minimaxm2.5 branch is the only one tracking main.
Why this matters (the impact)
`main` of NVIDIA/srt-slurm is the active development branch, so any upstream commit can silently change behavior of this benchmark between identical reruns:

- Recipe layout under `recipes/vllm/...` could move, breaking the `cp -rT` overlay (which writes into `recipes/vllm/minimax-m2.5-gb200-fp8`).
- `srtctl` CLI flags or YAML schema could change, breaking `srtctl apply -f $CONFIG_PATH --tags ...` and the `name:` rewrite via `sed`.
- The container alias (e.g. `dynamo-vllm` → image) that `srtctl` resolves from `srtslurm.yaml` could be renamed.
- `make setup ARCH=aarch64` targets could change.
None of these would surface in PR CI — they would only appear later as a silent regression in published benchmark numbers, with no diff in this repo to explain them. That defeats the entire point of having a perf-changelog entry tied to a specific PR.
Step-by-step proof this can manifest
- PR #1648 ([NV] Add MiniMax-M2.5 FP8 GB200 Dynamo vLLM recipes) merges. Today, `NVIDIA/srt-slurm@main` happens to expose `srtctl apply -f <file> --tags ...`.
- Tomorrow, upstream renames the flag to `--label` and force-merges to `main`.
- The next scheduled CI run for `minimaxm2.5-fp8-gb200-dynamo-vllm` clones `main`, picks up the rename, and `srtctl apply ... --tags ...` exits non-zero. The benchmark fails with no code change on our side.
Alternatively, in a subtler scenario:
- Upstream changes the default value of an srtctl knob (timeout, retry, batch policy).
- CI still passes, but the published throughput numbers drift, and the recorded perf-changelog entry no longer corresponds to a reproducible configuration.
The dsv4 branch right above explicitly comments that it pins to aflowers/vllm-gb200-v0.20.0 because it expects a specific srtctl + container-alias surface. The minimaxm2.5 branch needs the same treatment.
How to fix
Replace `git checkout main` with the same branch the other dynamo-vllm paths use (e.g. `sa-submission-q2-2026`), or pin to a specific SHA if the recipe needs srt-slurm changes that haven't landed in that branch yet. Matching the sibling pattern is the minimal change:

```bash
git checkout sa-submission-q2-2026
```

(or `git clone --branch <ref> --single-branch ...` to make the pin explicit at clone time, matching the `IS_AGENTIC` / default branches).
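One way to make the pin explicit, and to refuse `main` outright, is a small wrapper around `git clone`. This is a hypothetical sketch rather than existing launcher code: `clone_pinned` is an invented name, and it only treats full 40-character SHA-1 commit IDs as SHAs, because `git clone --branch` accepts branch and tag names but not commit hashes.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not in launch_gb200-nv.sh): clone a repo at an explicit ref
# and refuse to track main. Accepts a branch/tag name or a full 40-char SHA-1.
clone_pinned() {
  local url=$1 dest=$2 ref=$3
  if [[ -z "$ref" || "$ref" == "main" ]]; then
    echo "clone_pinned: refusing unpinned ref '${ref}'" >&2
    return 1
  fi
  if [[ "$ref" =~ ^[0-9a-f]{40}$ ]]; then
    # --branch does not take commit SHAs: clone, then detach HEAD at the commit.
    git clone -q "$url" "$dest" && git -C "$dest" checkout -q --detach "$ref"
  else
    # Branch or tag: pin at clone time and skip fetching other branches.
    git clone -q --branch "$ref" --single-branch "$url" "$dest"
  fi
}
```

In the minimaxm2.5 branch this would read something like `clone_pinned https://github.com/NVIDIA/srt-slurm.git "$SRT_REPO_DIR" sa-submission-q2-2026 || exit 1`. Pinning to a SHA trades convenience for exact reproducibility; a named branch keeps parity with the sibling paths.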
```yaml
name: "minimax-m2.5-vllm-disagg-gb200-1p1d-tp4ep"
```
🟡 Nit: the name field here is minimax-m2.5-vllm-disagg-gb200-1p1d-tp4ep, missing the 8k1k- segment that the other two 8k1k recipes in this directory use (...-8k1k-1p1d-tp4 and ...-8k1k-3p2d-dep4). No runtime impact — launch_gb200-nv.sh sed-overwrites this with ${RUNNER_NAME} before submission — but worth aligning with its siblings for consistency.
Extended reasoning...
What's inconsistent
Within benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-gb200-fp8/8k1k/, the three new recipe files use these name: values:
- `disagg-gb200-1p1d-tp4.yaml` → `minimax-m2.5-vllm-disagg-gb200-8k1k-1p1d-tp4`
- `disagg-gb200-1p1d-tp4ep.yaml` → `minimax-m2.5-vllm-disagg-gb200-1p1d-tp4ep` ← missing `8k1k-`
- `disagg-gb200-3p2d-dep4.yaml` → `minimax-m2.5-vllm-disagg-gb200-8k1k-3p2d-dep4`
The two other 8k1k files encode the ISL/OSL bucket; this one doesn't.
Why it's a nit, not a bug
runners/launch_gb200-nv.sh runs sed -i "s/^name:.*/name: \"${RUNNER_NAME}\"/" "$CONFIG_PATH" before srtctl apply, so the in-file name is replaced at runtime and never reaches tags, labels, or logs as written. The refutation correctly points out that the 1k1k siblings also omit the bucket from their names (e.g. minimax-m2.5-vllm-disagg-gb200-1p1d-tp4), so there isn't a single repo-wide convention — the inconsistency is purely local to the 8k1k directory.
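The runtime rewrite described above can be reproduced on a scratch copy to confirm that the in-file `name:` never survives. A minimal sketch, assuming GNU `sed` (the bare `-i` form is GNU-specific) and a runner name without `/` characters, which would break the `s///` expression; `rewrite_recipe_name` is a hypothetical wrapper around the same expression the launcher uses, taking the runner name as an argument instead of reading `${RUNNER_NAME}`.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the name rewrite quoted from runners/launch_gb200-nv.sh.
# Assumes GNU sed (bare -i) and a runner name containing no '/' characters.
rewrite_recipe_name() {
  local config_path=$1 runner_name=$2
  # Replace every line that starts with "name:" (the recipe's top-level name);
  # indented keys and all other lines are left untouched.
  sed -i "s/^name:.*/name: \"${runner_name}\"/" "$config_path"
}
```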
Step-by-step
- Open `8k1k/disagg-gb200-1p1d-tp4.yaml` line 1 → name contains `-8k1k-`.
- Open `8k1k/disagg-gb200-1p1d-tp4ep.yaml` line 1 → name omits `-8k1k-`.
- Open `8k1k/disagg-gb200-3p2d-dep4.yaml` line 1 → name contains `-8k1k-`.
- At runtime, `launch_gb200-nv.sh` line ~325 unconditionally overwrites `name:` with `$RUNNER_NAME`, so no log/tag/output uses the in-file value.
Suggested fix
Change the file's first line to:

```yaml
name: "minimax-m2.5-vllm-disagg-gb200-8k1k-1p1d-tp4ep"
```

This matches its two 8k1k siblings and removes a small source of confusion for anyone grepping recipe files directly. Not a blocker.
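If the team wants to hold the 8k1k directory to its local convention, a check along these lines could run in CI. It is a hypothetical sketch (`check_bucket_names` does not exist in the repo) that takes the bucket from the directory name. Since the 1k1k recipes omit the bucket from their names, it only makes sense pointed at `8k1k/`.

```bash
#!/usr/bin/env bash
# Hypothetical check (not in the repo): every recipe in a bucket directory such as
# .../8k1k/ should mention that bucket in its top-level name: line.
check_bucket_names() {
  local dir=$1 bucket f line status=0
  bucket=$(basename "$dir")
  for f in "$dir"/*.yaml; do
    [[ -e "$f" ]] || continue          # the glob matched nothing
    line=$(grep -m 1 '^name:' "$f")     # first top-level name: line, if any
    if [[ "$line" != *"-${bucket}-"* ]]; then
      echo "${f}: '${line}' lacks -${bucket}-" >&2
      status=1
    fi
  done
  return "$status"
}
```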
```yaml
    - minimaxm2.5-fp8-gb200-dynamo-vllm
  description:
    - "Add MiniMax-M2.5 FP8 GB200 disaggregated multinode vLLM benchmarks via Dynamo"
    - "Add 1k1k/8k1k FP8 recipe set under benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-gb200-fp8/"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX
```
🟡 The new entry in perf-changelog.yaml uses a placeholder pr-link 'https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX' that was never filled in. Please replace XXXX with this PR's number (1648) so the changelog entry links back to the actual PR like every other entry in the file.
Extended reasoning...
### Bug

Line 3404 of `perf-changelog.yaml` contains a literal placeholder:

```yaml
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX
```

The `XXXX` was clearly copied from the changelog entry template and never replaced with the actual PR number. The actual PR number is 1648 (per the PR metadata for this change).

### Why this is wrong

Every other entry in `perf-changelog.yaml` uses a real, resolvable PR URL. A quick survey of the entries immediately preceding this one shows real PR numbers, e.g.:

- line 3373: `.../pull/1626`
- line 3379: `.../pull/1630`
- line 3385: `.../pull/1631`
- line 3391: `.../pull/1588`
- line 3397: `.../pull/1627`

The new entry breaks that pattern with an unfilled template token.

### Impact

This is a documentation/metadata issue rather than a runtime correctness bug: none of the benchmark recipes or shell scripts read `pr-link` at execution time. However, the changelog is the canonical place users (and tooling) look up the provenance of a perf change, so a dangling `pull/XXXX` link:

- 404s if anyone clicks it
- defeats automated tooling that may follow `pr-link` URLs (changelog generators, release notes, etc.)
- silently propagates the template default if not caught now

### Step-by-step proof

1. Open `perf-changelog.yaml` and scroll to the bottom (lines 3399–3404):
   ```yaml
   - config-keys:
       - minimaxm2.5-fp8-gb200-dynamo-vllm
     description:
       - "Add MiniMax-M2.5 FP8 GB200 disaggregated multinode vLLM benchmarks via Dynamo"
       - "Add 1k1k/8k1k FP8 recipe set under benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-gb200-fp8/"
     pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX
   ```
2. Note the literal `XXXX` placeholder.
3. The PR title is "[NV] Add MiniMax-M2.5 FP8 GB200 Dynamo vLLM recipes" and its number is 1648, so this entry should read:
   ```yaml
   pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1648
   ```

### Fix

Replace `XXXX` with `1648` on line 3404.
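A guard like the following could keep template placeholders out of the changelog. It is a hypothetical sketch, not an existing CI step: `check_pr_links` flags any `pr-link:` line whose URL does not end in `/pull/<number>`, which catches `pull/XXXX` as well as other unfilled values.

```bash
#!/usr/bin/env bash
# Hypothetical pre-merge guard (not an existing CI step): fail if any pr-link in the
# changelog does not end in a numeric PR id, e.g. the template's pull/XXXX.
check_pr_links() {
  local changelog=$1 bad
  # List pr-link lines (with line numbers), then keep only those lacking /pull/<digits>.
  bad=$(grep -nE '^[[:space:]]*pr-link:' "$changelog" | grep -vE '/pull/[0-9]+[[:space:]]*$')
  if [[ -n "$bad" ]]; then
    printf 'unresolved pr-link:\n%s\n' "$bad" >&2
    return 1
  fi
  return 0
}
```

Running `check_pr_links perf-changelog.yaml` on this branch would report the line-3404 entry and exit non-zero until `XXXX` is replaced.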
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26849259449
Summary
Add GB200 MiniMax-M2.5 FP8 Dynamo vLLM recipes.
Note
Medium Risk
Launcher changes affect job submission paths and shared filesystem layout for minimaxm2.5 on GB200; misconfigured paths could break sweeps, but there is no production app or auth logic in scope.
Overview
Adds MiniMax-M2.5 FP8 on GB200 as a new `minimaxm2.5-fp8-gb200-dynamo-vllm` benchmark in `nvidia-master.yaml`: multinode disaggregated Dynamo+vLLM (v0.20.1) with fixed-seq-len sweeps at 1k/1k and 8k/1k, each search-space point wiring prefill/decode worker counts, TP/EP, and a `CONFIG_FILE` under the new recipe tree.

Introduces nine Slurm recipe YAMLs under `benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-gb200-fp8/` (Nixl KV transfer, FP8 KV cache, varying 1P/2P × 1D–4D topologies and concurrency lists).

Updates `runners/launch_gb200-nv.sh` for this model: Lustre model path, clone NVIDIA/srt-slurm `main` with recipe overlay, and watchtower-specific shared-FS staging for `srt-slurm`, `INFMAX_WORKSPACE`, `/usr/bin/python3` venv, plus stricter `CONFIG_FILE`/`make setup` error handling. Documents the config in `perf-changelog.yaml`.

Reviewed by Cursor Bugbot for commit f601cfe. Bugbot is set up for automated code reviews on this repo. Configure here.