[NV] Add MiniMax-M2.5 FP8 GB200 Dynamo vLLM recipes by jasonlizhengjian · Pull Request #1648 · SemiAnalysisAI/InferenceX

jasonlizhengjian · 2026-06-02T21:10:28Z

Summary

Add GB200 MiniMax-M2.5 FP8 Dynamo vLLM recipes.

Note

Medium Risk
Launcher changes affect job submission paths and shared filesystem layout for minimaxm2.5 on GB200; misconfigured paths could break sweeps, but there is no production app or auth logic in scope.

Overview
Adds MiniMax-M2.5 FP8 on GB200 as a new minimaxm2.5-fp8-gb200-dynamo-vllm benchmark in nvidia-master.yaml: multinode disaggregated Dynamo+vLLM (v0.20.1) with fixed-seq-len sweeps at 1k/1k and 8k/1k, each search-space point wiring prefill/decode worker counts, TP/EP, and a CONFIG_FILE under the new recipe tree.

Introduces nine Slurm recipe YAMLs under benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-gb200-fp8/ (Nixl KV transfer, FP8 KV cache, varying 1P/2P × 1D–4D topologies and concurrency lists).

Updates runners/launch_gb200-nv.sh for this model: Lustre model path, clone NVIDIA/srt-slurm main with recipe overlay, and watchtower-specific shared-FS staging for srt-slurm, INFMAX_WORKSPACE, /usr/bin/python3 venv, plus stricter CONFIG_FILE / make setup error handling. Documents the config in perf-changelog.yaml.

Reviewed by Cursor Bugbot for commit f601cfe. Bugbot is set up for automated code reviews on this repo.

github-actions · 2026-06-02T21:10:38Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 90e5193.

claude · 2026-06-02T21:16:53Z

+# model:
+#   path: "minimax-m2.5-fp8"
+#   container: "v0.18.1"
+#   precision: "fp8"
+
+# dynamo:
+#   version: 1.0.1
+#   install: true
+
+model:
+  path: "minimax-m2.5-fp8"
+  container: "vllm/vllm-openai:v0.20.1"
+  precision: "fp8"
+
+dynamo:
+  install: true
+  wheel: "1.2.0.dev20260526"
+
+


🟡 Lines 3-10 of disagg-gb200-2p1d-dep8.yaml contain commented-out model: and dynamo: blocks referencing a stale container (v0.18.1) and dynamo (version: 1.0.1), immediately superseded by the real config on lines 12-19. There are also extra blank lines (20-21) and commented-out benchmark fields (warmup_prompts, use_chat_template, req_rate, random_range_ratio) at the bottom. None of the sibling recipes in this directory carry this clutter — please remove for consistency.

Extended reasoning...

This is a dead-configuration cleanup issue in a single recipe file. The file benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-gb200-fp8/1k1k/disagg-gb200-2p1d-dep8.yaml is new in this PR, and it carries leftover commented-out blocks from an earlier iteration that the other eight sibling recipes (1p1d-tp4, 1p2d-tp4, 1p3d-tp4ep, 1p4d-dep2, 2p3d-dep4, and the three 8k1k variants) do not have.

What the bug is: Lines 3-10 of the file contain a commented-out model: block (container: "v0.18.1") and dynamo: block (version: 1.0.1, no wheel: field) which look like an earlier draft of the schema. These are immediately followed on lines 12-19 by the real, current config (container: "vllm/vllm-openai:v0.20.1", wheel: "1.2.0.dev20260526"). Lines 20-21 are two consecutive blank lines (every other recipe in the directory has a single blank line separator). At the bottom, there are four commented-out benchmark: options (# warmup_prompts: 1, # use_chat_template: false, # req_rate: "inf", # random_range_ratio: 1.0), also not present in the sibling recipes.

Step-by-step proof:
1. Diff the new file against any sibling, e.g. diff disagg-gb200-2p1d-dep8.yaml disagg-gb200-2p3d-dep4.yaml.
2. The first hunk shows that 2p1d-dep8.yaml has lines 3-10 (commented-out model:/dynamo: blocks with the stale v0.18.1 container reference) that 2p3d-dep4.yaml does not.
3. The second hunk shows an extra blank line between the dynamo: block and setup_script:.
4. The third hunk shows the four trailing # warmup_prompts: … / # use_chat_template: … / # req_rate: … / # random_range_ratio: … lines at the bottom that no other recipe carries.

Impact / why it doesn't affect functionality: YAML parsers strip comments, so the launcher and srtctl will read exactly the same fields as the other recipes. The container version that will actually be used is vllm/vllm-openai:v0.20.1 (correct, matches .github/configs/nvidia-master.yaml and the other recipes), not the stale v0.18.1. So this is purely cosmetic, with no behavior change.

Addressing the refutation: I agree with the refuting verifier that this has zero functional impact. I'm filing this as a nit rather than a normal bug because it is a style/cleanup issue, and only because the stale v0.18.1 container reference inside the dead comment is potentially confusing for a future maintainer trying to bisect or audit container versions across recipes. The fix is trivial: delete lines 3-10, the duplicate blank line, and the four trailing commented benchmark fields so this file matches the shape of its eight siblings. Feel free to ignore if you'd rather keep the file as-is; it won't break anything.

Suggested fix: Remove lines 3-10 (the commented model:/dynamo: blocks), collapse lines 20-21 to a single blank line, and remove the trailing # warmup_prompts: 1 / # use_chat_template: false / # req_rate: "inf" / # random_range_ratio: 1.0 lines at the bottom.
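The cleanup suggested above could be previewed with a small filter before editing the file by hand. The following is a minimal sketch, not part of the PR: the helper name strip_yaml_comments and the demo file contents are made up for illustration, and the filter only handles comment-only lines (it does not touch trailing inline comments or '#' characters inside quoted strings).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): drop comment-only lines and squeeze
# runs of blank lines so a recipe can be compared on its effective content.
strip_yaml_comments() {
    awk '
        /^[ \t]*#/ { next }                                   # skip comment-only lines
        NF == 0    { if (blank) next; blank = 1; print ""; next }  # keep one blank per run
                   { blank = 0; print }                       # pass everything else through
    ' "$1"
}

# Throwaway file that mimics the shape described in the review comment.
demo="$(mktemp)"
cat > "$demo" <<'EOF'
# model:
#   container: "v0.18.1"

model:
  container: "vllm/vllm-openai:v0.20.1"


dynamo:
  install: true
EOF

cleaned="$(strip_yaml_comments "$demo")"
printf '%s\n' "$cleaned"
rm -f "$demo"
```

Running the same filter over two sibling recipes and diffing the results is one way to confirm that the remaining differences are real configuration rather than leftover comments.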

claude · 2026-06-02T21:16:53Z

+elif [[ $FRAMEWORK == "dynamo-vllm" && $MODEL_PREFIX == "minimaxm2.5" && $PRECISION == "fp8" ]]; then
+    git clone https://github.com/NVIDIA/srt-slurm.git "$SRT_REPO_DIR" || exit 1
+    cd "$SRT_REPO_DIR" || exit 1
+    git checkout main || exit 1
+    mkdir -p recipes/vllm/minimax-m2.5-gb200-fp8 || exit 1
+    cp -rT "$GITHUB_WORKSPACE/benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-gb200-fp8" recipes/vllm/minimax-m2.5-gb200-fp8 || exit 1


🟡 The new minimaxm2.5 path does git checkout main on NVIDIA/srt-slurm.git, while every other branch in this same if/elif chain pins to a stable named ref (e.g. aflowers/vllm-gb200-v0.20.0, sa-submission-q2-2026, cam/sa-submission-q2-2026). Tracking main makes this recipe non-reproducible — any upstream commit to srt-slurm can silently change benchmark behavior at the next CI run. Please pin to a specific branch (e.g. sa-submission-q2-2026 like the other dynamo-vllm paths) or a commit SHA.

Extended reasoning...

What the bug is

In runners/launch_gb200-nv.sh the newly added minimaxm2.5 branch clones https://github.com/NVIDIA/srt-slurm.git and then runs git checkout main:

elif [[ $FRAMEWORK == "dynamo-vllm" && $MODEL_PREFIX == "minimaxm2.5" && $PRECISION == "fp8" ]]; then
    git clone https://github.com/NVIDIA/srt-slurm.git "$SRT_REPO_DIR" || exit 1
    cd "$SRT_REPO_DIR" || exit 1
    git checkout main || exit 1
    mkdir -p recipes/vllm/minimax-m2.5-gb200-fp8 || exit 1
    cp -rT "$GITHUB_WORKSPACE/benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-gb200-fp8" recipes/vllm/minimax-m2.5-gb200-fp8 || exit 1

Why this is inconsistent with the rest of the file

Every other branch of this same if/elif chain pins to a stable named ref:

IS_AGENTIC=1 → cam/sa-submission-q2-2026 (--single-branch)

dynamo-vllm + dsv4 → aflowers/vllm-gb200-v0.20.0

dynamo-sglang + dsv4 → sa-submission-q2-2026

generic dynamo-vllm fallback → sa-submission-q2-2026

dynamo-trt + kimik2.5 → sa-submission-q2-2026

default fallback → cam/sa-submission-q2-2026

The new minimaxm2.5 branch is the only one tracking main.

Why this matters (the impact)

main of NVIDIA/srt-slurm is the active development branch, so any upstream commit can silently change behavior of this benchmark between identical reruns:

Recipe layout under recipes/vllm/... could move, breaking the cp -rT overlay (which writes into recipes/vllm/minimax-m2.5-gb200-fp8).

srtctl CLI flags or YAML schema could change, breaking srtctl apply -f $CONFIG_PATH --tags ... and the name: rewrite via sed.

The container alias (e.g. dynamo-vllm → image) that srtctl resolves from srtslurm.yaml could be renamed.

make setup ARCH=aarch64 targets could change.

None of these would surface in PR CI — they would only appear later as a silent regression in published benchmark numbers, with no diff in this repo to explain them. That defeats the entire point of having a perf-changelog entry tied to a specific PR.

Step-by-step proof this can manifest

PR #1648 ([NV] Add MiniMax-M2.5 FP8 GB200 Dynamo vLLM recipes) merges. Today, NVIDIA/srt-slurm@main happens to expose srtctl apply -f <file> --tags ....

Tomorrow, upstream renames the flag to --label and force-merges to main.

The next scheduled CI run for minimaxm2.5-fp8-gb200-dynamo-vllm clones main, picks up the rename, and srtctl apply ... --tags ... exits non-zero. The benchmark fails with no code change on our side.

Alternatively, in a subtler scenario:

Upstream changes the default value of an srtctl knob (timeout, retry, batch policy).

CI still passes, but the published throughput numbers drift, and the recorded perf-changelog entry no longer corresponds to a reproducible configuration.

The dsv4 branch right above explicitly comments that it pins to aflowers/vllm-gb200-v0.20.0 because it expects a specific srtctl + container-alias surface. The minimaxm2.5 branch needs the same treatment.

How to fix

Replace git checkout main with the same branch the other dynamo-vllm paths use (e.g. sa-submission-q2-2026), or pin to a specific SHA if the recipe needs srt-slurm changes that haven't landed in that branch yet. Matching the sibling pattern is the minimal change:

git checkout sa-submission-q2-2026

(or git clone --branch <ref> --single-branch ... to make the pin explicit at clone time, matching the IS_AGENTIC / default branches).
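One way to keep a floating ref from creeping back in is a small guard in the launcher. The sketch below is an assumption for illustration only: require_pinned_ref, SRT_REF, and the list of rejected names are not in the PR, and the clone command is only printed so the example runs without network access. Note that a named branch can still move; a commit SHA is the strictest pin.

```shell
#!/usr/bin/env bash
# Hypothetical guard (not from the PR): refuse to use a moving ref for srt-slurm.
# The set of "floating" names below is an assumption for this sketch.
require_pinned_ref() {
    ref="$1"
    case "$ref" in
        ""|main|master|HEAD)
            echo "refusing floating ref '${ref}': pin a branch, tag, or SHA" >&2
            return 1
            ;;
    esac
    return 0
}

# SRT_REF is a made-up variable name; the value is the branch the review
# suggests matching.
SRT_REF="sa-submission-q2-2026"
plan=""
if require_pinned_ref "$SRT_REF"; then
    # Printed rather than executed so the sketch does not need network access.
    plan="git clone --branch $SRT_REF --single-branch https://github.com/NVIDIA/srt-slurm.git"
    echo "would run: $plan"
fi
```

Passing the ref at clone time, as in the printed command, keeps the pin visible in a single line instead of splitting it across a clone and a later checkout.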

claude · 2026-06-02T21:16:53Z

@@ -0,0 +1,69 @@
+name: "minimax-m2.5-vllm-disagg-gb200-1p1d-tp4ep"


🟡 Nit: the name field here is minimax-m2.5-vllm-disagg-gb200-1p1d-tp4ep, missing the 8k1k- segment that the other two 8k1k recipes in this directory use (...-8k1k-1p1d-tp4 and ...-8k1k-3p2d-dep4). No runtime impact — launch_gb200-nv.sh sed-overwrites this with ${RUNNER_NAME} before submission — but worth aligning with its siblings for consistency.

Extended reasoning...

What's inconsistent

Within benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-gb200-fp8/8k1k/, the three new recipe files use these name: values:

disagg-gb200-1p1d-tp4.yaml → minimax-m2.5-vllm-disagg-gb200-8k1k-1p1d-tp4

disagg-gb200-1p1d-tp4ep.yaml → minimax-m2.5-vllm-disagg-gb200-1p1d-tp4ep ← missing 8k1k-

disagg-gb200-3p2d-dep4.yaml → minimax-m2.5-vllm-disagg-gb200-8k1k-3p2d-dep4

The two other 8k1k files encode the ISL/OSL bucket; this one doesn't.

Why it's a nit, not a bug

runners/launch_gb200-nv.sh runs sed -i "s/^name:.*/name: \"${RUNNER_NAME}\"/" "$CONFIG_PATH" before srtctl apply, so the in-file name is replaced at runtime and never reaches tags, labels, or logs as written. The refutation correctly points out that the 1k1k siblings also omit the bucket from their names (e.g. minimax-m2.5-vllm-disagg-gb200-1p1d-tp4), so there isn't a single repo-wide convention — the inconsistency is purely local to the 8k1k directory.

Step-by-step

Open 8k1k/disagg-gb200-1p1d-tp4.yaml line 1 → name contains -8k1k-.

Open 8k1k/disagg-gb200-1p1d-tp4ep.yaml line 1 → name omits -8k1k-.

Open 8k1k/disagg-gb200-3p2d-dep4.yaml line 1 → name contains -8k1k-.

At runtime, launch_gb200-nv.sh line ~325 unconditionally overwrites name: with $RUNNER_NAME, so no log/tag/output uses the in-file value.

Suggested fix

Change the file's first line to:

name: "minimax-m2.5-vllm-disagg-gb200-8k1k-1p1d-tp4ep"

This matches its two 8k1k siblings and removes a small source of confusion for anyone grepping recipe files directly. Not a blocker.
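The runtime overwrite that makes this a nit can be reproduced on a throwaway file. This is a minimal sketch under stated assumptions: the RUNNER_NAME value is made up, the file is a three-line stand-in for a recipe, and the substitution writes to a second file instead of using sed -i so that it behaves the same with GNU and BSD sed.

```shell
#!/usr/bin/env bash
# Demo of the name: overwrite quoted in the review, on a temporary file.
# RUNNER_NAME's value is invented for this example.
RUNNER_NAME="example-runner-name"
CONFIG_PATH="$(mktemp)"
cat > "$CONFIG_PATH" <<'EOF'
name: "minimax-m2.5-vllm-disagg-gb200-1p1d-tp4ep"
model:
  path: "minimax-m2.5-fp8"
EOF

# Same substitution pattern as the launcher; only the line starting with
# "name:" is rewritten, all other lines pass through unchanged.
sed "s/^name:.*/name: \"${RUNNER_NAME}\"/" "$CONFIG_PATH" > "${CONFIG_PATH}.new"
mv "${CONFIG_PATH}.new" "$CONFIG_PATH"

first_line="$(head -n 1 "$CONFIG_PATH")"
line_count="$(wc -l < "$CONFIG_PATH" | tr -d ' ')"
echo "$first_line"
rm -f "$CONFIG_PATH"
```

A runner name containing `/` or `&` would need escaping before being interpolated into the sed expression, which is a limitation of this substitution style in general.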

claude · 2026-06-02T21:16:53Z

+    - minimaxm2.5-fp8-gb200-dynamo-vllm
+  description:
+    - "Add MiniMax-M2.5 FP8 GB200 disaggregated multinode vLLM benchmarks via Dynamo"
+    - "Add 1k1k/8k1k FP8 recipe set under benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-gb200-fp8/"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX


🟡 The new entry in perf-changelog.yaml uses a placeholder pr-link 'https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX' that was never filled in. Please replace XXXX with this PR's number (1648) so the changelog entry links back to the actual PR like every other entry in the file.

Extended reasoning...

Bug

Line 3404 of perf-changelog.yaml contains a literal placeholder:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX

The XXXX was clearly copied from the changelog entry template and never replaced with the actual PR number. The actual PR number is 1648 (per the PR metadata for this change).

Why this is wrong

Every other entry in perf-changelog.yaml uses a real, resolvable PR URL. A quick survey of the entries immediately preceding this one shows real PR numbers, e.g.:

- line 3373: .../pull/1626
- line 3379: .../pull/1630
- line 3385: .../pull/1631
- line 3391: .../pull/1588
- line 3397: .../pull/1627

The new entry breaks that pattern with an unfilled template token.

Impact

This is a documentation/metadata issue rather than a runtime correctness bug; none of the benchmark recipes or shell scripts read pr-link at execution time. However, the changelog is the canonical place users (and tooling) look up the provenance of a perf change, so a dangling pull/XXXX link:

- 404s if anyone clicks it
- defeats automated tooling that may follow pr-link URLs (changelog generators, release notes, etc.)
- silently propagates the template default if not caught now

Step-by-step proof

1. Open perf-changelog.yaml and scroll to the bottom (lines 3399–3404):

- config-keys:
    - minimaxm2.5-fp8-gb200-dynamo-vllm
  description:
    - "Add MiniMax-M2.5 FP8 GB200 disaggregated multinode vLLM benchmarks via Dynamo"
    - "Add 1k1k/8k1k FP8 recipe set under benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-gb200-fp8/"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX

2. Note the literal XXXX placeholder.
3. The PR title is "[NV] Add MiniMax-M2.5 FP8 GB200 Dynamo vLLM recipes" and its number is 1648, so this entry should read:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1648

Fix

Replace XXXX with 1648 on line 3404.
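A placeholder like this could be caught mechanically before merge. The following is a rough sketch, not an existing check in this repo: check_pr_links is a hypothetical helper, and the pattern assumes every valid pr-link ends in /pull/ followed by digits.

```shell
#!/usr/bin/env bash
# Hypothetical pre-merge check (not from the PR): report pr-link lines whose
# URL does not continue with a digit after /pull/.
check_pr_links() {
    # grep prints offending lines with line numbers and exits 0 when it finds
    # any, so a match means the check fails.
    if grep -nE 'pr-link:.*/pull/([^0-9]|$)' "$1"; then
        return 1
    fi
    return 0
}

# Throwaway changelog fragment with the placeholder described above.
changelog="$(mktemp)"
cat > "$changelog" <<'EOF'
- config-keys:
    - minimaxm2.5-fp8-gb200-dynamo-vllm
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX
EOF

if check_pr_links "$changelog" > /dev/null; then
    status="clean"
else
    status="placeholder found"
fi
echo "$status"
rm -f "$changelog"
```

Pointing the same function at the real perf-changelog.yaml would list any remaining template tokens along with their line numbers.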

github-actions · 2026-06-03T00:20:15Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26849259449
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26849259449

github-actions · 2026-06-03T12:11:52Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26849259449
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26849259449

Add MiniMax-M2.5 FP8 GB200 Dynamo vLLM recipes

90e5193

jasonlizhengjian requested a review from a team June 2, 2026 21:10

jasonlizhengjian requested review from jgangani and kedarpotdar-nv as code owners June 2, 2026 21:10

github-project-automation Bot added this to InferenceMAX Board Jun 2, 2026

cursor Bot reviewed Jun 2, 2026

Comment thread ...s/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-gb200-fp8/8k1k/disagg-gb200-1p1d-tp4ep.yaml

Update MiniMax-M2.5 FP8 GB200 PR link

7d861f6

claude Bot reviewed Jun 2, 2026

Add GB200 FP8 8k prefill batch limit

f601cfe

jasonlizhengjian added the full-sweep-enabled label Jun 2, 2026

jasonlizhengjian removed the full-sweep-enabled label Jun 3, 2026

jasonlizhengjian added the reuse-full-sweep-results label Jun 3, 2026

[NV] Add MiniMax-M2.5 FP8 GB200 Dynamo vLLM recipes #1648
jasonlizhengjian wants to merge 3 commits into main from nv/jasonli/minimaxm2.5-fp8-gb200-dynamo-vllm
