[NV] Add MiniMax-M2.5 FP8 GB300 Dynamo vLLM recipes by jasonlizhengjian · Pull Request #1647 · SemiAnalysisAI/InferenceX

jasonlizhengjian · 2026-06-02T20:28:38Z

Summary

Add GB300 MiniMax-M2.5 FP8 Dynamo vLLM recipes.

Note

Low Risk
Benchmark and CI runner configuration only; no production serving, auth, or application logic changes.

Overview
Adds MiniMax-M2.5 FP8 disaggregated Dynamo + vLLM coverage on GB300 (minimaxm2.5-fp8-gb300-dynamo-vllm in nvidia-master.yaml), with fixed 1k/1k and 8k/1k scenarios that sweep prefill/decode worker counts, TP/EP, and concurrency via CONFIG_FILE pointers into new Slurm recipes.

Introduces a full recipe tree under benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-fp8/ (GB300 disagg layouts: 1p1d–5p2d, TP/EP/DP variants, Nixl KV transfer, fp8 cache, v0.20.1 + Dynamo wheel). Documents the config in perf-changelog.yaml.
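The layouts above differ in how many prefill and decode workers they run and how many GPUs each worker gets. The bash sketch below is a rough sanity check. It totals the GPUs a layout requests from the four `resources` fields these recipes use (`prefill_workers`, `decode_workers`, `gpus_per_prefill`, `gpus_per_decode`) and derives a lower bound on node count. The function names are hypothetical. The figure of 4 GPUs per node is an assumption about GB300 compute trays and is not stated in this PR, so srtctl's actual placement may need more nodes.

```bash
#!/usr/bin/env bash

# Hypothetical helper: total GPUs requested by a disaggregated layout,
# computed from the recipe's resources fields.
layout_gpus() {
  local prefill_workers=$1 decode_workers=$2 gpus_per_prefill=$3 gpus_per_decode=$4
  echo $(( prefill_workers * gpus_per_prefill + decode_workers * gpus_per_decode ))
}

# Lower bound on node count, ASSUMING 4 GPUs per GB300 compute node.
# Real placement may need more nodes if prefill and decode are not co-located.
min_nodes() {
  local gpus=$1 gpus_per_node=${2:-4}
  echo $(( (gpus + gpus_per_node - 1) / gpus_per_node ))   # ceiling division
}

# Example with the values quoted in the Cursor Bugbot excerpt
# (1 prefill worker x 1 GPU, 1 decode worker x 4 GPUs).
total=$(layout_gpus 1 1 1 4)
echo "GPUs: $total, min nodes: $(min_nodes "$total")"
```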

Wires the gb300-nv launcher: minimaxm2.5 + fp8 model path/prefix, clones srt-slurm main and copies these recipes for dynamo-vllm, and makes eval artifact copy failures non-fatal.
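A non-fatal artifact copy could look like the minimal bash sketch below. It is not the launcher's actual code, and `copy_eval_artifacts` and its two path arguments are hypothetical. Under `set -euo pipefail`, a missing source directory or a failing `cp` only prints a warning. Copying into a destination that already exists merges into it rather than erroring.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: copy eval artifacts without ever failing the job.
copy_eval_artifacts() {
  local src=$1 dst=$2
  # Missing source: warn and continue instead of aborting under set -e.
  if [[ ! -d $src ]]; then
    echo "warning: no eval artifacts at $src; skipping" >&2
    return 0
  fi
  # mkdir -p succeeds if dst already exists; "src/." copies the contents,
  # so a pre-existing dst is merged into rather than nested under.
  if ! { mkdir -p "$dst" && cp -r "$src"/. "$dst"/; }; then
    echo "warning: copying eval artifacts from $src to $dst failed" >&2
  fi
  return 0
}
```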

Reviewed by Cursor Bugbot for commit f72628c. Bugbot is set up for automated code reviews on this repo.

github-actions · 2026-06-02T20:28:48Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please create a PR first before we can merge your single-node PR into the master branch. Let's make sure the documentation is first-class so that the entire ML community can benefit from your hard work. Thank you!

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

claude · 2026-06-02T20:34:31Z

+# model:
+#   path: "minimax-m2.5-fp8"
+#   container: "v0.18.1"
+#   precision: "fp8"
+
+# dynamo:
+#   version: 1.0.1


🟡 Several new recipe files carry stale commented-out model: (referencing the older v0.18.1 container) and dynamo: version: 1.0.1 blocks immediately above the active config, which already specifies vllm/vllm-openai:v0.20.1 + wheel: 1.2.0.dev20260526. These dead blocks have no runtime effect (YAML ignores comments) but are copy-paste leftovers that contradict the live values and hurt readability. Same hygiene issue appears in 1k1k/disagg-gb300-2p1d-dep8.yaml, 1k1k/disagg-gb300-2p2d-dep4.yaml, 1k1k/disagg-gb300-2p2d-dep4-hi-conc.yaml, 8k1k/disagg-gb300-3p1d-dep4.yaml, and 8k1k/disagg-gb300-3p1d-dep4-hi-conc.yaml — please drop the commented blocks (and the trailing blank lines) before merge.

Extended reasoning...

What the bug is. Five of the new recipe files in this PR begin with a commented-out model: block pinning container: "v0.18.1" and a commented-out dynamo: block pinning version: 1.0.1. These sit directly above the active model:/dynamo: blocks, which pin the current versions (vllm/vllm-openai:v0.20.1 and wheel: 1.2.0.dev20260526). The commented values are clearly stale: they are an earlier iteration of the same fields that the live blocks now supersede.

Where it appears. Verified in the PR diff:
- benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-fp8/1k1k/disagg-gb300-2p1d-dep8.yaml (lines 3–10)
- benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-fp8/1k1k/disagg-gb300-2p2d-dep4.yaml
- benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-fp8/1k1k/disagg-gb300-2p2d-dep4-hi-conc.yaml
- benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-fp8/8k1k/disagg-gb300-3p1d-dep4.yaml
- benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-fp8/8k1k/disagg-gb300-3p1d-dep4-hi-conc.yaml

The sibling recipes in the same directories (e.g. 1k1k/disagg-gb300-1p1d-tp4.yaml, 1k1k/disagg-gb300-2p3d-dep2.yaml, the various 8k1k *-tp4ep.yaml files) do not contain the dead blocks. This suggests these five files were copy-pasted from an older template before the cleanup was applied uniformly.

Runtime impact. None. YAML parsers strip # lines, so srtctl will use the live model: and dynamo: mappings as intended. This is purely a hygiene/readability issue: a reader scanning the recipe to confirm which container/wheel is in use sees two conflicting sets of values and has to mentally diff them.

Why this is worth fixing. Stale duplicate blocks rot. The next time someone bumps the container or wheel version, they may forget that the commented block exists and leave it referencing a version that no longer matches reality. Cleaner recipes also make grep-based audits ("which recipes pin v0.18.1?") less noisy.

Step-by-step proof. Take 1k1k/disagg-gb300-2p1d-dep8.yaml lines 1–22:

```yaml
name: "minimax-m2.5-vllm-disagg-gb300-2p1d-dep8"

# model:                 <- stale block starts
#   path: "minimax-m2.5-fp8"
#   container: "v0.18.1"
#   precision: "fp8"

# dynamo:
#   version: 1.0.1
#   install: true        <- stale block ends

model:                   # <- live block (the one that takes effect)
  path: "minimax-m2.5-fp8"
  container: "vllm/vllm-openai:v0.20.1"
  precision: "fp8"

dynamo:
  install: true
  wheel: "1.2.0.dev20260526"
```

The two blocks specify different containers (v0.18.1 vs vllm/vllm-openai:v0.20.1) and different dynamo versioning schemes (version: 1.0.1 vs wheel: 1.2.0.dev20260526). Only the live block reaches srtctl, but the contradiction is visible to every human reader.

How to fix. Delete the leading commented model: and dynamo: blocks (and the extra blank lines they leave behind) from the five affected files, so each one starts directly with the active model: mapping. This matches the structure of the sibling recipes in the same directories. Severity: nit. There is no runtime effect, but it is trivial to clean up before merge.
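A grep over the recipe tree is enough to find any remaining recipes with these leftover blocks. The sketch below is a hypothetical helper, and `find_stale_recipes` is not part of the repo. As in the excerpt above, it assumes the stale blocks are the only places where a `container:` or `version:` key appears after a leading `#`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical audit: list recipe files that still contain a commented-out
# "container:" or "version:" line, i.e. a stale template block.
find_stale_recipes() {
  local root=$1
  # -r: recurse, -l: print matching file names only, -E: extended regex.
  # --include limits the search to YAML recipes; "|| true" keeps a
  # no-match result (grep exit 1) from failing under pipefail.
  grep -rlE --include='*.yaml' '^#[[:space:]]*(container|version):' "$root" | sort || true
}
```

Running it against `benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-fp8` after the cleanup should print nothing.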

claude · 2026-06-02T20:34:31Z

+elif [[ $FRAMEWORK == "dynamo-vllm" && $MODEL_PREFIX == "minimaxm2.5" && $PRECISION == "fp8" ]]; then
+    git clone https://github.com/NVIDIA/srt-slurm.git "$SRT_REPO_DIR"
+    cd "$SRT_REPO_DIR"
+    git checkout main
+    mkdir -p recipes/vllm/minimax-m2.5-fp8
+    cp -rT "$GITHUB_WORKSPACE/benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-fp8" recipes/vllm/minimax-m2.5-fp8


🟡 The new minimaxm2.5 branch in runners/launch_gb300-nv.sh runs git checkout main on github.com/NVIDIA/srt-slurm.git, which is the only mutable-HEAD reference in this file — every other clone target pins to a specific commit (6e34b8b for agentic) or a stable branch (aflowers/gb200-dsv4-recipes for dsv4, sa-submission-q2-2026 for glm5 and the default fallback). Future upstream schema/CLI changes in srt-slurm will silently break these MiniMax recipes without any code change here. Pin to a commit or to sa-submission-q2-2026 (matching the default branch) to keep these recipes reproducible.

Extended reasoning...

What's wrong

In runners/launch_gb300-nv.sh:140-148, the newly added MiniMax-M2.5 branch clones github.com/NVIDIA/srt-slurm.git and then does git checkout main:

elif [[ $FRAMEWORK == "dynamo-vllm" && $MODEL_PREFIX == "minimaxm2.5" && $PRECISION == "fp8" ]]; then
    git clone https://github.com/NVIDIA/srt-slurm.git "$SRT_REPO_DIR"
    cd "$SRT_REPO_DIR"
    git checkout main
    mkdir -p recipes/vllm/minimax-m2.5-fp8
    cp -rT "$GITHUB_WORKSPACE/benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-fp8" recipes/vllm/minimax-m2.5-fp8

This is the only git checkout main in the entire file — every other branch of the if/elif/else chain pins to a more stable reference.

Comparison to existing convention

Looking at the surrounding branches in the same if/elif chain:

| Branch | Ref used | Stability |
|---|---|---|
| `IS_AGENTIC` (cquil11/srt-slurm-nv) | `6e34b8b83229634d732e41a4e2d6595f46ef60b5` | Pinned commit |
| `dynamo-vllm` + `dsv4` | `aflowers/gb200-dsv4-recipes` | Stable feature branch |
| `dynamo-sglang` + `glm5` | `sa-submission-q2-2026` | Stable submission branch |
| `dynamo-vllm` + `minimaxm2.5` | `main` | Mutable upstream HEAD |
| Default fallback | `sa-submission-q2-2026` | Stable submission branch |

The main branch of NVIDIA/srt-slurm is an external upstream repo not controlled by this project. Any commit that lands there — schema changes to DynamoConfig, renames to sbatch_directives/srun_options, CLI changes in srtctl apply, etc. — will be picked up the next time this script runs, with no PR ever landing in InferenceX.

Why this matters

The broader file is deliberately paranoid about pinning. The comment block at lines ~99-121 explicitly enumerates which upstream schema features the agentic recipes depend on (DynamoConfig.wheel, sbatch_directives, srun_options, BenchmarkType.CUSTOM + benchmark.command + benchmark.env) — exactly because these are the things that drift upstream. The new MiniMax recipes themselves continue this pattern: every one of the 16 new YAMLs pins dynamo.wheel: "1.2.0.dev20260526". Leaving the srt-slurm ref unpinned defeats that paranoia for this one model.

Step-by-step proof of the failure mode

Today: Author opens PR, runs the workflow. git checkout main resolves to NVIDIA/srt-slurm HEAD at commit X. Recipes work; PR merges.

Two weeks later: An upstream PR renames safetensors-load-strategy to safetensors-load-mode in srt-slurm's vLLM recipe schema, or removes the top-level dynamo.wheel field, or changes how kv-transfer-config is parsed. (These are all examples of the schema fields the new MiniMax recipes use.) main advances to commit Y.

A regular scheduled run of minimaxm2.5-fp8-gb300-dynamo-vllm fires. The script clones, git checkout main lands on Y, copies the recipe YAMLs in, and srtctl apply fails with a parse error / unknown-field error — or worse, silently changes semantics (e.g. ignores the new field name and benchmarks with different settings).

Nothing changed in InferenceX. There's no PR to bisect, no commit to revert. The on-call has to dig through NVIDIA/srt-slurm history to find the upstream change.

Contrast: with a pinned commit (e.g. git checkout <sha>) or a stable branch (sa-submission-q2-2026), the same upstream change is invisible until someone explicitly bumps the ref in this file — at which point the diff is reviewable and the breakage is tied to a PR.

How to fix

Replace git checkout main with one of:

git checkout sa-submission-q2-2026 — matches the default fallback and the glm5 branch; simplest fix.

git checkout <full SHA> — matches the agentic-branch pattern; strongest reproducibility guarantee. Use whatever commit on main the author actually tested against.

Either is a one-line change.
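Once the ref is pinned, a small guard can stop a future edit from quietly reintroducing `main`. The bash sketch below is hypothetical; neither `require_pinned_ref` nor `SRT_SLURM_REF` exists in the launcher. It accepts a full 40-character SHA or a named branch such as `sa-submission-q2-2026`, and rejects `main`, `master`, `HEAD`, or an empty ref.

```bash
#!/usr/bin/env bash

# Hypothetical guard: accept a full commit SHA or a named non-default branch,
# reject mutable default-branch refs.
require_pinned_ref() {
  local ref=$1
  if [[ $ref =~ ^[0-9a-f]{40}$ ]]; then
    return 0                      # full SHA: strongest reproducibility
  fi
  case $ref in
    main|master|HEAD|"")
      echo "error: srt-slurm ref '$ref' is mutable; pin a SHA or a submission branch" >&2
      return 1 ;;
  esac
  return 0                        # named branch, e.g. sa-submission-q2-2026
}

# Usage sketch inside the launcher branch (SRT_SLURM_REF is hypothetical):
#   SRT_SLURM_REF=sa-submission-q2-2026
#   require_pinned_ref "$SRT_SLURM_REF" && git checkout "$SRT_SLURM_REF"
```

A named branch can still move, so the guard only blocks the default-branch names. A full SHA is the only ref it accepts that cannot change underneath the recipes.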

claude · 2026-06-02T20:34:31Z

+  description:
+    - "Add MiniMax-M2.5 FP8 GB300 disaggregated multinode vLLM benchmarks via Dynamo"
+    - "Add 1k1k/8k1k FP8 recipe set under benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-fp8/"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX


🟡 The new entry in perf-changelog.yaml (line 3404) uses a placeholder pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX instead of this PR's actual number (#1647). Please update it to https://github.com/SemiAnalysisAI/InferenceX/pull/1647 so the changelog doesn't link to a non-existent PR after merge.

Extended reasoning...

What the bug is. The new changelog entry added in this PR (perf-changelog.yaml lines 3398-3404) contains the literal placeholder string XXXX in its pr-link field instead of a real PR number:

```yaml
- config-keys:
    - minimaxm2.5-fp8-gb300-dynamo-vllm
  description:
    - "Add MiniMax-M2.5 FP8 GB300 disaggregated multinode vLLM benchmarks via Dynamo"
    - "Add 1k1k/8k1k FP8 recipe set under benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-fp8/"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX
```

Convention in the file. Every other entry in perf-changelog.yaml references a real, resolvable PR number. Adjacent entries use /pull/1627, /pull/1631, /pull/1588, /pull/1630, etc. The XXXX is clearly a template placeholder that the author forgot to fill in before pushing the final commit.

Step-by-step proof.

1. Open this PR's metadata: the PR number is 1647.
2. Open perf-changelog.yaml at line 3404 in this diff: the value is the literal string https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX.
3. After merge, anyone clicking that link from the changelog will navigate to github.com/SemiAnalysisAI/InferenceX/pull/XXXX, which is not a valid PR number and will 404.
4. Compare to the entry immediately above (line 3397): pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1627, a real PR. The convention is unambiguous.

Why existing code doesn't prevent it. perf-changelog.yaml is plain documentation YAML with no schema validation that checks whether the URL resolves, and CI doesn't fetch each pr-link to confirm the PR exists. The placeholder will therefore silently pass any structural YAML lint.

Impact. Documentation-only. There is no runtime effect, since the file is metadata describing the new config key minimaxm2.5-fp8-gb300-dynamo-vllm. However, the changelog is the canonical record of when each config was added and which PR introduced it. A permanently broken link makes future archaeology ("why was this benchmark added, what was the context") harder than it needs to be.

Fix. Replace XXXX with 1647 in perf-changelog.yaml:3404:

```yaml
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1647
```

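Nothing in CI validates `pr-link` values, so a cheap check could catch this class of placeholder before merge. The sketch below is hypothetical; `check_pr_links` is not an existing InferenceX script. It flags any `pr-link` whose `/pull/` segment does not start with a digit, which covers `XXXX`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical CI check: fail if any pr-link still carries a non-numeric
# placeholder (such as XXXX) after /pull/.
check_pr_links() {
  local changelog=$1
  # Print offending lines with line numbers; a match means a placeholder.
  if grep -nE 'pr-link:.*/pull/[^0-9[:space:]]' "$changelog"; then
    echo "error: placeholder pr-link found in $changelog" >&2
    return 1
  fi
  return 0
}
```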
github-actions · 2026-06-02T23:48:00Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26846199941
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26846199941

github-actions · 2026-06-03T00:07:28Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26846199941
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26846199941

cursor · 2026-06-03T02:02:25Z

+  prefill_workers: 1
+  decode_workers: 1
+  gpus_per_prefill: 1
+  gpus_per_decode: 4


Missing GB300 slurm limits

Medium Severity

New GB300 MiniMax Dynamo vLLM recipes jump straight from setup_script to resources and never set slurm.time_limit or health_check, unlike existing GB300 vLLM disagg recipes on the same runner. Jobs therefore keep the launcher’s four-hour default while engine startup allows up to an hour, so long multi-concurrency sweeps can be killed before benchmarks finish.
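The timing concern reduces to simple arithmetic. With a four-hour limit and up to an hour of engine startup, about three hours (10,800 s) remain for the sweep. The bash sketch below makes that check explicit. `sweep_fits` is a hypothetical helper, and the example's 20 minutes per concurrency point is an assumed value, not a measurement from these recipes.

```bash
#!/usr/bin/env bash

# Rough time-budget check for a concurrency sweep under a Slurm time limit.
# All inputs are in seconds; per_point is an ASSUMED estimate.
sweep_fits() {
  local time_limit=$1 startup=$2 points=$3 per_point=$4
  local budget=$(( time_limit - startup ))
  local needed=$(( points * per_point ))
  echo "budget=${budget}s needed=${needed}s"
  (( needed <= budget ))          # exit status 0 only if the sweep fits
}

# Four-hour launcher default, worst-case one-hour startup,
# hypothetical 10 concurrency points at 20 minutes each.
sweep_fits $((4 * 3600)) 3600 10 1200 || echo "sweep would exceed the time limit"
```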

Additional Locations (1)

benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-fp8/1k1k/disagg-gb300-2p2d-dep4-hi-conc.yaml#L22-L33

Reviewed by Cursor Bugbot for commit cc720f0.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).


Reviewed by Cursor Bugbot for commit 8f59225.

cursor · 2026-06-03T02:35:05Z

+      safetensors-load-strategy: "prefetch"
+      trust-remote-code: true
+      no-enable-prefix-caching: true
+      stream-interval: 32


Missing max-model-len on 8k1k

High Severity

New GB300 8k1k disagg recipes set benchmark.isl to 8192 and osl to 1024 but omit max-model-len on prefill and decode in vllm_config. Matching GB200 MiniMax 8k1k recipes set max-model-len: 9280 on both sides so total sequence length fits disagg KV transfer; without that cap, vLLM may reject 8192-token inputs or reserve KV for the model’s full context and fail at runtime.
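The 9280 cap follows from the scenario lengths. One 8k1k request needs 8192 + 1024 = 9216 tokens, so 9280 leaves 64 tokens of headroom. The bash sketch below uses a hypothetical `check_max_model_len` helper to verify that a chosen `max-model-len` covers ISL + OSL. It does not model how vLLM sizes its KV cache.

```bash
#!/usr/bin/env bash

# Check that a max-model-len cap covers one full request (ISL + OSL).
check_max_model_len() {
  local isl=$1 osl=$2 max_model_len=$3
  local required=$(( isl + osl ))
  echo "required=${required} headroom=$(( max_model_len - required ))"
  (( max_model_len >= required ))  # exit status 0 only if the cap is large enough
}

# 8k1k scenario with the GB200 recipes' cap of 9280.
check_max_model_len 8192 1024 9280
```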

Reviewed by Cursor Bugbot for commit 8f59225.

Add MiniMax-M2.5 FP8 GB300 Dynamo vLLM recipes

af6d92f

jasonlizhengjian requested a review from a team June 2, 2026 20:28

jasonlizhengjian requested review from jgangani and kedarpotdar-nv as code owners June 2, 2026 20:28

github-project-automation Bot added this to InferenceMAX Board Jun 2, 2026

Update GB300 FP8 MiniMax changelog PR link

a72f7c7

jasonlizhengjian added the full-sweep-enabled label Jun 2, 2026

claude Bot reviewed Jun 2, 2026


Fix GB300 eval artifact copy

4edbeea

jasonlizhengjian removed the full-sweep-enabled label Jun 3, 2026

Handle existing GB300 eval artifacts

cc720f0

cursor Bot reviewed Jun 3, 2026


jasonlizhengjian added 2 commits June 2, 2026 19:18

Merge main into GB300 FP8 MiniMax PR

e00c5ec

Do not fail GB300 eval artifact copy

8f59225

cursor Bot reviewed Jun 3, 2026


Merge latest main into GB300 FP8 MiniMax PR

f72628c


jasonlizhengjian wants to merge 7 commits into main from nv/jasonli/minimaxm2.5-fp8-gb300-dynamo-vllm
