[B300][vLLM] Add MiniMax-M2.5 FP4 disagg Dynamo configs by Ankur-singh · Pull Request #1652 · SemiAnalysisAI/InferenceX

Ankur-singh · 2026-06-03T00:37:37Z

B300 half of split #1560 (GB300 sibling lives in a separate PR so one CI failure doesn't block the other).

Summary

- Add minimaxm2.5-fp4-b300-dynamo-vllm to .github/configs/nvidia-master.yaml (1k1k + 8k1k search spaces, including a new tp8-1p1d cell at low concurrencies)
- Add srt-slurm vLLM recipes under benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-b300/
- Wire minimaxm2.5 + fp4 + dynamo-vllm routing into runners/launch_b300-nv.sh
- Append perf-changelog entry

Image: vllm/vllm-openai:v0.20.1; model: nvidia/MiniMax-M2.5-NVFP4.

Note

Low Risk
Benchmark/CI and runner wiring only; no production inference, auth, or application runtime changes.

Overview
Adds B300 disaggregated multinode benchmarking for MiniMax-M2.5 NVFP4 with Dynamo + vLLM (vllm/vllm-openai:v0.20.1).

A new minimaxm2.5-fp4-b300-dynamo-vllm entry in nvidia-master.yaml defines 1k/1k and 8k/1k fixed-seq-len search spaces (prefill/decode worker counts, TP/EP, optional dp-attn), points CONFIG_FILE at recipes/vllm/minimax-m2.5/..., and includes new low-concurrency tp8-1p1d cells. Matching Slurm recipe YAMLs land under benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-b300/ (tensor-parallel, expert-parallel, and data-parallel decode variants, all using Nixl KV transfer).

runners/launch_b300-nv.sh now routes minimaxm2.5 + fp4 + dynamo-vllm to the on-disk model path and, after checking out main in the NVIDIA srt-slurm repo, copies those recipes into its recipes/vllm/minimax-m2.5 directory. perf-changelog.yaml documents the new config key.

Reviewed by Cursor Bugbot for commit 15a96d5. Bugbot is set up for automated code reviews on this repo.


github-actions · 2026-06-03T00:37:44Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please create a PR there first, before we can merge your single node PR into the master branch. Let's make sure the documentation is first-class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review from, and get approval by, the respective company's CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

claude · 2026-06-03T00:45:39Z

+# B300-only: full-node TP=8 decode (the 8 GPUs of a single B300 node).
+# Cousin of tp4-1p1d.yaml but exercises the wider TP that B300's per-node
+# GPU count makes available. Only the smallest concurrencies (1,4,8) —
+# this topology is decode-latency focused, not throughput.


🟡 The header comments in both new tp8-1p1d.yaml files claim the recipe exercises 'the smallest concurrencies (1,4,8)', but the benchmark.concurrencies field is just "4" in both files (and the corresponding nvidia-master.yaml entries use conc-list: [4]). Either update the comments to say (4), or extend the conc-lists to include 1 and 8 if those were intended.

Extended reasoning

Both new B300 tp8-1p1d recipes contain a self-contradictory header comment vs. their actual benchmark configuration:

benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-b300/1k1k/tp8-1p1d.yaml line 5: # available. Only the smallest concurrencies (1,4,8) — versus line 77: concurrencies: "4"

benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-b300/8k1k/tp8-1p1d.yaml line 5: # available. Smallest concurrencies only (1,4,8). versus line 72: concurrencies: "4"

The master config corroborates the file-level value: in .github/configs/nvidia-master.yaml, both tp8-1p1d entries (the 1k1k and 8k1k slots under minimaxm2.5-fp4-b300-dynamo-vllm) use conc-list: [4] — not [1, 4, 8]. So the comment is factually wrong about what the recipe exercises.

Impact: Documentation-only. The recipes still run correctly (only concurrency 4 is benchmarked, matching the nvidia-master.yaml wiring). The risk is misleading future readers who look at the recipe in isolation, infer that 1 and 8 are also covered, and either skip adding them to a follow-up sweep or get confused when results only show one point.

Why existing review didn't catch it: the concurrency is stated in three places across two files (the recipe's header comment, the recipe's benchmark.concurrencies string, and the conc-list in nvidia-master.yaml). The comment likely reflects an earlier intent that was narrowed during tuning without the prose being updated to match.

Step-by-step proof:

Open benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-b300/1k1k/tp8-1p1d.yaml. The header (lines 3–6) says: # B300-only: full-node TP=8 decode … Only the smallest concurrencies (1,4,8) — this topology is decode-latency focused, not throughput.

Scroll to the benchmark: block at the bottom, which sets concurrencies: "4". In the recipe's own format, multiple values would be written "1x4x8" (cf. the sibling tp4-1p1d.yaml, which uses "4x16").

Open .github/configs/nvidia-master.yaml and find the tp8-1p1d cells (1k1k and 8k1k) under minimaxm2.5-fp4-b300-dynamo-vllm: both have conc-list: [4]. No 1 or 8.

So at runtime the orchestrator and the recipe agree on concurrency = 4 only. The comment, in two separate files, asserts (1,4,8). Mismatch confirmed.

Fix (trivial, pick one):

Update both header comments to say "Smallest concurrency only (4)." and adjust the surrounding prose, OR

Expand concurrencies to "1x4x8" in both recipe yamls and conc-list: [1, 4, 8] in both nvidia-master.yaml cells, if 1 and 8 were meant to be covered (as the comment says).
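Because the same concurrency value lives in three places, a small consistency check could catch this class of drift automatically. The sketch below is hypothetical (none of these helpers exist in the repo); it assumes the "x"-separated concurrency string format described above ("4x16", "1x4x8") and a "(1,4,8)"-style list in the header comment, and takes the already-extracted values rather than parsing YAML.

```python
from __future__ import annotations

import re


def parse_recipe_concurrencies(value: str) -> list[int]:
    """Split an "x"-separated concurrency string such as "4x16" into [4, 16]."""
    return [int(part) for part in value.strip().split("x") if part]


def parse_comment_concurrencies(comment: str) -> list[int] | None:
    """Extract a "(1,4,8)"-style list of integers from a header comment, if any."""
    match = re.search(r"\((\d+(?:\s*,\s*\d+)*)\)", comment)
    if match is None:
        return None
    return [int(n) for n in match.group(1).split(",")]


def check_concurrency_cell(header: str, recipe_value: str, conc_list: list[int]) -> list[str]:
    """Return mismatches between header comment, recipe string and conc-list."""
    problems: list[str] = []
    recipe = parse_recipe_concurrencies(recipe_value)
    if sorted(recipe) != sorted(conc_list):
        problems.append(f"recipe runs {recipe}, nvidia-master conc-list is {conc_list}")
    claimed = parse_comment_concurrencies(header)
    if claimed is not None and sorted(claimed) != sorted(recipe):
        problems.append(f"header comment claims {claimed}, recipe runs {recipe}")
    return problems


# Values taken from the tp8-1p1d review above (header text shortened).
HEADER = "# available. Only the smallest concurrencies (1,4,8)"
print(check_concurrency_cell(HEADER, "4", [4]))
# ['header comment claims [1, 4, 8], recipe runs [4]']
print(check_concurrency_cell(HEADER, "1x4x8", [1, 4, 8]))
# []
```

Either fix option above makes the check pass: the first by changing the comment, the second by widening both the recipe string and the conc-list.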

claude · 2026-06-03T00:45:39Z

+    - "Same 1k/1k and 8k/1k search space as gb300, plus a new tp8-1p1d at low concurrencies for both ISLs"
+  pr-link: https://github.com/NVIDIA/InferenceMAX/pull/83


🟡 perf-changelog.yaml:3418-3419 has two doc nits in the new minimaxm2.5-fp4-b300-dynamo-vllm entry. (1) The description says "Same 1k/1k and 8k/1k search space as gb300" but no gb300 config key exists on main — only the GB200 sibling does, and the GB300 sibling lives in a separate yet-to-merge PR, so this is a dangling forward-reference. (2) The pr-link is https://github.com/NVIDIA/InferenceMAX/pull/83, but every other entry in this 3400+ line file (including the GB200 sibling at line 3411 → pull/1642) links to SemiAnalysisAI/InferenceX/pull/<n>; this PR is #1652, so the link should be https://github.com/SemiAnalysisAI/InferenceX/pull/1652.

Extended reasoning

Summary

Two metadata-only nits in the new perf-changelog entry for minimaxm2.5-fp4-b300-dynamo-vllm (perf-changelog.yaml:3413–3419).

1. Dangling gb300 reference (line 3418)

The description bullet reads:

"Same 1k/1k and 8k/1k search space as gb300, plus a new tp8-1p1d at low concurrencies for both ISLs"

But no gb300 config key exists in the repository. Grepping nvidia-master.yaml for minimaxm2.5.*gb returns only minimaxm2.5-fp4-gb200-dynamo-vllm (line 9909). The PR description itself acknowledges: "B300 half of split #1560 (GB300 sibling lives in a separate PR so one CI failure doesn’t block the other)." So the changelog is referring to a sibling artifact that has not yet merged.

Note also that this is not simply a typo for gb200: the B300 1k/1k search space contains cells that the existing GB200 entry does not (dep2-2p3d, dep2-2p3d-c6144, tp4-1p2d, tp8-1p1d, dep4-4p1d, dep8-4p1d, tp4ep-2p1d), so the reference really does point at the unmerged GB300 sibling. Readers who later consult the changelog will see "search space as gb300" with no easy way to find that gb300 entry: it either does not exist yet or, if/when the sibling lands, will live elsewhere in the file with no link from here.

Fix options:

Reference the merged sibling: "Same 1k/1k and 8k/1k search space as gb200, plus ...", or

Make the cross-reference explicit: "Same 1k/1k and 8k/1k search space as the GB300 sibling in #<sibling-PR-number>, plus ..." so the dangling forward-reference becomes a clickable pointer.
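The grep-based check for this dangling reference could be automated along these lines. This is a hypothetical sketch: it assumes config keys follow the `<model>-<precision>-<platform>-<framework>` shape seen in the two keys named in this thread (minimaxm2.5-fp4-gb200-dynamo-vllm and minimaxm2.5-fp4-b300-dynamo-vllm), and that the set of existing keys has already been collected from nvidia-master.yaml.

```python
from __future__ import annotations

import re


def dangling_platform_refs(description: str, config_keys: set[str],
                           model_prefix: str = "minimaxm2.5-fp4",
                           suffix: str = "dynamo-vllm") -> list[str]:
    """List config keys implied by "... as <platform>" phrases that do not exist."""
    missing: list[str] = []
    # Platform tokens such as gb200, gb300, b300 following the word "as".
    for platform in re.findall(r"\bas (gb\d+|b\d+)\b", description):
        key = f"{model_prefix}-{platform}-{suffix}"
        if key not in config_keys:
            missing.append(key)
    return missing


KEYS = {"minimaxm2.5-fp4-gb200-dynamo-vllm", "minimaxm2.5-fp4-b300-dynamo-vllm"}
print(dangling_platform_refs("Same 1k/1k and 8k/1k search space as gb300, plus a new tp8-1p1d", KEYS))
# ['minimaxm2.5-fp4-gb300-dynamo-vllm']
```

The check only covers the first fix option; a "#<sibling-PR-number>" pointer would need a separate link check.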

2. pr-link points to a different repo (line 3419)

The entry sets:

pr-link: https://github.com/NVIDIA/InferenceMAX/pull/83

This is the only entry in the 3400+ line perf-changelog.yaml that uses NVIDIA/InferenceMAX as the host. Every other entry uses https://github.com/SemiAnalysisAI/InferenceX/pull/<n>, including the immediately-preceding GB200 sibling at line 3411 which points to pull/1642 (the #1642 PR in this repo). Since this PR is #1652 in SemiAnalysisAI/InferenceX, the consistent link would be:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1652

The README confirms the repo was renamed ("InferenceX™ (formerly InferenceMAX)"), so the URL uses the former name with what appears to be an unrelated PR number; NVIDIA/InferenceMAX is not referenced anywhere else in this codebase. Git history shows commit 57fb086 (perf-changelog: link minimaxm2.5-fp4-b300 entry to PR #83) explicitly retargeted the link from PLACEHOLDER_PR_LINK to this URL, likely a copy-paste mistake (#83 is an internal/fork PR number, not the public one).

Proof / step-by-step

Grep for NVIDIA/InferenceMAX in the repo → only one hit: perf-changelog.yaml:3419.

Grep for pr-link.*SemiAnalysisAI/InferenceX → ~600 hits, including line 3411 which uses pull/1642.

Grep for minimaxm.?2\.5.*gb in the repo → only gb200 matches; no gb300 anywhere.

PR description states the GB300 sibling is in a separate not-yet-merged PR.

The B300 search space added in this PR differs structurally from the existing GB200 entry, so the gb300 reference is not a typo for gb200.

Impact

Documentation-only, with no behavioral effect; both verifier sets unanimously rate this a nit. But the changelog is the canonical history that readers (and tooling) consult to chase recipe changes, and both issues actively break it: anyone clicking through the pr-link lands on a 404 (or an unrelated repo), and anyone trying to compare the b300 search space against the referenced gb300 finds no such entry.

Suggested fix

- config-keys:
    - minimaxm2.5-fp4-b300-dynamo-vllm
  description:
    - "Add MiniMax-M2.5 NVFP4 B300 disaggregated multinode vLLM benchmarks via Dynamo"
    - "Image: vllm/vllm-openai:v0.20.1"
    - "Same 1k/1k and 8k/1k search space as gb200, plus a new tp8-1p1d at low concurrencies for both ISLs"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1652

(or rephrase to point at the GB300 sibling PR number once it is open).
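A pr-link lint along these lines would have flagged both the PLACEHOLDER_PR_LINK value and the NVIDIA/InferenceMAX URL. It is a hypothetical sketch: the canonical URL shape is inferred from the other perf-changelog entries cited in the review (for example .../SemiAnalysisAI/InferenceX/pull/1642), and the expected PR number would have to come from CI context.

```python
from __future__ import annotations

import re

# Shape shared by the other perf-changelog entries cited in the review.
PR_LINK_RE = re.compile(r"https://github\.com/SemiAnalysisAI/InferenceX/pull/(\d+)")


def check_pr_link(link: str, expected_pr: int | None = None) -> str | None:
    """Return an error message for a bad pr-link, or None if it looks right."""
    match = PR_LINK_RE.fullmatch(link.strip())
    if match is None:
        return f"pr-link {link!r} does not point at SemiAnalysisAI/InferenceX"
    if expected_pr is not None and int(match.group(1)) != expected_pr:
        return f"pr-link points at #{match.group(1)}, expected #{expected_pr}"
    return None


print(check_pr_link("https://github.com/NVIDIA/InferenceMAX/pull/83", 1652))
# pr-link 'https://github.com/NVIDIA/InferenceMAX/pull/83' does not point at SemiAnalysisAI/InferenceX
print(check_pr_link("https://github.com/SemiAnalysisAI/InferenceX/pull/1652", 1652))
# None
```

Using `fullmatch` rather than `search` keeps trailing junk (or a second URL) from slipping through.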

github-actions · 2026-06-03T02:00:55Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26857293503
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26857293503

github-actions · 2026-06-03T02:35:46Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26857293503
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26857293503

functionstackx · 2026-06-03T03:20:43Z

@Ankur-singh this is failing, can u take a look? https://github.com/SemiAnalysisAI/InferenceX/actions/runs/26857293503/job/79211249388?pr=1652

github-actions · 2026-06-03T03:42:01Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26857293503
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26857293503

functionstackx · 2026-06-03T05:23:45Z

/reuse-sweep-run

github-actions · 2026-06-03T05:25:17Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26865500202
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26865500202

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 15a96d5.

cursor · 2026-06-03T05:26:28Z

+    cd "$SRT_REPO_DIR" || exit 1
+    git checkout main
+    mkdir -p recipes/vllm/minimax-m2.5
+    cp -rT "$GITHUB_WORKSPACE/benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-b300" recipes/vllm/minimax-m2.5


Missing seeded venv for Dynamo

High Severity

New minimaxm2.5 dynamo-vllm recipes pin dynamo.wheel, but launch_b300-nv.sh still creates the srtctl venv with plain uv venv, so pip is missing and Dynamo wheel prefetch can fail before jobs start.
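Plain `uv venv` does not install pip into the environment; `uv venv --seed` adds seed packages (pip among them). Assuming launch_b300-nv.sh calls uv directly, adding `--seed` there is one likely fix. A fail-fast preflight could also guard against the problem before any Slurm job is submitted; the helpers below are hypothetical and only look for a pip launcher in the usual POSIX (`bin/`) or Windows (`Scripts/`) location.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def venv_has_pip(venv_dir: str | os.PathLike[str]) -> bool:
    """True if the venv contains a pip launcher (POSIX bin/ or Windows Scripts/)."""
    root = Path(venv_dir)
    return any(p.exists() for p in (root / "bin" / "pip", root / "Scripts" / "pip.exe"))


def require_pip(venv_dir: str | os.PathLike[str]) -> None:
    """Exit early, before the Dynamo wheel prefetch, if pip is missing."""
    if not venv_has_pip(venv_dir):
        raise SystemExit(
            f"{venv_dir}: no pip in venv; recreate it with `uv venv --seed` "
            "so the Dynamo wheel prefetch can run"
        )


with tempfile.TemporaryDirectory() as demo:
    print(venv_has_pip(demo))  # False: no pip launcher in an empty directory
```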

Reviewed by Cursor Bugbot for commit 15a96d5.

Ankur-singh added 2 commits June 2, 2026 17:22

perf-changelog: link minimaxm2.5-fp4-b300 entry to PR #83

57fb086

Ankur-singh requested a review from a team June 3, 2026 00:37

Ankur-singh requested review from jgangani and kedarpotdar-nv as code owners June 3, 2026 00:37

github-project-automation Bot added this to InferenceMAX Board Jun 3, 2026

perf-changelog: link minimaxm2.5-fp4-b300 entry to PR #1652

6a5a069

claude Bot reviewed Jun 3, 2026


Ankur-singh added the full-sweep-enabled label Jun 3, 2026

Merge branch 'main' into split-pr1560-minimax-b300

15a96d5

functionstackx merged commit 7d4063d into main Jun 3, 2026
16 of 22 checks passed

functionstackx deleted the split-pr1560-minimax-b300 branch June 3, 2026 05:25

github-project-automation Bot moved this to Done in InferenceMAX Board Jun 3, 2026

cursor Bot reviewed Jun 3, 2026


functionstackx merged 4 commits into main from split-pr1560-minimax-b300
