
[NV] Add MiniMax-M2.5 FP8 GB200 Dynamo vLLM recipes #1648

Open
jasonlizhengjian wants to merge 3 commits into main from nv/jasonli/minimaxm2.5-fp8-gb200-dynamo-vllm


@jasonlizhengjian (Collaborator) commented Jun 2, 2026

Summary

Add GB200 MiniMax-M2.5 FP8 Dynamo vLLM recipes.


Note

Medium Risk
Launcher changes affect job submission paths and shared filesystem layout for minimaxm2.5 on GB200; misconfigured paths could break sweeps, but there is no production app or auth logic in scope.

Overview
Adds MiniMax-M2.5 FP8 on GB200 as a new minimaxm2.5-fp8-gb200-dynamo-vllm benchmark in nvidia-master.yaml: multinode disaggregated Dynamo+vLLM (v0.20.1) with fixed-seq-len sweeps at 1k/1k and 8k/1k, each search-space point wiring prefill/decode worker counts, TP/EP, and a CONFIG_FILE under the new recipe tree.

Introduces nine Slurm recipe YAMLs under benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-gb200-fp8/ (Nixl KV transfer, FP8 KV cache, topologies varying from 1P–3P × 1D–4D, and per-recipe concurrency lists).

Updates runners/launch_gb200-nv.sh for this model: Lustre model path, clone NVIDIA/srt-slurm main with recipe overlay, and watchtower-specific shared-FS staging for srt-slurm, INFMAX_WORKSPACE, /usr/bin/python3 venv, plus stricter CONFIG_FILE / make setup error handling. Documents the config in perf-changelog.yaml.

Reviewed by Cursor Bugbot for commit f601cfe. Bugbot is set up for automated code reviews on this repo.

@github-actions (Contributor) commented Jun 2, 2026

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please create a PR there first, before we can merge your single-node PR into the master branch. Let's ensure that the documentation is first-class so that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that, after merging, all GitHub Actions jobs fully pass. Failures are often just flakes that re-running the failed jobs will fix; if you re-run failed jobs, you are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get an approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.


@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 90e5193.

Comment on lines +3 to +21
```yaml
# model:
#   path: "minimax-m2.5-fp8"
#   container: "v0.18.1"
#   precision: "fp8"

# dynamo:
#   version: 1.0.1
#   install: true

model:
  path: "minimax-m2.5-fp8"
  container: "vllm/vllm-openai:v0.20.1"
  precision: "fp8"

dynamo:
  install: true
  wheel: "1.2.0.dev20260526"
```


Contributor


🟡 Lines 3-10 of disagg-gb200-2p1d-dep8.yaml contain commented-out model: and dynamo: blocks referencing a stale container (v0.18.1) and dynamo (version: 1.0.1), immediately superseded by the real config on lines 12-19. There are also extra blank lines (20-21) and commented-out benchmark fields (warmup_prompts, use_chat_template, req_rate, random_range_ratio) at the bottom. None of the sibling recipes in this directory carry this clutter — please remove for consistency.

Extended reasoning...

This is a dead-configuration cleanup issue in a single recipe file. The file benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-gb200-fp8/1k1k/disagg-gb200-2p1d-dep8.yaml is new in this PR, and it carries leftover commented-out blocks from an earlier iteration that its eight sibling recipes (1p1d-tp4, 1p2d-tp4, 1p3d-tp4ep, 1p4d-dep2, 2p3d-dep4, and the three 8k1k variants) do not have.

What the bug is: Lines 3-10 of the file contain a commented-out model: block (container: "v0.18.1") and dynamo: block (version: 1.0.1, no wheel: field), which look like an earlier draft of the schema. These are immediately followed on lines 12-19 by the real, current config (container: "vllm/vllm-openai:v0.20.1", wheel: "1.2.0.dev20260526"). Lines 20-21 are two consecutive blank lines (every other recipe in the directory has a single blank-line separator). At the bottom, there are four commented-out benchmark: options (# warmup_prompts: 1, # use_chat_template: false, # req_rate: "inf", # random_range_ratio: 1.0), which are also absent from the sibling recipes.

Step-by-step proof:
  1. Diff the new file against any sibling, e.g. diff disagg-gb200-2p1d-dep8.yaml disagg-gb200-2p3d-dep4.yaml.
  2. The first hunk shows that 2p1d-dep8.yaml has lines 3-10 (commented-out model:/dynamo: blocks with the stale v0.18.1 container reference) that 2p3d-dep4.yaml does not.
  3. The second hunk shows an extra blank line between the dynamo: block and setup_script:.
  4. The third hunk shows the four trailing # warmup_prompts: … / # use_chat_template: … / # req_rate: … / # random_range_ratio: … lines at the bottom that no other recipe carries.

Impact / why it doesn't affect functionality: YAML parsers strip comments, so the launcher and srtctl read exactly the same fields as for the other recipes. The container that will actually be used is vllm/vllm-openai:v0.20.1 (correct, and matching .github/configs/nvidia-master.yaml and the other recipes), not the stale v0.18.1. So this is purely cosmetic: no behavior change.

Addressing the refutation: I agree with the refuting verifier that this has zero functional impact. I'm filing this as a nit rather than a normal bug because it is a style/cleanup issue, and I'm raising it at all only because the stale v0.18.1 container reference inside the dead comment could confuse a future maintainer trying to bisect or audit container versions across recipes. The fix is trivial: delete lines 3-10, the duplicate blank line, and the four trailing commented benchmark fields so this file matches the shape of its eight siblings. Feel free to ignore this if you'd rather keep the file as-is; it won't break anything.

Suggested fix: Remove lines 3-10 (the commented model:/dynamo: blocks), collapse lines 20-21 to a single blank line, and remove the trailing # warmup_prompts: 1 / # use_chat_template: false / # req_rate: "inf" / # random_range_ratio: 1.0 lines at the bottom.
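For recipes added later, a small lint could catch this kind of leftover before review. The bash below is a minimal sketch under stated assumptions: recipe_clutter_report is a hypothetical helper, not something in this repo or in srt-slurm, and it treats any comment whose body starts with a key followed by a colon as commented-out configuration, so a prose comment such as "# Note: ..." would also be counted.

```bash
# Hypothetical lint helper (not part of the repo): report leftovers of the
# kind described above in an srt-slurm recipe YAML.
#   commented  = comment lines whose body looks like a YAML key, such as
#                "# container: ..." or "#   install: true"
#   blank_runs = number of places with two or more consecutive blank lines
recipe_clutter_report() {
  local file="$1" commented blank_runs
  # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps
  # that from aborting callers that run under "set -e".
  commented=$(grep -cE '^[[:space:]]*#[[:space:]]*[A-Za-z_][A-Za-z0-9_]*:' "$file" || true)
  # Count each run of blank (or whitespace-only) lines once, on its 2nd line.
  blank_runs=$(awk '
    /^[[:space:]]*$/ { run++; if (run == 2) n++; next }
    { run = 0 }
    END { print n + 0 }
  ' "$file")
  echo "commented=${commented} blank_runs=${blank_runs}"
}
```

Pointed at disagg-gb200-2p1d-dep8.yaml as described above, one would expect non-zero values for both counters; a tidy sibling should report commented=0 blank_runs=0 unless it has other key-like comments.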

Comment on lines +207 to +212
```bash
elif [[ $FRAMEWORK == "dynamo-vllm" && $MODEL_PREFIX == "minimaxm2.5" && $PRECISION == "fp8" ]]; then
    git clone https://github.com/NVIDIA/srt-slurm.git "$SRT_REPO_DIR" || exit 1
    cd "$SRT_REPO_DIR" || exit 1
    git checkout main || exit 1
    mkdir -p recipes/vllm/minimax-m2.5-gb200-fp8 || exit 1
    cp -rT "$GITHUB_WORKSPACE/benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-gb200-fp8" recipes/vllm/minimax-m2.5-gb200-fp8 || exit 1
```
Contributor


🟡 The new minimaxm2.5 path does git checkout main on NVIDIA/srt-slurm.git, while every other branch in this same if/elif chain pins to a stable named ref (e.g. aflowers/vllm-gb200-v0.20.0, sa-submission-q2-2026, cam/sa-submission-q2-2026). Tracking main makes this recipe non-reproducible — any upstream commit to srt-slurm can silently change benchmark behavior at the next CI run. Please pin to a specific branch (e.g. sa-submission-q2-2026 like the other dynamo-vllm paths) or a commit SHA.

Extended reasoning...

What the bug is

In runners/launch_gb200-nv.sh the newly added minimaxm2.5 branch clones https://github.com/NVIDIA/srt-slurm.git and then runs git checkout main:

```bash
elif [[ $FRAMEWORK == "dynamo-vllm" && $MODEL_PREFIX == "minimaxm2.5" && $PRECISION == "fp8" ]]; then
    git clone https://github.com/NVIDIA/srt-slurm.git "$SRT_REPO_DIR" || exit 1
    cd "$SRT_REPO_DIR" || exit 1
    git checkout main || exit 1
    mkdir -p recipes/vllm/minimax-m2.5-gb200-fp8 || exit 1
    cp -rT "$GITHUB_WORKSPACE/benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-gb200-fp8" recipes/vllm/minimax-m2.5-gb200-fp8 || exit 1
```

Why this is inconsistent with the rest of the file

Every other branch of this same if/elif chain pins to a stable named ref:

  • IS_AGENTIC=1 → cam/sa-submission-q2-2026 (--single-branch)
  • dynamo-vllm + dsv4 → aflowers/vllm-gb200-v0.20.0
  • dynamo-sglang + dsv4 → sa-submission-q2-2026
  • generic dynamo-vllm fallback → sa-submission-q2-2026
  • dynamo-trt + kimik2.5 → sa-submission-q2-2026
  • default fallback → cam/sa-submission-q2-2026

The new minimaxm2.5 branch is the only one tracking main.

Why this matters (the impact)

main of NVIDIA/srt-slurm is the active development branch, so any upstream commit can silently change behavior of this benchmark between identical reruns:

  • Recipe layout under recipes/vllm/... could move, breaking the cp -rT overlay (which writes into recipes/vllm/minimax-m2.5-gb200-fp8).
  • srtctl CLI flags or YAML schema could change, breaking srtctl apply -f $CONFIG_PATH --tags ... and the name: rewrite via sed.
  • The container alias (e.g. dynamo-vllm → image) that srtctl resolves from srtslurm.yaml could be renamed.
  • make setup ARCH=aarch64 targets could change.

None of these would surface in PR CI — they would only appear later as a silent regression in published benchmark numbers, with no diff in this repo to explain them. That defeats the entire point of having a perf-changelog entry tied to a specific PR.

Step-by-step proof this can manifest

  1. PR #1648 merges. Today, NVIDIA/srt-slurm@main happens to expose srtctl apply -f <file> --tags ....
  2. Tomorrow, upstream renames the flag to --label and force-merges to main.
  3. The next scheduled CI run for minimaxm2.5-fp8-gb200-dynamo-vllm clones main, picks up the rename, and srtctl apply ... --tags ... exits non-zero. The benchmark fails with no code change on our side.

Alternatively, in a subtler scenario:

  1. Upstream changes the default value of an srtctl knob (timeout, retry, batch policy).
  2. CI still passes, but the published throughput numbers drift, and the recorded perf-changelog entry no longer corresponds to a reproducible configuration.

The dsv4 branch right above explicitly comments that it pins to aflowers/vllm-gb200-v0.20.0 because it expects a specific srtctl + container-alias surface. The minimaxm2.5 branch needs the same treatment.

How to fix

Replace git checkout main with the same branch the other dynamo-vllm paths use (e.g. sa-submission-q2-2026), or pin to a specific SHA if the recipe needs srt-slurm changes that haven't landed in that branch yet. Matching the sibling pattern is the minimal change:

```bash
git checkout sa-submission-q2-2026
```

(or git clone --branch <ref> --single-branch ... to make the pin explicit at clone time, matching the IS_AGENTIC / default branches).
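A repository-level guard could keep future branches of this if/elif chain from tracking main again. The bash below is a hypothetical sketch (find_unpinned_checkouts is not an existing helper in this repo); it only greps for a literal git checkout of main or master, so a clone that silently stays on the default branch without any checkout line would slip past it.

```bash
# Hypothetical CI guard: print "<line>:<text>" for every "git checkout main"
# (or "master") in a launcher script and return 1 if any were found.
# A pinned alternative is: git clone --branch <ref> --single-branch <url> <dir>
find_unpinned_checkouts() {
  local script="$1"
  # The trailing ([[:space:]]|$) avoids matching refs like "maintenance".
  if grep -nE 'git[[:space:]]+checkout[[:space:]]+(main|master)([[:space:]]|$)' "$script"; then
    return 1
  fi
  return 0
}
```

Run against runners/launch_gb200-nv.sh as it stands in this PR, it would be expected to print the minimaxm2.5 branch's checkout line and fail.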

@@ -0,0 +1,69 @@
name: "minimax-m2.5-vllm-disagg-gb200-1p1d-tp4ep"
Contributor


🟡 Nit: the name field here is minimax-m2.5-vllm-disagg-gb200-1p1d-tp4ep, missing the 8k1k- segment that the other two 8k1k recipes in this directory use (...-8k1k-1p1d-tp4 and ...-8k1k-3p2d-dep4). No runtime impact — launch_gb200-nv.sh sed-overwrites this with ${RUNNER_NAME} before submission — but worth aligning with its siblings for consistency.

Extended reasoning...

What's inconsistent

Within benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-gb200-fp8/8k1k/, the three new recipe files use these name: values:

  • disagg-gb200-1p1d-tp4.yaml → minimax-m2.5-vllm-disagg-gb200-8k1k-1p1d-tp4
  • disagg-gb200-1p1d-tp4ep.yaml → minimax-m2.5-vllm-disagg-gb200-1p1d-tp4ep ← missing 8k1k-
  • disagg-gb200-3p2d-dep4.yaml → minimax-m2.5-vllm-disagg-gb200-8k1k-3p2d-dep4

The two other 8k1k files encode the ISL/OSL bucket; this one doesn't.

Why it's a nit, not a bug

runners/launch_gb200-nv.sh runs sed -i "s/^name:.*/name: \"${RUNNER_NAME}\"/" "$CONFIG_PATH" before srtctl apply, so the in-file name is replaced at runtime and never reaches tags, labels, or logs as written. The refutation correctly points out that the 1k1k siblings also omit the bucket from their names (e.g. minimax-m2.5-vllm-disagg-gb200-1p1d-tp4), so there isn't a single repo-wide convention — the inconsistency is purely local to the 8k1k directory.
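To see why the in-file name never survives, the rewrite can be reproduced on a scratch copy of a recipe. The helper below only wraps the sed expression quoted above; its name and the runner name you pass in are illustrative, and sed -i without a suffix argument assumes GNU sed (BSD/macOS sed needs -i '').

```bash
# Illustrative wrapper around the sed expression quoted above; the function
# name is hypothetical. GNU sed is assumed (BSD/macOS sed needs -i '').
overwrite_recipe_name() {
  local config_path="$1" runner_name="$2"
  # ^name: only matches a key in column 0, i.e. the top-level recipe name.
  # A runner name containing "/" or "&" would need escaping for sed.
  sed -i "s/^name:.*/name: \"${runner_name}\"/" "$config_path"
}
```

Because the pattern is anchored with ^, an indented name: key nested under another mapping is left alone; only the top-level recipe name is replaced.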

Step-by-step

  1. Open 8k1k/disagg-gb200-1p1d-tp4.yaml line 1 → name contains -8k1k-.
  2. Open 8k1k/disagg-gb200-1p1d-tp4ep.yaml line 1 → name omits -8k1k-.
  3. Open 8k1k/disagg-gb200-3p2d-dep4.yaml line 1 → name contains -8k1k-.
  4. At runtime, launch_gb200-nv.sh line ~325 unconditionally overwrites name: with $RUNNER_NAME, so no log/tag/output uses the in-file value.

Suggested fix

Change the file's first line to:

```yaml
name: "minimax-m2.5-vllm-disagg-gb200-8k1k-1p1d-tp4ep"
```

This matches its two 8k1k siblings and removes a small source of confusion for anyone grepping recipe files directly. Not a blocker.
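If the bucket-in-name pattern were adopted as a convention, a check like the following could enforce it. This is a hypothetical sketch: name_has_bucket is not an existing helper, it takes the ISL/OSL bucket from the recipe's parent directory (1k1k or 8k1k), it assumes the name: value is double-quoted on a top-level line as in these recipes, and it would currently flag every 1k1k recipe, since those omit the bucket today.

```bash
# Hypothetical consistency check: succeed if the recipe's top-level name
# contains "-<bucket>-", where <bucket> is the recipe's parent directory.
name_has_bucket() {
  local recipe="$1" bucket name
  # e.g. .../minimax-m2.5-gb200-fp8/8k1k/disagg-gb200-1p1d-tp4ep.yaml -> 8k1k
  bucket=$(basename "$(dirname "$recipe")")
  # Value of the first top-level name: line, assuming it is double-quoted.
  name=$(awk -F'"' '/^name:/ { print $2; exit }' "$recipe")
  case "$name" in
    *"-${bucket}-"*) return 0 ;;
    *) return 1 ;;
  esac
}
```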

perf-changelog.yaml (outdated)
Comment on lines +3400 to +3404
```yaml
    - minimaxm2.5-fp8-gb200-dynamo-vllm
  description:
    - "Add MiniMax-M2.5 FP8 GB200 disaggregated multinode vLLM benchmarks via Dynamo"
    - "Add 1k1k/8k1k FP8 recipe set under benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-gb200-fp8/"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX
```
Contributor


🟡 The new entry in perf-changelog.yaml uses a placeholder pr-link 'https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX' that was never filled in. Please replace XXXX with this PR's number (1648) so the changelog entry links back to the actual PR like every other entry in the file.

Extended reasoning...

Bug

Line 3404 of perf-changelog.yaml contains a literal placeholder:

```yaml
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX
```

The XXXX was clearly copied from the changelog entry template and never replaced with the actual PR number. The actual PR number is 1648 (per the PR metadata for this change).

Why this is wrong

Every other entry in perf-changelog.yaml uses a real, resolvable PR URL. A quick survey of the entries immediately preceding this one shows real PR numbers, e.g.:

  • line 3373: .../pull/1626
  • line 3379: .../pull/1630
  • line 3385: .../pull/1631
  • line 3391: .../pull/1588
  • line 3397: .../pull/1627

The new entry breaks that pattern with an unfilled template token.

Impact

This is a documentation/metadata issue rather than a runtime correctness bug: none of the benchmark recipes or shell scripts read pr-link at execution time. However, the changelog is the canonical place users (and tooling) look up the provenance of a perf change, so a dangling pull/XXXX link:

  • 404s if anyone clicks it
  • defeats automated tooling that may follow pr-link URLs (changelog generators, release notes, etc.)
  • silently propagates the template default if not caught now

Step-by-step proof

  1. Open perf-changelog.yaml and scroll to the bottom (lines 3399–3404):
```yaml
- config-keys:
    - minimaxm2.5-fp8-gb200-dynamo-vllm
  description:
    - "Add MiniMax-M2.5 FP8 GB200 disaggregated multinode vLLM benchmarks via Dynamo"
    - "Add 1k1k/8k1k FP8 recipe set under benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-gb200-fp8/"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX
```
  2. Note the literal XXXX placeholder.
  3. The PR title is "[NV] Add MiniMax-M2.5 FP8 GB200 Dynamo vLLM recipes" and its number is 1648, so this entry should read:
```yaml
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1648
```

Fix

Replace XXXX with 1648 on line 3404.
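A pre-merge check could catch unfilled template links like this one. The two bash functions below are a hypothetical sketch (neither exists in the repo): one greps for a pr-link that ends in a run of X characters, the other fills in a given PR number in place using GNU sed -i -E.

```bash
# Hypothetical pre-merge guard: print offending lines and return 1 if the
# changelog still contains a templated ".../pull/XXXX" pr-link.
check_pr_link_placeholders() {
  local changelog="$1"
  if grep -nE 'pr-link:.*/pull/X+([[:space:]]|$)' "$changelog"; then
    return 1
  fi
  return 0
}

# Replace every ".../pull/XXXX" placeholder with the given PR number.
# "#" is used as the sed delimiter because the URL contains "/".
fill_pr_link() {
  local changelog="$1" pr_number="$2"
  sed -i -E "s#(pr-link:.*/pull/)X+[[:space:]]*\$#\1${pr_number}#" "$changelog"
}
```

Note that fill_pr_link rewrites every remaining placeholder in the file, so it is only appropriate when the current PR's entry is the sole unfilled one.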

@github-actions (Contributor) commented Jun 3, 2026

@github-actions (Contributor) commented Jun 3, 2026
