
[NV] Add MiniMax-M2.5 FP8 GB300 Dynamo vLLM recipes #1647

Open
jasonlizhengjian wants to merge 7 commits into main from nv/jasonli/minimaxm2.5-fp8-gb300-dynamo-vllm

Collaborator

@jasonlizhengjian jasonlizhengjian commented Jun 2, 2026

Summary

Add GB300 MiniMax-M2.5 FP8 Dynamo vLLM recipes.


Note

Low Risk
Benchmark and CI runner configuration only; no production serving, auth, or application logic changes.

Overview
Adds MiniMax-M2.5 FP8 disaggregated Dynamo + vLLM coverage on GB300 (minimaxm2.5-fp8-gb300-dynamo-vllm in nvidia-master.yaml), with fixed 1k/1k and 8k/1k scenarios that sweep prefill/decode worker counts, TP/EP, and concurrency via CONFIG_FILE pointers into new Slurm recipes.

Introduces a full recipe tree under benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-fp8/ (GB300 disagg layouts: 1p1d–5p2d, TP/EP/DP variants, Nixl KV transfer, fp8 cache, v0.20.1 + Dynamo wheel). Documents the config in perf-changelog.yaml.
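The layout names in the recipe tree can be decoded mechanically. The following is a minimal sketch, assuming that the `<N>p<M>d` token in a file stem means N prefill workers and M decode workers, and that whatever follows it is a free-form variant tag; `parse_layout` is a hypothetical helper, not part of srt-slurm or this repository.

```python
import re

# Matches the "<N>p<M>d" token, e.g. "2p1d" in "disagg-gb300-2p1d-dep8".
# The lookarounds keep it from matching inside a longer word or number.
_LAYOUT = re.compile(r"(?<![0-9a-z])(\d+)p(\d+)d(?![0-9a-z])")


def parse_layout(recipe_name: str) -> dict:
    """Split a recipe file stem into worker counts and the trailing variant tag.

    Assumption: N = prefill workers, M = decode workers.
    """
    match = _LAYOUT.search(recipe_name)
    if match is None:
        raise ValueError(f"no <N>p<M>d token in {recipe_name!r}")
    return {
        "prefill_workers": int(match.group(1)),
        "decode_workers": int(match.group(2)),
        # Everything after the layout token, e.g. "dep8" or "dep4-hi-conc".
        "variant": recipe_name[match.end():].lstrip("-"),
    }


layout = parse_layout("disagg-gb300-2p1d-dep8")
print(layout)
```

The variant tag is returned as an opaque string because the page does not define how tags such as `dep8` or `tp4ep` map to TP/EP/DP settings.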

Wires the gb300-nv launcher: adds the minimaxm2.5 + fp8 model path/prefix, clones srt-slurm main and copies these recipes in for dynamo-vllm, and makes eval artifact copy failures non-fatal.

Reviewed by Cursor Bugbot for commit f72628c. Bugbot is set up for automated code reviews on this repo.

Contributor

github-actions Bot commented Jun 2, 2026

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Comment on lines +3 to +9
# model:
# path: "minimax-m2.5-fp8"
# container: "v0.18.1"
# precision: "fp8"

# dynamo:
# version: 1.0.1
Contributor

🟡 Several new recipe files carry stale commented-out model: (referencing the older v0.18.1 container) and dynamo: version: 1.0.1 blocks immediately above the active config, which already specifies vllm/vllm-openai:v0.20.1 + wheel: 1.2.0.dev20260526. These dead blocks have no runtime effect (YAML ignores comments) but are copy-paste leftovers that contradict the live values and hurt readability. Same hygiene issue appears in 1k1k/disagg-gb300-2p1d-dep8.yaml, 1k1k/disagg-gb300-2p2d-dep4.yaml, 1k1k/disagg-gb300-2p2d-dep4-hi-conc.yaml, 8k1k/disagg-gb300-3p1d-dep4.yaml, and 8k1k/disagg-gb300-3p1d-dep4-hi-conc.yaml — please drop the commented blocks (and the trailing blank lines) before merge.

Extended reasoning...

What the bug is. Five of the new recipe files in this PR begin with a commented-out model: block pinning container: "v0.18.1" and a commented-out dynamo: block pinning version: 1.0.1, sitting directly above the active model:/dynamo: blocks that pin the current versions (vllm/vllm-openai:v0.20.1 and wheel: 1.2.0.dev20260526). The commented values are clearly stale: they are an earlier iteration of the same fields that the live blocks now supersede.

Where it appears. Verified in the PR diff:
- benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-fp8/1k1k/disagg-gb300-2p1d-dep8.yaml (lines 3–10)
- benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-fp8/1k1k/disagg-gb300-2p2d-dep4.yaml
- benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-fp8/1k1k/disagg-gb300-2p2d-dep4-hi-conc.yaml
- benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-fp8/8k1k/disagg-gb300-3p1d-dep4.yaml
- benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-fp8/8k1k/disagg-gb300-3p1d-dep4-hi-conc.yaml

The sibling recipes in the same directories (e.g. 1k1k/disagg-gb300-1p1d-tp4.yaml, 1k1k/disagg-gb300-2p3d-dep2.yaml, the various 8k1k *-tp4ep.yaml files, etc.) do not contain the dead blocks, which suggests these five files were copy-pasted from an older template before the cleanup was applied uniformly.

Runtime impact. None. YAML parsers strip # lines, so srtctl will use the live model: and dynamo: mappings as intended. This is purely a hygiene/readability issue: when a reader scans the recipe to confirm which container/wheel is in use, they see two conflicting sets of values and have to mentally diff them.

Why this is worth fixing. Stale duplicate blocks rot: the next time someone bumps the container or wheel version, they may forget that the commented block exists and leave it referencing a version that no longer matches any reality. Cleaner recipes also make grep-based audits ("which recipes pin v0.18.1?") less noisy.

Step-by-step proof. Take 1k1k/disagg-gb300-2p1d-dep8.yaml lines 1–22:

```yaml
name: "minimax-m2.5-vllm-disagg-gb300-2p1d-dep8"

# model:                         <- stale block starts
#   path: "minimax-m2.5-fp8"
#   container: "v0.18.1"
#   precision: "fp8"

# dynamo:
#   version: 1.0.1
#   install: true                <- stale block ends

model:                           # <- live block (the one that takes effect)
  path: "minimax-m2.5-fp8"
  container: "vllm/vllm-openai:v0.20.1"
  precision: "fp8"

dynamo:
  install: true
  wheel: "1.2.0.dev20260526"
```

The two blocks specify different containers (v0.18.1 vs vllm/vllm-openai:v0.20.1) and different dynamo versioning schemes (version: 1.0.1 vs wheel: 1.2.0.dev20260526). Only the live block reaches srtctl, but the contradiction is visible to every human reader.

How to fix. Delete the leading commented model: and dynamo: blocks (and the extra blank lines they leave behind) from the five affected files so each one starts directly with the active model: mapping, matching the structure of the sibling recipes in the same directories that don't have this issue. Severity: nit. There is no runtime effect, but it is trivial to clean up before merge.
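The grep-style audit described in this comment could be scripted. Below is a minimal sketch that flags commented-out `model:` and `dynamo:` headers in recipe text; `stale_commented_keys` is a hypothetical helper, and the sample text is abbreviated from the snippet quoted in the comment rather than copied from the actual file.

```python
import re

# A commented-out top-level mapping header such as "# model:" or "# dynamo:".
_COMMENTED_KEY = re.compile(r"^#\s*(model|dynamo):\s*$")


def stale_commented_keys(yaml_text: str) -> list:
    """Return (line_number, key) for each commented-out model:/dynamo: header."""
    hits = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        match = _COMMENTED_KEY.match(line.strip())
        if match:
            hits.append((number, match.group(1)))
    return hits


sample = """\
name: "minimax-m2.5-vllm-disagg-gb300-2p1d-dep8"

# model:
#   path: "minimax-m2.5-fp8"
#   container: "v0.18.1"

# dynamo:
#   version: 1.0.1

model:
  path: "minimax-m2.5-fp8"
  container: "vllm/vllm-openai:v0.20.1"
"""

print(stale_commented_keys(sample))
```

Running this over every file in the recipe directory would list the affected recipes without relying on a manual diff; live `model:` and `dynamo:` keys are ignored because they do not start with `#`.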

Comment on lines +143 to +148
elif [[ $FRAMEWORK == "dynamo-vllm" && $MODEL_PREFIX == "minimaxm2.5" && $PRECISION == "fp8" ]]; then
git clone https://github.com/NVIDIA/srt-slurm.git "$SRT_REPO_DIR"
cd "$SRT_REPO_DIR"
git checkout main
mkdir -p recipes/vllm/minimax-m2.5-fp8
cp -rT "$GITHUB_WORKSPACE/benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-fp8" recipes/vllm/minimax-m2.5-fp8
Contributor

🟡 The new minimaxm2.5 branch in runners/launch_gb300-nv.sh runs git checkout main on github.com/NVIDIA/srt-slurm.git, which is the only mutable-HEAD reference in this file — every other clone target pins to a specific commit (6e34b8b for agentic) or a stable branch (aflowers/gb200-dsv4-recipes for dsv4, sa-submission-q2-2026 for glm5 and the default fallback). Future upstream schema/CLI changes in srt-slurm will silently break these MiniMax recipes without any code change here. Pin to a commit or to sa-submission-q2-2026 (matching the default branch) to keep these recipes reproducible.

Extended reasoning...

What's wrong

In runners/launch_gb300-nv.sh:140-148, the newly added MiniMax-M2.5 branch clones github.com/NVIDIA/srt-slurm.git and then does git checkout main:

elif [[ $FRAMEWORK == "dynamo-vllm" && $MODEL_PREFIX == "minimaxm2.5" && $PRECISION == "fp8" ]]; then
    git clone https://github.com/NVIDIA/srt-slurm.git "$SRT_REPO_DIR"
    cd "$SRT_REPO_DIR"
    git checkout main
    mkdir -p recipes/vllm/minimax-m2.5-fp8
    cp -rT "$GITHUB_WORKSPACE/benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-fp8" recipes/vllm/minimax-m2.5-fp8

This is the only git checkout main in the entire file — every other branch of the if/elif/else chain pins to a more stable reference.

Comparison to existing convention

Looking at the surrounding branches in the same if/elif chain:

| Branch | Ref used | Stability |
| --- | --- | --- |
| IS_AGENTIC (cquil11/srt-slurm-nv) | 6e34b8b83229634d732e41a4e2d6595f46ef60b5 | Pinned commit |
| dynamo-vllm + dsv4 | aflowers/gb200-dsv4-recipes | Stable feature branch |
| dynamo-sglang + glm5 | sa-submission-q2-2026 | Stable submission branch |
| dynamo-vllm + minimaxm2.5 | main | Mutable upstream HEAD |
| Default fallback | sa-submission-q2-2026 | Stable submission branch |

The main branch of NVIDIA/srt-slurm is an external upstream repo not controlled by this project. Any commit that lands there — schema changes to DynamoConfig, renames to sbatch_directives/srun_options, CLI changes in srtctl apply, etc. — will be picked up the next time this script runs, with no PR ever landing in InferenceX.

Why this matters

The broader file is deliberately paranoid about pinning. The comment block at lines ~99-121 explicitly enumerates which upstream schema features the agentic recipes depend on (DynamoConfig.wheel, sbatch_directives, srun_options, BenchmarkType.CUSTOM + benchmark.command + benchmark.env) — exactly because these are the things that drift upstream. The new MiniMax recipes themselves continue this pattern: every one of the 16 new YAMLs pins dynamo.wheel: "1.2.0.dev20260526". Leaving the srt-slurm ref unpinned defeats that paranoia for this one model.

Step-by-step proof of the failure mode

  1. Today: Author opens PR, runs the workflow. git checkout main resolves to NVIDIA/srt-slurm HEAD at commit X. Recipes work; PR merges.
  2. Two weeks later: An upstream PR renames safetensors-load-strategy to safetensors-load-mode in srt-slurm's vLLM recipe schema, or removes the top-level dynamo.wheel field, or changes how kv-transfer-config is parsed. (These are all examples of the schema fields the new MiniMax recipes use.) main advances to commit Y.
  3. A regular scheduled run of minimaxm2.5-fp8-gb300-dynamo-vllm fires. The script clones, git checkout main lands on Y, copies the recipe YAMLs in, and srtctl apply fails with a parse error / unknown-field error — or worse, silently changes semantics (e.g. ignores the new field name and benchmarks with different settings).
  4. Nothing changed in InferenceX. There's no PR to bisect, no commit to revert. The on-call has to dig through NVIDIA/srt-slurm history to find the upstream change.

Contrast: with a pinned commit (e.g. git checkout <sha>) or a stable branch (sa-submission-q2-2026), the same upstream change is invisible until someone explicitly bumps the ref in this file — at which point the diff is reviewable and the breakage is tied to a PR.

How to fix

Replace git checkout main with one of:

  • git checkout sa-submission-q2-2026 — matches the default fallback and the glm5 branch; simplest fix.
  • git checkout <full SHA> — matches the agentic-branch pattern; strongest reproducibility guarantee. Use whatever commit on main the author actually tested against.

Either is a one-line change.
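A check along these lines could catch unpinned refs before review. This is a minimal sketch, assuming each ref appears on its own `git checkout <ref>` line as in the quoted snippet; `checkout_refs` and the choice of which branch names count as mutable are illustrative assumptions, not existing tooling in this repo.

```python
import re

_CHECKOUT = re.compile(r"^\s*git\s+checkout\s+(\S+)")
_FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def checkout_refs(script_text: str, mutable=("main", "master")) -> list:
    """Classify every `git checkout <ref>` line in a shell script."""
    results = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        match = _CHECKOUT.match(line)
        if not match:
            continue
        ref = match.group(1)
        if _FULL_SHA.match(ref):
            kind = "pinned-commit"
        elif ref in mutable:
            kind = "mutable-head"
        else:
            kind = "branch"
        results.append((number, ref, kind))
    return results


script = """\
git checkout 6e34b8b83229634d732e41a4e2d6595f46ef60b5
git checkout sa-submission-q2-2026
git checkout main
"""

for entry in checkout_refs(script):
    print(entry)
```

Only a full 40-character SHA is treated as pinned here. A named branch such as sa-submission-q2-2026 can still move, so it is reported as `branch` rather than as reproducible.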

Comment thread perf-changelog.yaml Outdated
description:
- "Add MiniMax-M2.5 FP8 GB300 disaggregated multinode vLLM benchmarks via Dynamo"
- "Add 1k1k/8k1k FP8 recipe set under benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-fp8/"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX
Contributor

🟡 The new entry in perf-changelog.yaml (line 3404) uses a placeholder pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX instead of this PR's actual number (#1647). Please update it to https://github.com/SemiAnalysisAI/InferenceX/pull/1647 so the changelog doesn't link to a non-existent PR after merge.
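A small lint could flag placeholder links like this one. The sketch below assumes every `pr-link:` value sits on a single line and should end in a numeric pull-request ID; `bad_pr_links` is a hypothetical helper, and the URL pattern is inferred from the links quoted in this thread rather than from a documented schema.

```python
import re

_PR_LINK = re.compile(r"^\s*pr-link:\s*(\S+)\s*$")
# Assumed convention: https://github.com/<owner>/<repo>/pull/<number>
_VALID = re.compile(r"^https://github\.com/[\w.-]+/[\w.-]+/pull/\d+$")


def bad_pr_links(changelog_text: str) -> list:
    """Return (line_number, url) for pr-link values without a numeric PR ID."""
    bad = []
    for number, line in enumerate(changelog_text.splitlines(), start=1):
        match = _PR_LINK.match(line)
        if match and not _VALID.match(match.group(1)):
            bad.append((number, match.group(1)))
    return bad


changelog = """\
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1627
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX
"""

print(bad_pr_links(changelog))
```

This only checks the shape of the URL. It does not confirm that the PR exists, which would need a network call.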

Extended reasoning...

What the bug is. The new changelog entry added in this PR (perf-changelog.yaml lines 3398-3404) contains a literal placeholder string XXXX in its pr-link field instead of a real PR number:

```yaml
- config-keys:
    - minimaxm2.5-fp8-gb300-dynamo-vllm
  description:
    - "Add MiniMax-M2.5 FP8 GB300 disaggregated multinode vLLM benchmarks via Dynamo"
    - "Add 1k1k/8k1k FP8 recipe set under benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5-fp8/"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX
```

Convention in the file. Every other entry in perf-changelog.yaml references a real, resolvable PR number. Adjacent entries use /pull/1627, /pull/1631, /pull/1588, /pull/1630, etc. The XXXX is clearly a template placeholder that the author forgot to fill in before pushing the final commit.

Step-by-step proof.
1. Open this PR's metadata: the PR number is 1647.
2. Open perf-changelog.yaml at line 3404 in this diff: the value is the literal string https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX.
3. After merge, anyone clicking that link from the changelog will navigate to github.com/SemiAnalysisAI/InferenceX/pull/XXXX, which is not a valid PR number and will 404.
4. Compare to the entry immediately above (line 3397): pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1627, a real PR. The convention is unambiguous.

Why existing code doesn't prevent it. perf-changelog.yaml is plain documentation YAML with no schema validation that checks whether the URL resolves; the CI doesn't fetch each pr-link to confirm the PR exists. The placeholder will silently pass any structural YAML lint.

Impact. Documentation-only. There is no runtime effect: the file is metadata describing the new config key minimaxm2.5-fp8-gb300-dynamo-vllm. However, the changelog is the canonical record of when each config was added and which PR introduced it; a permanently broken link makes future archaeology (e.g. "why was this benchmark added, what was the context") harder than it needs to be.

Fix. Replace XXXX with 1647 on perf-changelog.yaml:3404:

```yaml
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1647
```

Contributor

github-actions Bot commented Jun 2, 2026

1 similar comment
Contributor

github-actions Bot commented Jun 3, 2026

prefill_workers: 1
decode_workers: 1
gpus_per_prefill: 1
gpus_per_decode: 4

Missing GB300 slurm limits

Medium Severity

New GB300 MiniMax Dynamo vLLM recipes jump straight from setup_script to resources and never set slurm.time_limit or health_check, unlike existing GB300 vLLM disagg recipes on the same runner. Jobs therefore keep the launcher’s four-hour default while engine startup allows up to an hour, so long multi-concurrency sweeps can be killed before benchmarks finish.
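The omission described above could be checked on a recipe once it has been parsed into a dictionary. This is a minimal sketch; the key paths come from the wording of this comment, and their exact nesting (in particular, `health_check` at the top level) is an assumption about the schema. `missing_paths` and the sample recipe values are hypothetical.

```python
def missing_paths(recipe: dict, required: tuple) -> list:
    """Return the dotted key paths from `required` that are absent in `recipe`."""
    missing = []
    for dotted in required:
        node = recipe
        for part in dotted.split("."):
            if not isinstance(node, dict) or part not in node:
                missing.append(dotted)
                break
            node = node[part]
    return missing


# Assumed locations of the two settings named in the review comment.
REQUIRED = ("slurm.time_limit", "health_check")

# Hypothetical recipe that goes straight from setup_script to resources.
recipe = {
    "name": "example-recipe",
    "setup_script": "setup.sh",
    "resources": {"prefill_workers": 1, "decode_workers": 1},
}

print(missing_paths(recipe, REQUIRED))
```

The function reports key paths only. It says nothing about whether a time limit that is present is long enough for the sweep.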

Reviewed by Cursor Bugbot for commit cc720f0.

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit 8f59225.

safetensors-load-strategy: "prefetch"
trust-remote-code: true
no-enable-prefix-caching: true
stream-interval: 32

Missing max-model-len on 8k1k

High Severity

New GB300 8k1k disagg recipes set benchmark.isl to 8192 and osl to 1024 but omit max-model-len on prefill and decode in vllm_config. Matching GB200 MiniMax 8k1k recipes set max-model-len: 9280 on both sides so total sequence length fits disagg KV transfer; without that cap, vLLM may reject 8192-token inputs or reserve KV for the model’s full context and fail at runtime.
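The arithmetic behind the suggested cap is short: an 8192-token input plus a 1024-token output needs 8192 + 1024 = 9216 tokens of context, so the GB200 value of 9280 leaves 64 tokens of headroom. Below is a minimal sketch of that check, assuming `vllm_config` parses to a dictionary with `prefill` and `decode` sub-mappings as the comment implies; `check_max_model_len` is a hypothetical helper.

```python
def check_max_model_len(vllm_config: dict, isl: int, osl: int) -> list:
    """Report roles whose max-model-len is missing or below isl + osl."""
    needed = isl + osl
    problems = []
    for role in ("prefill", "decode"):
        limit = vllm_config.get(role, {}).get("max-model-len")
        if limit is None:
            problems.append(f"{role}: max-model-len is not set (need >= {needed})")
        elif limit < needed:
            problems.append(f"{role}: max-model-len {limit} < {needed}")
    return problems


# Shaped like the GB300 8k1k excerpt above: no max-model-len on either side.
gb300 = {
    "prefill": {"trust-remote-code": True, "stream-interval": 32},
    "decode": {"trust-remote-code": True, "stream-interval": 32},
}
# Shaped like the GB200 recipes described in the comment.
gb200 = {"prefill": {"max-model-len": 9280}, "decode": {"max-model-len": 9280}}

print(check_max_model_len(gb300, isl=8192, osl=1024))
print(check_max_model_len(gb200, isl=8192, osl=1024))
```

The check treats isl + osl as the minimum. Any extra margin, such as the 64 tokens in the GB200 value, is a choice for the recipe author.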

Reviewed by Cursor Bugbot for commit 8f59225.
