dsr1-fp4-mi355x-sglang: bump ROCm 7.0->7.2 image + add TP4 search-space by JohnQinAMD · Pull Request #1645 · SemiAnalysisAI/InferenceX

JohnQinAMD · 2026-06-02T20:04:27Z

What

Bump the SGLang DSR1-FP4 MI355X benchmark image from lmsysorg/sglang:v0.5.12-rocm700-mi35x (ROCm 7.0) to lmsysorg/sglang-rocm:v0.5.12-rocm720-mi35x-20260519 (ROCm 7.2), and add a tp:4 search-space point. Applies to both dsr1-fp4-mi355x-sglang and dsr1-fp4-mi355x-sglang-mtp.

Why

The SGLang DSR1-FP4 MI355X config is the last one still on ROCm 7.0 (the ATOM config is already on 7.2.3). Local MI355X benches with the matched recipe (range-ratio 0.8 + ROCM_QUICK_REDUCE_QUANTIZATION=INT4, MTP):

8k1k decode TPOT is 9-25% faster at every concurrency vs the current rocm700 CI.
At c64 the rocm720 build leads both axes: TPOT 16.5 vs 18.1 ms and tput 2954 vs 2841 tok/s/gpu.
TP4 at c64/8k1k reaches 4254 tok/s/gpu (> TP8's 2954) — adds a throughput-per-GPU frontier point.

The target image is the same one already validated multi-node in PR #1566, so it's low-risk.

Scope / safety

Recipe dsr1_fp4_mi355x_mtp.sh is unchanged (already has QR-INT4 + range-ratio 0.8 + the MLA/MTP flags; handles --tensor-parallel-size=4).
fp8-mi355x configs are untouched (still rocm700) — only the two fp4-sglang blocks change.
Diff: 5 insertions / 2 deletions in amd-master.yaml + one perf-changelog.yaml trigger entry.

Caveat

The ATOM config keeps 8k1k TP4 commented out (likely an OOM at high conc for that stack). For SGLang, TP4 (4 GPUs, ~90 GB MXFP4 weights) was benched cleanly at c4-64; if CI OOMs at tp4/8k1k, fall back to conc-start: 32 like ATOM.

Note

Low Risk
Benchmark-only YAML and changelog changes; serving recipes are unchanged and the target image was already validated on related MI355X workloads.

Overview
Updates DeepSeek-R1 FP4 MI355X SGLang benchmarks (dsr1-fp4-mi355x-sglang and -mtp) to move off the last ROCm 7.0 image onto lmsysorg/sglang-rocm:v0.5.12-rocm720-mi35x-20260519 (ROCm 7.2), matching the build already exercised on multi-node disagg in PR #1566.

The sweep matrix gains tp: 4 alongside existing tp: 8—8k1k for both configs, plus 1k1k for the MTP variant—to capture a better throughput-per-GPU point. agentic-coding on the non-MTP block is commented out for this validation PR to limit CI cost; perf-changelog.yaml records the image bump, TP4 addition, and local perf notes.

^{Reviewed by Cursor Bugbot for commit 4f52374. Bugbot is set up for automated code reviews on this repo. Configure here.}

Bump image lmsysorg/sglang:v0.5.12-rocm700-mi35x -> lmsysorg/sglang-rocm:v0.5.12-rocm720-mi35x-20260519 (ROCm 7.0->7.2; same image validated multi-node in PR #1566). Add tp:4 search-space. Recipe unchanged (already QR-INT4 + range-ratio 0.8). fp8-mi355x untouched. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

github-actions · 2026-06-02T20:04:40Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

github-actions · 2026-06-02T20:04:40Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

…1645) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

claude · 2026-06-02T20:12:17Z

+    - "Image: lmsysorg/sglang:v0.5.12-rocm700-mi35x -> lmsysorg/sglang-rocm:v0.5.12-rocm720-mi35x-20260519 (ROCm 7.0 -> 7.2; same image already validated multi-node in PR #1566)"
+    - "Add tp:4 to search-space (8k1k for non-mtp; 1k1k+8k1k for mtp) to surface the TP4 throughput-per-GPU frontier"
+    - "Local MI355X bench (matched cfg: range-ratio 0.8 + ROCM_QUICK_REDUCE_QUANTIZATION=INT4): 8k1k decode TPOT 9-25% faster at every conc vs the rocm700 CI; c64 now leads both axes (TPOT 16.5 vs 18.1 ms, tput 2954 vs 2841 tok/s/gpu). TP4 c64 8k1k = 4254 tok/s/gpu (> TP8 2954)"
+  pr-link: PR_LINK_PENDING


🔴 The new perf-changelog entry uses pr-link: PR_LINK_PENDING instead of the canonical placeholder https://github.com/SemiAnalysisAI/InferenceX/pull/XXX documented in AGENTS.md:128 and used by all ~100 existing entries. On a clean merge this non-URL string lands on main as-is; on a merge-conflict path it breaks utils/merge_with_reuse.sh (the script filters PR entries by 'XXX' in link or link.endswith('/pull/{pr}'), neither of which matches PR_LINK_PENDING, so the entry is silently skipped and the tool aborts with "No PR contributions found"). Replace with https://github.com/SemiAnalysisAI/InferenceX/pull/XXX (or the final /pull/1645 URL).

Extended reasoning...

What the bug is

perf-changelog.yaml line 3406 sets pr-link: PR_LINK_PENDING for the new entry. The repository's documented convention (AGENTS.md:128) is a real GitHub URL or the literal placeholder https://github.com/SemiAnalysisAI/InferenceX/pull/XXX, and every one of the ~100 existing entries in the file follows that format. PR_LINK_PENDING is neither, and nothing in the repo rewrites it before merge (a repo-wide grep finds the string only in perf-changelog.yaml itself).

Why this matters in practice

The auto-merge helper utils/merge_with_reuse.sh recognises a PR's own changelog entry via this filter (lines ~140-141):

link = str(entry.get('pr-link') or '') if 'XXX' not in link and not link.endswith(f'/pull/{pr}'): continue

PR_LINK_PENDING contains no XXX and does not end with /pull/1645, so the entry is silently skipped. The subsequent guard (if not contribs: sys.exit("No PR contributions found...")) then aborts the reuse-merge tool whenever a perf-changelog.yaml conflict arises against this PR — the documented recovery path is broken.

Step-by-step proof

Author opens PR dsr1-fp4-mi355x-sglang: bump ROCm 7.0->7.2 image + add TP4 search-space #1645 with pr-link: PR_LINK_PENDING in the new changelog entry.

Another perf-changelog PR lands first, creating a merge conflict on perf-changelog.yaml for dsr1-fp4-mi355x-sglang: bump ROCm 7.0->7.2 image + add TP4 search-space #1645.

A maintainer runs utils/merge_with_reuse.sh to auto-resolve, passing pr=1645.

The script iterates pr_data and computes link = "PR_LINK_PENDING". The check 'XXX' not in link and not link.endswith('/pull/1645') evaluates True and True == True, so continue runs and this PR's contribution is dropped.

After the loop, contribs is empty → sys.exit("No PR contributions found ...") fires; the auto-merge aborts.

Even without conflicts, the literal PR_LINK_PENDING ends up on main, breaking the documented convention and any tooling that parses pr-links as URLs.

How to fix

Replace pr-link: PR_LINK_PENDING with either the canonical placeholder https://github.com/SemiAnalysisAI/InferenceX/pull/XXX (which merge_with_reuse.sh and convention both recognise) or, since the PR number is known, the final https://github.com/SemiAnalysisAI/InferenceX/pull/1645. A one-character / one-line diff.

Fixed — pr-link was updated to the real URL in the second commit (4f52374) immediately after the PR was opened.

JohnQinAMD requested a review from a team June 2, 2026 20:04

JohnQinAMD requested review from 1am9trash, billishyahao, chunfangamd, seungrokj and yctseng0211 as code owners June 2, 2026 20:04

github-project-automation Bot added this to InferenceMAX Board Jun 2, 2026

perf-changelog: fill pr-link for dsr1-fp4-mi355x-sglang rocm720 bump (#…

4f52374

…1645) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

claude Bot reviewed Jun 2, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dsr1-fp4-mi355x-sglang: bump ROCm 7.0->7.2 image + add TP4 search-space#1645

dsr1-fp4-mi355x-sglang: bump ROCm 7.0->7.2 image + add TP4 search-space#1645
JohnQinAMD wants to merge 2 commits into
mainfrom
dsr1-mi355x-sglang-rocm720-bump

JohnQinAMD commented Jun 2, 2026 •

edited by cursor Bot

Loading

Uh oh!

github-actions Bot commented Jun 2, 2026

Uh oh!

github-actions Bot commented Jun 2, 2026

Uh oh!

claude Bot Jun 2, 2026

Uh oh!

JohnQinAMD Jun 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

JohnQinAMD commented Jun 2, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Why

Scope / safety

Caveat

Uh oh!

github-actions Bot commented Jun 2, 2026

Uh oh!

github-actions Bot commented Jun 2, 2026

Uh oh!

claude Bot Jun 2, 2026

Choose a reason for hiding this comment

What the bug is

Why this matters in practice

Step-by-step proof

How to fix

Uh oh!

JohnQinAMD Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

JohnQinAMD commented Jun 2, 2026 •

edited by cursor Bot

Loading