Add FlashMLA flashmla benchmark perf gate #25

Open
ghangz wants to merge 2 commits into MetaX-MACA:main from ghangz:mengz/flashmla-benchmark-perf-gate

@ghangz ghangz commented Jul 1, 2026

Summary

  • Adds tools/benchmark_perf_gate.py, a benchmark perf gate for MetaX-MACA/FlashMLA that compares baseline and current performance metrics and fails when a regression is detected.
  • The change targets MetaX MACA development and validation workflows, with emphasis on earlier diagnostics, reproducible logs, or safer benchmark tooling.
  • Existing default behavior is kept compatible; the new logic is scoped to explicit checks, helper tools, or validation metadata.

Validation

  • Verified on Gitee.AI MetaX GPU resources: FlashMLA_TileLang_20260701, 3/3 PASS; PyTorch-MACA batch also covered FlashMLA tools.
  • Branch validation command: python tools/benchmark_perf_gate.py --self-test x y
  • Pull request text is intentionally ASCII-only to avoid encoding issues on web forms and API clients.

Review notes

  • Source branch: ghangz:mengz/flashmla-benchmark-perf-gate
  • Target branch: MetaX-MACA/FlashMLA:main
  • Maintainers can modify this branch if follow-up adjustments are needed.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new Python script, tools/benchmark_perf_gate.py, designed to compare baseline and current performance metrics and fail if regressions are detected. The feedback suggests improving usability by making the positional arguments optional when running the --self-test flag, and expanding the self_test function to assert regression and missing-case behaviors rather than just the happy path.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread tools/benchmark_perf_gate.py Outdated
Comment on lines +45 to +51
    parser.add_argument("baseline")
    parser.add_argument("current")
    parser.add_argument("--self-test", action="store_true")
    args = parser.parse_args()
    if args.self_test:
        self_test()
        return 0

medium

The positional arguments baseline and current are currently required by default in argparse. This means that running the script with --self-test (e.g., python tools/benchmark_perf_gate.py --self-test) will fail with an error stating that the positional arguments are required, forcing the user to provide dummy arguments (as noted in the PR description: python tools/benchmark_perf_gate.py --self-test x y).

To improve usability, we can make the positional arguments optional (nargs="?") and manually validate their presence only when --self-test is not specified.

Suggested change
-    parser.add_argument("baseline")
-    parser.add_argument("current")
-    parser.add_argument("--self-test", action="store_true")
-    args = parser.parse_args()
-    if args.self_test:
-        self_test()
-        return 0
+    parser.add_argument("baseline", nargs="?", default=None)
+    parser.add_argument("current", nargs="?", default=None)
+    parser.add_argument("--self-test", action="store_true")
+    args = parser.parse_args()
+    if args.self_test:
+        self_test()
+        return 0
+    if not args.baseline or not args.current:
+        parser.error("the following arguments are required: baseline, current")
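
For reference, the argument handling discussed in this thread can be tried in isolation. The block below is a minimal sketch, not the script's actual code: the helper names build_parser and parse_cli are hypothetical, and only the three arguments quoted in the review are assumed to exist.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the three arguments quoted in the review."""
    parser = argparse.ArgumentParser(prog="benchmark_perf_gate.py")
    # nargs="?" makes each positional optional; it is None when omitted.
    parser.add_argument("baseline", nargs="?", default=None)
    parser.add_argument("current", nargs="?", default=None)
    parser.add_argument("--self-test", action="store_true")
    return parser


def parse_cli(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse argv and require both positionals unless --self-test is given."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.self_test and (args.baseline is None or args.current is None):
        # parser.error prints the usage message and exits with status 2.
        parser.error("the following arguments are required: baseline, current")
    return args
```

With this shape, parse_cli(["--self-test"]) succeeds without dummy positionals, while parse_cli(["only_one.json"]) (a hypothetical file name) still exits with the usual argparse usage error.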

Comment thread tools/benchmark_perf_gate.py Outdated
Comment on lines +37 to +40
def self_test() -> None:
    data = compare({"case": 100.0}, {"case": 99.0})
    assert data["ok"]
    print(json.dumps({"ok": True, "rows": len(data["rows"])}, ensure_ascii=False))

medium

The self_test function currently only verifies the happy path (where performance is within the tolerance limit). It does not verify that the performance gate actually detects regressions or missing cases, which are the primary failure modes the script is designed to catch.

Adding assertions for regression and missing cases will make the self-test much more robust and reliable.

Suggested change
-def self_test() -> None:
-    data = compare({"case": 100.0}, {"case": 99.0})
-    assert data["ok"]
-    print(json.dumps({"ok": True, "rows": len(data["rows"])}, ensure_ascii=False))
+def self_test() -> None:
+    # Test within tolerance
+    data_ok = compare({"case": 100.0}, {"case": 99.0})
+    assert data_ok["ok"]
+    # Test regression detection
+    data_reg = compare({"case": 100.0}, {"case": 90.0})
+    assert not data_reg["ok"]
+    assert data_reg["rows"][0]["status"] == "regression"
+    # Test missing current case detection
+    data_missing = compare({"case": 100.0}, {})
+    assert not data_missing["ok"]
+    assert data_missing["rows"][0]["status"] == "missing-current"
+    print(json.dumps({"ok": True, "rows": len(data_ok["rows"])}, ensure_ascii=False))
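
The body of compare is not shown in this thread, so the following is only a hypothetical sketch of a function that would satisfy the three checks in the self-test. It assumes that higher metric values are better and that a 5% drop is tolerated. Both are assumptions, as are the "ok" status string and the "case", "baseline" and "current" row keys; only the "regression" and "missing-current" statuses and the "ok" and "rows" result keys appear in the review.

```python
from typing import Any, Dict, List

# Assumption: higher is better, and a drop of up to 5% is tolerated.
TOLERANCE = 0.05


def compare(baseline: Dict[str, float], current: Dict[str, float],
            tolerance: float = TOLERANCE) -> Dict[str, Any]:
    """Compare current metrics against baseline, one row per baseline case."""
    rows: List[Dict[str, Any]] = []
    for name, base in sorted(baseline.items()):
        if name not in current:
            # A baseline case with no current measurement fails the gate.
            rows.append({"case": name, "status": "missing-current"})
            continue
        cur = current[name]
        # Regression when the current value falls below the tolerated floor.
        status = "ok" if cur >= base * (1.0 - tolerance) else "regression"
        rows.append({"case": name, "baseline": base,
                     "current": cur, "status": status})
    # The gate passes only if every row is within tolerance.
    return {"ok": all(row["status"] == "ok" for row in rows), "rows": rows}
```

Under these assumptions, 99.0 against a baseline of 100.0 passes, 90.0 is flagged as a regression, and an empty current mapping yields a missing-current row.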

- Add FlashMLA benchmark performance gate
- Strengthen benchmark perf gate self test