Skip to content

refactor: canonicalize launcher aliases and extract MAD_MULTI_NODE_RUNNER resolver#132

Open
coketaste wants to merge 1 commit into
developfrom
coketaste/launcher-update
Open

refactor: canonicalize launcher aliases and extract MAD_MULTI_NODE_RUNNER resolver#132
coketaste wants to merge 1 commit into
developfrom
coketaste/launcher-update

Conversation

@coketaste
Copy link
Copy Markdown
Collaborator

Summary

This PR cleans up launcher name handling and the MAD_MULTI_NODE_RUNNER generation logic for Docker local deployment, building on #126.

Launcher alias canonicalization

  • Adds canonicalize_distributed_launcher() in deployment/common.py with a _LAUNCHER_ALIASES dict (sglang_disagg → sglang-disagg). New aliases go here; dispatch sites stay clean.
  • Replaces inline launcher_type == "sglang-disagg" or launcher_type == "sglang_disagg" checks in k8s_template_context.py and slurm.py with a single call to the helper.

MAD_MULTI_NODE_RUNNER resolver extraction (Docker local)

  • Extracts the inline launcher resolution block in container_runner.py into _resolve_local_multi_node_runner_env(), paralleling the named methods in the SLURM and K8s paths.
  • Bug fix: sglang_disagg (underscore form) previously fell through to torchrun because the valid-launcher list only contained the hyphen form. Canonicalization now routes it correctly.
  • Behavior change: self-managing launchers (vllm, sglang, sglang-disagg, primus) now set MAD_MULTI_NODE_RUNNER="" instead of leaving it unset, so downstream scripts running under set -u
    don't fail on an undefined variable.

Tests

  • TestCanonicalizeDistributedLauncher — passthrough, alias, empty/None, unknown values
  • TestGenerateLocalLauncherCommand — torchrun family, deepspeed, self-managed → empty, unknown → torchrun fallback
  • TestResolveLocalMultiNodeRunnerEnv — full resolver: additional_context, model_info fallback, MAD_RUNTIME_NGPUS override, user-provided value preserved, unknown launcher warning,
    self-managed → "", sglang_disagg alias path

97 unit tests pass.

…NNER resolver

- Add canonicalize_distributed_launcher() to common.py with sglang_disagg → sglang-disagg alias
- Replace inline duplicate checks in k8s_template_context.py and slurm.py with the helper
- Extract _resolve_local_multi_node_runner_env() method in container_runner.py; self-managed
  launchers (vllm/sglang/sglang-disagg/primus) now always set MAD_MULTI_NODE_RUNNER="" so
  downstream scripts under set -u don't fail
- Add test coverage for new helper and extracted method, including sglang_disagg alias path

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
@coketaste coketaste self-assigned this May 29, 2026
Copilot AI review requested due to automatic review settings May 29, 2026 12:29
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors launcher alias handling and Docker-local MAD_MULTI_NODE_RUNNER resolution so distributed launcher names are canonicalized consistently and local runs get a defined runner environment variable.

Changes:

  • Adds canonicalize_distributed_launcher() with an alias map for sglang_disaggsglang-disagg.
  • Extracts Docker-local multi-node runner env resolution into _resolve_local_multi_node_runner_env().
  • Adds unit coverage for launcher canonicalization and local runner command/env generation.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/madengine/deployment/common.py Adds shared launcher alias canonicalization helper.
src/madengine/deployment/slurm.py Uses canonicalization in the SGLang disaggregated launcher dispatch.
src/madengine/deployment/k8s_template_context.py Uses canonicalization in Kubernetes SGLang disaggregated launcher dispatch.
src/madengine/execution/container_runner.py Extracts and updates Docker-local MAD_MULTI_NODE_RUNNER resolution.
tests/unit/test_deployment.py Adds tests for launcher canonicalization.
tests/unit/test_container_runner.py Adds tests for local launcher command and env resolution behavior.
CLAUDE.md Adds behavioral guidance documentation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

elif launcher_type == "sglang":
return self._generate_sglang_command(nnodes, nproc_per_node, master_port)
elif launcher_type == "sglang-disagg" or launcher_type == "sglang_disagg":
elif canonicalize_distributed_launcher(launcher_type) == "sglang-disagg":
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants