[WIP] agentX v0.4 by cquil11 · Pull Request #1640 · SemiAnalysisAI/InferenceX

cquil11 · 2026-06-02T17:52:24Z

No description provided.

resolve_trace_source() now picks a model-prefix-aware default:

- MODEL_PREFIX == dsv4 -> semianalysis_cc_traces_weka_with_subagents (052726, the v5 baseline, unchanged for continuity with prior DSv4 published runs)
- everything else -> semianalysis_cc_traces_weka_with_subagents_060226 (060226, newer v6 corpus with fresher CC recording windows)

WEKA_LOADER_OVERRIDE still wins. Allowed values widened from the two 052726 loaders to all four:

- semianalysis_cc_traces_weka_with_subagents (052726)
- semianalysis_cc_traces_weka_with_subagents_256k (052726-256k)
- semianalysis_cc_traces_weka_with_subagents_060226 (060226)
- semianalysis_cc_traces_weka_with_subagents_060226_256k (060226-256k)

Bumps utils/aiperf submodule to de3ad1c1, which registers the two 060226 plugin entries those new loader names resolve through.

The pre-cache log line now also includes MODEL_PREFIX so it's obvious in CI which default fired.

Signed-off-by: Cam Quilici <cameron@semianalysis.com>
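The selection order described in this commit can be sketched as below. This is an illustration only, not the code in benchmark_lib.sh: the function name `resolve_trace_source_sketch`, the error message, and the log format are assumptions. Only the loader names, `MODEL_PREFIX`, and `WEKA_LOADER_OVERRIDE` come from the commit message.

```bash
# Hypothetical sketch of the default-selection logic; the real
# resolve_trace_source() in benchmark_lib.sh may be structured differently.
resolve_trace_source_sketch() {
  local loader

  if [ -n "${WEKA_LOADER_OVERRIDE:-}" ]; then
    # An explicit override always wins over the model-prefix default.
    loader="$WEKA_LOADER_OVERRIDE"
  elif [ "${MODEL_PREFIX:-}" = "dsv4" ]; then
    # DSv4 stays on the 052726 corpus for continuity with published runs.
    loader="semianalysis_cc_traces_weka_with_subagents"
  else
    # Everything else defaults to the newer 060226 corpus.
    loader="semianalysis_cc_traces_weka_with_subagents_060226"
  fi

  # Reject anything outside the four allowed loader names.
  case "$loader" in
    semianalysis_cc_traces_weka_with_subagents) ;;
    semianalysis_cc_traces_weka_with_subagents_256k) ;;
    semianalysis_cc_traces_weka_with_subagents_060226) ;;
    semianalysis_cc_traces_weka_with_subagents_060226_256k) ;;
    *)
      echo "unsupported trace loader: $loader" >&2
      return 1
      ;;
  esac

  # Assumed log format: include MODEL_PREFIX so CI shows which default fired.
  echo "trace loader: $loader (MODEL_PREFIX=${MODEL_PREFIX:-unset})" >&2
  echo "$loader"
}
```

Under these assumptions, `MODEL_PREFIX=dsv4` with no override yields the 052726 loader, while setting `WEKA_LOADER_OVERRIDE` to any of the four allowed names takes precedence regardless of prefix.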

github-actions · 2026-06-02T17:52:41Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you!

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

…dsr1-trt entry

Reorganizes both master YAMLs so all pure-agentic (agentic-coding-only) recipes sit at the bottom of the file behind an "# Agentic configs" divider, separated from fixed-seq-len / synthetic / prefix-share entries above. No functional change to any non-agentic recipe.

nvidia-master.yaml: splits dsr1-fp4-b200-dynamo-trt, which previously mixed fixed-seq-len + agentic-coding in one entry, into the original entry (fixed-seq-len only) plus a new sibling dsr1-fp4-b200-dynamo-trt-agentic carrying the agentic-coding scenario. 22 pure-agentic entries moved.

amd-master.yaml: no split needed (no combined entries); 9 pure-agentic entries moved to the end.

Verified via deep YAML parse: nvidia adds 1 key (the split sibling) and modifies the source key's scenarios from [agentic-coding, fixed-seq-len] to [fixed-seq-len]; amd has 0 keys added/removed/modified. All other entries are byte-equal after round-trip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Bumps every non-comment `image:` line in both master configs to the unsuffixed v0.22.0 tag:

- vllm/vllm-openai:* -> vllm/vllm-openai:v0.22.0
- vllm/vllm-openai-rocm:* -> vllm/vllm-openai-rocm:v0.22.0

Covers all prior variants: v0.17–v0.21 numbered releases, the -cu130 / -ubuntu2404 / deepseekv4-cu129 build-variant tags, and the nightly-<sha> ROCm pins (which were holding DSv4 ROCm support that has since landed in the tagged release).

Comment-line tag references in the agentic divergence change-log blocks are intentionally untouched so their "X -> Y" history reads correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
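One way such a bump could be scripted is a single anchored `sed` substitution. This is a sketch, not necessarily how the commit was produced. The function name `bump_vllm_images` is hypothetical, and it assumes each image reference appears unquoted on its own `image:` line with no registry prefix.

```bash
# Hypothetical helper: rewrite the tag on every non-comment `image:` line that
# references vllm/vllm-openai or vllm/vllm-openai-rocm.
# Usage: bump_vllm_images <new-tag> < input.yaml > output.yaml
bump_vllm_images() {
  local tag="$1"
  # The pattern is anchored at start-of-line and allows only whitespace before
  # `image:`, so comment lines (which start with `#`) never match and keep
  # their historical "X -> Y" tag references. Other images are left alone.
  sed -E "s#^([[:space:]]*image:[[:space:]]*vllm/vllm-openai(-rocm)?):[^[:space:]]+#\\1:${tag}#"
}
```

For example, `bump_vllm_images v0.22.0 < nvidia-master.yaml` would print the rewritten file to stdout. A tag containing `#` or `&` would need escaping before being passed in.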

Removes ~240 lines of slop comments that no longer earn their keep:

- "Diverged from X (agentic-coding sibling)..." rationale blocks (24 occurrences): the sibling split is now durable and the "preserved on main" framing isn't meaningful on a branch
- "Net-new agentic recipes from chore/agentx-v0.3" PR-context headers
- "agentic-coding sibling — temporarily disabled" + the entire commented-out qwen3.5-bf16-b200-sglang-agentic placeholder block
- Orphan boundary comments ("# DSv4-Pro FP4 on MI355X via SGLang. Uses a rocm720..." / "# DSv4 on MI355X via vLLM, using the official vllm/vllm-openai-rocm nightly...") that were stranded by prior entry moves
- Inline image-bump rationale that's now stale ("# Bumped from v0.19.1...", "# Same image as the INT4 sibling: v0.20.x...", "# Nightly carrying vllm-project/vllm@20cac26b...", "# v0.21.0 (released 2026-05-14)...") since everything is on v0.22.0

Verified via YAML deep-equal: 0 keys added/removed/modified in either file; this is purely comment removal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Picks up SemiAnalysisAI/aiperf@47e6e206, which adds the 060226 and 060226_256k loader names to the inferencex-agentx-mvp scenario's require_loader allowlist. Without this bump, dispatching any non-DSv4 agentic run on this branch fails preflight because benchmark_lib.sh now defaults the loader to the 060226 corpus.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Same root cause as 967c50c (h200-dgxc-slurm fix): vllm/vllm-openai images ship as non-root, and on b300-nv the pyxis/enroot config does NOT implicitly remap the calling user to UID 0 inside the container. benchmark_lib.sh::install_agentic_deps runs apt-get install -y git, which fails with "dpkg: error: requested operation requires superuser privilege" (see run 26844610474 / dsv4 b300 simple offloading).

Adding --container-remap-root to the srun line matches b200-dgxc and h200-dgxc-slurm behavior; benchmark_lib.sh stays untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replaces the simple unguarded download in every agentic recipe:

    - if [[ "$MODEL" != /* ]]; then hf download "$MODEL"; fi

with the same MODEL_PATH-aware logic that the fixed-seq-len B300 recipes already use:

    if [[ -n "${MODEL_PATH:-}" ]]; then
      if [[ ! -d "$MODEL_PATH" || empty ]]; then
        hf download "$MODEL" --local-dir "$MODEL_PATH"
      fi
    else
      hf download "$MODEL"
      export MODEL_PATH="$MODEL"
    fi

Effect: on clusters where launch_*.sh exports MODEL_PATH pointing at a pre-staged on-node copy (e.g. b300-nv sets it to /scratch/models/<basename>), the agentic recipe now correctly short-circuits the hf-download instead of re-pulling 700 GB of DSv4-Pro into $HOME/.cache/huggingface every run.

Touches 33 scripts; same edit in each.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
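The commit message abbreviates the emptiness check as `empty`. A runnable sketch of the same guard follows. It assumes that "empty" means the directory exists but contains no entries, and the function name `stage_model_sketch` is hypothetical. The actual recipes may test emptiness differently.

```bash
# Hypothetical sketch of the MODEL_PATH-aware download guard.
# Expects MODEL (HF repo id) to be set; MODEL_PATH is optional.
stage_model_sketch() {
  if [ -n "${MODEL_PATH:-}" ]; then
    # Assumption: "empty" means the directory is missing or has no entries.
    if [ ! -d "$MODEL_PATH" ] || [ -z "$(ls -A "$MODEL_PATH" 2>/dev/null)" ]; then
      hf download "$MODEL" --local-dir "$MODEL_PATH"
    fi
    # Otherwise the pre-staged copy is used as-is and nothing is downloaded.
  else
    # No staging directory configured: fall back to the default HF cache and
    # point MODEL_PATH at the repo id, as in the commit's else-branch.
    hf download "$MODEL"
    export MODEL_PATH="$MODEL"
  fi
}
```

With `MODEL_PATH` pointing at a populated directory, the function returns without invoking `hf` at all, which is the short-circuit the commit describes.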

Companion to 360bcf0. That commit made the agentic recipes skip hf-download when MODEL_PATH was already pre-staged, but the recipes still invoked the server with the HF id ("vllm serve $MODEL" / "--model-path $MODEL"), so the engine looked up the HF cache (now empty, because we just skipped the download) and tried to download from scratch itself. With the model not in cache, vllm/sglang would deadlock in the auto-download path rather than fall through to a clean error.

This commit aligns every agentic recipe with the fixed-seq-len B300 pattern verbatim:

    vllm serve "$MODEL_PATH" --served-model-name "$MODEL"
    python3 -m sglang.launch_server --model-path "$MODEL_PATH" --served-model-name "$MODEL"

Net effect: server loads weights directly from /scratch/models/<name>/ (or wherever the launch script staged the model) and reports the HF id as the served-model-name for downstream tooling.

Touches all 33 agentic scripts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The custom cquil/vllm-openai image integrates vllm-project/vllm#43447, which fixes the DSv4 sliding-window prefix-cache eviction issue. But the fix is opt-in via VLLM_PREFIX_CACHE_RETENTION_INTERVAL: without setting it, vllm falls back to the legacy cache-every-segment path that this PR was written to repair, so the trace-replay cache hit rate stays near 0% even though the patched code is loaded.

Sets the env var to 32768 (32k tokens), matching the value the PR author validated to take cache hit rate from 0% -> 74% on a comparable agentic trace-replay benchmark. On stock vllm images that don't carry the patch, the env var is simply ignored, so this is safe to land.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ention-interval env

7ead0a0 only carried the "Prepend uncached blocks in SWA free()" hunk of PR vllm-project/vllm#43447; it did NOT modify vllm/envs.py to register the VLLM_PREFIX_CACHE_RETENTION_INTERVAL env var. That registration didn't land until commit 7c909f8 in the PR, and 6c529f30 is the latest merge of main into the PR branch.

Effect: the export in dsv4_fp4_b300_vllm.sh (1bccc5c) finally takes effect. vllm stops logging "Unknown vLLM environment variable detected" and actually activates the SWA prefix-cache retention path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

DSv4 recipes inherit the benchmark_lib carveout that defaults to the 052726 corpus for backward-compat with prior published baselines. This recipe is opting out to ride the v6 060226 corpus that all non-DSv4 recipes already use, exercising the newer CC versions / longer-tail trace mix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-project-automation Bot added this to InferenceMAX Board Jun 2, 2026

cquil11 force-pushed the chore/agentx-v0.4 branch from c84254f to f632aa4 · June 2, 2026 18:08

cquil11 and others added 13 commits June 2, 2026 13:16

(testing) b300 dsv4 simple offloading

321fd44

[AMD] agentx-v0.4: add MiniMax/Kimi lmcache agentic entries, update Qwen hicache config

ee8d743

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

[AMD] agentx-v0.4: add MiniMax agentic script, refactor Kimi/Qwen scripts

616f4db

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cquil11 wants to merge 14 commits into main from chore/agentx-v0.4

3 participants
