[WIP] agentX v0.4#1640

Draft
cquil11 wants to merge 14 commits into main from chore/agentx-v0.4

@cquil11
Collaborator

@cquil11 cquil11 commented Jun 2, 2026

No description provided.

resolve_trace_source() now picks a model-prefix-aware default:

  MODEL_PREFIX == dsv4  -> semianalysis_cc_traces_weka_with_subagents
                           (052726, the v5 baseline, unchanged for
                           continuity with prior DSv4 published runs)
  everything else       -> semianalysis_cc_traces_weka_with_subagents_060226
                           (060226, newer v6 corpus with fresher CC
                           recording windows)

WEKA_LOADER_OVERRIDE still wins. Allowed values widened from the
two 052726 loaders to all four:

  semianalysis_cc_traces_weka_with_subagents          (052726)
  semianalysis_cc_traces_weka_with_subagents_256k     (052726-256k)
  semianalysis_cc_traces_weka_with_subagents_060226   (060226)
  semianalysis_cc_traces_weka_with_subagents_060226_256k (060226-256k)

Bumps utils/aiperf submodule to de3ad1c1, which registers the two
060226 plugin entries those new loader names resolve through.

The pre-cache log line now also includes MODEL_PREFIX so it's obvious
in CI which default fired.
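
The selection rule described above can be sketched roughly as follows. This is only an illustration, not the actual body of resolve_trace_source() in benchmark_lib.sh: the loader names, MODEL_PREFIX, and WEKA_LOADER_OVERRIDE come from this message, while the exact log text and the behavior on an unsupported override (fail with a non-zero status) are assumptions.

```shell
# Sketch only: the real resolve_trace_source() may be structured differently.
resolve_trace_source() {
  local loader

  if [ -n "${WEKA_LOADER_OVERRIDE:-}" ]; then
    # An explicit override always wins over the model-prefix default.
    loader="$WEKA_LOADER_OVERRIDE"
  elif [ "${MODEL_PREFIX:-}" = "dsv4" ]; then
    # DSv4 stays on the 052726 corpus for continuity with published runs.
    loader="semianalysis_cc_traces_weka_with_subagents"
  else
    # Everything else defaults to the newer 060226 corpus.
    loader="semianalysis_cc_traces_weka_with_subagents_060226"
  fi

  # Allowlist of the four loader names listed in this commit message.
  case "$loader" in
    semianalysis_cc_traces_weka_with_subagents|\
    semianalysis_cc_traces_weka_with_subagents_256k|\
    semianalysis_cc_traces_weka_with_subagents_060226|\
    semianalysis_cc_traces_weka_with_subagents_060226_256k)
      ;;
    *)
      # Assumed behavior: reject anything outside the allowlist.
      echo "unsupported loader: ${loader}" >&2
      return 1
      ;;
  esac

  # Log to stderr so stdout carries only the loader name (log format is illustrative).
  echo "pre-cache: MODEL_PREFIX=${MODEL_PREFIX:-} loader=${loader}" >&2
  printf '%s\n' "$loader"
}
```

A caller would capture the result with something like `loader="$(resolve_trace_source)"`, leaving the log line visible in CI output.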

Signed-off-by: Cam Quilici <cameron@semianalysis.com>
@github-actions
Contributor

github-actions Bot commented Jun 2, 2026

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that all GitHub Action jobs fully pass after merging. Failures are often just flakes, and re-running the failed jobs will fix them. If failed jobs are re-run, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@cquil11 cquil11 force-pushed the chore/agentx-v0.4 branch from c84254f to f632aa4 Compare June 2, 2026 18:08
cquil11 and others added 13 commits June 2, 2026 13:16
…dsr1-trt entry

Reorganizes both master YAMLs so all pure-agentic (agentic-coding-only)
recipes sit at the bottom of the file behind an "# Agentic configs"
divider, separated from fixed-seq-len / synthetic / prefix-share entries
above. No functional change to any non-agentic recipe.

nvidia-master.yaml: splits dsr1-fp4-b200-dynamo-trt — which previously
mixed fixed-seq-len + agentic-coding in one entry — into the original
entry (fixed-seq-len only) plus a new sibling dsr1-fp4-b200-dynamo-trt-agentic
carrying the agentic-coding scenario. 22 pure-agentic entries moved.

amd-master.yaml: no split needed (no combined entries); 9 pure-agentic
entries moved to the end.

Verified via deep YAML parse: nvidia adds 1 key (the split sibling) and
modifies the source key's scenarios from [agentic-coding, fixed-seq-len]
to [fixed-seq-len]; amd has 0 keys added/removed/modified. All other
entries are byte-equal after round-trip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Bumps every non-comment `image:` line in both master configs to the
unsuffixed v0.22.0 tag:
  - vllm/vllm-openai:*           -> vllm/vllm-openai:v0.22.0
  - vllm/vllm-openai-rocm:*      -> vllm/vllm-openai-rocm:v0.22.0
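
A bump of this shape can be sketched with a single sed pass. The helper name bump_vllm_images is hypothetical and this is not necessarily how the edit was made. It assumes unquoted `image:` values and leaves any line whose first non-blank character is `#` untouched:

```shell
# Hypothetical helper: rewrite the tag on non-comment vllm image lines.
# Usage: bump_vllm_images <yaml-file> <new-tag>
bump_vllm_images() {
  local file="$1" tag="$2"

  # /^[[:space:]]*#/!  -> skip lines that are entirely comments
  # (-rocm)?           -> covers both vllm/vllm-openai and vllm/vllm-openai-rocm
  # [^[:space:]]+      -> the old tag, up to the next whitespace
  # -i.bak keeps a backup copy and works with both GNU and BSD sed.
  sed -E -i.bak \
    -e '/^[[:space:]]*#/!s#(image:[[:space:]]*vllm/vllm-openai(-rocm)?):[^[:space:]]+#\1:'"$tag"'#' \
    "$file"
}
```

For example, `bump_vllm_images nvidia-master.yaml v0.22.0` would rewrite the file in place and leave the original at `nvidia-master.yaml.bak`. Images from other repositories are not matched by the pattern.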

Covers all prior variants: v0.17–v0.21 numbered releases, the -cu130 /
-ubuntu2404 / deepseekv4-cu129 build-variant tags, and the nightly-<sha>
ROCm pins (which were holding DSv4 ROCm support that has since landed in
the tagged release). Comment-line tag references in the agentic
divergence change-log blocks are intentionally untouched so their
"X -> Y" history reads correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Removes ~240 lines of slop comments that no longer earn their keep:

  - "Diverged from X (agentic-coding sibling)..." rationale blocks
    (24 occurrences) — the sibling split is now durable and the
    "preserved on main" framing isn't meaningful on a branch
  - "Net-new agentic recipes from chore/agentx-v0.3" PR-context headers
  - "agentic-coding sibling — temporarily disabled" + the entire
    commented-out qwen3.5-bf16-b200-sglang-agentic placeholder block
  - Orphan boundary comments ("# DSv4-Pro FP4 on MI355X via SGLang.
    Uses a rocm720..." / "# DSv4 on MI355X via vLLM, using the official
    vllm/vllm-openai-rocm nightly...") that were stranded by prior
    entry moves
  - Inline image-bump rationale that's now stale ("# Bumped from
    v0.19.1...", "# Same image as the INT4 sibling: v0.20.x...",
    "# Nightly carrying vllm-project/vllm@20cac26b...", "# v0.21.0
    (released 2026-05-14)...") since everything is on v0.22.0

Verified via YAML deep-equal: 0 keys added/removed/modified in either
file — purely comment removal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Picks up SemiAnalysisAI/aiperf@47e6e206, which adds the 060226 and
060226_256k loader names to the inferencex-agentx-mvp scenario's
require_loader allowlist. Without this bump, dispatching any non-DSv4
agentic run on this branch fails preflight because benchmark_lib.sh
now defaults the loader to the 060226 corpus.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Same root cause as 967c50c (h200-dgxc-slurm fix): vllm/vllm-openai
images ship as non-root, and on b300-nv the pyxis/enroot config does
NOT implicitly remap the calling user to UID 0 inside the container.
benchmark_lib.sh::install_agentic_deps runs apt-get install -y git,
which fails with "dpkg: error: requested operation requires superuser
privilege" (see run 26844610474 / dsv4 b300 simple offloading).

Adding --container-remap-root to the srun line matches b200-dgxc and
h200-dgxc-slurm behavior; benchmark_lib.sh stays untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replaces the simple unguarded download in every agentic recipe:

  - if [[ "$MODEL" != /* ]]; then hf download "$MODEL"; fi

with the same MODEL_PATH-aware logic that the fixed-seq-len B300 recipes
already use:

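  if [[ -n "${MODEL_PATH:-}" ]]; then
      # Pre-staged path: download only if the directory is missing or empty.
      if [[ ! -d "$MODEL_PATH" || -z "$(ls -A "$MODEL_PATH" 2>/dev/null)" ]]; then
          hf download "$MODEL" --local-dir "$MODEL_PATH"
      fi
  else
      # No pre-staged path: fall back to the default HF cache.
      hf download "$MODEL"
      export MODEL_PATH="$MODEL"
  fi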

Effect: on clusters where launch_*.sh exports MODEL_PATH pointing at a
pre-staged on-node copy (e.g. b300-nv sets it to
/scratch/models/<basename>), the agentic recipe now correctly short-
circuits the hf-download instead of re-pulling 700 GB of DSv4-Pro
into $HOME/.cache/huggingface every run.

Touches 33 scripts; same edit in each.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Companion to 360bcf0. That commit made the agentic recipes skip
hf-download when MODEL_PATH was already pre-staged — but the recipes
still invoked the server with the HF id ("vllm serve \$MODEL" /
"--model-path \$MODEL"), so the engine looked up the HF cache (now
empty, because we just skipped the download) and tried to download from
scratch itself. With the model not in cache, vllm/sglang would deadlock
in the auto-download path rather than fall through to a clean error.

This commit aligns every agentic recipe with the fixed-seq-len B300
pattern verbatim:

  vllm serve "$MODEL_PATH" --served-model-name "$MODEL"
  python3 -m sglang.launch_server --model-path "$MODEL_PATH" --served-model-name "$MODEL"

Net effect: server loads weights directly from /scratch/models/<name>/
(or wherever the launch script staged the model) and reports the HF id
as the served-model-name for downstream tooling.
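
As a rough illustration of the pattern above, the launch line for either engine can be derived from the same two variables. The helper print_serve_cmd is hypothetical and only prints the command instead of starting a server. The two command shapes are the ones quoted in this message, and the fallback of MODEL_PATH to the HF id mirrors the download logic from 360bcf0:

```shell
# Hypothetical helper: print the launch command for the given engine.
# MODEL is the HF id; MODEL_PATH is the pre-staged directory, if any.
print_serve_cmd() {
  local engine="$1"

  : "${MODEL:?MODEL must be set to the HF model id}"
  # With no pre-staged copy, serve from the HF id as before.
  : "${MODEL_PATH:=$MODEL}"

  case "$engine" in
    vllm)
      printf 'vllm serve %s --served-model-name %s\n' "$MODEL_PATH" "$MODEL"
      ;;
    sglang)
      printf 'python3 -m sglang.launch_server --model-path %s --served-model-name %s\n' \
        "$MODEL_PATH" "$MODEL"
      ;;
    *)
      echo "unknown engine: ${engine}" >&2
      return 1
      ;;
  esac
}
```

Keeping `--served-model-name` pinned to the HF id means clients keep addressing the model by the same name regardless of where the weights were loaded from.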

Touches all 33 agentic scripts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The custom cquil/vllm-openai image integrates vllm-project/vllm#43447,
which fixes the DSv4 sliding-window prefix-cache eviction issue. But the
fix is opt-in via VLLM_PREFIX_CACHE_RETENTION_INTERVAL — without setting
it, vllm falls back to the legacy cache-every-segment path that this PR
was written to repair, so the trace-replay cache hit rate stays near 0%
even though the patched code is loaded.

Sets the env var to 32768 (32k tokens), matching the value the PR author
validated to take cache hit rate from 0% -> 74% on a comparable agentic
trace-replay benchmark.

On stock vllm images that don't carry the patch, the env var is simply
ignored — safe to land.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ention-interval env

7ead0a0 only carried the "Prepend uncached blocks in SWA free()" hunk
of PR vllm-project/vllm#43447 — it did NOT modify vllm/envs.py to
register the VLLM_PREFIX_CACHE_RETENTION_INTERVAL env var. That
registration didn't land until commit 7c909f8 in the PR, and 6c529f30
is the latest merge of main into the PR branch.

Effect: the export in dsv4_fp4_b300_vllm.sh (1bccc5c) finally takes
effect — vllm stops logging "Unknown vLLM environment variable detected"
and actually activates the SWA prefix-cache retention path.
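
One way to confirm this from a run is to scan the server log for the warning quoted above. The following is a minimal sketch: the helper env_var_recognized is hypothetical, and it assumes the warning line also contains the variable name, which may not match the real log format exactly.

```shell
# Value from the earlier commit: 32768 tokens (32k).
export VLLM_PREFIX_CACHE_RETENTION_INTERVAL=32768

# Hypothetical helper: succeeds if the log has no "unknown env var"
# warning mentioning the given variable.
# Usage: env_var_recognized <server-log> <VAR_NAME>
env_var_recognized() {
  local log_file="$1" var_name="$2"

  # A missing or unreadable log is reported separately (status 2).
  [ -r "$log_file" ] || return 2

  # Assumption: the warning and the variable name share one log line.
  if grep -F "Unknown vLLM environment variable detected" "$log_file" \
      | grep -qF "$var_name"; then
    return 1
  fi
  return 0
}
```

A CI step could then call `env_var_recognized server.log VLLM_PREFIX_CACHE_RETENTION_INTERVAL` and fail the job when the image does not register the variable (the log file name here is illustrative).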

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

DSv4 recipes inherit the benchmark_lib carveout that defaults to the
052726 corpus for backward-compat with prior published baselines. This
recipe is opting out to ride the v6 060226 corpus that all non-DSv4
recipes already use, exercising the newer CC versions / longer-tail
trace mix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…wen hicache config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ipts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

3 participants