Releases: NVIDIA/cloudai
v1.6.1-3
What's Changed
- Fix broken duplicate test name detection in TestParser.parse_all() by @rutayan-nv in #875
- Parsing: enhance error handling by @podkidyshev in #876
- Fix various vLLM/SGLang bugs by @podkidyshev in #877
- vLLM, SGLang: fix long server start by @podkidyshev in #879
- vLLM, SGLang: cleanup fix for single-sbatch by @podkidyshev in #880
- Parsing: fix system config detection by @podkidyshev in #881
- vLLM, SGLang: custom healthcheck endpoint by @podkidyshev in #882
- fix secret scan false positive by @podkidyshev in #883
Full Changelog: v1.6.1-2...v1.6.1-3
v1.6.1-2
What's Changed
- Constraint failure reward override by @alexmanle in #865
- Amanley/reward overrides by @alexmanle in #869
- MegatronRun: fix .loadtest + allow timeouts by @podkidyshev in #866
- Bump pytest from 9.0.2 to 9.0.3 by @dependabot[bot] in #871
- Bump pillow from 12.1.1 to 12.2.0 by @dependabot[bot] in #870
- Bump uv from 0.10.0 to 0.11.6 by @dependabot[bot] in #867
- Installables: submodules fix by @podkidyshev in #872
Full Changelog: v1.6.1-1...v1.6.1-2
v1.6.1-1
What's Changed
- Bump pygments from 2.19.2 to 2.20.0 by @dependabot[bot] in #853
- MBridge: revert metrics parsing by @podkidyshev in #862
- Installables: nested docker image path by @podkidyshev in #861
- Megatron Run: status check by @podkidyshev in #859
- Fix path expansion/storage by @amaslenn in #864
Full Changelog: v1.6.0b7...v1.6.1-1
v1.6 TP7
What's Changed
- Support CNI spec for NCCL over k8s by @amaslenn in #848
- Bump requests from 2.32.5 to 2.33.0 by @dependabot[bot] in #852
- MBridge: time limit managed by test run by @podkidyshev in #849
- CNI spec support for Dynamo @ k8s by @amaslenn in #854
- MBridge: using gpus-per-node from system by @podkidyshev in #847
- Update CODEOWNERS by @amaslenn in #856
- VLLM: boolean flags and constraints by @podkidyshev in #857
- Allow profiling ranks in string format with comma as separator by @juntaowww in #855
- MBridge: fix vp parameter handling by @podkidyshev in #858
Important changes
Megatron-Bridge
- time_limit was removed from MegatronBridge cmd_args. It must now be set at the scenario level, as for other workloads.
- gpus_per_node was removed from MegatronBridge cmd_args. The value is now taken from the system config, as in other workloads.
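The change above can be sketched as a small migration helper. This is only an illustration, not CloudAI code: the function name `migrate` and the plain-dict layout (a test config with a `cmd_args` table and a scenario entry that accepts `time_limit`) are assumptions made for the example, so check them against your actual config files before relying on it.

```python
from copy import deepcopy


def migrate(test_cfg: dict, scenario_entry: dict) -> tuple[dict, dict]:
    """Return updated copies of a hypothetical test config and scenario entry.

    Assumed layout (not taken from CloudAI itself): test_cfg has a
    "cmd_args" dict; scenario_entry is the per-test entry of a scenario.
    """
    test_cfg = deepcopy(test_cfg)
    scenario_entry = deepcopy(scenario_entry)
    cmd_args = test_cfg.get("cmd_args", {})

    # time_limit moves to the scenario level; an existing scenario value wins.
    time_limit = cmd_args.pop("time_limit", None)
    if time_limit is not None:
        scenario_entry.setdefault("time_limit", time_limit)

    # gpus_per_node is dropped: the system config is now the source of truth.
    cmd_args.pop("gpus_per_node", None)
    return test_cfg, scenario_entry
```

The helper works on copies, so the original dictionaries stay untouched, and it keeps a `time_limit` that is already set on the scenario entry instead of overwriting it.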
Full Changelog: v1.6.beta6...v1.6.0b7
v1.6.beta6
What's Changed
- Megatron-Bridge r0.3.0 enhancement by @juntaowww in #830
- Avoid real system calls by @amaslenn in #842
- Do not run CommandShell check during object creation by @amaslenn in #843
- Cleanup NIXL file mounts by @podkidyshev in #840
- Formatting changes by @RulaHallak in #838
- Add NIXL EP workload by @amaslenn in #845
- DSE reporting by @podkidyshev in #846
Full Changelog: v1.6.beta5...v1.6.beta6
v1.6.beta5
What's Changed
- Use uv in ci by @podkidyshev in #835
- Bump tornado from 6.5.4 to 6.5.5 by @dependabot[bot] in #833
- Add SGLang workload by @amaslenn in #834
- Merge common part of vLLM and SGLang by @amaslenn in #836
- NIXL update: filepath and device_list by @podkidyshev in #829
- Agents caching by @podkidyshev in #837
- Add support for x2 nodes serving for vLLM and SGLang by @amaslenn in #839
Full Changelog: v1.6.beta4...v1.6.beta5
v1.6.beta4
What's Changed
- Avoid silent failure when commit hash is invalid by @juntaowww in #820
- Warning on using first sweep by @podkidyshev in #822
- Update CLI args format for NIXL bench by @amaslenn in #823
- Fix commit verification: commit/branch/tag support by @podkidyshev in #824
- Megatron-Bridge updates by @podkidyshev in #821
- pre-commit by @podkidyshev in #827
- Add documentation for Systems by @amaslenn in #826
- Bump werkzeug from 3.1.5 to 3.1.6 by @dependabot[bot] in #828
- Address doc issues by @amaslenn in #831
Full Changelog: v1.6.beta3...v1.6.beta4
v1.6.beta3
What's Changed
- Support DSE metrics for vLLM by @amaslenn in #816
- Agent configs by @podkidyshev in #818
- AI Dynamo updates by @karya0 in #814
Full Changelog: v1.6.beta2...v1.6.beta3
v1.6.beta2
What's Changed
- Add report generation for OSU Benchmark by @allkoow in #807
- Single sbatch + NIXL + ETCD issues by @podkidyshev in #812
- Support separate ETCD container for NIXL workloads by @amaslenn in #813
- Yet another attempt on the right copyright by @podkidyshev in #815
- Refactor NCCL k8s test cases to improve re-use and temp resources management by @amaslenn in #817
Full Changelog: v1.6.beta1...v1.6.beta2
v1.6.beta1
What's Changed
- Bump to v1.6 + upgrade dependencies by @amaslenn in #798
- Upgrade GitHub Actions to latest versions by @salmanmkc in #751
- Upgrade GitHub Actions for Node 24 compatibility by @salmanmkc in #750
- Ban "heavy" imports on module level by @amaslenn in #801
- Remove asyncio usage in jobs monitoring by @amaslenn in #796
- Bump pillow from 12.1.0 to 12.1.1 by @dependabot[bot] in #802
- Add report generation strategy for the MegatronRun by @juntaowww in #787
- Fix accidentally reverted version bump by @amaslenn in #805
- Add support for running vLLM by @amaslenn in #799
- Unit-tests per system/workload by @podkidyshev in #808
- Fix nsys subfield merging behavior by @juntaowww in #795
- Add support for setting NIXL num threads for vLLM CLI by @amaslenn in #809
- Fix base_tr fixture dependency by @podkidyshev in #810
- Fixes CLOUDAI-15: Updated copyright check by @podkidyshev in #811
New Contributors
- @salmanmkc made their first contribution in #751
- @podkidyshev made their first contribution in #808
Full Changelog: v1.5.0...v1.6.beta1