ci(TRI-1406): prepare server for 26.06 (versions, TRT 11 QA compat, enroot env fix) by mc-nv · Pull Request #8828 · triton-inference-server/server

mc-nv · 2026-06-10T19:00:22Z

What does the PR do?

Prepare the server repo for the 26.06 release train: bump release versions, add TensorRT 11 compatibility across the QA model generators, revert the ONNX Runtime version, and pass TRITON_GENSRCDIR to enroot containers so QA model generation works under both docker and enroot launchers.

Checklist

PR title reflects the change and is of format <commit_type>: <Title>
Changes are described in the pull request.
Related issues are referenced.
Populated github labels field
Verified that the PR passes existing CI.
Verified copyright is correct on all changed files.
Added succinct git squash message before merging.
All template sections are filled out.

Commit Type:

ci
fix

Related PRs:

triton-inference-server/onnxruntime_backend#344
dl/dgx/tritonserver!1792
dl/dgx/tritonmodelanalyzer!178

Where should the reviewer start?

TRITON_VERSION, build.py — release version bumps.
qa/common/gen_qa_*.py, qa/common/gen_common.py, qa/common/test_util.py — TRT 11 compatibility (layer.set_output_type instead of the deprecated ITensor.dtype setter, V2/V3 dispatch, identity/sequence/implicit/format/plugin generators).
qa/common/gen_qa_model_repository — pass -e TRITON_GENSRCDIR=$TRITON_MDLS_SRC_DIR to the enroot launches for openvino/onnxruntime/pytorch/tensorrt to mirror the docker path.

Test plan:

Validated by running the GenModels stages in a tritonserver CI pipeline and confirming the previously-failing --torchvision-aoti step succeeds under enroot.

CI Pipeline ID:

Caveats:

TRT 11 generators preserve V2 dispatch for backward compatibility; once 26.06 settles, V3 can become the default.

Background

Required by the 26.06 release train — see Build against latest upstream container. The enroot TRITON_GENSRCDIR fix was uncovered by the failure of GitLab job gitlab-master.nvidia.com/dl/dgx/tritonserver/-/jobs/338092379 (GenModels-build--sbsa-gb200-10.0).

Related Issues:

Resolves TRI-1406

In TensorRT 11.0.0 (shipped in the 26.06 container), the ITensor.dtype property is read-only. Assigning to it raises "AttributeError: property of 'ITensor' object has no setter" and breaks the GenModels-build--sbsa-a100-8.0 job during create_plan_shape_tensor_modelfile. Replace the three remaining ITensor.dtype assignments in gen_qa_identity_models.py with layer.set_output_type(idx, dtype) calls on the producing layer (identity / resize / shape), matching the pattern already used in the rest of the QA model-gen scripts (gen_qa_models.py, gen_qa_implicit_models.py, gen_qa_trt_format_models.py).

The 26.06 TensorRT container ships TRT 11.0.0, which removed several Python-binding entry points the QA model-gen scripts rely on. This breaks the GenModels-build job (failing first in create_plan_shape_tensor_modelfile) with:

    AttributeError: property of 'ITensor' object has no setter

Empirical findings on TRT 11.0.0 (verified inside the gitlab-master.nvidia.com:5005/dl/dgx/tensorrt:26.06-py3-base image):

* ITensor.dtype no longer has a setter (read-only).
* No layer class exposes set_output_type anymore, so the earlier fix that switched to it was equally broken.
* BuilderFlag.PREFER_PRECISION_CONSTRAINTS, .INT8, and .FP16 are no longer defined; strongly-typed networks supersede them.
* add_shape(...) already returns INT64 by default.

Apply minimum-touch compatibility shims in gen_qa_identity_models.py:

* Wrap each tensor.dtype = X assignment in try/except AttributeError so older TRT keeps the explicit override while TRT 11+ falls back to the natural dtype propagation (which already matches what was being set).
* Guard PREFER_PRECISION_CONSTRAINTS / INT8 / FP16 BuilderFlag uses with hasattr() checks, mirroring the existing pattern already used for REJECT_EMPTY_ALGORITHMS.

Verified by running every TensorRT entry point in this file inside the 26.06 TRT image and confirming engine generation now completes for --tensorrt, --tensorrt-compat, --tensorrt-big, and --tensorrt-shape-io. The same patterns exist in five other gen_qa_*.py scripts and may need the same treatment once the pipeline advances past identity models.
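The two shims in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from gen_qa_identity_models.py: `WritableTensor`, `ReadOnlyTensor`, `OldFlags`, `NewFlags`, `try_set_dtype`, and `set_flag_if_present` are hypothetical stand-ins so the example runs without TensorRT installed.

```python
class WritableTensor:
    """Stand-in for an older ITensor whose dtype can still be assigned."""

    def __init__(self, dtype):
        self.dtype = dtype


class ReadOnlyTensor:
    """Stand-in for a TRT 11 ITensor: dtype is a property with no setter."""

    def __init__(self, dtype):
        self._dtype = dtype

    @property
    def dtype(self):
        return self._dtype


def try_set_dtype(tensor, dtype):
    """Assign tensor.dtype where allowed; report whether the override applied."""
    try:
        tensor.dtype = dtype
        return True
    except AttributeError:
        # Setter is gone (strongly-typed network): keep the propagated dtype.
        return False


class OldFlags:
    """Stand-in for a BuilderFlag enum that still defines precision flags."""

    FP16 = "FP16"
    INT8 = "INT8"


class NewFlags:
    """Stand-in for a BuilderFlag enum with the precision flags removed."""


def set_flag_if_present(enabled, flag_enum, name):
    """Enable a builder flag only if the enum still defines it."""
    if hasattr(flag_enum, name):
        enabled.add(getattr(flag_enum, name))
        return True
    return False
```

With these stand-ins, the same generator code path works on both kinds of binding: the explicit override is applied where the setter exists and is skipped where it does not.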

…tors

The 26.06 TensorRT container ships TRT 11.0.0 (strongly-typed networks by default) and removed the implicit-precision APIs the QA model generators relied on. After unblocking gen_qa_identity_models.py the GenModels-build job failed next in gen_qa_sequence_models.py and would have failed in every other gen_qa_*.py exercising TensorRT.

Empirical findings on TRT 11.0.0 (verified inside the gitlab-master.nvidia.com:5005/dl/dgx/tensorrt:26.06-py3-base image):

* ITensor.dtype setter removed (read-only).
* Layer.set_output_type removed from every layer class.
* ITensor.dynamic_range setter removed.
* BuilderFlag.PREFER_PRECISION_CONSTRAINTS / INT8 / FP16 / BF16 / FP8 removed; strongly-typed networks supersede them.
* add_cast(input, dtype) is the canonical replacement for the old add_identity + set_output_type pattern (available since TRT 8.5).
* add_shape(...) returns INT64 by default.

Apply minimum-touch compatibility shims:

* Add two helpers to qa/common/gen_common.py:
  - trt_set_dynamic_range(tensor, lo, hi): try/except wrapper for the removed ITensor.dynamic_range setter.
  - trt_cast_tensor(network, tensor, dtype): uses add_cast on TRT 8.5+ and falls back to add_identity + set_output_type on older TRT.
* In each of gen_qa_models.py, gen_qa_sequence_models.py, gen_qa_dyna_sequence_models.py, gen_qa_implicit_models.py, gen_qa_dyna_sequence_implicit_models.py, gen_qa_identity_models.py, gen_qa_trt_format_models.py:
  - Wrap every "tensor.dtype = X" in try/except AttributeError.
  - Guard every PREFER_PRECISION_CONSTRAINTS / INT8 / FP16 BuilderFlag use with hasattr() (matching the existing pattern used for REJECT_EMPTY_ALGORITHMS).
  - Replace add_identity + set_output_type cast patterns with trt_cast_tensor; for non-cast set_output_type calls that the elementwise output dtype already satisfies, gate with hasattr.
  - Replace tensor.dynamic_range = (-128, 127) with trt_set_dynamic_range(...) so the call is a no-op on TRT 11+.
* Add a support_trt_int8_implicit_precision() helper in test_util.py gated on BuilderFlag.INT8. validate_for_trt_model now drops int8 from the supported dtype set on TRT 11+; the implicit INT8 path no longer exists in strongly-typed networks and the QA generators don't emit explicit QDQ nodes. Plan-INT8 model coverage is therefore deferred on TRT 11 containers; the corresponding L0_* tests already skip when the expected model is absent.

Verified against the 26.06 TRT image by running every TensorRT entry point in every gen_qa_*.py file -- all 14 invocations exercised by gen_qa_model_repository now reach engine generation successfully.
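A rough sketch of what the two gen_common.py helpers might look like, based only on the behaviour described in this commit message. The helper names come from the message; their bodies, the boolean return value of `trt_set_dynamic_range`, and every `Fake*` / `*Network` / `StronglyTypedTensor` class are assumptions added so the example runs without TensorRT.

```python
class FakeTensor:
    """Stand-in for a pre-TRT-11 tensor with writable attributes."""

    def __init__(self, dtype):
        self.dtype = dtype
        self.dynamic_range = None


class StronglyTypedTensor:
    """Stand-in for a TRT 11 tensor: dynamic_range has no setter."""

    def __init__(self, dtype):
        self._dtype = dtype

    @property
    def dtype(self):
        return self._dtype

    @property
    def dynamic_range(self):
        return None


class FakeLayer:
    """Stand-in layer exposing get_output() and set_output_type()."""

    def __init__(self, out_dtype):
        self._out = FakeTensor(out_dtype)

    def get_output(self, index):
        return self._out

    def set_output_type(self, index, dtype):
        self._out.dtype = dtype


class NewNetwork:
    """Stand-in network that offers add_cast."""

    def add_cast(self, tensor, dtype):
        return FakeLayer(dtype)


class OldNetwork:
    """Stand-in network that only offers add_identity."""

    def add_identity(self, tensor):
        return FakeLayer(tensor.dtype)


def trt_cast_tensor(network, tensor, dtype):
    """Cast via add_cast when available, else identity + set_output_type."""
    if hasattr(network, "add_cast"):
        return network.add_cast(tensor, dtype).get_output(0)
    layer = network.add_identity(tensor)
    layer.set_output_type(0, dtype)
    return layer.get_output(0)


def trt_set_dynamic_range(tensor, lo, hi):
    """Set the dynamic range where the setter exists; no-op otherwise."""
    try:
        tensor.dynamic_range = (lo, hi)
        return True
    except AttributeError:
        # Setter removed with strongly-typed networks: nothing to set.
        return False
```

Probing for `add_cast` with `hasattr` is one way to pick the branch; the actual helper may key off the TensorRT version instead.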

… TRT 11

The 26.06 GenModels-build--sbsa-dgx_spark-12.1 job failed in create_plan_modelfile with:

    TypeError: add_plugin_v2(): incompatible function arguments. The following argument types are supported:
        1. (self: INetworkDefinition, inputs: ..., plugin: IPluginV2) -> IPluginV2Layer
    Invoked with: ...; kwargs: ..., plugin=<IPluginV3 object ...>

Root cause: the V2-vs-V3 plugin dispatch was split between two independent probes:

* Plugin creation: get_trt_plugin() picked V3 when "registry.plugin_creator_list" was absent (true on TRT 11), so it called create_plugin(..., phase=trt.TensorRTPhase.BUILD) and returned an IPluginV3.
* Network add: create_plan_modelfile() picked the V2 path when "hasattr(network, 'add_plugin_v2')" was True -- but TRT 11 still binds add_plugin_v2 on INetworkDefinition; it just refuses IPluginV3 arguments.

The probes disagreed and a V3 plugin object landed in the V2 add path.

Hoist the V2/V3 decision to a module-level constant TRT_USES_V3_PLUGINS (computed once from the registry probe at import time) and use it for both plugin creation and network dispatch so both halves agree.

Verified on TRT 11.0.0 (26.06 container) that TRT_USES_V3_PLUGINS evaluates True and both call sites read it.
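The single-probe idea can be illustrated with a small sketch. Only the constant name TRT_USES_V3_PLUGINS and the `plugin_creator_list` probe come from the commit message; `FakeRegistryV3`, `FakePlugin`, `FakeNetwork`, `uses_v3_plugins`, `create_plugin`, and `add_plugin` are hypothetical stand-ins, and the three-argument `add_plugin_v3(inputs, shape_inputs, plugin)` shape is assumed from the TensorRT 10 Python bindings.

```python
class FakeRegistryV3:
    """Stand-in registry without plugin_creator_list (the TRT 11 case)."""


class FakePlugin:
    """Stand-in plugin object tagged with its interface generation."""

    def __init__(self, kind):
        self.kind = kind


class FakeNetwork:
    """Stand-in network: both add methods exist, each rejects the other kind."""

    def add_plugin_v2(self, inputs, plugin):
        if plugin.kind != "v2":
            raise TypeError("add_plugin_v2(): incompatible function arguments")
        return "v2-layer"

    def add_plugin_v3(self, inputs, shape_inputs, plugin):
        if plugin.kind != "v3":
            raise TypeError("add_plugin_v3(): incompatible function arguments")
        return "v3-layer"


def uses_v3_plugins(registry):
    """Single probe: V3 is assumed when the V2-era creator list is absent."""
    return not hasattr(registry, "plugin_creator_list")


# Computed once; both plugin creation and network dispatch read this value.
TRT_USES_V3_PLUGINS = uses_v3_plugins(FakeRegistryV3())


def create_plugin(use_v3):
    """Create the plugin generation chosen by the shared decision."""
    return FakePlugin("v3" if use_v3 else "v2")


def add_plugin(network, inputs, plugin, use_v3):
    """Dispatch to the add method matching the shared decision."""
    if use_v3:
        return network.add_plugin_v3(inputs, [], plugin)
    return network.add_plugin_v2(inputs, plugin)
```

Because both functions take the same flag, a V3 plugin can no longer reach the V2 add path; the old `hasattr(network, 'add_plugin_v2')` probe would have been True for `FakeNetwork` and reproduced the TypeError.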

The enroot launches for openvino/onnxruntime/pytorch/tensorrt were not forwarding TRITON_GENSRCDIR, causing gen_qa_models.py to fall back to the relative path 'gen_srcdir' and fail with FileNotFoundError on resnet50_labels.txt during the torchvision AOTI step. Mirror the docker path which already passes -e TRITON_GENSRCDIR=$TRITON_MDLS_SRC_DIR.
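To see why the missing variable produced a FileNotFoundError, here is a minimal sketch of an environment lookup with a relative-path fallback. The variable name, the 'gen_srcdir' fallback, and resnet50_labels.txt come from the commit message; the functions `resolve_gen_srcdir` and `labels_path` are hypothetical, and the real lookup in gen_qa_models.py may be written differently.

```python
import os


def resolve_gen_srcdir(environ=os.environ):
    """Return TRITON_GENSRCDIR if set, else the relative 'gen_srcdir' default."""
    return environ.get("TRITON_GENSRCDIR", "gen_srcdir")


def labels_path(environ=os.environ):
    """Build the path to the labels file under the resolved source directory."""
    return os.path.join(resolve_gen_srcdir(environ), "resnet50_labels.txt")
```

When the launcher does not forward the variable, the path is resolved relative to the container's working directory, where the file is not expected to exist; forwarding it makes the lookup point at the mounted source directory.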

Add inline rationale on each `except AttributeError: pass` introduced for TRT 11 compatibility (ITensor.dtype setter and dynamic_range setter removed in strongly-typed networks). Silences 20 CodeQL "Empty except" findings on PR #8828.

mc-nv added 7 commits June 9, 2026 17:57

Update release versions

7a8dab2

fix: Revert the ONNX Runtime version.

14b0f65

mc-nv mentioned this pull request Jun 10, 2026

ci(TRI-1406): prepare ONNX Runtime backend for 26.06 (TRT 11, OV 2026.2, CCCL fix) triton-inference-server/onnxruntime_backend#344

Open

9 tasks

mc-nv self-assigned this Jun 10, 2026

github-advanced-security AI found potential problems Jun 10, 2026


mc-nv marked this pull request as ready for review June 10, 2026 20:39

mc-nv wants to merge 8 commits into r26.06 from mchornyi/TRI-1406/prepare-26.06

2 participants
