ci(TRI-1406): prepare server for 26.06 (versions, TRT 11 QA compat, enroot env fix)#8828

Open
mc-nv wants to merge 8 commits into r26.06 from mchornyi/TRI-1406/prepare-26.06

mc-nv (Contributor) commented Jun 10, 2026

What does the PR do?

Prepare the server repo for the 26.06 release train: bump release versions, add TensorRT 11 compatibility across the QA model generators, revert the ONNX Runtime version, and pass TRITON_GENSRCDIR to enroot containers so QA model generation works under both docker and enroot launchers.

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging.
  • All template sections are filled out.

Commit Type:

  • ci
  • fix

Related PRs:

triton-inference-server/onnxruntime_backend#344
dl/dgx/tritonserver!1792
dl/dgx/tritonmodelanalyzer!178

Where should the reviewer start?

  • TRITON_VERSION, build.py — release version bumps.
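  • qa/common/gen_qa_*.py, qa/common/gen_common.py, qa/common/test_util.py — TRT 11 compatibility (try/except shims around the removed ITensor.dtype and dynamic_range setters, add_cast via trt_cast_tensor in place of add_identity + set_output_type, hasattr() guards on removed BuilderFlag members, unified V2/V3 plugin dispatch, across the identity/sequence/implicit/format/plugin generators).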
  • qa/common/gen_qa_model_repository — pass -e TRITON_GENSRCDIR=$TRITON_MDLS_SRC_DIR to the enroot launches for openvino/onnxruntime/pytorch/tensorrt to mirror the docker path.

Test plan:

Validated by running the GenModels stages in a tritonserver CI pipeline and confirming the previously-failing --torchvision-aoti step succeeds under enroot.

  • CI Pipeline ID:

Caveats:

TRT 11 generators preserve V2 dispatch for backward compatibility; once 26.06 settles, V3 can become the default.

Background

Required by the 26.06 release train — see Build against latest upstream container. The enroot TRITON_GENSRCDIR fix was uncovered by the failure of GitLab job gitlab-master.nvidia.com/dl/dgx/tritonserver/-/jobs/338092379 (GenModels-build--sbsa-gb200-10.0).

Related Issues:

  • Resolves TRI-1406

mc-nv added 7 commits June 9, 2026 17:57
In TensorRT 11.0.0 (shipped in the 26.06 container), the ITensor.dtype
property is read-only. Assigning to it raises
"AttributeError: property of 'ITensor' object has no setter" and breaks
the GenModels-build--sbsa-a100-8.0 job during
create_plan_shape_tensor_modelfile.

Replace the three remaining ITensor.dtype assignments in
gen_qa_identity_models.py with layer.set_output_type(idx, dtype) calls
on the producing layer (identity / resize / shape), matching the pattern
already used in the rest of the QA model-gen scripts
(gen_qa_models.py, gen_qa_implicit_models.py, gen_qa_trt_format_models.py).

The 26.06 TensorRT container ships TRT 11.0.0, which removed several
Python-binding entry points the QA model-gen scripts rely on. This
breaks the GenModels-build job (failing first in
create_plan_shape_tensor_modelfile) with:

  AttributeError: property of 'ITensor' object has no setter

Empirical findings on TRT 11.0.0 (verified inside the
gitlab-master.nvidia.com:5005/dl/dgx/tensorrt:26.06-py3-base image):

  * ITensor.dtype no longer has a setter (read-only).
  * No layer class exposes set_output_type anymore, so the earlier fix
    that switched to it was equally broken.
  * BuilderFlag.PREFER_PRECISION_CONSTRAINTS, .INT8, and .FP16 are
    no longer defined; strongly-typed networks supersede them.
  * add_shape(...) already returns INT64 by default.

Apply minimum-touch compatibility shims in gen_qa_identity_models.py:

  * Wrap each tensor.dtype = X assignment in try/except AttributeError
    so older TRT keeps the explicit override while TRT 11+ falls back to
    the natural dtype propagation (which already matches what was being
    set).
  * Guard PREFER_PRECISION_CONSTRAINTS / INT8 / FP16 BuilderFlag uses
    with hasattr() checks, mirroring the existing pattern already used
    for REJECT_EMPTY_ALGORITHMS.
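
The two shims above can be sketched without TensorRT installed. This is a minimal illustration, not the code from gen_qa_identity_models.py: the helper names set_dtype_if_supported and set_flag_if_defined, and the stand-in classes ReadOnlyDtypeTensor and RecordingConfig, are hypothetical and exist only to show the try/except and hasattr() patterns.

```python
from types import SimpleNamespace


class ReadOnlyDtypeTensor:
    """Hypothetical stand-in for a tensor whose dtype property has no setter."""

    @property
    def dtype(self):
        return "INT64"


class RecordingConfig:
    """Hypothetical stand-in for a builder config; records the flags it gets."""

    def __init__(self):
        self.flags = []

    def set_flag(self, flag):
        self.flags.append(flag)


def set_dtype_if_supported(tensor, dtype):
    """Apply the explicit dtype override where the setter still exists."""
    try:
        tensor.dtype = dtype
        return True
    except AttributeError:
        # Setter is gone (read-only property): keep the propagated dtype.
        return False


def set_flag_if_defined(config, flag_enum, name):
    """Set a builder flag only when the enum still defines that member."""
    if hasattr(flag_enum, name):
        config.set_flag(getattr(flag_enum, name))
        return True
    return False


# Stand-ins for an enum that still has FP16 and one that dropped it.
legacy_flags = SimpleNamespace(FP16="FP16")
strict_flags = SimpleNamespace()
```

With the stand-ins, set_dtype_if_supported(ReadOnlyDtypeTensor(), "INT32") returns False and leaves the dtype alone, while set_flag_if_defined(config, strict_flags, "FP16") skips the flag instead of raising.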

Verified by running every TensorRT entry point in this file inside the
26.06 TRT image and confirming engine generation now completes for
--tensorrt, --tensorrt-compat, --tensorrt-big, and --tensorrt-shape-io.

The same patterns exist in five other gen_qa_*.py scripts and may need
the same treatment once the pipeline advances past identity models.
…tors

The 26.06 TensorRT container ships TRT 11.0.0 (strongly-typed networks
by default) and removed the implicit-precision APIs the QA model
generators relied on. After unblocking gen_qa_identity_models.py the
GenModels-build job failed next in gen_qa_sequence_models.py and would
have failed in every other gen_qa_*.py exercising TensorRT.

Empirical findings on TRT 11.0.0 (verified inside the
gitlab-master.nvidia.com:5005/dl/dgx/tensorrt:26.06-py3-base image):

  * ITensor.dtype setter removed (read-only).
  * Layer.set_output_type removed from every layer class.
  * ITensor.dynamic_range setter removed.
  * BuilderFlag.PREFER_PRECISION_CONSTRAINTS / INT8 / FP16 / BF16 / FP8
    removed; strongly-typed networks supersede them.
  * add_cast(input, dtype) is the canonical replacement for the old
    add_identity + set_output_type pattern (available since TRT 8.5).
  * add_shape(...) returns INT64 by default.

Apply minimum-touch compatibility shims:

  * Add two helpers to qa/common/gen_common.py:
      - trt_set_dynamic_range(tensor, lo, hi): try/except wrapper for the
        removed ITensor.dynamic_range setter.
      - trt_cast_tensor(network, tensor, dtype): uses add_cast on TRT 8.5+
        and falls back to add_identity + set_output_type on older TRT.
  * In each of gen_qa_models.py, gen_qa_sequence_models.py,
    gen_qa_dyna_sequence_models.py, gen_qa_implicit_models.py,
    gen_qa_dyna_sequence_implicit_models.py, gen_qa_identity_models.py,
    gen_qa_trt_format_models.py:
      - Wrap every "tensor.dtype = X" in try/except AttributeError.
      - Guard every PREFER_PRECISION_CONSTRAINTS / INT8 / FP16 BuilderFlag
        use with hasattr() (matching the existing pattern used for
        REJECT_EMPTY_ALGORITHMS).
      - Replace add_identity + set_output_type cast patterns with
        trt_cast_tensor; for non-cast set_output_type calls that the
        elementwise output dtype already satisfies, gate with hasattr.
      - Replace tensor.dynamic_range = (-128, 127) with
        trt_set_dynamic_range(...) so the call is a no-op on TRT 11+.
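
A rough sketch of what the two gen_common.py helpers might look like follows. Only the helper names and their described behavior come from the commit message; the bodies are assumptions. In particular, probing with hasattr(network, "add_cast") and returning the output tensor (rather than the layer) are guesses, and FakeLayer, LegacyNetwork, CastNetwork, and ReadOnlyRangeTensor are hypothetical stand-ins so the sketch runs without TensorRT.

```python
class FakeLayer:
    """Hypothetical stand-in layer that remembers how it was built."""

    def __init__(self, op, source, dtype=None):
        self.op, self.source, self.dtype = op, source, dtype

    def set_output_type(self, index, dtype):
        self.dtype = dtype

    def get_output(self, index):
        return (self.op, self.source, self.dtype)


class LegacyNetwork:
    """Stand-in for a network without add_cast."""

    def add_identity(self, tensor):
        return FakeLayer("identity", tensor)


class CastNetwork(LegacyNetwork):
    """Stand-in for a network that provides add_cast."""

    def add_cast(self, tensor, dtype):
        return FakeLayer("cast", tensor, dtype)


class ReadOnlyRangeTensor:
    """Stand-in for a tensor whose dynamic_range has no setter."""

    @property
    def dynamic_range(self):
        return None


def trt_set_dynamic_range(tensor, lo, hi):
    """Set the dynamic range where possible; otherwise do nothing."""
    try:
        tensor.dynamic_range = (lo, hi)
    except AttributeError:
        # Setter removed: strongly-typed networks do not use dynamic ranges.
        pass


def trt_cast_tensor(network, tensor, dtype):
    """Cast via add_cast when present, else add_identity + set_output_type."""
    if hasattr(network, "add_cast"):
        layer = network.add_cast(tensor, dtype)
    else:
        layer = network.add_identity(tensor)
        layer.set_output_type(0, dtype)
    return layer.get_output(0)
```

Keeping both branches behind one helper means the seven generator scripts call a single function instead of repeating the version probe at every cast site.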

Add a support_trt_int8_implicit_precision() helper in test_util.py
gated on BuilderFlag.INT8. validate_for_trt_model now drops int8 from
the supported dtype set on TRT 11+; the implicit INT8 path no longer
exists in strongly-typed networks and the QA generators don't emit
explicit QDQ nodes. Plan-INT8 model coverage is therefore deferred on
TRT 11 containers; the corresponding L0_* tests already skip when the
expected model is absent.
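
The gating described above could look roughly like the sketch below. The name support_trt_int8_implicit_precision comes from the commit message, but the signature is an assumption: here the flag enum is passed in so the example runs without TensorRT, whereas the real helper presumably inspects trt.BuilderFlag directly. trt_supported_dtypes is a hypothetical illustration of how validate_for_trt_model might narrow its dtype set, not its actual code.

```python
from types import SimpleNamespace


def support_trt_int8_implicit_precision(builder_flag):
    """True when the flag enum still defines INT8 (implicit INT8 path exists)."""
    return hasattr(builder_flag, "INT8")


def trt_supported_dtypes(base_dtypes, builder_flag):
    """Drop int8 from the supported set when implicit INT8 is unavailable."""
    dtypes = set(base_dtypes)
    if not support_trt_int8_implicit_precision(builder_flag):
        dtypes.discard("int8")
    return dtypes


# Stand-ins for the flag enum before and after the INT8 member was removed.
legacy_builder_flag = SimpleNamespace(INT8="INT8", FP16="FP16")
strict_builder_flag = SimpleNamespace()
```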

Verified against the 26.06 TRT image by running every TensorRT entry
point in every gen_qa_*.py file -- all 14 invocations exercised by
gen_qa_model_repository now reach engine generation successfully.
… TRT 11

The 26.06 GenModels-build--sbsa-dgx_spark-12.1 job failed in
create_plan_modelfile with:

  TypeError: add_plugin_v2(): incompatible function arguments. The
  following argument types are supported:
    1. (self: INetworkDefinition, inputs: ..., plugin: IPluginV2)
       -> IPluginV2Layer
  Invoked with: ...; kwargs: ..., plugin=<IPluginV3 object ...>

Root cause: the V2-vs-V3 plugin dispatch was split between two
independent probes:

  * Plugin creation: get_trt_plugin() picked V3 when
    "registry.plugin_creator_list" was absent (true on TRT 11), so it
    called create_plugin(..., phase=trt.TensorRTPhase.BUILD) and
    returned an IPluginV3.
  * Network add: create_plan_modelfile() picked the V2 path when
    "hasattr(network, 'add_plugin_v2')" was True -- but TRT 11 still
    binds add_plugin_v2 on INetworkDefinition; it just refuses
    IPluginV3 arguments. The probes disagreed and a V3 plugin object
    landed in the V2 add path.

Hoist the V2/V3 decision to a module-level constant
TRT_USES_V3_PLUGINS (computed once from the registry probe at import
time) and use it for both plugin creation and network dispatch so
both halves agree.
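
The single-probe idea can be illustrated as follows. This is a sketch under assumptions, not the generator's code: uses_v3_plugins and add_plugin_layer are hypothetical names, the registry and network classes are stand-ins, and the empty shape_inputs list passed to add_plugin_v3 is an illustrative choice. In the real script the result of the probe would be stored once in TRT_USES_V3_PLUGINS at import time.

```python
class V3OnlyRegistry:
    """Stand-in registry without plugin_creator_list (the TRT 11 case)."""


class LegacyRegistry:
    """Stand-in registry that still exposes plugin_creator_list."""

    plugin_creator_list = []


class RecordingNetwork:
    """Stand-in network that records which add path was taken."""

    def __init__(self):
        self.calls = []

    def add_plugin_v2(self, inputs, plugin):
        self.calls.append("v2")
        return "v2-layer"

    def add_plugin_v3(self, inputs, shape_inputs, plugin):
        self.calls.append("v3")
        return "v3-layer"


def uses_v3_plugins(registry):
    """One probe, evaluated once, shared by creation and network dispatch."""
    return not hasattr(registry, "plugin_creator_list")


def add_plugin_layer(network, inputs, plugin, use_v3):
    """Dispatch on the shared decision, never on hasattr(network, ...)."""
    if use_v3:
        return network.add_plugin_v3(inputs, [], plugin)
    return network.add_plugin_v2(inputs, plugin)
```

Note that RecordingNetwork exposes both add_plugin_v2 and add_plugin_v3, as TRT 11 is reported to do, so a hasattr check on the network cannot distinguish the two cases; only the registry probe can.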

Verified on TRT 11.0.0 (26.06 container) that
TRT_USES_V3_PLUGINS evaluates True and both call sites read it.

The enroot launches for openvino/onnxruntime/pytorch/tensorrt were not
forwarding TRITON_GENSRCDIR, causing gen_qa_models.py to fall back to
the relative path 'gen_srcdir' and fail with FileNotFoundError on
resnet50_labels.txt during the torchvision AOTI step. Mirror the docker
path which already passes -e TRITON_GENSRCDIR=$TRITON_MDLS_SRC_DIR.
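
The failure mode can be reproduced in a few lines. This sketch assumes the generator reads the variable with a plain environment lookup and a 'gen_srcdir' default, as the message implies; resolve_gen_srcdir and labels_path are hypothetical names, and the /mnt/models_src path in the usage note is made up.

```python
import os


def resolve_gen_srcdir(environ=os.environ):
    """Return TRITON_GENSRCDIR, or the relative fallback when it is unset."""
    return environ.get("TRITON_GENSRCDIR", "gen_srcdir")


def labels_path(environ=os.environ):
    """Path where the torchvision AOTI step would look for the labels file."""
    return os.path.join(resolve_gen_srcdir(environ), "resnet50_labels.txt")
```

With an empty environment, labels_path({}) resolves relative to the current working directory, which is why the lookup fails inside a container that was never given the variable. Passing {"TRITON_GENSRCDIR": "/mnt/models_src"} yields a path under that directory instead.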
Comment thread qa/common/gen_common.py Fixed
Comment thread qa/common/gen_qa_dyna_sequence_implicit_models.py Fixed
Comment thread qa/common/gen_qa_dyna_sequence_implicit_models.py Fixed
Comment thread qa/common/gen_qa_dyna_sequence_models.py Fixed
Comment thread qa/common/gen_qa_dyna_sequence_models.py Fixed
Comment thread qa/common/gen_qa_models.py Fixed
Comment thread qa/common/gen_qa_sequence_models.py Fixed
Comment thread qa/common/gen_qa_sequence_models.py Fixed
Comment thread qa/common/gen_qa_sequence_models.py Fixed
Comment thread qa/common/gen_qa_sequence_models.py Fixed
Add inline rationale on each `except AttributeError: pass` introduced
for TRT 11 compatibility (ITensor.dtype setter and dynamic_range setter
removed in strongly-typed networks). Silences 20 CodeQL "Empty except"
findings on PR #8828.
@mc-nv mc-nv marked this pull request as ready for review June 10, 2026 20:39