Add TensorRT Multi-Device (multi-GPU) inference support by pkisfaludi-nv · Pull Request #120 · triton-inference-server/tensorrt_backend

pkisfaludi-nv · 2026-06-10T02:22:42Z

Summary

Adds TensorRT Multi-Device (MD) support to the tensorrt backend: run a single TensorRT engine sharded across multiple GPUs via NCCL DistCollective + IExecutionContext::setCommunicator (GA in TensorRT 11), transparently to clients (same gRPC/HTTP API).

This mirrors an internal change; submitting upstream as requested by the backend maintainers.

Usage

Enable per model with a KIND_MODEL instance group + parameters:

instance_group [ { kind: KIND_MODEL count: 1 } ]
parameters [
  { key: "enable_multi_device"           value: { string_value: "true" } },
  { key: "multi_device_gpus"             value: { string_value: "0,1" } },
  { key: "multi_device_per_rank_engines" value: { string_value: "true" } }
]

See docs/multi_device.md.
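
The `multi_device_gpus` value is passed as a comma-separated string. A minimal sketch of validating such a value before use, assuming non-negative integer device IDs with no duplicates. `parse_gpu_list` is a hypothetical helper for illustration, not part of the backend, and the backend's own parsing rules may differ:

```python
def parse_gpu_list(value: str) -> list[int]:
    """Parse a comma-separated GPU list such as "0,1" into device IDs.

    Hypothetical helper; assumes non-negative integer IDs, no duplicates.
    """
    ids = []
    for token in value.split(","):
        token = token.strip()
        # Reject empty tokens, signs, and anything that is not ASCII digits.
        if not (token.isascii() and token.isdigit()):
            raise ValueError(f"invalid GPU id: {token!r}")
        ids.append(int(token))
    if len(set(ids)) != len(ids):
        raise ValueError(f"duplicate GPU id in {value!r}")
    return ids


print(parse_gpu_list("0,1"))  # [0, 1]
```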

Implementation

Runtime: ncclCommInitAll, per-rank deserialize + concurrent setCommunicator (sequential calls deadlock), adaptive P2P/host input replication, fan-out enqueueV3, rank-0 output.
Supports offline-sharded engines and per-rank weight-shard (tensor-parallel) engines (multi_device_per_rank_engines).
Built behind -DTRITON_ENABLE_TENSORRT_MULTI_DEVICE=ON (TensorRT >= 11 + NCCL); default off, so non-MD builds/models are unchanged.
docs/ includes the engine builders used for testing.
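
The note that sequential `setCommunicator` calls deadlock is consistent with how collective operations behave in general: each rank blocks until every other rank has joined. The sketch below illustrates that pattern with a `threading.Barrier` standing in for the collective. It is an analogy in Python, not the backend's C++ code, and `init_rank` and `init_all_ranks` are hypothetical names:

```python
import threading


def init_all_ranks(world_size: int, timeout_s: float = 5.0) -> list[int]:
    """Run a blocking, collective-style init step on every rank at once."""
    barrier = threading.Barrier(world_size)  # stand-in for the collective
    ready: list[int] = []
    lock = threading.Lock()

    def init_rank(rank: int) -> None:
        # Blocks until all `world_size` ranks have arrived. Calling this for
        # rank 0 and waiting for it to return before starting rank 1 would
        # never complete, which is why the ranks are launched concurrently.
        barrier.wait(timeout=timeout_s)
        with lock:
            ready.append(rank)

    threads = [
        threading.Thread(target=init_rank, args=(rank,))
        for rank in range(world_size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(ready)


print(init_all_ranks(2))  # [0, 1]
```

One thread per rank keeps the blocking call on each rank independent, at the cost of a join step before the model can be reported ready.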

Validation

Validated on 2× and 8× B200 (NVLink): a sharded model across 2 GPUs matches the 1-GPU baseline (rel_max ~3.6e-3); server logs TensorRT Multi-Device ready: N ranks; both GPUs active.
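
The description does not define `rel_max`. One common reading, assumed here, is the maximum absolute difference between the two outputs divided by the maximum absolute value of the baseline. A minimal sketch under that assumption, using made-up sample values rather than measurements from this PR:

```python
def rel_max(baseline: list[float], candidate: list[float]) -> float:
    """Max absolute difference, normalized by the baseline's max magnitude.

    Assumed definition for illustration only.
    """
    if len(baseline) != len(candidate):
        raise ValueError("outputs must have the same length")
    scale = max(abs(x) for x in baseline)
    if scale == 0.0:
        raise ValueError("baseline is all zeros; relative error is undefined")
    return max(abs(a - b) for a, b in zip(baseline, candidate)) / scale


# Illustrative values, not measurements from the PR.
one_gpu = [1.0, -2.0, 4.0]
two_gpu = [1.0, -2.0, 3.99]
print(rel_max(one_gpu, two_gpu) < 5e-3)  # True
```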

Notes

Requires TensorRT >= 11.
DCO signed-off. Happy to split into smaller commits or adjust naming per maintainer preference.

Run a single TensorRT engine sharded across multiple GPUs via TensorRT Multi-Device (NCCL DistCollective + IExecutionContext::setCommunicator, GA in TensorRT 11). Enabled per-model with a KIND_MODEL instance group plus parameters enable_multi_device, multi_device_gpus, and multi_device_per_rank_engines; transparent to clients (same gRPC/HTTP API).

Runtime: ncclCommInitAll, per-rank deserialize + setCommunicator (concurrent), adaptive P2P/host input replication, fan-out enqueueV3, rank-0 output. Supports offline-sharded engines and per-rank weight-shard (tensor-parallel) engines.

docs/multi_device.md documents configuration + validation; docs/ includes engine builders used for testing. Validated on 2x and 8x B200 (NVLink).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Peter Kisfaludi <pkisfaludi@nvidia.com>

pkisfaludi-nv · 2026-06-10T02:30:19Z

cc @mc-nv @whoisj — this is the upstream version of the internal MR you reviewed (TRT-28040, TensorRT Multi-Device / multi-GPU support), submitted here at @mc-nv's request. Would appreciate your review when you have a chance. I couldn't add you as formal reviewers from a fork — please assign yourselves (or let me know who should own it). Thanks!

mc-nv · 2026-06-10T04:27:44Z

@pkisfaludi-nv
I've changed the target branch to r26.06; the default branch doesn't support TensorRT 11.
I tested your changes today and they passed the build process.
#121

cc: @whoisj

mc-nv changed the base branch from main to r26.06 June 10, 2026 04:05

pkisfaludi-nv wants to merge 1 commit into triton-inference-server:r26.06 from pkisfaludi-nv:feat/trt-28040-multi-device
