Add TensorRT Multi-Device (multi-GPU) inference support #121

Open: mc-nv wants to merge 2 commits into r26.06 from mchornyi/TRI-1406/prepare-26.06

mc-nv commented Jun 10, 2026

Contributor

Run a single TensorRT engine sharded across multiple GPUs via TensorRT Multi-Device (NCCL DistCollective + IExecutionContext::setCommunicator, GA in TensorRT 11). Enabled per-model with a KIND_MODEL instance group plus parameters enable_multi_device, multi_device_gpus, and multi_device_per_rank_engines; transparent to clients (same gRPC/HTTP API).
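As a rough illustration of the per-model opt-in described above, a `config.pbtxt` fragment might look like the sketch below. The `KIND_MODEL` instance group and the parameter names come from this description; the value formats (`"true"`, a comma-separated GPU list) are assumptions, and `multi_device_per_rank_engines` is left out because its expected value is not stated here. docs/multi_device.md in this PR is the authoritative reference.

```
# config.pbtxt (sketch; parameter value formats are assumptions)
instance_group [
  {
    kind: KIND_MODEL
  }
]
parameters: {
  key: "enable_multi_device"
  value: { string_value: "true" }   # assumed format
}
parameters: {
  key: "multi_device_gpus"
  value: { string_value: "0,1" }    # assumed: comma-separated device IDs
}
```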

Runtime: ncclCommInitAll, per-rank deserialize + setCommunicator (concurrent), adaptive P2P/host input replication, fan-out enqueueV3, rank-0 output. Supports offline-sharded engines and per-rank weight-shard (tensor-parallel) engines.
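The runtime sequence above can be pictured with a small, library-free simulation. This is only a sketch under stated assumptions: `FakeRank`, `setup_rank`, `run_inference`, and `demo` are hypothetical names, no TensorRT, NCCL, or CUDA call is made, and threads stand in for per-GPU work. It shows the ordering the description names: concurrent per-rank setup, input replicated to every rank, enqueue fanned out to all ranks, and only rank 0's output returned.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class FakeRank:
    """Stand-in for one GPU's engine and execution context (hypothetical)."""
    rank: int
    ready: bool = False

    def enqueue(self, replicated_input):
        # A real rank would launch its engine shard; here every rank just
        # computes the same sum so rank 0 has something to return.
        if not self.ready:
            raise RuntimeError(f"rank {self.rank} was not set up")
        return sum(replicated_input)


def setup_rank(rank):
    # Placeholder for per-rank deserialize + communicator attachment.
    return FakeRank(rank=rank, ready=True)


def run_inference(ranks, host_input, pool):
    # Replicate the input so each rank owns an independent copy.
    copies = [list(host_input) for _ in ranks]
    # Fan out: submit every rank before waiting on any of them.
    futures = [pool.submit(r.enqueue, c) for r, c in zip(ranks, copies)]
    outputs = [f.result() for f in futures]
    # Only rank 0's output is returned to the caller.
    return outputs[0]


def demo(num_ranks=2):
    with ThreadPoolExecutor(max_workers=num_ranks) as pool:
        # Concurrent per-rank setup; map() preserves rank order.
        ranks = list(pool.map(setup_rank, range(num_ranks)))
        return ranks, run_inference(ranks, [1.0, 2.0, 3.0], pool)
```

Submitting every rank before waiting on any result mirrors a general property of collective communication: a rank that reaches a collective blocks until its peers reach it too, so launching ranks one at a time and waiting in between would stall.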

docs/multi_device.md documents configuration + validation; docs/ includes engine builders used for testing. Validated on 2x and 8x B200 (NVLink).

pkisfaludi-nv and others added 2 commits June 10, 2026 04:22
Run a single TensorRT engine sharded across multiple GPUs via TensorRT
Multi-Device (NCCL DistCollective + IExecutionContext::setCommunicator, GA in
TensorRT 11). Enabled per-model with a KIND_MODEL instance group plus parameters
enable_multi_device, multi_device_gpus, and multi_device_per_rank_engines;
transparent to clients (same gRPC/HTTP API).

Runtime: ncclCommInitAll, per-rank deserialize + setCommunicator (concurrent),
adaptive P2P/host input replication, fan-out enqueueV3, rank-0 output. Supports
offline-sharded engines and per-rank weight-shard (tensor-parallel) engines.

docs/multi_device.md documents configuration + validation; docs/ includes engine
builders used for testing. Validated on 2x and 8x B200 (NVLink).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Peter Kisfaludi <pkisfaludi@nvidia.com>
mc-nv changed the base branch from main to r26.06 June 10, 2026 04:31
mc-nv self-assigned this Jun 10, 2026
mc-nv requested review from Vinya567, pskiran1 and yinggeh June 10, 2026 22:14
mc-nv marked this pull request as ready for review June 10, 2026 22:36
mc-nv requested a review from whoisj June 10, 2026 22:37
