🐛 [Bug] Torch-TensorRT Slower than Onnx TensorRT in Alpamayo-10B #4239

@cehongwang

Description

Benchmark Result

Per-clip minADE (m)

| # | Clip | ONNX-TRT | Torch-TRT | Δ |
|---|------|----------|-----------|---|
| 1 | 0043b781… | 0.1719 | 0.1721 | +0.0002 |
| 2 | 00ee8960… | 0.9539 | 0.9582 | +0.0043 |
| 3 | 0145f6e0… | 0.3925 | 0.3929 | +0.0004 |
| 4 | 01460b45… | 0.0876 | 0.0874 | −0.0002 |
| 5 | 01cf8186… | 0.8914 | 0.7777 | −0.1137 |
| 6 | 021a4585… | 0.0976 | 0.1003 | +0.0027 |
| 7 | 0347d9f9… | 1.5423 | 1.5467 | +0.0044 |
| 8 | 03aa0b51… | 1.3659 | 1.3488 | −0.0171 |
| 9 | 0455a6e4… | 0.2546 | 0.2537 | −0.0009 |
| 10 | 047c0263… | 0.5622 | 0.5603 | −0.0019 |
| 11 | 049445b3… | 0.4437 | 0.4453 | +0.0016 |
| | Average | 0.6149 | 0.6039 | −0.0110 |
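
For anyone who wants to re-check the accuracy numbers, here is a minimal sketch that recomputes the per-clip deltas and the averages from the values in the table above. The 1 cm cut-off (`THRESHOLD_M`) is an arbitrary choice made for this sketch, not something taken from the benchmark script.

```python
# Per-clip minADE in metres, copied from the table above (clips 1..11).
onnx_trt = [0.1719, 0.9539, 0.3925, 0.0876, 0.8914, 0.0976,
            1.5423, 1.3659, 0.2546, 0.5622, 0.4437]
torch_trt = [0.1721, 0.9582, 0.3929, 0.0874, 0.7777, 0.1003,
             1.5467, 1.3488, 0.2537, 0.5603, 0.4453]

THRESHOLD_M = 0.01  # 1 cm; assumed cut-off for "noticeable" differences

# Delta is Torch-TRT minus ONNX-TRT, so negative means Torch-TRT is closer.
deltas = [round(t - o, 4) for o, t in zip(onnx_trt, torch_trt)]

mean_onnx = sum(onnx_trt) / len(onnx_trt)
mean_torch = sum(torch_trt) / len(torch_trt)

# 1-based clip numbers whose absolute delta reaches the threshold.
outliers = [i + 1 for i, d in enumerate(deltas) if abs(d) >= THRESHOLD_M]

print(f"mean ONNX-TRT : {mean_onnx:.4f} m")
print(f"mean Torch-TRT: {mean_torch:.4f} m")
print(f"clips with |delta| >= {THRESHOLD_M} m: {outliers}")
```

With the table values this flags clips 5 and 8; every other clip differs by less than 1 cm.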

Stage timings (mean of clips 2–11, ms; clip 1 excluded as warm-up)

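| Stage | ONNX-TRT | Torch-TRT | Δ | Δ% |
|-------|----------|-----------|---|----|
| ViT | 364.0 | 365.4 | +1.4 | +0.4% |
| LLM Prefill | 890.7 | 875.0 | −15.7 | −1.8% |
| LLM Generation | 146.5 | 188.7 | +42.2 | +28.8% |
| Diffusor | 169.9 | 173.5 | +3.6 | +2.1% |
| E2E | 1571.9 | 1602.6 | +30.7 | +2.0% |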
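
A small sketch for locating the regression: it recomputes Δ and Δ% per stage from the means above and picks the stage with the largest slowdown. It uses only the table values; the helper name `stage_delta` is made up for this example. Note that the four per-stage deltas sum to +31.5 ms, slightly more than the +30.7 ms E2E delta; the report does not say where the 0.8 ms difference comes from.

```python
# Stage means in ms, copied from the table above: (ONNX-TRT, Torch-TRT).
stage_ms = {
    "ViT": (364.0, 365.4),
    "LLM Prefill": (890.7, 875.0),
    "LLM Generation": (146.5, 188.7),
    "Diffusor": (169.9, 173.5),
}


def stage_delta(onnx_ms, torch_ms):
    """Return (absolute delta in ms, relative delta in %) for one stage."""
    diff = torch_ms - onnx_ms
    return diff, 100.0 * diff / onnx_ms


rows = {name: stage_delta(*pair) for name, pair in stage_ms.items()}

# Net change across the four stages, and the stage that slowed down the most.
net_ms = sum(diff for diff, _ in rows.values())
worst_stage = max(rows, key=lambda name: rows[name][0])

for name, (diff, pct) in rows.items():
    print(f"{name:<15} {diff:+7.1f} ms  {pct:+6.1f}%")
print(f"net over stages: {net_ms:+.1f} ms, worst stage: {worst_stage}")
```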

Engine sizes

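| Engine | ONNX-TRT (MiB) | Torch-TRT (MiB) | Δ (MiB) |
|--------|----------------|-----------------|---------|
| LLM | 14484 | 14495 | +11 |
| Visual | 1106 | 1114 | +8 |
| Action | 4357 | 4380 | +23 |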

Shared execution context (per-runner workspace, bytes)

| Runner | ONNX-TRT | Torch-TRT | Ratio |
|--------|----------|-----------|-------|
| LLM | 2,776,893,952 | 2,776,893,952 | 1.0× |
| Vision | 4,074,504,192 | 4,074,505,216 | 1.0× |
| Action | 265,029,632 | 2,025,095,168 | ~7.6× |

The peak shared execution context is set by the vision runner (~4.07 GB) in both cases, so peak GPU memory is unchanged. Only the action runner's reserved workspace grows under Torch-TRT, from about 265 MB to about 2.03 GB.
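
The memory argument can be checked with a few lines of arithmetic. This sketch assumes the shared execution context is sized to the largest per-runner workspace (which is how the paragraph above reads); it does not query TensorRT, it only uses the byte counts from the table.

```python
# Per-runner workspace in bytes, copied from the table: (ONNX-TRT, Torch-TRT).
workspace_bytes = {
    "LLM": (2_776_893_952, 2_776_893_952),
    "Vision": (4_074_504_192, 4_074_505_216),
    "Action": (265_029_632, 2_025_095_168),
}

# Assumption: the shared context must fit the largest runner, so peak = max.
peak_onnx = max(onnx for onnx, _ in workspace_bytes.values())
peak_torch = max(torch for _, torch in workspace_bytes.values())

# Torch-TRT / ONNX-TRT workspace ratio for each runner.
ratios = {name: torch / onnx for name, (onnx, torch) in workspace_bytes.items()}

print(f"peak ONNX-TRT : {peak_onnx / 1e9:.2f} GB")
print(f"peak Torch-TRT: {peak_torch / 1e9:.2f} GB")
for name, ratio in ratios.items():
    print(f"{name:<7} ratio: {ratio:.2f}x")
```

Under that assumption the two peaks differ by only 1024 bytes, while the action runner's ratio comes out at about 7.6×.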

Verdict

  • Accuracy: equivalent on average (Torch-TRT is about 1.1 cm better on this 11-clip subset). Per-clip differences are below 1 cm except for clip 5, where Torch-TRT is 11.4 cm closer, and clip 8, where it is 1.7 cm closer.
  • Throughput: Torch-TRT is ~2% slower end-to-end (+30.7 ms). The regression is concentrated in LLM generation (decode), at +42.2 ms (+28.8%); prefill is marginally faster (−1.8%).
  • Memory: the action-runner workspace is ~7.6× larger under Torch-TRT (peak GPU memory is unchanged because the vision runner dominates).
