🐛 [Bug] [Alpamayo 1.5] [H200] CUDA OutOfMemory error when compiling #4227

## Bug Description
Clone the repo `https://github.com/cehongwang/alpamayo1.5`, then run the following command on an H200:
```
PYTHONPATH=src python src/alpamayo1_5/eval.py --compile_trt
```
Error message:
```
2026-05-01 00:04:05,398 - torch_tensorrt [TensorRT Conversion Context] - ERROR - [defaultAllocator.cpp::allocate::59] Error Code 1: Cuda Runtime (In allocate at /_src/common/dispatch/defaultAllocator.cpp:59)
2026-05-01 00:04:05,398 - torch_tensorrt [TensorRT Conversion Context] - WARNING - Requested amount of GPU memory (149635638784 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
2026-05-01 00:04:05,399 - torch_tensorrt [TensorRT Conversion Context] - ERROR - [executionContext.cpp::initializeExecutionContext::644] Error Code 2: OutOfMemory (Requested size was 149635638784 bytes.)
Traceback (most recent call last):
  File "/home/scratch.zewenl_sw/docker_workspace/cehongwang/alpamayo1.5/src/alpamayo1_5/eval.py", line 331, in <module>
    main()
  File "/home/scratch.zewenl_sw/docker_workspace/cehongwang/alpamayo1.5/src/alpamayo1_5/eval.py", line 224, in main
    trt_vision, trt_lm, trt_diffusion, prefix_seq_len = compile_trt_modules(
                                                        ^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.zewenl_sw/docker_workspace/cehongwang/alpamayo1.5/src/alpamayo1_5/trt/compile_trt.py", line 170, in compile_trt_modules
    trt_lm = compile_language_trt(
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.zewenl_sw/docker_workspace/cehongwang/alpamayo1.5/src/alpamayo1_5/trt/compile_trt.py", line 87, in compile_language_trt
    compiled_model = compile_vlm_lm_trt_with_cache(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.zewenl_sw/docker_workspace/cehongwang/alpamayo1.5/src/alpamayo1_5/trt/lm_with_cache.py", line 768, in compile_vlm_lm_trt_with_cache
    trt_prefill = torch_tensorrt.dynamo.compile(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/scratch/docker_workspace/cehongwang/alpamayo1.5/a1_5_venv_h200/lib/python3.12/site-packages/torch_tensorrt/dynamo/_compiler.py", line 804, in compile
    trt_gm = compile_module(
             ^^^^^^^^^^^^^^^
  File "/mnt/scratch/docker_workspace/cehongwang/alpamayo1.5/a1_5_venv_h200/lib/python3.12/site-packages/torch_tensorrt/dynamo/_compiler.py", line 1050, in compile_module
    trt_module = convert_module(
                 ^^^^^^^^^^^^^^^
  File "/mnt/scratch/docker_workspace/cehongwang/alpamayo1.5/a1_5_venv_h200/lib/python3.12/site-packages/torch_tensorrt/dynamo/conversion/_conversion.py", line 361, in convert_module
    return rt_cls(
           ^^^^^^^
  File "/mnt/scratch/docker_workspace/cehongwang/alpamayo1.5/a1_5_venv_h200/lib/python3.12/site-packages/torch_tensorrt/dynamo/runtime/_PythonTorchTensorRTModule.py", line 230, in __init__
    self.setup_engine()
  File "/mnt/scratch/docker_workspace/cehongwang/alpamayo1.5/a1_5_venv_h200/lib/python3.12/site-packages/torch_tensorrt/dynamo/runtime/_PythonTorchTensorRTModule.py", line 294, in setup_engine
    assert self.context is not None, "Failed to create execution context"
           ^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Failed to create execution context
```
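To put the requested allocation in perspective, the byte count from the log can be converted to GB/GiB and compared with what the device reports. The snippet below is a minimal sketch, not part of the repo: `parse_requested_bytes`, `to_gib`, and `describe_headroom` are hypothetical helper names, and `describe_headroom` assumes a CUDA-enabled `torch` install. The H200 is nominally a 141 GB card, so a single request of roughly 149.6 GB appears to exceed the total device memory rather than just the free portion.

```python
import re
from typing import Optional

# Matches the TensorRT line "OutOfMemory (Requested size was N bytes.)".
_REQUESTED = re.compile(r"Requested size was (\d+) bytes")


def parse_requested_bytes(log_text: str) -> Optional[int]:
    """Return the requested allocation size in bytes, or None if not found."""
    match = _REQUESTED.search(log_text)
    return int(match.group(1)) if match else None


def to_gib(num_bytes: int) -> float:
    """Convert bytes to binary gibibytes (2**30 bytes)."""
    return num_bytes / 2**30


def describe_headroom(requested_bytes: int) -> dict:
    """Compare a request with the current CUDA device (needs torch + a GPU)."""
    import torch  # imported lazily so the parsing helpers work without torch

    free_bytes, total_bytes = torch.cuda.mem_get_info()
    return {
        "requested_gib": to_gib(requested_bytes),
        "free_gib": to_gib(free_bytes),
        "total_gib": to_gib(total_bytes),
        "exceeds_total": requested_bytes > total_bytes,
    }


# Log line copied from the error message above.
LOG_LINE = (
    "[executionContext.cpp::initializeExecutionContext::644] "
    "Error Code 2: OutOfMemory (Requested size was 149635638784 bytes.)"
)

requested = parse_requested_bytes(LOG_LINE)
print(f"requested: {requested / 1e9:.1f} GB ({to_gib(requested):.2f} GiB)")
```

On the affected machine, calling `describe_headroom(requested)` just before `compile_trt_modules` would show how much memory is actually free at that point.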

## Environment

GPU: H200

| Package | Version |
| --- | --- |
| tensorrt | 10.16.1.11 |
| tensorrt_cu13 | 10.16.1.11 |
| tensorrt_cu13_bindings | 10.16.1.11 |
| tensorrt_cu13_libs | 10.16.1.11 |
| torch | 2.11.0 |
| torch_tensorrt | 2.11.0 |
| torchvision | 0.26.0 |
| transformers | 4.57.1 |
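
For anyone trying to reproduce this, the package versions listed above can be collected from the active virtual environment with the standard library. This is a small sketch; `collect_versions` is a hypothetical helper name, and the package list simply mirrors the one in this report.

```python
from importlib import metadata

# Distribution names taken from the environment listing in this report.
PACKAGES = [
    "tensorrt",
    "tensorrt_cu13",
    "tensorrt_cu13_bindings",
    "tensorrt_cu13_libs",
    "torch",
    "torch_tensorrt",
    "torchvision",
    "transformers",
]


def collect_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for package, version in collect_versions(PACKAGES).items():
    print(f"{package:<24} {version or 'not installed'}")
```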

