🐛 [Bug] [Alpamayo 1.5] [H200] CUDA OutOfMemory error when compiling #4227

@zewenli98

Bug Description

Clone the repo: https://github.com/cehongwang/alpamayo1.5
Then run the following command on an H200:

PYTHONPATH=src python src/alpamayo1_5/eval.py --compile_trt

Error message:

2026-05-01 00:04:05,398 - torch_tensorrt [TensorRT Conversion Context] - ERROR - [defaultAllocator.cpp::allocate::59] Error Code 1: Cuda Runtime (In allocate at /_src/common/dispatch/defaultAllocator.cpp:59)
2026-05-01 00:04:05,398 - torch_tensorrt [TensorRT Conversion Context] - WARNING - Requested amount of GPU memory (149635638784 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
2026-05-01 00:04:05,399 - torch_tensorrt [TensorRT Conversion Context] - ERROR - [executionContext.cpp::initializeExecutionContext::644] Error Code 2: OutOfMemory (Requested size was 149635638784 bytes.)
Traceback (most recent call last):
  File "/home/scratch.zewenl_sw/docker_workspace/cehongwang/alpamayo1.5/src/alpamayo1_5/eval.py", line 331, in <module>
    main()
  File "/home/scratch.zewenl_sw/docker_workspace/cehongwang/alpamayo1.5/src/alpamayo1_5/eval.py", line 224, in main
    trt_vision, trt_lm, trt_diffusion, prefix_seq_len = compile_trt_modules(
                                                        ^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.zewenl_sw/docker_workspace/cehongwang/alpamayo1.5/src/alpamayo1_5/trt/compile_trt.py", line 170, in compile_trt_modules
    trt_lm = compile_language_trt(
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.zewenl_sw/docker_workspace/cehongwang/alpamayo1.5/src/alpamayo1_5/trt/compile_trt.py", line 87, in compile_language_trt
    compiled_model = compile_vlm_lm_trt_with_cache(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/scratch.zewenl_sw/docker_workspace/cehongwang/alpamayo1.5/src/alpamayo1_5/trt/lm_with_cache.py", line 768, in compile_vlm_lm_trt_with_cache
    trt_prefill = torch_tensorrt.dynamo.compile(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/scratch/docker_workspace/cehongwang/alpamayo1.5/a1_5_venv_h200/lib/python3.12/site-packages/torch_tensorrt/dynamo/_compiler.py", line 804, in compile
    trt_gm = compile_module(
             ^^^^^^^^^^^^^^^
  File "/mnt/scratch/docker_workspace/cehongwang/alpamayo1.5/a1_5_venv_h200/lib/python3.12/site-packages/torch_tensorrt/dynamo/_compiler.py", line 1050, in compile_module
    trt_module = convert_module(
                 ^^^^^^^^^^^^^^^
  File "/mnt/scratch/docker_workspace/cehongwang/alpamayo1.5/a1_5_venv_h200/lib/python3.12/site-packages/torch_tensorrt/dynamo/conversion/_conversion.py", line 361, in convert_module
    return rt_cls(
           ^^^^^^^
  File "/mnt/scratch/docker_workspace/cehongwang/alpamayo1.5/a1_5_venv_h200/lib/python3.12/site-packages/torch_tensorrt/dynamo/runtime/_PythonTorchTensorRTModule.py", line 230, in __init__
    self.setup_engine()
  File "/mnt/scratch/docker_workspace/cehongwang/alpamayo1.5/a1_5_venv_h200/lib/python3.12/site-packages/torch_tensorrt/dynamo/runtime/_PythonTorchTensorRTModule.py", line 294, in setup_engine
    assert self.context is not None, "Failed to create execution context"
           ^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Failed to create execution context
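
For scale, the size in the log above can be converted to GiB and compared against what the device reports as free. This is only a rough sketch for triage: the helper names `bytes_to_gib` and `fits` are hypothetical and are not part of torch_tensorrt or TensorRT, and the free-memory figure has to come from your own machine.

```python
# Size that TensorRT reported it could not allocate (copied from the log above).
REQUESTED_BYTES = 149_635_638_784


def bytes_to_gib(n_bytes: int) -> float:
    """Convert a byte count to gibibytes (1 GiB = 2**30 bytes)."""
    return n_bytes / 2**30


def fits(requested_bytes: int, free_bytes: int) -> bool:
    """Return True if the request is no larger than the free memory given.

    Hypothetical helper: free_bytes must be measured on the target GPU.
    """
    return requested_bytes <= free_bytes


if __name__ == "__main__":
    print(f"requested: {bytes_to_gib(REQUESTED_BYTES):.2f} GiB")
```

The request works out to roughly 139.36 GiB in a single allocation. On a machine with PyTorch installed, `torch.cuda.mem_get_info()` returns a `(free, total)` pair in bytes, and the first value could be passed as `free_bytes` just before the compile call to see how far off the allocation is.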

Environment

GPU: H200

tensorrt 10.16.1.11
tensorrt_cu13 10.16.1.11
tensorrt_cu13_bindings 10.16.1.11
tensorrt_cu13_libs 10.16.1.11
torch 2.11.0
torch_tensorrt 2.11.0
torchvision 0.26.0
transformers 4.57.1
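
To compare against the versions listed above when reproducing, a small sketch like the following could print the installed versions. It assumes the distribution names match the names in the list, and `collect_versions` is a hypothetical helper, not something shipped with the repo.

```python
from importlib import metadata

# Distribution names taken from the environment list above (assumed to match
# the names the packages were installed under).
PACKAGES = ["tensorrt", "torch", "torch_tensorrt", "torchvision", "transformers"]


def collect_versions(packages):
    """Map each distribution name to its installed version string."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Keep missing packages visible instead of failing the whole report.
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in collect_versions(PACKAGES).items():
        print(name, version)
```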

Labels: bug