🐛 [Bug] Torch-TRT does not translate the softmax quantizer generated by ModelOpt FP8 MHA quantization

## Bug Description

ModelOpt inserts Q/DQ pairs around BMM1, softmax, and BMM2 to follow TensorRT's FP8 MHA pattern. However, Torch-TRT appears to omit the softmax quantizer, and as a result TensorRT does not pick up a fused MHA kernel.

## To Reproduce

Steps to reproduce the behavior:

1. Launch the nvidia pytorch container: `nvcr.io/nvidia/pytorch:26.03-py3`
2. Install transformers: `pip install transformers`
3. Install modelopt nightly: `pip install --upgrade "git+https://github.com/NVIDIA/Model-Optimizer.git@main"`
4. Run the attached script (`vit_fp8_mha_qdq_inspect.py`)

[vit_fp8_mha_qdq_inspect.log](https://github.com/user-attachments/files/26944094/vit_fp8_mha_qdq_inspect.log)
[vit_fp8_mha_qdq_inspect.py](https://github.com/user-attachments/files/26944093/vit_fp8_mha_qdq_inspect.py)

## Expected behavior

The softmax quantizers are converted to TensorRT Q/DQ layers along with the BMM quantizers.

## Environment

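See the steps to reproduce (`nvcr.io/nvidia/pytorch:26.03-py3` container, transformers, and ModelOpt installed from `main`).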

## Additional context

@narendasan Filed per our discussion.