Defect report: Quark BFPQuantizeDequantize abort during ORT session creation (YOLOv8x BFP16)
Title
BFPQuantizeDequantize custom op crashes ONNX Runtime during session creation (shape inference) for YOLOv8x BFP16; isolated to single node (idx=993); workaround: replace with Identity
Summary
Loading a BFP16-quantized YOLOv8x ONNX model in ONNX Runtime with Quark custom ops registered causes a native abort during InferenceSession creation.
I was experimenting with different data and op types in AMD Quark (from Ryzen AI Software 1.7.0, on a Ryzen AI HX 370 running Ubuntu 25.10, after building XRT from source and updating to the latest ROCm and ONNX runtimes). I used YOLOv8x as my float model and had successfully experimented with INT8 and BF16 before moving on to BFP16 and MX9.
Investigation shows the abort occurs inside the Quark custom ops library (libcustom_ops.so) during shape inference for BFPFixNeuron.
Graph bisection isolated the crash to a single BFPQuantizeDequantize node (node index 993). Replacing only that node with Identity unblocks session creation.
A minimal standalone reproducer containing a BFPQuantizeDequantize node with the same input tensor rank/shape ([1,80,8400]) loads successfully, suggesting the failure is context-dependent (graph metadata / surrounding topology), not purely based on tensor shape.
Environment
- Host:
jc01 (Ryzen AI / AMD platform)
- OS:
Linux-6.17.0-12-generic-x86_64-with-glibc2.42
- Python:
3.12.12 | packaged by conda-forge | (main, Jan 26 2026, 23:51:32) [GCC 14.3.0]
- ONNX Runtime:
1.22.1
- Quark:
amd-quark 0.11 (import path: /home/johnk/miniforge3/envs/quark312/lib/python3.12/site-packages/quark/__init__.py)
- Custom ops library:
quark/onnx/operators/custom_ops/lib/libcustom_ops.so
- Execution Provider:
CPUExecutionProvider
- Custom op registration:
import onnxruntime as ort
from quark.onnx import get_library_path
so = ort.SessionOptions()
so.register_custom_ops_library(get_library_path("CPU"))
Problem details
- Failure mode: process abort during
onnxruntime.InferenceSession(...) creation.
- Observed abort signature: assertion failure from
std::vector::operator[] (native crash).
- Crash location: inside Quark custom ops library
libcustom_ops.so in BFPFixNeuron shape inference.
Evidence (gdb backtrace)
Attach the full backtrace captured on jc01. Key frames indicate a crash during custom-op shape inference inside libcustom_ops.so for BFPFixNeuron.
- gdb backtrace: TODO (paste full text)
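Until the full text is pasted, the backtrace can be re-captured with a gdb batch run along these lines (the repro script name and output path are illustrative, not from the original run):

```shell
# Run the failing repro under gdb and dump the native backtrace on abort.
# repro_bfp16_session.py is assumed to contain the InferenceSession snippet
# from "Steps to reproduce" below.
gdb --batch -ex run -ex "bt full" --args \
  python repro_bfp16_session.py 2>&1 | tee gdb-backtrace.txt
```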
Steps to reproduce
- Install Quark + ONNX Runtime (versions above).
- Ensure the Quark custom ops library is available.
- Register custom ops library and create a session:
import onnxruntime as ort
from quark.onnx import get_library_path
model_path = "yolov8x.bfp16.v1.onnx" # failing model
so = ort.SessionOptions()
so.register_custom_ops_library(get_library_path("CPU"))
sess = ort.InferenceSession(model_path, sess_options=so, providers=["CPUExecutionProvider"])
- Observe abort during session creation (native crash).
Bisection / isolation result
Graph bisection narrowed the crash to a single node:
- Node index:
993
- Op type:
BFPQuantizeDequantize
- Node name:
/model.22/Sigmoid_output_0_DequantizeLinear
- Input producer:
Sigmoid
- Observed input shape at that edge:
[1, 80, 8400]
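The bisection above can be automated as a binary search over node indices. The sketch below assumes a `crashes(k)` predicate (not from the original run) that builds a model with only the first `k` suspect nodes left intact, the rest swapped for Identity, and returns True if session creation still aborts; with a single culprit the predicate is monotonic, so ~11 probes suffice for ~2000 nodes:

```python
def find_culprit(num_nodes, crashes):
    """Binary-search for the single node whose presence triggers the crash.

    crashes(k) must return True when a model with only the first k suspect
    nodes left intact (the rest replaced by Identity) still aborts during
    session creation. Assumes exactly one culprit, so crashes() is
    monotonic in k: False for k <= culprit, True for k > culprit.
    """
    lo, hi = 0, num_nodes  # invariant: crashes(lo) False, crashes(hi) True
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if crashes(mid):
            hi = mid
        else:
            lo = mid
    return hi - 1  # index of the culprit node
```

With the YOLOv8x model this converges on index 993 after about log2(N) rebuild-and-load attempts instead of one per node.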
Workaround
Replace only node idx=993 with Identity (preserving input/output tensor name wiring) and save a patched model.
- Patched artifact:
yolov8x.bfp16.v1.patched-only-node993.onnx
- Result: ORT session creation succeeds reliably.
Session creation confirmation (patched model)
Run:
import onnxruntime as ort
from quark.onnx import get_library_path
model="/home/johnk/experiments/quark-yolov8x-exp/2026-02-09-jc01/models/onnx/bfp16/yolov8x.bfp16.v1.patched-only-node993.onnx"
so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
so.register_custom_ops_library(get_library_path("CPU"))
print("registered custom ops; optimizations disabled")
sess = ort.InferenceSession(model, sess_options=so, providers=["CPUExecutionProvider"])
print("PATCHED (only node 993) session OK")
Observed output:
registered custom ops; optimizations disabled
PATCHED (only node 993) session OK
Performance sanity check (patched model)
Configuration:
- ORT graph optimizations:
ORT_DISABLE_ALL
- Provider:
CPUExecutionProvider
- Input: random FP32 tensor with shape
(1,3,640,640)
- Warmup:
5
- Timed iterations:
50
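The harness producing the numbers below has this shape (a sketch; `run_once` stands in for `sess.run(None, {input_name: random_fp32_input})` on the patched session):

```python
import time

def benchmark(run_once, warmup=5, iters=50):
    """Timing harness matching the sanity check: warmup calls are
    discarded, then `iters` calls are timed as one block."""
    for _ in range(warmup):
        run_once()
    t0 = time.perf_counter()
    for _ in range(iters):
        run_once()
    sec_total = time.perf_counter() - t0
    return sec_total, sec_total / iters
```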
Result:
sec_total=640.4786179065704
sec_per_iter=12.809572358131408
out0_shape=(1, 84, 8400)
out0_dtype=float32
Additional observations
- A minimal standalone ONNX model containing only
BFPQuantizeDequantize with input shape [1,80,8400] loads successfully (after forcing ONNX IR version 10 in the minimal repro model).
- Earlier minimal repro also showed rank-0 (scalar) inputs can crash
BFPQuantizeDequantize shape inference (separate issue), but the YOLOv8x crash was isolated to the above non-scalar node.
- Since the standalone reproducer works, this points to a context-dependent shape inference bug in the custom op implementation (graph metadata/value_info/dynamic dims/multiple consumers/etc.).
Related issues (ruled out / adjacent symptoms)
These issues are not the same root cause as this defect, but they are related to earlier symptoms we ruled out while diagnosing Linux + native plugin/custom-op availability.
Expected behavior
- ORT session creation should not abort.
- If an input is unsupported, the custom op should return a recoverable error with a descriptive message rather than aborting.
Actual behavior
- Native abort during session initialization, no Python exception.
Related investigation notes
See:
docs/articles/productize_amd_ai_workflow/quark-experiments/2026-02-09-yolov8x-quark-pipeline-experiment.md