enhancement: winml perf/eval should fast-fail when given QDQ model targeting QNN GPU EP #868

@DingmaomaoBJTU

Background

winml build --device gpu correctly sets quant=null via _patch_device() — normal CLI paths never produce a QDQ quantized model for GPU. This issue was discovered during research experiments that manually bypassed that protection.

However, winml perf and winml eval accept any ONNX as input. A user who:

  • brings their own pre-quantized W8A8 QDQ ONNX, OR
  • hand-crafts a config.json bypassing the GPU quant guard

...will encounter an infinite hang with no output, no error, and no timeout.

Root cause

QNN GPU EP cannot handle QDQ INT8 graphs (Conv/Gemm/LayerNorm patterns) and hangs silently in graph compilation rather than returning an error. This is ultimately a QNN SDK / ORT behavior, but winml perf can add a defensive check to protect the user experience.

Proposed enhancement

In winml perf and winml eval, before session creation, check:

    if is_qdq_model(model_path) and ep == "qnn" and device == "gpu":
        raise CliError with clear guidance:
          - "QNN GPU EP does not support INT8 QDQ models"
          - "Use FP32 (default) or FP16 once #867 (--enable-fp16-conversion) is available"

The is_qdq_model() check can inspect the ONNX graph for QuantizeLinear + DequantizeLinear node pairs.
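
A minimal sketch of what that check and the fast-fail guard might look like, not the actual winml implementation. Assumptions: `has_qdq_pairs` and `fail_fast_on_qdq_gpu` are hypothetical helper names, `CliError` here is a local stand-in for the CLI's real error class, and the nodes only need the `op_type`, `input`, and `output` fields that ONNX `NodeProto` exposes. `onnx` is imported lazily so the pair-detection logic can be exercised without it installed.

```python
from types import SimpleNamespace


class CliError(Exception):
    """Stand-in for the CLI's error type (assumption; the real class may differ)."""


def has_qdq_pairs(nodes):
    """Return True if any DequantizeLinear directly consumes a QuantizeLinear output."""
    nodes = list(nodes)  # allow generators; we iterate twice
    quantized_outputs = set()
    for node in nodes:
        if node.op_type == "QuantizeLinear":
            quantized_outputs.update(node.output)
    return any(
        node.op_type == "DequantizeLinear"
        and len(node.input) > 0
        and node.input[0] in quantized_outputs
        for node in nodes
    )


def is_qdq_model(model_path):
    """Load only the graph structure and look for Q -> DQ pairs in the top-level graph."""
    import onnx  # lazy import: only needed when reading a real model file

    # Skip external weight files; only node metadata is inspected.
    model = onnx.load(model_path, load_external_data=False)
    return has_qdq_pairs(model.graph.node)


def fail_fast_on_qdq_gpu(is_qdq, ep, device):
    """Raise before session creation instead of letting graph compilation hang."""
    if is_qdq and ep == "qnn" and device == "gpu":
        raise CliError(
            "QNN GPU EP does not support INT8 QDQ models. "
            "Use FP32 (default) or FP16 once #867 "
            "(--enable-fp16-conversion) is available."
        )


# Tiny in-memory examples that mimic NodeProto fields.
qdq_nodes = [
    SimpleNamespace(op_type="QuantizeLinear", input=["x", "s", "zp"], output=["x_q"]),
    SimpleNamespace(op_type="DequantizeLinear", input=["x_q", "s", "zp"], output=["x_dq"]),
    SimpleNamespace(op_type="Conv", input=["x_dq", "w"], output=["y"]),
]
fp32_nodes = [
    SimpleNamespace(op_type="Conv", input=["x", "w"], output=["y"]),
]
```

In `winml perf` / `winml eval` this would be called as `fail_fast_on_qdq_gpu(is_qdq_model(model_path), ep, device)` before session creation. Two known limitations of the sketch: it does not descend into subgraphs (`If` / `Loop` bodies), and it would miss a model whose weights are stored pre-quantized with only `DequantizeLinear` nodes and no `QuantizeLinear`. A looser check (any `DequantizeLinear` present) may be the safer trigger if such models also hang.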

Why enhancement, not bug

winml build already has the correct design (_patch_device() prevents QDQ configs for GPU). This is a defensive UX improvement for edge cases, not a fix for a broken code path.

Priority

Low — normal workflow is already protected. Fast-fail would prevent confusion for researchers and power users.
