enhancement: winml perf/eval should fast-fail when given QDQ model targeting QNN GPU EP

## Background

`winml build --device gpu` correctly sets `quant=null` via `_patch_device()`, so normal CLI paths never produce a QDQ quantized model for GPU. This issue was discovered during research experiments that manually bypassed that protection.

However, `winml perf` and `winml eval` accept any ONNX as input. A user who:
- brings their own pre-quantized W8A8 QDQ ONNX, OR
- hand-crafts a config.json bypassing the GPU quant guard

...will encounter an infinite hang with no output, no error, and no timeout.

## Root cause

QNN GPU EP cannot handle QDQ INT8 graphs (Conv/Gemm/LayerNorm patterns) and hangs silently in graph compilation rather than returning an error. This is ultimately a QNN SDK / ORT behavior, but `winml perf` and `winml eval` can add a defensive check to protect the user experience.

## Proposed enhancement

In `winml perf` and `winml eval`, before session creation, check:

`if is_qdq_model(model_path) and ep == "qnn" and device == "gpu":`

and, when it matches, raise `CliError` with clear guidance:

- "QNN GPU EP does not support INT8 QDQ models"
- "Use FP32 (default) or FP16 once #867 (`--enable-fp16-conversion`) is available"
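A minimal sketch of the guard, assuming it runs in both commands right before the ONNX Runtime session is created. `CliError` below is a hypothetical stand-in for winml's own error class, and `ensure_ep_supports_model` is a hypothetical helper name. The QDQ detection result is passed in as a bool so the guard can be exercised without an ONNX file; in the CLI it would come from `is_qdq_model(model_path)`.

```python
class CliError(Exception):
    """Hypothetical stand-in for winml's CliError (reported as a clean CLI message)."""


def ensure_ep_supports_model(is_qdq: bool, ep: str, device: str) -> None:
    """Fail fast before session creation when a QDQ model targets QNN GPU EP.

    `is_qdq` is expected to be the result of is_qdq_model(model_path).
    """
    # Case-insensitive match so "QNN"/"GPU" typed by the user also trigger (assumption).
    if is_qdq and ep.lower() == "qnn" and device.lower() == "gpu":
        raise CliError(
            "QNN GPU EP does not support INT8 QDQ models.\n"
            "Use FP32 (default) or FP16 once #867 (--enable-fp16-conversion) is available."
        )
```

Raising before the session exists is the point: the hang happens inside graph compilation, so any check placed after session creation would never run.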

The `is_qdq_model()` check can inspect the ONNX graph for `QuantizeLinear` + `DequantizeLinear` node pairs.
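One way `is_qdq_model()` could look, as a sketch assuming the `onnx` Python package is available when it is called. Here a "pair" means a `DequantizeLinear` node whose data input is produced by a `QuantizeLinear` node, which is what W8A8 activation quantization emits. Weight-only `DequantizeLinear` nodes fed by initializers are not counted, and only the top-level graph is scanned (no `If`/`Loop` subgraphs). `has_qdq_pairs` and `NodeView` are hypothetical names; the pure-Python helper keeps the matching logic testable without `onnx`.

```python
from typing import Iterable, Sequence, Tuple

# Hypothetical library-free view of an ONNX node: (op_type, inputs, outputs).
NodeView = Tuple[str, Sequence[str], Sequence[str]]


def has_qdq_pairs(nodes: Iterable[NodeView]) -> bool:
    """Return True if any DequantizeLinear consumes the output of a QuantizeLinear."""
    nodes = list(nodes)
    # Every tensor name produced by a QuantizeLinear node.
    q_outputs = {out for op, _, outs in nodes if op == "QuantizeLinear" for out in outs}
    # A Q->DQ pair exists when a DequantizeLinear's data input (input 0) is one of them.
    return any(
        op == "DequantizeLinear" and len(ins) > 0 and ins[0] in q_outputs
        for op, ins, _ in nodes
    )


def is_qdq_model(model_path: str) -> bool:
    """Scan the top-level graph of an ONNX file for Q->DQ node pairs."""
    import onnx  # lazy import so has_qdq_pairs() works without onnx installed

    # Only graph structure is needed, so skip loading external weight data.
    model = onnx.load(model_path, load_external_data=False)
    return has_qdq_pairs(
        (n.op_type, list(n.input), list(n.output)) for n in model.graph.node
    )
```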

## Why enhancement, not bug

`winml build` already has the correct design (`_patch_device()` prevents QDQ configs for GPU). This is a defensive UX improvement for edge cases, not a fix for a broken code path.

## Priority

Low. The normal workflow is already protected; a fast-fail would prevent confusion for researchers and power users.

## See also

- #865: `--ep-option` for runtime EP flags  
- #867: `--enable-fp16-conversion` (the correct GPU optimization path)
