fix: pre-quantized models run optimize, only skip quantize by DingmaomaoBJTU · Pull Request #794 · microsoft/winml-cli

DingmaomaoBJTU · 2026-05-29T12:52:17Z

Summary

Pre-quantized (QDQ) models were incorrectly skipping optimize_onnx. Our custom optimizer is safe for QDQ models — it removes redundant Q/DQ identity pairs without fusing QDQ into QLinear* ops. This PR changes the behavior so pre-quantized models run the full optimize+autoconf+analyze loop and only skip quantize.

Also fixes a bug where winml build -m model.onnx (without -c) would fail with "not a valid JSON file" because generate_build_config wasn't receiving the onnx_path parameter.

Changes

build/onnx.py & build/hf.py:
- Pre-quantized models now run full optimize+autoconf+analyze, only skip quantize.
- Cache is_quantized_onnx() result to a local is_qdq variable to avoid redundant ONNX file reads.
- skip_optimize: restore original semantics — runs optimize_onnx (applies config.optim flags) but skips the autoconf re-optimization loop. Stage labels now accurately reflect actual work done.
- Fix misleading log messages (distinguish QDQ detection vs skip_optimize flag).
commands/build.py: Pass onnx_path to generate_build_config when input is a .onnx file path.
Tests: Updated expectations to match new behavior.

DingmaomaoBJTU

Overall this is a clean, well-motivated fix. The root cause (ORT Level 2 optimizer fusing QDQ patterns) is clearly explained and the change is minimal. Tests are updated correctly. A couple of minor suggestions below.

DingmaomaoBJTU

Overall a clean, well-motivated fix. The root cause is clearly explained and the change is minimal. Tests are updated correctly. A few minor suggestions inline.

DingmaomaoBJTU

Good fix overall. The root cause is well-identified and the solution is minimal and correct: skipping run_optimize_analyze_loop() entirely preserves QDQ structure for downstream EPs. Tests are properly updated. One minor issue noted inline.

DingmaomaoBJTU

Good fix -- the root cause analysis is clear and the change is minimal. The pre-quantized path now correctly preserves the QDQ structure by skipping the ORT optimizer entirely. A few minor suggestions inline.

DingmaomaoBJTU

Clean, well-motivated fix. Skipping optimize_onnx entirely for pre-quantized models is the right call — ORT Level 2 fusing QDQ patterns into QLinearConv is a known foot-gun for QNN/DML EPs. The test updates are consistent. One minor nit below.

DingmaomaoBJTU

Good fix — the root-cause is clear and the change is minimal. Two minor issues noted inline.

DingmaomaoBJTU

Good fix - the root cause (ORT Level 2 optimizer fusing QDQ patterns) is clearly identified and the solution is minimal. Two minor nits inline.

DingmaomaoBJTU

Good fix — the root cause is clear and the change is minimal. A couple of minor suggestions below.

DingmaomaoBJTU

Good fix. The root cause (ORT Level 2 optimizer fusing QDQ patterns) is clearly identified and the change is minimal and correct. Tests are well updated. A couple of minor suggestions below.

DingmaomaoBJTU

Good fix overall. The root cause is well-explained and the change is minimal and correct. Left a couple of inline nits.

DingmaomaoBJTU

Reviewed — this looks good. The fix is clean and minimal. One minor observation:

src/winml/modelkit/build/onnx.py: The zeroed-out tuple �nalyze_iters, analyze_unsupported, analyze_details = 0, 0, {} is fine for now, but if �uild_onnx_model ever adds new return fields from the analyze loop, this line silently falls out of sync. Consider adding a brief comment noting this must track the return shape of
un_optimize_analyze_loop.

Tests are correctly updated. The docstring on est_pre_quantized_runs_analyze_only accurately reflects the new behavior.

DingmaomaoBJTU

Clean fix with good test coverage. The logic change is correct - skipping run_optimize_analyze_loop() entirely for pre-quantized models prevents ORT Level 2 QDQ fusion. Two minor issues noted inline.

DingmaomaoBJTU

Clean, well-motivated fix. Skipping optimize entirely for pre-quantized models is the correct approach. Tests properly updated.

DingmaomaoBJTU

Clean, well-scoped fix. The root cause analysis is solid and the test updates properly reflect the new behavior. One minor suggestion inline.

DingmaomaoBJTU

Good fix - skipping optimize_onnx() entirely for pre-quantized models is the correct approach to preserve QDQ structure. A couple of minor nits below.

DingmaomaoBJTU

Good fix overall — the root cause is well explained and the approach is correct. Skipping the ORT optimizer entirely for pre-quantized models makes sense to preserve QDQ structure. A couple of minor nits below.

DingmaomaoBJTU

Good fix. Skipping ORT Level 2 optimization for pre-quantized models avoids QDQ pattern fusion that breaks EP compatibility. Code is clean, tests updated correctly.

DingmaomaoBJTU

Good fix — the root cause is clear and the change is minimal. A couple of minor observations inline.

DingmaomaoBJTU

Good fix. The root cause (ORT L2 optimizer fusing QDQ patterns) is clearly identified and the change is minimal and correct. Tests are properly updated. A couple of minor observations inline.

DingmaomaoBJTU

Good fix - the root cause and verification are clearly documented. The code change is minimal and correct: pre-quantized models now bypass optimize_onnx() entirely, preserving QDQ structure for downstream EPs. Two minor nits below.

DingmaomaoBJTU

Good fix. The root cause (ORT Level 2 optimizer fusing QDQ patterns breaking EP compatibility) is well-identified and the solution is minimal. Tests updated correctly to assert_not_called(). One minor suggestion below.

DingmaomaoBJTU

Clean fix overall. A couple inline suggestions.

DingmaomaoBJTU

Good fix overall - the root cause is clear and the change is minimal. The tests are well-updated to match the new behavior. Left a couple of minor suggestions.

DingmaomaoBJTU

Good fix overall — the root cause and solution are well-motivated. Skipping ORT Level 2 optimization for pre-quantized models is the correct approach to preserve QDQ structure. A couple of minor observations below.

DingmaomaoBJTU

Good fix overall. The root cause is well-identified and the change is minimal and correct. A couple of minor nits below.

DingmaomaoBJTU

Good fix overall — the root cause analysis is solid and the change is minimal. A couple of minor nits below.

DingmaomaoBJTU

Good fix overall — the root cause is clear and the change is minimal. A few observations inline.

DingmaomaoBJTU

Clean, well-motivated fix. The root cause (ORT L2 optimizer fusing QDQ patterns) is clearly documented and the solution is appropriately minimal. A few inline suggestions.

DingmaomaoBJTU

Good fix overall. The root cause is well-identified and the solution is clean — skipping ORT Level 2 optimization entirely for pre-quantized models prevents QDQ fusion that breaks EP compatibility. The tests correctly reflect the new behavior. One minor nit: see inline comment about a stale log message.

DingmaomaoBJTU

Clean fix. Skipping the optimizer entirely for pre-quantized models is the right approach to preserve QDQ node structure. Tests are updated consistently.

DingmaomaoBJTU

Good fix — the root cause is clear and the change is minimal. A couple of minor observations inline.

DingmaomaoBJTU

Overall a clean, well-motivated fix. The root cause (ORT Level 2 optimizer fusing QDQ patterns) is clearly documented in the comment. Tests are updated consistently. One minor observation below.

DingmaomaoBJTU

Clean fix. Skipping run_optimize_analyze_loop entirely for pre-quantized models is the correct approach - preserves QDQ structure for QNN/DML EPs. Tests updated consistently.

DingmaomaoBJTU

Good fix — skipping optimize_onnx() entirely for pre-quantized models is the right call to preserve QDQ structure. A few minor inline comments below.

DingmaomaoBJTU

Good fix. The root cause analysis is solid and the change is minimal and correct. Skipping optimize_onnx() entirely for QDQ models prevents ORT Level 2 fusion from breaking EP compatibility. Tests are well-aligned with the new behavior. One minor nit below about a stale log message.

DingmaomaoBJTU

Good fix overall - the root cause is well-explained and the test updates are consistent. A couple of minor suggestions below.

DingmaomaoBJTU

Good fix - the root cause analysis is clear and the change is minimal. A couple of minor observations below.

DingmaomaoBJTU

Good fix - clear root cause, clean implementation. The QDQ fusion was silently regressing latency, and skipping optimize entirely is the right call. A couple of inline notes below.

DingmaomaoBJTU

Clean, surgical fix with a well-documented root cause. The change is straightforward and the tests are properly updated. One consideration below.

DingmaomaoBJTU

Overall the fix is correct and well-motivated. The PR description clearly explains the root cause and includes verification. A couple of minor suggestions below.

DingmaomaoBJTU

Overall this is a clean, well-motivated fix. The root cause is clearly identified and the solution is minimal and correct. A couple of minor observations below.

DingmaomaoBJTU

LGTM - clean, well-scoped fix. One minor observation inline.

DingmaomaoBJTU

Good fix. The root cause is well-identified (ORT Level 2 optimizer fusing QDQ patterns breaks QNN/DML EP compatibility). A few minor observations below.

DingmaomaoBJTU

Good fix overall. The root cause (ORT Level 2 optimizer fusing QDQ patterns) is clearly identified and the solution is minimal and correct. The test updates properly reflect the new behavior. A few minor observations inline.

DingmaomaoBJTU

Overall LGTM - clean, well-scoped fix. One minor suggestion below.

DingmaomaoBJTU

Good fix overall. The root cause (ORT Level 2 optimizer fusing QDQ patterns) is well-explained and the change is minimal and correct. Left a couple of inline suggestions below.

DingmaomaoBJTU

Good fix with clear root cause analysis. The change correctly prevents ORT Level 2 optimization from fusing QDQ patterns. One minor nit on a stale log message.

DingmaomaoBJTU

Review Summary: Clean, well-scoped fix. The root cause (ORT L2 optimizer fusing QDQ patterns) is clearly explained, the fix is minimal and correct, and test assertions are properly updated from �ssert_called_once() to �ssert_not_called(). The zero-initialization of �nalyze_iters, analyze_unsupported, analyze_details is appropriate since no analysis runs. current_path retains its prior value which is correct for pass-through. LGTM — no issues found.

DingmaomaoBJTU

Good fix! The root cause is well-identified and the approach is correct - completely skipping ORT optimization preserves QDQ patterns. A couple of minor nits below.

DingmaomaoBJTU

Overall this is a clean fix. The root cause is clearly identified and the solution is correct - pre-quantized models should bypass ORT Level 2 optimization entirely to preserve QDQ graph structure. Tests are properly updated. One minor nit on the log message.

DingmaomaoBJTU

Clean, well-motivated fix. The root cause (ORT Level 2 optimizer fusing QDQ patterns) is clearly identified and the solution — skipping the optimizer entirely rather than running it with special flags — is the correct approach for preserving QDQ structure that downstream EPs require. Tests properly updated. One minor observation: the \ est_skip_optimize_kwarg\ test at line 428 also changes to \�ssert_not_called()\ — confirming that the \skip_optimize\ kwarg path now also fully skips (consistent behavior). LGTM.

xieofxie

Nice catch on the QDQ-skips-optimize issue and the -m model.onnx build-config bug. Two concerns worth addressing before merge.

1. skip_optimize=True no longer actually skips optimization. run_optimize_analyze_loop always calls optimize_onnx internally (see build/common.py:83), so the new control flow appends "optimize" to stages_skipped but still runs the full optimize+autoconf+analyze path. The log messages and stages_skipped then disagree with reality, and test_skip_optimize_kwarg (both test_hf.py and test_onnx.py, lines around 860 / 413) literally encodes the contradiction:

assert "optimize" in result.stages_skipped     # claims skip
mock_pipeline["optimize"].assert_called_once()  # but it ran

Those two assertions can't both be true in a sane reading of "skip_optimize". They were untouched by this PR, but the underlying behavior changed around them — please reconcile (assert_not_called() if the intent is really skip, or rename the flag if the intent is "label as skipped while still running").

2. Silent semantic regression for direct skip_optimize=True callers. Before this PR, skip_optimize=True passed no max_optim_iterations, so the autoconf re-optimization loop didn't run (max_optim_iterations=0 default). After this PR, max_optim_iterations=hack_max_optim_iterations is passed unconditionally, so skip_optimize=True now triggers a full autoconf loop where it used to skip it. Not mentioned in the PR description.

Individual suggestions on the lines below. The pre-quantized fix and the commands/build.py fix both look correct in intent — just want the skip_optimize semantics tightened up so labels match work, and the is_quantized_onnx calls de-duplicated.

- Pre-quantized (QDQ) models now run full optimize+autoconf+analyze, only skip quantize to preserve QDQ structure. - Cache is_quantized_onnx() result to avoid redundant ONNX file reads. - skip_optimize: restore original semantics (run optimize_onnx without autoconf loop, label as skipped). Labels now match actual work done. - Fix generate_build_config onnx_path bug for 'winml build -m model.onnx'.

DingmaomaoBJTU requested a review from a team as a code owner May 29, 2026 12:52

DingmaomaoBJTU commented May 29, 2026

View reviewed changes

DingmaomaoBJTU commented May 30, 2026

View reviewed changes

DingmaomaoBJTU commented May 31, 2026

View reviewed changes

DingmaomaoBJTU commented Jun 1, 2026

View reviewed changes

Comment thread src/winml/modelkit/build/onnx.py

Comment thread src/winml/modelkit/build/onnx.py Outdated

DingmaomaoBJTU commented Jun 1, 2026

View reviewed changes

Comment thread src/winml/modelkit/build/onnx.py Outdated

DingmaomaoBJTU commented Jun 1, 2026

View reviewed changes

DingmaomaoBJTU force-pushed the dingmaomaobjtu/fix-prequantized-optimize-skip branch from f83aa5c to cebbf4d Compare June 8, 2026 05:52

DingmaomaoBJTU changed the title ~~fix: skip optimize_onnx for pre-quantized models~~ fix: pre-quantized models run optimize, only skip quantize Jun 8, 2026

DingmaomaoBJTU force-pushed the dingmaomaobjtu/fix-prequantized-optimize-skip branch 2 times, most recently from 1c2dee5 to 51e4841 Compare June 8, 2026 06:31

xieofxie reviewed Jun 8, 2026

View reviewed changes

DingmaomaoBJTU force-pushed the dingmaomaobjtu/fix-prequantized-optimize-skip branch from 51e4841 to bdee3ff Compare June 8, 2026 07:51

DingmaomaoBJTU force-pushed the dingmaomaobjtu/fix-prequantized-optimize-skip branch from bdee3ff to 71eacb7 Compare June 8, 2026 07:51

Conversation

DingmaomaoBJTU commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Uh oh!

DingmaomaoBJTU left a comment

Choose a reason for hiding this comment

Uh oh!

DingmaomaoBJTU left a comment

Choose a reason for hiding this comment

Uh oh!

DingmaomaoBJTU left a comment

Choose a reason for hiding this comment

Uh oh!

DingmaomaoBJTU left a comment

Choose a reason for hiding this comment

Uh oh!

DingmaomaoBJTU left a comment

Choose a reason for hiding this comment

Uh oh!

DingmaomaoBJTU left a comment

Choose a reason for hiding this comment

Uh oh!

DingmaomaoBJTU left a comment

Choose a reason for hiding this comment

Uh oh!

DingmaomaoBJTU left a comment

Choose a reason for hiding this comment

Uh oh!

DingmaomaoBJTU left a comment

Choose a reason for hiding this comment

Uh oh!

DingmaomaoBJTU left a comment

Choose a reason for hiding this comment

Uh oh!

DingmaomaoBJTU left a comment

Choose a reason for hiding this comment

Uh oh!

DingmaomaoBJTU left a comment

Choose a reason for hiding this comment

Uh oh!

DingmaomaoBJTU left a comment

Choose a reason for hiding this comment

Uh oh!

DingmaomaoBJTU left a comment

Choose a reason for hiding this comment

Uh oh!

DingmaomaoBJTU left a comment

Choose a reason for hiding this comment

Uh oh!

DingmaomaoBJTU left a comment

Choose a reason for hiding this comment

Uh oh!

DingmaomaoBJTU left a comment

Choose a reason for hiding this comment

Uh oh!

DingmaomaoBJTU left a comment

Choose a reason for hiding this comment

Uh oh!

DingmaomaoBJTU left a comment

Choose a reason for hiding this comment

Uh oh!

DingmaomaoBJTU left a comment

Choose a reason for hiding this comment

Uh oh!

DingmaomaoBJTU left a comment

Choose a reason for hiding this comment

Uh oh!

DingmaomaoBJTU left a comment

Choose a reason for hiding this comment

Uh oh!

DingmaomaoBJTU left a comment

Choose a reason for hiding this comment

Uh oh!

DingmaomaoBJTU left a comment

Choose a reason for hiding this comment

Uh oh!

DingmaomaoBJTU left a comment

Choose a reason for hiding this comment

DingmaomaoBJTU commented May 29, 2026 •

edited

Loading