fix: pre-quantized models run optimize, only skip quantize#794
fix: pre-quantized models run optimize, only skip quantize#794DingmaomaoBJTU wants to merge 1 commit into
Conversation
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Overall this is a clean, well-motivated fix. The root cause (ORT Level 2 optimizer fusing QDQ patterns) is clearly explained and the change is minimal. Tests are updated correctly. A couple of minor suggestions below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Overall a clean, well-motivated fix. The root cause is clearly explained and the change is minimal. Tests are updated correctly. A few minor suggestions inline.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix overall. The root cause is well-identified and the solution is minimal and correct: skipping run_optimize_analyze_loop() entirely preserves QDQ structure for downstream EPs. Tests are properly updated. One minor issue noted inline.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix -- the root cause analysis is clear and the change is minimal. The pre-quantized path now correctly preserves the QDQ structure by skipping the ORT optimizer entirely. A few minor suggestions inline.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Clean, well-motivated fix. Skipping optimize_onnx entirely for pre-quantized models is the right call — ORT Level 2 fusing QDQ patterns into QLinearConv is a known foot-gun for QNN/DML EPs. The test updates are consistent. One minor nit below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix — the root-cause is clear and the change is minimal. Two minor issues noted inline.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix - the root cause (ORT Level 2 optimizer fusing QDQ patterns) is clearly identified and the solution is minimal. Two minor nits inline.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix — the root cause is clear and the change is minimal. A couple of minor suggestions below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix. The root cause (ORT Level 2 optimizer fusing QDQ patterns) is clearly identified and the change is minimal and correct. Tests are well updated. A couple of minor suggestions below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix overall. The root cause is well-explained and the change is minimal and correct. Left a couple of inline nits.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Reviewed — this looks good. The fix is clean and minimal. One minor observation:
src/winml/modelkit/build/onnx.py: The zeroed-out tuple �nalyze_iters, analyze_unsupported, analyze_details = 0, 0, {} is fine for now, but if �uild_onnx_model ever adds new return fields from the analyze loop, this line silently falls out of sync. Consider adding a brief comment noting this must track the return shape of
un_optimize_analyze_loop.
Tests are correctly updated. The docstring on est_pre_quantized_runs_analyze_only accurately reflects the new behavior.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Clean fix with good test coverage. The logic change is correct - skipping run_optimize_analyze_loop() entirely for pre-quantized models prevents ORT Level 2 QDQ fusion. Two minor issues noted inline.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Clean, well-motivated fix. Skipping optimize entirely for pre-quantized models is the correct approach. Tests properly updated.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Clean, well-scoped fix. The root cause analysis is solid and the test updates properly reflect the new behavior. One minor suggestion inline.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix - skipping optimize_onnx() entirely for pre-quantized models is the correct approach to preserve QDQ structure. A couple of minor nits below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix overall — the root cause is well explained and the approach is correct. Skipping the ORT optimizer entirely for pre-quantized models makes sense to preserve QDQ structure. A couple of minor nits below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix. Skipping ORT Level 2 optimization for pre-quantized models avoids QDQ pattern fusion that breaks EP compatibility. Code is clean, tests updated correctly.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix — the root cause is clear and the change is minimal. A couple of minor observations inline.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix. The root cause (ORT L2 optimizer fusing QDQ patterns) is clearly identified and the change is minimal and correct. Tests are properly updated. A couple of minor observations inline.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix - the root cause and verification are clearly documented. The code change is minimal and correct: pre-quantized models now bypass optimize_onnx() entirely, preserving QDQ structure for downstream EPs. Two minor nits below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix. The root cause (ORT Level 2 optimizer fusing QDQ patterns breaking EP compatibility) is well-identified and the solution is minimal. Tests updated correctly to assert_not_called(). One minor suggestion below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Clean fix overall. A couple inline suggestions.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix overall - the root cause is clear and the change is minimal. The tests are well-updated to match the new behavior. Left a couple of minor suggestions.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix overall — the root cause and solution are well-motivated. Skipping ORT Level 2 optimization for pre-quantized models is the correct approach to preserve QDQ structure. A couple of minor observations below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix overall. The root cause is well-identified and the change is minimal and correct. A couple of minor nits below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix overall — the root cause analysis is solid and the change is minimal. A couple of minor nits below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix overall — the root cause is clear and the change is minimal. A few observations inline.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Clean, well-motivated fix. The root cause (ORT L2 optimizer fusing QDQ patterns) is clearly documented and the solution is appropriately minimal. A few inline suggestions.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix overall. The root cause is well-identified and the solution is clean — skipping ORT Level 2 optimization entirely for pre-quantized models prevents QDQ fusion that breaks EP compatibility. The tests correctly reflect the new behavior. One minor nit: see inline comment about a stale log message.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Clean fix. Skipping the optimizer entirely for pre-quantized models is the right approach to preserve QDQ node structure. Tests are updated consistently.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix — the root cause is clear and the change is minimal. A couple of minor observations inline.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Overall a clean, well-motivated fix. The root cause (ORT Level 2 optimizer fusing QDQ patterns) is clearly documented in the comment. Tests are updated consistently. One minor observation below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Clean fix. Skipping run_optimize_analyze_loop entirely for pre-quantized models is the correct approach - preserves QDQ structure for QNN/DML EPs. Tests updated consistently.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix — skipping optimize_onnx() entirely for pre-quantized models is the right call to preserve QDQ structure. A few minor inline comments below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix. The root cause analysis is solid and the change is minimal and correct. Skipping optimize_onnx() entirely for QDQ models prevents ORT Level 2 fusion from breaking EP compatibility. Tests are well-aligned with the new behavior. One minor nit below about a stale log message.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix overall - the root cause is well-explained and the test updates are consistent. A couple of minor suggestions below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix - the root cause analysis is clear and the change is minimal. A couple of minor observations below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix - clear root cause, clean implementation. The QDQ fusion was silently regressing latency, and skipping optimize entirely is the right call. A couple of inline notes below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Clean, surgical fix with a well-documented root cause. The change is straightforward and the tests are properly updated. One consideration below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Overall the fix is correct and well-motivated. The PR description clearly explains the root cause and includes verification. A couple of minor suggestions below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Overall this is a clean, well-motivated fix. The root cause is clearly identified and the solution is minimal and correct. A couple of minor observations below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
LGTM - clean, well-scoped fix. One minor observation inline.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix. The root cause is well-identified (ORT Level 2 optimizer fusing QDQ patterns breaks QNN/DML EP compatibility). A few minor observations below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix overall. The root cause (ORT Level 2 optimizer fusing QDQ patterns) is clearly identified and the solution is minimal and correct. The test updates properly reflect the new behavior. A few minor observations inline.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Overall LGTM - clean, well-scoped fix. One minor suggestion below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix overall. The root cause (ORT Level 2 optimizer fusing QDQ patterns) is well-explained and the change is minimal and correct. Left a couple of inline suggestions below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix with clear root cause analysis. The change correctly prevents ORT Level 2 optimization from fusing QDQ patterns. One minor nit on a stale log message.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Review Summary: Clean, well-scoped fix. The root cause (ORT L2 optimizer fusing QDQ patterns) is clearly explained, the fix is minimal and correct, and test assertions are properly updated from �ssert_called_once() to �ssert_not_called(). The zero-initialization of �nalyze_iters, analyze_unsupported, analyze_details is appropriate since no analysis runs. current_path retains its prior value which is correct for pass-through. LGTM — no issues found.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix! The root cause is well-identified and the approach is correct - completely skipping ORT optimization preserves QDQ patterns. A couple of minor nits below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Overall this is a clean fix. The root cause is clearly identified and the solution is correct - pre-quantized models should bypass ORT Level 2 optimization entirely to preserve QDQ graph structure. Tests are properly updated. One minor nit on the log message.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Clean, well-motivated fix. The root cause (ORT Level 2 optimizer fusing QDQ patterns) is clearly identified and the solution — skipping the optimizer entirely rather than running it with special flags — is the correct approach for preserving QDQ structure that downstream EPs require. Tests properly updated. One minor observation: the \ est_skip_optimize_kwarg\ test at line 428 also changes to \�ssert_not_called()\ — confirming that the \skip_optimize\ kwarg path now also fully skips (consistent behavior). LGTM.
f83aa5c to
cebbf4d
Compare
1c2dee5 to
51e4841
Compare
xieofxie
left a comment
There was a problem hiding this comment.
Nice catch on the QDQ-skips-optimize issue and the -m model.onnx build-config bug. Two concerns worth addressing before merge.
1. skip_optimize=True no longer actually skips optimization. run_optimize_analyze_loop always calls optimize_onnx internally (see build/common.py:83), so the new control flow appends "optimize" to stages_skipped but still runs the full optimize+autoconf+analyze path. The log messages and stages_skipped then disagree with reality, and test_skip_optimize_kwarg (both test_hf.py and test_onnx.py, lines around 860 / 413) literally encodes the contradiction:
assert "optimize" in result.stages_skipped # claims skip
mock_pipeline["optimize"].assert_called_once() # but it ranThose two assertions can't both be true in a sane reading of "skip_optimize". They were untouched by this PR, but the underlying behavior changed around them — please reconcile (assert_not_called() if the intent is really skip, or rename the flag if the intent is "label as skipped while still running").
2. Silent semantic regression for direct skip_optimize=True callers. Before this PR, skip_optimize=True passed no max_optim_iterations, so the autoconf re-optimization loop didn't run (max_optim_iterations=0 default). After this PR, max_optim_iterations=hack_max_optim_iterations is passed unconditionally, so skip_optimize=True now triggers a full autoconf loop where it used to skip it. Not mentioned in the PR description.
Individual suggestions on the lines below. The pre-quantized fix and the commands/build.py fix both look correct in intent — just want the skip_optimize semantics tightened up so labels match work, and the is_quantized_onnx calls de-duplicated.
51e4841 to
bdee3ff
Compare
- Pre-quantized (QDQ) models now run full optimize+autoconf+analyze, only skip quantize to preserve QDQ structure. - Cache is_quantized_onnx() result to avoid redundant ONNX file reads. - skip_optimize: restore original semantics (run optimize_onnx without autoconf loop, label as skipped). Labels now match actual work done. - Fix generate_build_config onnx_path bug for 'winml build -m model.onnx'.
bdee3ff to
71eacb7
Compare
Summary
Pre-quantized (QDQ) models were incorrectly skipping
optimize_onnx. Our custom optimizer is safe for QDQ models — it removes redundant Q/DQ identity pairs without fusing QDQ into QLinear* ops. This PR changes the behavior so pre-quantized models run the full optimize+autoconf+analyze loop and only skip quantize.Also fixes a bug where
winml build -m model.onnx(without-c) would fail with "not a valid JSON file" becausegenerate_build_configwasn't receiving theonnx_pathparameter.Changes
build/onnx.py&build/hf.py:is_quantized_onnx()result to a localis_qdqvariable to avoid redundant ONNX file reads.skip_optimize: restore original semantics — runsoptimize_onnx(appliesconfig.optimflags) but skips the autoconf re-optimization loop. Stage labels now accurately reflect actual work done.skip_optimizeflag).commands/build.py: Passonnx_pathtogenerate_build_configwhen input is a.onnxfile path.