Skip to content

fix: pre-quantized models run optimize, only skip quantize#794

Open
DingmaomaoBJTU wants to merge 1 commit into
mainfrom
dingmaomaobjtu/fix-prequantized-optimize-skip
Open

fix: pre-quantized models run optimize, only skip quantize#794
DingmaomaoBJTU wants to merge 1 commit into
mainfrom
dingmaomaobjtu/fix-prequantized-optimize-skip

Conversation

@DingmaomaoBJTU

@DingmaomaoBJTU DingmaomaoBJTU commented May 29, 2026

Copy link
Copy Markdown
Collaborator

Summary

Pre-quantized (QDQ) models were incorrectly skipping optimize_onnx. Our custom optimizer is safe for QDQ models — it removes redundant Q/DQ identity pairs without fusing QDQ into QLinear* ops. This PR changes the behavior so pre-quantized models run the full optimize+autoconf+analyze loop and only skip quantize.

Also fixes a bug where winml build -m model.onnx (without -c) would fail with "not a valid JSON file" because generate_build_config wasn't receiving the onnx_path parameter.

Changes

  • build/onnx.py & build/hf.py:
    • Pre-quantized models now run full optimize+autoconf+analyze, only skip quantize.
    • Cache is_quantized_onnx() result to a local is_qdq variable to avoid redundant ONNX file reads.
    • skip_optimize: restore original semantics — runs optimize_onnx (applies config.optim flags) but skips the autoconf re-optimization loop. Stage labels now accurately reflect actual work done.
    • Fix misleading log messages (distinguish QDQ detection vs skip_optimize flag).
  • commands/build.py: Pass onnx_path to generate_build_config when input is a .onnx file path.
  • Tests: Updated expectations to match new behavior.

@DingmaomaoBJTU DingmaomaoBJTU requested a review from a team as a code owner May 29, 2026 12:52

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this is a clean, well-motivated fix. The root cause (ORT Level 2 optimizer fusing QDQ patterns) is clearly explained and the change is minimal. Tests are updated correctly. A couple of minor suggestions below.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall a clean, well-motivated fix. The root cause is clearly explained and the change is minimal. Tests are updated correctly. A few minor suggestions inline.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix overall. The root cause is well-identified and the solution is minimal and correct: skipping run_optimize_analyze_loop() entirely preserves QDQ structure for downstream EPs. Tests are properly updated. One minor issue noted inline.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix -- the root cause analysis is clear and the change is minimal. The pre-quantized path now correctly preserves the QDQ structure by skipping the ORT optimizer entirely. A few minor suggestions inline.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, well-motivated fix. Skipping optimize_onnx entirely for pre-quantized models is the right call — ORT Level 2 fusing QDQ patterns into QLinearConv is a known foot-gun for QNN/DML EPs. The test updates are consistent. One minor nit below.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix — the root-cause is clear and the change is minimal. Two minor issues noted inline.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix - the root cause (ORT Level 2 optimizer fusing QDQ patterns) is clearly identified and the solution is minimal. Two minor nits inline.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix — the root cause is clear and the change is minimal. A couple of minor suggestions below.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix. The root cause (ORT Level 2 optimizer fusing QDQ patterns) is clearly identified and the change is minimal and correct. Tests are well updated. A couple of minor suggestions below.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix overall. The root cause is well-explained and the change is minimal and correct. Left a couple of inline nits.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed — this looks good. The fix is clean and minimal. One minor observation:

src/winml/modelkit/build/onnx.py: The zeroed-out tuple �nalyze_iters, analyze_unsupported, analyze_details = 0, 0, {} is fine for now, but if �uild_onnx_model ever adds new return fields from the analyze loop, this line silently falls out of sync. Consider adding a brief comment noting this must track the return shape of
un_optimize_analyze_loop.

Tests are correctly updated. The docstring on est_pre_quantized_runs_analyze_only accurately reflects the new behavior.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean fix with good test coverage. The logic change is correct - skipping run_optimize_analyze_loop() entirely for pre-quantized models prevents ORT Level 2 QDQ fusion. Two minor issues noted inline.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, well-motivated fix. Skipping optimize entirely for pre-quantized models is the correct approach. Tests properly updated.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, well-scoped fix. The root cause analysis is solid and the test updates properly reflect the new behavior. One minor suggestion inline.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix - skipping optimize_onnx() entirely for pre-quantized models is the correct approach to preserve QDQ structure. A couple of minor nits below.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix overall — the root cause is well explained and the approach is correct. Skipping the ORT optimizer entirely for pre-quantized models makes sense to preserve QDQ structure. A couple of minor nits below.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix. Skipping ORT Level 2 optimization for pre-quantized models avoids QDQ pattern fusion that breaks EP compatibility. Code is clean, tests updated correctly.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix — the root cause is clear and the change is minimal. A couple of minor observations inline.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix. The root cause (ORT L2 optimizer fusing QDQ patterns) is clearly identified and the change is minimal and correct. Tests are properly updated. A couple of minor observations inline.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix - the root cause and verification are clearly documented. The code change is minimal and correct: pre-quantized models now bypass optimize_onnx() entirely, preserving QDQ structure for downstream EPs. Two minor nits below.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix. The root cause (ORT Level 2 optimizer fusing QDQ patterns breaking EP compatibility) is well-identified and the solution is minimal. Tests updated correctly to assert_not_called(). One minor suggestion below.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean fix overall. A couple inline suggestions.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix overall - the root cause is clear and the change is minimal. The tests are well-updated to match the new behavior. Left a couple of minor suggestions.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix overall — the root cause and solution are well-motivated. Skipping ORT Level 2 optimization for pre-quantized models is the correct approach to preserve QDQ structure. A couple of minor observations below.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix overall. The root cause is well-identified and the change is minimal and correct. A couple of minor nits below.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix overall — the root cause analysis is solid and the change is minimal. A couple of minor nits below.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix overall — the root cause is clear and the change is minimal. A few observations inline.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, well-motivated fix. The root cause (ORT L2 optimizer fusing QDQ patterns) is clearly documented and the solution is appropriately minimal. A few inline suggestions.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix overall. The root cause is well-identified and the solution is clean — skipping ORT Level 2 optimization entirely for pre-quantized models prevents QDQ fusion that breaks EP compatibility. The tests correctly reflect the new behavior. One minor nit: see inline comment about a stale log message.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean fix. Skipping the optimizer entirely for pre-quantized models is the right approach to preserve QDQ node structure. Tests are updated consistently.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix — the root cause is clear and the change is minimal. A couple of minor observations inline.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall a clean, well-motivated fix. The root cause (ORT Level 2 optimizer fusing QDQ patterns) is clearly documented in the comment. Tests are updated consistently. One minor observation below.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean fix. Skipping run_optimize_analyze_loop entirely for pre-quantized models is the correct approach - preserves QDQ structure for QNN/DML EPs. Tests updated consistently.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix — skipping optimize_onnx() entirely for pre-quantized models is the right call to preserve QDQ structure. A few minor inline comments below.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix. The root cause analysis is solid and the change is minimal and correct. Skipping optimize_onnx() entirely for QDQ models prevents ORT Level 2 fusion from breaking EP compatibility. Tests are well-aligned with the new behavior. One minor nit below about a stale log message.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix overall - the root cause is well-explained and the test updates are consistent. A couple of minor suggestions below.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix - the root cause analysis is clear and the change is minimal. A couple of minor observations below.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix - clear root cause, clean implementation. The QDQ fusion was silently regressing latency, and skipping optimize entirely is the right call. A couple of inline notes below.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, surgical fix with a well-documented root cause. The change is straightforward and the tests are properly updated. One consideration below.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the fix is correct and well-motivated. The PR description clearly explains the root cause and includes verification. A couple of minor suggestions below.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this is a clean, well-motivated fix. The root cause is clearly identified and the solution is minimal and correct. A couple of minor observations below.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - clean, well-scoped fix. One minor observation inline.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix. The root cause is well-identified (ORT Level 2 optimizer fusing QDQ patterns breaks QNN/DML EP compatibility). A few minor observations below.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix overall. The root cause (ORT Level 2 optimizer fusing QDQ patterns) is clearly identified and the solution is minimal and correct. The test updates properly reflect the new behavior. A few minor observations inline.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM - clean, well-scoped fix. One minor suggestion below.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix overall. The root cause (ORT Level 2 optimizer fusing QDQ patterns) is well-explained and the change is minimal and correct. Left a couple of inline suggestions below.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix with clear root cause analysis. The change correctly prevents ORT Level 2 optimization from fusing QDQ patterns. One minor nit on a stale log message.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary: Clean, well-scoped fix. The root cause (ORT L2 optimizer fusing QDQ patterns) is clearly explained, the fix is minimal and correct, and test assertions are properly updated from �ssert_called_once() to �ssert_not_called(). The zero-initialization of �nalyze_iters, analyze_unsupported, analyze_details is appropriate since no analysis runs. current_path retains its prior value which is correct for pass-through. LGTM — no issues found.

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix! The root cause is well-identified and the approach is correct - completely skipping ORT optimization preserves QDQ patterns. A couple of minor nits below.

Comment thread src/winml/modelkit/build/onnx.py
Comment thread src/winml/modelkit/build/onnx.py Outdated

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this is a clean fix. The root cause is clearly identified and the solution is correct - pre-quantized models should bypass ORT Level 2 optimization entirely to preserve QDQ graph structure. Tests are properly updated. One minor nit on the log message.

Comment thread src/winml/modelkit/build/onnx.py Outdated

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, well-motivated fix. The root cause (ORT Level 2 optimizer fusing QDQ patterns) is clearly identified and the solution — skipping the optimizer entirely rather than running it with special flags — is the correct approach for preserving QDQ structure that downstream EPs require. Tests properly updated. One minor observation: the \ est_skip_optimize_kwarg\ test at line 428 also changes to \�ssert_not_called()\ — confirming that the \skip_optimize\ kwarg path now also fully skips (consistent behavior). LGTM.

@DingmaomaoBJTU DingmaomaoBJTU force-pushed the dingmaomaobjtu/fix-prequantized-optimize-skip branch from f83aa5c to cebbf4d Compare June 8, 2026 05:52
@DingmaomaoBJTU DingmaomaoBJTU changed the title fix: skip optimize_onnx for pre-quantized models fix: pre-quantized models run optimize, only skip quantize Jun 8, 2026
@DingmaomaoBJTU DingmaomaoBJTU force-pushed the dingmaomaobjtu/fix-prequantized-optimize-skip branch 2 times, most recently from 1c2dee5 to 51e4841 Compare June 8, 2026 06:31

@xieofxie xieofxie left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch on the QDQ-skips-optimize issue and the -m model.onnx build-config bug. Two concerns worth addressing before merge.

1. skip_optimize=True no longer actually skips optimization. run_optimize_analyze_loop always calls optimize_onnx internally (see build/common.py:83), so the new control flow appends "optimize" to stages_skipped but still runs the full optimize+autoconf+analyze path. The log messages and stages_skipped then disagree with reality, and test_skip_optimize_kwarg (both test_hf.py and test_onnx.py, lines around 860 / 413) literally encodes the contradiction:

assert "optimize" in result.stages_skipped     # claims skip
mock_pipeline["optimize"].assert_called_once()  # but it ran

Those two assertions can't both be true in a sane reading of "skip_optimize". They were untouched by this PR, but the underlying behavior changed around them — please reconcile (assert_not_called() if the intent is really skip, or rename the flag if the intent is "label as skipped while still running").

2. Silent semantic regression for direct skip_optimize=True callers. Before this PR, skip_optimize=True passed no max_optim_iterations, so the autoconf re-optimization loop didn't run (max_optim_iterations=0 default). After this PR, max_optim_iterations=hack_max_optim_iterations is passed unconditionally, so skip_optimize=True now triggers a full autoconf loop where it used to skip it. Not mentioned in the PR description.

Individual suggestions on the lines below. The pre-quantized fix and the commands/build.py fix both look correct in intent — just want the skip_optimize semantics tightened up so labels match work, and the is_quantized_onnx calls de-duplicated.

Comment thread src/winml/modelkit/build/hf.py Outdated
Comment thread src/winml/modelkit/build/hf.py Outdated
Comment thread src/winml/modelkit/build/hf.py Outdated
Comment thread src/winml/modelkit/build/hf.py Outdated
Comment thread src/winml/modelkit/build/onnx.py Outdated
Comment thread src/winml/modelkit/build/onnx.py Outdated
Comment thread src/winml/modelkit/commands/build.py
@DingmaomaoBJTU DingmaomaoBJTU force-pushed the dingmaomaobjtu/fix-prequantized-optimize-skip branch from 51e4841 to bdee3ff Compare June 8, 2026 07:51
- Pre-quantized (QDQ) models now run full optimize+autoconf+analyze,
  only skip quantize to preserve QDQ structure.
- Cache is_quantized_onnx() result to avoid redundant ONNX file reads.
- skip_optimize: restore original semantics (run optimize_onnx without
  autoconf loop, label as skipped). Labels now match actual work done.
- Fix generate_build_config onnx_path bug for 'winml build -m model.onnx'.
@DingmaomaoBJTU DingmaomaoBJTU force-pushed the dingmaomaobjtu/fix-prequantized-optimize-skip branch from bdee3ff to 71eacb7 Compare June 8, 2026 07:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants