Summary
Implement pattern-based rewriting for QLinear operators (QLinearConv, QLinearMatMul) to enable quantization-aware graph transformations needed for INT8/INT4 model deployment on NPUs.
Context
QLinear operators are ONNX's representation of quantized computations. Fusing and rewriting these operators (e.g., QLinearConv → fused INT8 kernel) is required for efficient execution on hardware NPUs that support quantized computation natively. This is the final graph optimizer capability needed for QDQ pipeline completion.
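As a reminder of what the QLinear operators compute, the sketch below mirrors the affine mapping documented for ONNX QuantizeLinear and DequantizeLinear, applied to a single uint8 value. The helper names `quantize_u8` and `dequantize` are hypothetical and exist only for illustration; they are not part of this repo or of any library.

```python
def quantize_u8(x: float, scale: float, zero_point: int) -> int:
    """Map a real value to uint8: saturate(round(x / scale) + zero_point)."""
    q = round(x / scale) + zero_point  # Python's round() rounds halves to even
    return max(0, min(255, q))         # saturate to the uint8 range


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Map a quantized value back to a real value: (q - zero_point) * scale."""
    return (q - zero_point) * scale


scale, zero_point = 0.5, 128
q = quantize_u8(3.2, scale, zero_point)    # 3.2 / 0.5 = 6.4 -> 6, plus 128 = 134
x_hat = dequantize(q, scale, zero_point)   # (134 - 128) * 0.5 = 3.0
```

The gap between 3.2 and 3.0 is the quantization error that any rewrite has to preserve rather than add to.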
Desired State
QLinear* patterns recognized and rewritten by the graph optimizer
QLinearConv + Bias fusion
QLinearMatMul → INT8 GEMM rewrite (where the execution provider (EP) supports it)
Quantization-scale/zero-point folding
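One possible reading of the scale/zero-point folding item: a DequantizeLinear feeding a QuantizeLinear with identical scale and zero point is a round trip that can be removed. The sketch below works on a toy graph, not on the repo's optimizer. `Node`, `fold_dq_q_pairs`, and the `constants` mapping are all hypothetical, parameters are plain Python scalars, and the sketch ignores graph outputs and zero-point dtype, which a real rule would have to check.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Toy stand-in for a graph node (hypothetical, not the ONNX NodeProto API)."""
    op_type: str
    inputs: list
    outputs: list


def fold_dq_q_pairs(nodes, constants):
    """Drop DequantizeLinear -> QuantizeLinear pairs whose scale and zero point match.

    `constants` maps tensor names to constant values. Matching uses only op types
    and producer/consumer edges, never node or layer names.
    """
    producer = {out: n for n in nodes for out in n.outputs}
    consumers = {}
    for n in nodes:
        for name in n.inputs:
            consumers.setdefault(name, []).append(n)

    dropped, rename = set(), {}
    for q in nodes:
        if q.op_type != "QuantizeLinear" or len(q.inputs) < 3:
            continue
        dq = producer.get(q.inputs[0])
        if dq is None or dq.op_type != "DequantizeLinear" or len(dq.inputs) < 3:
            continue
        if len(consumers[dq.outputs[0]]) != 1:
            continue  # the float tensor has another consumer, so keep the pair
        dq_params = [constants.get(name) for name in dq.inputs[1:3]]
        q_params = [constants.get(name) for name in q.inputs[1:3]]
        if None in dq_params or dq_params != q_params:
            continue  # non-constant or different parameters: a real requantization
        rename[q.outputs[0]] = dq.inputs[0]
        dropped.update((id(dq), id(q)))

    def resolve(name):
        while name in rename:  # follow chains of folded pairs
            name = rename[name]
        return name

    return [
        Node(n.op_type, [resolve(i) for i in n.inputs], list(n.outputs))
        for n in nodes
        if id(n) not in dropped
    ]


constants = {"s": 0.02, "zp": 128, "s2": 0.05}
graph = [
    Node("DequantizeLinear", ["a_q", "s", "zp"], ["a_f"]),
    Node("QuantizeLinear", ["a_f", "s", "zp"], ["b_q"]),
    Node("QLinearMatMul", ["b_q", "s", "zp", "w_q", "w_s", "w_zp", "y_s", "y_zp"], ["y_q"]),
]
folded = fold_dq_q_pairs(graph, constants)  # only the QLinearMatMul remains, reading a_q
```

If the two nodes used different scales (for example `s` and `s2`), the pair would be a genuine requantization and the sketch would leave it in place.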
Acceptance Criteria
QLinearConv fusion rule implemented and tested
QLinearMatMul rewrite rule implemented and tested
Scale/zero-point folding across consecutive QDQ patterns
All QLinear rewrite rules tested against QDQ-quantized P0 models
Runtime-validated: quantized model output matches pre-rewrite output within tolerance
All existing tests pass (CARDINAL RULE: no regressions)
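The runtime-validation criterion could be checked along the following lines. This is a minimal sketch under stated assumptions: the issue does not define the tolerance, so the choice to express it in quantization steps of the output scale, the default of one step, and the names `max_abs_diff` and `outputs_match` are all hypothetical. Outputs are taken to be flat sequences of dequantized floats.

```python
def max_abs_diff(before, after):
    """Largest elementwise absolute difference between two flat output sequences."""
    if len(before) != len(after):
        raise ValueError("outputs differ in length")
    return max((abs(b - a) for b, a in zip(before, after)), default=0.0)


def outputs_match(before, after, y_scale, max_steps=1.0):
    """True when the outputs differ by at most `max_steps` quantization steps.

    ASSUMPTION: the tolerance is expressed in units of the output scale; the
    issue itself does not specify one.
    """
    return max_abs_diff(before, after) <= max_steps * y_scale


pre_rewrite = [0.5, -1.25, 3.0]
post_rewrite = [0.5, -1.0, 3.0]   # one element moved by a single 0.25 step
ok = outputs_match(pre_rewrite, post_rewrite, y_scale=0.25)   # True
```

Tying the tolerance to the output scale keeps the check meaningful across models whose output ranges differ, though a fixed absolute or relative tolerance would also be a defensible choice.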
Technical Notes
Must follow CARDINAL RULE #1: no hardcoded model architecture assumptions; all pattern matching must be graph-structure-based
Test against QDQ quantization output from #401 on at least 3 architectures (CNN, BERT, ViT)
From:
plans/release/0315_release_plan/P1_CHECKLIST.md (P1-FEATURE-006)
plans/release/0501_release_plan/P0_CHECKLIST.md (P1-FEATURE-007)
Current State
QDQ quantization (#401) produces QLinear operators that need post-processing
Related Files
plans/release/0315_release_plan/feature-scale.md: P1.4 QLinear Rewrite
plans/release/0501_release_plan/feature-scale.md: P1.7 QLinear Rewrite
plans/release/0315_release_plan/P1_CHECKLIST.md: P1-FEATURE-006
plans/release/0501_release_plan/P0_CHECKLIST.md: P1-FEATURE-007
#401: QDQ quantization (upstream dependency)