feat: untie matmul rewrite #857
Open
xieofxie wants to merge 10 commits into
added 8 commits
June 5, 2026 14:58
OpenVINO GPU's oneDNN gemm cannot select an implementation for a batched (rank >= 3) MatMul where an operand is a compile-time constant; the same gemm with a dynamic operand, and 2D constant gemm, both compile fine. Transformer disentangled-attention position terms (e.g. DeBERTa) fold to 3D constants and fail to compile with:

    [GPU] Failed to select implementation for ... type: gemm (compile_graph.cpp:59 selected_impl == nullptr)

Add an EP-gated `untie-constant-batched-matmul` surgery that routes the constant operand through Add(const, zero), where zero is a data-dependent runtime [1] tensor (Cast -> Reshape(-1) -> Slice[0:1] -> Sub). This makes the operand runtime-valued so OV's constant folder cannot repack it into a gemm weight, while keeping the single batched MatMul (no perf regression) and leaving numerics unchanged (+0).

Wired via autoconf: BatchedConstMatMulValidator detects the pattern and, gated to Intel IHV + GPU, emits a GraphOptimization opportunity the existing autoconf loop auto-applies. Pattern-based, architecture-agnostic.

Also makes the model-validator device filter case-insensitive so builds that pass lowercase "gpu" are matched.
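As a rough illustration of why Add(const, zero) leaves numerics unchanged, the sketch below mimics the Cast -> Reshape(-1) -> Slice[0:1] -> Sub chain on plain Python lists. The function names and sample data are hypothetical, and Sub is assumed to subtract the sliced element from itself; the commit message does not spell out its operands.

```python
from itertools import chain


def runtime_zero(first_input):
    """Mimic Cast -> Reshape(-1) -> Slice[0:1] -> Sub on a 2D runtime input.

    Assumption: Sub subtracts the sliced element from itself, which gives
    0.0 for any finite value (NaN or Inf would propagate as NaN).
    """
    flat = [float(v) for v in chain.from_iterable(first_input)]  # Cast + Reshape(-1)
    head = flat[0:1]  # Slice[0:1] -> shape [1]
    return [h - h for h in head]  # Sub(head, head)


def add_broadcast(const_3d, zero):
    """Add(const, zero): broadcast a shape-[1] tensor over a 3D constant."""
    z = zero[0]
    return [[[v + z for v in row] for row in mat] for mat in const_3d]


# Hypothetical data: a 2x3 runtime input and a 2x2x2 folded constant.
graph_input = [[7, 8, 9], [1, 2, 3]]
const = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]

zero = runtime_zero(graph_input)
untied = add_broadcast(const, zero)
```

The value of zero is only known once graph_input is available, so a constant folder cannot precompute the Add result, yet the sum equals the original constant element for element.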
- Use loop index for the untied operand name instead of node.name, which is
  optional in ONNX and can be blank/duplicated (would collide and yield an
  invalid graph).
- Update docstring to describe the actual Cast/Reshape/Slice/Sub construction
  (was stale ReduceMin wording) and document the non-empty-first-input
  assumption.
- Split the Slice starts/ends/axes initializers into distinct named tensors.
- Note the Constant-node detection gap in the validator (shared with surgery).
- Add a test for two unnamed batched-const MatMuls (name-collision regression).
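The name-collision fix in the first bullet can be illustrated with a small sketch. The `Node` type and the `untied_const_` prefix are hypothetical stand-ins, not the names used in this PR.

```python
from typing import List, NamedTuple


class Node(NamedTuple):
    """Hypothetical stand-in for an ONNX node; `name` may be blank."""
    op_type: str
    name: str = ""


def names_from_node_name(nodes: List[Node]) -> List[str]:
    # Old approach: blank or duplicated node names give duplicate tensor names.
    return [f"untied_const_{n.name}" for n in nodes if n.op_type == "MatMul"]


def names_from_index(nodes: List[Node]) -> List[str]:
    # New approach: the loop index is unique per node, so names cannot collide.
    return [f"untied_const_{i}" for i, n in enumerate(nodes) if n.op_type == "MatMul"]


nodes = [Node("MatMul"), Node("Relu"), Node("MatMul")]  # two unnamed MatMuls
by_name = names_from_node_name(nodes)
by_index = names_from_index(nodes)
```

With two unnamed MatMuls, the name-based scheme yields the same tensor name twice, while the index-based scheme yields one distinct name per node.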
…urgery
Re-implements the OpenVINO-GPU batched-const-MatMul workaround as a pattern
rewrite (match -> replace) instead of a SurgeryPipe transform, so node
generation goes through the existing PatternMatcher/PatternRewriter framework.
- New patterns (pattern/batched_const_matmul_patterns.py):
  - BatchedConstMatMulPattern (source): matches a bare MatMul with exactly one
    rank >= 3 constant operand. Overrides check_skeleton_result to skip the base
    symbolic-dim rejection, since the dynamic activation operand legitimately
    carries symbolic dims.
  - UntiedBatchedConstMatMulPattern (target): emits
    MatMul(dyn, Add(const, zero)) where zero is a [1] runtime tensor derived
    from the MatMul's own dynamic operand (Reshape([-1]) -> Slice([0:1]) -> Sub).
    Deriving zero from the dynamic operand keeps the replacement local (the
    rewriter only wires to the matched subgraph's boundary tensors) and removes
    the surgery's dependency on graph.input[0] and its empty-first-input edge
    case. Operands share a dtype, so no Cast is needed.
- Wires the rule into pattern/rules/default.json as capability
  "batchedconstmatmul-untied" (enabled:false so it stays out of general
  matching; applied only when the capability is turned on).
- Repoints BatchedConstMatMulValidator to emit the rewrite capability flag; the
  validator still supplies the Intel-IHV + GPU gating and the autoconf trigger.
- PatternRewriter: tolerate symbolic/dynamic operand dims when building the
  dummy input array for a target's get_onnx_model (previously crashed).
The SurgeryPipe untie implementation is left in place but is no longer driven by
autoconf; it can be removed in a follow-up.
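The match -> replace flow described above might look roughly like the following sketch. It uses plain tuples rather than the PR's PatternMatcher/PatternRewriter classes, whose APIs are not shown here; `Operand`, `NodeSpec`, both functions, and the initializer names (`*_shape`, `*_starts`, `*_ends`, `*_axes`) are hypothetical. It also assumes the non-constant operand must be dynamic and that Sub subtracts the sliced element from itself.

```python
from typing import List, NamedTuple, Optional


class Operand(NamedTuple):
    """Hypothetical operand description: tensor name, rank, constness."""
    name: str
    rank: int
    is_const: bool


class NodeSpec(NamedTuple):
    """Hypothetical stand-in for an ONNX node (op type, inputs, outputs)."""
    op_type: str
    inputs: List[str]
    outputs: List[str]


def match_batched_const(a: Operand, b: Operand) -> Optional[int]:
    """Return the index (0 or 1) of the rank >= 3 constant operand, or None."""
    if a.is_const == b.is_const:
        return None  # assumption: need one constant and one dynamic operand
    idx = 0 if a.is_const else 1
    return idx if (a, b)[idx].rank >= 3 else None


def untied_nodes(a: Operand, b: Operand, out: str, prefix: str) -> List[NodeSpec]:
    """Emit the replacement nodes, or an empty list when the source does not match."""
    idx = match_batched_const(a, b)
    if idx is None:
        return []
    const, dyn = (a, b) if idx == 0 else (b, a)
    flat, head, zero, untied = (f"{prefix}_{s}" for s in ("flat", "head", "zero", "untied"))
    operands = [a.name, b.name]
    operands[idx] = untied  # keep the original operand order
    return [
        # Initializers (not shown): shape=[-1], starts=[0], ends=[1], axes=[0].
        NodeSpec("Reshape", [dyn.name, f"{prefix}_shape"], [flat]),
        NodeSpec("Slice", [flat, f"{prefix}_starts", f"{prefix}_ends", f"{prefix}_axes"], [head]),
        NodeSpec("Sub", [head, head], [zero]),  # runtime [1] tensor equal to 0
        NodeSpec("Add", [const.name, zero], [untied]),
        NodeSpec("MatMul", operands, [out]),
    ]


x = Operand("x", 3, False)  # dynamic activation
w = Operand("w", 3, True)   # folded 3D constant
nodes = untied_nodes(x, w, "y", "untie0")
```

Every emitted node reads only from the matched MatMul's own operands, which is the locality property the commit message attributes to deriving zero from the dynamic operand.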
The batched-const-MatMul workaround is now implemented as the batchedconstmatmul-untied rewrite, so the SurgeryPipe path is dead code.

Removes:
- UNTIE_CONSTANT_BATCHED_MATMUL capability.
- SurgeryPipeConfig.untie_constant_batched_matmul and its build_config / should_process / process wiring.
- SurgeryPipe._untie_constant_batched_matmul.
- The corresponding SurgeryPipe unit tests.

Updates the validator's known-gap comment to reference the rewrite source pattern instead of the removed surgery.
…ith 'import' and 'import from''

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
A pattern implementation of #817