
feat: untie matmul rewrite#857

Open
xieofxie wants to merge 10 commits into main from hualxie/untie-matmul-rewrite

xieofxie commented Jun 10, 2026


A pattern-rewrite implementation of #817.

hualxie added 8 commits June 5, 2026 14:58
OpenVINO GPU's oneDNN gemm cannot select an implementation for a batched
(rank >= 3) MatMul where an operand is a compile-time constant; the same
gemm with a dynamic operand, and 2D constant gemm, both compile fine.
Transformer disentangled-attention position terms (e.g. DeBERTa) fold to
3D constants and fail to compile with:

  [GPU] Failed to select implementation for ... type: gemm
  (compile_graph.cpp:59 selected_impl == nullptr)

Add an EP-gated `untie-constant-batched-matmul` surgery that routes the
constant operand through Add(const, zero), where zero is a data-dependent
runtime tensor of shape [1] (Cast -> Reshape(-1) -> Slice[0:1] -> Sub). Because
the operand is now runtime-valued, OV's constant folder cannot repack it into a
gemm weight. The single batched MatMul is kept (no perf regression) and
numerics are unchanged (+0).
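The Add(const, zero) construction above can be sketched as a small node emitter. This is an illustration, not the PR's code: `Node`, `Initializer`, and `untie_constant` are hypothetical stand-ins for ONNX `NodeProto`/initializer handling; it assumes opset >= 10 (where Slice takes starts/ends/axes as inputs), that Sub subtracts the sliced element from itself, and that `runtime_input` is the non-empty first graph input.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for an ONNX NodeProto: op type, input/output names, attributes."""
    op_type: str
    inputs: List[str]
    outputs: List[str]
    attrs: Dict[str, int] = field(default_factory=dict)


@dataclass
class Initializer:
    """Hypothetical stand-in for an int64 initializer tensor."""
    name: str
    values: List[int]


def untie_constant(const_name: str, runtime_input: str, const_dtype: int,
                   idx: int) -> Tuple[List[Node], List[Initializer], str]:
    """Route `const_name` through Add(const, zero) for the idx-th matched MatMul.

    zero is built as Cast -> Reshape([-1]) -> Slice[0:1] -> Sub, so it has shape [1]
    and depends on `runtime_input` at run time. Returns (nodes, initializers, new_name).
    """
    p = f"untie_{idx}"  # loop index, not node.name, keeps tensor names unique
    inits = [
        Initializer(f"{p}_shape", [-1]),  # Reshape target: flatten to 1-D
        Initializer(f"{p}_starts", [0]),  # distinct tensors for each Slice input
        Initializer(f"{p}_ends", [1]),
        Initializer(f"{p}_axes", [0]),
    ]
    nodes = [
        # Cast to the constant's dtype (ONNX TensorProto enum, e.g. FLOAT = 1).
        Node("Cast", [runtime_input], [f"{p}_cast"], {"to": const_dtype}),
        Node("Reshape", [f"{p}_cast", f"{p}_shape"], [f"{p}_flat"]),
        Node("Slice", [f"{p}_flat", f"{p}_starts", f"{p}_ends", f"{p}_axes"],
             [f"{p}_first"]),
        # s - s is 0 for any finite s (NaN/inf would yield NaN).
        Node("Sub", [f"{p}_first", f"{p}_first"], [f"{p}_zero"]),
        # Broadcasting a [1] zero onto the constant leaves its values numerically unchanged.
        Node("Add", [const_name, f"{p}_zero"], [f"{p}_untied"]),
    ]
    return nodes, inits, f"{p}_untied"


nodes, inits, new_name = untie_constant("pos_bias", "input_ids", 1, 0)
print([n.op_type for n in nodes], new_name)
```

The caller would then rewire the MatMul's constant input from `pos_bias` to `new_name`; the MatMul itself stays a single batched op.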

Wired via autoconf: BatchedConstMatMulValidator detects the pattern and,
gated to Intel IHV + GPU, emits a GraphOptimization opportunity that the
existing autoconf loop applies automatically. Detection is pattern-based and
architecture-agnostic.

Also makes the model-validator device filter case-insensitive so builds
that pass lowercase "gpu" are matched.
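The gating described above might look like the following sketch. The function names, the boolean IHV flag, and the opportunity dict keys are hypothetical; only the capability name and the Intel IHV + GPU gate with case-insensitive device matching come from the commit message.

```python
from typing import Dict, List


def device_matches(device: str, wanted: str = "gpu") -> bool:
    """Case-insensitive device comparison, so "GPU" and "gpu" both match."""
    return device.casefold() == wanted.casefold()


def untie_opportunities(is_intel_ihv: bool, device: str,
                        matches: int) -> List[Dict[str, str]]:
    """Hypothetical validator step: emit one GraphOptimization opportunity when gated in."""
    if matches == 0 or not is_intel_ihv or not device_matches(device):
        return []
    return [{"kind": "GraphOptimization",
             "capability": "untie-constant-batched-matmul"}]


print(untie_opportunities(True, "gpu", 2))
```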
- Use the loop index for the untied operand name instead of node.name, which is
  optional in ONNX and may be blank or duplicated (duplicates would collide and
  yield an invalid graph).
- Update the docstring to describe the actual Cast/Reshape/Slice/Sub construction
  (it still used the stale ReduceMin wording) and document the assumption that the
  first graph input is non-empty.
- Split the Slice starts/ends/axes initializers into distinct named tensors.
- Note the Constant-node detection gap in the validator (a gap shared with the surgery).
- Add a test for two unnamed batched-const MatMuls (name-collision regression).
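The name-collision regression can be illustrated with a short sketch comparing the two naming schemes on two unnamed MatMuls. The helper names here are hypothetical, and the `_untied` suffix is an assumed naming convention, not the PR's.

```python
from collections import Counter
from typing import List


def names_from_node_name(node_names: List[str]) -> List[str]:
    # Old scheme: derive the tensor name from node.name, which ONNX allows to be empty.
    return [f"{name}_untied" for name in node_names]


def names_from_index(node_names: List[str]) -> List[str]:
    # Fixed scheme: the loop index is unique even when node.name is blank or repeated.
    return [f"untie_{i}_untied" for i in range(len(node_names))]


def duplicates(names: List[str]) -> List[str]:
    """Tensor names that appear more than once (invalid in an ONNX graph)."""
    return sorted(n for n, c in Counter(names).items() if c > 1)


unnamed = ["", ""]  # two unnamed batched-const MatMuls
print(duplicates(names_from_node_name(unnamed)), duplicates(names_from_index(unnamed)))
```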
…urgery

Re-implements the OpenVINO-GPU batched-const-MatMul workaround as a pattern
rewrite (match -> replace) instead of a SurgeryPipe transform, so node
generation goes through the existing PatternMatcher/PatternRewriter framework.

- New patterns (pattern/batched_const_matmul_patterns.py):
  - BatchedConstMatMulPattern (source): matches a bare MatMul with exactly one
    rank >= 3 constant operand. Overrides check_skeleton_result to skip the base
    symbolic-dim rejection, since the dynamic activation operand legitimately
    carries symbolic dims.
  - UntiedBatchedConstMatMulPattern (target): emits
    MatMul(dyn, Add(const, zero)), where zero is a runtime tensor of shape [1] derived
    from the MatMul's own dynamic operand (Reshape([-1]) -> Slice([0:1]) -> Sub).
    Deriving zero from the dynamic operand keeps the replacement local (the
    rewriter only wires to the matched subgraph's boundary tensors) and removes
    the surgery's dependency on graph.input[0] and its empty-first-input edge
    case. Operands share a dtype, so no Cast is needed.
- Wires the rule into pattern/rules/default.json as capability
  "batchedconstmatmul-untied" (enabled:false so it stays out of general
  matching; applied only when the capability is turned on).
- Repoints BatchedConstMatMulValidator to emit the rewrite capability flag; the
  validator still supplies the Intel-IHV + GPU gating and the autoconf trigger.
- PatternRewriter: tolerate symbolic/dynamic operand dims when building the
  dummy input array for a target's get_onnx_model (this previously crashed).
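The three pieces of the rewrite listed above (source match, target emission, and dummy-shape handling) can be sketched as plain functions. This does not use the real PatternMatcher/PatternRewriter API: all names, the tuple-based op encoding, the fixed tensor names (a real rewrite would make them unique), and the choice of 1 as the fill value for symbolic dims are assumptions for illustration.

```python
from typing import List, Optional, Sequence, Tuple, Union

# Hypothetical shape encoding: int = static dim, str = symbolic ("batch"), None = unknown.
Dim = Union[int, str, None]
Op = Tuple[str, List[str], List[str]]  # (op_type, inputs, outputs), a NodeProto stand-in


def match_batched_const_matmul(op_type: str, is_const: Sequence[bool],
                               ranks: Sequence[int]) -> Optional[int]:
    """Return the index of the single rank >= 3 constant MatMul operand, else None."""
    if op_type != "MatMul" or len(is_const) != 2 or len(ranks) != 2:
        return None
    hits = [i for i in (0, 1) if is_const[i] and ranks[i] >= 3]
    # Require exactly one batched constant and a dynamic other operand.
    if len(hits) != 1 or is_const[1 - hits[0]]:
        return None
    return hits[0]


def untied_target(a: str, b: str, const_idx: int, out: str) -> List[Op]:
    """Emit MatMul with Add(const, zero), zero derived from the dynamic operand (no Cast)."""
    const, dyn = (a, b) if const_idx == 0 else (b, a)
    ops: List[Op] = [
        ("Reshape", [dyn, "shape_m1"], ["flat"]),                   # shape_m1 = [-1]
        ("Slice", ["flat", "start0", "end1", "axis0"], ["first"]),  # [0:1] on axis 0
        ("Sub", ["first", "first"], ["zero"]),                      # shape [1]
        ("Add", [const, "zero"], ["untied"]),
    ]
    # Keep the original operand order: MatMul is not commutative.
    ops.append(("MatMul", ["untied", dyn] if const_idx == 0 else [dyn, "untied"], [out]))
    return ops


def concretize(shape: Sequence[Dim], fill: int = 1) -> List[int]:
    """Replace symbolic/unknown dims with `fill` so a dummy input array can be built."""
    return [d if isinstance(d, int) and d >= 0 else fill for d in shape]


idx = match_batched_const_matmul("MatMul", [False, True], [3, 3])
print(idx, untied_target("x", "W", idx, "y")[-1], concretize(["batch", 128, None]))
```

Deriving zero from the dynamic operand means every emitted node reads only the matched MatMul's boundary tensors, which is why no graph-input dependency (and no Cast) is needed here.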

The SurgeryPipe untie implementation is left in place but is no longer driven by
autoconf; it can be removed in a follow-up.
Comment thread tests/unit/optim/pipes/test_pipe_rewrite_batched_const_matmul.py Fixed
hualxie and others added 2 commits June 10, 2026 11:57
The batched-const-MatMul workaround is now implemented as the
batchedconstmatmul-untied rewrite, so the SurgeryPipe path is dead code. Removes:

- UNTIE_CONSTANT_BATCHED_MATMUL capability.
- SurgeryPipeConfig.untie_constant_batched_matmul and its build_config /
  should_process / process wiring.
- SurgeryPipe._untie_constant_batched_matmul.
- The corresponding SurgeryPipe unit tests.

Updates the validator's known-gap comment to reference the rewrite source
pattern instead of the removed surgery.
…ith 'import' and 'import from''

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
@xieofxie xieofxie marked this pull request as ready for review June 10, 2026 06:29
@xieofxie xieofxie requested a review from a team as a code owner June 10, 2026 06:29

Labels

None yet

Projects

None yet


2 participants