Skip to content

Use a pass-through for gemm#59

Open
kshyatt wants to merge 2 commits into
mainfrom
ksh/gemm
Open

Use a pass-through for gemm#59
kshyatt wants to merge 2 commits into
mainfrom
ksh/gemm

Conversation

@kshyatt
Copy link
Copy Markdown
Member

@kshyatt kshyatt commented May 13, 2026

This will help us use the new support for generic GPUArray strided views in a way that bypasses some really awful ambiguity warnings.

@kshyatt kshyatt requested a review from lkdvos May 13, 2026 10:12
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 13, 2026

Your PR no longer requires formatting changes. Thank you for your contribution!

Copy link
Copy Markdown
Member

@lkdvos lkdvos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't fully fix the issue I think, that code path shouldn't ever be reached by the GPU arrays since it is guarded by an isblasmatrix call that checks pointer(A) isa Ptr.

I think this really needs a more proper rewrite that dispatches to a gemm function that then indeed can determine the proper driver.
Note also that the current fallback is using the Strided machinery to manually write out the kernel, which is actually equivalent to what the generic_matmatmul! function does anyways

@kshyatt
Copy link
Copy Markdown
Member Author

kshyatt commented May 13, 2026

pointer(A) isa Ptr is true for ROCArrays, so this code path is definitely reached by GPUArrays because it was erroring

@kshyatt
Copy link
Copy Markdown
Member Author

kshyatt commented May 13, 2026

This doesn't fully fix the issue I think

It seems to have unblocked the v1 AMD stuff on TO 🤷 . But if someone wants to make a higher performance version, go ahead. The rocBLAS gemm doesn't work well if the stride in the first dimension isn't 1, I think.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 13, 2026

Codecov Report

❌ Patch coverage is 0% with 4 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
ext/StridedGPUArraysExt.jl 0.00% 2 Missing ⚠️
src/linalg.jl 0.00% 2 Missing ⚠️
Files with missing lines Coverage Δ
ext/StridedGPUArraysExt.jl 44.44% <0.00%> (-1.46%) ⬇️
src/linalg.jl 21.87% <0.00%> (-0.24%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@lkdvos
Copy link
Copy Markdown
Member

lkdvos commented May 13, 2026

Would it then not make more sense to fix the isblasmatrix implementation? That at least treats CUDA and AMD equally then.

@kshyatt
Copy link
Copy Markdown
Member Author

kshyatt commented May 13, 2026

So I think the CUDA one doesn't ever touch this because the result of pointer(A) there is a CuPtr, NOT a Ptr 🙃 . Not sure about Metal?

@lkdvos
Copy link
Copy Markdown
Member

lkdvos commented May 13, 2026

Yes, exactly, I think my argument is to either:
A. Make AMD also not touch this so it is treated the same as the other backends
B. Make every single one either dispatch to gemm if possible, and if not use the Strided implementation, similar to how the current CPU version works.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants