Your PR no longer requires formatting changes. Thank you for your contribution!
lkdvos left a comment:
I don't think this fully fixes the issue: that code path should never be reached by GPU arrays, since it is guarded by an isblasmatrix call that checks pointer(A) isa Ptr.
I think this really needs a proper rewrite that dispatches to a gemm function, which can then determine the proper driver.
Note also that the current fallback uses the Strided machinery to manually write out the kernel, which is equivalent to what the generic_matmatmul! function does anyway.
It seems to have unblocked the v1 AMD stuff on TO 🤷. But if someone wants to make a higher-performance version, go ahead. I think the rocBLAS gemm doesn't work well if the stride in the first dimension isn't 1.
Would it then not make more sense to fix the |
So I think the CUDA one doesn't ever touch this because the result of |
Yes, exactly, I think my argument is to either: |
This will help us use the new support for generic GPUArray strided views in a way that bypasses some really awful ambiguity warnings.