[AMD GPU] Add Windows & Linux ROCm support and Linux MIGraphX support #1188
Open
Looong01 wants to merge 30 commits into lightvector:master from [...]
Add ROCm and MIGraphX support for AMD GPU
Owner: Thanks, I'll look at this soon.
Author: Thanks!
All tests passed; we can merge it into the main branch! @lightvector
Both the Windows and Linux binary releases have been published here: https://github.com/Looong01/KataGo-Multi-backends/releases
Background
This PR summarizes all commits by Looong01 on the `AMD_GPU` branch from 2025-07-28 to 2026-03-16 (23 commits total: 18 non-merge + 5 merge), focused on introducing and refining ROCm backend support in KataGo, plus the new MIGraphX backend added on the `MIGraphX` branch.
Key Changes: ROCm Backend
- [...] `rocmhelpers.hip`, `rocmutils.*`, `rocmincludes.h`, `rocmerrorcheck.h`.
- [...] `USE_ROCM_BACKEND` into startup and config flow (setup/benchmark/gtpconfig) for proper backend detection and config generation.
- [...] `HIP_PATH`/`ROCM_PATH`, clang toolchain handling, Windows library search paths).
- [...] `rocmbackend_new.cpp` after merging validated changes into the main backend path.
- [...] `cpp/configs/*`) with ROCm instructions and `rocmDeviceToUse*`, `rocmUseFP16` examples.
- [...] `lightvector:master` to reduce branch drift.
Critical Bug Fix: ConvLayer accumulate (residual skip connections)
- `miopenConvolutionForwardImmediate` does not support `alpha`/`beta` parameters (unlike cuDNN's `cudnnConvolutionForward`). The original code set `beta = accumulate ? 1.0 : 0.0`, but this value was never passed to the MIOpen API, so all residual skip connections were silently dropped and the neural network output was effectively garbage.
- [...] `accumulate=true`, save the output buffer (trunk) to a pre-allocated `accumBuf` via `hipMemcpyAsync` (Device-to-Device), run the convolution (which overwrites the output buffer), then add the saved residual back using a new `customCudaAddTensorsInplace` GPU kernel. All operations stay in VRAM with zero CPU-side data transfer.
- [...] `rocmhelpers.hip`/`rocmhelpers.h`:
  - `customCudaAddTensorsInplace(float*, const float*, int)`
  - `customCudaAddTensorsInplace(half*, const half*, int)`
- `accumBuf` is pre-allocated once per `ConvLayer` at construction time (sized for `maxBatchSize`), avoiding per-inference `hipMalloc`/`hipFree` overhead.
Secondary Fix: Algorithm enumeration buffer overflow
`miopenConvolutionForwardGetSolutionCount` returns the available count by overwriting the output parameter. The original code used this count to size a fixed stack array `miopenConvSolution_t solutions[2*requestedAlgoCount]`, which could overflow. Replaced with `std::vector<miopenConvSolution_t> solutions(availableAlgoCount)` for safe dynamic sizing.
Windows ROCm Build: CMakeLists.txt Self-Configuration
Added full Windows ROCm build support directly into `CMakeLists.txt`.
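Returning to the ConvLayer accumulate fix described earlier in this section, the save / convolve / add-back sequence can be sketched on the host in plain C++. This is only an illustration of the control flow, not the PR's HIP code: `OverwritingConv` and `applyConv` are hypothetical names, `std::vector<float>` stands in for device buffers, `std::copy` stands in for the device-to-device `hipMemcpyAsync`, and the final loop plays the role the PR assigns to `customCudaAddTensorsInplace`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Stand-in for a convolution call that cannot accumulate: it always
// overwrites `out`, as the PR describes for the MIOpen immediate-mode call.
// (Hypothetical type, for illustration only.)
using OverwritingConv =
    std::function<void(const std::vector<float>& in, std::vector<float>& out)>;

// Hypothetical wrapper emulating "out = conv(in) + out" when accumulate is true.
// accumBuf is assumed to be allocated once up front and to be at least as
// large as `out` (the PR sizes it for maxBatchSize).
inline void applyConv(const OverwritingConv& conv,
                      const std::vector<float>& in,
                      std::vector<float>& out,
                      std::vector<float>& accumBuf,
                      bool accumulate) {
  if (accumulate) {
    assert(accumBuf.size() >= out.size());
    // Step 1: save the current output (the trunk / residual) before it is lost.
    std::copy(out.begin(), out.end(), accumBuf.begin());
  }

  // Step 2: run the convolution, which overwrites `out`.
  conv(in, out);

  if (accumulate) {
    // Step 3: add the saved residual back element-wise.
    for (std::size_t i = 0; i < out.size(); i++)
      out[i] += accumBuf[i];
  }
}
```

With a toy "convolution" that writes `2 * in[i]` into `out`, an input of `{1, 2, 3}` and a prior output of `{10, 20, 30}`, the accumulate path leaves `{12, 24, 36}` in `out`, while the non-accumulate path leaves `{2, 4, 6}`. Reusing one pre-sized scratch buffer is what keeps allocation out of the per-inference path.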
Key Changes: MIGraphX Backend (New)
Added a complete MIGraphX graph-compiler backend (migraphxbackend.cpp, 1886 lines) as an alternative to the ROCm (MIOpen) backend. MIGraphX compiles the entire neural network into a single fused GPU program, leveraging AMD's graph-level optimizations (operator fusion, memory planning, kernel scheduling).
Architecture
- [...] `migraphx::program` at load time using `MIGraphXGraphBuilder`, compiled once, then cached as `.mxr` files under `~/.katago/migraphxcache/`.
- [...] `{4, 8, 16, 24, 32, 40, 64}` (capped by `maxBatchSize`). At inference time, `getBestBatchSize()` selects the smallest compiled size ≥ actual batch to minimize GPU waste.
- [...] `.mxr` files with naming format `migraphx_{modelName}_{sha256}_{H}x{W}_batch{N}_fp{0|1}_nhwc{0|1}_{exact|max}.mxr`. First launch compiles all batch sizes (slow); subsequent launches load from cache in seconds.
Neural Network Components Implemented
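Referring back to the batch-size rule in the Architecture notes above, the selection of "the smallest compiled size ≥ actual batch" can be sketched as a binary search over the sorted list of compiled sizes. The helper names `compiledBatchSizes` and `bestBatchSize` are hypothetical and are not taken from `migraphxbackend.cpp`. Two assumptions are made here: "capped by `maxBatchSize`" is read as clamping each candidate to `maxBatchSize`, and a batch larger than every compiled size yields `-1` so the caller can decide how to handle it.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: build the ascending list of batch sizes to compile.
// Assumption: candidates above maxBatchSize are clamped to maxBatchSize,
// and duplicates produced by the clamping are dropped.
inline std::vector<int> compiledBatchSizes(int maxBatchSize) {
  assert(maxBatchSize >= 1);
  const int candidates[] = {4, 8, 16, 24, 32, 40, 64};  // list from the PR text
  std::vector<int> sizes;
  for (int c : candidates) {
    int capped = std::min(c, maxBatchSize);
    if (sizes.empty() || sizes.back() != capped)
      sizes.push_back(capped);
  }
  return sizes;
}

// Hypothetical helper: smallest compiled size >= actualBatch.
// Returns -1 if actualBatch exceeds every compiled size (assumed convention).
inline int bestBatchSize(const std::vector<int>& sortedSizes, int actualBatch) {
  auto it = std::lower_bound(sortedSizes.begin(), sortedSizes.end(), actualBatch);
  return it == sortedSizes.end() ? -1 : *it;
}
```

Under these assumptions, a batch of 5 against the full candidate list maps to the compiled size 8, and a batch of 17 maps to 24; with `maxBatchSize = 10` the compiled list becomes `{4, 8, 10}`, so a batch of 9 maps to 10. Rounding up rather than compiling every possible size trades a little padded GPU work for far fewer compiled programs to build and cache.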
FP16 Support
- [...] `useFP16Mode` is `Auto` or `True`).
- [...] `float_type` on host; a `convert` op inside the graph handles float→half on GPU.
- [...] `half_type`; outputs are cast back to `float` via `static_cast<float>` in the `visit()` lambda.
Build Integration
- [...] `USE_BACKEND=MIGRAPHX` option in CMakeLists.txt (~60 lines of build logic).
- [...] `libmigraphx` (and optional `libmigraphx_gpu`) from rocm.
- [...] `"mgx"` backend prefix in `setup.cpp`, forced NCHW format.
- `main.cpp`: prints `"Using MIGraphX backend"`.
Known Limitations
- [...] `nnXLen`×`nnYLen`.
Change Stats: ROCm
- [...] `23` (non-merge: `18`, merge: `5`) + post-PR bug fixes
- [...] `21` + `3` (`rocmbackend.cpp`, `rocmhelpers.hip`, `rocmhelpers.h`)
- [...] `+9372 / -4009` + `+59 / -28`
Change Stats: MIGraphX
- [...] `3` (non-merge: `3`)
- [...] `4` (`migraphxbackend.cpp`, `CMakeLists.txt`, `setup.cpp`, `main.cpp`)
- [...] `+1977 / -1` (1886 lines new backend + 91 lines build/setup integration)
Included Commits (Author: Looong01)
ROCm Backend (`AMD_GPU` branch, 2025-07-28 ~ 2026-03-16)
- `1f2ae46e` 2025-07-28 Add ROCm backend
- `b4555304` 2025-07-28 Fix bugs
- `8b30cb96` 2025-07-31 Update
- `570ced01` 2025-08-01 Fix bugs
- `abb61240` 2025-08-01 Fix bugs
- `bfb292e7` 2025-08-01 All bug fixed
- `4606424f` 2025-08-01 Update
- `1e8ea788` 2025-08-02 test new method
- `c1a09cf3` 2025-08-02 Update
- `0957b88b` 2025-08-02 Test finished
- `c70d841a` 2025-08-02 Update docks
- `1d05ca8d` 2025-08-02 Update gitignore
- `9d4662b7` 2025-08-02 Update new method
- `d40bd509` 2025-08-02 Optimize performance
- `158d24df` 2025-08-13 Update new Convlayer method
- `ec32eb19` 2025-08-13 Merge branch 'master' of https://github.com/Looong01/KataGo-ROCm
- `0bfe0a14` 2025-10-04 Add new compile target
- `f5fbb336` 2025-11-08 Merge branch 'lightvector:master' into master
- `26d8c5bd` 2025-11-08 Add ROCm for Windows support
- `555d2f17` 2025-12-01 Merge branch 'lightvector:master' into master
- `dbc7cfa4` 2026-02-22 Merge branch 'lightvector:master' into master
- `ed396b72` 2026-02-28 Fix bugs
- `ccec62c5` 2026-03-16 Merge branch 'lightvector:master' into master
- `xxxxxxxx` 2026-04-19 Fix critical ConvLayer accumulate bug & algorithm buffer overflow
MIGraphX Backend (`MIGraphX` branch, 2026-02-27 ~ 2026-04-19)
- `c511c338` 2026-02-27 Add MIGraphX support
- `00cb6881` 2026-04-19 Fix bugs (MIGraphX: 5 structural bugs, GELU→MishScale8, NHWC→NCHW, dimension mismatches)
- `b1da0e06` 2026-04-19 Optimize performance (FP16 default, multi-batch compilation, cache per batch size)