initial commit enabling GPU direct communication for Cray-MPICH #724

Draft
JPRichings wants to merge 3 commits into QuEST-Kit:devel from JPRichings:devel

@JPRichings
Contributor

This needs testing, but here is an initial suggestion for checking for GPU-enabled MPI support in CRAY_MPICH. This should enable direct inter-GPU comms on a wider range of HPC systems, which don't always support OpenMPI on their scale-out network.

Plan to test on MI210 nodes on ARCHER2 when they are available.
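
The PR's actual detection logic is not shown in this thread. As a minimal sketch of one possible runtime check: Cray MPICH documents the environment variable MPICH_GPU_SUPPORT_ENABLED, which must be set to 1 before the library accepts GPU device pointers in MPI calls. The helper names below are hypothetical and not taken from the PR.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: interprets the value of MPICH_GPU_SUPPORT_ENABLED.
// An unset variable (nullptr) or any value other than "1" counts as disabled.
bool isGpuSupportFlagSet(const char* value) {
    return value != nullptr && std::string(value) == "1";
}

// Hypothetical helper: true when the Cray MPICH runtime has been asked
// to enable GPU support via its environment variable.
bool crayMpichGpuSupportRequested() {
    return isGpuSupportFlagSet(std::getenv("MPICH_GPU_SUPPORT_ENABLED"));
}
```

A check like this only says that GPU support was requested at launch. It does not prove that the installed library was built with GPU support, so it would likely need to be combined with a check of which MPI library is in use.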

@JPRichings
Contributor Author

JPRichings commented Apr 25, 2026

Should have updated this before:

Hardware: 1 node, 4 MI210 GPUs, ARCHER2

before (no MPI direct comms):

QFT run time: 9.49068s
Total run time: 10.2344s

after (with MPI direct comms):

QFT run time: 2.29135s
Total run time: 2.92993s

I think it might be worth switching on GPU direct comms... ;)
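
For reference, the timings above correspond to a speedup of roughly 4.1x for the QFT and roughly 3.5x for the total run. A small sketch of the arithmetic, with hypothetical function names:

```cpp
#include <cstdio>

// Speedup is the time before the change divided by the time after it.
double speedup(double secondsBefore, double secondsAfter) {
    return secondsBefore / secondsAfter;
}

// Prints the speedups for the two timings reported in this comment.
void printSpeedups() {
    std::printf("QFT speedup:   %.2fx\n", speedup(9.49068, 2.29135));
    std::printf("Total speedup: %.2fx\n", speedup(10.2344, 2.92993));
}
```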

@otbrown otbrown changed the title initial commit enabling GPU direct communictation for Cray-MPICH initial commit enabling GPU direct communication for Cray-MPICH Apr 27, 2026
@JPRichings
Contributor Author

Tests are all failing due to:

[ 80%] Building CXX object _deps/catch2-build/src/CMakeFiles/Catch2.dir/catch2/matchers/catch_matchers_predicate.cpp.o
/Users/runner/work/QuEST/QuEST/quest/src/comm/comm_config.cpp:75:5: error: use of undeclared identifier 'MPI_Get_library_version'
   75 |     MPI_Get_library_version(version_string, resultlen);
      |     ^
1 error generated.
  • I will add a macro guard around this MPI call to fix non-MPI compilation.
  • Need to set some sensible default buffer sizes.
  • Need to remove the debugging messages from the code.
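
The macro guard in the first bullet could look something like the sketch below. It assumes the build system defines a macro named COMPILE_MPI as 1 for distributed builds, and the helper name getMpiLibraryVersion is hypothetical. The standard signature is `int MPI_Get_library_version(char *version, int *resultlen)`, so the length is passed by pointer.

```cpp
#include <string>

// Assumption: the build system sets COMPILE_MPI to 1 for distributed builds.
// Default to 0 so this file still compiles when MPI is absent.
#ifndef COMPILE_MPI
#define COMPILE_MPI 0
#endif

#if COMPILE_MPI
#include <mpi.h>
#endif

// Hypothetical helper: returns the MPI library's version string,
// or an empty string in builds compiled without MPI.
std::string getMpiLibraryVersion() {
#if COMPILE_MPI
    char version[MPI_MAX_LIBRARY_VERSION_STRING];
    int length = 0;
    MPI_Get_library_version(version, &length);
    return std::string(version, length);
#else
    return std::string();
#endif
}
```

Keeping both the include and the call behind the same guard means a non-MPI build never sees an MPI symbol, which is what the failing CI job needs.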

@JPRichings
Contributor Author

Need to return NONE for comm_whichMpi and false for comm_set_isMpiGpuAware.
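
A sketch of how those fallback values might be produced, using hypothetical names rather than the PR's real functions. The substrings matched here ("CRAY MPICH" and "Open MPI") are assumptions about what each library reports from MPI_Get_library_version and should be checked on the target system.

```cpp
#include <string>

// Hypothetical enum; NONE covers builds without MPI.
enum class MpiLibrary { NONE, CRAY_MPICH, OPEN_MPI, OTHER };

// Classifies a version string. An empty string (no MPI) maps to NONE.
MpiLibrary classifyMpiLibrary(const std::string& version) {
    if (version.empty())
        return MpiLibrary::NONE;
    if (version.find("CRAY MPICH") != std::string::npos)
        return MpiLibrary::CRAY_MPICH;
    if (version.find("Open MPI") != std::string::npos)
        return MpiLibrary::OPEN_MPI;
    return MpiLibrary::OTHER;
}

// GPU awareness is always false without MPI; otherwise it defers to
// whatever runtime check the caller performed for its MPI library.
bool isMpiGpuAware(MpiLibrary library, bool runtimeGpuSupport) {
    if (library == MpiLibrary::NONE)
        return false;
    return runtimeGpuSupport;
}
```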

int isMultithreaded;
int isGpuAccelerated;
int isDistributed;
int isMPIGPUAware;
Member

Maybe this should be isMpiGpuAware, to be consistent with isGpuSharingEnabled, comm_set_isMpiGpuAware, etc.
