
Add configuration for AMD to EESSI-extend-easybuild.eb #206

Draft
zerefwayne wants to merge 3 commits into EESSI:main from zerefwayne:eessi-extend-amd


@zerefwayne

No description provided.

Member

@ocaisa ocaisa left a comment


@ocaisa
Member

ocaisa commented Apr 17, 2026

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws for:arch=x86_64/amd/zen4

@eessi-bot-aws

eessi-bot-aws bot commented Apr 17, 2026

New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-software
Building on: amd-zen4
Building for: x86_64/amd/zen4
Job dir: /project/def-users/SHARED/jobs/2026.04/pr_206/148714

date | job status | comment
Apr 17 12:14:02 UTC 2026 | submitted | job id 148714 awaits release by job manager
Apr 17 12:15:06 UTC 2026 | released | job awaits launch by Slurm scheduler
Apr 17 12:20:16 UTC 2026 | running | job 148714 is running
Apr 17 12:26:32 UTC 2026 | finished | 😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-148714.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen4-17764286740.tar.zst
size: 1 MiB (1604937 bytes)
entries: 169
modules under 2025.06/software/linux/x86_64/amd/zen4/modules/all
EESSI-extend/2025.06-easybuild.lua
elfutils/0.193-GCCcore-14.2.0.lua
software under 2025.06/software/linux/x86_64/amd/zen4/software
EESSI-extend/2025.06-easybuild
elfutils/0.193-GCCcore-14.2.0
reprod directories under 2025.06/software/linux/x86_64/amd/zen4/reprod
elfutils/0.193-GCCcore-14.2.0/20260417_122321UTC
other under 2025.06/software/linux/x86_64/amd/zen4
no other files in tarball
Apr 17 12:26:32 UTC 2026 | test result | 😁 SUCCESS (click triangle for details)
ReFrame Summary
[ OK ] (1/5) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/22Jul2025-foss-2024a-kokkos %scale=1_node /ade8cad7 @BotBuildTests:x86-64-zen4+default
P: perf: 1148.362 timesteps/s (r:0, l:None, u:None)
[ OK ] (2/5) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /e4bf9965 @BotBuildTests:x86-64-zen4+default
P: latency: 1.45 us (r:0, l:None, u:None)
[ OK ] (3/5) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /3da4890b @BotBuildTests:x86-64-zen4+default
P: latency: 3.26 us (r:0, l:None, u:None)
[ OK ] (4/5) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /3255009a @BotBuildTests:x86-64-zen4+default
P: latency: 0.16 us (r:0, l:None, u:None)
[ OK ] (5/5) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /59f4b331 @BotBuildTests:x86-64-zen4+default
P: bandwidth: 14296.92 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 5/5 test case(s) from 5 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-148714.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
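The build checks listed under Details in the bot report above can be sketched in Python as follows. The pattern strings are taken from the report; the regular-expression form of `.tar.* created!`, the function names, and the all-checks-must-pass logic are assumptions for illustration, not the bot's actual implementation.

```python
import re

# Patterns copied from the bot's "Details" list (build step only).
# How the bot really matches them is an assumption.
MUST_NOT_MATCH = [r"FATAL:", r"ERROR:", r"FAILED:", r"required modules missing:"]
MUST_MATCH = [r"No missing installations", r"\.tar\..* created!"]


def check_job_output(text):
    """Return a dict mapping each pattern to True if its check passes."""
    results = {}
    for pattern in MUST_NOT_MATCH:
        # These messages must be absent from the job output.
        results[pattern] = re.search(pattern, text) is None
    for pattern in MUST_MATCH:
        # These messages must be present in the job output.
        results[pattern] = re.search(pattern, text) is not None
    return results


def job_succeeded(text):
    """A job counts as successful only if every single check passes."""
    return all(check_job_output(text).values())
```

Under these assumptions, an output that creates a tarball but also contains an `ERROR:` line is still reported as a failure, which is consistent with the FAILURE status shown above.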

@ocaisa
Member

ocaisa commented Apr 17, 2026

@zerefwayne

== FAILED: Installation ended unsuccessfully: Expected amdgcn_capabilities to be
set to build this EasyConfig. Please specify either --amdgcn-capabilities, or
set amdgcn_capabilities in the EasyConfig! (took 58 secs)

@ocaisa
Member

ocaisa commented Apr 17, 2026

Wrong build command!

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws on:arch=zen4 for:arch=x86_64/amd/zen4,accel=amd/gfx90a
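The corrected command differs from the first one in the `accel=amd/gfx90a` part of the `for:` argument. Judging only from the artefact paths in the bot reports in this thread, that argument appears to select the installation subdirectory. A minimal sketch of that mapping, with hypothetical function names and an assumed `linux` OS component:

```python
def parse_target(for_arg):
    """Split 'arch=x86_64/amd/zen4,accel=amd/gfx90a' into a dict of its parts."""
    return dict(item.split("=", 1) for item in for_arg.split(","))


def install_subdir(eessi_version, for_arg, os_type="linux"):
    """Build the relative install prefix as it appears in the tarball listings.

    The layout is inferred from the bot reports, not from the bot's source code.
    """
    target = parse_target(for_arg)
    path = f"{eessi_version}/software/{os_type}/{target['arch']}"
    if "accel" in target:
        # Accelerator builds go into an extra accel/<vendor>/<arch> level.
        path += f"/accel/{target['accel']}"
    return path


print(install_subdir("2025.06", "arch=x86_64/amd/zen4"))
print(install_subdir("2025.06", "arch=x86_64/amd/zen4,accel=amd/gfx90a"))
```

Without `accel=`, the build targets the CPU-only prefix, which is why the first job had no accelerator target for the AMD GPU configuration.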

@eessi-bot-aws

eessi-bot-aws bot commented Apr 17, 2026

New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-software
Building on: amd-zen4
Building for: x86_64/amd/zen4 and accelerator amd/gfx90a
Job dir: /project/def-users/SHARED/jobs/2026.04/pr_206/148715

date | job status | comment
Apr 17 12:56:39 UTC 2026 | submitted | job id 148715 awaits release by job manager
Apr 17 12:56:56 UTC 2026 | released | job awaits launch by Slurm scheduler
Apr 17 12:57:58 UTC 2026 | running | job 148715 is running
Apr 17 12:58:59 UTC 2026 | finished | 😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-148715.out
✅ no message matching FATAL:
❌ found message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen4-accel-amd-gfx90a-17764306630.tar.zst
size: 0 MiB (25379 bytes)
entries: 13
modules under 2025.06/software/linux/x86_64/amd/zen4/accel/amd/gfx90a/modules/all
no module files in tarball
software under 2025.06/software/linux/x86_64/amd/zen4/accel/amd/gfx90a/software
no software packages in tarball
reprod directories under 2025.06/software/linux/x86_64/amd/zen4/accel/amd/gfx90a/reprod
no reprod directories in tarball
other under 2025.06/software/linux/x86_64/amd/zen4/accel/amd/gfx90a
2025.06/software/linux/x86_64/amd/zen4/modules/all/EESSI-extend/2025.06-easybuild.lua
2025.06/software/linux/x86_64/amd/zen4/modules/devel/EESSI-extend/2025.06-easybuild.lua
2025.06/software/linux/x86_64/amd/zen4/software/EESSI-extend/2025.06-easybuild/
2025.06/software/linux/x86_64/amd/zen4/software/EESSI-extend/2025.06-easybuild/easybuild/
2025.06/software/linux/x86_64/amd/zen4/software/EESSI-extend/2025.06-easybuild/easybuild/easybuild-EESSI-extend-2025.06-20260417.125730.log
2025.06/software/linux/x86_64/amd/zen4/software/EESSI-extend/2025.06-easybuild/easybuild/easybuild-EESSI-extend-2025.06-20260417.125730_test_report.md
2025.06/software/linux/x86_64/amd/zen4/software/EESSI-extend/2025.06-easybuild/easybuild/EESSI-extend-2025.06-easybuild-easybuild-devel
2025.06/software/linux/x86_64/amd/zen4/software/EESSI-extend/2025.06-easybuild/easybuild/EESSI-extend-2025.06-easybuild.eb
2025.06/software/linux/x86_64/amd/zen4/software/EESSI-extend/2025.06-easybuild/easybuild/reprod/
2025.06/software/linux/x86_64/amd/zen4/software/EESSI-extend/2025.06-easybuild/easybuild/reprod/easyblocks/
2025.06/software/linux/x86_64/amd/zen4/software/EESSI-extend/2025.06-easybuild/easybuild/reprod/easyblocks/bundle.py
2025.06/software/linux/x86_64/amd/zen4/software/EESSI-extend/2025.06-easybuild/easybuild/reprod/EESSI-extend-2025.06-easybuild.eb
2025.06/software/linux/x86_64/amd/zen4/software/EESSI-extend/2025.06-easybuild/easybuild/reprod/EESSI-extend-2025.06-easybuild.env
Apr 17 12:58:59 UTC 2026 | test result | 😁 SUCCESS (click triangle for details)
ReFrame Summary
[ OK ] (1/5) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/22Jul2025-foss-2024a-kokkos %scale=1_node /ade8cad7 @BotBuildTests:x86-64-zen4+default
P: perf: 1059.685 timesteps/s (r:0, l:None, u:None)
[ OK ] (2/5) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /e4bf9965 @BotBuildTests:x86-64-zen4+default
P: latency: 1.47 us (r:0, l:None, u:None)
[ OK ] (3/5) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /3da4890b @BotBuildTests:x86-64-zen4+default
P: latency: 3.41 us (r:0, l:None, u:None)
[ OK ] (4/5) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /3255009a @BotBuildTests:x86-64-zen4+default
P: latency: 0.15 us (r:0, l:None, u:None)
[ OK ] (5/5) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /59f4b331 @BotBuildTests:x86-64-zen4+default
P: bandwidth: 14282.18 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 5/5 test case(s) from 5 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-148715.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@ocaisa
Member

ocaisa commented Apr 17, 2026

ERROR: Failed to parse configuration options: Found 1 environment variable(s) that are prefixed with EASYBUILD but do not match valid option(s): EASYBUILD_AMDGCN_COMPUTE_CAPABILITIES
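EasyBuild derives the environment variable for a configuration option from its long option name: prefix `EASYBUILD_`, upper case, dashes replaced by underscores. The earlier failure message in this thread names the option `--amdgcn-capabilities`, so the rejected variable name probably carries an extra `COMPUTE_`. A small sketch of the naming rule (the helper name is hypothetical, and the actual fix made in the PR is not shown in this thread):

```python
def option_to_env_var(option):
    """Map an EasyBuild long option, e.g. '--amdgcn-capabilities', to its env var name."""
    return "EASYBUILD_" + option.lstrip("-").replace("-", "_").upper()


rejected = "EASYBUILD_AMDGCN_COMPUTE_CAPABILITIES"  # name from the error message
expected = option_to_env_var("--amdgcn-capabilities")

print(expected)
print(rejected == expected)
```

The CUDA counterpart, `--cuda-compute-capabilities`, does include `compute` in its name, which may explain where the rejected spelling came from.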

Comment thread EESSI-extend-easybuild.eb Outdated
@ocaisa
Member

ocaisa commented Apr 17, 2026

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws on:arch=zen4 for:arch=x86_64/amd/zen4,accel=amd/gfx90a

@eessi-bot-aws

eessi-bot-aws bot commented Apr 17, 2026

New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-software
Building on: amd-zen4
Building for: x86_64/amd/zen4 and accelerator amd/gfx90a
Job dir: /project/def-users/SHARED/jobs/2026.04/pr_206/148716

date | job status | comment
Apr 17 13:07:09 UTC 2026 | submitted | job id 148716 awaits release by job manager
Apr 17 13:08:03 UTC 2026 | released | job awaits launch by Slurm scheduler
Apr 17 13:09:06 UTC 2026 | running | job 148716 is running

@ocaisa
Member

ocaisa commented Apr 17, 2026

We will also need to add a check to the EasyBuild hooks for software that requires an AMD GPU:

def pre_fetch_hook_check_installation_path(self, *args, **kwargs):
    # When we know the CUDA status, we will need to verify the installation path
    # if we are doing an EESSI or host_injections installation
    accelerator_deps = ['CUDA']
    strict_eessi_installation = (
        bool(re.search(EESSI_INSTALLATION_REGEX, self.installdir)) or
        self.installdir.startswith(HOST_INJECTIONS_LOCATION))
    if strict_eessi_installation and not os.getenv("EESSI_OVERRIDE_STRICT_INSTALLPATH_CHECK"):
        dependency_names = self.cfg.dependency_names()
        if self.cfg.name in accelerator_deps or any(dep in dependency_names for dep in accelerator_deps):
            # Make sure the path is an accelerator location
            if "/accel/" not in self.installdir:
                raise EasyBuildError(
                    f"It seems you are trying to install an accelerator package {self.cfg.name} into a "
                    f"non-accelerator location {self.installdir}. You need to reconfigure your installation to target "
                    "the correct location."
                )
        else:
            # If we don't have an accelerator dependency then we should be in a CPU installation path
            if "/accel/" in self.installdir:
                raise EasyBuildError(
                    f"It seems you are trying to install a CPU-only package {self.cfg.name} into accelerator location "
                    f"{self.installdir}. If this is a dependency of the package you are really interested in you will "
                    "need to first install the CPU-only dependencies of that package."
                )

but this can be in a follow-up PR. It should be possible by taking the full list of dependencies (including toolchain deps, which I think you can get from self.cfg._toolchain.tcdeps) and then just adding ROCm-LLVM to accelerator_deps.
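A rough sketch of what that follow-up could look like, written as a standalone function so it runs without EasyBuild. The function name, the `InstallPathError` stand-in for `EasyBuildError`, and the `toolchain_dep_names` parameter are all hypothetical; how the toolchain dependency names would actually be read (for example from `self.cfg._toolchain.tcdeps`) is left open here, since the comment above only suggests it tentatively.

```python
# Assumption: ROCm-LLVM marks AMD GPU software, as suggested in the comment above.
ACCELERATOR_DEPS = ["CUDA", "ROCm-LLVM"]


class InstallPathError(Exception):
    """Stand-in for EasyBuildError so the sketch runs without EasyBuild."""


def check_accel_install_path(name, installdir, dependency_names, toolchain_dep_names=()):
    """Raise if an accelerator package targets a CPU path, or the other way round.

    Returns True for accelerator packages and False for CPU-only ones.
    """
    # Consider regular dependencies and toolchain dependencies together.
    all_deps = set(dependency_names) | set(toolchain_dep_names)
    needs_accel = name in ACCELERATOR_DEPS or any(dep in all_deps for dep in ACCELERATOR_DEPS)
    in_accel_path = "/accel/" in installdir

    if needs_accel and not in_accel_path:
        raise InstallPathError(
            f"Accelerator package {name} targets non-accelerator location {installdir}."
        )
    if not needs_accel and in_accel_path:
        raise InstallPathError(
            f"CPU-only package {name} targets accelerator location {installdir}."
        )
    return needs_accel
```

In the real hook, this logic would sit behind the same `strict_eessi_installation` guard as the existing CUDA check.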
