feat: Intel GPU Max (Ponte Vecchio) OpenMP target offload support by sbryngelson · Pull Request #1445 · MFlowCode/MFC

sbryngelson · 2026-05-18T14:00:57Z

Summary

Adds end-to-end support for building and running MFC on Intel Data Center GPU Max 1100 (Ponte Vecchio) using ifx 2025.0+ with OpenMP target offload to SPIR-V/SPIR64. Verified on GT CRNCH RoboGator (dash4). All 161 1D regression tests pass on the Intel GPU.

Usage

source ./mfc.sh load -c crnch -m g       # load Intel oneAPI 2025.1 modules
./mfc.sh build --gpu mp --intel-aot -j 8 # AOT compile to native PVC ISA
./mfc.sh test --gpu mp --intel-aot -- --binary mpirun

Changes

Build system (`CMakeLists.txt`, `toolchain/`)

Recognize IntelLLVM compiler ID throughout (was Intel)
Add -fiopenmp -fopenmp-targets=spir64 compile/link flags for GPU builds
Add -fp-model=precise to prevent ifx FP reassociation in SPIR-V kernels
Add --intel-aot flag: AOT compilation via ocloc to native PVC ISA, eliminates ~30 min Level Zero JIT delay (test runs: 30 min → 14 sec)
Strip SPIR-V from mkl_dfti_omp_offload.o via clang-offload-bundler to fix zeModuleDynamicLink Level Zero failures
Link libmkl_sycl_dft, libsycl, libOpenCL for oneMKL FFT
Add GT CRNCH RoboGator (crnch) module entry with Intel oneAPI 2025.1
run.py: auto-set LIBOMPTARGET_LEVEL_ZERO_COMMAND_BATCH=256 and SYCL_PI_LEVEL_ZERO_TRACK_INDIRECT_ACCESS_MEMORY=0 (~16% throughput gain)
Post-process pyrometheus m_thermochem.f90 for --gpu mp: replace C-macro GPU_ROUTINE with literal !$omp declare target
test.py: --binary mpirun support to bypass SLURM srun slot limits on CRNCH
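
As a rough illustration of the `run.py` item above, the following sketch fills in the two Level Zero tuning variables before a job is launched. Only the variable names and values come from this PR description. The function name `intel_gpu_env`, the dictionary name, and the choice to let an explicit user setting win over the default are assumptions made for this example and may not match what `run.py` does.

```python
import os

# Values quoted in the PR description. The names INTEL_GPU_ENV_DEFAULTS and
# intel_gpu_env are hypothetical and exist only for this sketch.
INTEL_GPU_ENV_DEFAULTS = {
    "LIBOMPTARGET_LEVEL_ZERO_COMMAND_BATCH": "256",
    "SYCL_PI_LEVEL_ZERO_TRACK_INDIRECT_ACCESS_MEMORY": "0",
}


def intel_gpu_env(base_env=None):
    """Return a copy of base_env with the Intel GPU defaults filled in."""
    env = dict(os.environ if base_env is None else base_env)
    for name, value in INTEL_GPU_ENV_DEFAULTS.items():
        # Assumption: a value the user already exported takes precedence.
        env.setdefault(name, value)
    return env
```

The returned dictionary could then be passed as the `env` argument of `subprocess.run` when starting the simulation binary.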

GPU macro layer (`src/common/include/`)

omp_macros.fpp: Intel-specific OMP_PARALLEL_LOOP, OMP_ROUTINE, OMP_MKL_DISPATCH branches for SPIR-V codegen
parallel_macros.fpp: GPU_MKL_DISPATCH() macro for oneMKL dispatch
shared_parallel_macros.fpp: add USING_INTEL Fypp variable; extend every #:if not MFC_CASE_OPTIMIZATION and USING_AMD guard to (USING_AMD or USING_INTEL), and do the same for the bare #:if USING_AMD guards around dimension(sys_size) in the CBC modules

Source fixes (Intel SPIR-V constraints)

Assumed-shape arrays in GPU routines: Intel SPIR-V cannot propagate array descriptors in device subroutines — replaced with explicit-shape (num_fluids_max, dim(3), etc.) across 20 files
VLA private arrays in GPU loops: Intel SPIR-V needs fixed stack frame size at compile time — extended USING_AMD VLA guards to USING_INTEL in m_riemann_solvers, m_variables_conversion, m_bubbles_EE, m_weno, m_cbc, m_compute_cbc, and 13 other files
m_fftw.fpp: oneMKL DFTI + !$omp dispatch GPU FFT path for Intel
m_compute_levelset.fpp: split single if-else dispatch to fix multi-callee phi-node issue and ifx inliner ICE

Documentation

docs/documentation/intel-gpu-max.md: full build, run, and troubleshooting guide for Intel GPU Max

Test plan

All 161 1D tests pass on Intel GPU Max 1100 (verified locally on CRNCH dash4)
CI passes on existing gfortran / nvfortran / Cray ftn / ifx CPU targets
No regression on AMD GPU (USING_AMD guards preserved; USING_INTEL is orthogonal)

Add end-to-end support for building and running MFC on Intel Data Center GPU Max (Ponte Vecchio) using ifx 2025.0+ with OpenMP target offload to SPIR-V/SPIR64. Verified on GT CRNCH RoboGator (dash4) with Intel GPU Max 1100. All 161 1D regression tests pass.

## Compiler and build system

- Recognize IntelLLVM compiler ID throughout CMakeLists.txt (was Intel)
- Add -fiopenmp -fopenmp-targets=spir64 compile/link flags for GPU builds
- Add -fp-model=precise to prevent ifx FP reassociation in SPIR-V kernels
- Add -fpp to global compile flags for Intel preprocessor compatibility
- Link MKL parallel, libmkl_sycl_dft, libsycl, libOpenCL for oneMKL FFT
- Strip SPIR-V from mkl_dfti_omp_offload.o via clang-offload-bundler to fix zeModuleDynamicLink Level Zero failures
- Add --intel-aot flag: AOT compilation via ocloc to native PVC ISA, eliminates ~30 min Level Zero JIT delay (test runs: 30 min -> 14 sec)
- Add IntelLLVM to no-FFTW-from-source list in dependencies/CMakeLists.txt
- Fix LAPACK PIE link error with ifx on Ubuntu 22.04

## GPU kernel fixes

- omp_macros.fpp: add Intel-specific OMP_PARALLEL_LOOP, END_OMP_PARALLEL_LOOP, OMP_ROUTINE, OMP_MKL_DISPATCH branches for SPIR-V codegen
- parallel_macros.fpp: add GPU_MKL_DISPATCH() macro for oneMKL dispatch
- shared_parallel_macros.fpp: add USING_INTEL Fypp variable; extend all #:if not MFC_CASE_OPTIMIZATION and USING_AMD guards to include USING_INTEL and bare #:if USING_AMD guards for dimension(sys_size) in m_cbc/m_compute_cbc
- m_fftw.fpp: oneMKL DFTI + !$omp dispatch GPU FFT path for Intel
- m_compute_levelset.fpp: split single if-else dispatch to fix multi-callee phi-node issue and inliner ICE; add -fno-inline workaround
- m_riemann_solvers.fpp, m_variables_conversion.fpp, m_bubbles_EE.fpp, m_weno.fpp, m_sim_helpers.fpp, m_pressure_relaxation.fpp, m_boundary_common, m_chemistry.fpp, m_phase_change.fpp, m_bubbles_EL.fpp, m_viscous.fpp, m_ibm.fpp, m_hyperelastic.fpp, m_acoustic_src.fpp, m_surface_tension.fpp, m_data_output.fpp, m_qbmm.fpp, m_compute_cbc.fpp, m_cbc.fpp, m_ib_patches.fpp: explicit array sizes in GPU_ROUTINE arguments (no assumed-shape in SPIR-V) and extend VLA guards to USING_INTEL for non-case-optimized GPU builds
- m_helper.fpp: Intel-specific workarounds for SPIR-V codegen

## Toolchain

- Add GT CRNCH RoboGator (crnch) module entry with Intel oneAPI 2025.1
- run.py: Intel GPU detection, set LIBOMPTARGET_LEVEL_ZERO_COMMAND_BATCH=256 and SYCL_PI_LEVEL_ZERO_TRACK_INDIRECT_ACCESS_MEMORY=0 for ~16% speedup
- run/input.py: post-process pyrometheus m_thermochem.f90 for --gpu mp (replace C-macro GPU_ROUTINE with literal !$omp declare target)
- build.py, state.py: --intel-aot flag and ocloc device selection
- test.py: --binary mpirun support to bypass SLURM srun slot limits on CRNCH
- bootstrap/modules.sh: crnch module bootstrap
- templates/include/helpers.mako: Intel MPI I_MPI_FABRICS=shm hint
- modules: crnch entry (Intel oneAPI 2025.1, mpiifx, GPU Max 1100)

## Documentation

- docs/documentation/intel-gpu-max.md: full build, run, troubleshoot guide

github-actions · 2026-05-18T14:06:42Z

Claude Code Review

Head SHA: 6b1d0de

Files changed: 39
CMakeLists.txt
src/common/include/omp_macros.fpp
src/common/include/parallel_macros.fpp
src/common/m_mpi_common.fpp
src/simulation/m_fftw.fpp
src/simulation/m_compute_levelset.fpp
src/simulation/m_ib_patches.fpp
src/simulation/m_pressure_relaxation.fpp
toolchain/mfc/run/input.py
toolchain/mfc/run/run.py

Findings:

Banned integer kind literals in `src/simulation/m_fftw.fpp`

In the new Intel GPU path of s_apply_azimuthal_filter, two integer-kind literal forms appear that are banned by fortran-conventions.md ("Bare integer kind like 2_wp → use 2.0_wp"):

(0_dp, 0_dp) — used to zero data_fltr_cmplx_gpu entries (appears in both the y==0 ring and in the fourier_rings loop body):

data_fltr_cmplx_gpu(...) = (0_dp, 0_dp)

0_dp is an integer literal of kind dp (= 8), not a real literal. Should be (0._dp, 0._dp).

2_dp — used in the Nyquist frequency computation inside the fourier_rings loop:

Nfq = min(floor(2_dp*real(i, dp)*pi), cmplx_size)

2_dp is an integer literal of kind dp. Should be 2._dp or plain 2.

Both appear inside the #if defined(MFC_GPU) && defined(__INTEL_LLVM_COMPILER) guard blocks. The source linter (toolchain/mfc/lint_source.py, run by ./mfc.sh precheck) enforces the "no bare integer kind" rule and would flag these.
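
To make the rule concrete, here is a minimal sketch of how a check for bare integer kind literals could be written. It is not the code in `toolchain/mfc/lint_source.py`, which is not shown in this review. The function name `find_bare_integer_kinds`, the regular expression, and the restriction to the kind names `wp` and `dp` are assumptions made for this example.

```python
import re

# Matches digits followed directly by _wp or _dp, e.g. 2_dp.
# The lookbehind rejects a preceding '.', letter, digit, or underscore,
# so real literals such as 2.0_wp and identifiers such as x2_dp are skipped.
BARE_INT_KIND = re.compile(r"(?<![\w.])\d+_(?:wp|dp)\b")


def find_bare_integer_kinds(line):
    """Return every bare integer kind literal found on one Fortran line."""
    # Naive comment strip: everything after the first '!' is ignored,
    # which also drops any '!' that sits inside a string literal.
    code = line.split("!", 1)[0]
    return BARE_INT_KIND.findall(code)
```

Applied to the two statements quoted above, this sketch reports `0_dp` twice and `2_dp` once, and it reports nothing for the corrected forms `(0._dp, 0._dp)` and `2._dp`.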

sbryngelson added 6 commits May 18, 2026 10:15

fix: replace integer kind literals with real literals in m_fftw.fpp

8d2c6b1

fix: extend VLA guards from USING_AMD to (USING_AMD or USING_INTEL) in m_qbmm and m_hyperelastic

e5728fb

fix: use shm:ofi + FI_PROVIDER=tcp for Intel MPI on crnch-gpu (tcp fabric removed in 2021.x)

0534d69

docs: document Intel MPI multi-node SSH bootstrap workaround for missing bundled ssh

8863022

fix: add FI_PROVIDER_PATH to crnch-gpu modules; document SLURM GRES and OFI provider requirements

faa9bbb

docs: document inter-node MPI fix (FI_TCP_IFACE) and dash3 renderD128 permission issue

c296e30

sbryngelson wants to merge 7 commits into master from intel-gpu
