feat: Intel GPU Max (Ponte Vecchio) OpenMP target offload support #1445
sbryngelson wants to merge 7 commits into
Add end-to-end support for building and running MFC on Intel Data Center GPU Max (Ponte Vecchio) using ifx 2025.0+ with OpenMP target offload to SPIR-V/SPIR64. Verified on GT CRNCH RoboGator (dash4) with Intel GPU Max 1100. All 161 1D regression tests pass.

## Compiler and build system

- Recognize IntelLLVM compiler ID throughout CMakeLists.txt (was Intel)
- Add -fiopenmp -fopenmp-targets=spir64 compile/link flags for GPU builds
- Add -fp-model=precise to prevent ifx FP reassociation in SPIR-V kernels
- Add -fpp to global compile flags for Intel preprocessor compatibility
- Link MKL parallel, libmkl_sycl_dft, libsycl, libOpenCL for oneMKL FFT
- Strip SPIR-V from mkl_dfti_omp_offload.o via clang-offload-bundler to fix zeModuleDynamicLink Level Zero failures
- Add --intel-aot flag: AOT compilation via ocloc to native PVC ISA, eliminates ~30 min Level Zero JIT delay (test runs: 30 min -> 14 sec)
- Add IntelLLVM to no-FFTW-from-source list in dependencies/CMakeLists.txt
- Fix LAPACK PIE link error with ifx on Ubuntu 22.04

## GPU kernel fixes

- omp_macros.fpp: add Intel-specific OMP_PARALLEL_LOOP, END_OMP_PARALLEL_LOOP, OMP_ROUTINE, OMP_MKL_DISPATCH branches for SPIR-V codegen
- parallel_macros.fpp: add GPU_MKL_DISPATCH() macro for oneMKL dispatch
- shared_parallel_macros.fpp: add USING_INTEL Fypp variable; extend all #:if not MFC_CASE_OPTIMIZATION and USING_AMD guards to include USING_INTEL and bare #:if USING_AMD guards for dimension(sys_size) in m_cbc/m_compute_cbc
- m_fftw.fpp: oneMKL DFTI + !$omp dispatch GPU FFT path for Intel
- m_compute_levelset.fpp: split single if-else dispatch to fix multi-callee phi-node issue and inliner ICE; add -fno-inline workaround
- m_riemann_solvers.fpp, m_variables_conversion.fpp, m_bubbles_EE.fpp, m_weno.fpp, m_sim_helpers.fpp, m_pressure_relaxation.fpp, m_boundary_common, m_chemistry.fpp, m_phase_change.fpp, m_bubbles_EL.fpp, m_viscous.fpp, m_ibm.fpp, m_hyperelastic.fpp, m_acoustic_src.fpp, m_surface_tension.fpp, m_data_output.fpp, m_qbmm.fpp, m_compute_cbc.fpp, m_cbc.fpp, m_ib_patches.fpp: explicit array sizes in GPU_ROUTINE arguments (no assumed-shape in SPIR-V) and extend VLA guards to USING_INTEL for non-case-optimized GPU builds
- m_helper.fpp: Intel-specific workarounds for SPIR-V codegen

## Toolchain

- Add GT CRNCH RoboGator (crnch) module entry with Intel oneAPI 2025.1
- run.py: Intel GPU detection, set LIBOMPTARGET_LEVEL_ZERO_COMMAND_BATCH=256 and SYCL_PI_LEVEL_ZERO_TRACK_INDIRECT_ACCESS_MEMORY=0 for ~16% speedup
- run/input.py: post-process pyrometheus m_thermochem.f90 for --gpu mp (replace C-macro GPU_ROUTINE with literal !$omp declare target)
- build.py, state.py: --intel-aot flag and ocloc device selection
- test.py: --binary mpirun support to bypass SLURM srun slot limits on CRNCH
- bootstrap/modules.sh: crnch module bootstrap
- templates/include/helpers.mako: Intel MPI I_MPI_FABRICS=shm hint
- modules: crnch entry (Intel oneAPI 2025.1, mpiifx, GPU Max 1100)

## Documentation

- docs/documentation/intel-gpu-max.md: full build, run, troubleshoot guide
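
The `run.py` item in the Toolchain notes above sets two Level Zero tuning variables when an Intel GPU is detected. A minimal sketch of that idea follows. The variable names and values come from this PR; the helper `apply_intel_gpu_env`, its `intel_gpu_detected` argument, and the choice to keep any value the user already exported are assumptions for illustration, not the code in this PR.

```python
import os

# Names and values as listed in the PR description.
INTEL_GPU_ENV_DEFAULTS = {
    "LIBOMPTARGET_LEVEL_ZERO_COMMAND_BATCH": "256",
    "SYCL_PI_LEVEL_ZERO_TRACK_INDIRECT_ACCESS_MEMORY": "0",
}


def apply_intel_gpu_env(env, intel_gpu_detected):
    """Return a copy of env with the Intel GPU defaults filled in.

    Hypothetical helper: values already present in env are kept, so a
    user override wins over the defaults (an assumption of this sketch).
    """
    result = dict(env)
    if intel_gpu_detected:
        for name, value in INTEL_GPU_ENV_DEFAULTS.items():
            result.setdefault(name, value)
    return result


if __name__ == "__main__":
    # Detection is hard-coded here; the real launcher would probe the system.
    tuned = apply_intel_gpu_env(os.environ, intel_gpu_detected=True)
    for name in INTEL_GPU_ENV_DEFAULTS:
        print(f"{name}={tuned[name]}")
```

Returning a new mapping instead of mutating `os.environ` keeps the sketch easy to test and lets the caller pass the result straight to a subprocess launch.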
Claude Code Review
Head SHA: 6b1d0de
Files changed:
Findings: Banned integer kind literals in
…n m_qbmm and m_hyperelastic
…bric removed in 2021.x)
…nd OFI provider requirements
… permission issue
## Summary

Adds end-to-end support for building and running MFC on Intel Data Center GPU Max 1100 (Ponte Vecchio) using `ifx 2025.0+` with OpenMP target offload to SPIR-V/SPIR64. Verified on GT CRNCH RoboGator (`dash4`). All 161 1D regression tests pass on the Intel GPU.

## Usage

## Changes

### Build system (`CMakeLists.txt`, `toolchain/`)

- Recognize `IntelLLVM` compiler ID throughout (was `Intel`)
- Add `-fiopenmp -fopenmp-targets=spir64` compile/link flags for GPU builds
- Add `-fp-model=precise` to prevent ifx FP reassociation in SPIR-V kernels
- Add `--intel-aot` flag: AOT compilation via `ocloc` to native PVC ISA, eliminates ~30 min Level Zero JIT delay (test runs: 30 min → 14 sec)
- Strip SPIR-V from `mkl_dfti_omp_offload.o` via `clang-offload-bundler` to fix `zeModuleDynamicLink` Level Zero failures
- Link MKL parallel, `libmkl_sycl_dft`, `libsycl`, `libOpenCL` for oneMKL FFT
- Add GT CRNCH RoboGator (`crnch`) module entry with Intel oneAPI 2025.1
- `run.py`: auto-set `LIBOMPTARGET_LEVEL_ZERO_COMMAND_BATCH=256` and `SYCL_PI_LEVEL_ZERO_TRACK_INDIRECT_ACCESS_MEMORY=0` (~16% throughput gain)
- Post-process pyrometheus `m_thermochem.f90` for `--gpu mp`: replace C-macro `GPU_ROUTINE` with literal `!$omp declare target`
- `test.py`: `--binary mpirun` support to bypass SLURM `srun` slot limits on CRNCH

### GPU macro layer (`src/common/include/`)

- `omp_macros.fpp`: Intel-specific `OMP_PARALLEL_LOOP`, `OMP_ROUTINE`, `OMP_MKL_DISPATCH` branches for SPIR-V codegen
- `parallel_macros.fpp`: `GPU_MKL_DISPATCH()` macro for oneMKL dispatch
- `shared_parallel_macros.fpp`: add `USING_INTEL` Fypp variable; extend all `#:if not MFC_CASE_OPTIMIZATION and USING_AMD` guards to `(USING_AMD or USING_INTEL)`, and bare `#:if USING_AMD` guards for `dimension(sys_size)` in CBC modules

### Source fixes (Intel SPIR-V constraints)

- Explicit array sizes (`num_fluids_max`, `dim(3)`, etc.) across 20 files
- Extend `USING_AMD` VLA guards to `USING_INTEL` in `m_riemann_solvers`, `m_variables_conversion`, `m_bubbles_EE`, `m_weno`, `m_cbc`, `m_compute_cbc`, and 13 other files
- `m_fftw.fpp`: oneMKL DFTI + `!$omp dispatch` GPU FFT path for Intel
- `m_compute_levelset.fpp`: split single if-else dispatch to fix multi-callee phi-node issue and `ifx` inliner ICE

### Documentation

- `docs/documentation/intel-gpu-max.md`: full build, run, and troubleshooting guide for Intel GPU Max

## Test plan
dash4)
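
The `m_thermochem.f90` post-processing listed among the build-system changes replaces a C-preprocessor `GPU_ROUTINE` macro with a literal `!$omp declare target` directive. A rough sketch of such a rewrite is shown here. The function name `replace_gpu_routine` and the assumption that the macro sits alone on its own line (optionally with arguments) are illustrative guesses; the generated file's real layout and the logic in `run/input.py` may differ.

```python
import re

# Assumption: the macro occupies a whole line, e.g. "    GPU_ROUTINE(name)".
# Leading indentation is captured so the directive keeps the same column.
_MACRO_LINE = re.compile(r"^([ \t]*)GPU_ROUTINE\b.*$", re.MULTILINE)


def replace_gpu_routine(source: str) -> str:
    """Swap every GPU_ROUTINE macro line for a literal OpenMP directive."""
    # A function is used as the replacement so no character in the
    # directive text is interpreted as a regex escape.
    return _MACRO_LINE.sub(lambda m: m.group(1) + "!$omp declare target", source)


if __name__ == "__main__":
    sample = (
        "subroutine get_density(rho)\n"
        "    GPU_ROUTINE(get_density)\n"
        "    real, intent(out) :: rho\n"
        "end subroutine get_density\n"
    )
    print(replace_gpu_routine(sample))
```

Operating on the whole file as one string with `re.MULTILINE` keeps line count and indentation unchanged, which makes the rewritten file easy to diff against the generated original.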