Reduce the memory usage that is important for ne1024 simulation by sjsprecious · Pull Request #4102 · ESCOMP/CTSM

sjsprecious · 2026-06-25T21:24:45Z

This PR introduces some changes in CDEPS that will be used in CTSM later and are critical for reducing the memory usage of a simulation at ne1024 resolution. All the changes were made by Claude under my supervision.

This PR requires a new tag from CDEPS once my PR (ESCOMP/CDEPS#414) is merged.

The goal is to cut CTSM initialization memory (and some init time) at high resolution (like ne1024), where per-rank data replication and duplicate ESMF mesh construction dominate startup cost.

The detailed edits:

New per-node shared-memory helper: clm_shmem_mod.F90

An MPI-3 shared-memory module specialized for CTSM's decomposition setup. The idea is that arrays that are otherwise allocated identically on every MPI rank instead get one physical copy per shared-memory node, mapped into every rank on that node, which frees ranks_per_node − 1 copies per node.

clm_shmem_alloc_i4_1d(ptr, win, n) — allocate a node-shared default-integer rank-1 array (only the node leader requests storage via MPI_Win_allocate_shared; peers map the leader's segment via MPI_Win_shared_query).
clm_shmem_leader_allreduce_sum_i4(ptr, win, n) — fence → node leaders sum partials across nodes over a leader-only communicator → fence to publish. Builds a globally-summed array in the shared buffer without every rank holding a global-sized copy.
clm_shmem_free / clm_shmem_fence / clm_shmem_is_leader / clm_shmem_leader_comm / clm_shmem_npes_per_node — lifecycle and query helpers; lazily build node-local and node-leader communicators via mpi_comm_split_type(MPI_COMM_TYPE_SHARED).
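
The allocate, fill, and leader-only all-reduce sequence described above can be modeled without MPI. The following is a minimal pure-Python sketch, not the Fortran implementation: each node is represented by one shared list, ranks are loop indices, and the names `leader_allreduce_sum` and `build_global_mask` and the round-robin index assignment are assumptions made for illustration.

```python
# Pure-Python model of a node-shared buffer plus a leader-only all-reduce.
# No MPI is used: one list stands in for each node's shared segment.

def leader_allreduce_sum(node_buffers):
    """Sum the per-node buffers elementwise and publish the total to every node."""
    n = len(node_buffers[0])
    total = [sum(buf[i] for buf in node_buffers) for i in range(n)]
    for buf in node_buffers:
        buf[:] = total  # in place, so every rank mapping this buffer sees the result


def build_global_mask(gsize, nodes, ranks_per_node, local_mask):
    """Build a global mask with one buffer per node instead of one per rank.

    local_mask(g) returns 0 or 1 for global index g. Indices are dealt
    round-robin to ranks (a hypothetical decomposition, for illustration).
    """
    nranks = nodes * ranks_per_node
    # One zeroed buffer per node (the leader's job in the real module).
    node_buffers = [[0] * gsize for _ in range(nodes)]
    for rank in range(nranks):
        shared = node_buffers[rank // ranks_per_node]  # all ranks on a node share one list
        for g in range(rank, gsize, nranks):           # disjoint indices per rank
            shared[g] = local_mask(g)
    leader_allreduce_sum(node_buffers)
    return node_buffers


buffers = build_global_mask(gsize=12, nodes=2, ranks_per_node=3,
                            local_mask=lambda g: g % 2)
print(len(buffers), buffers[0])
```

Because each global index is written by exactly one rank, summing the per-node partial buffers yields the full mask, and only `nodes` buffers exist rather than `nodes * ranks_per_node`.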

lnd_set_decomp_and_domain.F90 — apply the shmem helper to the global land mask

The global land mask lndmask_glob(gsize) was previously allocated on every rank and built with an all-rank ESMF_VMAllReduce into a second global-sized temporary (itemp_glob). Now, in both code paths (lnd_set_lndmask_from_maskmesh and lnd_set_lndmask_from_lndmesh):

lndmask_glob is allocated once per node via clm_shmem_alloc_i4_1d, with a new lndmask_win window handle threaded through both subroutine signatures.
The node leader zeroes it, a fence follows, and each rank fills its disjoint local indices; clm_shmem_leader_allreduce_sum_i4 then replaces the ESMF_VMAllReduce + itemp_glob temporary (the temporary is deleted entirely).
Cleanup is now branch-aware: the cmeps driver paths free via clm_shmem_free(lndmask_glob, lndmask_win); the lilac path still uses plain deallocate (it uses a plain allocate).

This removes two global-sized integer arrays per rank (the mask copy + the all-reduce temp), replaced by one node-shared copy.
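
To give a sense of scale, the per-node saving can be estimated with simple arithmetic. This is a rough sketch under stated assumptions, not a measurement: the helper `mask_bytes_per_node` is hypothetical, the global size uses the np4 spectral-element column count, and 128 ranks per node is an assumed layout.

```python
def mask_bytes_per_node(gsize, ranks_per_node, bytes_per_int=4):
    """Return (before, after) bytes per node spent on global-sized mask arrays."""
    one_array = gsize * bytes_per_int
    # Before: mask copy + all-reduce temporary, on every rank of the node.
    before = 2 * one_array * ranks_per_node
    # After: a single node-shared copy.
    after = one_array
    return before, after


# Assumed values, for illustration only.
gsize = 6 * 1024**2 * 9 + 2   # column count if the grid is ne1024np4 (assumption)
ranks_per_node = 128          # assumed number of MPI ranks per node

before, after = mask_bytes_per_node(gsize, ranks_per_node)
print(f"before: {before / 2**30:.1f} GiB per node")
print(f"after:  {after / 2**20:.1f} MiB per node")
```

Under these assumed values the sketch reports about 54 GiB per node before and about 216 MiB after; the real figures depend on the actual grid size and task layout.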

NetCDF file-handle close fixes

Close PIO file handles that were previously closed late or never, which frees buffers earlier in init:

clm_instMod.F90: moves ncd_pio_closefile(params_ncid) earlier — to right after its last use (bgc_vegetation_inst%Init) instead of at the end of init_accflds.
initVerticalMod.F90: moves ncd_pio_closefile(ncid) to right after the last read (STD_ELEV) instead of the end of initVertical.
UrbanParamsType.F90: adds a missing ncd_pio_closefile(ncid) on the early-return path (nlevurb == 0) that previously leaked the handle.
organicFileMod.F90: adds ncd_pio_closefile(ncid) after reading ORGANIC.
surfrdMod.F90: adds two ncd_pio_closefile(ncid) calls after dimension reads complete (after the pft/cft dims, and after nlevurb).
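
The early-return leak fixed in UrbanParamsType.F90 follows a common pattern, shown here as a small Python analogy rather than the Fortran code. `FakeFile` and both reader functions are hypothetical stand-ins; only the shape of the control flow is taken from the description above.

```python
class FakeFile:
    """Stand-in for a PIO file handle; counts how many handles are open."""
    open_count = 0

    def __init__(self):
        FakeFile.open_count += 1

    def close(self):
        FakeFile.open_count -= 1


def read_urban_leaky(nlevurb):
    ncid = FakeFile()
    if nlevurb == 0:
        return None          # early return: the handle is never closed
    data = "urban parameters"
    ncid.close()
    return data


def read_urban_fixed(nlevurb):
    ncid = FakeFile()
    if nlevurb == 0:
        ncid.close()         # close on the early-return path as well
        return None
    data = "urban parameters"
    ncid.close()             # close right after the last read
    return data


FakeFile.open_count = 0
read_urban_leaky(0)
open_after_leaky = FakeFile.open_count

FakeFile.open_count = 0
read_urban_fixed(0)
open_after_fixed = FakeFile.open_count

print(open_after_leaky, open_after_fixed)
```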

Reuse already-built CLM mesh

PrigentRoughnessStreamType.F90 and UrbanTimeVarType.F90 each wrap their single shr_strdata_init_from_inline call in an if (mapalgo == 'redist') branch:

redist branch (stream is already on the model grid, as for the ne1024 Prigent-roughness and urban-time-varying files): passes the new argument stream_mesh_in = mesh, handing CDEPS the already-built CLM model mesh so it does not read the stream mesh file and construct a duplicate full ESMF mesh — the duplicate is a large init memory/time cost at ne1024.
else branch: the original call unchanged (CDEPS builds the stream mesh as before).
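
The branch logic can be illustrated with a short Python model. This is only a sketch of the control flow: `init_stream` and `build_stream_mesh` are hypothetical names, and the actual change passes stream_mesh_in to shr_strdata_init_from_inline in Fortran.

```python
def init_stream(mapalgo, model_mesh, build_stream_mesh):
    """Return the mesh a stream should use, building a new one only when needed."""
    if mapalgo == "redist":
        # Stream is already on the model grid: reuse the existing model mesh.
        return model_mesh
    # Any other mapping algorithm: build the stream mesh as before.
    return build_stream_mesh()


build_calls = 0


def build_stream_mesh():
    """Hypothetical stand-in for reading a mesh file and constructing a mesh."""
    global build_calls
    build_calls += 1
    return {"name": "stream_mesh"}


model_mesh = {"name": "model_mesh"}
reused = init_stream("redist", model_mesh, build_stream_mesh)
rebuilt = init_stream("bilinear", model_mesh, build_stream_mesh)
print(reused["name"], rebuilt["name"], build_calls)
```

In the redist case no second mesh object is created, which is the saving the PR targets.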

samsrabin · 2026-06-26T15:52:32Z

Thanks for this, @sjsprecious! A couple of questions:

Do you have a date you need this in by?
Do you expect this to give bit-for-bit identical results to the previous version?

@ekluzek, I'm assigning you for now given your recent work on our task decomposition, but I'm also adding Next so we can discuss in our SE meeting.

sjsprecious · 2026-06-26T17:25:29Z

Thanks @samsrabin for your quick reply. To answer your questions:

Do you have a date you need this in by?

We are waiting for a new tag for these CTSM changes so that our collaborators can start their scientific runs soon. Thus I would say no hard date, but the sooner, the better.

Do you expect this to give bit-for-bit identical results to the previous version?

Yes, these changes should not change the answers for CTSM. I am happy to do some tests on Derecho if you can share the detailed instructions.

Let me know if you or Erik have any comments or suggestions about these code changes.

sjsprecious added 3 commits June 11, 2026 14:03

reduce memory usage in CTSM init

358d631

reuse already-built CLM mesh

134337d

temporary hold for a new cdeps tag

d3078f4

samsrabin added the labels blocked: dependency (wait to work on this until dependency is resolved), next (this should get some attention in the next week or two; normally each Thursday SE meeting), and performance (idea or PR to improve performance, e.g. throughput, memory) on Jun 26, 2026

samsrabin requested review from ekluzek and removed request for ekluzek June 26, 2026 15:52

samsrabin assigned ekluzek Jun 26, 2026

sjsprecious wants to merge 3 commits into ESCOMP:master from sjsprecious:reduce_init_memory
