Skip to content

Reduce the memory usage that is important for ne1024 simulation#665

Open
sjsprecious wants to merge 4 commits into
ESCOMP:mainfrom
sjsprecious:add_lnd2rof_map_files
Open

Reduce the memory usage that is important for ne1024 simulation#665
sjsprecious wants to merge 4 commits into
ESCOMP:mainfrom
sjsprecious:add_lnd2rof_map_files

Conversation

@sjsprecious

Copy link
Copy Markdown
Contributor

This PR reduces the memory usage that is critical for ultra-high resolution simulation such as ne1024. All the code changes are done by Claude under my supervisory.

The goal is to let the lnd→rof conservative coupling maps be read from offline weight files instead of being computed online, which OOM-kills the job at ne1024 during DataInitialize (ESMF_FieldRegridStore → GeomRendezvous → Zoltan_RCB). This is the implementation of the lnd2rof_consf OOM fix. There are two distinct lnd→rof maps handled, plus an aoflux memory optimization.

  1. Flux/runoff coupling map — bug fix

In main, the lnd2rof_map attribute (driven by the pre-existing LND2ROF_FMAPNAME XML var) was already read into the lnd2rof_map variable — but the five addmap_from calls for the runoff fields (Flrl_rofsur, Flrl_rofi, Flrl_rofgwl, Flrl_rofsub, Flrl_irrig) hardcoded unset, so the file was silently ignored and the map was always built online.

  1. Fraction-init map — new feature

This is a separate map (destarea, not fracarea) used during med_fraction_init, with no pre-existing namelist hook. The fix adds full new plumbing:

  • New XML entry LND2ROF_FRAC_FMAPNAME (default unset, env_run.xml, group run_domain) in config_component.xml.
  • New driver namelist lnd2rof_fmap (modify_via_xml="LND2ROF_FRAC_FMAPNAME") in namelist_definition_drv.xml.
  • med_fraction_mod.F90: reads the lnd2rof_fmap attribute via NUOPC_CompAttributeGet; if it is present, set, and the file exists on disk (inquire), it calls med_map_routehandles_init with mapfile= to read offline weights; otherwise it falls back to the original online path — no behavior change when unset. Adds a use NUOPC import and local vars (isPresent/isSet/lexist, lnd2rof_fmap).
  • med_map_mod.F90: threads a new optional, intent(in) :: mapfile argument through med_map_routehandles_initfrom_fieldbundle, forwarding it to the field-level routine (which already accepted mapfile) when present. Backward-compatible.
  1. aoflux mesh memory optimization

Replaces is_local%wrap%aoflux_mesh = ESMF_MeshCreate(lmesh, rc=rc) with is_local%wrap%aoflux_mesh = lmesh — reuses the ocean field's existing mesh handle instead of allocating a duplicate full ESMF mesh per rank at ne1024pg2. Safe because lmesh persists in FBArea(compocn) and aoflux_mesh is never destroyed.

@billsacks billsacks requested review from billsacks and mvertens June 26, 2026 00:05
@billsacks

Copy link
Copy Markdown
Member

@mvertens - If you have a chance, I'd like to hear any thoughts you have on this change. (I have just skimmed it so far - haven't had a chance to look closely yet.)

@billsacks billsacks left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall changes here look good to me. I appreciate your plugging in the use of lnd2rof_map, which seems like it previously wasn't hooked up. And I think I see why you needed to introduce a separate lnd2rof_fmap (though see my comments asking for this to be made more explicit in some documentation).

I do have a few requests... many about editing some comments, but a couple slightly more substantial... but still overall minor - for the most part this looks good to me.

Once you make the final changes, I'd like to see that at least one or two tests have been run with baseline comparisons to verify that these changes work and are bit-for-bit in out-of-the-box configurations. One test that I think would cover all of these changes would be a B compset test with the aoflux grid set to ogrid (the setting of ogrid is needed to cover the changes in med_phases_aofluxes_mod, and should still cover the other changes here). I'd like to see that run with comparisons against a baseline.

Comment thread mediator/med_phases_aofluxes_mod.F90 Outdated
Comment on lines +585 to +587
! Reuse the ocean field's mesh directly rather than building a duplicate full
! ESMF mesh (ne1024pg2) per rank. lmesh persists in FBArea(compocn) and
! aoflux_mesh is never destroyed, so sharing the handle is safe (ogrid path).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change seems reasonable, and I appreciate the comment justifying this. Please remove the specific reference to ne1024pg2 from the comment.

That said, I don't understand this well enough to be totally confident that this is okay... I'll be more comfortable if you run a B compset test with the aoflux grid set to ogrid and show that it passes and is bit-for-bit with a baseline.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @billsacks . I have updated the comments as you suggested.

Can you share the detailed instructions to run a B compset test with the aoflux grid set to ogrid? I am happy to do the tests later and let you know if these changes are BFB.

Comment thread mediator/med_fraction_mod.F90 Outdated
integer :: n,n1,ns
integer :: maptype
integer :: fieldCount
logical :: isPresent, isSet, lexist ! lnd2rof_fmap attribute presence / weight-file check

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These logicals might be used for other variables in the future, so it's probably best to remove the comment suggesting that they're just used for this one variable.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Fixed in the latest commit.

Comment thread mediator/med_fraction_mod.F90 Outdated
integer :: maptype
integer :: fieldCount
logical :: isPresent, isSet, lexist ! lnd2rof_fmap attribute presence / weight-file check
character(len=CX) :: lnd2rof_fmap ! consd (destarea) lnd->rof fraction-map file ($LND2ROF_FRAC_FMAPNAME)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think comments in the Fortran should refer to xml variables (LND2ROF_FRAC_FMAPNAME), since not all systems use these xml variables.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Fixed in the latest commit.

call NUOPC_CompAttributeGet(gcomp, name='lnd2rof_fmap', value=lnd2rof_fmap, &
isPresent=isPresent, isSet=isSet, rc=rc)
if (ChkErr(rc,__LINE__,u_FILE_u)) return
if (.not. (isPresent .and. isSet)) lnd2rof_fmap = 'unset'

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this line is redundant with the initialization to 'unset' a few lines above, and that just one or the other could be kept, with the other removed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I have removed lnd2rof_fmap = 'unset' a few lines above.

Comment thread mediator/med_fraction_mod.F90 Outdated
if (.not. (isPresent .and. isSet)) lnd2rof_fmap = 'unset'
if (trim(lnd2rof_fmap) /= 'unset') then
inquire(file=trim(lnd2rof_fmap), exist=lexist)
if (.not. lexist) lnd2rof_fmap = 'unset'

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this should abort with an error message rather than silently setting this to 'unset'. Then this block could be merged with the following block so it would look like the following (noting also, as a side note, that trim is unnecessary in string comparisons... I know we do that in a lot of places, but it's unnecessary - e.g., see https://community.intel.com/t5/Intel-Fortran-Compiler/String-Comparisons/td-p/798323):

if (lnd2rof_fmap /= 'unset') then
   inquire(...)
   if (.not. lexist) then
      ! ERROR here with a meaningful message
   end if
   call med_map_routehandles_init(...)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I have updated it based on your suggestions.

<type>char</type>
<category>mapping</category>
<group>MED_attributes</group>
<desc>lnd to rof conservative destarea ('consd') fraction mapping file read by the mediator

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, it looks like the difference between this and LND2ROF_FMAPNAME is that this one uses consd mapping and the other uses consf mapping. Is that right? If so, I think it would be useful to extend the comment in LND2ROF_FMAPNAME pointing out that that one uses consf mapping.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. I updated it in the latest commit.

Comment thread cime_config/config_component.xml Outdated
<default_value>unset</default_value>
<group>run_domain</group>
<file>env_run.xml</file>
<desc>lnd2rof conservative (destarea, 'consd') fraction mapping file used by the mediator

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to my comment in namelist_definition_drv: it looks like the difference between this one and LND2ROF_FMAPNAME is whether it's using consd or consf mapping. Is that right? If so, please add a comment to LND2ROF_FMAPNAME also pointing out that that one should be consf mapping.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. I updated it in the latest commit.

Comment thread cime_config/config_component.xml Outdated
<desc>rof2lnd flux mapping file</desc>
</entry>

<entry id="LND2ROF_FRAC_FMAPNAME">

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new entry would be better positioned right after LND2ROF_FMAPNAME (similar to its positioning in namelist_definition_drv).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Fixed in the latest commit.

Comment thread mediator/med_fraction_mod.F90 Outdated
Comment on lines +590 to +595
! At ne1024pg2 resolution: the lnd->rof fraction map is the only grid-crossing
! conservative coupling map; building it online (ESMF_FieldRegridStore ->
! GeomRendezvous -> Zoltan_RCB) OOM-kills the job during DataInitialize.
! Read offline consd (conserve/destarea) fraction weights from the file named by the
! 'lnd2rof_fmap' attribute (xmlchange LND2ROF_FRAC_FMAPNAME=<file>) when it is set
! and present; otherwise fall back to the online path (no behavior change).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is very specific to your case. This could be appropriate for an issue / PR comment, but the code comment should be more general. Please rewrite this comment to be more general. Also, it seems best to avoid referencing XML variables in the Fortran, since xml variables aren't used by all modeling systems.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Fixed in the latest commit.

@sjsprecious sjsprecious requested a review from billsacks June 26, 2026 20:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants