Arbitrary Lagrangian-Eulerian (ALE) code refactorization #388

Merged: 9 commits, Sep 24, 2024

Conversation

Contributor

@matsbn matsbn commented Aug 29, 2024

Major refactoring of the ALE method implementation that involves:

  • Separate modules for vertical coordinate data/routines and ALE regridding/remapping.
  • Added support for an additional vertical coordinate, vcoord = 'plevel', that uses the ALE method.
  • Reference potential densities and pressure levels can optionally be provided via namelist for vcoord = 'cntiso_hybrid' and vcoord = 'plevel'.
  • The ALE regridding and remapping has been overhauled to allow more choices to be controlled by namelist variables and to make the code more extensible.

The code is tested in NorESM2.0.6 and gives bit-identical results in N2NOIIAJRAOC20TR compset with TL319_tn14 grids for both vcoord = 'isopyc_bulkml' and vcoord = 'cntiso_hybrid', using default namelist settings.
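
For readers unfamiliar with the remapping step of an ALE method, here is a minimal sketch of a conservative remap of layer-mean values from an old vertical grid onto a new one. It is a generic piecewise-constant illustration, not the scheme implemented in this PR; the function name `remap_pcm`, its arguments, and the choice to return 0.0 for a vanished target layer are all assumptions made for the example.

```python
def remap_pcm(src_edges, src_values, dst_edges):
    """Conservatively remap piecewise-constant layer means onto new layers.

    src_edges and dst_edges are monotonically increasing interface positions
    (e.g. pressure) spanning the same column; src_values holds one mean value
    per source layer. All names here are hypothetical.
    """
    dst_values = []
    for k in range(len(dst_edges) - 1):
        lo, hi = dst_edges[k], dst_edges[k + 1]
        thickness = hi - lo
        if thickness <= 0.0:
            # Vanished target layer: no content to average (assumed convention).
            dst_values.append(0.0)
            continue
        integral = 0.0
        for j, value in enumerate(src_values):
            # Length of the intersection of source layer j with target layer k.
            overlap = min(hi, src_edges[j + 1]) - max(lo, src_edges[j])
            if overlap > 0.0:
                integral += value * overlap
        dst_values.append(integral / thickness)
    return dst_values


old_edges = [0.0, 10.0, 30.0, 60.0]
old_values = [20.0, 10.0, 4.0]
new_edges = [0.0, 20.0, 40.0, 60.0]
new_values = remap_pcm(old_edges, old_values, new_edges)
```

The column integral (layer value times layer thickness, summed) is the same before and after the remap, which is the property a remapping scheme has to preserve regardless of the reconstruction order it uses.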

@TomasTorsvik
Contributor

@matsbn , @mvertens
I ran a test with noresm2_5_alpha04, testing against baseline noresm2_5_alpha04_dev1.6.1.5.
Most of the failed tests are expected. There are some BASELINE tests that report that baseline files are missing.
ERS_Ld3.TL319_tn05.NOIIAJRA.betzy_intel.blom-hybrid fails at the RUN stage.

20240830_090153_2llov1: 7 tests
  ERP_Ld3_P256.T62_tn14.NOINYOC.betzy_intel (Overall: FAIL) details:
    PASS ERP_Ld3_P256.T62_tn14.NOINYOC.betzy_intel CREATE_NEWCASE
    PASS ERP_Ld3_P256.T62_tn14.NOINYOC.betzy_intel XML
    PASS ERP_Ld3_P256.T62_tn14.NOINYOC.betzy_intel SETUP
    PASS ERP_Ld3_P256.T62_tn14.NOINYOC.betzy_intel SHAREDLIB_BUILD time=204
    PASS ERP_Ld3_P256.T62_tn14.NOINYOC.betzy_intel NLCOMP
    PASS ERP_Ld3_P256.T62_tn14.NOINYOC.betzy_intel MODEL_BUILD time=312
    PASS ERP_Ld3_P256.T62_tn14.NOINYOC.betzy_intel SUBMIT
    PASS ERP_Ld3_P256.T62_tn14.NOINYOC.betzy_intel RUN time=322
    FAIL ERP_Ld3_P256.T62_tn14.NOINYOC.betzy_intel COMPARE_base_rest
    FAIL ERP_Ld3_P256.T62_tn14.NOINYOC.betzy_intel BASELINE noresm2_5_alpha04_dev1.6.1.5: ERROR BFAIL some baseline files were missing
    FAIL ERP_Ld3_P256.T62_tn14.NOINYOC.betzy_intel MEMCOMP [Errno 2] No such file or directory: '/cluster/shared/noresm/noresm_baselines/blom_develop/noresm2_5_alpha04_dev1.6.1.5/ERP_Ld3_P256.T62_tn14.NOINYOC.betzy_intel/cpl-mem.log'
    FAIL ERP_Ld3_P256.T62_tn14.NOINYOC.betzy_intel TPUTCOMP Error: TPUTCOMP: Throughput changed by 26.89%: baseline=9.303 sypd, tolerance=25%, current=6.801 sypd
    PASS ERP_Ld3_P256.T62_tn14.NOINYOC.betzy_intel MEMLEAK insufficient data for memleak test
    PASS ERP_Ld3_P256.T62_tn14.NOINYOC.betzy_intel SHORT_TERM_ARCHIVER
  ERR_Ld3.TL319_tn14.NOIIAJRAOC.betzy_intel.blom-hybrid (Overall: DIFF) details:
    PASS ERR_Ld3.TL319_tn14.NOIIAJRAOC.betzy_intel.blom-hybrid CREATE_NEWCASE
    PASS ERR_Ld3.TL319_tn14.NOIIAJRAOC.betzy_intel.blom-hybrid XML
    PASS ERR_Ld3.TL319_tn14.NOIIAJRAOC.betzy_intel.blom-hybrid SETUP
    PASS ERR_Ld3.TL319_tn14.NOIIAJRAOC.betzy_intel.blom-hybrid SHAREDLIB_BUILD time=10
    FAIL ERR_Ld3.TL319_tn14.NOIIAJRAOC.betzy_intel.blom-hybrid NLCOMP
    PASS ERR_Ld3.TL319_tn14.NOIIAJRAOC.betzy_intel.blom-hybrid MODEL_BUILD time=143
    PASS ERR_Ld3.TL319_tn14.NOIIAJRAOC.betzy_intel.blom-hybrid SUBMIT
    PASS ERR_Ld3.TL319_tn14.NOIIAJRAOC.betzy_intel.blom-hybrid RUN time=344
    PASS ERR_Ld3.TL319_tn14.NOIIAJRAOC.betzy_intel.blom-hybrid COMPARE_base_rest
    FAIL ERR_Ld3.TL319_tn14.NOIIAJRAOC.betzy_intel.blom-hybrid BASELINE noresm2_5_alpha04_dev1.6.1.5: ERROR BFAIL some baseline files were missing
    FAIL ERR_Ld3.TL319_tn14.NOIIAJRAOC.betzy_intel.blom-hybrid MEMCOMP [Errno 2] No such file or directory: '/cluster/shared/noresm/noresm_baselines/blom_develop/noresm2_5_alpha04_dev1.6.1.5/ERR_Ld3.TL319_tn14.NOIIAJRAOC.betzy_intel.blom-hybrid/cpl-mem.log'
    FAIL ERR_Ld3.TL319_tn14.NOIIAJRAOC.betzy_intel.blom-hybrid TPUTCOMP [Errno 2] No such file or directory: '/cluster/shared/noresm/noresm_baselines/blom_develop/noresm2_5_alpha04_dev1.6.1.5/ERR_Ld3.TL319_tn14.NOIIAJRAOC.betzy_intel.blom-hybrid/cpl-tput.log'
    PASS ERR_Ld3.TL319_tn14.NOIIAJRAOC.betzy_intel.blom-hybrid MEMLEAK
    PASS ERR_Ld3.TL319_tn14.NOIIAJRAOC.betzy_intel.blom-hybrid SHORT_TERM_ARCHIVER
  ERS_Ld3.T62_tn14.NOINYOC.betzy_intel (Overall: DIFF) details:
    PASS ERS_Ld3.T62_tn14.NOINYOC.betzy_intel CREATE_NEWCASE
    PASS ERS_Ld3.T62_tn14.NOINYOC.betzy_intel XML
    PASS ERS_Ld3.T62_tn14.NOINYOC.betzy_intel SETUP
    PASS ERS_Ld3.T62_tn14.NOINYOC.betzy_intel SHAREDLIB_BUILD time=176
    PASS ERS_Ld3.T62_tn14.NOINYOC.betzy_intel NLCOMP
    PASS ERS_Ld3.T62_tn14.NOINYOC.betzy_intel MODEL_BUILD time=147
    PASS ERS_Ld3.T62_tn14.NOINYOC.betzy_intel SUBMIT
    PASS ERS_Ld3.T62_tn14.NOINYOC.betzy_intel RUN time=284
    PASS ERS_Ld3.T62_tn14.NOINYOC.betzy_intel COMPARE_base_rest
    FAIL ERS_Ld3.T62_tn14.NOINYOC.betzy_intel BASELINE noresm2_5_alpha04_dev1.6.1.5: ERROR BFAIL some baseline files were missing
    FAIL ERS_Ld3.T62_tn14.NOINYOC.betzy_intel MEMCOMP [Errno 2] No such file or directory: '/cluster/shared/noresm/noresm_baselines/blom_develop/noresm2_5_alpha04_dev1.6.1.5/ERS_Ld3.T62_tn14.NOINYOC.betzy_intel/cpl-mem.log'
    PASS ERS_Ld3.T62_tn14.NOINYOC.betzy_intel TPUTCOMP
    PASS ERS_Ld3.T62_tn14.NOINYOC.betzy_intel MEMLEAK insufficient data for memleak test
    PASS ERS_Ld3.T62_tn14.NOINYOC.betzy_intel SHORT_TERM_ARCHIVER
  ERS_Ld3.T62_tn14_wtn14nw.NOINY_WW3.betzy_intel.blom-wavice (Overall: PASS) details:
    PASS ERS_Ld3.T62_tn14_wtn14nw.NOINY_WW3.betzy_intel.blom-wavice CREATE_NEWCASE
    PASS ERS_Ld3.T62_tn14_wtn14nw.NOINY_WW3.betzy_intel.blom-wavice XML
    PASS ERS_Ld3.T62_tn14_wtn14nw.NOINY_WW3.betzy_intel.blom-wavice SETUP
    PASS ERS_Ld3.T62_tn14_wtn14nw.NOINY_WW3.betzy_intel.blom-wavice SHAREDLIB_BUILD time=28
    PASS ERS_Ld3.T62_tn14_wtn14nw.NOINY_WW3.betzy_intel.blom-wavice NLCOMP
    PASS ERS_Ld3.T62_tn14_wtn14nw.NOINY_WW3.betzy_intel.blom-wavice MODEL_BUILD time=285
    PASS ERS_Ld3.T62_tn14_wtn14nw.NOINY_WW3.betzy_intel.blom-wavice SUBMIT
    PASS ERS_Ld3.T62_tn14_wtn14nw.NOINY_WW3.betzy_intel.blom-wavice RUN time=411
    PASS ERS_Ld3.T62_tn14_wtn14nw.NOINY_WW3.betzy_intel.blom-wavice COMPARE_base_rest
    PASS ERS_Ld3.T62_tn14_wtn14nw.NOINY_WW3.betzy_intel.blom-wavice BASELINE noresm2_5_alpha04_dev1.6.1.5:
    FAIL ERS_Ld3.T62_tn14_wtn14nw.NOINY_WW3.betzy_intel.blom-wavice MEMCOMP [Errno 2] No such file or directory: '/cluster/shared/noresm/noresm_baselines/blom_develop/noresm2_5_alpha04_dev1.6.1.5/ERS_Ld3.T62_tn14_wtn14nw.NOINY_WW3.betzy_intel.blom-wavice/cpl-mem.log'
    PASS ERS_Ld3.T62_tn14_wtn14nw.NOINY_WW3.betzy_intel.blom-wavice TPUTCOMP
    PASS ERS_Ld3.T62_tn14_wtn14nw.NOINY_WW3.betzy_intel.blom-wavice MEMLEAK insufficient data for memleak test
    PASS ERS_Ld3.T62_tn14_wtn14nw.NOINY_WW3.betzy_intel.blom-wavice SHORT_TERM_ARCHIVER
  ERS_Ld3.T62_tn21.NOINY.betzy_intel (Overall: PASS) details:
    PASS ERS_Ld3.T62_tn21.NOINY.betzy_intel CREATE_NEWCASE
    PASS ERS_Ld3.T62_tn21.NOINY.betzy_intel XML
    PASS ERS_Ld3.T62_tn21.NOINY.betzy_intel SETUP
    PASS ERS_Ld3.T62_tn21.NOINY.betzy_intel SHAREDLIB_BUILD time=15
    PASS ERS_Ld3.T62_tn21.NOINY.betzy_intel NLCOMP
    PASS ERS_Ld3.T62_tn21.NOINY.betzy_intel MODEL_BUILD time=130
    PASS ERS_Ld3.T62_tn21.NOINY.betzy_intel SUBMIT
    PASS ERS_Ld3.T62_tn21.NOINY.betzy_intel RUN time=101
    PASS ERS_Ld3.T62_tn21.NOINY.betzy_intel COMPARE_base_rest
    PASS ERS_Ld3.T62_tn21.NOINY.betzy_intel BASELINE noresm2_5_alpha04_dev1.6.1.5:
    FAIL ERS_Ld3.T62_tn21.NOINY.betzy_intel MEMCOMP [Errno 2] No such file or directory: '/cluster/shared/noresm/noresm_baselines/blom_develop/noresm2_5_alpha04_dev1.6.1.5/ERS_Ld3.T62_tn21.NOINY.betzy_intel/cpl-mem.log'
    PASS ERS_Ld3.T62_tn21.NOINY.betzy_intel TPUTCOMP
    PASS ERS_Ld3.T62_tn21.NOINY.betzy_intel MEMLEAK insufficient data for memleak test
    PASS ERS_Ld3.T62_tn21.NOINY.betzy_intel SHORT_TERM_ARCHIVER
  ERS_Ld3.TL319_tn05.NOIIAJRA.betzy_intel.blom-hybrid (Overall: FAIL) details:
    PASS ERS_Ld3.TL319_tn05.NOIIAJRA.betzy_intel.blom-hybrid CREATE_NEWCASE
    PASS ERS_Ld3.TL319_tn05.NOIIAJRA.betzy_intel.blom-hybrid XML
    PASS ERS_Ld3.TL319_tn05.NOIIAJRA.betzy_intel.blom-hybrid SETUP
    PASS ERS_Ld3.TL319_tn05.NOIIAJRA.betzy_intel.blom-hybrid SHAREDLIB_BUILD time=8
    FAIL ERS_Ld3.TL319_tn05.NOIIAJRA.betzy_intel.blom-hybrid NLCOMP
    PASS ERS_Ld3.TL319_tn05.NOIIAJRA.betzy_intel.blom-hybrid MODEL_BUILD time=133
    PASS ERS_Ld3.TL319_tn05.NOIIAJRA.betzy_intel.blom-hybrid SUBMIT
    FAIL ERS_Ld3.TL319_tn05.NOIIAJRA.betzy_intel.blom-hybrid RUN time=37
  SMS_D_Ld1.T62_tn14.NOINYOC.betzy_intel.blom-hybrid (Overall: DIFF) details:
    PASS SMS_D_Ld1.T62_tn14.NOINYOC.betzy_intel.blom-hybrid CREATE_NEWCASE
    PASS SMS_D_Ld1.T62_tn14.NOINYOC.betzy_intel.blom-hybrid XML
    PASS SMS_D_Ld1.T62_tn14.NOINYOC.betzy_intel.blom-hybrid SETUP
    PASS SMS_D_Ld1.T62_tn14.NOINYOC.betzy_intel.blom-hybrid SHAREDLIB_BUILD time=118
    FAIL SMS_D_Ld1.T62_tn14.NOINYOC.betzy_intel.blom-hybrid NLCOMP
    PASS SMS_D_Ld1.T62_tn14.NOINYOC.betzy_intel.blom-hybrid MODEL_BUILD time=89
    PASS SMS_D_Ld1.T62_tn14.NOINYOC.betzy_intel.blom-hybrid SUBMIT
    PASS SMS_D_Ld1.T62_tn14.NOINYOC.betzy_intel.blom-hybrid RUN time=213
    FAIL SMS_D_Ld1.T62_tn14.NOINYOC.betzy_intel.blom-hybrid BASELINE noresm2_5_alpha04_dev1.6.1.5: ERROR BFAIL some baseline files were missing
    FAIL SMS_D_Ld1.T62_tn14.NOINYOC.betzy_intel.blom-hybrid MEMCOMP [Errno 2] No such file or directory: '/cluster/shared/noresm/noresm_baselines/blom_develop/noresm2_5_alpha04_dev1.6.1.5/SMS_D_Ld1.T62_tn14.NOINYOC.betzy_intel.blom-hybrid/cpl-mem.log'
    PASS SMS_D_Ld1.T62_tn14.NOINYOC.betzy_intel.blom-hybrid TPUTCOMP
    PASS SMS_D_Ld1.T62_tn14.NOINYOC.betzy_intel.blom-hybrid MEMLEAK insufficient data for memleak test
    PASS SMS_D_Ld1.T62_tn14.NOINYOC.betzy_intel.blom-hybrid SHORT_TERM_ARCHIVER
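
As a quick check of the one TPUTCOMP failure that is not caused by a missing file, the reported figure can be reproduced by hand. This is a small sketch that assumes the change is measured relative to the baseline throughput; the function name `throughput_change_pct` is made up for the example, and the numbers are copied from the ERP_Ld3_P256 line above.

```python
def throughput_change_pct(baseline_sypd, current_sypd):
    """Relative slowdown in percent, measured against the baseline (assumed formula)."""
    return 100.0 * (baseline_sypd - current_sypd) / baseline_sypd


# Values reported for ERP_Ld3_P256.T62_tn14.NOINYOC.betzy_intel
baseline = 9.303   # simulated years per day in the baseline
current = 6.801    # simulated years per day in this run
tolerance = 25.0   # percent

change = throughput_change_pct(baseline, current)
exceeds_tolerance = change > tolerance
print(f"throughput change: {change:.2f}%, exceeds tolerance: {exceeds_tolerance}")
```

Under that assumption the result matches the 26.89% in the log, so the run lands just outside the 25% tolerance.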

@mvertens
Contributor

@TomasTorsvik - thanks for doing these tests! I'm surprised that baselines are missing. What I have found is that if my laptop goes to sleep while the tests are running then my session on betzy stops and the tests don't complete. Hopefully, all the new baselines are now there with noresm2_5_alpha04.

@TomasTorsvik
Contributor

TomasTorsvik commented Aug 30, 2024

@matsbn , @mvertens

Here is the log for the failed run test:
cesm.log from ERS_Ld3.TL319_tn05.NOIIAJRA.betzy_intel.blom-hybrid

srun: warning: can't honor --ntasks-per-node set to 128 which doesn't match the requested tasks 890 with the number of requested nodes 7. Ignoring --ntasks-per-node.
 (t_initf) Read in prof_inparm namelist from: drv_in
 (t_initf) Using profile_disable=          F
 (t_initf)       profile_timer=                      4
 (t_initf)       profile_depth_limit=               12
 (t_initf)       profile_detail_limit=               2
 (t_initf)       profile_barrier=          F
 (t_initf)       profile_outpe_num=                  1
 (t_initf)       profile_outpe_stride=               0
 (t_initf)       profile_single_file=      F
 (t_initf)       profile_global_stats=     T
 (t_initf)       profile_ovhd_measurement= F
 (t_initf)       profile_add_detail=       F
 (t_initf)       profile_papi_enable=      F
 ERROR: xcspmd: patch.input for arctic but ARCTIC undef.
(...)
Image              PC                Routine            Line        Source             
cesm.exe           0000000000FE32E7  Unknown               Unknown  Unknown
cesm.exe           0000000000D9EABD  shr_abort_mod_mp_         114  shr_abort_mod.F90
cesm.exe           0000000000C6C173  mod_xc_mp_xcstop_        2053  mod_xc.F90
cesm.exe           0000000000C613BE  mod_xc_mp_xcspmd_        1455  mod_xc.F90
cesm.exe           0000000000B32FF9  mod_blom_init_mp_          79  mod_blom_init.F90
cesm.exe           0000000000B2E7D9  ocn_comp_nuopc_mp         454  ocn_comp_nuopc.F90
libesmf.so         00002AEB718EDFCC  _ZN5ESMCI6FTable1     Unknown  Unknown
libesmf.so         00002AEB718F1A8F  ESMCI_FTableCallE     Unknown  Unknown
libesmf.so         00002AEB71E9FD6A  _ZN5ESMCI3VMK5ent     Unknown  Unknown
libesmf.so         00002AEB71EB828E  _ZN5ESMCI2VM5ente     Unknown  Unknown
libesmf.so         00002AEB718EF3E1  c_esmc_ftablecall     Unknown  Unknown
libesmf.so         00002AEB721433DB  esmf_compmod_mp_e     Unknown  Unknown
libesmf.so         00002AEB72382131  esmf_gridcompmod_     Unknown  Unknown
libesmf.so         00002AEB728D4CA0  nuopc_driver_mp_l     Unknown  Unknown
libesmf.so         00002AEB728FC642  nuopc_driver_mp_i     Unknown  Unknown
libesmf.so         00002AEB718EDFCC  _ZN5ESMCI6FTable1     Unknown  Unknown
libesmf.so         00002AEB718F1A8F  ESMCI_FTableCallE     Unknown  Unknown
libesmf.so         00002AEB71E9FD6A  _ZN5ESMCI3VMK5ent     Unknown  Unknown
libesmf.so         00002AEB71EB828E  _ZN5ESMCI2VM5ente     Unknown  Unknown
libesmf.so         00002AEB718EF3E1  c_esmc_ftablecall     Unknown  Unknown
libesmf.so         00002AEB721433DB  esmf_compmod_mp_e     Unknown  Unknown
libesmf.so         00002AEB72382131  esmf_gridcompmod_     Unknown  Unknown
libesmf.so         00002AEB728D4CA0  nuopc_driver_mp_l     Unknown  Unknown
libesmf.so         00002AEB728FC758  nuopc_driver_mp_i     Unknown  Unknown
libesmf.so         00002AEB7290610B  nuopc_driver_mp_i     Unknown  Unknown
libesmf.so         00002AEB718EDFCC  _ZN5ESMCI6FTable1     Unknown  Unknown
libesmf.so         00002AEB718F1A8F  ESMCI_FTableCallE     Unknown  Unknown
libesmf.so         00002AEB71E9FD6A  _ZN5ESMCI3VMK5ent     Unknown  Unknown
libesmf.so         00002AEB71EB828E  _ZN5ESMCI2VM5ente     Unknown  Unknown
libesmf.so         00002AEB718EF3E1  c_esmc_ftablecall     Unknown  Unknown
libesmf.so         00002AEB721433DB  esmf_compmod_mp_e     Unknown  Unknown
libesmf.so         00002AEB72382131  esmf_gridcompmod_     Unknown  Unknown
cesm.exe           0000000000435D81  MAIN__                    128  esmApp.F90
cesm.exe           0000000000424D92  Unknown               Unknown  Unknown
libc-2.17.so       00002AEB745C4545  __libc_start_main     Unknown  Unknown
cesm.exe           0000000000424CA9  Unknown               Unknown  Unknown

@TomasTorsvik
Contributor

I get the same RUN error from ERS_Ld3.TL319_tn05.NOIIAJRA.betzy_intel.blom-hybrid when running noresm2_5_alpha04 without code changes from this PR, so it seems the error is not specific to this PR.

Contributor

@TomasTorsvik TomasTorsvik left a comment

Looks fine to me. There are some failed regression tests, but nothing unusual that looks specific to this PR.

@matsbn
Contributor Author

matsbn commented Aug 30, 2024

With the xcspmd error in the log, @TomasTorsvik , it seems the CPP flag ARCTIC is not defined when it should be (leading to use_ARCTIC = .false.).

@TomasTorsvik
Contributor

@matsbn - Would it be sufficient to add the tnx0.5v1 grid in cime_config/buildcpp? I see this grid is missing for both ARCTIC and LEVITUS2X cppdefs.

@matsbn
Contributor Author

matsbn commented Aug 30, 2024

You're right, @TomasTorsvik . These cppdefs for the tnx0.5v1 grid were lost somehow. I have updated the PR with a fix for this.
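
To illustrate the failure mode discussed here, the following is a minimal sketch of a grid-to-cppdefs lookup. It is not the actual cime_config/buildcpp code: the table `GRID_CPPDEFS` and the function `cppdefs_for_grid` are hypothetical, and only the tnx0.5v1 entry with ARCTIC and LEVITUS2X is taken from this thread.

```python
# Hypothetical lookup table; only the tnx0.5v1 entry comes from the discussion.
GRID_CPPDEFS = {
    "tnx0.5v1": ("ARCTIC", "LEVITUS2X"),
}


def cppdefs_for_grid(ocn_grid, table=GRID_CPPDEFS):
    """Return the -D flags for a grid, or an empty string if the grid is not listed."""
    return " ".join(f"-D{name}" for name in table.get(ocn_grid, ()))


# With the entry present, both flags are emitted.
with_entry = cppdefs_for_grid("tnx0.5v1")

# With the entry missing, nothing is emitted and ARCTIC stays undefined,
# which is the situation the xcspmd error message complains about.
without_entry = cppdefs_for_grid("tnx0.5v1", table={})
```

A silent fallback to "no flags" is what lets a missing grid entry surface only at run time; raising an error for an unlisted grid would be one way to catch it at build time.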

Collaborator

@milicak milicak left a comment

It looks good to me.

@TomasTorsvik
Contributor

@matsbn - looks like merging of PR #403 into master created a conflict with this PR.

@matsbn
Contributor Author

matsbn commented Sep 23, 2024

@matsbn - looks like merging of PR #403 into master created a conflict with this PR.

Thanks for the heads-up, @TomasTorsvik . I will have a look at it and update the PR. Would be good to get this merged to master soon as further developments will build on this refactorization.

@mvertens
Contributor

@matsbn - I have renamed mod_nuopc_methods.F90 to ocn_import_export.F90 to be consistent with other component nuopc cap code.

@matsbn matsbn added this to the NorESM2.5 - BLOM/iHAMOCC milestone Sep 23, 2024
@matsbn
Contributor Author

matsbn commented Sep 24, 2024

Conflicts have been resolved, along with some formatting and the ability to build without iHAMOCC when using MCT. I still get bit-identical results with current master.

@TomasTorsvik
Contributor

@matsbn - I looked briefly through the last changes. I don't think there is a need for a new code review, so you can merge this to master if you think it's ready.

@matsbn matsbn merged commit 1ade407 into NorESMhub:master Sep 24, 2024
5 checks passed