Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reorganize tasks to use shared steps #117

Merged
merged 10 commits into from
Sep 22, 2023

Conversation

xylar
Copy link
Collaborator

@xylar xylar commented Sep 14, 2023

This is the follow-up to #116 that actually reorganizes the test cases to use shared steps and that changes to the proposed new work-directory structure.

This branch builds on #116 to implements the design described in #109.

Tasks in the work directory are organized into planar and spherical (currently only cosine_bell).

Under planar, tests are largely organized as they were before except that:

  • baroclinic_channel uses shared init steps for each resolution
  • inertial_gravity_wave and manufactured_soluion are now the names of tasks (there is no longer a single convergence task within each)
  • the resolution, nx, and ny are now config options in single_column (rather than lx and ly), and there is no subdirectory for resolution

Under spherical:

  • tasks are now organized into icos and qu (with other meshes and mesh types to come later)
  • an ocean framework function has been added to create shared base_mesh/<res>km steps within ocean/spherical/icos and ocean/spherical/qu.
  • the cosine_bell tasks use the shared base_mesh/* steps, and make local symlinks to each res under their own base_mesh subdirectory (so that the organization looks much like before)
  • the cosine_bell/with_viz tasks share these base_mesh/* steps, and they also share the init/*, forward/* and analysis steps with cosine_bell tasks. They make local symlinks so it should be clear. See the updated docs and the design doc in New shared steps capability design doc #109 for more details.

I went through the entire documentation (it took a full day) and it seems like it's in pretty good shape now.

Checklist

  • User's Guide has been updated
  • Developer's Guide has been updated
  • API documentation in the Developer's Guide (api.md) has any new or modified class, method and/or functions listed
  • Documentation has been built locally and changes look as expected
  • Testing comment in the PR documents testing used to verify the changes
  • Test suites have been updated
  • Cached files have been updated

@xylar xylar marked this pull request as draft September 14, 2023 14:38
@xylar xylar changed the title Reorg for shared steps Reorganize tasks to use shared steps Sep 14, 2023
@xylar xylar mentioned this pull request Sep 14, 2023
1 task
@xylar xylar added enhancement New feature or request clean-up in progress This PR is not ready for review or merging ocean Related to ocean tests or analysis labels Sep 14, 2023
@xylar xylar removed the in progress This PR is not ready for review or merging label Sep 15, 2023
@xylar xylar marked this pull request as ready for review September 15, 2023 15:10
@xylar
Copy link
Collaborator Author

xylar commented Sep 15, 2023

Testing

I successfully ran all test cases except RPE 1 km and 4 km on Chrysalis. I will rerun once more before merging to make sure changes during updating the docs and review don't inadvertently affect the results.

@xylar xylar force-pushed the reorg-for-shared-steps branch 2 times, most recently from 961ed11 to 923e3cd Compare September 18, 2023 09:32
Copy link
Collaborator

@cbegeman cbegeman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with these changes. I ran the nightly, pr and cosine_bell tests on chrys with intel, openmpi. I also tried polaris serial at the task and step levels, with and without cleaning. I made a few suggestions in #116. The only suggestion that may affect #117 is #116 (comment). Thanks for all your work on this @xylar!

@xylar xylar force-pushed the reorg-for-shared-steps branch 4 times, most recently from 30c3906 to 4553775 Compare September 19, 2023 10:31
Copy link
Contributor

@sbrus89 sbrus89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xylar, these changes look good to me. The pr and nightly suites run successfully and the manufactured_solution test fails with the expected convergence error on Perlmutter. However, I am seeing that the cosine_bell forward step fails with debug flags with the error:

Error termination. Backtrace:
At line 689 of file mpas_ocn_tracer_advection_mono.F
Fortran runtime error: Index '2' of dimension 1 of array 'tracercur' above upper bound of 1

I'm sure the case runs fine with standard compiler flags given @cbegeman's testing. I will verify to be sure.

Comment on lines -89 to +92
if i == 0:
if error_range is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this improvement.

subdir = f'planar/{name}'
super().__init__(component=component, name=name, subdir=subdir)

self.resolutions = [200., 100., 50., 25.]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a later PR, we should probably move these to being config options

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree. I decided not to do that here but it should be changed.

@xylar
Copy link
Collaborator Author

xylar commented Sep 21, 2023

@sbrus89, do you know if that issue (Index '2' of dimension 1 of array 'tracercur' above upper bound of 1) is new in this PR or if that's always been present? It seems like it's a problem of not having initialized 3 debug tracers, and that didn't change here as far as I know.

@xylar
Copy link
Collaborator Author

xylar commented Sep 21, 2023

Hmm, I really don't get that error because we do initialize all three debug tracers to the same value. Is it possible that error was introduced in MPAS-Ocean and is unrelated to Polaris? But then why don't we see it in Compass? Mysterious!

@xylar
Copy link
Collaborator Author

xylar commented Sep 21, 2023

Error termination. Backtrace:
At line 689 of file mpas_ocn_tracer_advection_mono.F
Fortran runtime error: Index '2' of dimension 1 of array 'tracercur' above upper bound of 1

This appears to be an issue with indexing for single-layer runs. Maybe no one has been running in debug mode with single layer? In any case, it seems like that particular line in MPAS-O hasn't been changed in 2 years so I don't think it's a new issue and I don't think it's introduced here. Can you verify that you see it with main? If so, I would suggest we make an E3SM issue about it since it doesn't seem to be a Polaris problem as far as I can tell.

Copy link
Contributor

@sbrus89 sbrus89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xylar, the cosine_bell suite passes DEBUG=false so the issue is on the MPAS-O side, as we had suspected.

@xylar
Copy link
Collaborator Author

xylar commented Sep 22, 2023

Thanks @sbrus89! Do you have time to make an E3SM issue for this? I'd hate for us to lose track of it. Presumably, we should add a single-layer test to the Compass pr suite so this gets flagged in testing that is regularly done on E3SM, since Polaris is not used that way yet.

@xylar xylar merged commit fe7dcf3 into E3SM-Project:main Sep 22, 2023
5 checks passed
@xylar xylar deleted the reorg-for-shared-steps branch September 22, 2023 07:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clean-up enhancement New feature or request ocean Related to ocean tests or analysis
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants