Reorganize tasks to use shared steps #117
Testing
I successfully ran all test cases except RPE 1 km and 4 km on Chrysalis. I will rerun once more before merging to make sure that changes made while updating the docs and during review don't inadvertently affect the results.
I'm happy with these changes. I ran the nightly, pr and cosine_bell tests on chrys with intel, openmpi. I also tried `polaris serial` at the task and step levels, with and without cleaning. I made a few suggestions in #116. The only suggestion that may affect #117 is #116 (comment). Thanks for all your work on this @xylar!
@xylar, these changes look good to me. The `pr` and `nightly` suites run successfully and the `manufactured_solution` test fails with the expected convergence error on Perlmutter. However, I am seeing that the `cosine_bell` forward step fails with debug flags with the error:
Error termination. Backtrace:
At line 689 of file mpas_ocn_tracer_advection_mono.F
Fortran runtime error: Index '2' of dimension 1 of array 'tracercur' above upper bound of 1
I'm sure the case runs fine with standard compiler flags given @cbegeman's testing. I will verify to be sure.
if i == 0:
if error_range is None:
Thanks for this improvement.
subdir = f'planar/{name}'
super().__init__(component=component, name=name, subdir=subdir)

self.resolutions = [200., 100., 50., 25.]
In a later PR, we should probably move these to being config options
Yes, I agree. I decided not to do that here but it should be changed.
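A minimal sketch of what moving the hard-coded resolutions into config options might look like. It uses the standard-library `configparser` as a stand-in for whatever config machinery the project actually uses; the section name `convergence`, the option name `resolutions`, and the helper `get_resolutions` are all hypothetical and are not taken from this PR.

```python
from configparser import ConfigParser


def get_resolutions(config, section='convergence', option='resolutions'):
    """Parse a comma-separated config option into a list of floats.

    The section and option names are assumptions made for illustration.
    """
    raw = config.get(section, option)
    return [float(res) for res in raw.split(',')]


# A config file fragment holding the same values that are currently
# hard-coded as self.resolutions in the snippet under review
config = ConfigParser()
config.read_string("""
[convergence]
resolutions = 200, 100, 50, 25
""")

resolutions = get_resolutions(config)
```

With something like this, a task could read the list at setup time instead of storing it in the constructor, so users could change the resolutions without editing the code.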
@sbrus89, do you know if that issue (
Hmm, I really don't get that error because we do initialize all three debug tracers to the same value. Is it possible that error was introduced in MPAS-Ocean and is unrelated to Polaris? But then why don't we see it in Compass? Mysterious!
This appears to be an issue with indexing for single-layer runs. Maybe no one has been running in debug mode with single layer? In any case, it seems like that particular line in MPAS-O hasn't been changed in 2 years so I don't think it's a new issue and I don't think it's introduced here. Can you verify that you see it with
@xylar, the `cosine_bell` suite passes with `DEBUG=false`, so the issue is on the MPAS-O side, as we had suspected.
Thanks @sbrus89! Do you have time to make an E3SM issue for this? I'd hate for us to lose track of it. Presumably, we should add a single-layer test to the Compass
This is the follow-up to #116 that actually reorganizes the test cases to use shared steps and that changes to the proposed new work-directory structure.
This branch builds on #116 to implement the design described in #109.
Tasks in the work directory are organized into `planar` and `spherical` (currently only `cosine_bell`).

Under `planar`, tests are largely organized as they were before except that:
- `baroclinic_channel` uses shared `init` steps for each resolution
- `inertial_gravity_wave` and `manufactured_solution` are now the names of tasks (there is no longer a single `convergence` task within each)
- `nx` and `ny` are now config options in `single_column` (rather than `lx` and `ly`), and there is no subdirectory for resolution

Under `spherical`:
- `icos` and `qu` (with other meshes and mesh types to come later)
- `base_mesh/<res>km` steps within `ocean/spherical/icos` and `ocean/spherical/qu`.
- `cosine_bell` tasks use the shared `base_mesh/*` steps, and make local symlinks to each res under their own `base_mesh` subdirectory (so that the organization looks much like before)
- `cosine_bell/with_viz` tasks share these `base_mesh/*` steps, and they also share the `init/*`, `forward/*` and `analysis` steps with `cosine_bell` tasks. They make local symlinks so it should be clear.

See the updated docs and the design doc in New shared steps capability design doc #109 for more details.

I went through the entire documentation (it took a full day) and it seems like it's in pretty good shape now.
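To illustrate the work-directory layout described above, here is a rough sketch of how a task could expose shared `base_mesh` steps through local symlinks. It is not the implementation in this PR: the function `link_shared_steps`, its arguments, and the resolutions used in the demo are hypothetical, and the sketch assumes a POSIX filesystem where symlinks are available.

```python
import os
import tempfile


def link_shared_steps(work_dir, shared_subdir, task_subdir, resolutions):
    """Link each shared base_mesh/<res>km step into a task's own
    base_mesh subdirectory (hypothetical helper, for illustration only).
    """
    links = {}
    for res in resolutions:
        step_name = f'{res}km'
        # The shared step lives once, outside any individual task
        target = os.path.join(work_dir, shared_subdir, 'base_mesh', step_name)
        os.makedirs(target, exist_ok=True)
        # The task gets a local base_mesh directory that only holds links
        link_dir = os.path.join(work_dir, task_subdir, 'base_mesh')
        os.makedirs(link_dir, exist_ok=True)
        link = os.path.join(link_dir, step_name)
        if not os.path.lexists(link):
            # Use a relative link so the work directory can be relocated
            os.symlink(os.path.relpath(target, start=link_dir), link)
        links[step_name] = link
    return links


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as work_dir:
        # The resolutions here are made-up values for the demo
        demo_links = link_shared_steps(
            work_dir, 'ocean/spherical/icos',
            'ocean/spherical/icos/cosine_bell', [60, 120])
        for name, path in sorted(demo_links.items()):
            print(name, '->', os.readlink(path))
```

The point of the indirection is that the mesh for a given resolution is built once and reused by every task that needs it, while each task's directory still looks self-contained to someone browsing it.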
Checklist
- `api.md`) has any new or modified class, method and/or functions listed
- `Testing` comment in the PR documents testing used to verify the changes