Basic test setup #80
I'm trying to set it up so that Meson can run a few test cases (to begin with, simply run the binary). One thing that is not clear to me is what the …
At the moment the …
Thanks for the quick reply @AleksiNummelin. Reading through the responses in your PR, it seems that this is something that needs to be decided before I can move forward with testing.
Alternatively, for testing we would just need a minimal workable …
We discussed this today in the BLOM-core meeting. For now, one can use for example the channel setup for testing, but we agreed that a first proper test case should be built upon the fuk95 case. In this context, I'd imagine testing would imply building, running, and checking diagnostics against a reference (these cases could also be used for testing scalability, etc.). Therefore a full-fledged test case should also include checks of the physical parameters (tracer conservation, matching reference kinetic and potential energy budgets, etc.). Related to this discussion, we thought that there is also a need to move some of the idealized cases to another folder. I created another issue (#86) for this, since that discussion might be a bit different from the focus here.
Could you help me set this up? I don't know what would be needed, but I have the general setup for testing and generating an individual …
I have placed a Fortran namelist (“limits”) for the fuk95 test case here: https://gist.github.com/matsbn/718c1419cc1ecc064d78d18f5687439f. This test case does not need any other input files. To shorten the integration time, "NDAY2" can be reduced from 10 to, say, 1.
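A test harness could automate that tweak with something like the sketch below — a minimal example assuming the namelist is a plain-text file named `limits` in the working directory and that NDAY2 holds an integer value (both taken from the comment above; the regex itself is just an illustration):

```python
import re
from pathlib import Path

def shorten_run(namelist_path="limits", nday2=1):
    """Reduce NDAY2 in the 'limits' namelist to shorten the integration."""
    path = Path(namelist_path)
    text = path.read_text()
    # Replace e.g. "NDAY2 = 10" with "NDAY2 = 1" (case-insensitive);
    # assumes an integer value, as in the fuk95 namelist described above.
    text = re.sub(r"(?i)(nday2\s*=\s*)\d+", rf"\g<1>{nday2}", text)
    path.write_text(text)

shorten_run()
```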
I downloaded the … and the run crashed with a segmentation fault:

```
Thread 1 "fuk95_blom" received signal SIGSEGV, Segmentation fault.
0x00000000005016df in mod_dia::diaacc (m=<error reading variable: Cannot access memory at address 0x7fffff46f978>,
    n=<error reading variable: Cannot access memory at address 0x7fffff46f970>,
    mm=<error reading variable: Cannot access memory at address 0x7fffff46f968>,
    nn=<error reading variable: Cannot access memory at address 0x7fffff46f960>,
    k1m=<error reading variable: Cannot access memory at address 0x7fffff46f958>,
    k1n=<error reading variable: Cannot access memory at address 0x7fffff46f950>) at ../phy/mod_dia.F:972
972       subroutine diaacc(m,n,mm,nn,k1m,k1n)
```

I'm developing the unit tests on the branch …
I was able to overcome this locally by increasing the stack size with … The next step is probably to create some scripts that can be run to check the results of the run. I can implement this if I know what the different files mean, their expected values, and their types.
To follow up on this: now that the initial test can be run, we need to check that the output is as expected. For me it would be easiest to write a small script in Python that could check the output, but I need some help with the file type and what the expected output should look like.
There is a tool called "cprnc" that has been created for comparison of output netCDF files for CESM. The cime source that is bundled with NorESM is a bit old, but it seems to work. The most recent version is available at … There is also a Python version of this tool, but it does not seem to be maintained. Basically, the tool takes two netCDF files as input and creates a report on the differences in the data, while ignoring any information related to the specific run. I compiled a version of the tool on Betzy (using the source code from NorESM 2.0.4), which is available from … Maybe something along this line would be useful as a check on the output?
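As a pure-Python fallback, a comparison in the same spirit could look like the sketch below — assuming the netCDF4 package is available; the file names, tolerance, and reporting format are placeholders, not anything cprnc itself defines:

```python
import numpy as np
from netCDF4 import Dataset

def compare_netcdf(ref_path, test_path, rtol=1e-12, atol=0.0):
    """Report per-variable differences between two netCDF files,
    ignoring run-specific metadata (only data variables are compared)."""
    failures = []
    with Dataset(ref_path) as ref, Dataset(test_path) as test:
        for name, var in ref.variables.items():
            if name not in test.variables:
                failures.append(f"{name}: missing in {test_path}")
                continue
            a, b = var[:], test.variables[name][:]
            if a.shape != b.shape:
                failures.append(f"{name}: shape {a.shape} != {b.shape}")
            elif not np.allclose(a, b, rtol=rtol, atol=atol, equal_nan=True):
                failures.append(f"{name}: max abs diff {np.max(np.abs(a - b))}")
    return failures

# Hypothetical file names, for illustration only:
for line in compare_netcdf("reference.nc", "test_run.nc"):
    print(line)
```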
A bit late with a response here, but for a more comprehensive test in the future it would maybe be interesting to use a package like xarray (if we can afford having a conda environment that is fast to install). There is some nice functionality (coming from numpy) that allows for very basic checks as well, for example checking whether two variables are equal to within a tolerance: http://xarray.pydata.org/en/stable/generated/xarray.testing.assert_allclose.html. With xarray it would be easy to implement checks for dynamical consistency (the energy levels etc. we've talked about).
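Using the assert_allclose function linked above, that check could be as short as the following sketch (the file names are placeholders and the tolerance is just an example value):

```python
import xarray as xr
from xarray.testing import assert_allclose

ref = xr.open_dataset("reference.nc")    # hypothetical reference output
test = xr.open_dataset("test_run.nc")    # hypothetical new run output

# Raises an AssertionError if any variable differs beyond the tolerance.
assert_allclose(ref, test, rtol=1e-12)
```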
Also very late with my response here, sorry for that! If one wants to test for bit-identical simulations, I think the available checksum functionality in BLOM should work well. Each BLOM simulation actually dumps "chksum: dp: 0x ..." to stdout at the end of the simulation, which I have found very reliable in detecting simulation differences. This could be extended, e.g. by adding a checksum for a sensitive iHAMOCC field. There is of course value in actually checking the output as well, since the generation of output can also be erroneous.

For detecting simulation differences within an acceptable tolerance, I promised to implement some energy diagnostics in BLOM that would dump, say, global kinetic and potential energy sums to stdout. For "simple" metrics like that, I believe this approach would be easier to integrate into a CI framework than relying on external tools to obtain the metrics. More sophisticated metrics might well be more convenient to develop in something other than Fortran. Unfortunately, I have had no time to implement these energy metrics yet, but hopefully I can make a stab at it very soon.
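In a CI script, that stdout line could be picked up with something like the sketch below — only the "chksum: dp: 0x" prefix comes from the comment above; the log file names and the exact hex format are assumptions:

```python
import re

def read_checksum(log_path):
    """Extract the 'chksum: dp: 0x...' value that BLOM prints at the end of a run."""
    with open(log_path) as f:
        for line in f:
            match = re.search(r"chksum:\s*dp:\s*(0x\s*[0-9a-fA-F]+)", line)
            if match:
                return match.group(1)
    raise RuntimeError(f"no checksum line found in {log_path}")

# Hypothetical log names, for illustration only:
assert read_checksum("blom_run.log") == read_checksum("reference.log")
```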
I think both of these tracks should be followed. Bit-identical checksums are excellent for CI, where we just need to verify that changes do not affect the output of the simulation. However, for day-to-day development it would be better to have tolerance-based tests, so that one can better gauge the effect of changes while developing. Tolerance-based testing is also essential for moving to GPUs, where bit-identical results will be difficult (maybe even impossible) to achieve, and there it would be good to be able to measure the difference in accuracy.
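One simple way to quantify that difference is a maximum-relative-error metric — a minimal sketch, with the field names invented for illustration:

```python
import numpy as np

def max_relative_error(ref, test, eps=1e-30):
    """Largest relative deviation between a reference field and a test field;
    eps guards against division by zero where the reference is zero."""
    return np.max(np.abs(test - ref) / np.maximum(np.abs(ref), eps))

# E.g. for a hypothetical field from a CPU and a GPU run:
# err = max_relative_error(ssh_cpu, ssh_gpu)
# assert err < 1e-10
```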
It would be nice to set up some basic testing for BLOM, both for continuous integration (to ensure that the output of the different compilers actually works) and for assurance when updating the code that nothing is broken. With a good test setup it would also be easier to make performance improvements, since there would be a way to ensure that the new code is up to specification.
Meson has built-in support for unit testing, which we could leverage; the structure of these unit tests is, however, quite free. Ideally, the executable used for a test should be self-contained (meaning it tests a few relevant factors), deterministic (the data it is tested against should not be affected by changes to the code), and quick to run.
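As a concrete shape for the simplest such test, a Meson test() target could invoke a small Python driver along the lines of the sketch below — the binary name and argument handling are assumptions for illustration, not an existing BLOM script:

```python
#!/usr/bin/env python3
"""Minimal test driver a Meson test() target could run:
execute the model binary and fail on a non-zero exit status."""
import subprocess
import sys

def main(argv):
    # The binary path would be passed as the first argument;
    # "./blom" is only a hypothetical fallback for manual use.
    binary = argv[1] if len(argv) > 1 else "./blom"
    result = subprocess.run([binary], capture_output=True, text=True)
    sys.stdout.write(result.stdout)
    sys.stderr.write(result.stderr)
    return result.returncode

if __name__ == "__main__":
    sys.exit(main(sys.argv))
```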
Several avenues are open to us when implementing this: …
I would be available to help with this, especially the Meson and CI integration, but I would require some help defining test cases and figuring out how to implement it in BLOM.