
0.7.0 Release Prep #408

Merged: 18 commits into grinsfem:master from pbauman/0.7.0-prep on Jun 3, 2016

Conversation

@pbauman (Member) commented May 26, 2016

Updated CHANGES, added @nicholasmalaya to AUTHORS (thanks!), fixed one test that didn't run with mpiexec -np 8 (or any -np except 1), distcheck fixes, updated so we don't distribute the license tool (that way it's only used in a repo clone, not a tarball), and updated LICENSE date to 2016.

@pbauman (Member, Author) commented May 26, 2016

Barring any comments, I'll merge this COB tomorrow and do the release.

@pbauman (Member, Author) commented May 29, 2016

@roystgnr says he's getting a compilation failure, but I can't reproduce with GCC 4.8.5, GCC 4.9.3, or GCC 5.3. Going to hold off on tagging this until we can sort that out.

@roystgnr (Member) commented:

Working on a fix in https://github.com/roystgnr/grins/tree/fix_metaprogramming - I'll try to have it done late tonight.

This test is effectively identical to its parsed_qoi.sh cousin,
it just has a few extra qois. So, use the same solver options so
we can successfully run the test in parallel.
Only run if it's a cloned repo. This way, we get rid of those
annoying warnings that come out in a build from a tarball.
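
(A sketch for context: one common shape for "only run if it's a cloned repo" is a .git directory check. This is not the actual change in the PR; the tool path and variable below are hypothetical.)

#!/bin/sh
# Sketch only: run the license-header tool when building from a git clone;
# a tarball build has no .git directory, so the tool and its warnings are skipped.
top_srcdir=${top_srcdir:-.}
if test -d "$top_srcdir/.git"; then
    "$top_srcdir/license_tool.sh"   # hypothetical tool path
fi
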
@pbauman (Member, Author) commented May 31, 2016

Rebased on #411.

Going to clean up some new warnings that showed up with GCC 6.1 and then I think this should be good.

@roystgnr (Member) commented May 31, 2016

After the fix, everything passes for me with -np 1. With -np 4 on GRINS master, I get:

FAIL: exact_soln/poisson_periodic_2d_x.sh
FAIL: exact_soln/poisson_periodic_2d_y.sh
FAIL: exact_soln/poisson_periodic_3d_xz.sh
FAIL: exact_soln/poisson_periodic_3d_yz.sh
FAIL: exact_soln/poisson_periodic_3d_xy.sh
FAIL: regression/penalty_poiseuille_stab.sh

In that last regression, the linear solvers are failing to converge after 2500 steps, which leads to nonlinear solver convergence failure too.

In the former cases it looks like I'm getting segfaults. I'll see if I can hunt down where.

@pbauman (Member, Author) commented May 31, 2016

Crap, thanks for pointing these out. I'll take care of the "after the fix" cases.

Although the current use cases will never be the SCALAR type, there
may be needs in the future. This fixes warnings emitted by GCC 6.1.0.
Now we run the tests again if Antioch is enabled.
@pbauman (Member, Author) commented May 31, 2016

OK, the test/run_tests_parallel_loop.sh script is almost through -np 12 and here's what I've got:

[12:36:25][pbauman@fry:/fry1/data/users/pbauman/research/grins/build/0.7.0-prep/devel/test]$ grep -e LIBMESH_RUN -e FAIL parallel_loop_check_tests.log | grep -v XFAIL
LIBMESH_RUN is mpiexec -np 1
# FAIL:  0
LIBMESH_RUN is mpiexec -np 2
# FAIL:  0
LIBMESH_RUN is mpiexec -np 3
# FAIL:  0
LIBMESH_RUN is mpiexec -np 4
FAIL: regression/penalty_poiseuille_stab.sh
# FAIL:  1
LIBMESH_RUN is mpiexec -np 5
FAIL: regression/reacting_low_mach_antioch_kinetics_theory.sh
# FAIL:  1
LIBMESH_RUN is mpiexec -np 6
# FAIL:  0
LIBMESH_RUN is mpiexec -np 7
# FAIL:  0
LIBMESH_RUN is mpiexec -np 8
# FAIL:  0
LIBMESH_RUN is mpiexec -np 9
# FAIL:  0
LIBMESH_RUN is mpiexec -np 10
# FAIL:  0
LIBMESH_RUN is mpiexec -np 11
# FAIL:  0
LIBMESH_RUN is mpiexec -np 12

I don't know how regressions got in for those two cases. I'll fix them. I'm not seeing the Poisson-periodic failures. I'm running with SerialMesh; were your failures with ParallelMesh, @roystgnr?
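
(For readers following along: the contents of test/run_tests_parallel_loop.sh aren't shown in this thread, but a minimal driver along these lines would be enough to produce the log above and to be grepped as shown. The structure below is an assumption, not the real script.)

#!/bin/bash
# Sketch only: re-run the test suite once per MPI rank count from 1 up to $1,
# printing which LIBMESH_RUN setting each pass used so the log can be grepped.
max_np=${1:-12}
for np in $(seq 1 "$max_np"); do
    echo "LIBMESH_RUN is mpiexec -np $np"
    LIBMESH_RUN="mpiexec -np $np" make check || true
done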

@roystgnr (Member) commented:

They are indeed with ParallelMesh. So if you'll worry about the solvers I'll worry about the segfaults.

@pbauman (Member, Author) commented May 31, 2016

They are indeed with ParallelMesh. So if you'll worry about the solvers I'll worry about the segfaults.

👍

@pbauman (Member, Author) commented May 31, 2016

Just for completeness, here's the final result of ./run_tests_parallel_loop.sh 16

LIBMESH_RUN is mpiexec -np 1
# FAIL:  0
LIBMESH_RUN is mpiexec -np 2
# FAIL:  0
LIBMESH_RUN is mpiexec -np 3
# FAIL:  0
LIBMESH_RUN is mpiexec -np 4
FAIL: regression/penalty_poiseuille_stab.sh
# FAIL:  1
LIBMESH_RUN is mpiexec -np 5
FAIL: regression/reacting_low_mach_antioch_kinetics_theory.sh
# FAIL:  1
LIBMESH_RUN is mpiexec -np 6
# FAIL:  0
LIBMESH_RUN is mpiexec -np 7
# FAIL:  0
LIBMESH_RUN is mpiexec -np 8
# FAIL:  0
LIBMESH_RUN is mpiexec -np 9
# FAIL:  0
LIBMESH_RUN is mpiexec -np 10
# FAIL:  0
LIBMESH_RUN is mpiexec -np 11
# FAIL:  0
LIBMESH_RUN is mpiexec -np 12
FAIL: regression/reacting_low_mach_antioch_kinetics_theory.sh
# FAIL:  1
LIBMESH_RUN is mpiexec -np 13
# FAIL:  0
LIBMESH_RUN is mpiexec -np 14
# FAIL:  0
LIBMESH_RUN is mpiexec -np 15
# FAIL:  0
LIBMESH_RUN is mpiexec -np 16
FAIL: regression/reacting_low_mach_antioch_statmech_constant_prandtl.sh
# FAIL:  1

Going to focus first on PETSc versions to see if that's the issue.

@pbauman (Member, Author) commented May 31, 2016

Good lord. I'm focusing on regression/penalty_poiseuille_stab.sh, and going back to the revision that we both ran successfully before, it now fails with mpiexec -np 4. I trawled through the issues and couldn't find what PETSc version we were using, so I'll run through various PETSc versions to see if I can get that commit to pass again.

@roystgnr (Member) commented:

So the other regressions are clearly a libMesh failure, and not an easy one to fix. Don't let them hold up 0.7.0. If you want to hard-code mesh type == serial into the periodic BC tests that might be a good stopgap.

@pbauman (Member, Author) commented May 31, 2016

So the other regressions are clearly a libMesh failure, and not an easy one to fix. Don't let them hold up 0.7.0. If you want to hard-code mesh type == serial into the periodic BC tests that might be a good stopgap.

Glad I added those tests then! I'll hard-code the tests as you suggest, and I'll add a check in the periodic boundary condition factory that throws an error referencing the libMesh issue (which we can remove once it's sorted).

Then, once I sort out these other solver-related regressions, I'll update this ticket.

@pbauman (Member, Author) commented May 31, 2016

Update: I thought our original test updating came with the switch from PETSc 3.5.x to PETSc 3.6.y, but apparently not for regression/penalty_poiseuille_stab.sh. Going back to d4a432e (which is after the "robustness tweaks" and before the shuffling I did downstream, which shouldn't have affected the convergence behavior of the test), I get a pass with mpiexec -np 4 using PETSc 3.5.4 and a failure using PETSc 3.6.0, all other things unchanged.

I'll work on updating options for this test and see if the other failures behave similarly.

@pbauman (Member, Author) commented May 31, 2016

Update: Going back to the tip of this branch (pbauman/0.7.0-prep) and using PETSc 3.5.4, all the previously failing processor/test combinations pass.

This gets me consistency across PETSc 3.5.4, 3.6.0, and 3.6.4, both
debug and opt builds, for LIBMESH_RUN="mpiexec -np 1" through
LIBMESH_RUN="mpiexec -np 16". Previously, everything was fine with
3.5.4, but moving to PETSc 3.6 something changed (beyond the default
pc_factor_shift_type; I haven't been able to ascertain what).
Hopefully, these options will be more consistent.
This way we can restrict the TESTS that it runs when going through
all the processor variations.
@pbauman (Member, Author) commented Jun 2, 2016

OK, in theory, we should be good again. I've run this through PETSc 3.5.4, 3.6.4, and 3.7.1 with GCC 4.8.5 and GCC 6.1.0, using the parallel loop test up to 16 cores. I had to adjust a few tests along the way. I'm resisting the urge to go back and change all the tests that use (implicitly or explicitly) -sub_pc_type ilu to -sub_pc_type lu. If this becomes a problem again, I'll probably do that to try to get more stability out of the tests against compiler/PETSc changes. I did go ahead and update the parallel loop test script to take the TESTS="..." argument in 59c1876.
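
(For reference, that kind of change amounts to something like the following. The executable name and input file are assumptions; -pc_type asm and -sub_pc_type lu are standard PETSc command-line options, and the TESTS usage reflects the interface described above.)

# Sketch only: force LU subdomain solves (instead of ILU) for one suspect test;
# libMesh forwards the command-line options through to PETSc.
mpiexec -np 4 ./grins penalty_poiseuille_stab.in \
    -pc_type asm -sub_pc_type lu

# Re-run just that test across all rank counts with the updated loop script.
TESTS="regression/penalty_poiseuille_stab.sh" ./run_tests_parallel_loop.sh 16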

Two things left to do: tidy up some warnings when Antioch and Cantera are off, and "serialize" the periodic tests / throw an error for ParallelMesh in the periodic boundary condition factory.

@pbauman (Member, Author) commented Jun 2, 2016

OK, I think we're ready to go. I snuck in adding accessors for the Neumann BCs that I'd forgotten to put in the boundary conditions PR.

We're almost to the point where we could add -Werror for builds with Antioch enabled and Cantera disabled. There are just a couple of warnings to clear up in ParsedFunction and ParsedFEMFunction on the libMesh side.

@roystgnr (Member) commented Jun 2, 2016

🏆

@pbauman (Member, Author) commented Jun 2, 2016

Have to run now, but will merge, tag, and release tonight.

@pbauman (Member, Author) commented Jun 2, 2016

(not merging now in case anyone wants to look a little closer before tonight)

@pbauman merged commit 7de3104 into grinsfem:master on Jun 3, 2016
@pbauman deleted the 0.7.0-prep branch on June 3, 2016 at 00:56