
BAOA integrator for extended Lagrangian dynamics #507

Merged: 12 commits into master from BAOA, Sep 15, 2023
Conversation

jhenin
Member

jhenin commented Dec 21, 2022

Much more stable at high friction and long time steps. The previous integrator was primitive and only did the job well at low friction.
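For readers unfamiliar with the scheme, one BAOA step can be sketched as follows. This is an illustrative Python transcription of the standard B (kick) / A (drift) / O (Ornstein-Uhlenbeck) splitting, not the Colvars C++ implementation; the function names and the harmonic check are mine:

```python
import math
import numpy as np

def baoa_step(x, v, force, dt, gamma, kT, mass, rng):
    """One BAOA step: full kick (B), half drift (A), exact OU update (O), half drift (A).

    The full kick merges the two consecutive half-kicks of BAOAB, so the
    generated position trajectory is the same as BAOAB's.
    """
    v += dt * force(x) / mass          # B: full-step kick
    x += 0.5 * dt * v                  # A: half-step drift
    c = math.exp(-gamma * dt)          # O: exact Ornstein-Uhlenbeck update
    v = c * v + math.sqrt((1.0 - c * c) * kT / mass) * rng.standard_normal()
    x += 0.5 * dt * v                  # A: half-step drift
    return x, v

# Illustrative check on a harmonic potential U(x) = 0.5 * k * x^2 with k = 1:
# the sampled configurational variance should approach kT / k.
rng = np.random.default_rng(0)
x, v = 0.0, 0.0
samples = []
for i in range(210_000):
    x, v = baoa_step(x, v, lambda y: -y, 0.1, 1.0, 1.0, 1.0, rng)
    if i >= 10_000:                    # discard burn-in
        samples.append(x)
print(float(np.var(samples)))
```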

@giacomofiorin
Member

I didn't know about this new stochastic integrator, I'll check it out.

@finkeljos: may be interesting to you as well.

The equivalence of these schemes is a recurring notion it seems: https://doi.org/10.1063/5.0066008

What we found earlier was that BAOAB and GJF have identical convergence of the canonical sampling, i.e. both are better than older integrators such as BBK. But BAOAB overestimates diffusivity in the high-friction limit to an extent that is time-step-dependent. https://doi.org/10.1080/00268976.2019.1649493

@jhenin Is the scheme from the JCTC paper that you just implemented consistent with the assigned friction coefficient? If not, it may be a good idea to either put GJF in there, or (even easier) add a note in the comment. Diffusivity wouldn't matter as much since you have a harmonic potential, but if someone changes that code at least they would know.

@jhenin
Member Author

jhenin commented Jan 18, 2023

@giacomofiorin Yes, the friction is correct, see section 5 here https://pubs.acs.org/doi/10.1021/ct3000876

@finkeljos

Hi Giacomo! Sorry for the delay. I must admit I'm a bit confused about why one would consider using BAOA: there is no time symmetry, and you don't end up with Verlet in the $\gamma \to 0$ limit. I'm not sure about the large-$\gamma$ properties here, but the large-$\gamma$ behavior of GJF and the other GJs is quite good. I guess it could be that BAOA is exactly BAOAB except that you combine the two B's. So perhaps it's really the same thing.
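The merged-kick relation described here is easy to check numerically. The sketch below (illustrative Python, not code from this PR) feeds the same noise sequence to BAOAB and BAOA and verifies that the position trajectories coincide exactly, with BAOA's velocity offset by a half-kick:

```python
import math
import numpy as np

def ou(v, dt, gamma, kT, mass, xi):
    """Exact Ornstein-Uhlenbeck update with a supplied Gaussian variate xi."""
    c = math.exp(-gamma * dt)
    return c * v + math.sqrt((1.0 - c * c) * kT / mass) * xi

def baoab_step(x, v, force, dt, gamma, kT, mass, xi):
    v += 0.5 * dt * force(x) / mass      # B: half kick
    x += 0.5 * dt * v                    # A
    v = ou(v, dt, gamma, kT, mass, xi)   # O
    x += 0.5 * dt * v                    # A
    v += 0.5 * dt * force(x) / mass      # B: half kick
    return x, v

def baoa_step(x, v, force, dt, gamma, kT, mass, xi):
    v += dt * force(x) / mass            # B: the two BAOAB half-kicks merged
    x += 0.5 * dt * v                    # A
    v = ou(v, dt, gamma, kT, mass, xi)   # O
    x += 0.5 * dt * v                    # A
    return x, v

# Same noise sequence for both schemes, harmonic force f(x) = -x:
force = lambda y: -y
dt, gamma, kT, mass = 0.1, 2.0, 1.0, 1.0
noise = np.random.default_rng(1).standard_normal(1000)
xb, vb = 1.0, 0.5
# BAOA's velocity lags BAOAB's by one half-kick:
xa, va = 1.0, 0.5 - 0.5 * dt * force(1.0) / mass
for xi in noise:
    xb, vb = baoab_step(xb, vb, force, dt, gamma, kT, mass, xi)
    xa, va = baoa_step(xa, va, force, dt, gamma, kT, mass, xi)
print(abs(xb - xa))  # positions agree up to rounding
```

So the two schemes sample the same configurations; they differ only in how (and when) velocities are reported.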

@jhenin
Member Author

jhenin commented Jan 18, 2023

@finkeljos I chose GSD/BAOA based on results from this paper https://arxiv.org/abs/2204.02105 but I am not strongly opinionated about any of these, and frankly I think this is not a very critical application of Langevin integrators - I just needed something better than the half-baked algorithm I cobbled together many years ago.

That said, right now my problem is a NAMD regtest failure for a case that passes easily on my machine, and the reason for the divergence is unclear.

@giacomofiorin
Member

Thanks @jhenin. I don't see numerical tests of the dynamics generated by this scheme that specifically use the Einstein relation. Moreover, the Berendsen paper says "In all simulations, a time step of 2 fs was used." For MARTINI, that's way shorter than what people normally use. They also have results for larger time steps, but there the friction coefficient is rather low.

You really start getting deviations when $\gamma \delta t$ starts becoming close to 1, i.e. as you approach the overdamped limit. You may not be working in the overdamped regime in most cases, but occasionally one would get there and become confused by the discrepancy (diffusion coefficient starts to change with the time step). See the fig below from our Mol Phys paper with Josh

[Figure from the Mol Phys paper: diffusion coefficient as a function of the time step]

I agree that with a hard-coded harmonic potential the diffusivity is not an issue, which is why I suggested just adding a note in the code comments for future reference, saying that when diffusion matters a time-consistent scheme like GJF should be used.

@jhenin
Member Author

jhenin commented Jan 18, 2023

As a matter of fact, with the newer applications we are removing the harmonic restraint and going to the overdamped limit, so this conversation is quite relevant. I think the current application is not as sensitive as running a full simulation using this integrator, but still more sensitive than the eABF case. I'm happy to use the G-JF integrator if it behaves well for the underdamped case too.
Edit: I ascribed the unexpected change in diffusivity of BAOAB to the momentum imprecision documented by Kieninger and Keller, so I assume that the BAOA/GSD scheme, being more accurate on short-term kinetics, would also have correct diffusion. There is no explicit data to back that up, though.

@giacomofiorin
Member

giacomofiorin commented Jan 19, 2023

Basically, most (if not all) "modern" Langevin schemes will reproduce the configurational distributions equally well for relatively large time steps, as long as they are below the stability limit. They will then suddenly lead to unstable dynamics once you exceed that limit, which is the same for all schemes. Older methods like BBK, however, will give you biased distributions well before you hit that limit.

Specifically at the high friction limit, you would expect the friction coefficient you impose to determine the diffusivity in full. However, BAOAB will overestimate diffusivity to an extent that is time-step dependent. G-JF performs equally well at all friction values and time-steps (again, as long as the stability limit is respected).

To be clear, the two methods and many others are thermodynamically equivalent, and they only differ in kinetic properties. This was the conclusion of our papers that I linked above. https://doi.org/10.1080/00268976.2019.1649493 https://doi.org/10.1063/5.0066008

Some additional fine details. Also at very high friction, the G-JF scheme as defined in the 2013 paper has a subtler issue, i.e. it underestimates velocities when these are computed as in a deterministic Verlet scheme. This paper introduced a better definition of the discrete velocity that is shifted by a half-step, so that the bias introduced by the discontinuous random forces is compensated for. The two variations of G-JF are identical for positions, the only difference is that the latter also reports velocities consistent with the temperature, and you could just call this one "the" G-JF method. (Maybe this is related to the "momentum imprecision" from the KK paper?) To my knowledge, G-JF is only available in LAMMPS. Niels has mostly been operating from the assumption that developers of MD codes will independently review the literature and implement the best scheme for their applications.

In summary, if you are going to be computing diffusion you may want to use G-JF or any other scheme that is equivalent to it within the same time units, not within some rescaling factor. If you are only looking at thermodynamics, you may be already okay but do consider @finkeljos's comment about time reversibility.
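For reference, one step of the 2013 G-JF scheme can be sketched as below. This is an illustrative Python transcription of the published update equations, not code from this PR; the function names and the harmonic check are mine:

```python
import math
import numpy as np

def gjf_step(x, v, f, force, dt, gamma, kT, mass, rng):
    """One step of the 2013 G-JF scheme (Gronbech-Jensen & Farago).

    f is the force at the current position; the updated force is returned
    so that only one force evaluation is needed per step.
    """
    b = 1.0 / (1.0 + 0.5 * gamma * dt)
    a = (1.0 - 0.5 * gamma * dt) * b
    # Integrated random impulse over the step, variance 2 * gamma * m * kT * dt:
    beta = math.sqrt(2.0 * gamma * mass * kT * dt) * rng.standard_normal()
    x_new = x + b * dt * v + 0.5 * b * dt * dt * f / mass + 0.5 * b * dt * beta / mass
    f_new = force(x_new)
    v_new = a * v + 0.5 * dt * (a * f + f_new) / mass + b * beta / mass
    return x_new, v_new, f_new

# Illustrative check: for a harmonic potential with k = 1, the configurational
# variance should be kT / k at any stable time step.
rng = np.random.default_rng(2)
force = lambda y: -y
x, v = 0.0, 0.0
f = force(x)
samples = []
for i in range(210_000):
    x, v, f = gjf_step(x, v, f, force, 0.2, 1.0, 1.0, 1.0, rng)
    if i >= 10_000:                    # discard burn-in
        samples.append(x)
print(float(np.var(samples)))
```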

@finkeljos

finkeljos commented Jan 20, 2023

@giacomofiorin I actually think that's why our JCP paper was really nice. If you are going to use an ABO splitting to do Langevin dynamics (so that you reduce to Verlet), we write down how you need to tune everything in order to get the right kinetics and thermodynamics. So you might as well use a GJ so you are always covered and don't need to care about which regime you are in.

@jhenin if you plan to be in the overdamped regime, I think it would probably be best to stay away from the ABO schemes. In the JCP paper Giacomo sent, https://aip.scitation.org/doi/10.1063/5.0066008, along with Niels’ 2019 paper, these show that

  1. If you require a time symmetric Langevin splitting method that only calculates one force evaluation per step and reduces to Verlet
  2. And you want correct thermodynamics (in the linear case) for any chosen friction value,
  3. And, you require exactly correct transport/diffusion in all regimes,

You must use GJF or one of the related GJ methods. In addition, you can write GJF and the related methods as a split-operator method, and we show exactly how you need to “tune” the ABO methods to get both correct kinetics and thermodynamics. I also think the splitting approach is pretty useful for coding things up in already established codes.

@jhenin
Member Author

jhenin commented Jan 22, 2023

Thank you @finkeljos. GJ-III seemed the most tempting for its simplicity - until I realized it's unstable in the high-damping regime. I'm having a hard time deciding between the others - in part because it may not matter that much for my purposes.
For G-JF the LAMMPS implementation provides a nice working example.

@jhenin
Member Author

jhenin commented Jan 22, 2023

Going back to the essentials, here I care about:

  1. correct configurational statistics
  2. stability in the overdamped regime
  3. saving my own time

Since I have 1 and 2 with what is currently implemented, I would keep it in the interest of 3. Now @giacomofiorin if you feel GJF would be useful in Colvars for other use cases that are not known to me at this point, and since you are an expert already, maybe you'd be in a better position to take a stab at implementing it?

@finkeljos

@jhenin you are welcome! Actually, GJ-III is not unstable in the high-damping regime. What happens is that time becomes scaled by a function of $\gamma \delta t$. One really needs to consider the scaled time step and not $\delta t$. GJF is the one method where this time-scaling factor is exactly 1. From a configurational perspective, though, this is really the only difference between all the GJ methods. So I would say you may as well stick with GJF if that's already coded up and you only care about configurational statistics.

@giacomofiorin
Member

giacomofiorin commented Jan 31, 2023

Hi, I can look again at this in a few days. But very quickly, as @finkeljos pointed out already the schemes discussed here are all effective for straight sampling of the Boltzmann distribution. It's only if you care also about diffusivity and other time-dependent properties that not all schemes are equal. At the moment, with a confining harmonic potential, this point is moot. I would only consider revisiting the issue when you remove the harmonic potential.

@jhenin
Member Author

jhenin commented Jan 31, 2023

To be precise, I'm removing the harmonic potential now, but I don't care about predicting diffusive properties with any precision. I need diffusion to happen and the overdamped regime to reliably exist - that is the rather low bar I have to meet. The current integration scheme of the master branch does not meet even those requirements.
Edit: in particular, when using reflecting boundary conditions and somewhat stiff underlying potentials, stability becomes an issue.

jhenin force-pushed the BAOA branch 2 times, most recently from 927a79e to 5a94575 on April 5, 2023
@jhenin
Member Author

jhenin commented Apr 5, 2023

Found issues in the implementation, in the process of running more tests...

@jhenin
Member Author

jhenin commented Apr 21, 2023

After the fix, the regtests pass on my machine, but one (distance-wall-bypassExtended-off) fails by a narrow margin on the CI system.

@giacomofiorin
Member

@jhenin If you re-generate the reference output using the container, you could avoid making an exception; an exception would also be harder to maintain across multiple engines.

jhenin force-pushed the BAOA branch 3 times, most recently from 1c0d89e to 990b1aa on July 26, 2023
@jhenin
Member Author

jhenin commented Aug 29, 2023

@giacomofiorin OK to merge this?

@giacomofiorin
Member

@jhenin Sorry I missed the review request.

The changes look good, but I've been pondering the regtest update. Since we never check the standard output files (e.g. test.colvars.out), how would you feel if we finally stopped tracking them? Differences in those files are an example of warnings that are routinely ignored because they are harmless; in that case, maybe we just shouldn't print them.

@jhenin
Member Author

jhenin commented Sep 6, 2023

@giacomofiorin Yes, I was thinking the same about the standard output log, although I just wanted to remove the warnings about them. I think tracking them can still be useful to diagnose some failed tests, by manual comparison.

@giacomofiorin
Member

> @giacomofiorin Yes, I was thinking the same about the standard output log, although I just wanted to remove the warnings about them. I think tracking them can still be useful to diagnose some failed tests, by manual comparison.

Yes, that would be useful. But in that case how about just replacing here the reference outputs of only the tests that are affected by this PR?

@jhenin
Member Author

jhenin commented Sep 7, 2023

I agree, I'll do that. Your last push was just a rebase, right?

@jhenin
Member Author

jhenin commented Sep 7, 2023

A side question is when we will switch to NAMD3 as the platform for generating reference test output. I'll still use 2.14 here; the CPU code paths have not diverged yet anyway.

@jhenin
Member Author

jhenin commented Sep 7, 2023

Reference test results are now minimally different from those on the master branch. Not updating the LAMMPS tests for now, pending a fix. The Gromacs tests pass, rather mysteriously.

@giacomofiorin
Member

@jhenin Rebased this to master to include the LAMMPS changes.

@jhenin
Member Author

jhenin commented Sep 8, 2023

The LAMMPS tests pass when built without MPI, however the MPI tests say this:

application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1
:
system msg for write_line failure : Bad file descriptor
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1
:
system msg for write_line failure : Bad file descriptor
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1
:
system msg for write_line failure : Bad file descriptor
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[55574,1],1]
  Exit code:    1
--------------------------------------------------------------------------

@giacomofiorin
Member

> The LAMMPS tests pass when built without MPI, however the MPI tests say this:

...

Hi, is this reproducible? If so, what are the OS and MPI versions? I have tried using the CentOS 7 and CentOS 9 containers (run on Fedora and on CentOS 7 physical machines).

I didn't see this error specifically, but I did see occasional warnings from OpenMPI 4 about task placement with respect to the NICs, and/or crashes. It may be that we need to tweak how mpirun is launched, since we're not using it on a properly configured cluster node.

@giacomofiorin
Member

FYI I just noticed that in my latest container update I have forgotten to include lmod, which is used by the script to load OpenMPI. Not sure whether it led you to build a mixed MPI/serial binary but it's worth a check.

@jhenin
Member Author

jhenin commented Sep 10, 2023

> Hi, is this reproducible? If so, what are the OS and MPI version?

OS: Linux "Rocky Linux 8.8 (Green Obsidian)" 4.18.0-425.13.1.el8_7.x86_64 x86_64
Compiler: GNU C++ 8.5.0 20210514 (Red Hat 8.5.0-18) with OpenMP 4.5
C++ standard: C++11
MPI v3.1: MPICH Version: 3.4.2
MPICH Release date: Wed May 26 15:51:40 CDT 2021
MPICH ABI: 13:11:1

I'm trying to use OpenMPI instead, but for some reason I can't set it up properly on that OS. That said, I don't think this is blocking for this particular PR.

@giacomofiorin
Member

Rebased onto master and resolved a merge conflict. OK to merge if tests pass.

@jhenin jhenin merged commit 51bebbf into master Sep 15, 2023
31 checks passed
@jhenin jhenin deleted the BAOA branch September 15, 2023 07:32
@giacomofiorin
Member

@jhenin FYI I fixed the OpenMPI 4 warnings with a commit currently in master (didn't open a PR because it was just environment variables for the testing scripts)

giacomofiorin mentioned this pull request on Aug 5, 2024