---
jupyter:
  jupytext:
    formats: ipynb,md
    main_language: python
    text_representation:
      extension: .md
      format_name: markdown
      format_version: '1.3'
      jupytext_version: 1.16.4
  kernelspec:
    display_name: Python 3
    name: python3
---

Introduction to Advanced Sampling

Ludwig Schneider and Juan de Pablo

Pritzker School of Molecular Engineering

The University of Chicago

Berlin, July 28th, 2022

Setting up the environment

First, we set up our environment. We use an already compiled and packaged installation of HOOMD-blue and the DLExt plugin, copy it from Google Drive, and install PySAGES for it.

BASE_URL="https://drive.google.com/u/0/uc?id=1hsKkKtdxZTVfHKgqVF6qV2e-4SShmhr7&export=download"
wget -q --load-cookies /tmp/cookies.txt "$BASE_URL&confirm=$(wget -q --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate $BASE_URL -O- | sed -rn 's/.*confirm=(\w+).*/\1\n/p')" -O pysages-env.zip
rm -rf /tmp/cookies.txt
%env PYSAGES_ENV=/env/pysages
mkdir -p $PYSAGES_ENV .
unzip -qquo pysages-env.zip -d $PYSAGES_ENV
import os
import sys

ver = sys.version_info
sys.path.append(os.environ["PYSAGES_ENV"] + "/lib/python" + str(ver.major) + "." + str(ver.minor) + "/site-packages/")

os.environ["XLA_FLAGS"] = "--xla_gpu_strict_conv_algorithm_picker=false"
os.environ["LD_LIBRARY_PATH"] = "/usr/lib/x86_64-linux-gnu:" + os.environ["LD_LIBRARY_PATH"]

PySAGES

The next step is to install PySAGES. First, we install the jaxlib version that matches the CUDA installation of this Colab setup. See the JAX documentation for more details.

pip install -q --upgrade pip
# Installs the wheel compatible with CUDA
pip install -q --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html &> /dev/null

Now we can finally install PySAGES. We clone the newest version from GitHub and build the remaining pure Python dependencies and PySAGES itself.

rm -rf PySAGES
git clone https://github.com/SSAGESLabs/PySAGES.git &> /dev/null
cd PySAGES
pip install -q . &> /dev/null

Advanced Sampling Methods

mkdir /content/advanced_sampling
cd /content/advanced_sampling

Many systems have rugged free-energy landscapes, where different basins are divided by energy barriers.

Examples

Collective molecular transitions

  • Protein folding
  • Protein-ligand binding
  • Lipid membrane fusion
Smirnova, Yuliya G., et al. "Solvent-exposed tails as prestalk transition states for membrane fusion at low hydration." Journal of the American Chemical Society 132.19 (2010): 6710-6718

Phase transitions

  • Liquid-vapor transition
  • Crystallization
  • Shape transitions
  • Magnetization (Ising model)


Wikipedia: https://en.wikipedia.org/wiki/Ising_model

Collective Variables

Also known as order parameters in other contexts.

Usually, such transitions can be described in terms of order parameters/collective variables (CV).

$\Rightarrow$ Reduction of the phase space: $\hat\xi(\vec{r}, \vec{p})$. In general, a collective variable can be multi-dimensional. [Note: the hat "$\hat{\ }$" indicates a collective variable that explicitly depends on the underlying molecular configuration. Without the hat, the symbol usually denotes a scalar value, like a target.]

Examples:

Example system: particle in multi-well

Let's start exploring an easy system and find the free-energy landscape.

  • a priori $A(\xi)$ is unknown
  • but we can measure the density of states $p(\xi)$

We can recover the free-energy profile from $p(\xi)$

$$A = - \frac{1}{\beta}\ln(Z)$$

  • $A$: free-energy
  • $Z$: NVT partition function

We can formulate the partial partition function for a collective variable $\xi$

$$Z(\xi) = \int\text{d}^N\vec{r}_i\,\text{d}^N\vec{p}_i\, \delta\bigl(\hat\xi(\vec{r}, \vec{p}) - \xi\bigr)\, e^{-\beta H(\vec{r}_i, \vec{p}_i)}$$ Normalization: $$p(\xi) = \frac{\int\text{d}^N\vec{r}_i\,\text{d}^N\vec{p}_i\, \delta\bigl(\hat\xi(\vec{r}, \vec{p}) - \xi\bigr)\, e^{-\beta H(\vec{r}_i, \vec{p}_i)}}{\int\text{d}^N\vec{r}_i\,\text{d}^N\vec{p}_i\, e^{-\beta H(\vec{r}_i, \vec{p}_i)}}$$

$$\Rightarrow A(\xi) = -\frac{1}{\beta} \ln(p(\xi)) + C$$ We can determine the free-energy profile $A(\xi)$ from the probability distribution $p(\xi)$.
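As a concrete illustration of this relation, here is a minimal sketch (not part of the tutorial workflow; the helper name is made up) of how sampled CV values could be turned into a free-energy estimate:

```python
import numpy as np

def free_energy_from_samples(xi_samples, bins=50, kBT=1.0):
    # Histogram the sampled collective variable to estimate p(xi)
    p, edges = np.histogram(xi_samples, bins=bins, density=True)
    centers = edges[:-1] + np.diff(edges)/2
    mask = p > 0                        # avoid log(0) in empty bins
    # A(xi) = -kBT ln p(xi) + C; fix the arbitrary constant via the minimum
    A = -kBT*np.log(p[mask])
    return centers[mask], A - A.min()
```

This only works if the sampling actually visits all relevant values of $\xi$, which is exactly the problem discussed next.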

With regular MC/MD a finite simulation trajectory is confined to a single basin as the probability of leaving the basin scales as

$$p \propto e^{-\beta \Delta E} \ll 1$$

  • system might not equilibrate
  • trajectory is not ergodic
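To get a feeling for how strongly barrier crossings are suppressed, here is a quick numerical illustration (the barrier heights are chosen arbitrarily):

```python
import numpy as np

# Escape probability scales as exp(-beta * Delta E); energies in units of kBT
for dE in (5, 10, 20):
    print(f"Delta E = {dE:2d} kBT  ->  p ~ {np.exp(-dE):.1e}")
```

Already at a barrier of $20\,k_BT$ the crossing probability per attempt is of order $10^{-9}$, so an unbiased trajectory of any practical length stays in its basin.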

Computational realization via a toy system

Let's examine such a system via computer simulations.

  • fast to integrate
  • custom bond potential to shape the free-energy landscape

$$E(r) = A\left[r^2 + R\left(1-e^{-r^2}\right)\cos(p\pi r)\right]$$

with amplitude $A$, roughness $R$, and periodicity $p$ (matching the parameters of the function below).

import numpy as np

def potential(x, rmin=0, rmax=100, amplitude=1., roughness=4, periodicity=1):
  # Custom bond potential E(r) = A [r^2 + R (1 - exp(-r^2)) cos(p*pi*r)]
  # and the corresponding force F(r) = -dE/dr.
  # The signature (rmin and rmax are unused here) matches what
  # hoomd.md.bond.table expects from a tabulated bond function.
  energy = x**2
  energy += (1-np.exp(-x**2))*roughness*np.cos(periodicity*x*np.pi)
  energy *= amplitude
  force = 2*x
  force -= np.pi*periodicity*roughness*(1-np.exp(-x**2))*np.sin(periodicity*x*np.pi)
  force += 2*roughness*np.exp(-x**2)*x*np.cos(periodicity*x*np.pi)
  force *= -amplitude
  return energy, force
  • customizable roughness and steepness
  • symmetric around the origin
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.set_xlabel(r"$r$ $[\sigma]$")
ax.set_ylabel(r"$E$ $[k_B T]$")
ax.set_xlim((0,4))

x = np.linspace(0,4,100)
ax.plot(x, potential(x)[0], label="reference")
ax.plot(x, potential(x, roughness=6)[0], label="rougher")
ax.plot(x, potential(x, amplitude=2)[0], label="steeper")
ax.plot(x, potential(x, periodicity=2)[0], label="more minima")

# Uncomment to inspect the forces
# ax.plot(x, potential(x)[1], label="analytic force")
# ax.plot(x[:-1], -np.diff(potential(x)[0])/(x[1]-x[0]), label="numeric force")

ax.legend(loc="best")
fig.show()
  • We can choose and adjust the potential

Here we choose a system with multiple basins separated by energy barriers.

However, the free energy of this system is not simply the bond energy. Because the collective variable is a distance, the system is best described in radial coordinates, which introduces the Jacobian factor $2\pi r^2$ into the statistical weight.

$$p(r) \propto 2\pi r^2 e^{-\beta E(r)}$$

Hence we can obtain the free energy with a logarithmic correction.

import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.set_xlabel(r"$r$ $[\sigma]$")
ax.set_ylabel(r"$A$ $[k_B T]$")
ax.set_xlim((0,4))

def correct_free_energy(x, energy):
  # Subtract the logarithmic Jacobian correction and shift the profile so it
  # is zero at the first finite bin (avoids the -inf at x = 0)
  corrected_free_energy = energy - np.log(2*np.pi*x**2)
  corrected_free_energy -= corrected_free_energy[1]
  return corrected_free_energy


x = np.linspace(0,4,100)
ax.plot(x, correct_free_energy(x, potential(x)[0]), label="reference")
ax.plot(x, correct_free_energy(x,potential(x, roughness=6)[0]), label="rougher")
ax.plot(x, correct_free_energy(x, potential(x, amplitude=2)[0]), label="steeper")
ax.plot(x, correct_free_energy(x, potential(x, periodicity=2)[0]), label="more minima")

ax.legend(loc="best")
fig.show()
  • Notice how the unlikely short distances remove the first minimum around 0
  • use of a single bond between two particles to realize the potential in HOOMD-blue
import hoomd
import hoomd.md

kBT=1

def generate_context(**kwargs):
    """
    Generates a simulation context, we pass this function to the attribute
    `run` of our sampling method.
    """
    fes_coeffs = kwargs.get("fes_coeffs", {"amplitude": 1., "roughness": 4, "periodicity": 1})
    hoomd.context.initialize("")

    ### System Definition
    snapshot = hoomd.data.make_snapshot(
        N = 2,
        box = hoomd.data.boxdim(Lx = 50, Ly = 50, Lz = 50),
        particle_types = ['P', 'G'],
        bond_types = ["bond"],
    )

    snapshot.particles.typeid[0] = 0
    snapshot.particles.typeid[1] = 1

    # Reference particle at an extension and a ghost particle at the origin
    positions = np.array([[3.0,  0,  0], [0, 0,  0]])

    snapshot.particles.position[:] = positions[:]

    snapshot.bonds.resize(1)
    snapshot.bonds.typeid[0] = 0

    snapshot.bonds.group[:] = [[0, 1]]

    hoomd.init.read_snapshot(snapshot)

    # Connect custom bond to create energy landscape
    fes = hoomd.md.bond.table(width=500)
    fes.bond_coeff.set("bond", func=potential, rmin=0, rmax=10, coeff=fes_coeffs)

    dt=1e-3
    hoomd.md.integrate.mode_standard(dt = dt)
    # NVT (Nosé-Hoover) integration for both particles
    integrator = hoomd.md.integrate.nvt(group = hoomd.group.all(), kT = kBT, tau = 100*dt)
    integrator.randomize_velocities(seed = 42)

    return hoomd.context.current

The bond realizes the custom energy landscape between two particles. In this case, we can determine the free energy with the Jacobian correction and the bond energy. This is only true because of the simple nature of this toy system. Usually, the free energy is unknown and needs to be discovered via advanced sampling methods.

Collective variable (order parameter)

Here we choose the distance from the origin as the collective variable.

We use PySAGES to define this collective variable.

from pysages.colvars import Distance
import pysages

# Distance from our particle to origin (particle 1)
cvs = [Distance(([0], [1]))]

Unbiased simulation

Next, we are interested in an unbiased simulation.

PySAGES offers a special method for unbiased simulations that can still record the collective variable.

from pysages.methods import Unbiased
method = Unbiased(cvs)

We also want to track the collective variable over time and as a histogram, so we can log it via PySAGES.

from pysages.methods.utils import HistogramLogger
hist = HistogramLogger(period=100)

We now simulate $1\times10^5$ time steps to investigate the unbiased trajectory and statistics.

result = pysages.run(method, generate_context, int(1e5), callback=hist)

Let's see how the particle moved in this potential landscape.

def plot_one_result(result):
  histogram_log = result.callbacks[0]
  cv_log = np.asarray(histogram_log.data)
  time = np.linspace(0, len(cv_log)*0.1, len(cv_log))
  fig, ax = plt.subplots()
  ax.set_xlabel(r"$t$ $[\tau]$")
  ax.set_ylim((0, 4))
  ax.set_ylabel(r"$\xi$ $[\sigma]$")

  ax.plot(time, cv_log, label="cv trajectory")

  ax2 = ax.twiny()
  ax2.set_xlabel(r"$A(\xi)$ $[k_BT]$")
  x = np.linspace(0, 4, 200)
  ax2.plot(correct_free_energy(x, potential(x)[0]), x, label="energy landscape", color="orange")

  ax.legend(loc="center right")
  ax2.legend(loc="upper left")
  fig.show()
plot_one_result(result)

We see that the system never leaves the local minimum around $\xi=3\sigma$. Since the phase space is not fully explored, a prediction of the free energy would be incomplete; here the system is not even equilibrated.

The sampling is not ergodic! This is common for normal MD (although not as easy to spot usually).

### Solution: Biased simulations

Introducing a weight function $w(\hat\xi(r,p))$ allows phase-space exploration.

$$p(r,p) \propto p_{eq}(r,p)/w(\hat\xi(r,p))$$ $p(r,p)$ is the biased distribution, $p_{eq}$ refers to the unbiased equilibrium distribution.

Find the biased distribution as a function of the collective variable $\xi$. $$p(\xi) = \int\text{d}r\,\text{d}p\, \delta(\xi - \hat\xi)\, p(r,p)$$ $$p(\xi) = \int\text{d}r\,\text{d}p\, \delta(\xi - \hat\xi)\, \frac{p_{eq}(r,p)}{w(\hat\xi)}$$ $$p(\xi) = \frac{1}{w(\xi)} \int\text{d}r\,\text{d}p\, \delta(\xi - \hat\xi)\, p_{eq}(r,p)$$ $$p(\xi) = \frac{p_{eq}(\xi)}{w(\xi)}$$

  • with known distribution $p(\xi)$ (measured) and known weight $w(\xi)$, we can infer $p_{eq}$
  • optimal weight $w(\xi) = p_{eq}(\xi)$: almost random walk in $\xi$
  • optimal weight a priori unknown

For NVT MD we can modify the Hamiltonian to introduce a weight. $$p(r,p) \propto e^{-\beta H^0(r,p)} \frac{1}{w(\hat\xi(r,p))} = e^{-\beta \left(H^0(r,p) + H^w(r,p)\right)}$$

$$\Rightarrow H^w(r,p) = k_BT \ln\bigl(w(\hat\xi(r,p))\bigr)$$

Here is where PySAGES comes into play! PySAGES lets you introduce a biasing Hamiltonian into a given MD backend (such as HOOMD-blue, OpenMM, or ASE) with plain Python code. There is no need to modify the MD backend, and through JAX, PySAGES provides automatic differentiation, so the biasing forces are computed automatically.
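To illustrate the idea (this is only a conceptual sketch, not how PySAGES is implemented internally; the function names are made up), one can write a biasing potential as a function of the collective variable and let JAX generate the corresponding force by automatic differentiation:

```python
import jax
import jax.numpy as jnp

def harmonic_bias(xi, center=2.0, kspring=10.0):
    # Biasing Hamiltonian H^b(xi) = k/2 (xi - c)^2
    return 0.5*kspring*(xi - center)**2

# d H^b / d xi, generated automatically by JAX
bias_grad = jax.grad(harmonic_bias)

def bias_force(positions, center=2.0, kspring=10.0):
    # Collective variable: distance between particle 0 and particle 1
    xi = jnp.linalg.norm(positions[0] - positions[1])
    # d xi / d r, also via automatic differentiation
    dxi_dr = jax.grad(lambda r: jnp.linalg.norm(r[0] - r[1]))(positions)
    # Chain rule: F = -dH^b/dxi * dxi/dr
    return -bias_grad(xi, center, kspring) * dxi_dr

r = jnp.array([[3.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
print(bias_force(r))
```

PySAGES does the equivalent bookkeeping for you: you supply the collective variables and the method, and the biasing forces are derived and injected into the backend automatically.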

Harmonic Biasing

We can start biasing by adding a simple harmonic bias that pulls the system towards one of the less explored regions of phase space.

$$H^b(r) = \frac{k}{2} (c-r)^2$$

PySAGES offers a pre-implemented method class that we use here.

In our toy system, we choose $c=2\sigma$, which lies at a maximum of the potential.

We don't know a priori what a good spring constant is. Let's start with $k=1 \frac{k_BT}{\sigma^2}$.

from pysages.methods import HarmonicBias
method = HarmonicBias(cvs, kspring=1, center=2)
hist = HistogramLogger(period=100)
result = pysages.run(method, generate_context, int(1e5), callback=hist)

Ok, we analyze the trajectory as before to see how the energy landscape is explored now.

plot_one_result(result)

We observe that the free-energy barrier at $c=2\sigma$ is already much better explored, but the biasing force is only strong enough to pull the particle across the barrier once.

Let's try $k=100\frac{k_BT}{\sigma^2}$.

method = HarmonicBias(cvs, kspring=100, center=2)
hist = HistogramLogger(period=100)
result = pysages.run(method, generate_context, int(1e5), callback=hist)
plot_one_result(result)

Now the system mostly oscillates around the maximum and visits two apparent minima, but these are not close to the actual minima of the free-energy landscape.

The spring constant is so strong that it restricts the exploration of phase space too much. Let's try a middle ground instead, $k=10\frac{k_BT}{\sigma^2}$.

kspring=10
method = HarmonicBias(cvs, kspring=kspring, center=2)
hist = HistogramLogger(period=100)
result = pysages.run(method, generate_context, int(1e5), callback=hist)
plot_one_result(result)

This looks much better!

We observe multiple transitions between the minima at $c=1\sigma$ and $c=3\sigma$ (rare events), so the phase space is better explored. We also see that the lower minimum is frequented more than the upper one as expected.

We now analyze the histograms of this trajectory to determine the free-energy landscape $A(\xi)$ from the biased simulation.

from scipy import integrate
def plot_one_histogram(result):
  histogram_log = result.callbacks[0]

  hist, edges = histogram_log.get_histograms(bins=30, range=[(0,4)])
  fig, ax = plt.subplots()
  ax.set_xlabel(r"$\xi$ $[\sigma]$")
  ax.set_ylabel(r"p(\xi)")
  ax.set_xlim((0, 4))

  x = edges[0][:-1] + np.diff(edges[0])/2
  ax.plot(x, hist, label=r"biased $p(\xi)$")
  weight = np.exp(-kspring/2*(x-2)**2/kBT)  # e^{-H^b(xi)/kBT}
  unbiased_distribution = hist/weight
  unbiased_distribution /= integrate.simpson(unbiased_distribution, x)
  ax.plot(x, unbiased_distribution, label=r"unbiased $p_{eq}(\xi)$")

  ax.legend(loc="best")
  fig.show()
plot_one_histogram(result)

We can see that the biased distribution puts its peaks (the free-energy minima) in the wrong places, but correcting with the weight gives us the correct positions. However, we can't be sure yet that this is the correct profile.

So let's compare to the expected free-energy profile.

$$A(\xi) = -k_BT \ln(p_{eq}(\xi)) + C$$

def plot_one_free_energy(result):
  histogram_log = result.callbacks[0]

  hist, edges = histogram_log.get_histograms(bins=30, range=[(0,4)])
  fig, ax = plt.subplots()
  ax.set_xlabel(r"$\xi$ $[\sigma]$")
  ax.set_ylabel(r"A(\xi)")
  ax.set_xlim((0, 4))

  x = edges[0][:-1] + np.diff(edges[0])/2

  weight = np.exp(-kspring/2*(x-2)**2/kBT)  # e^{-H^b(xi)/kBT}
  unbiased_distribution = hist/weight
  unbiased_distribution /= integrate.simpson(unbiased_distribution, x)

  estimated_profile = -kBT * np.log(unbiased_distribution)
  constant_C = -np.min(estimated_profile) + np.min(potential(x)[0])
  ax.plot(x, estimated_profile + constant_C, label=r"estimated $A(\xi)$")
  ax.plot(x, correct_free_energy(x, potential(x)[0]), label=r"true $A(\xi)$")

  ax.legend(loc="best")
  fig.show()
plot_one_free_energy(result)

That estimate is not bad. We get approximately the right shape in the middle, and it could be improved further by running the sampling trajectory longer or by trying a different spring constant. [Try it out!]

But there are still some issues because we still cannot sample the entire space:

  • the upward trend on the right is not covered
  • the energy barrier is underestimated
  • the first minimum is under-sampled

Can we bias simulations in these regions too, to improve sampling coverage?

Umbrella Sampling

We want to find the free-energy profile along a given path in collective-variable space. In general, this path can be multi-dimensional.

Example: the dihedral angles of alanine dipeptide; see the PySAGES alanine dipeptide examples.

Wu, Xiongwu, Bernard R. Brooks, and Eric Vanden‐Eijnden. Journal of computational chemistry 37.6 (2016): 595-601.

Bonomi, Massimiliano, Carlo Camilloni, and Michele Vendruscolo. Scientific reports 6.1 (2016): 1-11.

Downside:

We need to know the path $\xi_0(t)$ we are interested in.

Goal:

Obtain the free-energy profile along the path $A(\xi_0(t))$.

Discretize the path $\xi_0(t_i)$ and bias the simulation at each point $i$ along the path.

$$H_i(r) = H^0(r) + H^b_i(\hat\xi(r))$$

$H^b_{i}$ biases the system close to $\xi_0(t_i)$ along the path.

$$p_i^u(\xi) = \frac{\int\text{d}r e^{-\beta H(r)} \delta(\hat\xi(r) - \xi)}{\int\text{d}r e^{-\beta H(r)}}$$

$$p_i^b(\xi) = \frac{\int\text{d}r e^{-\beta[H(r) + H^b_{i}(\hat\xi(r))]} \delta(\hat\xi(r) - \xi)}{\int\text{d}r e^{-\beta[H(r) + H^b_{i}(\hat\xi(r))]}}$$

$$ = e^{-\beta H^b_{i}(\xi)} \frac{\int\text{d}r\, e^{-\beta H(r)}\, \delta(\hat\xi(r) - \xi)}{\int\text{d}r\, e^{-\beta[H(r) + H^b_{i}(\hat\xi(r))]}}$$

$$\Rightarrow p_i^u(\xi) = p_i^b(\xi) e^{\beta H^b_{i}(\xi)} \frac{\int \text{d} r e^{-\beta H^b_{i}(\xi(r))} e^{-\beta H(r)}}{\int\text{d}r e^{-\beta H(r)}}$$

So we can describe the last term as a thermal average.

$$\Rightarrow p_i^u(\xi) = p_i^b(\xi) e^{\beta H^b_{i}(\xi)} \langle e^{-\beta H^b_{i}}\rangle$$

And the free energy is

$$A_i(\xi) = -k_BT \ln(p_i^b(\xi)) - H^b_{i}(\xi) + F_i$$ With $F_i = -k_BT\ln(\langle e^{-\beta H^b_{i}}\rangle)$ as constant independent of $\xi$.

With a single biasing potential $F_i$ does not matter. $A(\xi)$ is defined up to a constant.

But if we want to combine multiple windows, we have to estimate it.

$$e^{-\beta F_i} = \langle e^{-\beta H^b_{i}}\rangle$$ $$ = \int \text{d}\xi\, p^u(\xi)\, e^{-\beta H^b_{i}(\xi)}$$ $$ = \int\text{d}\xi\, e^{-\beta [A(\xi) + H^b_{i}(\xi)]}$$

not directly solvable.

Shape of the potential

Ideally $e^{-\beta H^b_{i}(\xi)} \propto p_{eq}(\xi)$, and the potential should be differentiable.

  • For now we use a harmonic potential
  • fix the system with a spring to point on the path. $$H^b_{i}(\xi) = k/2 (\xi - \xi_0(t_i))^2$$
  • combine different points into one analysis
centers = np.linspace(0, 4, 10)
kspring = 100
fig, ax = plt.subplots()
ax.set_xlabel(r"$\xi$")
ax.set_ylabel(r"$H_i(\xi)$")
ax.set_ylim((-10, 20))
ax.set_xlim((0,4))

x = np.linspace(0, 4, 200)
ax.plot(x, correct_free_energy(x, potential(x)[0]), label="potential")
for point in centers:
  label = None
  if point == 0:
    label = "biasing potential"
  ax.plot(x, kspring/2*(x-point)**2, label=label)
ax.legend(loc="best")
fig.show()
  • one free parameter per window: the spring constant $k$

Goal: overlapping histograms of neighboring windows,

$$\int \text{d} \xi p_i^b(\xi) p_{i+1}^b(\xi) \gg 0$$

from pysages.methods import UmbrellaIntegration
method = UmbrellaIntegration(cvs, kspring, centers, 100)
result = pysages.run(method, generate_context, int(1e5))

You can already see one downside: we have to run multiple replicas, which consumes more computational resources.

Let's see what the histograms look like.

def plot_multi_histogram(result):
  fig, ax = plt.subplots()
  ax.set_xlabel(r"$\xi$ $[\sigma]$")
  ax.set_ylabel(r"p(\xi)")
  ax.set_xlim((0, 4))
  for histogram_log in result.callbacks:
    hist, edges = histogram_log.get_histograms(bins=30, range=[(0,4)])
    x = edges[0][:-1] + np.diff(edges[0])/2
    ax.plot(x, hist, label=r"biased $p(\xi)$")
  fig.show()
plot_multi_histogram(result)

The result is OK, good enough for our test case here. We have some overlap between the histograms.
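We can also quantify this instead of eyeballing it. Below is a quick, hypothetical diagnostic (not a PySAGES function) that estimates the shared probability mass between neighboring windows from the same histograms used above:

```python
import numpy as np

def neighbor_overlaps(result, bins=30, cv_range=(0, 4)):
    # Normalize each window's histogram to a probability distribution
    hists = []
    for histogram_log in result.callbacks:
        hist, edges = histogram_log.get_histograms(bins=bins, range=[cv_range])
        hists.append(hist/hist.sum())
    # Overlap coefficient between neighboring windows: shared probability mass
    return [np.minimum(a, b).sum() for a, b in zip(hists[:-1], hists[1:])]

# print(neighbor_overlaps(result))  # values near 0 indicate poor overlap
```

Values close to zero for any pair indicate that more windows (or a smaller spring constant) are needed in that region.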

Umbrella Integration

Now, how can we combine these histograms into a single free-energy profile?

$$A_i(\xi) = -k_BT \ln(p_i^b(\xi)) - H^b_{i}(\xi) + F_i$$

One option is weighted histogram analysis (WHAM), but that works best for one-dimensional collective variables. (Implement it yourself, all you need from PySAGES is available.)
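For reference, here is a minimal, hypothetical sketch of the 1D WHAM iteration (the function and its arguments are made up for illustration; the per-window counts could, for example, come from each window's `HistogramLogger` evaluated on a common grid):

```python
import numpy as np

def wham_1d(x, counts, centers, kspring, kBT=1.0, n_iter=1000, tol=1e-7):
    counts = np.asarray(counts, dtype=float)       # shape (n_windows, n_bins)
    N = counts.sum(axis=1)                          # samples per window
    bias = 0.5*kspring*(x[None, :] - np.asarray(centers)[:, None])**2
    w = np.exp(-bias/kBT)                           # e^{-beta H^b_i(x)}
    f = np.ones(len(centers))                       # e^{-beta F_i}, initial guess
    for _ in range(n_iter):
        # p(x) = sum_i n_i(x) / sum_i N_i e^{-beta(H^b_i(x) - F_i)}
        p = counts.sum(axis=0) / np.einsum("i,ij->j", N/f, w)
        # e^{-beta F_i} = int dx p(x) e^{-beta H^b_i(x)}
        f_new = w @ (p*np.gradient(x))
        if np.max(np.abs(np.log(f_new/f))) < tol:
            f = f_new
            break
        f = f_new
    return p/np.trapz(p, x), -kBT*np.log(f)
```

The two update equations are exactly the self-consistency conditions above: $p(\xi)$ from the combined counts and the current $F_i$, then $e^{-\beta F_i}$ from the current $p(\xi)$.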

Here we are using a simple umbrella integration instead.

Idea: differentiate with respect to $\xi$ to eliminate $F_i$.

$$\frac{\partial A_i^u}{\partial \xi} = - k_BT \frac{\partial \ln(p_i^b(\xi))}{\partial \xi} - \frac{\partial H^b_{i}(\xi)}{\partial \xi}$$

For this, we approximate $p_i^b(\xi)$ as a Gaussian with mean $\bar\xi_i$ and standard deviation $\sigma_i^b$.

$$p_i^b(\xi) = \frac{1}{\sigma_i^b \sqrt{2\pi}} e^{-\frac{1}{2} \frac{(\xi-\bar\xi_i)^2}{\sigma_i^{b2}}}$$

Expanding up to second order gives us

$$\frac{\partial A_i^u}{\partial \xi} = k_BT \frac{\xi-\bar\xi_i}{\sigma_i^{b2}} - k_i (\xi - \xi_0(t_i)).$$ We can also truncate at first order only, $$\frac{\partial A_i^u}{\partial \xi} \approx -k_i (\xi - \xi_0(t_i)),$$ which is easily extended to higher-dimensional collective variables and is the approximation implemented in PySAGES.

Now we can combine all these windows

$$\frac{\partial A}{\partial \xi} = \sum_i p_i(\xi) \frac{\partial A_i^u}{\partial \xi}$$ with $$p_i(\xi) = \frac{a_i(\xi)}{\sum_j a_j(\xi)}$$ and $$a_i(\xi) = N_i p_i^b(\xi)$$ with $N_i$ being the number of samples in window $i$.

And integrate the derivative numerically to obtain the profile (see the sketch below).

  • this averages the mean-force
  • we are approximating the distribution
  • easy to generalize to higher dimensions
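Before letting PySAGES do this for us, here is a simplified, hypothetical sketch of the first-order version: one mean force per window (instead of the full weighted combination above), integrated numerically along the window centers. The window means would come from the per-window samples, e.g. the logged CV values of each replica.

```python
import numpy as np
from scipy import integrate

def umbrella_integration_1d(window_means, centers, kspring):
    window_means = np.asarray(window_means)
    centers = np.asarray(centers)
    # First-order mean force per window: dA/dxi |_i ~ -k (mean_i - center_i)
    mean_force = -kspring*(window_means - centers)
    # Integrate the mean force along the path of window centers
    free_energy = integrate.cumulative_trapezoid(mean_force, centers, initial=0.0)
    return free_energy - free_energy.min()
```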

Let's see how PySAGES analyzes it for us and produces the free-energy result.

def plot_umbrella_free_energy(result):
  processed_result = pysages.analyze(result)

  fig, ax = plt.subplots()
  ax.set_xlabel(r"$\xi$ $[\sigma]$")
  ax.set_ylabel(r"A(\xi)")
  ax.set_xlim((0, 4))

  centers = np.asarray(processed_result["centers"])[:,0]
  free_energy = np.asarray(processed_result["free_energy"])
  ax.plot(centers, free_energy, label=r"estimated $A(\xi)$")
  x = np.linspace(0, 4, 50)
  ax.plot(x, correct_free_energy(x, potential(x)[0]), label=r"true $A(\xi)$")
  ax.legend(loc="best")
  fig.show()
plot_umbrella_free_energy(result)

This appears to be much better. Even with the crude approximations we were making, we can estimate the shape of the potential.

Only the second minimum is underestimated, which could be fixed with more sampling and more sampling points in that vicinity. [Try it out!]

Difficulties:

  1. choose a good spring constant
    • too large and the histograms don't overlap
    • too small and you cannot sample the barrier regions
  2. choose a good number of replicas

Can we do better than this? Yes, of course:

  • Metadynamics: approximate the weight function with a sum of Gaussians
  • ANN: approximate the biasing force with artificial neural networks (ANN)

GPU computing

This example was using a single CPU core only. For a realistic system, GPU acceleration is required.

What do we have to do to run this code on a GPU?

Nothing!

Just change the runtime environment to GPU and you are good to go. Or run your examples on the GPU accelerated workstation.

PySAGES is preinstalled, so you are good to go!

String algorithm

We can now determine free-energy profiles. In more than one dimension, the profile depends on the path taken.

Bonomi, Massimiliano, Carlo Camilloni, and Michele Vendruscolo. Scientific reports 6.1 (2016): 1-11.

So how do we find the minimum (free) energy path (MEP)?

This path runs through a valley of the free-energy landscape and has the smallest free-energy barrier. Hence it is the most likely kinetic path.

The formal definition of this path is that the force at each point of the path is always parallel to the path (pointing along it, never sideways).

$$\frac{\text{d}\xi_0(t)}{\text{d}t} \parallel \nabla V$$ The component of the force normal to the path vanishes: $(\nabla V)^\perp = 0$.

General Idea:

Move the string (path) along the perpendicular component of the gradient at each discretized point, $v_i = -(\nabla V)^\perp|_{t_i}$ (gradient descent). The parallel component would only move the discretized points along the path and make them collapse into the minima.

Problem 1:

How do we get $\nabla V$? [Especially if it is a free energy.]

Solution: harmonically biased simulations at each replica (discretization point) of the string

$$\nabla V|_{t_i} = -k\, \langle \xi - \xi_0(t_i) \rangle$$

[computationally expensive]

Problem 2:

How to get the perpendicular component?

  • the discretization of the path is rough

Solution: use a cubic spline-interpolation

  • piece-wise 3rd order polynomial
  • easy to parameterize
  • easy to interpolate
  • differentiable

$\Rightarrow$ perpendicular component $(\nabla V)^\perp$! $\Rightarrow$ smooth path $\xi_0^s(t)$ with $\xi_0^s(t_i) = \xi_0(t_i)$.

Improved string method

Using the perpendicular component directly does not lead to good convergence. Instead, we can use the improved string method.

Iterate:

  1. Calculate $\nabla V|_{t_i}$
  2. Move the string: $\xi_0(t_i) \leftarrow \xi_0(t_i) + \alpha\, v_i$ with $v_i = -(\nabla V)^\perp|_{t_i}$
  3. Calculate the spline $\xi_0^s(t)$
  4. Re-parametrize the string points (see the sketch after this list).

    Such that $|\xi_0^s(t_i) - \xi_0^s(t_{i+1})| = \text{const}$.

    • cancels out parallel force component
    • well-discretized string
    • requires a norm in $\xi$ space. For example $L^2$ norm in $\xi$ space.

Stop iterating once converged and test that $(\nabla V)^\perp = 0$.

  • typically slow convergence
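A minimal sketch of the re-parametrization step (a hypothetical helper, not part of PySAGES): fit a cubic spline through the current string points and redistribute them at equal arc length.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def reparametrize(xi0, n_points=None):
    # xi0: discretized string points, shape (n_points, cv_dimension)
    if n_points is None:
        n_points = len(xi0)
    # Approximate arc length along the current discretization
    seglen = np.linalg.norm(np.diff(xi0, axis=0), axis=1)
    arc = np.concatenate(([0.0], np.cumsum(seglen)))
    arc /= arc[-1]                       # normalize the parameter t to [0, 1]
    # Cubic spline xi_0^s(t) through the current points
    spline = CubicSpline(arc, xi0, axis=0)
    # Evaluate at equally spaced t so neighboring points become equidistant
    return spline(np.linspace(0.0, 1.0, n_points))
```

One full iteration would combine this with the mean-force estimates from the harmonically restrained replicas to displace the points before re-parametrizing.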

With the converged path, calculate the free-energy profile using umbrella integration.

What is a good collective variable?

Let us define a committor probability.

For a given point $r_i$ we can define the probability that a simulation started from this point reaches basin $B$ before it reaches basin $A$. $$\text{committor probability to } B:\ p_B(r_i)$$

This is a probability because a given configuration $r_i$ has many possible realizations in momentum space (Maxwell-Boltzmann distribution). Each realization follows its own path; we simulate them and record whether they arrive in $A$ or $B$ first.

[Figure: committor probability between two basins $A$ and $B$]

Intuitively, if $\xi$ is at the halfway point of the path ($t=1/2$), the committor probability should be $1/2$. Moving towards $A$ ($t < 1/2$), the probability should decrease, $p_B(t < 1/2) < 1/2$, and vice versa, $p_B(t>1/2) > 1/2$.

fig, ax =plt.subplots()
ax.set_xlabel("$t$")
ax.set_xlim((0,1))
ax.set_ylabel("$ p_B(t)$")

x = np.linspace(0, 1, 50)
p_good = np.tan(2*(x-.5))
p_good -= np.min(p_good)
p_good /= integrate.simpson(p_good, x)
p_bad = x*0+1
p_bad /= integrate.simpson(p_bad, x)

ax.plot(x, p_good, label="good path")
ax.plot(x, p_bad, label="bad path")

ax.legend(loc="best")
fig.show()

We can generalize this concept to the entire space.

First, an isosurface of the collective variable is the set of all points $(r,p)$ in phase space where $\hat\xi(r,p) = c$ for a given value $c$. In our example, this is the sphere around the origin with radius $c$, for instance $c = 0.5\sigma$.

We can also define isosurfaces of the committor probability: all points in space where the committor probability takes the same value, $p_B(r_i) = c$.

For a descriptive collective variable, we want the committor probability to change accordingly when $\xi$ changes. This is best ensured if the isosurfaces of the collective variable are aligned with the isosurfaces of the committor probability.

Usually, calculating committor probabilities is so computationally expensive that it is prohibitive. But it is a way to check a collective variable in case of doubt.
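For the toy system, a brute-force committor estimate is still feasible. The sketch below is hypothetical and makes several simplifying assumptions: overdamped Brownian dynamics along $r$ on the bare bond energy from the `potential` function defined earlier (unit mobility, no Jacobian correction) and crude basin definitions at $r_A \approx 1\sigma$ and $r_B \approx 3\sigma$.

```python
import numpy as np

def estimate_committor(r_start, r_A=1.0, r_B=3.0, n_trials=50,
                       dt=1e-3, kBT=1.0, max_steps=200_000, seed=0):
    rng = np.random.default_rng(seed)
    hits_B = 0
    for _ in range(n_trials):
        r = r_start
        for _ in range(max_steps):
            # Overdamped Langevin step with unit mobility
            force = potential(r)[1]
            r += force*dt + np.sqrt(2*kBT*dt)*rng.normal()
            r = abs(r)            # reflect at the origin
            if r <= r_A:          # reached basin A first
                break
            if r >= r_B:          # reached basin B first
                hits_B += 1
                break
        # trajectories that time out are counted as not reaching B
    return hits_B / n_trials

# print(estimate_committor(2.0))  # near the barrier top this should be ~1/2
```

Started at the barrier top near $r = 2\sigma$, this should return a value around $1/2$ if the distance is indeed a good reaction coordinate for this system.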
