Practical 5: Free energy calculations from non-equilibrium dynamics

Bert de Groot

Contents
Introduction: Free energy from non-equilibrium processes:
- The Jarzynski equation
- The Crooks theorem
Solvation free energy of a sodium ion in water using the Crooks theorem
Potential of Mean Force for a potassium ion permeating an ion channel
Optional further analysis
References

Introduction

In a previous tutorial we learned two methods to compute Free Energies associated to a given thermodynamic process: the solvation Free Energy of sodium ions in water and the folding of a peptide in water. In the first case we used Free Energy Perturbation (FEP), that takes advantage of the fact that Free Energy is a state function. For an equilibrium process, the Free Energy is independent of the path the system takes to move from state A (no sodium in water) to state B (sodium fully solvated in water).


As the sodium-water interactions were slowly turned on, we recorded the changes in potential energy. Integration of these potential energy differences over the morphing path yields the Free Energy.
To study the themodynamics of peptide folding, we relied on the Boltzamn probability distribution of themodynamic ensembles in equilibrium. The relative population of folded and unfolded states was extracted from a molecular dynamics simulation, which gives direct access to Free Energy differences.

So far we have used equilibrium dynamics to compute Free Energies: the transition between states is done infinitely slow, or reasonably slow, and the work performed/gained is equal to the Free Energy. By extracting Free Energies from equilibrium simulations we have to make sure that the system fulfills a Boltzmann equilibrium. In practice this means that we have to sufficiently sample the phase space available. While this is conceptually not a problem, in practice this poses a problem for systems with many degrees of freedom and long relaxation times.

The Jarzynski equation

As a follow up, today we will learn that we can use non-equilibrium dynamics to compute Free Energies. Although several new relationships for non-equilibrium thermodynamics were found during last century, the link between the work and the Free Energy remained as an inequality: the work is always greater or equal to the underlying equilibrium Free Energy ( ). It was not until the very end of last century (1997-1998) that new expressions directly relating the work and the Free Energy were found. The first one, known as the Jarzynski equation, relates the ensemble average over an exponential distribution of the work with the Free Energy.


This relationship turned out to be extremely useful, as it allows, for example, to compute binding Free Energies of ligands by repeatedly enforcing the unbinding and recording the work distribution. This process can be mimicked in a computer experiment, with the advantage that the simulation times are reduced due to the fast non-equilibrium trajectory. Nevertheless, several experiments have to be carried out to get a reasonable estimate of the work distribution.

The Crooks theorem

Yet another non-equilibrium relationship, even more fundamental, was put forward by Crooks in 1999. The expression is grounded on microscopic reversibility, and one useful theorem derived from it is the Transient Fluctuation Theorem (TFT). The former relates the ratio of the work distribution for a Markovian process that moves from state A to state B and from state B to a A with the dissipated work involved in the transformation. Look at the wikipedia page for more information, if you're interested.




A first approach is to estimate the free energy as the work value W where both distributions overlap, PF(W) = PB(-W):




A second approach is to take the logarithm of the Crooks equation,




Then, plot the left side of this equation vs. the work, which is a straight line, and estimate the free energy from the intercept.

Today we will take advantage of these recent breakthroughs to compute Free Energies ( an equilibrium property ) from non-equilibrium processes. To illustrate the capabilities of these relationships we will first study the solvation Free Energy of an ion in water, now by means of several fast (non-equilibrium) Free Energy Perturbation simulations. A second application of non-equilibrium relationships will serve us to elucidate the Free Energy profile for an ion permeating through a peptidic ion channel.

Go back to
Contents

Solvation free energy of a sodium ion in water using the Crooks theorem

This exercise uses the same simulation technique (FEP) previously introduced in these tutorials. We learned that sufficiently long simulations were needed to obtain a converged value for the solvation Free Energy, where the morphing parameter lambda was slowly changed from 0 (no ion-water interactions) to 1 (all interactions switched on).


Another way of computing the solvation Free Energy would be to perform multiple (fast) simulations. Some will drive the system from an equilibrated state A (lambda = 0) to a final state B (lambda = 1). We will call such a process the Forward transformation. Additionally, some will be perfomed in the opposite direction (Reverse transformation), e.g. an equilibrated system in the B state will be transformed into state A. By collecting the work performed in either Forward or Reverse transformation we can obtain their respective probability distributions. Recalling Crooks' Transient Fluctuation Theorem we know that the point where the two probability distributions meet the dissipative work is zero. Hence, at this point the work will equal the Free Energy.




As we saw last week, from each of the transition simulations we get a dhdl.xvg file that contains the dH/dlambda as a function of time. These curves can be integrated over dlambda to extract the work. As described by the Crooks theorem the intersection of the distributions for the forward and backward transitions are a free energy estimate.
For this week's exercise we ran many fast (10ps) simulations and obtained a dhdl.xvg file for each of them. Here you find one directory for the forward and one directory for the backward transitions. To extract the content, use

wget http://www3.mpibpc.mpg.de/groups/de_groot/compbio/p10/400runs.tar.gz
tar -xvzf 400runs.tar.gz

We now want to plot the distributions of the work values for the forward and backward transition to be able to use Crooks theorem.

To do so go to the directory of the forward transitions first.

cd 400runs/FWD_RUN_TFT

To gather the work values from the dhdl.xvg files a script has been prepared. Run it

./gather_work.csh |  grep -v csh > ../Work_fwd.txt

This can take a few seconds, but eventually it writes a file containing all forward work values. To be able to use Crooks theorem we need the backward work values as well. To collect them go to the backward directory

cd ../REV_RUN_TFT 
./gather_work.csh | grep -v csh > ../Work_rev.txt

Let's see if we computed enough values to estimate a reasonable probability distribution of the work.
If we don't have any extra information on the distribution function, it is necessary to have an overlap between the two distributions, since Crooks' TFT states that the work value that equals the true Free Energy is the one which has the same probability in the forward and in the reverse distribution.
To compare the work values, we will generate histograms (or numerical estimates of unormalized probability distributions) for both work distributions with xmgrace,
open them both directly with

cd .. 
xmgrace Work_fwd.txt Work_rev.txt

To compute histograms, first of all create a new graph: Go to "edit" -> "arrange graphs" and set the "cols" value to 2. Apply the changes and close. Now you have two graphs. To compute the histograms, go to "data" -> "transformations" and click on "histograms". From the "source" list click now on the first (g0) and then the first "set" (go.so). As a "destination" list select the second (g1). Now have a look at the values you have for the distribution, typically between -420 to -360. Use these numbers as "start at" and "stop at" and something like 40 bins. The more data points, the more bins we can use. Click in "normalize" and then press "apply" to perform the histogram on the first data set. To obtain one for the second data set, simply select (g0.s1), under the second "set" in the first graph in the "source" list and press "apply". In order to see the histograms, select the new graph by clicking it with the mouse and then press the "as" (for auto scale) button on the upper left.

Question: Based on the two adjacent histograms, what is the estimate for the free energy value and how does it compare to the expected value of around -393 kJ/mol? Do you think 10 simulations in each direction would have been enough to get an reasonable estimate?

OPTIONAL: Estimation of the free energy taking the logarithm of the Crooks equation

Once you obtained the work distributions, it is possible to estimate the free energy taking the logarithm of the Crooks equation (see here). You should extract the portion of both distributions in which there is an overlap and PF(W), PB(-W) are not zero (to avoid singularities).
First recompute the histograms, and filter the overlap region (work values between -404 and -385 kJ/mol):

sort -g Work_fwd.txt | awk -f histogram.awk | awk '$1 > -404 && $1 < -385 {print $0}' > Work_fwd_over.his
sort -g Work_rev.txt | awk -f histogram.awk | awk '$1 > -404 && $1 < -385 {print $0}' > Work_rev_over.his


now collect for each non-zero count of the forward and reverse work distribution the logarithm:

paste Work_fwd_over.his Work_rev_over.his | awk '$2>0 && $4> 0 {print $1,log($2/$4)}' > W_log.txt


Then, plot ln ( PF(W) / PB(-W) ) as a function of W and do a linear regression. The slope of this curve is β = 1/KT and the intercept is -βΔ F

xmgrace W_log.txt

and in xmgrace, under Data -> Transformations, click 'Regression'.

Questions: How much does the regression deviate from a straight line? How does the estimate of β = 1/KT compare with the input value, β = 1/(2.5 kJ/mol) , at the given temperature of 300 K? What is the estimate of the free energy? How does it compare with the previous estimations?

This is a short version of a more extended tutorial. If you are interested in running the simulations yourself you can do the first part of the original tutorial.


Go back to Contents

Potential of Mean Force for a potassium ion permeating an ion channel

Ion channels, located in the lipidic membrane that separates organelles or cells, allow the passage of ions from one side of the membrane to the other. Their structure and composition is diverse, although they share some characteristics. Gramicidin is a small peptidic ion channel of 30 aminoacids, whereas potassium ion channels, like KcsA, are proteins with quaternary structure. Their function affects electrochemical gradients, and thus ion channels have a fundamental role in the bioenergetics of living cells. A subtle balance of ion gradients is, for example, responsible for the polarization-depolarization of neuron's membrane that leads to electrical signal transmission in nerve cells. Many membrane proteins need to couple their transport mechanisms to ion gradients. For these and many other reasons, ion channels are of great interest in biophysics and electrophysiology, as well as in medicine.


Ion channels differ from ion transporters because they do not need an external energy source to achieve their functionality. Their activity can be gated by several means: voltage, drug binding, pressure, or simple ion gradient.
Most ion channels present a selectivity filter to achieve specificity for certain types of ions. These regions are usually carefully constructed to have favorable stabilizing environment for the ion they want to permeate. Ions in solution are stabilized by the interactions with the solvation water shell, as we saw in the previous exercise. As the ion enters the channel several interactions with solvating waters disappear, which can not be fully compensated by the channel's partially hydrophobic interior. Since ions are charged species, this desolvating effect is much stronger than for water, and so the ion flux is significantly lower compared to water flux.

Molecular dynamics is a very powerful tool to study the energetics of ion channels, provided a good set of parameters that describes the interactions between channel, ions and membrane. Nevertheless, the high activation energy for ion permeation makes direct estimation of ion flux almost impossible: the simulation time needed to observe a significant amount of ion translocations is currently out of reach. Luckily, rate theory provides a clear connection between the underlying energetics for ion permeation and their rates or fluxes. Several theories and methodologies exist that connect the Free Energy profile, or Potential of Mean Force, with rates. Although they vary in the level of sophistication, the main idea is that the flux is propotional to the ratio of populations in the lower energy state and the population in the highest energy along the path. Since the populations in steady state are Boltzman weighted, the flux is inversely proportional to the exponential of the activation energy, the difference in energy between the lowest and highest energy along the path.



In order to predict ion selectivity, or compare ion fluxes qualitatively, we can then extract the Potential of Mean Force for the desired ion permeation and compare the activation energies. In this exercise we will learn how to obtain a profile for an ion permeating an engineered peptidic channel. Their cylindric symmetry clearly defines the main pore axis as the reaction coordinates. In this respect, we will effectively project all the forces of the orthogonal degrees of freedom onto the main reaction coordinate, reducing the complexity of the system to a 1D mean interaction potential. Hence the name of Potential of Mean Force.
In equilibrium simulations, the Potential of Mean Force could be extracted by integration of the force the ion feels as it permeates the pore, or by computing the ion number density (probability distribution) along the channel.
As said before, the lack of ion mobility in the time scales available for Molecular Dynamics makes it compulsory to find alternative methods. The common approach is usually to force the ion to sample a given region of the reaction coordinate using an external potential of known characteristics. Umbrella Sampling method restrains the ion at given bins along the channels using a harmonic potential. After removing the force due to the harmonic potential we obtain the underlying Potential of Mean Force. The method needs good sampling of the several bins to achieve a converged equilibrated ensemble, especially critical in the regions where the potential is steep.
Another possibility would be to drive the ion along the pore by means of a guiding harmonic potential: at a given speed, we pull the ions through the pore using a harmonic spring. If done infinitely slowly, the system does not dissipate energy and the work done at a given coordinate equals the Potential of Mean Force.

Molla
But thanks to the non-equilibrium relationships discussed above, we can extract Free Energies out of an ensemble of non-equilibrium trajectories. Here we will use a special application of Crooks' TFT and the fact that our spring force constant will be very high. The method is know as FR method, or Forward-Reverse.



Crooks' TFT and the stiff spring approximation

The stiff spring approximation guarantees that the work distribution obtained is Gaussian. Combining this result with Crooks' TFT, we can see that if the Forward trajectories generate a Gaussian work probability distribution with mean and variance sigma, the Reverse probability distribution is also Gaussian with mean and the same variance sigma.


Putting all this together, we see that the Potential of Mean Force can be simply computed as the mean of the averaged work in Forward and Reverse direction (the minus sign is due to the definition of the dissipated work). There is no need to compute the overlapping point between the two work distributions, since they are Gaussians with the same variance: the point were they cross is the average of the two mean values.

FR visual

Since generating enough trajectories to estimate a work average in the forward and reverse directions is time consuming, here we will simply work on a set of pre-computed dynamics. For each ion being pulled in either direction, we obtain a file describing the position of the ion and the spring reference position at any time. Given a stiff force constant, that we previously selected (10000 kJ / mol nm^2), it is straightforward to compute the force the system is exerting on the ion as a function of the spring position. Integration of the resulting force along the pore yields the work in a given direction.

You will find a multiple entry pdb to visualize a trajectory of an ion being pulled through a channel. Download the file and open it with pymol:

wget http://www3.mpibpc.mpg.de/groups/de_groot/compbio/p10/pulling_movie.pdb
pymol pulling_movie.pdb

If you did not see a potassium in orange, try the following: in Pymol's command line write

select potassium, name PO1

And in the newly created entry on the main program's window, name (potassium), select "S" and "spheres". You can play the movie using the controls on the lower right side of the main window. Have a look and quit the viewer when satisfied.

Now, remove the movie:

rm -f pulling_movie.pdb

And we start with the actual analysis.

You will find all the needed files and scripts here to follow the exercise. Download it to your account.

First switch to the home folder:

cd

, download and extract the contents of the folder like we did before:

wget http://www3.mpibpc.mpg.de/groups/de_groot/compbio/p10/ion_FR.tar.gz
tar -xvzf ion_FR.tar.gz ; rm -f ion_FR.tar.gz

or, if you downloaded to the default 'Desktop' folder:

tar -xvzf Desktop/ion_FR.tar.gz ; rm -f Desktop/ion_FR.tar.gz

Get in the extracted folder:

cd ion_FR

You will see a couple of folders, one for each channel. If time permits we will compute the Potential of Mean Force for three channels, otherwise we will illustrate the method with just p-ala15.

cd p-ala15_single

As said before, doing multiple simulation runs is very tedious if one has to do everything by hand. We will first compute the work along one of the forward and reverse paths by hand, and afterwards we will use a set of scripts to compute the rest. Firstly, let's extract the work for the forward run,

cd forward_run

Here you will find a pull.pdo file, which contains: the time (ps), a reference position in the channel (we don't need it here), the position of the group being pulled (ion, in our case) and the position of the spring (nm) . Since we know the force applied is harmonic with a force constant of 10000 kJ / mol nm^2, we can easily compute the force,



We will use the following comand to extract the value of the force at each spring position, which moves from one end to the other of the pore,

 cat pull.pdo | awk '($1 != "#" && $1 != "#####") {print $3, 10000*($4 - $3)}' > force.txt

With this command for each line of numeric input we print the third column (position) and we compute the force based on the above formula. Finally, the pair of values is put into a file.
Now we will sort the values corresponding to the z axis in ascending order,

sort -n -k 1 force.txt > sorted_forces.txt

Now open the file sorted_forces.txt to perform the integration of the force, which will lead to the work along the path.

xmgrace sorted_forces.txt

Never mind if you get an xmgrace warning. Just close the warning window and proceed. It will not affect the result. As we did before, under "Data", select "Transformations", followed by "Integration", and press "Accept". The integrated curve is now in graph G0.S1. We will write it to a file: select "Data", then "Export" and "ASCII". Select the graph GO.S1 and under "Selection" choose a name, for example forward_work.txt.
Since we would like to average the work in both directions, we will need to make sure that both files have the same number of data points. This can be accomplished by performing a window average: all the data points lying in the interval [z,z+width] will be averaged. Use the following command,

awk -f ../../p-ala15/forward_run/window.awk < forward_work.txt  > forward_W_av.txt 

Copy the resulting file to the FR_single folder,

cp forward_W_av.txt ../FR_single 


Question: If you have a look at the result, with xmgrace, you will see that this result cannot be the correct Potential of Mean force. Why not?

Ok, so now we will repeat the same operations for the reverse path,

cd ../reverse_run

cat pull.pdo | awk '($1 != "#" && $1 != "#####") {print $3, 10000*($4 - $3)}' > force.txt
sort -n -k 1 force.txt > sorted_forces.txt

Integrate the forces the same way as before. Export the result to a file, say reverse_work.txt. Since we have sorted the position in ascending orther, we automatically have switched the sign of the integration. For that reason we can directly compute the average of both Forward and Reverse works to get the Potential of Mean Force. We finally use window average to have a compatible set of data points, like before,

awk -f ../../p-ala15/forward_run/window.awk < reverse_work.txt  > reverse_W_av.txt
cp reverse_W_av.txt ../FR_single

cd ../FR_single

Now that we have all the work profiles along the channel, let's use the result we derived for TFT and the stiff spring approximation: average Forward and Reverse work to get the Potential of Mean Force. We will use a tiny script that takes care to average the right values:

./average_work.csh forward_W_av.txt reverse_W_av.txt | grep -v csh > Free_Energy_15.txt 

We provide a reference PMF for this channel. Compare it with the one you just computed and the work performed in the forward and reverse path,

wget http://www3.mpibpc.mpg.de/groups/de_groot/compbio/p10/reference_PMF_ala15.txt
xmgrace reference_PMF_ala15.txt Free_Energy_15.txt forward_W_av.txt reverse_W_av.txt -legend load

The black curve is the reference Potential of Mean Force and the red curve is the averaged curve we just calculated. The green curve is the forward transition, whereas the blue curve is the reverse transition.

Question: Did the result get better, after averaging? Where is the main barrier for ion permeation through this channel located?

Now that we learned that the approach seems to yield reasonably good results, let's compute a bunch of work profiles in both directions to see how the overall result improves. Since it's a lot of work to do it by hand, we will use scripts that will automate the exact same process you just did,

cd ../../p-ala15
cd forward_run

This script will generate different work profiles for 5 different trajectories in the Forward direction.

./compute_work.csh

Now move all the resulting profiles to the folder where we will average them all,

cp W_fwd_*.txt ../FR_average

To obtain the work in the Reverse direction, let's simply do the same again,

cd ../reverse_run

./compute_work.csh cp W_rev_*.txt ../FR_average

Now we move to the FR_average folder and perform the averages with the following commands,

cd ../FR_average
./do_averages.csh

Have a look at the obtained Free Energy (or Potential of Mean Force) profile. Compare it with the previous one using xmgrace:

wget http://www3.mpibpc.mpg.de/groups/de_groot/compbio/p10/reference_PMF_ala15.txt
xmgrace reference_PMF_ala15.txt Free_energy.txt -legend load


Question: Does the result improve? How does it compare to the reference profile?

Go back to Contents

Optional: further analysis

Since we already know how to use the Crooks fluctuation theorem to extract potentials of mean force, let's use this method to learn something about ion channels. Following the same procedure, we already computed the work along the trajectory for two channels longer than 15 aminoacids, namely 19 and 25. To compute the Potential of Mean Force, simply move to their FR_results

cd ../../p-ala19/FR_results
./do_averages.csh 

Now we align the profile with the previous one:

awk '{print $1-0.3,$2}' Free_energy.txt > t ; mv -f t Free_energy.txt

and look at the two together:

xmgrace Free_energy.txt ../../p-ala15/FR_average/Free_energy.txt -legend load

cd ../../p-ala25/FR_results
./do_averages.csh 

Again, we align the profile with the previous ones:

awk '{print $1-0.5,$2-5}' Free_energy.txt > t ; mv -f t Free_energy.txt 

and look at them together:

xmgrace Free_energy.txt ../../p-ala15/FR_average/Free_energy.txt ../../p-ala19/FR_results/Free_energy.txt -legend load

Compare the resulting profiles with the p-ala15.


Question: How does the profile change as we move to longer channels? Which factors could you imagine that result in a length dependence of the free energy profile?

Go back to Contents

Further references: