Introduction to coarse grained modeling

This practical provides an introduction to coarse-grained (CG) modeling. A CG model reduces the number of degrees of freedom in a system by reducing the number of interaction sites, resulting in a model that is computationally less expensive. In this practical, we will optimize the bonded terms of the coarse-grained force field with the help of all-atom (AA) molecular dynamics (MD) simulation. The practical builds on concepts covered in previous lectures, such as the potential of mean force (PMF) and free energy calculation. We will carry out the Martini (coarse-grained model) parametrization of a dodecane molecule using target atomistic simulation of a dodecane solution. The goal of the optimization process is to match a mapped atomistic distribution to a coarse-grained one.

To start the optimization of a coarse-grained force field, we need to generate reference data first. To begin the practical, download the relevant files and unpack them using the following commands:

wget http://www3.mpibpc.mpg.de/groups/de_groot/compbio/p_coarse_grained/files/CGPractical.tar.gz
tar xzvf CGPractical.tar.gz
cd CGPractical/AA

pymol start.gro

load_traj traj_aa.xtc

To get a perspective on the speed-up in CG simulation compared to AA, we will first run an AA MD simulation and then compare it to the CG simulation later in the practical. Start the simulation and terminate it after some time using Ctrl+C.(Write down the speed (ns/day))

gmx grompp -f md.mdp -p topol.top -c start.gro -o aa.tpr
gmx mdrun -v -deffnm aa

Mapping AA trajectory

Now that we have our reference data, we need to decide on the resolution of the CG model and the mapping. Mapping defines which atoms are assigned to which CG beads. The Martini model generally assumes a 4:1 mapping, which means 4 heavy atoms are assigned to 1 CG bead. We will follow the same mapping strategy and map 4 carbon atoms with their attached hydrogen atoms to 1 CG bead.

With the defined mapping, we can proceed to transform the AA trajectory into CG simply by calculating the center of mass for each group of atoms within a CG bead. To transform our trajectory, we first need to generate an index file where the index of every atom is assigned to a CG bead. Since we have 201 molecules in our system, we will use a script to write 603 index groups (3 CG beads per molecule).

python mkindex.py

To see which atom index is assigned to which CG bead, open this Python script with any text editor.

You should now have the index file, cg_ind.ndx. To use it with gmx traj, request 603 groups (number of CG beads in the system), set the -com flag, and provide the PBC-treated AA trajectory, the compiled topology, and the new index file:

seq 0 602 | gmx traj -f traj_aa.xtc -s aa.tpr -oxt traj_cg.xtc -n cg_ind.ndx -com -ng 603
seq 0 602 | gmx traj -f traj_aa.xtc -s aa.tpr -oxt cg.gro -n cg_ind.ndx -com -ng 603 -e 0

 pymol cg.gro start.gro

show spheres, cg
set sphere_scale, 0.5, cg

With your mapped trajectory file ready, you can now use gmx distance and gmx angle to extract bonded distributions of the beads. To do this, you'll need to generate index files that specify bonds and angles. Since generating hundreds of atom pairs and triplets will require some scripting, use mkbonded_ind.py to generate index files.

python mkbonded_ind.py

Now you should have index files bonds.ndx and angles.ndx containing all bonds and angles in your system. With these, we can proceed to calculate distributions.

gmx distance -f traj_cg.xtc -len 0.47 -tol 0.3 -n bonds.ndx -oh bond_dist.xvg
gmx angle -n angles.ndx -f traj_cg.xtc -od angle_dist.xvg

Select the bonds in the index file "0" and press Ctrl+D when generating the bond distribution.

xmgrace bond_dist.xvg
xmgrace angle_dist.xvg

Optimization of the CG force field

To obtain bonded parameters for our CG model, we will use the Boltzmann inversion method. In this method, the effective bond and angle potentials between CG sites are derived from bond and angle distributions. To estimate the spring constants for bonds and angles, we will assume that our atomistic distributions are Gaussian.

Question: If we assume that the probability distribution is Gaussian, how does the PMF formula simplify? What is the spring constant?

Run a following scripts for every distribution to get parameters of a harmonic potential.

python getparams.py bond_dist.xvg
python getparams.py angle_dist.xvg

With these parameters, we can now create a topology for the CG representation of dodecane. Change directory and replace "?" in ddec_cg.itp file with appropriate parameters. We also need a coordinate file for the CG simulation. We can use the one we already generated.

cd ../CG
cp ../AA/cg.gro .
gedit ddec_cg.itp

Now run minimization and equilibration of the system. In the first round of simulation, atom types in the coordinate and topology files do not match. To ignore the warning, add the -maxwarn 1 flag.

gmx grompp -f min.mdp -p topol.top -c cg.gro -o min.tpr -maxwarn 1
gmx mdrun -v -deffnm min

gmx grompp -f eq.mdp -p topol.top -c min.gro -o eq.tpr 
gmx mdrun -v -deffnm eq

Now, with the equilibrated system, we can finally run the CG simulation and compare it to the AA simulation

gmx grompp -f md.mdp -p topol.top -c eq.gro -o md.tpr
gmx mdrun -v -deffnm md

Question: By how much is CG simulation faster than AA? Is the speed-up linear with the number of coarse-grained atoms (On average, we replaced 13 atoms with 1 CG bead)?

Visualizing the CG system is a little bit tricky compared to AA because bond lengths in the CG system are generally much larger than in any AA bond, which many software do not recognize. To visualize our system, run the following commands

gmx trjconv -f md.xtc -o cg.pdb -pbc mol -conect yes -s md.tpr -e 0
gmx trjconv -f md.xtc -o traj_cg.xtc -pbc mol -s md.tpr

Press "0" to select the whole system. Now you can open cg.pdb file with pymol. To load trajectory use load_traj command in pymol

Compute bond and angle distributions for our CG system and compare them to the mapped AA distributions using xmgrace.

cp ../AA/*.ndx .
gmx distance -f traj_cg.xtc -len 0.47 -tol 0.3 -n bonds.ndx -oh bond_dist.xvg
gmx angle -n angles.ndx -f traj_cg.xtc -od angle_dist.xvg

xmgrace bond_dist.xvg ../AA/bond_dist.xvg -legend load
xmgrace angle_dist.xvg ../AA/angle_dist.xvg -legend load

Question: How do these distributions compare? Is this a good agreement? Why do we see a discrepancy between AA and CG distributions?

In a real optimization process of the CG force field, parameters of the force field are iteratively updated until the CG distribution matches the AA distribution within some tolerance. Here we can update our parameters manually to improve agreement between distributions. Replace angle in ddec_cg.itp file with 180 degree and set force.c in range between 30-40 kj/mol/rad^2 and rerun CG simulaiton. Compare all 3 distribution

gmx grompp -f md.mdp -p topol.top -c eq.gro -o md.tpr
gmx mdrun -v -deffnm md
gmx trjconv -f md.xtc -o traj_cg.xtc -pbc mol -s md.tpr

gmx angle -n angles.ndx -f traj_cg.xtc -od angle_dist_new.xvg 
xmgrace angle_dist.xvg ../AA/angle_dist.xvg angle_dist_new.xvg -legend load

Comparison of AA and CG system

With the optimized CG force field, we can now compare our system's fundamental statistical mechanical quantities, e.g., the radial distribution function (RDF). To compute and compare the RDF, use the following commands:

gmx rdf -f traj_cg.xtc -excl -s md.tpr -o rdf_cg.xvg
gmx rdf -f ../AA/traj_cg.xtc -excl -s md.tpr -o rdf_aa.xvg

xmgrace rdf* -legend load

For both commands, specify 2 selections to calculate RDF, both as '0', so we calculate the RDF of every CG bead with every other CG bead.

Additionally, we can set RDF as a target for optimizing the non-bonded potential (taken from Martini here).
Question: How do these RDFs compare? Based on this distribution, do you think that additional refinement is needed for non-bonded interactions?

gmx msd -f traj_cg.xtc -o msd_cg.xvg -selrpos res_com -s md.tpr
gmx msd -f ../AA/traj_cg.xtc -o msd_aa.xvg -selrpos res_com -s md.tpr

xmgrace msd* -legend load

Question: In which systm diffusion is faster? Why do you think diffusion is faster? How does this modifies effective time in CG system?

Optional: Partition coefficient P

cd ../US
gmx wham -it tpr_files.dat -if pullf_files.dat -o -hist -unit kT

Practical: Introduction to the coarse grained modeling

Bert de Groot / Maksim Kalutskii

Introduction

Mapping AA trajectory

Optimization of the CG force field

Comparison of AA and CG system

Optional: Partition coefficient P