6.4. TrpCage Analysis#

Please run this in noto. Upload the zip file from colab into the noto directory space by selecting the folder icon on the left your screen while in noto and then selecting the up arrow (for upload) from the top menu. Then upload your archive.zip file that you generated in the previous exercise.

import sys
sys.path.append("..")
import helpers
from helpers import *
helpers.set_style()
# unzip
!unzip -o archive.zip
import numpy as np

import matplotlib.pyplot as plt

from matplotlib.ticker import MaxNLocator
from scipy import stats

import warnings

6.4.1. Lets load the trajectory files#

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    u = MDAnalysis.Universe("trp_cage.prmtop","trp_cage_gb.dcd")  # always start with a Universe
show_trajectory(u)

Exercise 9

What type of structure is the folded Trp-cage miniprotein? List the main components contributing to this structure, including the residues which are responsible for their formation.

We will also load an experimental NMR structure from an experimental NMR ensemble.

Experimental structures Quick overview

Experimental structures can be obtained using three main methods:

X-Ray diffraction Freeze the proteins and shoot x-rays at it. The diffraction pattern allows to reconstruct the electron density map in which we can fit a protein model. The models are usually very accurate, may have crystal artefacts due to packing and low temperature and can be obtained for proteins from large to small. Some proteins are difficult to crystallize.

NMR In solution structure using nuclear magnetic resonance (usually carbon and hydrogen) using complicated pulse sequences. Allows to resolve dynamical properties of the protein but normally is limited to smaller proteins.

Cryo EM The new kid on the block of structural biology. Works very well especially for large proteins, models are fit into a reconstructed map of the protein. Resolution tends to be lower than using X-Ray crystallography but sample preparation is much easier.

6.4.2. Align trajectory#

Aligning a trajectory is an important step when analysing molecular dynamics simulations. We will choose a reference structure/frame and project each frame to the reference (e.g by minimizing RMSD to reference).

In this case we align on the backbone nitrogen and carbon atoms.

As a reference we use one structure from the experimental NMR ensemble (PDB 1l2y).

from MDAnalysis.analysis import align
ref = MDAnalysis.Universe("1L2Y.pdb",  topology_format="PDB")
align.AlignTraj(u,  # trajectory to align
                ref,  # reference
                select='name CA',  # selection of atoms to align
                filename='aligned.xtc',  # file to write the trajectory to
                match_atoms=True,  # whether to match atoms based on mass
               ).run()
<MDAnalysis.analysis.align.AlignTraj at 0x7ff0fb5426d0>
# load new aligned trajectory
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    u = MDAnalysis.Universe("trp_cage.prmtop", "aligned.xtc")

We can also compute the RMSD to the reference structure using a differnt function.

from MDAnalysis.analysis import rms
R = rms.RMSD(u,  # universe to align
             ref,  # reference universe or atomgroup
             select='backbone',  # group to superimpose and calculate RMSD
             
             ref_frame=0)  # frame index of the reference
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    R.run()

Next, we plot this RMSD.

import pandas as pd
df = pd.DataFrame(R.results.rmsd,
                  columns=['Frame', 'Time (ps)', 'RMSD (A)'])
df["Time (ns)"]=df["Time (ps)"]/1000
ax = df.plot(x='Time (ns)', y=['RMSD (A)'],
             kind='line')
ax.set_title('RMSD')
ax.set_ylabel(r'RMSD ($\AA$)');
ax.set_title('RMSD to NMR structure')
ax.set_xlabel('time [ns]')
plt.show()

an alternative is the RMSF

c_alphas = u.select_atoms('protein and name CA')
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    R = rms.RMSF(c_alphas).run()
fig, ax = plt.subplots(1)

ax.set_title('RMSF')
ax.plot( c_alphas.resids, R.results.rmsf)
ax.xaxis.set_major_locator(MaxNLocator(integer=True))
ax.set_xlim(0,20)
ax.set_xlabel('residue')
ax.set_ylabel('RMSF [$\AA$]')
plt.show()

Exercise 10

Explain the RMSD and RMSF plots. Does the trajectory reach the same conformation as the experimental structure? Which metric is more useful for the problem at hand? Bonus: Provide a use case for the other metric.

6.4.3. Visualize trajectory#

Now let’s compare the trajectory to one of the structures from the NMR structural ensemble to see how our linear Trp cage folded.

show_trajectory([u, "1L2Y.pdb"])

6.4.4. Emergence of secondary structure#

6.4.4.1. Hbonds#

We know that hydrogen bonds are very important for the formation of secondary structure elements. Let’s count the number of hbonds per frame and bin them in 1 ns bins by taking the mean of 10 frames.

from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis as HBA
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    hbonds = HBA(universe=u)
    hbonds.run()
hbond_data = pd.DataFrame(hbonds.results.hbonds,
                  columns=['frame', 'donor index',
        'hydrogen index',
        'acceptor index',
        'distance',
        'angle'])
n_hbonds_per_frame = hbond_data.groupby("frame").size()
# Bin every nanosecond (i.e. 10 snapshots)
statistic, bin_edges, binnumber = stats.binned_statistic(n_hbonds_per_frame.index, n_hbonds_per_frame.values, bins=20)
fig, ax = plt.subplots(1)

times = np.linspace(0,20, num=u.trajectory.n_frames)

ax.set_title('Average number of Hbonds')
ax.plot(statistic)
ax.set_xlabel('time [ns]')
ax.set_ylabel('n hbonds')
plt.show()

Exercise 11

Include the hbond graph in your report, and explain the observed trend with reference to the structural components of the Trp-cage miniprotein ?

Exercise 12

Perform the Q1 and Q2 analysis explained below and provide the graph of the number of contacts. Can you infer at which interval (in nanoseconds) the secondary structure forms?

Here we calculate a Q1 vs Q2 plot, where Q1 refers to fraction of native contacts along a trajectory with reference to the first frame, and Q2 represents the fraction of native contacts with reference to the last.

from MDAnalysis.analysis import distances, contacts
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    q1q2 = contacts.q1q2(u, 'name CA', radius=8).run()
q1q2_df = pd.DataFrame(q1q2.results.timeseries,
                       columns=['Frame',
                                'Q1',
                                'Q2'])
q1q2_df['Frame']/=10
fig, ax = plt.subplots(1)

q1q2_df.plot(x='Frame', ax=ax)
ax.set_ylabel('Fraction of native contacts');

ax.set_title('Native contacts')
ax.set_xlabel('time [ns]')
plt.show()

6.4.4.2. Dihedral#

The central tryptophan in the Trp cage protein is located in the alpha helix. Here we track the dihedral angle along our simulation. When the protein forms ordered secondary structure elements we expect to see the dihedral only fluctuate a little.

from MDAnalysis.analysis import dihedrals
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    dihs = dihedrals.Dihedral( [u.residues[6].phi_selection()]).run()
fig, ax = plt.subplots(1)


ax.set_title('Trp6 $\phi$ dihedral')
ax.plot(np.linspace(0,20,num=200),dihs.results.angles.flatten())
ax.set_xlabel('time [ns]')
ax.set_ylabel('dihedral $[degree]$')
plt.show()

Exercise 13

Why is it useful to constrain bond lengths for larger MD simulations (typically with the SHAKE algorithm)? Which bonds would you typically constrain in such a scenario, and why?

Exercise 14 - Bonus

Which properties do you need to take into account in order to select an appropriate timestep for your MD simulation? Are there any other reasons you might wish to reduce or increase this timestep?

Exercise 15 - Bonus

Is it better to sample 2 x 10 ns from the same starting structure or 1 x 20 ns in order to explore conformational space efficiently?