ToyMD

5.2. ToyMD#

5.2.1. How to run the exercises#

We provide the correct version of the code from last week, feel free to use it for this exercise to be sure to start from a correct code, which you will further modify in this exercise.

How to access the corrected version of the code:

Access the NVE version of the code on Moodle (“ToyMD Code from last week” file)
Download the zip file
Upload it on the Noto repository (Ex5 folder)
Unzip the file with the following command:

!unzip ToyMD.zip

Adapt permissions to be able to run the code

!chmod -R 700 ./ToyMD

import sys
sys.path.append("..")
import helpers
from helpers import *
helpers.set_style()
import matplotlib.pyplot as plt
import numpy as np
import MDAnalysis
from MDAnalysis.analysis import rdf
from scipy.spatial import distance

5.3. Exercises#

5.3.1. MD Initialization and Temperature#

Exercise 1

Implement the initialization described in the Theory section in our ToyMD code in toy_md.py in the section ##INITIALIZATION HERE##. For this you should pick random velocities using np.random.randn(dim1,dim2) which will produce random numbers between 0 and 1. You need to shift this random gaussian distribution appropriately and then multiply it such that the width of the velocity distribution matches the kinetic energy at the target temperature. Use the variables masses[i] for the mass of particle i and Boltzmanns constant 0.00831415. Remember that for each degree of freedom (e.g velocity in x direction) (5.22) holds.

Exercise 2

Implement the Berendsen thermostat in the toy_md.py and toy_md_integrate.py (change the compute_lambda_T function) files.

Exercise 3

A better thermostat is the Andersen thermostat. It can be implemented as follows. Describe what problems this thermostat will present to us e.g for sampling of diffusion coefficients. What advantage does this thermostat have compared to the Berendsen thermostat?

#Andersen Thermostat
for i in range(N):
    if (r.random()<float(md_params["tau-T"])*float(md_params['time-step'])):# pick random particles according to nu
        sd=np.sqrt(0.00831415*float(md_params['temperature'])/masses[i])  # Velocity distribution at target temperature
        velocities[i]=0+np.random.randn(1,3)*sd # set new random velocities

5.3.2. Force field#

Now we can run a MD simulation in our desired NVT ensemble using our code on simple systems. Let’s simulate a box of CO2 molecules.
Investigate the force_field.txt file in the carbon-dioxide folder and then run a short molecular dynamics simulation using the following parameters:

number-of-steps  2000 # Number of integration time steps
time-step        0.001  # Integration time step (picoseconds)
temperature      50  # Simulation temperature
tau-T            0.05 # Temperature coupling time (picoseconds)
output-frequency  10   # Store coordinates every N steps

The simulation is run using the following command. Make sure to correctly adapt the path to the toy_md.py code and the input files.

!path/to/toy_md.py -c path/to/co2.pdb -p path/to/params.txt -f path/to/force_field.txt -o path/to/co2-traj.pdb -w path/to/co2-final.pdb

where co2-traj.pdb is a trajectory written each output-frequency steps and co2-final.pdb is the final geometry after the simulation has run the specified number of steps.

Exercise 4

Visualize the trajectory and visualize the distance of a C-O bond using the code cells below. We will use the bond length of the force field as a reference. Explain the fluctuations that you observe. What does the average value correspond to?

5.3.2.1. Visualization of the trajectory#

Here you can visualize the trajectory to see if your simulation is running correctly. A quick visualization often highlights if the parameters/simulation are correct.

# ADJUST THE PATHS AS NECESSARY TO MATCH THE NAME OF YOUR FOLDERS
traj = MDAnalysis.Universe('./path/to/co2-traj.pdb')
show_trajectory(traj)

carbon = traj.select_atoms("resid 1 and name C")
oxygen1 = traj.select_atoms("resid 1 and name O1")

carbon = traj.select_atoms("resid 1 and name C")
oxygen2 = traj.select_atoms("resid 1 and name O2")

carbon_oxygen1_dist = []
carbon_oxygen2_dist = []
for ts in traj.trajectory:
    carbon_oxygen1_dist.append(distance.euclidean(carbon.positions[0], oxygen1.positions[0]))
    carbon_oxygen2_dist.append(distance.euclidean(carbon.positions[0], oxygen2.positions[0]))

forcefield = 0  #check force_field.txt in ToyMD folder for this value # modify here

fig, ax = plt.subplots(1)

ax.plot(carbon_oxygen1_dist, label="C-O1")
ax.plot(carbon_oxygen2_dist, label="C-O2")
ax.axhline(forcefield, label="distance from forcefield", color='black')
ax.set_xlabel("timestep")
ax.set_ylabel(r'distance ($\AA$)')
ax.set_title('Distance')
ax.legend()

plt.show()

Exercise 5

Visualize the radial distribution function of this system using below code. What do you observe?

5.3.2.2. Compute RDF#

We can use the MDAnalysis library to compute the radial distribution function.

oxygens = traj.select_atoms("name O1" or "name 02")
carbons = traj.select_atoms("name C")

rdf = MDAnalysis.analysis.rdf.InterRDF(carbons, oxygens)
rdf.run()

fix, ax = plt.subplots()
plt.plot(rdf.results.bins, rdf.results.rdf)
ax.set_xlabel('distance r (angstrom)')
ax.set_ylabel('radial probability density g(r)')
ax.set_title('RDF Oxygen atoms to Carbon atoms')
plt.show()

Bonus Exercise 6

How would the radial distribution function look like for a heterogenous selection? Below is the code for plotting the RDF of solvent vs different amino acids in a solvated protein (GB1) system. Based on the plot you generate, discuss the difference in solvation patterns around the phenylalanine (F) amino acid sites and around the aspartate (D) amino acid sites. Why do you think these differences exist? Take a look on the PDB Data Bank site to view the relative positions of these amino acids: 3D GB1

traj_gb1 = MDAnalysis.Universe('./ToyMD/carbon-dioxide/gb1.pdb')

#selection of different groups of atoms to compare
#select the solvent atoms 
solvent = traj_gb1.select_atoms("resname WAT")
#selects the atoms which run along the protein sequence (nitrogen, alpha-carbon, carbon)
solute_backbone = traj_gb1.select_atoms("backbone")
#select the atoms which comprise the specific amino acids
solute_PHE = traj_gb1.select_atoms("resname PHE")
solute_ASP = traj_gb1.select_atoms("resname ASP")

rdf0 =  MDAnalysis.analysis.rdf.InterRDF(solvent, solute_backbone)
rdf0.run()
rdf1 = MDAnalysis.analysis.rdf.InterRDF(solvent, solute_PHE)
rdf1.run()
rdf2 = MDAnalysis.analysis.rdf.InterRDF(solvent, solute_ASP)
rdf2.run()

fix, ax = plt.subplots()

plt.plot(rdf0.results.bins, rdf0.results.rdf, color="black", label="Backbone")
plt.plot(rdf1.results.bins, rdf1.results.rdf, color="blue", label="PHE")
plt.plot(rdf2.results.bins, rdf2.results.rdf, color="red", label="ASP")

ax.set_xlabel('distance r ($\AA$)')
ax.set_ylabel('radial probability density g(r)')
ax.set_title('RDF Solvent to Different Amino Acid Groups')
ax.set_xlim(0,15)
ax.set_ylim(0,1)
ax.legend()

plt.show()

5.3.3. Timestep and coupling#

Exercise 7

Test the influence of the coupling parameter \(\tau\) and timestep \(dt\). Test the combinations of:

large \(\tau\) and small \(dt\)
large \(dt\) and small \(\tau\)
very large \(dt\) and \(\tau\)

For some of these settings the system may explode (you can see that because really high energies and temperatures are reached and the simulation gets stuck). If this is the case, you will not be able to complete the simulation, but you can just restart the notebook and use new parameters. In your report, even for those parameters add a comment and explain the behaviour you observe.

Hint: To check the influence of the coupling parameter \(\tau\), you can plot the instantaneous temperatures for several runs using different \(\tau\) values.
For this, you need to rename the logfile produced after running toy_md (to avoid overwritting it with the next run). You can do this by executing this command on the terminal: cp <path_to_logfile>/logfile <path_to_logfile>/logfile_tau_<value_of_tau>. To then read the temperatures from each logfile, you can use the function read_temperatures shown below.

Note that the code cell for plotting already suggests you a possible set of values of \(\tau\) and \(dt\), adapt the values of the parameters and plot different/more curves.

The simulation is run using the following command (for this test, you can run a shorter simulation, i.e. decreasing the number of steps). Make sure to correctly adapt the path to the toy_md.py code and the input files.

!path/to/toy_md.py -c path/to/co2.pdb -p path/to/params.txt -f path/to/force_field.txt -o path/to/co2-traj.pdb -w path/to/co2-final.pdb

!cp path/to/logfile path/to/logfile_tau_<value_of_tau>_dt_<value_of_dt>

def read_temperatures(
    logfile_path: str,
) -> list:
    """
    Read temperatures from logfile and return as list.
    Args:
        logfile_path (str): path of logfile
    Returns:
        temps (list): list of temperatures for every simulation step
    """
    with open(logfile_path) as f:
        logfile = f.readlines()

    temps = [float(line.split(',')[4]) for line in logfile[1:] if line[0:5]!='step:']
    return temps

# Code for plotting the instantaneous temperatures for several runs using different $\tau$ values.

# Read temperatures
temps_1 = read_temperatures(logfile_path = './ToyMD/carbon-dioxide/logfile') # modify logfile_path with the path to your files!
# Add here similar lines for reading log files from simualtions with different parameters

fig, ax =plt.subplots(1)

ax.plot(temps_1, label=r'$\tau$ = 0.5, dt = 0.001')
# Add here similar lines for reading log files from simualtions with different parameters

ax.set_xlabel(r"Simulation step")
ax.set_ylabel(r"T (K)")
ax.set_title('Temperature variation')
plt.legend()
plt.show()