{"cells": [{"cell_type": "markdown", "metadata": {"id": "hvi6vErkJqf5"}, "source": ["# TrpCage Analysis"]}, {"cell_type": "markdown", "metadata": {"id": "a6bnIklPJqf8"}, "source": ["Please run this in noto. Upload the zip file from colab into the noto directory space by selecting the folder icon on the left your screen while in noto and then selecting the up arrow (for upload) from the top menu. Then upload your archive.zip file that you generated in the previous exercise.\n"]}, {"cell_type": "code", "execution_count": 58, "metadata": {"colab": {"base_uri": "https://localhost:8080/", "height": 17}, "id": "-ljErnHeJqf9", "outputId": "844d6933-8ef3-4745-8eb0-18ecc70a0b29"}, "outputs": [{"data": {"text/html": [""], "text/plain": [""]}, "execution_count": 58, "metadata": {}, "output_type": "execute_result"}], "source": ["import sys\n", "sys.path.append(\"..\")\n", "import helpers\n", "from helpers import *\n", "helpers.set_style()"]}, {"cell_type": "code", "execution_count": null, "metadata": {"colab": {"base_uri": "https://localhost:8080/"}, "id": "TpIQYwEWJqf_", "outputId": "f43de4d8-b22f-4e0e-b46f-b138f61a8c80"}, "outputs": [], "source": ["# unzip\n", "!unzip -o archive.zip"]}, {"cell_type": "code", "execution_count": 61, "metadata": {"colab": {"base_uri": "https://localhost:8080/", "height": 17, "referenced_widgets": ["e824c678dadb4d92a59cd8d977878248", "ad417a32fd1b4dbdaa12f3afc94016de"]}, "id": "Z33OdAQoJqf_", "outputId": "738cca00-3a3b-403e-811a-51eac67e28d8"}, "outputs": [], "source": ["import numpy as np\n", "\n", "import matplotlib.pyplot as plt\n", "\n", "from matplotlib.ticker import MaxNLocator\n", "from scipy import stats\n", "\n", "import warnings\n"]}, {"cell_type": "markdown", "metadata": {"id": "CuTvI_MiJqgA"}, "source": ["## Lets load the trajectory files"]}, {"cell_type": "code", "execution_count": 62, "metadata": {"colab": {"base_uri": "https://localhost:8080/"}, "id": "u3ruKACTiUX9", "outputId": "3017d98d-a7c1-4fa6-97b1-cd4e097fc6b2"}, "outputs": [], "source": ["with warnings.catch_warnings():\n", " warnings.simplefilter(\"ignore\")\n", " u = MDAnalysis.Universe(\"trp_cage.prmtop\",\"trp_cage_gb.dcd\") # always start with a Universe"]}, {"cell_type": "code", "execution_count": 63, "metadata": {"colab": {"base_uri": "https://localhost:8080/", "height": 337, "referenced_widgets": ["a752d572151447c993db873e4ef115e2", "c0bd1388e9ac43f99eb0005b71fd20b8", "f32d68edcfc340cf9f5a8f6862edb8c1", "f411027e049b41d7a7c5ca1221f2ebc4", "28b0a0c5809c44219f76a3a30521ac07", "10dfd76d5194415592db9960570a5330", "0b939e7c8d4a49519da7053dc637cdb2", "4056c6920fd74baaae53e474d4862890", "f8756c75f72c4da6ad4cf0b2aef208af", "11a8fbf552dc4145aaf57e54aa8fdbaf", "6ed77c418a734ee683301fb9e553c880", "8933405dd4f046ee99eaf93d470abc6d", "cae18757408b4e628e1303081774a36d"]}, "id": "9NWvCYbRi5Jn", "outputId": "40f6a90e-ea8f-4412-ab6b-9c30bfdffa47"}, "outputs": [{"data": {"application/vnd.jupyter.widget-view+json": {"model_id": "bbda5ef5b2d74e64b839494eaedef7a3", "version_major": 2, "version_minor": 0}, "text/plain": ["Tab(children=(VBox(children=(HBox(children=(Play(value=0, description='Press play', max=200, show_repeat=False\u2026"]}, "execution_count": 63, "metadata": {}, "output_type": "execute_result"}], "source": ["show_trajectory(u)"]}, {"cell_type": "markdown", "metadata": {"id": "9T65KAfCJqgC"}, "source": ["\n", ":::{admonition} Exercise 9\n", ":class: exercise\n", "What type of structure is the folded Trp-cage miniprotein? List the main components contributing to this structure, including the residues which are responsible for their formation.\n", ":::"]}, {"cell_type": "markdown", "metadata": {"id": "572_D5AbJqgD"}, "source": ["We will also load an experimental NMR structure from an experimental NMR ensemble."]}, {"cell_type": "markdown", "metadata": {"id": "iz1Sgso4JqgD"}, "source": ["```{dropdown} Experimental structures Quick overview\n", "Experimental structures can be obtained using three main methods:
\n", "**X-Ray diffraction**\n", "Freeze the proteins and shoot x-rays at it. The diffraction pattern allows to reconstruct the electron density map in which we can fit a protein model.\n", "The models are usually very accurate, may have crystal artefacts due to packing and low temperature and can be obtained for proteins from large to small. Some proteins are difficult to crystallize.
\n", "**NMR**\n", "In solution structure using nuclear magnetic resonance (usually carbon and hydrogen) using complicated pulse sequences. Allows to resolve dynamical properties of the protein but normally is limited to smaller proteins.
\n", "**Cryo EM**\n", "The new kid on the block of structural biology. Works very well especially for large proteins, models are fit into a reconstructed map of the protein. Resolution tends to be lower than using X-Ray crystallography but sample preparation is much easier.\n", "```"]}, {"cell_type": "markdown", "metadata": {"id": "dB7wITMAJqgE"}, "source": ["## Align trajectory\n", "\n", "Aligning a trajectory is an important step when analysing molecular dynamics simulations. We will choose a reference structure/frame and project each frame to the reference (e.g by minimizing RMSD to reference). \n", "\n", "In this case we align on the backbone nitrogen and carbon atoms. \n", "\n", "As a reference we use one structure from the experimental NMR ensemble (PDB `1l2y`)."]}, {"cell_type": "code", "execution_count": 64, "metadata": {"id": "xwAkYcTZk3sP"}, "outputs": [], "source": ["from MDAnalysis.analysis import align"]}, {"cell_type": "code", "execution_count": 65, "metadata": {"id": "CbUPXpFmjqan"}, "outputs": [], "source": ["ref = MDAnalysis.Universe(\"1L2Y.pdb\", topology_format=\"PDB\")"]}, {"cell_type": "code", "execution_count": 66, "metadata": {"colab": {"base_uri": "https://localhost:8080/"}, "id": "s_l0llz_ksN_", "outputId": "a31bb090-f6ee-465a-844d-b5b8ca382718"}, "outputs": [{"data": {"text/plain": [""]}, "execution_count": 66, "metadata": {}, "output_type": "execute_result"}], "source": ["align.AlignTraj(u, # trajectory to align\n", " ref, # reference\n", " select='name CA', # selection of atoms to align\n", " filename='aligned.xtc', # file to write the trajectory to\n", " match_atoms=True, # whether to match atoms based on mass\n", " ).run()"]}, {"cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [], "source": ["# load new aligned trajectory\n", "with warnings.catch_warnings():\n", " warnings.simplefilter(\"ignore\")\n", " u = MDAnalysis.Universe(\"trp_cage.prmtop\", \"aligned.xtc\")"]}, {"cell_type": "markdown", "metadata": {"id": "bwkHdUb-JqgF"}, "source": ["We can also compute the RMSD to the reference structure using a differnt function. "]}, {"cell_type": "code", "execution_count": 68, "metadata": {"id": "lg4SP4Itr3TV"}, "outputs": [], "source": ["from MDAnalysis.analysis import rms"]}, {"cell_type": "code", "execution_count": 127, "metadata": {"colab": {"base_uri": "https://localhost:8080/"}, "id": "k-bnM3Tklp5k", "outputId": "65bc38ad-f60c-45da-fcfe-9bc6ee020158"}, "outputs": [], "source": ["R = rms.RMSD(u, # universe to align\n", " ref, # reference universe or atomgroup\n", " select='backbone', # group to superimpose and calculate RMSD\n", " \n", " ref_frame=0) # frame index of the reference\n", "with warnings.catch_warnings():\n", " warnings.simplefilter(\"ignore\")\n", " R.run()"]}, {"cell_type": "markdown", "metadata": {"id": "cVwscATAJqgF"}, "source": ["Next, we plot this RMSD."]}, {"cell_type": "code", "execution_count": 71, "metadata": {"id": "hcGIrvLxtjlp"}, "outputs": [], "source": ["import pandas as pd"]}, {"cell_type": "code", "execution_count": 129, "metadata": {"colab": {"base_uri": "https://localhost:8080/", "height": 167}, "id": "k5tfsmSctl5m", "outputId": "4fc43bd6-f79a-4f61-b1fc-fbb40c5c6d3e"}, "outputs": [], "source": ["df = pd.DataFrame(R.results.rmsd,\n", " columns=['Frame', 'Time (ps)', 'RMSD (A)'])\n", "df[\"Time (ns)\"]=df[\"Time (ps)\"]/1000"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["ax = df.plot(x='Time (ns)', y=['RMSD (A)'],\n", " kind='line')\n", "ax.set_title('RMSD')\n", "ax.set_ylabel(r'RMSD ($\\AA$)');\n", "ax.set_title('RMSD to NMR structure')\n", "ax.set_xlabel('time [ns]')\n", "plt.show()"]}, {"cell_type": "markdown", "metadata": {"id": "IupJtfyzJqgG"}, "source": ["an alternative is the RMSF"]}, {"cell_type": "code", "execution_count": 76, "metadata": {"id": "bcyPN-QomN3l"}, "outputs": [], "source": ["c_alphas = u.select_atoms('protein and name CA')\n", "with warnings.catch_warnings():\n", " warnings.simplefilter(\"ignore\")\n", " R = rms.RMSF(c_alphas).run()\n", "\n"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["fig, ax = plt.subplots(1)\n", "\n", "ax.set_title('RMSF')\n", "ax.plot( c_alphas.resids, R.results.rmsf)\n", "ax.xaxis.set_major_locator(MaxNLocator(integer=True))\n", "ax.set_xlim(0,20)\n", "ax.set_xlabel('residue')\n", "ax.set_ylabel('RMSF [$\\AA$]')\n", "plt.show()"]}, {"cell_type": "markdown", "metadata": {"id": "ZThTHkpvJqgH"}, "source": [":::{admonition} Exercise 10\n", ":class: exercise\n", "Explain the RMSD and RMSF plots. Does the trajectory reach the same conformation as the experimental structure?\n", "Which metric is more useful for the problem at hand? Bonus: Provide a use case for the other metric. \n", ":::"]}, {"cell_type": "markdown", "metadata": {"id": "xMhwQdmVJqgI", "tags": []}, "source": ["## Visualize trajectory\n", "\n", "Now let's compare the trajectory to one of the structures from the NMR structural ensemble to see how our linear Trp cage folded. "]}, {"cell_type": "code", "execution_count": 79, "metadata": {"colab": {"base_uri": "https://localhost:8080/", "height": 317, "referenced_widgets": ["62ed3f8acd8e491ea3746919f125c101", "60e0a4215f23405c93e42eea09f6bbbd", "bc949038d9de4d3f83a66b07be9cce8e", "f9a5508a49414b429c5d649db2bd9d19", "f2cbacc28bdf4ed6834c1bec2d7d0bc6", "59d5acf929904e809c2f200a336d2e77", "10da77d882774a689be6422c137d006a", "dca757bacf144e5991731f4eceed9eeb", "09b344d0ccda4712a41de3e725661553", "af33de415dae483abeaaede6eb5a742a", "ce09571b9e4d4fa5abbb63b0eefff28b", "15da5ae50d7740d1a8630f90de1889a0", "d4cb5c1af477484e8f97cdc6f56febb6"]}, "id": "NqjR5hj-JqgI", "outputId": "364e374e-ec5b-4de0-9e26-89c86ca31ab7"}, "outputs": [{"data": {"application/vnd.jupyter.widget-view+json": {"model_id": "986a66a680904f75b452913ff452006b", "version_major": 2, "version_minor": 0}, "text/plain": ["Tab(children=(VBox(children=(HBox(children=(Play(value=0, description='Press play', max=200, show_repeat=False\u2026"]}, "execution_count": 79, "metadata": {}, "output_type": "execute_result"}], "source": ["show_trajectory([u, \"1L2Y.pdb\"])"]}, {"cell_type": "markdown", "metadata": {"id": "d5nhx3sFJqgJ"}, "source": ["## Emergence of secondary structure\n", "### Hbonds"]}, {"cell_type": "markdown", "metadata": {"id": "F__3WCHBJqgJ"}, "source": ["We know that hydrogen bonds are very important for the formation of secondary structure elements. Let's count the number of hbonds per frame and bin them in 1 ns bins by taking the mean of 10 frames. "]}, {"cell_type": "code", "execution_count": 48, "metadata": {"id": "fp5vo3Z0JqgJ"}, "outputs": [], "source": ["from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis as HBA\n"]}, {"cell_type": "code", "execution_count": 80, "metadata": {"colab": {"base_uri": "https://localhost:8080/"}, "id": "FFQOE1FQxQgN", "outputId": "fbe685e0-4041-4a6d-b242-09e3f6c234b2"}, "outputs": [], "source": ["with warnings.catch_warnings():\n", " warnings.simplefilter(\"ignore\")\n", " hbonds = HBA(universe=u)\n", " hbonds.run()"]}, {"cell_type": "code", "execution_count": 130, "metadata": {"colab": {"base_uri": "https://localhost:8080/", "height": 423}, "id": "0ghHzXEoxb8p", "outputId": "512e6948-f4ae-4089-e4b9-241e969eace4"}, "outputs": [], "source": ["hbond_data = pd.DataFrame(hbonds.results.hbonds,\n", " columns=['frame', 'donor index',\n", " 'hydrogen index',\n", " 'acceptor index',\n", " 'distance',\n", " 'angle'])"]}, {"cell_type": "code", "execution_count": 102, "metadata": {"colab": {"base_uri": "https://localhost:8080/"}, "id": "wIeZDQ1SJqgK", "outputId": "c7b29f70-727a-404b-faad-a58046afc6ab"}, "outputs": [], "source": ["n_hbonds_per_frame = hbond_data.groupby(\"frame\").size()\n"]}, {"cell_type": "code", "execution_count": 98, "metadata": {"id": "QtSfao_tJqgK"}, "outputs": [], "source": ["# Bin every nanosecond (i.e. 10 snapshots)\n", "statistic, bin_edges, binnumber = stats.binned_statistic(n_hbonds_per_frame.index, n_hbonds_per_frame.values, bins=20)"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["fig, ax = plt.subplots(1)\n", "\n", "times = np.linspace(0,20, num=u.trajectory.n_frames)\n", "\n", "ax.set_title('Average number of Hbonds')\n", "ax.plot(statistic)\n", "ax.set_xlabel('time [ns]')\n", "ax.set_ylabel('n hbonds')\n", "plt.show()"]}, {"cell_type": "markdown", "metadata": {"id": "F5nUFLEiJqgK"}, "source": [":::{admonition} Exercise 11\n", ":class: exercise\n", " Include the hbond graph in your report, and explain the observed trend with reference to the structural components of the Trp-cage miniprotein ?\n", ":::"]}, {"cell_type": "markdown", "metadata": {"id": "WWDhHFmdJqgK"}, "source": [":::{admonition} Exercise 12\n", ":class: exercise\n", "Perform the Q1 and Q2 analysis explained below and provide the graph of the number of contacts. Can you infer at which interval (in nanoseconds) the secondary structure forms?\n", ":::"]}, {"cell_type": "markdown", "metadata": {"id": "z_rWEH0r2sdZ"}, "source": ["Here we calculate a Q1 vs Q2 plot, where Q1 refers to fraction of native contacts along a trajectory with reference to the first frame, and Q2 represents the fraction of native contacts with reference to the last."]}, {"cell_type": "code", "execution_count": 100, "metadata": {"id": "TAkVgpd31ESs"}, "outputs": [], "source": ["from MDAnalysis.analysis import distances, contacts\n"]}, {"cell_type": "code", "execution_count": 103, "metadata": {"colab": {"base_uri": "https://localhost:8080/"}, "id": "ijPPsBdW1JAa", "outputId": "a9c16002-1472-4943-86ab-1acbff413031"}, "outputs": [], "source": ["with warnings.catch_warnings():\n", " warnings.simplefilter(\"ignore\")\n", " q1q2 = contacts.q1q2(u, 'name CA', radius=8).run()\n"]}, {"cell_type": "code", "execution_count": 107, "metadata": {"colab": {"base_uri": "https://localhost:8080/", "height": 206}, "id": "hW6ZBhBn2PhA", "outputId": "a9b7c452-faa3-4c8e-d613-4c18955afd3a"}, "outputs": [], "source": ["q1q2_df = pd.DataFrame(q1q2.results.timeseries,\n", " columns=['Frame',\n", " 'Q1',\n", " 'Q2'])\n", "q1q2_df['Frame']/=10"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["fig, ax = plt.subplots(1)\n", "\n", "q1q2_df.plot(x='Frame', ax=ax)\n", "ax.set_ylabel('Fraction of native contacts');\n", "\n", "ax.set_title('Native contacts')\n", "ax.set_xlabel('time [ns]')\n", "plt.show()"]}, {"cell_type": "markdown", "metadata": {"id": "RNOQk4NoJqgL"}, "source": ["### Dihedral\n", "\n", "The central tryptophan in the Trp cage protein is located in the alpha helix. Here we track the dihedral angle along our simulation. When the protein forms ordered secondary structure elements we expect to see the dihedral only fluctuate a little. "]}, {"cell_type": "code", "execution_count": 111, "metadata": {"id": "JICJYdlC3aSM"}, "outputs": [], "source": ["from MDAnalysis.analysis import dihedrals\n"]}, {"cell_type": "code", "execution_count": 113, "metadata": {"id": "6RVXIe2x3p7u"}, "outputs": [], "source": ["with warnings.catch_warnings():\n", " warnings.simplefilter(\"ignore\")\n", " dihs = dihedrals.Dihedral( [u.residues[6].phi_selection()]).run()\n"]}, {"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": ["fig, ax = plt.subplots(1)\n", "\n", "\n", "ax.set_title('Trp6 $\\phi$ dihedral')\n", "ax.plot(np.linspace(0,20,num=200),dihs.results.angles.flatten())\n", "ax.set_xlabel('time [ns]')\n", "ax.set_ylabel('dihedral $[degree]$')\n", "plt.show()"]}, {"cell_type": "markdown", "metadata": {"id": "Qq9FQTzAJqgM"}, "source": [":::{admonition} Exercise 13\n", ":class: exercise\n", "Why is it useful to constrain bond lengths for larger MD simulations (typically with the SHAKE algorithm)? Which bonds would you typically constrain in such a scenario, and why?\n", ":::"]}, {"cell_type": "markdown", "metadata": {"id": "D0Zvg-6EJqgM"}, "source": [":::{admonition} Exercise 14 - Bonus\n", ":class: exercise\n", "Which properties do you need to take into account in order to select an appropriate timestep for your MD simulation? Are there any other reasons you might wish to reduce or increase this timestep?\n", ":::"]}, {"cell_type": "markdown", "metadata": {"id": "Uh75bwa-JqgN"}, "source": [":::{admonition} Exercise 15 - Bonus\n", ":class: exercise\n", "Is it better to sample 2 x 10 ns from the same starting structure or 1 x 20 ns in order to explore conformational space efficiently? \n", ":::"]}], "metadata": {"celltoolbar": "Tags", "colab": {"provenance": []}, "file_extension": ".py", "gpuClass": "standard", "hide_input": false, "kernelspec": {"display_name": "Computational Chemistry", "language": "python", "name": "compchem"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.12"}, "mimetype": "text/x-python", "name": "python", "npconvert_exporter": "python", "pygments_lexer": "ipython3", "varInspector": {"cols": {"lenName": 16, "lenType": 16, "lenVar": 40}, "kernels_config": {"python": {"delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())"}, "r": {"delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) "}}, "types_to_exclude": ["module", "function", "builtin_function_or_method", "instance", "_Feature"], "window_display": false}, "version": 3, "widgets": {"application/vnd.jupyter.widget-state+json": {"state": {}, "version_major": 2, "version_minor": 0}}}, "nbformat": 4, "nbformat_minor": 4}