For a long time crystallographers and subsequently the authors of AlphaFold2 had you believe that proteins are a static group of atoms written to a .pdb file. Turns out this was a HOAX. If you don’t want to miss out on the latest trend of working with dynamic structural ensembles of proteins this blog post is exactly right for you. MDAnalysis is a python package which as the name says was designed to analyse molecular dyanmics simulation and lets you work with trajectories of protein structures easily.
Before we start install MDAnalysis through pip.
pip install MDAnalysis
The main data structure MDAnalysis uses is the Universe
. You can create a Universe
by loading a trajectory file and a topology file. The trajectory file contains the trajectory of the protein structure as a list of coordinates. Typical formats are .xtc or .dcd. The topology file provides information on the identity of atoms and their connectivity. A pdb file can be used to this purpose.
import MDAnalysis as mda
u = mda.Universe('topology.pdb', 'trajectory.xtc')
The Universe
now contains all the information of the protein trajectory and lets you perform operations on it.
Through the Universe
specific groups of residues and atoms and individual atoms can be accessed. Residues and atoms can be selected by indexing or a string selection.
# select all residues
print(u.residues)
<ResidueGroup with 259 residues>
# select all C-alphas of alanine and glutamate residues
print(u.select_atoms('resname ALA or resname GLU and name CA'))
<AtomGroup with 25 atoms>
# select first atom
u.atoms[0]
<Atom 1: N of type N of resname ASP, resid 1 and segid P000 and altLoc >
Information can be accessed from ResidueGroup
, AtomGroup
and Atom
objects.
# residue names
u.residues[:3].resnames
array(['ASP', 'ILE', 'GLN'], dtype=object)
# atom coordinates in initial frame
u.atoms[:3].positions
array([[224.13002 , 30.02 , 729.97003 ],
[224.34003 , 29.250002, 730.65 ],
[223.12 , 30.260002, 730.04004 ],
...,
[240.33002 , 61.730003, 749.9 ],
[240.64001 , 62.640003, 749.09 ],
[241.05002 , 60.720005, 750.13 ]], dtype=float32)
Accessing atom coordinates as above will return the positions in the first frame in the trajectory. If you want to access subsequent frames you need to work with the Universe.trajectory attribute. Information on frames is accessed by iterating through the trajectory or indexing it. At any stage the Universe
will only contain information of a single timestep.
### radius of gyration over time
rgyr = []
time = []
protein = u.select_atoms('protein')
for ts in u.trajectory: # loop through individual frames in the trajectory
time.append(u.trajectory.time)
rgyr.append(protein.radius_of_gyration())
### backbone RMSD against time
from MDAnalysis.analysis import rms
u.trajectory[0] # access first frame by indexing
bb_0 = u.select_atoms('backbone').positions
time = []
rmsd = []
for ts in u.trajectory:
time.append(u.trajectory.time)
bb_t = u.select_atoms('backbone').positions
rmsd.append(rms.rmsd(bb_0, bb_t))
For more informtioin and tutorials on how to use the package checkout the MDAnalysis documentation: https://www.mdanalysis.org/