MDAnalysis: Work with dynamics trajectories of proteins

For a long time crystallographers and subsequently the authors of AlphaFold2 had you believe that proteins are a static group of atoms written to a .pdb file. Turns out this was a HOAX. If you don’t want to miss out on the latest trend of working with dynamic structural ensembles of proteins this blog post is exactly right for you. MDAnalysis is a python package which as the name says was designed to analyse molecular dyanmics simulation and lets you work with trajectories of protein structures easily.

Before we start install MDAnalysis through pip.

pip install MDAnalysis

The main data structure MDAnalysis uses is the Universe. You can create a Universe by loading a trajectory file and a topology file. The trajectory file contains the trajectory of the protein structure as a list of coordinates. Typical formats are .xtc or .dcd. The topology file provides information on the identity of atoms and their connectivity. A pdb file can be used to this purpose.

import MDAnalysis as mda

u = mda.Universe('topology.pdb', 'trajectory.xtc')

The Universe now contains all the information of the protein trajectory and lets you perform operations on it.

Through the Universe specific groups of residues and atoms and individual atoms can be accessed. Residues and atoms can be selected by indexing or a string selection.

# select all residues
print(u.residues)
<ResidueGroup with 259 residues>

# select all C-alphas of alanine and glutamate residues
print(u.select_atoms('resname ALA or resname GLU and name CA'))
<AtomGroup with 25 atoms>

# select first atom
u.atoms[0]
<Atom 1: N of type N of resname ASP, resid 1 and segid P000 and altLoc >

Information can be accessed from ResidueGroup, AtomGroup and Atom objects.

# residue names
u.residues[:3].resnames
array(['ASP', 'ILE', 'GLN'], dtype=object)

# atom coordinates in initial frame
u.atoms[:3].positions
array([[224.13002 ,  30.02    , 729.97003 ],
       [224.34003 ,  29.250002, 730.65    ],
       [223.12    ,  30.260002, 730.04004 ],
       ...,
       [240.33002 ,  61.730003, 749.9     ],
       [240.64001 ,  62.640003, 749.09    ],
       [241.05002 ,  60.720005, 750.13    ]], dtype=float32)

Accessing atom coordinates as above will return the positions in the first frame in the trajectory. If you want to access subsequent frames you need to work with the Universe.trajectory attribute. Information on frames is accessed by iterating through the trajectory or indexing it. At any stage the Universe will only contain information of a single timestep.

### radius of gyration over time
rgyr = []
time = []
protein = u.select_atoms('protein')

for ts in u.trajectory: # loop through individual frames in the trajectory
    time.append(u.trajectory.time)
    rgyr.append(protein.radius_of_gyration())


### backbone RMSD against time
from MDAnalysis.analysis import rms

u.trajectory[0] # access first frame by indexing
bb_0 = u.select_atoms('backbone').positions

time = []
rmsd = []
for ts in u.trajectory:
    time.append(u.trajectory.time)
    bb_t = u.select_atoms('backbone').positions
    rmsd.append(rms.rmsd(bb_0, bb_t))

For more informtioin and tutorials on how to use the package checkout the MDAnalysis documentation: https://www.mdanalysis.org/

Author