No one has ever observed a PDB file in nature. The experimental data we build our protein models from is not quite so nicely parameterised. The vast majority of models are fit into electron density maps, mostly produced by macromolecular crystallography or cryo-EM.
Now admittedly even these electron density maps are not the raw experimental data either, but they’re a lot closer than models. So it’s worth knowing how to handle them.
We’ll be covering a few simple ways to load this data into friendly Python formats that you can fiddle with using the standard numpy-based scientific Python stack.
Electron density data is primarily served in the .ccp4 format. If you’re coming from crystallography you are also likely to need to work with the reflections, which are primarily stored in .mtz files. There are three main libraries in Python for this:
Library | ccp4 | mtz |
cctbx | Yes | Yes |
GridDataFormats | Yes (limited) | No |
clipper | Yes | Yes |
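To make the .ccp4 format a little less mysterious, here is a minimal sketch that decodes the first few header fields using only the standard library. It assumes a little-endian file laid out according to the usual CCP4/MRC header convention (a 1024-byte header with the 'MAP ' tag at byte 208). The function name parse_ccp4_header and the dictionary keys are made up for this illustration and are not part of any of the libraries above.

```python
import struct

HEADER_SIZE = 1024  # fixed-size header at the start of a CCP4/MRC map


def parse_ccp4_header(header: bytes) -> dict:
    """Decode the leading fields of a CCP4/MRC map header (illustrative helper)."""
    if len(header) < HEADER_SIZE:
        raise ValueError("expected at least 1024 header bytes")
    # Word 53 (byte offset 208) carries the ASCII tag 'MAP '.
    if header[208:212] != b"MAP ":
        raise ValueError("no 'MAP ' tag found: not a CCP4/MRC map?")
    # Assumption: little-endian byte order ('<'). A robust reader would
    # inspect the machine stamp that follows the 'MAP ' tag instead.
    nc, nr, ns, mode = struct.unpack_from("<4i", header, 0)
    start = struct.unpack_from("<3i", header, 16)       # first column/row/section
    sampling = struct.unpack_from("<3i", header, 28)    # grid intervals along the cell axes
    cell = struct.unpack_from("<6f", header, 40)        # a, b, c, alpha, beta, gamma
    axis_order = struct.unpack_from("<3i", header, 64)  # which cell axis is column/row/section
    (spacegroup,) = struct.unpack_from("<i", header, 88)
    return {
        "shape": (nc, nr, ns),
        "mode": mode,  # 2 means 32-bit floats, the usual case for density maps
        "start": start,
        "sampling": sampling,
        "cell": cell,
        "axis_order": axis_order,
        "spacegroup": spacegroup,
    }
```

With a real file you would pass in the first 1024 bytes, for example open(path, "rb").read(1024). In practice you would let one of the libraries do this for you, since they also deal with byte order, axis order and the symmetry records that follow the header.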
CCTBX
CCTBX is the oldest and most feature-complete library for dealing with electron density data. It can do everything you want, but getting it to do it is going to be hard. If you want to get any good at this, there’s going to be a lot of emailing the bulletin boards for help.
Pros | Cons |
Oldest and most complete crystallographic library | Hard to install |
| Hard to use |
| Python 2.7 only |
| Needs a special version of Python incompatible with many other libraries |
| Functionally no documentation: you basically need to email the author if you want to know how something works (or even what its arguments are) |
Installing cctbx can be… non-trivial, and beyond the scope of this tutorial.
Loading a ccp4 map in cctbx is relatively simple:
from iotbx.file_reader import any_file
f = any_file(file)
xmap = f.file_object
GridDataFormats
GridDataFormats is in many ways the opposite of cctbx: only a few years old, well documented and pythonic. Unfortunately it is not very feature complete, lacking any functionality for dealing with MTZ files and offering only limited functionality for ccp4 files.
Pros | Cons |
Easy to install | Does not handle symmetry |
Easy to use: pythonic | A little slow |
Good documentation | No reflection data |
GridDataFormats is also easy to install! Simply:
conda config --add channels conda-forge
conda install griddataformats
And you’re good! Loading a map is similarly intuitive:
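from gridData import Grid
g = Grid(file)  # file is the path to the map; the format is guessed from the extension
density = g.grid  # the map values as a numpy array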
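Once a map is loaded as an array, a common first step is finding the voxel that contains a given atom position. Below is a rough sketch of that lookup, assuming an orthogonal grid described by an origin (the centre of voxel 0,0,0), a per-axis spacing and the array shape. As far as I know GridDataFormats grids expose these as origin, delta and grid.shape, but treat that mapping as an assumption and check the docs. cartesian_to_voxel is a hypothetical helper, not a library function.

```python
import math


def cartesian_to_voxel(xyz, origin, delta, shape):
    """Return the index of the voxel containing xyz, or None if it is off the grid.

    Assumes orthogonal axes, with origin at the centre of voxel (0, 0, 0)
    and delta giving the voxel spacing along each axis.
    """
    index = []
    for x, o, d, n in zip(xyz, origin, delta, shape):
        # Shift by half a voxel so that positions round to the nearest centre.
        i = math.floor((x - o) / d + 0.5)
        if not 0 <= i < n:
            return None  # no symmetry handling: points outside the box are simply lost
        index.append(i)
    return tuple(index)


# Example with made-up numbers: a 10 x 10 x 10 grid with 0.5 Angstrom spacing.
voxel = cartesian_to_voxel((1.0, 2.0, 3.0), (0.0, 0.0, 0.0), (0.5, 0.5, 0.5), (10, 10, 10))
```

The early return for out-of-range points is exactly where a symmetry-aware library would instead map the position back into the unit cell.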
Clipper
Clipper is a C++ crystallographic library with Python wrappers.
Pros | Cons |
Easy-ish to install | Low level compared to cctbx |
Easy-ish to use | Less easy to use than GridDataFormats |
Complete low level functionality for reflections and maps | Little dedicated python docs |
Very fast and well tested | |
Excellent C++ docs that are applicable to Python | |
Clipper is very easy to get if you don’t mind the slightly older SWIG wrapped version.
pip install clipper-python
Tristan Croll’s pybind11 wrapped clipper is preferable, but requires installing from source from: https://github.com/clipper-python/clipper-python/tree/pybind11
Loading a map in clipper is very straightforward if you are a C++ programmer, but requires a little thinking if you are used to python.
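import clipper_python as clipper
xmap = clipper.Xmap()  # empty map object that the file reader will fill
f = clipper.CCP4MAPfile()  # spelled as in the C++ API (clipper::CCP4MAPfile)
f.open_read(file)  # file is the path to the .ccp4 map
f.import_xmap(xmap)
f.close_read()
xmap.export_numpy()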
Summary: probably just use GridDataFormats or Clipper
CCTBX is a real pain to work with: I’d only use it to interface with legacy code I didn’t want to reimplement.
If you don’t need symmetry or electron density map specific functionality, and don’t mind things being a little slow, use GridDataFormats.
Otherwise use Clipper.