Monthly Archives: March 2023

The ultimate modulefile for conda

Environment modules is a great tool for high-performance computing as it is a modular system to quickly and painlessly enable preset configurations of environment variables, for example a user may be provided with modulefile for an antiquated version of a tool and a bleeding-edge alpha version of that same tool and they can easily load whichever they wish. In many clusters the modules are created with a tool called EasyBuild, which delivered an out-of-the-box installation. This works for things like a single binary, but for conda this severely falls short as there are many many configuration changes needed.

Continue reading →

BRICS Decomposition and Synthetic Accessibility

Recently I’ve been thinking a lot about how to decompose a compound into smaller fragments specifically for a retrosynthetic purpose. My question is: given a compound, can I return building blocks that are likely to synthesize together to produce this compound simply by breaking likely bonds formed in a reaction? A method that is nearly 15 years old named, breaking of retrosynthetically interesting chemical substructures (BRICS), is one approach to do this. Here I’ll explore how BRICS can reflect synthetic accessibility.

Continue reading →

Experience at a Keystone Symposium

From 19th-22nd February I was fortunate enough to participate in the joint Keystone Symposium on Next-Generation Antibody Therapeutics and Multispecific Immune Cell Engagers, held in Banff, Canada. Now in their 51st year, the Keystone Symposia are a comprehensive programme of scientific conferences spanning the full range of topics relating to human health, from studies on fundamental bodily processes through to drug discovery.

Continue reading →

Ligands of CASF-2016

CASF-2016 is a commonly used benchmark for docking tools. Unfortunately, some of the provided ligand files cannot be loaded using RDKit (version 2022.09.1) but there is an easy remedy.

Continue reading →

Can AlphaFold predict protein-protein interfaces?

Since its release, AlphaFold has been the buzz of the computational biology community. It seems that every group in the protein science field is trying to apply the model in their respective areas of research. Already we are seeing numerous papers attempting to adapt the model to specific niche domains across a broad range of life sciences. In this blog post I summarise a recent paper’s use of the technology for predicting protein-protein interfaces.

Continue reading →

LaTeX Beamer Template with Logos

Alternative Title: The tragic story of how I got trapped making slides with latex.

Typically after giving a presentation at least one person will approach me and ask if they could have access to my custom latex template to make slides with beamer that don’t look rubbish.

TL;DR Yes you can: https://github.com/npqst/latex-beamer-template

Continue reading →

Atom mapping with RXNMapper

When recently looking at some reaction data, I was confronted with the problem of atom-to-atom mapping (AAM) and what tools are available to tackle it. AAM refers to the process of mapping individual atoms in reactants to their corresponding atoms in the products, which is important for defining a reaction template and identifying which bonds are being formed and broken. This has many downstream uses for computational chemists, such as for reaction searching and forward and retrosynthesis planning¹. The problem is that many reaction databases do not contain these mappings, and annotation by expert chemists is impractical for databases containing thousands (or more) data points.

Continue reading →

How to easily use pharmacophoric atom features to turn ECFPs into FCFPs

Today’s post builds on my earlier blogpost on how to turn a SMILES string into an extended-connectivity fingerprint using RDKit and describes an interesting and easily implementable modification of the extended-connectivity fingerprint (ECFP) featurisation. This modification is based on representing the atoms in the input compound at a different (and potentially more useful) level of abstraction.

We remember that each binary component of an ECFP indicates the presence or absence of a particular circular subgraph in the input compound. Circular subgraphs that are structurally isomorphic are further distinguished according to their inherited atom- and bond features, i.e. two structurally isomorphic circular subgraphs with distinct atom- or bond features correspond to different components of the ECFP. For chemical bonds, this distinction is made on the basis of simple bond types (single, double, triple, or aromatic). To distinguish atoms, standard ECFPs use seven features based on the Daylight atomic invariants [1]; but there is also another less commonly used and often overlooked version of the ECFP that uses pharmacophoric atom features instead [2]. Pharmacophoric atom features attempt to describe atomic properties that are critical for biological activity or binding to a target protein. These features try to capture the potential for important chemical interactions such as hydrogen bonding or ionic bonding. ECFPs that use pharmacophoric atom features instead of standard atom features are called functional-connectivity fingerprints (FCFPs). The exact sets of standard- vs. pharmacophoric atom features for ECFPs vs. FCFPs are listed in the table below.

In RDKit, ECFPs can be changed to FCFPs extremely easily by changing a single input argument. Below you can find a Python/RDKit implementation of a function that turns a SMILES string into an FCFP if use_features = True and into an ECFP if use_features = False.

# import packages
import numpy as np
from rdkit.Chem import AllChem

# define function that transforms a SMILES string into an FCFP if use_features = True and into an ECFP if use_features = False
def FCFP_from_smiles(smiles,
                     R = 2,
                     L = 2**10,
                     use_features = True,
                     use_chirality = False):
    """
    Inputs:
    
    - smiles ... SMILES string of input compound
    - R ... maximum radius of circular substructures
    - L ... fingerprint-length
    - use_features ... if true then use pharmacophoric atom features, if false then use standard DAYLIGHT atom features
    - use_chirality ... if true then append tetrahedral chirality flags to atom features
    
    Outputs:
    - np.array(feature_list) ... FCFP/ECFP with length L and maximum radius R
    """
    
    molecule = AllChem.MolFromSmiles(smiles)
    feature_list = AllChem.GetMorganFingerprintAsBitVect(molecule,
                                                         radius = R,
                                                         nBits = L,
                                                         useFeatures = use_features,
                                                         useChirality = use_chirality)
    return np.array(feature_list)

The use of pharmacophoric atom features makes FCFPs more specific to molecular interactions that drive biological activity. In certain molecular machine-learning applications, replacing ECFPs with FCFPs can therefore lead to increased performance and decreased learning time, as important high-level atomic properties are presented to the learning algorithm from the start and do not need to be inferred statistically. However, the standard atom features used in ECFPs contain more detailed low-level information that could potentially still be relevant for the prediction task at hand and thus be utilised by the learning algorithm. It is often unclear from the outset whether FCFPs will provide a substantial advantage over ECFPs in a given application; however, given how easy it is to switch between the two, it is almost always worth trying out both options.

[1] Weininger, David, Arthur Weininger, and Joseph L. Weininger. “SMILES. 2. Algorithm for generation of unique SMILES notation.” Journal of Chemical Information and Computer Sciences 29.2 (1989): 97-101.

[2] Rogers, David, and Mathew Hahn. “Extended-connectivity fingerprints.” Journal of Chemical Information and Modeling 50.5 (2010): 742-754.

Writing “vector trajectories” with cpptraj

The program cpptraj, written by Daniel Roe (https://github.com/Amber-MD/cpptraj) and distributed Open Source with the AmberTools package (https://ambermd.org/AmberTools.php), is a powerful tool for analysis of molecular dynamics simulations. In addition to all of the expected analyses like Root Mean Square Deviation and native contacts, cpptraj also includes a suite of vector algebra functions.

While this vector algebra functionality is fairly well known and easy to find in the documentation, I think it is less well known that cpptraj can write trajectories of the computed vectors. These trajectories can then be loaded into Visual Molecular Dynamics (VMD) alongside the analysed trajectory and played as a movie. This functionality is a valuable tool for debugging your vector calculations to make sure they are doing precisely what you intend. It may also prove useful for generating visualizations of vectors alongside molecular structures for publications.

The cpptraj script below reads in an Amber parameter file and coordinate file and then calculates the angle between two planes.

parm 7bbg_fixed.prmtop
trajin 7bbg_fixed.rst7
vector v1 mask :65@NH1 :65@NH2
vector v2 mask :65@NH1 :65@NE
vector v3 mask :64@CA :66@CA
vector v4 mask :66@CA :68@CA
vectormath vec1 v1 vec2 v2 crossproduct name n1
vectormath vec1 v3 vec2 v4 crossproduct name n2
vectormath vec1 n1 vec2 n2 dotangle out 7bbg_ref_plane_angle.dat

The first plane is defined by two vectors in the plane of the guanidino group of a R65 residue (v1 and v2); the second plane is defined by two vectors between CA atoms of amino acids in the alpha helix containing R65 (v3 and v4). The first two vectormath calls determine the normal vectors to the planes and the final vectormath line computes the angle between the normal vectors. Taken together, these commands compute the angle between the arginine side chain and a plane passing through the CA atoms of the alpha helix. Let’s check that the vectors {v1, v2, v3, v4} are being computed correctly.

parm 7bbg_fixed.prmtop
trajin 7bbg_fixed.rst7
vector v1 mask :65@NH1 :65@NH2
vector v2 mask :65@NH1 :65@NE
vector v3 mask :64@CA :66@CA
vector v4 mask :66@CA :68@CA
run
writedata vectors.mol2 v1 v2 v3 v4 vectraj trajfmt mol2

The resulting vector trajectory vectors.mol2 can be loaded directly into VMD without a topology. Note that in this case we only analyzed a single frame, but you can run this same procedure on DCD files, too. This is what I get when I load the vectors into VMD alongside the structure:

The vectors are shown as red/pink line segments. The right structure is identical to the left but with the alpha helix cartoon model removed. The blue spheres indicate the locations of the CA atoms used to define the plane of the helix.

I hope this vector trajectory functionality will be helpful to a few people who like to neurotically check their analyses like I do. You can download the example prmtop and rst7 files below. Note that you should rename them to remove the extra “.txt” file extension before attempting to use them for anything.

The information in this blog post is adapted from an Amber Archive post from Daniel Roe, dated 30-Oct-2018: http://archive.ambermd.org/201811/0058.html

Files for the example:

7bbg_fixed.rst7_Download

7bbg_fixed.prmtop Download

Oxford Protein Informatics Group

or "OPIG" to friends

Monthly Archives: March 2023

The ultimate modulefile for conda

BRICS Decomposition and Synthetic Accessibility

Experience at a Keystone Symposium

Ligands of CASF-2016

Can AlphaFold predict protein-protein interfaces?

LaTeX Beamer Template with Logos

Alternative Title: The tragic story of how I got trapped making slides with latex.

Atom mapping with RXNMapper

How to easily use pharmacophoric atom features to turn ECFPs into FCFPs

Writing “vector trajectories” with cpptraj