Category Archives: Protein-Ligand Docking

How to prepare a molecule for RDKit

RDKit is very fussy when it comes to inputs in SDF format. Using the SDMolSupplier, we get a significant rate of failure even on curated datasets such as the PDBBind refined set. Pymol has no such scruples, and with that, I present a function which has proved invaluable to me over the course of my DPhil. For reasons I have never bothered to explore, using pymol to convert from sdf, into mol2 and back to sdf format again (adding in missing hydrogens along the way) will almost always make a molecule safe to import using RDKit:

from pathlib import Path
from pymol import cmd

def py_mollify(sdf, overwrite=False):
    """Use pymol to sanitise an SDF file for use in RDKit.

    Arguments:
        sdf: location of faulty sdf file
        overwrite: whether or not to overwrite the original sdf. If False,
            a new file will be written in the form <sdf_fname>_pymol.sdf
            
    Returns:
        Original sdf filename if overwrite == False, else the filename of the
        sanitised output.
    """
    sdf = Path(sdf).expanduser().resolve()
    mol2_fname = str(sdf).replace('.sdf', '_pymol.mol2')
    new_sdf_fname = sdf if overwrite else str(sdf).replace('.sdf', '_pymol.sdf')
    cmd.load(str(sdf))
    cmd.h_add('all')
    cmd.save(mol2_fname)
    cmd.reinitialize()
    cmd.load(mol2_fname)
    cmd.save(str(new_sdf_fname))
    return new_sdf_fname

Post-processing for molecular docking: Assigning the correct bond order using RDKit.

AutoDock4 and AutoDock Vina are the most commonly used open-source software for protein-ligand docking. However, they both rely on a derivative of the “PDB” (Protein Data Base) file format: the “PDBQT” file (Protein Data Bank, Partial Charge (Q), & Atom Type (T)). In addition to the information contained in normal PDB files, PDBQT files have an additional column that lists the partial charge (Q) and the assigned AutoDock atom type (T) for each atom in the molecule. AutoDock atom types offer a more granular differentiation between atoms such as listing aliphatic carbons and aromatic carbons as separate AutoDock atom types.

The biggest drawback about the PDBQT format is that it does not encode for the bond order in molecules explicitly. Instead, the bond order is inferred based on the atom type, distance and angle to nearby atoms in the molecule. For normal sp3 carbons and molecules with mostly single bonds this system works fine, however, for more complex structures containing for example aromatic rings, conjugated systems and hypervalent atoms such as sulphur, the bond order is often not displayed correctly. This leads to issues downstream in the screening pipeline when molecules suddenly change their bond order or have to be discarded after docking because of impossible bond orders.

The solution to this problem is included in RDKit: The AssignBondOrdersFromTemplate function. All you need to do is load the original molecule used for docking as a template molecule and the docked pose PDBQT file into RDKIT as a PDB, without the bond order information. Then assign the original bond order from your template molecule. The following code snippet covers the necessary functions and should help you build a more accurate and reproducible protein-ligand docking pipeline:

#import RDKit AllChem
from rdkit import Chem
from rdkit.Chem import AllChem


#load original molecule from smiles
SMILES_STRING = "CCCCCCCCC" #the smiles string of your ligand
template = Chem.MolFromSmiles(SMILES_STRING)

#load the docked pose as a PDB file
loc_of_docked_pose = "docked_pose_mol.pdb" #file location of the docked pose converted to PDB file
docked_pose = AllChem.MolFromPDBFile(loc_of_docked_pose)

#Assign the bond order to force correct valence
newMol = AllChem.AssignBondOrdersFromTemplate(template, docked_pose)

#Add Hydrogens if desired. "addCoords = True" makes sure the hydrogens are added in 3D. This does not take pH/pKa into account. 
newMol_H = Chem.AddHs(newMol, addCoords=True)

#save your new correct molecule as a sdf file that encodes for bond orders correctly
output_loc = "docked_pose_assigned_bond_order.sdf" #output file name
Chem.MolToMolFile(newMol_H, output_loc)

Using normalized SuCOS scores.

If you are working in cheminformatics or utilise protein-ligand docking, then you should be aware of the SuCOS score, an open-source shape and chemical feature overlap metric designed by a former member of OPIG: Susan Leung.

The metric compares the 3D conformers of two ligands based on their shape overlap as well as their chemical feature overlap using the RDKit toolkit. Leung et al. show that SuCOS is able to select fewer false positives and false negatives when doing re-docking studies than other scoring metrics such as RMSD or Protein Ligand Interaction Fingerprints (PLIF) similarity scores and performs better at differentiating actives from decoys when tested on the DUD-E dataset.

Most importantly, SuCOS was designed with fragment based drug discovery in focus, where a smaller fragment ligand is elaborated or combined with other fragments to create a larger molecule, with hopefully stronger binding affinity. Unlike for example RMSD, SuCOS is able to quickly calculate an overlap score between a small fragment and a larger molecule, giving chemists an idea on how the fragment elaboration might interact with the protein. However, the original SuCOS algorithm was not normalized and could create scores of > 1 for some cases.

I’ve uploaded a normalised version of the original SuCOS algorithm as a GitHub fork of Susan’s original repository. You can find the normalised SuCOS algorithm here.

Hopefully this is helpful for anyone using the SuCOS algorithm and for all docking enthusiasts who are interested in an alternative way to evaluate their docked poses.

ORDER!: Returning bond order information to your docked poses

Common docking software, such as AutoDock Vina or AutoDock 4, require the ligand and receptor files to be converted into the PDBQT format. Once a correct pose has been identified, the pose will be produced also as a .pdbqt file.

Continue reading →

Doing rigid receptor docking? Consider using multiple structures!

Here it is. It’s finally happening. I’m actually writing a blog post about docking. Are the end times upon us? Perhaps. If by my next post I’m not back to my usual techie self, the horsemen may well be on their way.

If you’ve ever used, read about, or listened to a lab mate complain about protein-ligand docking, you’re likely familiar with the rigid receptor assumption. In this model, the active site of the protein is treated as completely rigid, with no side chain flexibility, and only the rotatable bonds in the ligand are allowed to move. The motivation behind this assumption is simple. The computational cost of sampling the conformational space of a ligand within a protein’s active site, and doing so with sufficient rigour so as to sample a near-native binding mode, grows rapidly with the number of rotatable bonds in the ligand. Further increasing the degrees of freedom in the system by incorporating receptor side chain flexibility compounds this problem, making the sampling of accurate binding modes for the ligand an incredibly expensive and difficult problem.

One compromise, if multiple structures with different active site conformations are available for the target protein, is to simply dock your ligands into multiple structures, and trust your scoring function (!!!) to pick out the best binding mode from across the different structures. This is a crude approximation to true flexible receptor docking which won’t capture fully any induced fit effects due to a particular ligand, but if the structures are available, this may offer a more computationally-feasible alternative to flexible docking.

A study earlier this year by Cleves and Jain illustrates this approach nicely. They dock the ligands of the DUD-E database into multiple structures for each target, in each case treating the receptor strucutre as completely rigid. Unsurprisingly, when the target is rigid and there is little structural variation in the active site across the structures, the choice of structure has little influence on the docking results. However, when the receptor is flexible, with clear structural variation across the active sites in the different structures, there is a strong impact on the poses generated by rigid-receptor docking. This effect translates directly into improved virtual screening performance when docking into multiple different structures, illustrating the value of considering the conformational space of the receptor, even when it is treated as rigid during the docking process.

Constrained docking for bump and hole methodology

Selectivity is an important trait to consider when designing small molecule probes for chemical biology. If you wish to use a small molecule to study a particular protein, but that small molecule is fairly promiscuous in its binding habits, there are risks that any effects you observe may be due to it binding other proteins with similarly shaped binding pockets, instead of your protein of interest.

Continue reading →

AutoDock 4 and AutoDock Vina

A recently just-released publication from Ngyuen et al. ing JCIM pointed out that while AutoDock Vina is faster, AutoDock 4 tends to have better correlation with experimental binding affinity.¹

[This post has been edited to provide more information about the cited paper, as well as providing additional citations.]

Ngyuyen et al. selected 800 protein-ligand complexes for 47 protein targets that had both experimental PDB structures complexed with a ligand, as well as their associated binding affinity values.

Continue reading →

Oxford Protein Informatics Group

or "OPIG" to friends

Category Archives: Protein-Ligand Docking

How to prepare a molecule for RDKit

Post-processing for molecular docking: Assigning the correct bond order using RDKit.

Using normalized SuCOS scores.

ORDER!: Returning bond order information to your docked poses

Doing rigid receptor docking? Consider using multiple structures!

Constrained docking for bump and hole methodology

AutoDock 4 and AutoDock Vina