Now that PDBbind 2020 has been released, I want to draw some attention to an issue with using the SDF files that are supplied in the PDBbind refined set 2020.
Normally, SDF files store the chirality information of a compound in the atom block of the file. The snippet shown below is taken from the full SDF file for the ligand of PDB entry 4qsv; the column that defines chirality (the atom stereo parity) is marked in red.
As you can see, all columns shown here are 0. For some reason, the SDF files supplied by PDBbind do NOT encode chirality information explicitly. This becomes a problem when using RDKit to read the molecule and convert it to a SMILES string. Using the following commands to read the ligand for 4qsv from PDBbind 2020 and write a SMILES string, we get:
```python
from rdkit import Chem

# Read the first (and only) molecule from the PDBbind SDF file
sdf_mol = Chem.SDMolSupplier("4qsv_ligand.sdf")[0]
print(Chem.MolToSmiles(sdf_mol))
# Output: Cc1cn(C2CC(O)C(CO)O2)c(=O)[nH]c1=O
```
As you can see, the SMILES carries no stereochemistry at all: the ligand has three stereocenters in its sugar ring, and none of them is specified. By default, RDKit does not read the chirality from the 3D structure of this SDF file. This is extremely dangerous, especially for docking-based studies. Many docking workflows start by converting the SDF file to SMILES and then use that SMILES to generate a new, unbiased 3D conformation of the ligand as the starting pose. If the SMILES is converted from a PDBbind SDF file without any special handling, the conformer generator is forced to build a 3D structure from a SMILES with unspecified stereocenters and will pick an arbitrary stereoisomer of the ligand instead of the correct configuration. This is obviously a serious problem: it leads to non-reproducible results (if you happen to hit the correct isomer) and to plainly incorrect results (if you generate the wrong one).
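To put a number on "an arbitrary stereoisomer", the sketch below enumerates every stereoisomer that is compatible with the SMILES RDKit produced from the SDF file. It uses RDKit's `EnumerateStereoisomers` with default options, which only expands unspecified stereocenters; the helper name `candidate_stereoisomers` is just for illustration. With three unspecified stereocenters there are 2³ = 8 candidates, and only one of them matches the configuration recovered from the MOL2 file.

```python
from typing import List

from rdkit import Chem
from rdkit.Chem.EnumerateStereoisomers import EnumerateStereoisomers


def candidate_stereoisomers(smiles: str) -> List[str]:
    """Canonical SMILES of every stereoisomer compatible with `smiles`.

    With default options, EnumerateStereoisomers keeps stereocenters that are
    already specified and only expands the unspecified ones.
    """
    mol = Chem.MolFromSmiles(smiles)
    return sorted({Chem.MolToSmiles(iso) for iso in EnumerateStereoisomers(mol)})


flat = "Cc1cn(C2CC(O)C(CO)O2)c(=O)[nH]c1=O"                 # from the PDBbind SDF
correct = "Cc1cn([C@H]2C[C@H](O)[C@@H](CO)O2)c(=O)[nH]c1=O"  # from the MOL2 file

candidates = candidate_stereoisomers(flat)
print(len(candidates))  # 8 = 2**3 for three unspecified stereocenters
# Canonicalize the reference the same way before looking it up
print(Chem.MolToSmiles(Chem.MolFromSmiles(correct)) in candidates)  # True
```

A conformer generator fed the flat SMILES is effectively choosing from this list, so the chance of landing on the crystallographic configuration by luck is only 1 in 8 here.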
Fortunately, there is an easy fix: reading the MOL2 file of the same ligand, which PDBbind also supplies, gives the following:
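```python
from rdkit import Chem

# Read the same ligand from the MOL2 file supplied by PDBbind
mol2_mol = Chem.MolFromMol2File("4qsv_ligand.mol2")
print(Chem.MolToSmiles(mol2_mol, isomericSmiles=True))
# Output: Cc1cn([C@H]2C[C@H](O)[C@@H](CO)O2)c(=O)[nH]c1=O
```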
This time the chirality is preserved: unlike the SDF reader, RDKit's MOL2 reader handles the stereochemistry of this file correctly. The SDF format is usually preferred by cheminformaticians, but you might have to make an exception here.
However, if you really need to use the SDF file rather than the MOL2 file, there are two ways to fix the issue (neither of them is 100% reliable).
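Whichever format you settle on, it is worth flagging ligands that came through the reader without stereochemistry before they enter a docking pipeline. A minimal sketch, using RDKit's `FindMolChiralCenters` with `includeUnassigned=True` (which labels possible but unspecified stereocenters with `'?'`); the helper name `unassigned_stereocenters` is hypothetical:

```python
from typing import List

from rdkit import Chem


def unassigned_stereocenters(mol: Chem.Mol) -> List[int]:
    """Indices of atoms that could be stereocenters but have no chirality set."""
    centers = Chem.FindMolChiralCenters(mol, includeUnassigned=True)
    # Specified centers get a CIP label ('R'/'S'); unspecified ones get '?'
    return [idx for idx, label in centers if label == "?"]


flat = Chem.MolFromSmiles("Cc1cn(C2CC(O)C(CO)O2)c(=O)[nH]c1=O")
correct = Chem.MolFromSmiles("Cc1cn([C@H]2C[C@H](O)[C@@H](CO)O2)c(=O)[nH]c1=O")
print(len(unassigned_stereocenters(flat)))  # 3
print(unassigned_stereocenters(correct))    # []
```

Running this over every ligand you load (from SDF or MOL2) gives a quick list of entries that need a closer look; an empty list only means nothing is unspecified, not that the assigned configuration is right.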
1) RDKit AssignAtomChiralTagsFromStructure
RDKit provides a series of functions that can assign chirality based on the 3D structure of a ligand. You can find them in the Chem.rdmolops documentation of RDKit, but in my experience they are not always reliable, so use them at your own risk!
For our example 4qsv, the function rdmolops.AssignAtomChiralTagsFromStructure() works and assigns the stereochemistry correctly. We use it like this:
```python
# sdf_mol is the molecule read from 4qsv_ligand.sdf in the first example
Chem.rdmolops.AssignAtomChiralTagsFromStructure(sdf_mol)
print(Chem.MolToSmiles(sdf_mol))
# Output: Cc1cn([C@H]2C[C@H](O)[C@@H](CO)O2)c(=O)[nH]c1=O
```
The resulting SMILES now correctly shows the chirality information and the SMILES string is identical to the one obtained when reading the MOL2 file.
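A closely related function in the same module is `rdmolops.AssignStereochemistryFrom3D`, which sets chiral tags from a conformer and then runs RDKit's stereochemistry perception, replacing any existing tags. The sketch below is not the method tested above; it is a self-contained check of that alternative. Since the PDBbind file is not bundled here, it builds a hypothetical stand-in: a 3D conformer of the correct stereoisomer (ETKDG embedding honours the input chiral tags) whose tags are then wiped with `RemoveStereochemistry`, so the stereochemistry survives only in the coordinates, as in the SDF file.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def restore_stereo_from_3d(mol: Chem.Mol) -> Chem.Mol:
    """Return a copy of `mol` with stereo tags re-derived from its 3D conformer."""
    fixed = Chem.Mol(mol)  # copy, so the input molecule is left untouched
    # Assigns chiral tags from the coordinates of the default conformer and
    # re-runs stereo perception, replacing whatever tags were there before.
    Chem.AssignStereochemistryFrom3D(fixed)
    return fixed


correct = "Cc1cn([C@H]2C[C@H](O)[C@@H](CO)O2)c(=O)[nH]c1=O"

# Hypothetical stand-in for the PDBbind ligand: correct 3D geometry ...
stand_in = Chem.AddHs(Chem.MolFromSmiles(correct))
if AllChem.EmbedMolecule(stand_in, randomSeed=42) != 0:
    raise RuntimeError("embedding failed")
# ... but no chiral tags, so only the coordinates encode the stereochemistry
Chem.RemoveStereochemistry(stand_in)

flat_smiles = Chem.MolToSmiles(Chem.RemoveHs(stand_in))
restored_smiles = Chem.MolToSmiles(Chem.RemoveHs(restore_stereo_from_3d(stand_in)))
print(flat_smiles)
print(restored_smiles == Chem.MolToSmiles(Chem.MolFromSmiles(correct)))  # True
```

On a real PDBbind file you would call `restore_stereo_from_3d` on the molecule returned by `Chem.SDMolSupplier`; as with the function above, the result is only as good as the deposited coordinates, so comparing against the MOL2 reading remains a sensible check.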
2) Open Babel 3.1
Interestingly, Open Babel 3.1 assigns the chiral tags in the SDF file correctly (most of the time) when parsing the file. However, Open Babel is not 100% reliable either, and it can alter other parts of the molecule, such as bond orders, when parsing and converting molecules. Running the following command from the terminal parses the original SDF file and writes a new one:
```shell
obabel 4qsv_ligand.sdf -O 4qsv_ligand_obabel.sdf
```
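The command converts one file at a time. For a whole refined set, a small driver can build the same call for every ligand. This sketch assumes a layout of one directory per PDB ID containing `<pdbid>_ligand.sdf` and writes `<pdbid>_ligand_obabel.sdf` next to it; the layout and the helper names `obabel_commands` and `run_commands` are assumptions. It defaults to a dry run that only prints the commands, and actually running them requires `obabel` on your `PATH`.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List


def obabel_commands(root: Path) -> List[List[str]]:
    """One `obabel <in> -O <out>` call per <pdbid>/<pdbid>_ligand.sdf under root."""
    commands = []
    for sdf in sorted(root.glob("*/*_ligand.sdf")):
        out = sdf.with_name(sdf.stem + "_obabel.sdf")  # e.g. 4qsv_ligand_obabel.sdf
        commands.append(["obabel", str(sdf), "-O", str(out)])
    return commands


def run_commands(commands: List[List[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))  # only show what would be executed
        else:
            subprocess.run(cmd, check=True)  # raises if obabel fails


# Dry run against a throwaway directory that mimics the assumed layout
with tempfile.TemporaryDirectory() as tmp:
    entry = Path(tmp) / "4qsv"
    entry.mkdir()
    (entry / "4qsv_ligand.sdf").write_text("")
    demo_cmds = obabel_commands(Path(tmp))
    run_commands(demo_cmds)
```

Keeping the converted files next to the originals, rather than overwriting them, makes it easy to compare both versions afterwards.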
The output SDF, now with explicit chirality tags, is shown below.
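To inspect the stereo parity column yourself, in the original PDBbind file or in the converted one, you do not need an editor: in the V2000 mol block format the atom block is fixed-width, and the atom stereo parity sits in columns 40-42 of each atom line. A minimal sketch, assuming a V2000 block (the helper name `atom_stereo_parities` is hypothetical); the demo block is hand-written with values chosen only to exercise the parser, not a meaningful molecule:

```python
from typing import List


def atom_stereo_parities(molblock: str) -> List[int]:
    """Atom stereo parity (columns 40-42) of every atom in a V2000 mol block."""
    lines = molblock.splitlines()
    counts = lines[3]  # the counts line follows the three header lines
    if "V3000" in counts:
        raise ValueError("only V2000 mol blocks are handled here")
    n_atoms = int(counts[0:3])
    atom_lines = lines[4:4 + n_atoms]
    # Fixed-width layout: x, y, z (10 chars each), a space, symbol (3),
    # mass difference (2), charge (3), then stereo parity (3).
    return [int(line[39:42].strip() or 0) for line in atom_lines]


demo_block = "\n".join([
    "demo",
    "  hypothetical header",
    "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0",
    "M  END",
])
print(atom_stereo_parities(demo_block))  # [1]
```

For a real file, pass the text of the first record unmodified (for example the part of the file before the first `$$$$`); stripping leading whitespace would remove an empty title line and shift the header.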
When loading the new SDF into RDKit, we get the correct stereochemistry.
```python
# Read the SDF file rewritten by Open Babel
sdf_mol2 = Chem.SDMolSupplier("4qsv_ligand_obabel.sdf")[0]
print(Chem.MolToSmiles(sdf_mol2))
# Output: Cc1cn([C@H]2C[C@H](O)[C@@H](CO)O2)c(=O)[nH]c1=O
```
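Because Open Babel can occasionally change bond orders, it is worth checking each converted ligand against an independent reading before trusting it. One cheap check, assuming the MOL2 reading is the reference, is to compare canonical isomeric SMILES, which encode connectivity, bond orders, charges and stereochemistry. The helper name `same_structure` is hypothetical:

```python
from rdkit import Chem


def same_structure(mol_a: Chem.Mol, mol_b: Chem.Mol) -> bool:
    """True if both molecules give the same canonical isomeric SMILES."""
    # A changed bond order, a different charge or a lost/flipped stereocenter
    # all produce a different canonical isomeric SMILES.
    smi_a = Chem.MolToSmiles(mol_a, isomericSmiles=True)
    smi_b = Chem.MolToSmiles(mol_b, isomericSmiles=True)
    return smi_a == smi_b


correct = Chem.MolFromSmiles("Cc1cn([C@H]2C[C@H](O)[C@@H](CO)O2)c(=O)[nH]c1=O")
flat = Chem.MolFromSmiles("Cc1cn(C2CC(O)C(CO)O2)c(=O)[nH]c1=O")
print(same_structure(correct, correct))  # True
print(same_structure(correct, flat))     # False
```

With the files from this post, the check would be `same_structure(Chem.MolFromMol2File("4qsv_ligand.mol2"), Chem.SDMolSupplier("4qsv_ligand_obabel.sdf")[0])`; entries where it returns False deserve manual inspection.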
I hope this is useful for navigating PDBbind and helps raise awareness of proper pre-processing workflows for docking, and for working with small molecules in general.