When docking, using software like AutoDock Vina, you must prepare your ligand by protonating the molecule, generating 3D coordinates, and converting it to a specific file format (in the case of Vina, PDBQT). Docking software typically needs the protein and ligand file inputs to be written on disk. This is limiting as generating 10,000s of files for a large virtual screen can be annoying and hinder the speed at which you dock.
Fortunately, the Forli group in Scripps Research have developed a Python package, Meeko, to prepare ligands directly from SMILES or other molecule formats for docking to AutoDock 4 or Vina, without writing any files to disk. This means you can dock directly from a single file containing all the SMILES of the ligands you are investigating!
Meeko also saves the output poses to memory as an RDKit object, instead of to disk, allowing RDKit post-processing to be done before saving to file and speeding the overall efficiency of your docking pipeline. Each of these pre- and post-processing steps could require the creation of a new file so for each docked ligand, you could end up with 2-4 intermediate files which is circumvented using this package. Meeko, in my opinion, is an invaluable tool to be added to your docking pipeline.
The repo can be found at the following link: https://github.com/forlilab/Meeko
To install, just pip install it in a Python 3.7 environment along with the Python wrapper for Vina and its dependencies.
pip install rdkit meeko vina scipy pandas
As an example of how it works, we will be docking Paxlovid, the SARS-CoV-2 small molecule therapeutic, to the SARS-CoV-2 Mpro (PDB ID: 7JKT).
paxlovid = 'CC1(C2C1C(N(C2)C(=O)C(C(C)(C)C)NC(=O)C(F)(F)F)C(=O)NC(CC3CCNC3=O)C#N)C'
First, we will load our ligand from SMILES into RDKit, add hydrogens (without regard to pH) and generate 3D coordinates for the ligand.
lig = rdkit.Chem.MolFromSmiles(paxlovid)
protonated_lig = rdkit.Chem.AddHs(lig)
rdkit.Chem.AllChem.EmbedMolecule(protonated_lig)
The protonated, 3D RDKit type molecule is converted to a PDBQT string directly using the MoleculePreparation class from Meeko.
meeko_prep = meeko.MoleculePreparation()
meeko_prep.prepare(protonated_lig)
lig_pdbqt = meeko_prep.write_pdbqt_string()
The PDBQT file for our preprepared receptor is loaded from disk into the Vina docking class. Vina sadly does not allow us to load receptor PDBQT files from string format so this cannot be done in the same manner as the ligand. Then we set, directly from string type, the prepared ligand for docking.
v = vina.Vina(sf_name='vina', verbosity=0)
v.set_receptor('7jkv.pdbqt')
v.set_ligand_from_string(lig_pdbqt)
To specify where we are docking, we load the original ligand for our receptor, PDB: 7JKV, and take the centroid of its 3D coordinates and set it as the centre of our 25x25x25Å pose search space.
previous_ligand = next(rdkit.Chem.SDMolSupplier('7jkv_C_V7G.sdf'))
centroid = rdkit.Chem.rdMolTransforms.ComputeCentroid(previous_ligand.GetConformer())
We then compute the docking maps, run the docking, and generate 20 poses which are also written as a PDBQT string as well.
v.compute_vina_maps(center=[centroid.x, centroid.y,
centroid.z], box_size=[25, 25, 25])
v.dock(exhaustiveness=12, n_poses=20)
output_pdbqt = v.poses(n_poses=20)
We can then convert the PDBQT files of the 20 poses to RDKit objects, re-adding in the bond order (as PDBQT files do not contain bond order information) and then write to an SDF file.
pmol = PDBQTMolecule(output_pdbqt)
f = Chem.SDWriter('docking_results.sdf')
for pose in pmol:
output_rdmol = pmol.export_rdkit_mol()
output_rdmol_w_bond_order = AllChem.AssignBondOrdersFromTemplate(
lig, output_rdmol)
f.write(output_rdmol_w_bond_order)
f.close()
The best ranked docked pose in the receptor can be seen below.
So, there we have it, we have docked a SMILES string directly to an SDF file without generating any intermediate files!