Protein-Ligand interaction fingerprints (PLIFs) are becoming more widely used to compare small molecules in the context of a protein target. A fingerprint is a bit vector that is used to represent a small molecule. Fingerprints of molecules can then be compared to determine the similarity between two molecules. Rather than using the features of the ligand to build the fingerprint, a PLIF is based on the interactions between the protein and the small molecule. The conventional method of building a PLIF is that each bit of the bit vector represents a residue in the binding pocket of the protein. The bit is set to 1 if the molecule forms an interaction with the residue, whereas it is set to 0 if it does not.
Constructing a PLIF therefore consists of two parts:
- Calculating the interactions formed by a small molecule from the target
- Collating this information into a bit vector.
Step 1 can be achieved by using the Protein-Ligand Interaction Profiler (PLIP). PLIP is an easy-to-use tool, that given a pdb file will calculate the interactions between the ligand and protein. This can be done using the online web-tool or alternatively using the command-line tool. Six different interaction types are calculated: hydrophobic, hydrogen-bonds, water-mediated hydrogen bonds, salt bridges, pi-pi and pi-cation. The command-line version outputs an xml report file containing all the information required to construct a PLIF.
Step 2 involves manipulating the output of the report file into a bit vector. RDKit is an amazingly useful Cheminformatics toolkit with great documentation. By reading the PLIF into an RDKit bit vector this allows the vector to be manipulated as an RDKit fingerprint. The fingerprints can then be compared using RDKit functionality very easily, for example, using Tanimoto Similarity.
EXAMPLE:
Let’s take 3 pdb files as an example. Fragment screening data from the SGC is a great sort of data for this analysis, as it contains lots of pdb structures of small hits bound to the same target. The data can be found here. For this example I will use 3 protein-ligand complexes from the BRD1 dataset: BRD1A-m004.pdb, BRD1A-m006.pdb and BRD1A-m009.pdb.
1.PLIP First we need to run plip to generate a report file for each protein-ligand complex. This is done using:
plipcmd -f BRD1A-m004.pdb -o m004 -x
plipcmd -f BRD1A-m006.pdb -o m006 -x
plipcmd -f BRD1A-m009.pdb -o m009 -x
A report file (‘report.xml’) is created for each pdb file within the directory m004, m006 and m009.
2. Get Interactions: Using a python script the results of the report can be collated using the function “generate_plif_lists” (shown below) on each report file. The function takes in the report file name, and the residues already found to be in the binding site (residue_list). “residue_list” must be updated for each molecule to be compared as the residues used to define the binding site can vary betwen each report file. The function then returns the updated “residue_list”, as well as a list of residues found to interact with the ligand: “plif_list_all”.
import xml.etree.ElementTree as ET
################################################################################
def generate_plif_lists(report_file, residue_list, lig_ident):
#uses report.xml from PLIP to return list of interacting residues and update list of residues in binding site
plif_list_all = []
tree = ET.parse(report_file)
root = tree.getroot()
#list of residue keys that form an interaction
for binding_site in root.findall('bindingsite'):
nest = binding_site.find('identifiers')
lig_code = nest.find('hetid')
if str(lig_code.text) == str(lig_ident):
#get the plifs stuff here
nest_residue = binding_site.find('bs_residues')
residue_list_tree = nest_residue.findall('bs_residue')
for residue in residue_list_tree:
res_id = residue.text
dict_res_temp = residue.attrib
if res_id not in residue_list:
residue_list.append(res_id)
if dict_res_temp['contact'] == 'True':
if res_id not in plif_list_all:
plif_list_all.append(res_id)
return plif_list_all, residue_list
###############################################################################
plif_list_m006, residue_list = generate_plif_lists('m006/report.xml',residue_list, 'LIG')
plif_list_m009, residue_list = generate_plif_lists('m009/report.xml', residue_list, 'LIG')
plif_list_m004, residue_list = generate_plif_lists('m004/report.xml', residue_list, 'LIG')
3. Read Into RDKit: Now we have the list of binding site residues and which residues are interacting with the ligand a PLIF can be generated. This is done using the function shown below (“generate_rdkit_plif”):
from rdkit import Chem, DataStructs
from rdkit.DataStructs import cDataStructs
################################################################################
def generate_rdkit_plif(residue_list, plif_list_all):
#generates RDKit plif given list of residues in binding site and list of interacting residues
plif_rdkit = DataStructs.ExplicitBitVect(len(residue_list), False)
for index, res in enumerate(residue_list):
if res in plif_list_all:
print 'here'
plif_rdkit.SetBit(index)
else:
continue
return plif_rdkit
#########################################################################
plif_m006 = generate_rdkit_plif(residue_list, plif_list_m006)
plif_m009 = generate_rdkit_plif(residue_list, plif_list_m009)
plif_m004 = generate_rdkit_plif(residue_list, plif_list_m004)
4. Play! These PLIFs can now be compared using RDKit functionality. For example the Tanimoto similarity between the ligands can be computed:
def similarity_plifs(plif_1, plif_2):
sim = DataStructs.TanimotoSimilarity(plif_1, plif_2)
print sim
return sim
###################################################################
print similarity_plifs(plif_m006, plif_m009)
print similarity_plifs(plif_m006, plif_m004)
print similarity_plifs(plif_m009, plif_m004)
The output is: 0.2, 0.5, 0.0.
All files used to generate the PLIFs cound be found here. Happy PLIF-making!