Fragmenstein is a Python module that combine hits or position a derivative following given templates by being very strict in obeying them. This is done by creating a “monster”, a compound that has the atomic positions of the templates, which then reanimated by very strict energy minimisation. This is done in two steps, first in RDKit with an extracted frozen neighbourhood and then in PyRosetta within a flexible protein. The mapping for both combinations and placements are complicated, but I will focus here on a particular step the minimisation, primarily in answer to an enquiry, namely how does the RDKit minimisation work.
Normal run
Choosing as a test the data from Gahbauer et al. 2023 (where Fragmenstein is used to find an awesome hit), I will do a normal merge of the hits from PDB:5RSW and PDB:5RUE, while placing PDB:5SQJ against these. Parenthetically, I do not use the word “dock” as it does not test MCMC rototranslations of pre-generating conformers and instead creates a conformer and fixes it with not conformer resampling —the PyRosetta part uses a custom FastRelax script with no rotamers and with extra constraints.
In the footnote below the data is prepared.
In the footnote a dataclass instance with the three structures was created with values accession
, clean_pdbblock
and mol
. Now let’s combine them.
from fragmenstein import Victor, Igor apo_pdbblock = '\n'.join([l for l in template1.clean_pdbblock.split('\n') if 'ATOM' in l]) Igor.init_pyrosetta() vicky = Victor([template1.mol, template2.mol], pdb_block=apo_pdbblock) vicky.combine(long_name='merger') print(vicky.summarize())
The latter gave a potential of -14.1 kcal/mol with 23 constrained atoms and none unconstrained and an RMSD of 0.23Å and run in 17s. Sounds great, so let’s have a gander. Parenthetically, Fragmenstein has nglview display options, but unfortunately this no longer works in Colab and with newer IPython widget module.
view = py3Dmol.view(width=800, height=400) view.addModel(apo_pdbblock, 'pdb') view.setStyle({'chain': 'A'}, {'cartoon': {'color': 'turquoise'}}) for hit in vicky.hits: view.addModel(Chem.MolToMolBlock(hit), 'mol') view.setStyle({'model': -1}, {'stick': {'colorscheme': 'whiteCarbon'}}) view.addModel(Chem.MolToMolBlock(vicky.minimized_mol), 'mol') view.setStyle({'model': -1}, {'stick': {'colorscheme': 'yellowCarbon'}}) view.zoomTo({'model': -1}) view.show()
Now let’s place. Parenthetically, I should say that I use Fragmenstein like this when testing hypotheses, for large scale mergers I submit HPC jobs running the fragmenstein pipeline
.
vicky = Victor([template1.mol, template2.mol], pdb_block=apo_pdbblock) vicky.place(derivative.SMILES, long_name='placed')
The placement worked well (1Å RMSD), although I ought to mention that Laboratory (the pipeline, expand_isomers
of .place
) makes Victor try all stereoisomers, as a result the resulting stereoisomer will likely be a product of the monster step. About the classes, the class Monster
handles the making of the stitched together compound, Igor
the PyRosetta minimisation (Fritz
the OpenMM minimisation), Victor
orchestrates the two, while Laboratory
does multiple mergers or/and placements. There is also Walter
which sets up the scene with synthetic tests —although it is .
But what if 5RSW
did not exist? What happens? Let’s throw Fragmenstein under the bus as it has no idea how to best place the other part.
This is because in order to compromise with the atomic positions especially between mismatching templates it does not do a easy constrained embedding via RDKit AllChem.EmbedMolecule
or even AllChem.ConstrainedEmbed
, as this would fail. Parenthetically, a water could be provided as a hit along with a simple map override. Sometimes, the list in Victor(...).monster.mol_options
might have a satisfactory solution.
However, here we want to dissect the workings of Monster. Truth be told, the above is actually a third run of the code as the first run gave 0.9Å relative to the actual conformer of the derivative compound (below) and the second 1.2Å, both of which are good. The fact that each run gives a slightly different outcome for the unconstrained part is rather meaningless without knowing the crystallographic conformer: one needs to balance risk when shortlisting compounds and, if that second hit or the charged native ligand did not exist, few scientists would be that profligate to take that gamble.
Neighbourhood
Monster can do a RDKit based minimisation using MMFF in a frozen neighbourhood.
This takes a fraction of a second, but is imperfect so a second minimisation in the protein is done. It is imperfect as it in vacuo, the neighbourhood is fixed and the score is internal energy. The latter is independent of entropy whereas in PyRosetta the hybrid score function is better at accounting for this as its not pure physics-based and instead has fitted values (fudge factors in purist lingo) even if it is in implicit solvent (Lazaridis-Karplus solvation potential) and the torsions are not explicitly there —for a quick overview of thermodynamics see this previous post).
On the operations, the first step is getting the neighbourhood.
Victor, which handles the apo PDB block, calls monster.get_neighborhood(apo, cutoff, mol)
with mol
being optional as otherwise monster.positioned_mol
is used (code). The PDB block is read into RDKit (it lacked altLoc see footnote) and then the method gets every atom in the protein Chem.Mol within a atom-to-atom cutoff distance and then expands any aromatic atoms to the full aromatic ring.
# Extract the neighbourhood! neigh: Chem.Mol = vicky.monster.get_neighborhood(vicky.apo_pdbblock, 8.) # `Monster.get_neighborhood(None, vicky.apo_pdbblock, 8., mol)` would have done the same # view it view = py3Dmol.view(width=800, height=400) view.addModel(Chem.MolToMolBlock(neigh), 'mol') view.setStyle({'model': -1}, {'stick': {'colorscheme': 'cyanCarbon'}}) view.addModel(Chem.MolToMolBlock(vicky.monster.positioned_mol), 'mol') view.setStyle({'model': -1}, {'stick': {'colorscheme': 'magentaCarbon'}}) view.zoomTo({'model': -1}) view.show()
Minimise
The next step in minimisation where the neighbourhood atoms are, in molecular mechanics parlance, “frozen” (i.e. unable to move). Parenthetically, crystallographers use the word “constrained” for this and “restrained” for a constrained atom (say with a harmonic tether): personally I feel it is best avoid using that terminology.
The minimisation is done by monster.mmff_minimize
, which substitutes the dummy atoms for a carbon (which would crash otherwise), then the following happens (with some extra steps such as reruns):
# Given atom_indices_to_freeze: List[int] atom_indices_to_tether: List[int] mol: Chem.Mol ff_max_displacement: float ff_constraint: float # Run p: AllChem.ForceField.MMFFMolProperties = AllChem.MMFFGetMoleculeProperties(mol, 'MMFF94') ff: AllChem.ForceField.ForceField = AllChem.MMFFGetMoleculeForceField(mol, p) ff.Initialize() for i in atom_indices_to_freeze: ff.AddFixedPoint(i) for i in atom_indices_to_tether: ff.MMFFAddPositionConstraint(i, maxDispl=ff_max_displacement, forceConstant=ff_constraint) ff.Minimize() ff.CalcEnergy()
Looking at the C-code for the function ff.MMFFAddPositionConstraint
, one sees that this is a harmonic force: force_constant times the square of the difference between distance and max_displacement_constant (zero if distance is less than max_displacement_constant).
The whole operating acts in place on the Chem.Mol.
The last step is extracting the compound from the combined mol–neighbourhood system with is done via monster.extract_from_neighborhood
. The latter actually removes the neighbourhood before calling Chem.GetMolFrags
because when things go super-duper badly wrong there may be new bonds.
Example
So lets use an externally generated conformers and find the best using this code.
First, let’s fix the template and make a bunch of embedded conformers of the derivative constrained against it (below target
). Fragmenstein is fine with imperfect matches, but for this walkthrough we will use Chem.Mol.GetSubstructMatch
.
from fragmenstein import Monster # no hydroxy and carboxy > amide in target monstah = Monster([template1.mol]) monstah.place_smiles('Nc1ccc(C(=O)N)cc1') mod_template: Chem.Mol = monstah.positioned_mol
Next let’s generate conformers but with certain atoms fixed. Note that AllChem.EmbedMultipleConfs
will drift so needs realigning.
target = Chem.MolFromSmiles( derivative.SMILES ) mod_conf: Chem.Conformer = mod_template.GetConformer() coords = {ti: mod_conf.GetAtomPosition(qi) for qi, ti in enumerate(target.GetSubstructMatch( mod_template ))} AllChem.EmbedMultipleConfs(target, coordMap=coords, numConfs=10) atom_map = [(ti, qi) for qi, ti in enumerate(fauxstah.positioned_mol.GetSubstructMatch(mod_template))] # fix drift for ci in range(target.GetNumConformers()): assert rdMolAlign.AlignMol(target, mod_template, prbCid=ci, atomMap=atom_map) < .5 # By definition this will never happen, but do be aware of rounding errors
So let’s minimise!
from typing import List from rdkit.Chem import rdMolAlign from fragmenstein import MinizationOutcome # just a dataclass for typehinting # make a faux monster fauxstah = Monster.__new__(Monster) fauxstah.positioned_mol = Chem.Mol(target) conf: Chem.Conformer solutions: List[MinizationOutcome] = [] for conf in target.GetConformers(): fauxstah.positioned_mol.RemoveAllConformers() fauxstah.positioned_mol.AddConformer( conf ) neighborhood: Chem.Mol = fauxstah.get_neighborhood(apo_pdbblock, cutoff=8.) out = fauxstah.mmff_minimize(fauxstah.positioned_mol, neighborhood=neighborhood, ff_max_displacement=0.1, ff_constraint=5.0, allow_lax=True) solutions.append(out) solutions = sorted(solutions, key=lambda mo: mo.U_post) # let's cheat! RMSD against the actual crystallographic one. for opt in [opt.mol for opt in solutions]: print(rdMolAlign.CalcRMS(derivative.mol, opt))
The third entry of the internal energy sorted conformers is the closest (1.0Å). Obviously, this is cheating as that info would not be available.
Let’s have a gander of the most similar:
The solutions from the conformer hack
Footnote: data generation
Fragmenstein in the demo submodule already has some examples, but here I want to operate from first principles.
First define the accession codes, lets use a dataclass
instance to hold the data:
from dataclasses import dataclass from rdkit import Chem @dataclass class Template: accession: str chemcomp: str SMILES: str raw_pdbblock: str = None clean_pdbblock: str = None mol: Chem.Mol = None derivative = Template(accession='5SQJ', chemcomp='QRU', SMILES='c1cc2c(c(c1)O)C[C@H]([C@H]2NC(=O)c3ccc(cc3)NC(=O)NC4CC4)C(=O)[O-]') template1 = Template(accession='5RUE', chemcomp='BHA', SMILES='c1cc(c(cc1N)O)C(=O)[O-]') template2 = Template(accession='5RSW', chemcomp='6FZ', SMILES='c1ccc2c(c1)CC(C2)C(=O)[O-]')
Second, let’s retrieve the PDB blocks and remove pesky altloc (RDKit hates them):
import requests from typing import Callable def remove_altloc(pdbblock: str) -> str: """Removes all altlocs and actually segi duplicates. Test line: ATOM 1 N MET A 1 0.000 0.000 0.000 1.00 60.69 N """ lines: List[str] = [] seen: List[Tuple[str, str, int]] = [] for line in pdbblock.split('\n'): if 'ANISOU' in line: continue # skip if line[:4] != 'ATOM' and line[:6] != 'HETATM': lines.append(line) continue atom_info = line[12:16].strip(), line[21].strip(), int(line[22:26].strip()) if atom_info not in seen: lines.append(f'{line[:16]} {line[17:]}') seen.append(atom_info) else: # skip pass return '\n'.join(lines) def fetch(pdb_acc: str) -> str: response: requests.Response = requests.get(f'https://files.rcsb.org/download/{pdb_acc}.pdb') response.raise_for_status() return response.text model: Template for model in (derivative, template1, template2): model.raw_pdbblock=fetch(model.accession) model.clean_pdbblock = remove_altloc( model.raw_pdbblock)
Let’s extract the hits
from fragmenstein import Monster, Victor for model in (template1, template2, derivative): model.mol = Victor.extract_mol(model.accession, block=model.clean_pdbblock, ligand_resn=model.chemcomp, smiles=model.SMILES, removeHs=True, proximityBonding=False, throw_on_error=True) model.mol.SetProp(model.chemcomp)
And lastly let’s have a gander.
import py3Dmol def get_view(pdbblock: str, resn: str, chain='A', width=800, height=400) -> py3Dmol.view: view = py3Dmol.view(width=width, height=height) view.addModel(pdbblock, 'pdb') view.setStyle({'chain': chain}, {'cartoon': {'color': 'turquoise'}}) view.setStyle({'chain': {'$ne': chain}}, {}) view.setStyle({'within':{'distance':'5', 'sel':{'resn':resn}}}, {'stick': {}}) view.setStyle({'resn':resn}, {'stick': {'colorscheme': 'yellowCarbon'}}) view.zoomTo({'resn':resn}) return view model = derivative view: py3Dmol.view = get_view(model.clean_pdbblock, model.chemcomp) view.show()