Tag Archives: Python

Comparing pose and affinity prediction methods for follow-up designs from fragments

Any virtual screening task involves applying many filters to a dataset of ligands to downselect the ‘best’ ones across a number of parameters and produce a set of manageable size. One popular filter is whether a compound has a physically plausible pose and good affinity as predicted by tools such as docking or energy minimisation. In my pipeline for downselecting elaborations of compounds proposed as fragment follow-ups, I calculate the pose and ΔΔG by energy minimising the ligand with atom restraints to matching atoms in the fragment inspiration. I use either RDKit with its MMFF94 forcefield or PyRosetta with its ref2015 scorefunction, all made possible by the lovely tool Fragmenstein.

With RDKit as the minimizer, the protein neighborhood around the ligand is fixed and placements take 21s on average, whereas PyRosetta placements take 238s on average (luckily, I can run placements in parallel). Ideally I would like to use RDKit as the placement method, since it is so fast and I would like to perform 500K placements within a few days, but I wanted to confirm that RDKit is ‘good enough’ compared to the slightly more rigorous PyRosetta (which, I think, allows residues to relax and samples more conformations in its longer runtime).
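To put those timings in perspective, here is a rough back-of-the-envelope sketch of the wall-clock time for 500K placements. The per-placement times are the averages quoted above; the worker count of 64 and the assumption that placements parallelise perfectly are illustrative guesses, not figures from the post.

```python
# Back-of-the-envelope wall-clock estimate for 500K placements.
# Per-placement times are the averages quoted in the text; the worker
# count is a hypothetical value chosen only for illustration.

SECONDS_PER_DAY = 24 * 60 * 60


def wall_clock_days(n_placements: int, seconds_each: float, n_workers: int) -> float:
    """Days of wall-clock time, assuming placements are independent
    and spread evenly over the workers (no scheduling overhead)."""
    return n_placements * seconds_each / n_workers / SECONDS_PER_DAY


N_PLACEMENTS = 500_000
N_WORKERS = 64  # hypothetical number of parallel workers

rdkit_days = wall_clock_days(N_PLACEMENTS, 21, N_WORKERS)
pyrosetta_days = wall_clock_days(N_PLACEMENTS, 238, N_WORKERS)

print(f"RDKit:     {rdkit_days:.1f} days")      # about 1.9 days
print(f"PyRosetta: {pyrosetta_days:.1f} days")  # about 21.5 days
```

Under these assumptions only the RDKit route fits into "a few days", which is why it matters whether it is good enough.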


PyRosetta for RFdiffusion

I will not lie: I often struggle to find a snippet of code that did something in PyRosetta, or I spend hours facing a problem caused by something not working as I expect it to. I recently did a tricky project involving RFdiffusion and I kept slipping on the PyRosetta side. So to make future me, others, and ChatGPT5 happy, here are some common operations to make working with PyRosetta for RFdiffusion easier.


Quickly (and lazily) scale your data processing in Python

Do you use pandas for your data processing/wrangling? If you do, and your code involves any data-heavy steps such as data generation, exploding operations, featurization, etc., then it can quickly become inconvenient to test your code:

  • Inconvenient compute times (>tens of minutes). Perhaps fine for a one-off, but over repeated test iterations your efficiency and focus will take a hit.
  • Inconvenient memory usage. Perhaps your dataset is too large for memory, or loads in but then causes an OOM error during a mid-operation memory spike.

Mapping derivative compounds to parent hits

Whereas it is easy to say in a paper “Given the HT-Sequential-ITC results, 42 led to 113, a substituted decahydro-2,6-methanocyclopropa[f]indene”, it is frequently rather trickier to figure out algorithmically which atoms map to which. In Fragmenstein, for the placement route, for example, a lot goes on behind the scenes, yet for some cases a human-provided mapping may be required. Here I discuss how to get the mapping from Fragmenstein and what goes on behind the scenes.


Using JAX and Haiku to build a Graph Neural Network


JAX

Last year, I had an opportunity to delve into the world of JAX whilst working at InstaDeep. My first blopig post seems like an ideal time to share some of that knowledge. JAX is an experimental Python library developed by Google for accelerated automatic differentiation. JAX can be used to differentiate functions written in NumPy or native Python, to just-in-time compile and execute functions on GPUs and TPUs with XLA, and to vectorize repeated function calls over mini-batches. Collectively, these qualities make JAX an ideal candidate for accelerated deep learning research [1].

JAX is inspired by the NumPy API, making usage very familiar for any Python user who has already worked with NumPy [2]. However, unlike NumPy, JAX arrays are immutable; once they are assigned in memory they cannot be changed. As such, JAX includes specific syntax for index manipulation. In the code below, we create a JAX array and change the first element to a 4:
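A minimal sketch of that indexed update, assuming "first element" means index 0 and using arbitrary example values:

```python
import jax.numpy as jnp

# Create a JAX array (the values are arbitrary example data).
x = jnp.array([1, 2, 3])

# In-place assignment such as `x[0] = 4` raises an error, because JAX
# arrays are immutable. The `.at[...].set(...)` syntax instead returns
# a new array with the update applied and leaves `x` untouched.
y = x.at[0].set(4)

print(x)  # the original array is unchanged
print(y)  # the copy carries the updated first element
```

The functional style looks wasteful, but inside a just-in-time compiled function XLA can usually perform such updates in place.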


Under-rated or overlooked, these libraries might be helpful.

Discovering a library that massively simplifies the exact thing you just did, right after you’ve finished doing it, has to be one of the top 14 worst things about writing code. You might think it’s a part of the life we’ve all chosen, but it doesn’t have to be. Beyond the popular libraries you already know lies a treasure trove of underappreciated packages waiting to be wielded. Being the saint I am, I’ve scoured the depths of pypi.org to find some underrated and hopefully useful packages to make your life a little easier.


Working with PDB Structures in Pandas

Pandas is one of my favourite data analysis tools for working in Python! Its data frames bring a lot of power and organization to any data analysis task. Here at OPIG we work with a lot of protein structure data coming from PDB files. In the following article I will go through an example of how I use pandas data frames to analyze PDB data.


Taking Equivariance in deep learning for a spin?

I recently went to Sheh Zaidi’s brilliant introduction to Equivariance and Spherical Harmonics and I thought it would be useful to cement my understanding of it with a practical example. In this blog post I’m going to start with serotonin in two coordinate frames, and build a small equivariant neural network that featurises it.


Finding and testing a reaction SMARTS pattern for any reaction

Have you ever needed to find a reaction SMARTS pattern for a certain reaction but don’t have it already written out? Do you have a reaction SMARTS pattern but need to test it on a set of reactants and products to make sure it transforms them correctly and doesn’t allow for odd reactants to work? I recently did and I spent some time developing functions that can:

  1. Generate a reaction SMARTS for a reaction given two reactants, a product, and a reaction name.
  2. Check the reaction SMARTS on a list of reactants and products that have the same reaction name.
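The checking step (item 2) can be sketched with RDKit's reaction machinery. This is only an illustration, not the functions from the post: `run_reaction`, `check_reaction`, and the amide-coupling SMARTS are hypothetical names and a hand-written pattern chosen for the example.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Hand-written example pattern (an assumption, not taken from the post):
# carboxylic acid + primary amine -> amide.
AMIDE_COUPLING = "[C:1](=[O:2])[OX2H1].[NX3;H2:3]>>[C:1](=[O:2])[N:3]"


def run_reaction(rxn_smarts, reactant_smiles):
    """Apply a reaction SMARTS and return the canonical product SMILES."""
    rxn = AllChem.ReactionFromSmarts(rxn_smarts)
    reactants = tuple(Chem.MolFromSmiles(smi) for smi in reactant_smiles)
    products = set()
    for product_set in rxn.RunReactants(reactants):
        for mol in product_set:
            Chem.SanitizeMol(mol)  # products come back unsanitized
            products.add(Chem.MolToSmiles(mol))
    return products


def check_reaction(rxn_smarts, reactant_smiles, expected_product):
    """True if the expected product is among the generated products."""
    expected = Chem.MolToSmiles(Chem.MolFromSmiles(expected_product))
    return expected in run_reaction(rxn_smarts, reactant_smiles)


# Acetic acid + methylamine should give N-methylacetamide.
print(check_reaction(AMIDE_COUPLING, ("CC(=O)O", "CN"), "CC(=O)NC"))

# An "odd" reactant: a tertiary amine should not react at all.
print(run_reaction(AMIDE_COUPLING, ("CC(=O)O", "CN(C)C")))
```

Comparing canonical SMILES avoids false mismatches caused by different atom orderings, and testing reactants that should fail is what catches a pattern that is too permissive.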

Let your library design blosum

During the lead optimisation stage of the drug discovery pipeline, we might wish to make mutations to an initially identified binding antibody to improve properties such as developability, immunogenicity, and affinity.

There are many ways we could go about suggesting these mutations including using Large Language Models e.g. ESM and AbLang, or Inverse Folding methods e.g. ProteinMPNN and AntiFold. However, some of our recent work (soon to be pre-printed) has shown that classical non-Machine Learning approaches, such as BLOSUM, could also be worth considering at this stage.
