Conformational Diversity in Proteins, Revisited

A while ago I blogged about CoDNaS, a database of the Conformational Diversity of the Native State of proteins (Monzon et al., 2013). It’s worth revisiting to highlight more recent developments.

Conference summary: Generative AI in Life Science

This year I attended the second edition of Generative AI in Life Science (GenLife – https://genlife.dk/), and it was an enriching experience that I thoroughly enjoyed. Held in Copenhagen, the event brought together researchers working on different areas of AI applied to the life sciences and provided a fantastic platform for networking, learning and sharing ideas. The programme included a mix of long and short talks from established experts, as well as a strong presence of early-career PIs, which made the conference a perfect place to discover up-and-coming groups in the field. Here I have collected some highlights from the talks I enjoyed most.

My take on the Collaborations Workshop (CW) 2024

At the end of April, I attended the CW 2024. This yearly hybrid event organised by the Software Sustainability Institute (SSI) has been running since 2011! The event brings people together to discuss best practices and the future of software in research. This year’s event themes were (1) AI/ML tools for Science, (2) Citizen Science and (3) Environmental sustainability.

As a Research Software Engineer (RSE) working with OPIG, I was curious to attend and find out what I could bring back that would be useful to the group, as most of its members work on AI/ML applications. In this blog post, I share the parts of the event that resonated with me and that I found most interesting and relevant to my group.

Interactive visualization of protein–ligand complexes with Py3Dmol

I recently wanted to provide an interactive visualization of several different protein–ligand complexes. It had to require minimal setup by the user and let them zoom in and out and change the visualization style, without simply handing over multiple PDB files or a PyMOL session.
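One way to meet those requirements can be sketched as follows, under stated assumptions: instead of calling Py3Dmol itself, this writes a single standalone HTML page that loads 3Dmol.js (the JavaScript library Py3Dmol wraps) from a CDN, embeds the PDB text of each complex, and offers a drop-down to switch complexes plus a button to toggle the protein between cartoon and line styles. The helper name `write_complex_viewer` and the CDN URL are assumptions for illustration, not taken from the post.

```python
from __future__ import annotations

import html
import json
from pathlib import Path

# Assumed CDN location of 3Dmol.js; check the 3Dmol.js documentation for the current URL.
THREEDMOL_JS_URL = "https://3Dmol.org/build/3Dmol-min.js"

PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <script src="__JS_URL__"></script>
</head>
<body>
  <select id="picker">__OPTIONS__</select>
  <button id="toggle">Toggle protein style</button>
  <div id="viewer" style="width: 800px; height: 600px; position: relative;"></div>
  <script>
    const complexes = __DATA__;
    const viewer = $3Dmol.createViewer(document.getElementById("viewer"),
                                       {backgroundColor: "white"});
    const proteinStyles = [{cartoon: {color: "spectrum"}}, {line: {}}];
    let styleIndex = 0;

    function restyle() {
      viewer.setStyle({}, proteinStyles[styleIndex]);  // all atoms (the protein)
      viewer.setStyle({hetflag: true}, {stick: {}});   // hetero atoms (ligands, waters)
      viewer.render();
    }

    function show(name) {
      viewer.removeAllModels();                        // drop the previous complex
      viewer.addModel(complexes[name], "pdb");
      restyle();
      viewer.zoomTo();
      viewer.render();
    }

    const picker = document.getElementById("picker");
    picker.addEventListener("change", () => show(picker.value));
    document.getElementById("toggle").addEventListener("click", () => {
      styleIndex = (styleIndex + 1) % proteinStyles.length;
      restyle();                                       // keeps the current zoom
    });
    show(picker.value);
  </script>
</body>
</html>
"""


def write_complex_viewer(complexes: dict[str, str], out_path: str | None = None) -> str:
    """Build (and optionally save) a standalone HTML viewer for several complexes.

    `complexes` maps a display name to the text of a PDB file.
    """
    # json.dumps quotes the PDB text safely; escaping "</" stops it closing the <script>.
    data = json.dumps(complexes).replace("</", "<\\/")
    options = "".join(
        f'<option value="{html.escape(name)}">{html.escape(name)}</option>'
        for name in complexes
    )
    page = (
        PAGE_TEMPLATE.replace("__JS_URL__", THREEDMOL_JS_URL)
        .replace("__OPTIONS__", options)
        .replace("__DATA__", data)
    )
    if out_path is not None:
        Path(out_path).write_text(page, encoding="utf-8")
    return page
```

Usage, with a hypothetical file name: `write_complex_viewer({"complex_A": Path("complex_A.pdb").read_text()}, "complexes.html")` produces one file that opens in any browser with internet access. Note that `hetflag` also matches waters and other hetero groups, so they are drawn as sticks too.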

Comparing pose and affinity prediction methods for follow-up designs from fragments

In any virtual screening task, many filters need to be applied to a dataset of ligands to downselect the ‘best’ ones on a number of parameters and reduce the set to a manageable size. One popular filter is whether a compound has a physically plausible pose and good affinity, as predicted by tools such as docking or energy minimisation. In my pipeline for downselecting elaborations of compounds proposed as fragment follow-ups, I calculate the pose and ΔΔG by energy minimising the ligand with atom restraints to matching atoms in the fragment inspiration. I use either RDKit with its MMFF94 force field or PyRosetta with its ref2015 score function, all made possible by the lovely tool Fragmenstein.

With RDKit as the minimizer, the protein neighborhood around the ligand is kept fixed and placements take 21 s on average, whereas PyRosetta placements take 238 s on average (luckily, I can run placements in parallel). I would ideally like to use RDKit as the placement method, since it is so fast and I would like to perform 500K placements within a few days, but I wanted to confirm that RDKit is ‘good enough’ compared with the slightly more rigorous PyRosetta (which, I think, allows residues to relax and samples more conformations over its longer runtime).
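The throughput argument is easy to check with a little arithmetic. The sketch below assumes perfectly parallel, independent placements and 64 cores; the core count is an illustrative assumption, not a figure from the post.

```python
import math

SECONDS_PER_DAY = 86_400


def wall_clock_days(n_placements: int, seconds_each: float, n_workers: int) -> float:
    """Days of wall-clock time for independent placements spread over n_workers.

    Assumes perfect parallelism: no queueing, start-up or I/O overhead.
    """
    # The busiest worker runs ceil(n / workers) placements back to back.
    per_worker = math.ceil(n_placements / n_workers)
    return per_worker * seconds_each / SECONDS_PER_DAY


N_PLACEMENTS = 500_000
N_CORES = 64  # assumed core count, for illustration only
for tool, seconds in {"RDKit": 21, "PyRosetta": 238}.items():
    cpu_days = N_PLACEMENTS * seconds / SECONDS_PER_DAY
    days = wall_clock_days(N_PLACEMENTS, seconds, N_CORES)
    print(f"{tool}: {cpu_days:.0f} CPU-days, ~{days:.1f} days on {N_CORES} cores")
```

Under these assumptions, RDKit needs about 122 CPU-days (roughly 1.9 days of wall-clock time on 64 cores), while PyRosetta needs about 1377 CPU-days (roughly 21.5 days), which is why the faster method is attractive if its poses turn out to be good enough.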

Fine-tune generated molecular poses with a force field

Some molecular pose generation methods benefit from an energy relaxation post-processing step.

Example of a small molecule pose before and after energy minimization. The pose before minimization is shown in white, the optimized prediction in pink, and a crystal pose in light blue for reference. Note how the aromatic rings are flattened and the leftmost bond is shortened by the optimization.
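As a dependency-free illustration of what such a relaxation does, here is a toy steepest-descent minimization of a harmonic bond-stretching energy, E = Σ k (r − r0)². It is not OpenMM and not the post’s script: the force constant, reference length, step size and three-atom geometry are made-up values, and real force fields add angle, torsion and non-bonded terms and use more efficient minimizers (OpenMM uses L-BFGS). It only shows the mechanism behind effects like the shortened bond in the figure.

```python
import math


def bond_energy_and_gradient(coords, bonds, k=300.0):
    """Harmonic bond-stretch energy E = sum k * (r - r0)**2 and its gradient.

    coords: list of [x, y, z] per atom; bonds: list of (i, j, r0). Units are arbitrary.
    """
    energy = 0.0
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    for i, j, r0 in bonds:
        d = [coords[i][a] - coords[j][a] for a in range(3)]
        r = math.sqrt(sum(x * x for x in d))
        energy += k * (r - r0) ** 2
        # dE/dr = 2k(r - r0), and dr/dx_i = d / r = -dr/dx_j
        factor = 2.0 * k * (r - r0) / r
        for a in range(3):
            grad[i][a] += factor * d[a]
            grad[j][a] -= factor * d[a]
    return energy, grad


def steepest_descent(coords, bonds, step=5e-4, max_iter=10_000, tol=1e-10):
    """Move every atom against the energy gradient until the energy stops changing."""
    coords = [list(c) for c in coords]  # work on a copy
    energy, grad = bond_energy_and_gradient(coords, bonds)
    for _ in range(max_iter):
        for atom, g in zip(coords, grad):
            for a in range(3):
                atom[a] -= step * g[a]
        new_energy, grad = bond_energy_and_gradient(coords, bonds)
        converged = abs(energy - new_energy) < tol
        energy = new_energy
        if converged:
            break
    return coords, energy


# Three-atom chain with the first bond stretched well beyond its reference length.
start = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [3.0, 1.2, 0.0]]
bonds = [(0, 1, 1.5), (1, 2, 1.5)]
relaxed, final_energy = steepest_descent(start, bonds)
for i, j, r0 in bonds:
    before = math.dist(start[i], start[j])
    after = math.dist(relaxed[i], relaxed[j])
    print(f"bond {i}-{j}: {before:.3f} -> {after:.3f} (r0 = {r0})")
```

Both bonds end up at their reference length of about 1.500; the overstretched bond is the one that shortens the most.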

Here is a quick way to do this using OpenMM via a short script I prepared:

Organise Your ML Projects With Hydra

One of the most annoying parts of ML research is keeping track of all the different experiments you’re running: quickly changing your model, data or hyper-parameters, and keeping track of those changes, can turn into an organisational nightmare. I’m normally a fan of avoiding too many libraries/frameworks, as they often break down if you try to do anything even a little bit custom, and days are often wasted trying to adapt yourself to a new framework or adapt the framework to you. However, my last codebase ended up straying pretty far into the chaotic side of things, so I thought it might be worth trying something else for my next project. In my quest to instil a bit more order, I’ve started using Hydra, which strikes a nice balance between giving you more structure to organise a project and not rigidly insisting on it, and I’d highly recommend checking it out yourself.

Environmentally sustainable computing

Did you know that your carbon footprint as a scientist is estimated to be between 2 and 12 times higher than the per-person carbon budget for keeping global warming below 1.5 °C [1]?

Background

Global temperatures are rising. This has direct effects on the planet and contributes to a growing number of humanitarian emergencies, including more frequent and intense heatwaves, wildfires and floods [2]. The impact of climate change is already severe, with around 20 million internally displaced persons in 2023 alone due to such disasters [3].

Global warming and climate change are driven by emissions of greenhouse gases such as carbon dioxide and methane, commonly referred to as carbon emissions. There are different ways in which you can minimise your carbon footprint. For example, I try to reduce energy usage at home, eat a mainly plant-based diet, and travel by train instead of by plane to visit family and to go on holidays and to conferences. However, until organising a Green Lecture with the Department of Statistics Green Team, I had never thought of my computational PhD as a major contributor to my carbon footprint. That doesn’t mean the work I, and all other scientists, do is not important and necessary. But the lecture on principles for environmentally sustainable research, given by Loic Lannelongue, made me aware of the carbon cost of computing, which I would like to share with you.
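To make the carbon cost of computing concrete, a common back-of-the-envelope model (the kind used by calculators such as Green Algorithms, developed by Lannelongue and colleagues) multiplies runtime by power draw and by the data-centre overhead factor (PUE) to get energy, and then multiplies energy by the carbon intensity of the electricity grid. The sketch below follows that model; the per-core power, memory power, PUE and carbon intensity defaults are illustrative assumptions, so substitute values for your own hardware and grid.

```python
from __future__ import annotations


def compute_footprint(
    runtime_h: float,
    n_cores: int,
    usage: float,
    memory_gb: float,
    core_power_w: float = 12.0,             # assumed per-core power draw (W)
    memory_power_w_per_gb: float = 0.3725,  # assumed memory power draw (W per GB)
    pue: float = 1.67,                      # assumed data-centre power usage effectiveness
    carbon_intensity: float = 200.0,        # assumed grid intensity (g CO2e per kWh)
) -> tuple[float, float]:
    """Return (energy in kWh, carbon footprint in g CO2e) for one computing job."""
    # Power drawn by the cores (scaled by how busy they are) plus by the memory.
    power_w = n_cores * core_power_w * usage + memory_gb * memory_power_w_per_gb
    # PUE scales up for cooling and other data-centre overheads; W * h / 1000 = kWh.
    energy_kwh = runtime_h * power_w * pue / 1000.0
    return energy_kwh, energy_kwh * carbon_intensity


energy, carbon = compute_footprint(runtime_h=24, n_cores=8, usage=1.0, memory_gb=32)
print(f"{energy:.2f} kWh, {carbon / 1000:.2f} kg CO2e")
```

For a 24-hour job on 8 fully used cores with 32 GB of memory, these assumptions give about 4.3 kWh and 0.87 kg CO2e.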

The War of the Roses: Tea Edition

Picture the following: the year is 1923, and it’s a sunny afternoon at a posh garden party in Cambridge. Among the polite chatter, one Muriel Bristol, a phycologist studying the mechanisms by which algae acquire nutrients, mentions that she prefers tea poured over milk, as opposed to milk poured over tea. In a classic example of women not being able to express even the most insignificant preference without an opinionated man telling them they’re wrong, Ronald A. Fisher, a local statistician (and eugenicist, who would later dismiss the notion that smoking cigarettes is dangerous as ‘propaganda’, mind you), decides to put her claim to the test with an experiment. Bristol is given eight cups of tea and asked to classify each as milk first or tea first. Luckily, she correctly identifies all eight, and gets to happily continue about her life (presumably until the next time she dares mention a similarly outrageous and consequential opinion, like a preferred toothpaste brand or a favourite method for filing papers). Fisher, on the other hand, is inspired to develop Fisher’s exact test, a statistical significance test used in the analysis of contingency tables.
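In Fisher’s own description of the experiment, four of the eight cups had milk added first and four had tea added first, and the taster knew this. Under the null hypothesis that she cannot tell the difference, every choice of four cups is equally likely, so the number she labels correctly follows a hypergeometric distribution, and the one-sided p-value for getting all eight right is 1/C(8,4) = 1/70 ≈ 0.014. A minimal standard-library sketch of that calculation:

```python
from fractions import Fraction
from math import comb


def tea_tasting_p_value(n_cups: int = 8, n_milk_first: int = 4, n_correct: int = 4) -> Fraction:
    """One-sided Fisher exact test p-value for the tea-tasting design.

    The taster picks out n_milk_first cups as "milk first"; n_correct of her picks
    are right. Returns P(at least n_correct right | pure guessing).
    """
    n_tea_first = n_cups - n_milk_first
    n_ways = comb(n_cups, n_milk_first)  # equally likely selections under the null
    # Count selections with k correct milk-first cups, for every k >= n_correct.
    n_as_extreme = sum(
        comb(n_milk_first, k) * comb(n_tea_first, n_milk_first - k)
        for k in range(n_correct, n_milk_first + 1)
    )
    return Fraction(n_as_extreme, n_ways)


p = tea_tasting_p_value()
print(p, float(p))
```

Had she got only six of the eight right (three of the four milk-first cups), the p-value would be 17/70 ≈ 0.24, which is why a perfect score was needed. As a cross-check, `scipy.stats.fisher_exact([[4, 0], [0, 4]], alternative="greater")` should give the same 1/70.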

Paths that you need to know for compiling

Compiling and running applications on Linux involves more than just writing code. Developers must also understand the intricacies of environment variables and command-line tools that dictate where compilers and runtime environments look for necessary files. In this post, we will cover some of them.

Default Search Paths

Header Files: Compilers like gcc and g++ typically look for header files in standard directories such as /usr/include or /usr/local/include. These are the places where most system and third-party libraries install their header files.
Libraries: For libraries, the linker (ld) searches directories such as /usr/lib and /usr/local/lib, and sometimes more specific directories that depend on the machine’s architecture (such as /usr/lib/x86_64-linux-gnu on 64-bit x86 Debian and Ubuntu systems).
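The library search can be modelled in a few lines. This is a simplified sketch, not the exact GNU ld algorithm: it assumes directories given with -L come first, then entries of the LIBRARY_PATH environment variable, then a short list of default directories (the list here is illustrative), and that within each directory lib<name>.so is preferred over lib<name>.a, as GNU ld does when linking dynamically. To see the real lists on your machine, `gcc -print-search-dirs` and `ld --verbose | grep SEARCH_DIR` are useful starting points.

```python
import os
from pathlib import Path

# Illustrative defaults only; the real list depends on the distribution and toolchain.
DEFAULT_LIB_DIRS = ("/usr/local/lib", "/usr/lib/x86_64-linux-gnu", "/usr/lib")


def library_search_dirs(cli_dirs=(), env=None, defaults=DEFAULT_LIB_DIRS):
    """Simplified search order: -L directories, then LIBRARY_PATH, then defaults."""
    env = os.environ if env is None else env
    # LIBRARY_PATH is a colon-separated list (os.pathsep on Linux); skip empty entries.
    from_env = [d for d in env.get("LIBRARY_PATH", "").split(os.pathsep) if d]
    return [*cli_dirs, *from_env, *defaults]


def find_library(name, search_dirs):
    """Return the first file that -l<name> would match under this simplified model."""
    for directory in search_dirs:
        # Within one directory, prefer the shared library over the static archive.
        for candidate in (f"lib{name}.so", f"lib{name}.a"):
            path = Path(directory) / candidate
            if path.is_file():
                return path
    return None
```

For example, `find_library("z", library_search_dirs(cli_dirs=["/opt/zlib/lib"]))`, with a hypothetical -L directory, reports which file a `-lz` flag would resolve to under this model, or `None` if nothing matches.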

Oxford Protein Informatics Group

or "OPIG" to friends
