How Unusual Is Your Generated Molecule? Let The CCDC Tell You

In this post I’ll walk through how to set up the CCDC Python API and use the CSD Geometry Analyser to evaluate the geometric quality of molecules from three representative structure-based de novo design models. I’ve put together a small GitHub repo with the full analysis code where we look at bond lengths, angles, torsions, and ring conformations across the three methods, and compare these against their PoseBusters validity scores to see what each metric is really capturing.


Peering Inside the Black Box: A Beginner’s Introduction to Mechanistic Interpretability

Over the last few years, large language models (LLMs) have gone from being curiosities tucked away in research labs to something most of us interact with on a daily basis, whether for drafting emails, debugging code, or simply pondering the meaning of life at 2am. And yet, for all our reliance on these systems, a rather inconvenient truth lingers in the background: nobody, not even the people who built them, can fully explain what is going on inside.

This is where mechanistic interpretability comes in.

In essence, mechanistic interpretability is the approach of explaining complex machine learning systems through the behaviour of their functional units (Kästner and Crook, 2024) by reverse-engineering them into their more elementary computations (Rai et al., 2025). The aim is not simply to know that a model gives the right answer, but to pull apart the underlying machinery and uncover the causal relationships between input and output. Think of it as neuroscience for neural networks, except we can read every neuron at any moment, rewind, replay, and intervene mid-thought.
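To make "read every neuron and intervene" concrete, here is a minimal sketch of one such intervention: zeroing a single hidden unit and measuring how the output changes. The toy network, its weights, and the names `forward` and `ablate` are made up for illustration and are not taken from the cited papers.

```python
def relu(values):
    return [max(0.0, v) for v in values]

def forward(x, W1, W2, ablate=None):
    """Toy two-layer network; optionally zero one hidden unit (an ablation)."""
    hidden = relu([sum(w * xi for w, xi in zip(row, x)) for row in W1])
    if ablate is not None:
        hidden[ablate] = 0.0  # intervene mid-computation
    output = sum(w * h for w, h in zip(W2, hidden))
    return output, hidden

# Hypothetical weights, chosen only so the arithmetic is easy to follow.
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W2 = [2.0, -1.0, 0.5]
x = [1.0, 2.0]

baseline, hidden = forward(x, W1, W2)  # read every hidden activation

# Causal effect of each hidden unit: how much the output drops when it is removed.
effects = {i: baseline - forward(x, W1, W2, ablate=i)[0] for i in range(len(W2))}
```

Real interpretability work applies the same idea to much larger models, usually by patching in activations from a different input rather than zeroing them.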


A timeline of sampling methods of diffusion models

When approaching the methods used in de novo protein design, one is quickly confronted with a plethora of overlapping formulations of what looks superficially like “the same thing”. One paper trains an $\boldsymbol{\epsilon}$-prediction network with a simple MSE loss; another trains a score network with a stochastic-differential-equation justification; a third trains a clean-data predictor under yet another schedule. Each formulation carries its own notation, its own variance schedule, and its own sampler. Qualitatively, this zoo of formulations is doing the same thing: it starts from some unstructured noise and iteratively refines it to eventually produce a protein structure similar (but different!) to other proteins we have experimentally determined in the past. What is not immediately obvious to a newcomer is that all of these formulations are historical descendants of a small number of foundational ideas, and that essentially every architectural and algorithmic decision in a modern protein-design diffusion model has a specific paper of origin and a specific motivation for being there.

This post is my attempt to put these formulations onto a single timeline. I trace the trajectory of the field through four foundational works: DDPM (Ho et al., 2020), DDIM (Song et al., 2021a), the score-based SDE unification (Song et al., 2021b), and EDM (Karras et al., 2022), explaining at each step what specific problem with the previous formulation the next paper was attacking and how the new formulation generalises or simplifies the old one. The goal is coherent motivation rather than exhaustive coverage; the reader interested in implementation details is referred to the original papers and the references at the end.
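As a small illustration of why these parameterisations are "the same thing", the sketch below uses the standard DDPM forward process, x_t = sqrt(ᾱ_t) x_0 + sqrt(1 − ᾱ_t) ε, and shows that a noise estimate can be converted into a clean-data estimate or a score estimate by simple algebra. It assumes a single scalar `alpha_bar` and plugs in the true noise where a trained network's prediction would go; the function names are illustrative only.

```python
import math
import random

def forward_noise(x0, eps, alpha_bar):
    """DDPM forward process: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, eps)]

def x0_from_eps(xt, eps, alpha_bar):
    """Clean-data estimate implied by a noise estimate."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - b * e) / a for x, e in zip(xt, eps)]

def score_from_eps(eps, alpha_bar):
    """Score of q(x_t | x_0) implied by a noise estimate: -eps / sqrt(1 - alpha_bar)."""
    return [-e / math.sqrt(1.0 - alpha_bar) for e in eps]

rng = random.Random(0)
x0 = [rng.gauss(0.0, 1.0) for _ in range(4)]   # stand-in for clean data
eps = [rng.gauss(0.0, 1.0) for _ in range(4)]  # stand-in for a network's noise prediction
alpha_bar = 0.3                                # arbitrary noise level for the demo

xt = forward_noise(x0, eps, alpha_bar)
x0_recovered = x0_from_eps(xt, eps, alpha_bar)
score = score_from_eps(eps, alpha_bar)
```

With a perfect noise prediction, `x0_recovered` matches `x0` up to floating-point error, which is why the three training targets differ mainly in how the loss is weighted across noise levels.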


Spin Lattices and Proteins – How state-based discretisations have enabled modern protein modelling

I got into protein modelling not long before AlphaFold2 was first released. At that time some of the prevailing methods for protein structure prediction came from highly interpretable energy functionals that arose from a particularly beautiful intersection of statistical mechanics and biology. These “Potts” models are going to be the centre of a larger discussion in this post on state-based discretisations of proteins, how they’ve shaped modern deep learning methods, and whether there is still more to learn from them.

In the age of black box deep learning, does the Potts model still have a place?

The Potts/Ising Model

The Ising model is a well-established and popular model of ferromagnetism from theoretical physics. Simply put, given a lattice of atoms, each capable of adopting one of two spins (up or down), ferromagnetism arises when their spins align and their associated magnetic moments point in the same direction. The Ising model tries to parameterise the local and non-local relationships between atoms and their spin states such that we can learn the Hamiltonian of the system and its different configurations under the magnetic field. For a system of N atoms, the Hamiltonian takes the following form:


$$
E = -\sum_{i=1}^{N} h_i x_i - \sum_{i<j}^{N} J_{ij} x_i x_j,
$$

where $J_{ij}$ is the “coupling energy” between any two atoms $i$ and $j$ with spins $x_i$ and $x_j$, and $h_i$ represents the magnetic field, or, more appropriately for our purposes, a single-site field dictating how an individual atom acts independently within the model. You might recognise the form this binary spin model takes, as it arises naturally across the sciences, including in Hopfield networks and graphical models.
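As a quick sanity check of the Hamiltonian above, here is a minimal sketch that evaluates it for a toy three-atom chain. It assumes spins take values in {-1, +1} and that couplings are stored in the upper triangle of `J`; the function name and the example values are illustrative only.

```python
from itertools import combinations

def ising_energy(spins, h, J):
    """E = -sum_i h_i x_i - sum_{i<j} J_ij x_i x_j for spins x_i in {-1, +1}."""
    n = len(spins)
    field_term = sum(h[i] * spins[i] for i in range(n))
    coupling_term = sum(
        J[i][j] * spins[i] * spins[j] for i, j in combinations(range(n), 2)
    )
    return -field_term - coupling_term

# Toy chain: no single-site field, ferromagnetic nearest-neighbour couplings.
h = [0.0, 0.0, 0.0]
J = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
]

aligned = ising_energy([1, 1, 1], h, J)      # all spins up
frustrated = ising_energy([1, -1, 1], h, J)  # middle spin flipped
```

With positive couplings, the aligned configuration has the lower energy, which is the ferromagnetic behaviour described above.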

Everything is an Ising-like model if you’re brave enough


Will TurboQuant save us from the RAM apocalypse?

The LLM boom is causing a global shortage of the very same computer memory it needs to sustain itself. Reports suggest OpenAI’s Stargate project alone could consume up to 40% of global DRAM output. Frontier labs like Google DeepMind need to make their models more memory-efficient.


One such technique is TurboQuant, released by Google. TurboQuant is an example of an online “quantisation” method. LLMs represent information using large tensors of numerical values, where each number typically uses 32 or 16 bits. However, many values do not require full numerical precision, so we can “round” them using fewer bits and less memory. We can see this in the example below:

Figure: the rounded value now requires 4x less memory.

Some quantisation methods are applied offline before inference begins. TurboQuant is “online” because it compresses the KV cache dynamically during inference.
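The rounding idea can be sketched in a few lines. The example below is plain uniform quantisation with a single shared scale, not the TurboQuant algorithm itself, and it assumes 32-bit inputs stored as 8-bit integer codes; the function names and values are illustrative.

```python
def quantise(values, bits=8):
    """Map floats onto signed integers of the given width using one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale

def dequantise(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]

values = [0.12, -0.5, 0.33, 1.0]  # stand-in for a slice of a tensor
codes, scale = quantise(values, bits=8)
restored = dequantise(codes, scale)

# The rounding error is bounded by half a quantisation step.
max_error = max(abs(a - b) for a, b in zip(values, restored))

# Storing 8-bit codes instead of 32-bit floats (ignoring the one shared scale).
memory_ratio = 32 / 8
```

An online method would run a step like `quantise` on new cache entries as they are produced during inference, rather than once ahead of time.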


The Open Immune Window: Notes on Sweaty Workouts and Vanishing Immune Cells

Here is a question for you: is an intense, sweaty workout in the gym building up your immune health, or is it just opening a window of opportunity for a pathogen to ruin your week? To understand this, we first have to look at energy. The immune system is incredibly energy-hungry, constantly patrolling and repairing the body. When you exercise hard, your body is forced into a rapid game of resource allocation, diverting precious energy away from baseline functions to fuel your contracting muscles.

This brings us to a rather scary observation in sports science that I stumbled on one day while reading random headlines. If you draw blood one to two hours after a hard run or heavy exertion, your immune cell count (specifically lymphocytes) absolutely plummets. Apparently, for decades, scientists looked at this massive drop in the blood and concluded that our immune system temporarily crashed after exercise, leaving an “open window” of 3 to 72 hours where we were highly vulnerable to infections. Which leads us back to the main question – is a hard workout actually making you sick?

Thankfully, no. It turns out those missing immune cells didn’t just die off. Driven by the acute spike in adrenaline from your workout, those cells rapidly exit your bloodstream and migrate directly into peripheral tissues, specifically mucosal barriers like your lungs and gut. Think about it: during a hard workout, you are hyperventilating and exposing your airway to massive amounts of external air. Your body isn’t suppressing its defenses; it’s actively deploying its best troops exactly where a pathogen is most likely to enter. It is a state of heightened immune surveillance, not suppression.

So why do athletes often get the sniffles after a big race? Often, it is just non-infectious airway inflammation from heavy breathing, combined with the psychological stress and lack of sleep that accompany big events. Your workout actually acts as a natural immune adjuvant, making you more resilient. If you want to dive deeper into this topic, I highly recommend checking out the paper “Debunking the Myth of Exercise-Induced Immune Suppression” by Campbell and Turner (Frontiers in Immunology, 2018).

Revealing Nature’s Quantum Compass – Kickoff Day

Yesterday marked the kickoff for the BBSRC-funded Strategic Longer and Larger (sLoLa) project “Revealing Nature’s Quantum Compass” [1]. The sLoLa grants are a laudable endeavor by the UK government to fund “ambitious research projects that will deepen our understanding of life’s most fundamental processes”. It is wonderful to see the UK government taking seriously the importance of blue sky basic research, appreciating that asking deep questions is what drives scientific progress, often leading to unexpected breakthroughs with application down the line.

At the kickoff event, principal investigators presented on what their research can bring to the table. Much like entering a bakery [2] where everything smells delicious and it seems impossible to choose, an overwhelming range of experimental and computational techniques was presented, each bringing its own unique approach to bear on the outstanding problem: mechanistically, how is it that birds (and other animals) can navigate distances of up to thousands of kilometers using the Earth’s magnetic field? Alongside this, my own group is interested in how we can develop biotechnologies that take advantage of magnetic field sensitive biochemistry, which has a host of applications in the near and long term.

The challenge of linking the biochemistry of a single protein known to be magnetic field sensitive to a behavioral phenotype will require a highly interdisciplinary approach, and excitingly for this community, machine learning is being involved from the start. Prof. Degiacomi, a member of the core team, presented how his lab is developing ML techniques to reduce the computational burden of linking experimental results to protein dynamics informed by molecular dynamics simulation. On the flip side, I hope such techniques will develop into methods we can use for design. Similar to enzymes, the proteins we are interested in have a function that depends on mechanisms far more complex than structure and binding alone (not to trivialize either of these!). Magnetic field sensing in this context depends on creating an environment in which quantum entanglement can exist, and on being able to transduce the state of this quantum entanglement into a biological signal – thus far this second step in particular has remained highly elusive.

Ultimately, the day concluded with much enthusiasm and excitement for all that is to come. Watch this space!

  1. https://www.ox.ac.uk/news/2025-11-19-new-project-aims-reveal-nature-s-quantum-compass
  2. Yes, I just returned from a symposium in Germany

Three Resources I Keep Coming Back to for Learning Deep Learning

There is no shortage of AI content online, but over time I have found myself returning to the same handful of resources again and again, and I wanted to share the three that have helped me the most.

AI Summer

This one I would recommend to anyone who is earlier in their journey. AI Summer at theaisummer.com is a free platform run by Sergios Karagiannakos and Nikolas Adaloglou, and it covers everything from the basics of neural networks through to building and deploying real ML systems. The tone is friendly and practical, and there are proper code examples throughout. It is one of those rare resources that manages to be beginner-friendly without feeling watered down.


What I wish I knew before applying and moving to Oxford from the US

The first time I ever visited the UK was when I moved to Oxford for my PhD (or DPhil in Oxford speak). I was nervous, excited, and thought I could assimilate easily after growing up watching Sherlock, Midsomer Murders, and Doc Martin. After all, my native language is English; how different could the UK really be? Oh, how wrong I was.


A Golden Age of Nanomedicine

As someone who spent their entire academic career, from B.Sc. to M.Sc. to Ph.D., within a Kavli Institute for Nanoscience Discovery (first in Delft and now in Oxford), I’ve had the privilege of seeing firsthand just how beautifully intricate the nanoscale world can be. Now, as my research focuses on lipid nanoparticles for genetic therapeutics and vaccines, I would like to use this platform to advocate for what I believe is one of the most transformative frontiers in modern medicine: the rational design of nanomaterials for therapeutic delivery.
