On the Joys of vim-like Browsing

Reflections on Pointlessness

One of the great delights in this life is pointless optimisation. Point-ful optimisation has its place of course; it is right and proper and sensible, and, well, useful, and it also does, when first achieved, yield considerable satisfaction. But I have found I soon adjust to the newly more efficient (and equally drab) normality, and so the spell fades quickly.

Not so with pointless optimisation. Pointless optimisation, once attained, is a preternaturally persistent source of joy that keeps on giving indefinitely. Particularly if it involves acquiring a skill of some description; if the task optimised is frequent; and if the time so saved could not possibly compensate for the time and effort sunk into the optimisation process. Words cannot convey the triumph of completing a common task with hard-earned skill and effortless efficiency, knowing full well it makes no difference whatsoever in the grand scheme of things.

Continue reading

Attention Is All You Need – A Moral Case

It turns out that giving neural networks attention yields some pretty amazing results. The attention mechanism allowed neural language models to ingest vast amounts of data in a highly parallelised manner, efficiently learning what to pay the most attention to in a context-aware way. This computational breakthrough launched the LLM-powered AI revolution we’re living through. But what if attention isn’t just a computational trick? What if the same principle that allows transformers to focus on what matters in a sea of information also lies at the heart of consciousness, perception, and even morality itself? (Ok, maybe this is a bit of a stretch, but hear me out.)
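For readers curious about the mechanism itself, it is remarkably compact. Below is a minimal, illustrative NumPy sketch of scaled dot-product self-attention (a single head, no learned projections, random toy embeddings — a teaching sketch, not how any production model is written):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: every query scores every key at once,
    so the whole sequence is processed in parallel."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    # Softmax over keys: a distribution of "where to look" for each query.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Toy self-attention: 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
out, weights = attention(X, X, X)
```

Each row of `weights` sums to one — a per-token budget of attention spread over the whole context, which is the "learning what to focus on" the paragraph above alludes to.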

To understand the connection, we need to look at how perception really works. Modern neuroscience reveals that experience is fundamentally subjective and generative. We’re not passive receivers of objective reality through our senses; we’re active constructors of our own experience. According to predictive processing theory, our minds constantly generate models of reality, and our sensory input is then used to provide an ‘error signal’ against these predictions. But the extraordinary point here is that we never ‘see’ these sensory inputs, only our mind’s best guess of how the world should be, updated by sensory feedback. As consciousness researcher Anil Seth puts it, “Reality is a controlled hallucination… an action-oriented construction, rather than passive registration of an objective external reality”, or in the words of Anaïs Nin, half a century earlier, “We do not see things as they are, we see things as we are.”

Continue reading

The Eye of the World by Robert Jordan: A Concise Review

I was recently devastated to hear that Amazon Prime has cancelled the Wheel of Time TV show, a fantasy epic based on the novels of Robert Jordan. I binge-watched the entire show and found that it improved throughout, with the third and most recent season being the best.

In my grief, I turned to something dark – reading the books instead.

I have recently finished the first book (of 14) and thought I would share my thoughts on the story and Jordan’s storytelling in a concise book review, so I can get my final Blopig post out of the way.

Continue reading

ChatGPT can now use RDKit!

All chemistry LLM enthusiasts were treated to a pleasant surprise on Friday when Greg Brockman tweeted that ChatGPT now has access to RDKit. I’ve spent a few hours playing with the updated models and have summarized some of my findings in this blog post.

Continue reading

Estimating uncertainty in MD observables using block averaging

When running molecular dynamics (MD) simulations, we are usually interested in measuring an ensemble average of some metric (e.g., RMSD, RMSF, radius of gyration, …) and using this to draw conclusions about the investigated system. While calculating the average value of a metric is straightforward (we can simply measure the metric in each frame and average it), calculating a statistical uncertainty is a little more tricky and often forgotten. The main challenge when trying to calculate the uncertainty of MD observables is that individual frames of the simulation are not sampled independently but are time correlated (i.e., frame N depends on frame N−1). In this blog post, I will briefly introduce block averaging, a statistical technique to estimate uncertainty in correlated data.
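As a taste of the technique (a minimal sketch, not the analysis from the post — the AR(1) toy series and block size here are made up for illustration):

```python
import numpy as np

def block_average_sem(x, block_size):
    """Estimate the standard error of the mean of a time-correlated series:
    average within non-overlapping blocks, then treat the block means as
    (approximately) independent samples."""
    n_blocks = len(x) // block_size
    if n_blocks < 2:
        raise ValueError("need at least two blocks")
    trimmed = np.asarray(x[: n_blocks * block_size], dtype=float)
    block_means = trimmed.reshape(n_blocks, block_size).mean(axis=1)
    return block_means.std(ddof=1) / np.sqrt(n_blocks)

# Toy correlated series: an AR(1) process standing in for an MD observable.
rng = np.random.default_rng(0)
x = np.zeros(20_000)
for i in range(1, len(x)):
    x[i] = 0.95 * x[i - 1] + rng.normal()

# The naive SEM treats frames as independent and underestimates the error;
# blocking (with blocks longer than the correlation time) is more honest.
naive_sem = x.std(ddof=1) / np.sqrt(len(x))
blocked_sem = block_average_sem(x, block_size=500)
```

In practice one grows the block size until the estimate plateaus; the plateau value is the uncertainty to report.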

Continue reading

Memory Efficient Clustering of Large Protein Trajectory Ensembles

Molecular dynamics simulations have grown increasingly ambitious, with researchers routinely generating trajectories containing hundreds of thousands or even millions of frames. While this wealth of data offers unprecedented insights into protein dynamics, it also presents a formidable computational challenge: how do you extract meaningful conformational clusters from datasets that can easily exceed available system memory?

Traditional approaches to trajectory clustering often stumble when faced with large ensembles. Loading all pairwise distances into memory simultaneously can quickly consume tens or hundreds of gigabytes of RAM, while conventional PCA implementations require the entire dataset to fit in memory before decomposition can begin. For many researchers, this means either downsampling their precious simulation data or investing in expensive high-memory computing resources.

The solution lies in recognizing that we don’t actually need to hold all our data in memory simultaneously. By leveraging incremental algorithms and smart memory management, we can perform sophisticated dimensionality reduction and clustering on arbitrarily large trajectory datasets using modest computational resources. Let’s explore how three key strategies—incremental PCA, mini-batch clustering, and intelligent memory management—can transform your approach to analyzing large protein ensembles.
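As a rough sketch of how these pieces can fit together — scikit-learn’s IncrementalPCA and MiniBatchKMeans both support chunk-wise fitting; the chunked reader and array shapes below are hypothetical stand-ins for frames streamed from disk, not the post’s actual pipeline:

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA
from sklearn.cluster import MiniBatchKMeans

n_frames, n_features, chunk = 10_000, 300, 1_000  # e.g. 100 atoms * 3 coords

def iter_chunks(seed=42):
    """Stand-in for streaming flattened trajectory frames from disk."""
    rng = np.random.default_rng(seed)
    for _ in range(n_frames // chunk):
        yield rng.normal(size=(chunk, n_features))

# Pass 1: fit the PCA incrementally -- only one chunk in memory at a time.
ipca = IncrementalPCA(n_components=10)
for frames in iter_chunks():
    ipca.partial_fit(frames)

# Pass 2: project each chunk into PC space and cluster it in mini-batches.
km = MiniBatchKMeans(n_clusters=5, random_state=0)
for frames in iter_chunks():
    km.partial_fit(ipca.transform(frames))

# Assign cluster labels chunk by chunk (shown here for the first chunk).
labels = km.predict(ipca.transform(next(iter_chunks())))
```

The key design point is the two streaming passes: at no moment does more than one chunk of frames (plus the small PCA and cluster-centre state) live in memory, so the same code scales to trajectories far larger than RAM.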

Continue reading

Open Source Pharma: From Idealism to Pragmatic Solutions

In an industry dominated by patents, proprietary data, and the race to get a first-in-class drug, the concept of open source drug development once seemed like an impossible dream. Yet as traditional pharma continues to leave many global health needs unaddressed—particularly for diseases affecting low- and middle-income countries1,2—the open source model has evolved from idealistic theory to pragmatic reality. In this post, I’ll lead us through how open source drug development has overcome key obstacles of funding and intellectual property (IP) management to deliver real-world solutions.

Continue reading

An insight into mega-conferences – attending ESCMID Global 2025

I suppose it really hit me when the Viennese border control officer asked, “Ah, you must be here for the conference?” That’s when I realised: this wasn’t just any event. ESCMID Global isn’t your average gathering of lab coat enthusiasts, but rather one of the largest clinical infectious disease conferences in the world. Over 16,000 attendees packed into Vienna for their 35th annual congress.

So, was flying across Europe to attend the Glastonbury of conferences, minus the mud, plus the microbes, worth it?

Well… it depends on what you’re hoping to get out of it. If you’re an academic, you might find that a lot of the sessions lean heavily towards the clinical side of things. On the plus side, it made it easier to narrow down my schedule – with over a dozen sessions happening at any one time, a bit of decisiveness goes a long way. Personally, I found the big-name, high-level keynotes and annual updates from organisations like EUCAST the most accessible and informative.

Continue reading

Attending LMRL @ ICLR 2025

I recently attended the Learning Meaningful Representations of Life (LMRL) workshop at ICLR 2025. The goal of LMRL is to highlight machine learning methods which extract meaningful or useful properties from unstructured biological data, with an eye towards building a virtual cell. I presented my paper which demonstrates how standard Transformers can learn to meaningfully represent 3D coordinates when trained on protein structures. Each paper submitted to LMRL had to include a “meaningfulness statement” – a short description of how the work presents a meaningful representation.

Continue reading

Interested in Research Software Engineering? We’re hiring!

I’ve been working in OPIG as a Research Software Engineer for several years now, and it’s been a fantastic experience. Sadly, my time here is coming to an end, which means we’re looking to hire a new Research Software Engineer to take over! As a computational research group, we write a lot of scientific software and strive to ensure everything we do is open-source and as accessible as possible to both academic and industrial users. Many of these tools are still in use long after the people who wrote them have left the group, and are actively maintained to ensure they remain useful to researchers. Supporting the development and deployment of all of these tools are our Research Software Engineers. This is a great opportunity to work at the intersection of academia and industry, where you will be able to both contribute to world-leading research and maximise the impact of that research by ensuring the tools produced are both accessible and sustainable.

This is a full-time, permanent position in OPIG, based in the Department of Statistics at the University of Oxford. For more details, or to apply, you can find the job details here.