Demystifying the thermodynamics of ligand binding

Chemoinformatics uses a curious jumble of terms from thermodynamics, wet-lab techniques and statistical terminology, which is at its most jarring, it could be argued, in machine learning. In some datasets one often sees pIC50, pEC50, pKi and pKD, in discussion sections a medchemist may talk casually of entropy, whereas in the world of molecular mechanics everything is internal energy. Herein I hope to address some common misconceptions and unify these concepts.

Continue reading →

What the heck are TPUs?

I recently became curious about TPUs, a specialised hardware for training Machine- and Deep-Learning models, where TPU stands for Tensor Processing Unit. This fancy chip can provide very high gains for anyone aiming to perform really massive parallelisation of AI tasks such as training, fine-tuning, and inference.

In this blog post, I will touch on what a TPU is, why it could be useful for AI applications when compared to GPUs and briefly discuss associated opportunity costs.

What’s a TPU?

Continue reading →

OPIG: A decade of Scientific Shenanigans. What’s changed?

2013 was a big year: Andy Murray clinched the Wimbledon title, NASA’s Curiosity Rover discovered water-bearing minerals on Mars, and ‘twerk’ and ‘selfie’ made their way into the dictionary, something equally significant happened—the birth of BLOPIG.com. Intrigued by how the group has changed over the last decade, I started on a journey to unearth the some of the publications from then till now. Questioning their focus, methods, and evolution of the groups’ research over the past decade. This blog post is what I found.

While delving into each publication of the past decade genuinely seemed like an interesting idea, the imminent threat to my PhD progress forced me to adopt the most 2023-appropriate approach: outsourcing the task to AI. After collecting abstracts from all the group’s papers, I enlisted the help of everyone’s’ favourite hallucinator to summarise the works and (hopefully) highlight the shifts in their research.

So after a relatively long, sequence of prompts, this is (apparently) what we do?

Continue reading →

Deploying a Flask app part II: using an Apache reverse proxy

I recently wrote about serving a Flask web application on localhost using gunicorn. This is sufficient to get an app up and running locally using a production-ready WSGI server, but we still need to add a HTTP proxy server in front to securely handle HTTP requests coming from external clients. Here we’ll cover configuring a simple reverse proxy using the Apache web server, though of course you could do the same with another HTTP server such as nginx.

Continue reading →

Understanding GPU parallelization in deep learning

Deep learning has proven to be the season’s favourite for biology: every other week, an interesting biological problem is solved by clever application of neural networks. Yet, as more challenges get cracked, modern research shifts more and more in the direction of larger models — meaning that increasing computational resources are required for training. Unsurprisingly, NVIDIA, the main manufacturer of GPUs, experienced a significant jump in their stock price earlier this year.

Access to compute is not enough to train good neural networks. As soon as multiple cards enter into play, researchers need to use a completely different paradigm where data and model weights are distributed across different devices — and sometimes even different computers. Though these tools start to be crucial for successful computational biology research, they are generally unknown to researchers. Hence, in this blogpost, I would like to provide a really brief introduction to multi-GPU training.

Continue reading →

Understanding positional encoding in Transformers

Transformers are a very popular architecture in machine learning. While they were first introduced in natural language processing, they have been applied to many fields such as protein folding and design.
Transformers were first introduced in the excellent paper Attention is all you need by Vaswani et al. The paper describes the key elements, including multiheaded attention, and how they come together to create a sequence to sequence model for language translation. The key advance in Attention is all you need is the replacement of all recurrent layers with pure attention + fully connected blocks. Attention is very efficeint to compute and allows for fast comparisons over long distances within a sequence.
One issue, however, is that attention does not natively include a notion of position within a sequence. This means that all tokens could be scrambled and would produce the same result. To overcome this, one can explicitely add a positional encoding to each token. Ideally, such a positional encoding should reflect the relative distance between tokens when computing the query/key comparison such that closer tokens are attended to more than futher tokens. In Attention is all you need, Vaswani et al. propose the slightly mysterious sinusoidal positional encodings which are simply added to the token embeddings:

Continue reading →

Conference feedback: AI in Chemistry 2023

Last month, a drift of OPIGlets attended the royal society of chemistry’s annual AI in chemistry conference. Co-organised by the group’s very own Garrett Morris and hosted in Churchill College, Cambridge, during a heatwave (!), the two days of conference featured aspects of artificial intelligence and deep machine learning methods to applications in chemistry. The programme included a mixture of keynote talks, panel discussion, oral presentations, flash presentations, posters and opportunities for open debate, networking and discussion amongst participants from academia and industry alike.

Continue reading →

Antibody Engineering & Therapeutics Europe 2024

Back in June this year, I went to Amsterdam to give a talk at “Antibody Engineering & Therapeutics Europe 2024”. I had a great time at the conference, and it presented many opportunities to gain some insights into research that is directly relevant to me, as well as research to broaden my horizons a little beyond the CDR loops. While I would love to go through all the fantastic talks, I’m opting to give some takeaways on only a subset:

Continue reading →

SSH, the boss-fight level: Jupyter notebooks from compute nodes

Secure shell (SSH) is an essential tool for remote operations. However, not everything with it is smooth-sailing. Especially, when you want to do things like reverse–port-forwarding via a proxy-hump or two a Jupyter notebook to your local machine from a compute node on a no-home container . Even if it sounds less plausible than the exploits on Mr Robot, it actually can work and requires zero social-engineering or sneaking in server rooms to install Raspberry Pis while using a baseball cap as a disguise.

Continue reading →

Conference feedback — with a difference

At OPIG Group Meetings, it’s customary to give “Conference Feedback” whenever any of us has recently attended a conference. Typically, people highlight the most interesting talks—either to them or others in the group.

Having just returned from the 6th RSC-BMCS / RSC-CICAG AI in Chemistry Symposium, it was my turn last week. But instead of the usual perspective—of an attendee—I spoke briefly about how to organize a conference.

Continue reading →

Oxford Protein Informatics Group

or "OPIG" to friends

Demystifying the thermodynamics of ligand binding

What the heck are TPUs?

What’s a TPU?

OPIG: A decade of Scientific Shenanigans. What’s changed?

Deploying a Flask app part II: using an Apache reverse proxy

Understanding GPU parallelization in deep learning

Understanding positional encoding in Transformers

Conference feedback: AI in Chemistry 2023

Antibody Engineering & Therapeutics Europe 2024

SSH, the boss-fight level: Jupyter notebooks from compute nodes

Conference feedback — with a difference