With the recent release of ChatGPT, many studies have already been uploaded to bioRxiv examining the potential uses of the chatbot’s outputs. One such paper compared ChatGPT-generated scientific abstracts to the original abstracts. Upon seeing the title, I immediately got my hopes up that my abstract-writing days were over. So is this the case?
SAbBox in 2023: ImmuneBuilder and more!
For several years now, we have distributed the SAbDab database and SAbPred tools as a virtual machine, SAbBox, via Oxford University Innovation. This virtual machine allows a user to utilise the tools and database locally, allowing for high-throughput analysis and keeping confidential data within a local network. Initially distributed under a commercial licence, the platform proved popular and, in 2020, we introduced a free academic licence to enable our academic colleagues to use our tools and database locally.
Following requests from users, in 2021 we released a new version of the platform packaged as a Singularity container. This included all of the features of SAbBox, allowing Linux users to take advantage of the near bare-metal performance of Singularity when running SAbPred tools. Over the past year, we have made lots of improvements to both SAbBox platforms, and have more work planned for the coming year. I’ll briefly outline these developments below.
The Boltzmann Distribution and Gender Stereotypes
Journalist Caitlin Moran recently tweeted the following:
“I feel like every day now, I read/hear something saying “We don’t talk about what’s POSITIVE about masculinity; what’s GOOD about men and boys.” So: what IS the best stuff about boys, and men? Honest, celebratory question.”
What followed was a collection of replies acknowledging and celebrating various traits seen typically as ‘male’, including certain activities, such as knowing about sports or cars, or a desire to do DIY type work, and characteristics such as physical strength, no-nonsense attitudes and a ‘less complicated’ style of friendship between men.
Whilst I commend Moran’s efforts to turn recent discussions surrounding masculinity on their head and frame it in a positive light, to me the responses offered and the discussion that followed felt somewhat stifling. I am biologically male and identify as male, but do not feel like I personally adhere to most of these stereotypes. I am not physically strong, I know very little about cars and sports, and I find there to be just as much nuance and drama in male-male friendships as there is in friendships between other genders.
Three months in industry and returning to my PhD
Being in the third year of my DPhil, I decided that I should try to see what the world of industry looks like. Thankfully, I was lucky enough to be able to complete an industrial internship at Exscientia here in Oxford, where I spent most of my time on scientific software engineering. I expected this not to be too different from what my work looks like here at OPIG, but quickly came to realise that this was not the case. What followed were three months of building a software package, getting to know all the new people around me, and getting used to all the new tools and infrastructure. Below are a few things that I am very happy to take back with me.
A PhD can often be very solitary work. You are the expert in your project, and you also have the highest stake in it. At times, the freedom this affords in choosing what to explore next is fantastic, but it can make things difficult when problems arise. In industry, projects are a lot more collaborative. Your work direction will be aligned heavily with company needs, and depending on company size there might be specialised teams to support you in specific aspects of the project.
The emphasis placed on code quality is also a stark difference. Internal software written for company use has to be readable and well-documented. The codebase must also be standardised to maintain consistency. This makes life easier for newcomers and ensures that, if the author of some software leaves the company, the next person can easily take over maintaining their code. Here, academia is catching up. Scientific software engineering is becoming more focused on maintainability (one of the core values of the SABS programme), but sadly GitHub is still full of legacy code written in ways that make it difficult to maintain once the author stops being involved with it.
Lastly, on a more personal note, it was also fantastic to be surrounded by people in a team who work with the same techniques as me. In my PhD, I am one of two people at OPIG regularly using molecular dynamics simulations but I also spend a lot of my time working in the Biochemistry Department with the Higgins Group which is an experimental structural biology group. This being the case, my internship has been a fantastic way of picking up some additional techniques from people who are already familiar with them. I would highly recommend giving yourself the opportunity to do this if possible, either via something like an industrial internship, or a research visit to a collaborating academic group.
The past three months have been invaluable. They have given me the opportunity to see what industry is like and given me experience with new skills that I can take back to my PhD. Best of all, I got to meet a fantastic team who were always ready to take time out of their days to help and who made the time I spent at Exscientia as fun as it was!
Quality Stats
Disclaimer – the title is a Quality Street pun only and bears no relation to the quality of the data or analysis presented below. This whole blog post is basically to discredit the personal chocolate preferences of a group member who shall remain nameless. Safe to say though, they Vostly overestimated people’s love for the Toffee Finger. Long live the Orange Creme.

The exotic zoo of antibodies
When I think of antibodies, I usually think of the standard human Y-shaped IgG. It is easy to forget that the world of antibodies is extremely diverse, both in the constant domain, with many different isotypes (i.e. IgA, IgD, IgE, and IgM), and in the variable domain (i.e. with or without a light chain and CDR lengths). This is before we even start looking at engineered antibodies, like the ones illustrated in a previous blog post by Alissa.
Of the many different antibodies, in this blog post, I want to highlight some of the exotic naturally occurring antibodies which might not have gotten much attention yet, but which each have interesting features.
The standard antibody (e.g. human, mouse)
This is the standard antibody against which we will compare the others. A protein complex of two paired heavy and light chains forming the well-known Y shape. At the tips, a binding site that consists mainly of the three CDRs on each chain. Nice and simple.

Interesting facts:
Does ChatGPT know how to translate images?
Yesterday I spent a couple of hours playing with ChatGPT. I know, we have some other recent posts about it. It’s so amazing that I couldn’t resist writing another. Apologies for that.
The goal of this post is to determine if I can effectively use ChatGPT as a programmer/mathematician assistant. OK. It was not my original intention, but let’s pretend it was, just to make this post more interesting.
So, I started by asking a few very simple programming questions like the following:
Can you implement a function to compute the factorial of a number using a cache? Use python.
And this is what I got.

A clear and efficient implementation of the factorial. This is the kind of answer you would expect from a first year CS student.
Two useful modules to help you find the best ML model for your task
FLAML and LazyPredict are two packages designed to quickly train and test machine learning models from scikit-learn so that you can determine which is the best type of model for learning from your data.
Festival of Biologics 2022 – November 2-4 Basel, Switzerland
In November I attended the Festival of Biologics (FoB) 2022 conference in Basel, Switzerland. Originally a set of different conferences (now called agendas) that have merged into a single conference, FoB focuses on anything related to biologics. One of the agendas is an antibody-specific agenda, derived from the former European Antibody Congress. This year the antibodies agenda had more than 100 talks across multiple tracks, covering many different aspects of using antibodies as therapeutics, making it an exciting conference for an antibody enthusiast. However, while FoB does include talks on machine learning and bioinformatics, most talks focus solely on experimental work. Another drawback is that the majority of the talks are given by industry, with the few academic speakers almost all also representing a company. This meant that of the few talks about computational methods and tools for protein design, most felt more like a commercial than a research presentation. Nonetheless, FoB is still an interesting conference to attend when you are working on applied research for antibody therapeutics. It is an amazing opportunity to hear about which antibody-specific problems companies are trying to overcome, which are deemed solved and which are the future problems to solve.
Bad chemistry in old protein-ligand binding complex data set
The Astex Diverse set [1] is a dataset containing the crystallized poses of 85 protein-ligand complexes. It was introduced in 2007 to address problems in previous datasets such as incorrect ligand representation.
Loading the 85 ligand files with today’s version of the cheminformatics toolkit RDKit [2] is, however, not as straightforward as you might expect.