Author Archives: Benjamin McMaster

Conference Summary: AIRR Community Meeting VII – Learnings and Perspectives

At the start of June, we (Lewis and Benjie) attended the AIRR Community meeting in beautiful and sunny Porto, Portugal. This meeting was focused on collecting and analysing adaptive immune receptor repertoires. This comprised of two rivalling factions at the conference: the antibody (Ab) people or the T cell antigen receptor (TCR) people. The split was nearly fifty-fifty between these two topics throughout the conference. Overall, the conference was a fairly comfortable size, with approximately a hundred people in attendance, making it easy to visit all of the posters and talk with many people in your area, without feeling too niche. There was a wide variety of content formats throughout the conference including posters, scientific talks, lightning talks, software demos, and hands-on tutorials. In the following section, we highlight some of our favourite sessions to give a flavour of what this meeting entails.

Continue reading

Optimising for PR AUC vs ROC AUC – an intuitive understanding

When training a machine learning (ML) model, our main aim is usually to get the ‘best’ model out the other end in an unbiased manner. Of course, there are other considerations such as quick training and inference, but mostly we want to be good at predicting the right answer.

A number of factors will affect the quality of our final model, including the chosen architecture, optimiser, and – importantly – the metric we are optimising for. So, how should we pick this metric?

Continue reading

Let your library design blosum

During the lead optimisation stage of the drug discovery pipeline, we might wish to make mutations to an initially identified binding antibody to improve properties such as developability, immunogenicity, and affinity.

There are many ways we could go about suggesting these mutations including using Large Language Models e.g. ESM and AbLang, or Inverse Folding methods e.g. ProteinMPNN and AntiFold. However, some of our recent work (soon to be pre-printed) has shown that classical non-Machine Learning approaches, such as BLOSUM, could also be worth considering at this stage.

Continue reading

AI Can’t Believe It’s Not Butter

Recently, I’ve been using a Convolutional Neural Network (CNN), and other methods, to predict the binding affinity of antibodies from their sequence. However, nine months ago, I applied a CNN to a far more important task – distinguishing images of butter from margarine. Please check out the GitHub link below to learn moo-re.

https://github.com/lewis-chinery/AI_cant_believe_its_not_butter

Quality Stats

Disclaimer – the title is a Quality Street pun only and bears no relation to the quality of the data or analysis presented below. This whole blog post is basically to discredit the personal chocolate preferences of a group member who shall remain nameless. Safe to say though, they Vostly overestimated people’s love for the Toffee Finger. Long live the Orange Creme.

Continue reading

Vaccines and vino

Recently, I was fortunate enough to attend and present at GSK’s PhD and Postdoc workshop in Siena, Italy. The workshop spanned two days and I had a brilliant time there – Siena itself is beautiful, I ate fantastic food, and I learnt a huge amount about all stages of vaccine production.

Unfortunately, due to confidentiality, I can’t go into great detail about others’ current research, however I have provided a short overview of the five main areas the workshop focused on below.

Continue reading

ISMB 2022 – July 10-14 Madison, Wisconsin

Madison, Wisconsin, a place known for its superb selection of craft beverages, for having Wisconsin’s Best Cheese Curds, and, most importantly, for hosting the 2022 annual international conference on Intelligent Systems for Molecular Biology (ISMB). Fortunately, we (Lewis and Tobias) got to attend this year’s ISMB and get a taste of Madison. The 2022 conference is the 30th ISMB conference and has grown to become the world’s largest bioinformatics/computational biology conference with nearly 600 presented talks. We therefore got to hear a wide range of different and interesting talks.

Continue reading

Entering a Stable Relationship with your Neural Network

Over the past year, I have been working on building a graph-based paratope (antibody binding site) prediction tool – Paragraph. Fortunately, I have had moderate success with this and you can now check out the preprint of this work here.

However, for a long time, I struggled with a highly unstable network, where different random seeds yielded very different results. I believe this instability was largely due to the high class imbalance in my data – only ~10% of all residues in the Fv (variable region of the antibody) belong to the paratope.

I tried many different things in an attempt to stabilise my training, most of which failed. I will share all of these ideas with you though – successful or not – as what works for one person/network is never guaranteed to work for another. I hope that the below may provide some ideas to try out for others facing similar issues. Where possible, I also provide some example hyperparameter values that could act as sensible starting points.

Continue reading

Monty Python

Every now and then I decide to overthink a problem I thought I understood and get confused – last week, it was the Monty Hall problem. 

For those unfamiliar with the thought experiment, the basic premise is that you are on a game show and are presented with three doors. Behind one of the doors is a car, while behind the other two are goats. 

With zero initial information, you make a guess as to which door you think the car is behind (we assume you have enough goats already). Before looking behind your chosen door, the host opens one of the remaining two doors and reveals a goat. The host then asks you if you would like to change your guess. What should you do? 

Continue reading