5th Artificial Intelligence in Chemistry Symposium

The lineup for the Royal Society of Chemistry’s 5th “Artificial Intelligence in Chemistry” Symposium (Thursday-Friday, 1st-2nd September 2022) is now complete for both oral and poster presentations. It really is a fantastic selection of topics and speakers and it is clear this event is now a highlight of the scientific calendar. Our very own Prof. Charlotte M. Deane, MBE will be giving a keynote.

5th RSC-BMCS/RSC-CICAG Artificial Intelligence in Chemistry Symposium, 1st-2nd September, Churchill College, Cambridge + Zoom broadcast.

It marks a return to in-person meetings: it will be held at Churchill College, Cambridge, with a conference dinner at Trinity Hall.

More details are here: https://www.rscbmcs.org/events/aichem22/.

Registration for in-person attendance is open until Monday 29th August 17:00 (BST).

It is also possible to register for virtual attendance; the meeting will be broadcast on Zoom.

#AIChem22, AI in Chemistry, RSC

The evolution, evolvability and engineering of gene regulatory DNA

Catching up on the literature is one of the highlights of my job as a scientist. True, sometimes you can be overwhelmed by the amount of information you don’t have; or wonder if we really need another paper showing that protein-ligand scoring functions don’t work. And yet, sometimes you find excellent research that you can’t but regard with a mixture of awe and envy. At a recent group meeting, I discussed one such paper from the research group of Aviv Regev at MIT, where the authors perform an impressive combination of computation and experiment to consider some basic questions in gene regulation and evolution. Here is why I think it’s excellent.

The authors are interested in promoters, short sequences of DNA that precede genes and regulate how frequently those genes are expressed. In short, promoters are binding sites for transcription factors, a family of proteins that in turn recruit RNA polymerase to transcribe DNA to RNA. The rate of gene transcription then determines, albeit not directly, the rate at which a protein is produced. If this sounds simple, however, that is where our understanding stops. The human genome encodes some 1,600 different transcription factors (~6-7% of protein-coding genes) and their inner workings are still not well understood.

Continue reading →

The SARS-CoV-2 protein spike glycosylation not only shields but primes binding by providing structural stability too

Yep, it is very well known that the sugar coating (aka glycosylation) of viruses helps hide them from the immune system. The strategy is so effective that in the case of HIV, whose spike is almost entirely covered by glycans, it makes the virus very difficult for the human immune system to target.

Unsurprisingly, coronaviruses such as SARS, MERS, and SARS-CoV-1(2) benefit from this evolutionary strategy. Moreover, there is now evidence that sugars stabilise their spikes by gluing the spike chains together, which makes the spikes effective binders and hence makes the viruses infectious.

This is the major finding of this paper, which presents very interesting results from all-atom MD simulations of a fully glycosylated model of the SARS-CoV-2 spike protein embedded in a realistic viral membrane. The researchers set out to examine the stability of the spike protein chains (A, B, and C) in the “open” and “closed” conformations, and how this stability changed upon mutation of key residues, to test how glycans sitting in the inter-chain space affect it. They also aimed to quantify the glycans’ shielding effect against molecules ranging from 2 to 15 Angstroms, i.e., from small-sized to peptide- and antibody-sized molecules.

Continue reading →

Cool ideas in Deep Learning and where to find more about them

I was planning on doing a blog post about some cool random deep learning papers that I have read in the last year or so. However, I keep finding that someone else has already written a way better blog post than I could write. Instead, I have decided to write a very brief summary of some hot ideas and then provide a link to some other page where someone describes each one way better than me.

The Lottery Ticket Hypothesis

This idea has to do with pruning a model, which is when you remove parts of your model to make it more computationally efficient while barely losing accuracy. The lottery ticket hypothesis also has to do with how weights are initialized in neural networks and why larger models often achieve better performance.

Anyways, the hypothesis says the following: “Dense, randomly-initialized, feed-forward networks contain subnetworks (winning tickets) that—when trained in isolation—reach test accuracy comparable to the original network in a similar number of iterations.” In their analogy, the random initialization of a model’s weights is treated like a lottery, where some combination of a subset of these weights is already pretty close to the network you want to train (the winning ticket). For a better description and a summary of advances in this field I would recommend this blog post.
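To make the "winning ticket" idea a bit more concrete, here is a minimal sketch of the usual recipe: train, keep only the largest-magnitude weights, and rewind the survivors to their initial values before retraining. It uses plain Python lists instead of a real network, and the "training" step is a hypothetical stand-in (random noise added to the initial weights), so treat it as an illustration of the bookkeeping rather than a faithful reproduction of the paper. The names `magnitude_mask` and `winning_ticket` are made up for this example.

```python
import random


def magnitude_mask(weights, keep_fraction):
    """Return a 0/1 mask that keeps the largest-magnitude weights."""
    n_keep = max(1, round(len(weights) * keep_fraction))
    # Indices ordered from largest to smallest absolute value.
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(ranked[:n_keep])
    return [1 if i in keep else 0 for i in range(len(weights))]


def winning_ticket(initial_weights, trained_weights, keep_fraction):
    """Prune by trained magnitude, then rewind survivors to their initial values."""
    mask = magnitude_mask(trained_weights, keep_fraction)
    ticket = [w0 * m for w0, m in zip(initial_weights, mask)]
    return ticket, mask


rng = random.Random(0)
initial = [rng.gauss(0, 1) for _ in range(10)]
# Hypothetical stand-in for training: perturb the initial weights.
trained = [w + rng.gauss(0, 0.5) for w in initial]

ticket, mask = winning_ticket(initial, trained, keep_fraction=0.2)
print(mask)    # which weights survive pruning
print(ticket)  # the sparse subnetwork you would now retrain in isolation
```

In a real experiment the masked subnetwork would be retrained from these rewound weights and its test accuracy compared with that of the dense model.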

SAM: Sharpness-aware minimization

The key idea here has to do with finding the best optimizer to train a model capable of generalization. According to this paper, a model that has converged to a sharp minimum will be less likely to generalize than one that has converged to a flatter minimum. They show the following plot to provide an intuition of why this may be the case.

In the SAM paper (and ASAM for adaptive) the authors implement an optimizer that is more likely to converge to a flat minimum. I found this blog post by the authors of ASAM gives a very good description of the field.
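As a rough illustration of what a SAM-style update looks like, here is a minimal sketch on a toy quadratic loss. Each step first moves the weights a distance `rho` along the normalised gradient (towards the locally "worst" nearby point), then applies the gradient measured there to the original weights. The loss, the `CURVATURE` values, and the hyperparameters are all assumptions chosen for the example, and the adaptive scaling used by ASAM is not included.

```python
import math

# Toy loss L(w) = 0.5 * sum(a_i * w_i^2): one sharp and one flat direction.
CURVATURE = [10.0, 0.1]


def loss(w):
    return 0.5 * sum(a * wi * wi for a, wi in zip(CURVATURE, w))


def grad(w):
    return [a * wi for a, wi in zip(CURVATURE, w)]


def sam_step(w, lr=0.05, rho=0.05):
    """One SAM-style update: perturb uphill, then descend with that gradient."""
    g = grad(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(w)  # Zero gradient: nothing to do.
    # Step of length rho in the direction of steepest ascent.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Apply the gradient from the perturbed point to the original weights.
    g_adv = grad(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


w = [1.0, 1.0]
initial_loss = loss(w)
for _ in range(50):
    w = sam_step(w)
final_loss = loss(w)
print(initial_loss, final_loss)
```

Setting `rho=0` recovers plain gradient descent, which is a handy sanity check when experimenting with the perturbation size.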

Continue reading →

Monoclonal antibody PRN100 therapy for Creutzfeldt–Jakob disease

Recently, University College London Hospitals (UCLH) received a “Specials License” to allow the treatment of six patients suffering from Creutzfeldt–Jakob Disease (CJD), by way of a novel antibody known as PRN100. The results of this treatment have now been published in The Lancet.

There is currently no cure for CJD, yet over 100 people per year develop it either spontaneously or through external means including (but not limited to) growth hormones, cataract surgery or infected neurosurgical implements [1]. “There is no UK legislation which implements a compassionate use programme as set out in Article 83 of the relevant EU regulation. But the UK has implemented an exemption process known as the “Specials” in light of the requirement to be able to deal with special needs.” [2]

As there is no known cure, the request to use PRN100 was put before the court, because in law “Some treatment decisions are so serious that the court has to make them.”

Continue reading →

Exploring topological fingerprints in RDKit

Finding a way to express the similarity of irregular and discrete molecular graphs to enable quantitative algorithmic reasoning in chemical space is a fundamental problem in data-driven small molecule drug discovery.

Virtually all algorithms that are widely and successfully used in this setting boil down to extracting and comparing (multi-)sets of subgraphs, differing only in the space of substructures they consider and the extent to which they are able to adapt to specific downstream applications.

A large body of recent work has explored approaches centred around graph neural networks (GNNs), which can often maximise both of these considerations. However, the subgraph-derived embeddings learned by these algorithms may not always perform well beyond the specific datasets they are trained on, and for many generic or resource-constrained applications more traditional “non-parametric” topological fingerprints may still be a viable and often preferable choice.

This blog post gives an overview of the topological fingerprint algorithms implemented in RDKit. In general, they count the occurrences of a certain family of subgraphs in a given molecule and then represent this set/multiset as a bit/count vector, which can be compared to other fingerprints with the Jaccard/Dice similarity metric or further processed by other algorithms.
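To illustrate the comparison step, here is a minimal sketch of the Jaccard and Dice similarities on count (multiset) and bit (set) fingerprints. It deliberately does not call RDKit: the fingerprints are hand-written `Counter` objects, and the subgraph labels such as `"C-C"` are hypothetical placeholders for whatever substructures a real fingerprint algorithm would enumerate.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity for count fingerprints: sum of minima over sum of maxima."""
    intersection = sum((a & b).values())  # Counter & takes element-wise minima
    union = sum((a | b).values())         # Counter | takes element-wise maxima
    return intersection / union if union else 1.0


def dice(a, b):
    """Dice similarity: twice the shared counts over the total counts."""
    intersection = sum((a & b).values())
    total = sum(a.values()) + sum(b.values())
    return 2 * intersection / total if total else 1.0


def as_bits(fp):
    """Collapse a count fingerprint to a bit fingerprint (presence/absence)."""
    return Counter(fp.keys())


# Hypothetical subgraph counts for two molecules.
mol_a = Counter({"C-C": 2, "C-O": 1, "C=O": 1})
mol_b = Counter({"C-C": 1, "C-O": 1, "C-N": 1})

print(jaccard(mol_a, mol_b))                    # count-based Jaccard
print(dice(mol_a, mol_b))                       # count-based Dice
print(jaccard(as_bits(mol_a), as_bits(mol_b)))  # bit-based Jaccard
```

Count and bit fingerprints can disagree for the same pair of molecules, which is one reason the choice between them matters in practice.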

Continue reading →

Tackling horizontal and vertical limitations

A blog post about reviewing papers and preparing papers for publication.

We start with the following premise: all papers have limitations. There is not a single paper without limitations. A method may not be generally applicable, a result may not be completely justified by the data or a theory may make restrictive assumptions. To cover all limitations would make a paper infinitely long, so we must stop somewhere.

A lot of limitations fall into the following scenario. The results or methods are presented, but they could have been extended in some way. Suppose we obtain results on a particular cell type using an immortalized cell line. Would the results still hold if we performed the experiments on primary or patient-derived cells? If the signal from the original cells was sufficiently robust then we would hope so. However, we cannot be one hundred percent sure. A similar example is a method that can be applied to a certain type of data. It may be possible to extend the method to other data types. However, this may require some new methodology. I call this flavor of limitations vertical limitations. They are vertical in the sense that they build upon an already developed result in the manuscript. Certain journals will require that you tackle vertical limitations by adapting the original idea or method to demonstrate broad appeal, or to show that the idea could permeate multiple fields. Most of the time, however, the premise of an approach is not to keep extending it. It works. Leave it alone. Do not ask for more. An idea done well does not need more.

Continue reading →

Oxford MRC DTP Symposium 2022

The Oxford Medical Research Council Doctoral Training Partnership (MRC DTP), the program through which my DPhil is funded, hosts an annual Symposium to highlight research being conducted by DTP students and offer insights into the career paths of external speakers.

This year, I was on the committee organising the Symposium and was involved in selecting student presenters, as well as deciding on and inviting external speakers. It was a great experience!

Panel on careers in biotech featuring Loïc Roux, Ochre Bio (centre); Helena Meyer-Berg, Sirion Biotech (centre right); and Claire Shingler, Oxford BioEscalator (right).

Here are my key takeaways from the Symposium:

Continue reading →

Entering a Stable Relationship with your Neural Network

Over the past year, I have been working on building a graph-based paratope (antibody binding site) prediction tool – Paragraph. Fortunately, I have had moderate success with this and you can now check out the preprint of this work here.

However, for a long time, I struggled with a highly unstable network, where different random seeds yielded very different results. I believe this instability was largely due to the high class imbalance in my data – only ~10% of all residues in the Fv (variable region of the antibody) belong to the paratope.
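To give a sense of what a ~10% positive class does to a loss function, here is a minimal sketch of a class-weighted binary cross-entropy, one common way of handling imbalance. This is an illustration only, not necessarily something used in Paragraph; the function name `weighted_bce`, the toy labels, and the choice of weighting positives by the negative-to-positive ratio are all assumptions made for the example.

```python
import math


def weighted_bce(probs, labels, pos_weight):
    """Mean binary cross-entropy with positives up-weighted by pos_weight."""
    eps = 1e-12
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # Clamp to avoid log(0).
        total -= pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# Toy data: 1 paratope residue in 10, mirroring the ~10% positive rate.
labels = [1] + [0] * 9
pos_weight = labels.count(0) / labels.count(1)

# A lazy model that predicts "not paratope" for every residue.
all_negative = [0.01] * len(labels)

print(weighted_bce(all_negative, labels, pos_weight=1.0))         # unweighted loss
print(weighted_bce(all_negative, labels, pos_weight=pos_weight))  # weighted loss
```

Without weighting, the lazy all-negative prediction is penalised only lightly, which hints at why training on such data can settle into very different solutions from one random seed to the next.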

I tried many different things in an attempt to stabilise my training, most of which failed. I will share all of these ideas with you though – successful or not – as what works for one person/network is never guaranteed to work for another. I hope that the below may provide some ideas to try out for others facing similar issues. Where possible, I also provide some example hyperparameter values that could act as sensible starting points.

Continue reading →

Oxford Protein Informatics Group

or "OPIG" to friends
