Category Archives: Molecular Design

Happy 10th Birthday, Blopig!

OPIG recently celebrated its 20th year, and on 10 January 2023 I gave a talk just a day before the 10th anniversary of BLOPIG’s first blog post. It’s worth reflecting on what’s stayed the same and what’s changed since then.


Some Musings on AI in Art, Music and Protein Design

When I started my PhD in late 2018, AI hadn’t really entered the field of de novo protein design yet – at least not in a big way. Rosetta’s approach of continually ranking new side chain rotamers on a fixed backbone was still the gold standard for the ‘structure-to-sequence’ problem. And of course before long we had AI making waves in the structure prediction field, eventually culminating in the AlphaFold2 we all know and love. 

Now, towards the end of my PhD, we are seeing the emergence of new generative models that learn from existing PDB structures to produce sequences that will (or at least should) fold into viable, sensible and crucially natural-looking shapes. ProtGPT2 is a good example (https://www.nature.com/articles/s41467-022-32007-7), but there are several more. How long before these models start reliably generating not only shapes but functions too? Jury’s out, but it’s looking more and more feasible. Safe to say the field as a whole has evolved massively during my time as a graduate student.


Code your own molecule sketcher in 4 easy steps!

Drawing molecules on your laptop usually requires access to proprietary software such as ChemDraw or free websites such as PubChem’s online sketcher. However, if you are feeling adventurous, you can build your own sketcher in React/TypeScript using the Ketcher package!

Ketcher is an open-source package that allows easy implementation of a molecule sketcher into a web application. Unfortunately, it does require TypeScript so the script to run it cannot be imported directly into an HTML page. Therefore we will set up a simple React app to get it working.

The sketcher is very sleek and has a vast array of functionality, such as choosing any atom from the periodic table and being able to directly import molecules from either SMILES or Mol2/SDF file format into the sketcher. These molecules can then be edited and saved to a new file in the chemical file format of your choosing.


5th Artificial Intelligence in Chemistry Symposium

The lineup for the Royal Society of Chemistry’s 5th “Artificial Intelligence in Chemistry” Symposium (Thursday-Friday, 1st-2nd September 2022) is now complete for both oral and poster presentations. It really is a fantastic selection of topics and speakers and it is clear this event is now a highlight of the scientific calendar. Our very own Prof. Charlotte M. Deane, MBE will be giving a keynote.

5th RSC-BMCS/RSC-CICAG Artificial Intelligence in Chemistry Symposium, 1st-2nd September, Churchill College, Cambridge + Zoom broadcast.

It marks a return to in-person meetings: it will be held at Churchill College, Cambridge, with a conference dinner at Trinity Hall.

More details are here: https://www.rscbmcs.org/events/aichem22/.

Registration for in person attendance is open until Monday 29th August 17:00 (BST).

It is also possible to register for virtual attendance; the meeting will be broadcast on Zoom.

CryoEM is now the dominant technique for solving antibody structures

Last year, the Structural Antibody Database (SAbDab) listed a record-breaking 894 new antibody structures, driven in no small part by the continued efforts of researchers to understand SARS-CoV-2.

Fig. 1: The aggregate growth in antibody structure data (all methods) over time. Taken from http://opig.stats.ox.ac.uk/webapps/newsabdab/sabdab/stats/ on 25th May 2022.

In this blog post I wanted to highlight the major driving force behind this curve – the huge increase in cryo electron microscopy (cryoEM) data – and the implications of this for the field of structure-based antibody informatics.


From code to molecules: The future of chemical synthesis

In June, after I finish my PhD, I will be joining Chemify, a new startup based in Glasgow that aims to make chemical synthesis universally accessible, reproducible and fully automated using AI and robotics. After previously talking about “Why you should care about startups as a researcher” and a quick guide on “Commercialising your research: Where to start?” on this blog, I am now joining a science-based startup fresh out of university myself.

Chemify is a spinout from the University of Glasgow originating from the group of Prof. Lee Cronin. The core of the technology is the chemical programming language χDL (pronounced “chi DL”) that, in combination with a natural language processing AI that reads and understands chemical synthesis procedures, can be used to plan and autonomously execute chemical reactions on robotic hardware. The Cronin group has also already built the modular robotic hardware needed to carry out almost any chemical reaction, the “Chemputer”. Due to the flexibility of both the Chemputer and the χDL language, Chemify has already shown that the applications go way beyond simple synthesis and can be applied to drug formulation, the discovery of new materials or the optimisation of reaction conditions.

Armed with this transformational software and hardware, Chemify is now fully operational and is hiring exceptional talent into their labs in Glasgow. I am excited to see how smart, AI-driven automation techniques like Chemify’s will change how small-scale chemical synthesis, and chemical discovery more broadly, are done in the future. I’m super excited to be part of the journey.

3 Key Questions to Think About When Designing Proteins Computationally

We have reached the era of design, not just ‘hunting’. Particularly exciting to me is the de novo design of proteins, which have a wide and ever increasing range of applications from therapeutics to consumer products, biomanufacturing to biomaterials. Protein design has been a) enabled by decades of research that contributed to our understanding of protein sequence, structure & function and b) accelerated by computational advances – capturing the information we have learned from proteins and representing it for computers and machine learning algorithms.

In this blog post, I will discuss three key methodological considerations for computational protein design:

  1. Sequence- vs structure-based design
  2. ML- vs physics-based design
  3. Target-agnostic vs target-aware design

A quantitative way to measure targeted protein degradation

Whenever we order consumables in the Chemistry department, the whole lab gets an email notification once they arrive. So I can understand why I got some puzzled reactions from my colleagues when one such email arrived saying that my ‘artichoke’ was ready to collect from stores. Had I been sneakily doing my grocery shopping on a university research budget?

Artichoke is, in fact, the name of a plasmid designed by the Ebert lab (https://www.addgene.org/73320/), which I have been using in some of my research on targeted protein degradation. The premise is simple enough: the plasmid carries genes for two different fluorescent proteins, one of which is fused to a protein-of-interest.
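
As a rough illustration of how a two-reporter readout like this could be turned into a number, here is a minimal Python sketch. It assumes (this is not stated in the excerpt above) that the unfused fluorescent protein serves as an internal expression control, so that the ratio of fused to unfused signal tracks the abundance of the protein-of-interest. The function names and all intensity values are made up for illustration.

```python
from statistics import median

def reporter_ratio(fused_signal, control_signal):
    """Per-cell ratio of fused-reporter to control-reporter fluorescence.

    Assumption: the control reporter is unaffected by the degrader, so
    dividing by it corrects for cell-to-cell differences in expression.
    """
    return [f / c for f, c in zip(fused_signal, control_signal) if c > 0]

def relative_abundance(treated_ratios, untreated_ratios):
    """Median treated ratio normalised to the median untreated ratio."""
    return median(treated_ratios) / median(untreated_ratios)

# Hypothetical per-cell intensities (arbitrary units), not real data.
untreated = reporter_ratio([100.0, 120.0, 90.0], [50.0, 60.0, 45.0])
treated = reporter_ratio([30.0, 24.0, 36.0], [60.0, 48.0, 72.0])

remaining = relative_abundance(treated, untreated)
print(f"Fraction of protein-of-interest remaining: {remaining:.2f}")
```

Using the median rather than the mean is one possible design choice here, since per-cell fluorescence ratios tend to have outliers.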


Highlights from the European Antibody Congress 2021

Last month, I was fortunate enough to be able to attend (in person!) and present at the Festival of Biologics European Antibody Congress (9-11 November, 2021) in Basel, Switzerland. The Festival of Biologics is an annual conference, which brings together researchers from industry and academia. It was an excellent opportunity to learn about exciting research and meet people working in the antibody development field.

Here are some of my highlights from the European Antibody Congress, with a focus on antibody design and engineering:


Benchmarks in De Novo Drug Design

I recently came across a review, “De novo molecular drug design benchmarking” by Lauren L. Grant and Clarissa S. Sit, which highlights recently proposed benchmarking methods, including Fréchet ChemNet Distance [1], GuacaMol [2] and Molecular Sets (MOSES) [3], together with their current and potential future applications and the next steps in validating benchmarking methods [4].

From this review, I particularly wanted to note the issues with current benchmarking methods and the points we should be aware of when using these methods to benchmark our own de novo molecular design methods. Goal-directed models are de novo molecular design methods that optimize for a particular scoring function [2].
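
To make the idea of a goal-directed model concrete, here is a toy Python sketch of optimizing candidates against a scoring function. Everything in it is an assumption for illustration: the three-letter alphabet is not real chemistry, the `score` function is hypothetical, and the greedy hill climb stands in for whatever generative model a real method would use. Actual benchmarks score chemically meaningful properties of valid molecules.

```python
import random

ALPHABET = "CNO"  # toy symbols only; strings here are not valid molecules

def score(candidate: str) -> float:
    """Hypothetical scoring function: fraction of positions that are 'N'."""
    return candidate.count("N") / len(candidate)

def mutate(candidate: str, rng: random.Random) -> str:
    """Replace one randomly chosen position with a random alphabet symbol."""
    i = rng.randrange(len(candidate))
    return candidate[:i] + rng.choice(ALPHABET) + candidate[i + 1:]

def optimise(start: str, steps: int, seed: int = 0) -> str:
    """Greedy hill climb: keep a mutation only if it does not lower the score."""
    rng = random.Random(seed)
    best = start
    for _ in range(steps):
        proposal = mutate(best, rng)
        if score(proposal) >= score(best):
            best = proposal
    return best

start = "CCCCCCCC"
result = optimise(start, steps=200)
print(start, score(start))
print(result, score(result))
```

The sketch also hints at why benchmarking such methods is delicate: the optimizer only ever sees the scoring function, so anything the function fails to capture is invisible to it.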
