Conference summary: Generative AI in Life Science | Oxford Protein Informatics Group

This year I attended the second edition of Generative AI in Life Science (GenLife, https://genlife.dk/), and it was an enriching experience that I thoroughly enjoyed. Held in Copenhagen, the event brought together researchers from different areas of AI applied to the life sciences and provided a fantastic platform for networking, learning and sharing ideas. The programme mixed long and short talks from established experts with a significant presence of emerging PIs, making the conference a perfect place to discover new groups in the field. Here I have collected some highlights from the talks I enjoyed the most.

Advances in peptide sequencing

Timothy Jenkins from the Technical University of Denmark (DTU) gave a fascinating talk on small peptide sequencing. He began by showing that, despite recent progress, small peptide sequencing remains challenging, especially when performed at scale. He then presented results from his group's collaboration with InstaDeep (preprint here), in which they developed a new AI tool called InstaNovo. InstaNovo allows database-free sequencing of peptides from large-scale experiments and sets a new state-of-the-art accuracy for the field. Finally, Timothy showed some unpublished results on how they are using InstaNovo and other tools to create new synthetic antibodies for the simultaneous treatment of venom from different snakes.

Optimal transport and flow matching with applications to protein design

Alexander Tong from Mila presented his group's progress on equivariant flow matching for generative modeling. The presentation focused on the results of their recent preprint, which introduces FoldFlow, a series of SE(3)-equivariant generative models for designing novel protein structures. These models offer stable and fast training, accurate protein backbone modeling and sampling of equilibrium conformations, with accuracy comparable to state-of-the-art methods. I found the detailed insights into the theory and architecture particularly useful for understanding the principles behind flow matching.
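For readers new to flow matching, here is a minimal sketch of the basic training target, under strong simplifying assumptions: plain Euclidean vectors and a straight-line path between a noise sample and a data sample. It is not FoldFlow's SE(3) formulation or code, and all function names are hypothetical, chosen only for illustration. The idea is that a model is trained to predict the velocity of the path at a random time, and sampling then integrates that velocity field from noise to data.

```python
# Toy conditional flow matching in Euclidean space (illustrative only;
# NOT the SE(3) formulation used for protein backbones).

def interpolate(x0, x1, t):
    """Point at time t on the straight path from noise x0 to data x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Velocity of the straight path; it does not depend on t."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_loss(predicted, target):
    """Mean squared error between a predicted and the target velocity."""
    return sum((p - u) ** 2 for p, u in zip(predicted, target)) / len(target)

def euler_sample(velocity_fn, x0, steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# One training example: the model would see (x_t, t) and regress onto u.
noise, data = [0.0, 0.0], [2.0, 4.0]
x_t = interpolate(noise, data, 0.5)
u = target_velocity(noise, data)
print("training input:", x_t, "target velocity:", u)
```

In a real setting the velocity would come from a neural network trained over many noise and data pairs; here the exact velocity stands in for a perfectly trained model, so integrating it from the noise sample recovers the data sample.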

Structure-based epitope-specific antibody design

Another talk I would like to highlight is that of Dina Schneidman from the Hebrew University, who talked about her group’s progress in epitope-specific antibody design using language model embeddings (NeurIPS paper). Aside from the accuracy of the model and the results shown, I was most intrigued by the use of CLIP during training. CLIP (Contrastive Language-Image Pre-training) is a contrastive training approach that lets you guide a model with additional embeddings derived from ground truth structures or sequences. The advantage is that these extra embeddings are needed only during training, not at inference: the model becomes more accurate thanks to the additional guidance, yet it can still be applied to novel targets where such embeddings are not available.
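To make the CLIP idea a bit more concrete, below is a minimal sketch of a generic CLIP-style symmetric contrastive loss between two sets of paired embeddings. It is not the method from the talk or the paper; the function names and the temperature value are assumptions made for illustration. Each row of `a_embs` is assumed to be paired with the same row of `b_embs`, and the loss is low when matching pairs are more similar than mismatched ones.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity_matrix(a_embs, b_embs, temperature=0.1):
    """Cosine similarity of every a/b pair, divided by a temperature."""
    a = [normalize(v) for v in a_embs]
    b = [normalize(v) for v in b_embs]
    return [[sum(x * y for x, y in zip(u, w)) / temperature for w in b] for u in a]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(z - m) for z in row))
        total += log_sum - row[i]
    return total / len(logits)

def clip_style_loss(a_embs, b_embs, temperature=0.1):
    """Symmetric contrastive loss: average of the a-to-b and b-to-a terms."""
    logits = similarity_matrix(a_embs, b_embs, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))

# Toy embeddings: aligned pairs should score better than swapped pairs.
model_embs = [[1.0, 0.0], [0.0, 1.0]]
guide_embs = [[1.0, 0.0], [0.0, 1.0]]
print("aligned:", clip_style_loss(model_embs, guide_embs))
print("swapped:", clip_style_loss(model_embs, guide_embs[::-1]))
```

Because the guiding embeddings enter only through the loss, nothing in this sketch would be required at inference time, which matches the advantage described above.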

Life2Vec

While biomolecules were the main focus of the presentations at GenLife, Germans Savcisens from DTU (now at Northeastern University) showed how the Danish national registry dataset, which includes information on health, education, occupation and much more, can be used to create a model that generates an embedding vector of life events. They have used these embeddings to predict a variety of outcomes, including the risk of early mortality and the results of personality tests. Not the usual topic I have seen at an AI conference, but a really exciting way of using decades of information collected from citizens. All details here.

Author

Matteo Cagiada
