Category Archives: Proteins

Protein Property Prediction Using Graph Neural Networks

Proteins are fundamental biological molecules whose structure and interactions underpin a wide array of biological functions. To better understand and predict protein properties, scientists leverage graph neural networks (GNNs), which are particularly well-suited for modeling the complex relationships between protein structure and sequence. This post will explore how GNNs provide a natural representation of proteins, the incorporation of protein language models (PLLMs) like ESM, and the use of techniques like residual layers to improve training efficiency.

Why Graph Neural Networks are Ideal for Representing Proteins

Graph Neural Networks (GNNs) have emerged as a promising framework to fuse primary and secondary structure representation of proteins. GNNs are uniquely suited to represent proteins by modeling atoms or residues as nodes and their spatial connections as edges. Moreover, GNNs operate hierarchically, propagating information through the graph in multiple layers and learning representations of the protein at different levels of granularity. In the context of protein property prediction, this hierarchical learning can reveal important structural motifs, local interactions, and global patterns that contribute to biochemical properties.

Continue reading

Walk through a cell

In 2022, Maritan et al. released the first ever macromolecular model of an entire cell. The cell in question is a bacterial cell from the genus Mycoplasma. If you’re a biologist, you likely know Mycoplasma as a common cell culture contaminant.

Now, through the work of app developer Timothy Davison, you can interactively explore this cell model from the comfort of your iPhone or Apple Vision Pro. Here are three reasons why I like CellWalk:

1. It’s pretty

The visuals of CellWalk are striking. The app offers a rich depiction of the cell, allowing the user to zoom from the whole cell to individual atoms. I spent a while clicking through each protein I could see to see if I could guess what it was or what it did. Zooming out, CellWalk offers a beautiful tripartite cross section of the cell, showing first the lipid membrane, then a colourful jumble-bag of all its cellular proteins, and then finally the spaghetti-like polynucleic acids.

Tripartite cross section of a Mycoplasma cell. Screengrab taken from the CellWalk app on my phone.
Continue reading

Deliberately misfolding prions to find the golden thread.

Prion are both fascinating and terrifying. They occur naturally and have a purpose, but what that purpose is we’re still not entirely sure. Gene-knockout mice which no longer code for the prion protein do live, but they ain’t born typical.

The endogenous form of the prion protein (PrPC) can, through currently unknown mechanisms, take a different conformation, the pathogenic PrPSc. PrPSc is responsible for fatal, rapidly progressing neurodegenerative disorders which in many cases can jump species.

At OPIG, we recently discussed a remarkably rigorous series of experiments outlined in the paper “A Protein Misfolding Shaking Amplification-based method for the spontaneous generation of hundreds of bona fide prions” Whilst deliberately creating new pathogenic prions may seem and odd thing to wish to achieve, the authors aimed to determine if there was a golden thread linking “infectivity determinants, interspecies transmission barriers or the structural influence of specific amino acids”.

Continue reading

Conference summary: Generative AI in Life Science

This year I attended the second edition of Generative AI in Life Science (GenLife – https://genlife.dk/) and it was an enriching experience that I thoroughly enjoyed. Held in Copenhagen, the event brought together researchers from different areas of AI applied to the life sciences and provided a fantastic platform for networking, learning and sharing ideas. The programme included a mix of long and short talks from experts in the field, but also had a significant presence of emerging PIs, making the conference a perfect place to discover emerging groups in the field. Here I have collected some highlights of the talks I have enjoyed the most at the conference.

Continue reading

Inverse Vaccines

One of the nice things about OPIG, is that you can talk about something which is outside of your wheelhouse without feeling that the specialists in the group are going to eat your lunch. Last week, I gave an overview of the Hubbell group‘s Nature paper Synthetically glycosylated antigens for the antigen-specific suppression of established immune responses. I am not an immunologist by any stretch of the imagination, but sometimes you come across a piece of really interesting science and just want to say to people: Have you seen this, look at this, it’s really clever!

Continue reading

The stuff MDAnalysis didn’t implement: CPU Parallel HOLE conductance analysis

Some time ago, I needed to find a way to computationally estimate conductance values for every protein frame from several molecular dynamics (MD) trajectories.

In a previous post, I wrote about how to clean the resulting instant conductance timeseries from outliers. But, I never described how I generated these timeseries.

In this post, I will show how you can parallelise the computation of instant conductance given an MD trajectory. I will touch on the difficulties of this process. And why I had to implement a custom tool for it given that MDAnalysis seems to already have implemented a routine of this sort. Finally, I will provide two Python scripts that you can easily adapt to run your parallel calculations – for which I’ll provide some important notes you don’t wanna skip.

Violin plots of conductance distributions from 64 molecular dynamic trajectories with 1000 frames each.
Continue reading

What can you do with the OPIG Immunoinformatics Suite? v3.0

OPIG’s growing immunoinformatics team continues to develop and openly distribute a wide variety of databases and software packages for antibody/nanobody/T-cell receptor analysis. Below is a summary of all the latest updates (follows on from v1.0 and v2.0).

Continue reading

9th Joint Sheffield Conference on Cheminformatics

Over the next few days, researchers from around the world will be gathering in Sheffield for the 9th Joint Sheffield Conference on Cheminformatics. As one of the organizers (wearing my Molecular Graphics and Modeling Society ‘hat’), I can say we have an exciting array of speakers and sessions:

  • De Novo Design
  • Open Science
  • Chemical Space
  • Physics-based Modelling
  • Machine Learning
  • Property Prediction
  • Virtual Screening
  • Case Studies
  • Molecular Representations

It has traditionally taken place every three years, but despite the global pandemic it is returning this year, once again in person in the excellent conference facilities at The Edge. You can download the full programme in iCal format, and here is the conference calendar:

Continue reading

Can AlphaFold predict protein-protein interfaces?

Since its release, AlphaFold has been the buzz of the computational biology community. It seems that every group in the protein science field is trying to apply the model in their respective areas of research. Already we are seeing numerous papers attempting to adapt the model to specific niche domains across a broad range of life sciences. In this blog post I summarise a recent paper’s use of the technology for predicting protein-protein interfaces.

Continue reading