Since its release, AlphaFold has been the buzz of the computational biology community. It seems that every group in the protein science field is trying to apply the model in their respective areas of research. Already we are seeing numerous papers attempting to adapt the model to specific niche domains across a broad range of life sciences. In this blog post I summarise a recent paper’s use of the technology for predicting protein-protein interfaces.
Category Archives: Networks
Universal graph pooling for GNNs
Graph neural networks (GNNs) have quickly become one of the most important tools in computational chemistry and molecular machine learning. GNNs are a type of deep learning architecture designed for the adaptive extraction of vectorial features directly from graph-shaped input data, such as low-level molecular graphs. The feature-extraction mechanism of most modern GNNs can be decomposed into two phases:
- Message-passing: In this phase, the node feature vectors of the graph are iteratively updated following a trainable local neighbourhood-aggregation scheme, often referred to as message-passing. Each iteration delivers a set of updated node feature vectors, which can be thought of as forming a new “layer” on top of all the previous sets of node feature vectors.
- Global graph pooling: After a sufficient number of layers has been computed, the updated node feature vectors are used to generate a single vectorial representation of the entire graph. This step is known as global graph readout or global graph pooling. Usually only the top layer (i.e. the final set of updated node feature vectors) is used for global graph pooling, but variations of this are possible that involve all computed graph layers and even the set of initial node feature vectors. Commonly employed global graph pooling strategies include taking the sum or the average of the node features in the top graph layer.
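The two phases above can be sketched in a few lines of NumPy. This is a minimal, hedged illustration (the function names, the tanh nonlinearity, and the toy graph are my own choices, not a specific GNN from the literature): each message-passing step sums neighbour features via the adjacency matrix, and the readout is a naive sum pool over the top layer.

```python
import numpy as np

def message_passing_layer(H, A, W):
    """One message-passing iteration: each node aggregates (sums) its
    neighbours' feature vectors via the adjacency matrix, mixes them with
    a trainable weight matrix, and applies a nonlinearity.
    H: (n_nodes, d) node features, A: (n_nodes, n_nodes) adjacency with
    self-loops, W: (d, d) weights."""
    return np.tanh(A @ H @ W)

def sum_pool(H):
    """Naive global graph pooling: sum the node features of the top layer
    into a single vectorial representation of the whole graph."""
    return H.sum(axis=0)

# Toy molecular graph: three atoms in a chain, 2-dimensional node features.
A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]], dtype=float)   # adjacency plus self-loops
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 2))

for _ in range(2):           # two message-passing "layers"
    H = message_passing_layer(H, A, W)

graph_vector = sum_pool(H)   # single vector representing the entire graph
print(graph_vector.shape)    # (2,)
```

Swapping `sum` for `mean` in the pooling function gives the other commonly used naive readout; both discard which node contributed what, which is exactly the bottleneck discussed below.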
While a lot of research attention has been focused on designing novel and more powerful message-passing schemes for GNNs, the global graph pooling step has often been treated with relative neglect. As mentioned in my previous post on the issues of GNNs, I believe this to be problematic. Naive global pooling methods (such as simply summing up all final node feature vectors) can potentially form dangerous information bottlenecks within the neural graph learning pipeline. In the worst case, such information bottlenecks pose the risk of largely cancelling out the information signal delivered by the message-passing step, no matter how sophisticated the message-passing scheme.
Issues with graph neural networks: the cracks are where the light shines through
Deep convolutional neural networks have led to astonishing breakthroughs in the area of computer vision in recent years. The reason for the extraordinary performance of convolutional architectures in the image domain is their strong ability to extract informative high-level features from visual data. For prediction tasks on images, this has led to superhuman performance in a variety of applications and to an almost universal shift from classical feature engineering to differentiable feature learning.
Unfortunately, the picture is not quite as rosy yet in the area of molecular machine learning. Feature learning techniques which operate directly on raw molecular graphs without intermediate feature-engineering steps have only emerged in the last few years in the form of graph neural networks (GNNs). GNNs, however, still have not managed to definitively outcompete and replace more classical non-differentiable molecular representation methods such as extended-connectivity fingerprints (ECFPs). There is an increasing awareness in the computational chemistry community that GNNs have not quite lived up to the initial hype and still suffer from a number of technical limitations.
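To make the contrast concrete, here is a stripped-down, ECFP-style fingerprint in pure Python. This is an illustrative sketch of the general idea only, not RDKit's actual Morgan/ECFP implementation: each atom starts from an integer invariant, identifiers are iteratively re-hashed together with sorted neighbour identifiers, and everything seen at any radius is folded into a fixed-length bit vector. All names and the toy graph are hypothetical.

```python
def ecfp_like_fingerprint(adjacency, atom_invariants, radius=2, n_bits=1024):
    """Toy ECFP-style fingerprint: iteratively hash each node's identifier
    together with its neighbours' identifiers, then fold every identifier
    seen at any radius into a fixed-length bit vector.
    adjacency: dict node -> list of neighbouring nodes
    atom_invariants: dict node -> initial integer identifier"""
    ids = dict(atom_invariants)
    seen = set(ids.values())
    for _ in range(radius):
        new_ids = {}
        for node, neighbours in adjacency.items():
            # Hash the local environment: own id plus sorted neighbour ids.
            environment = (ids[node],) + tuple(sorted(ids[n] for n in neighbours))
            new_ids[node] = hash(environment)
        ids = new_ids
        seen.update(ids.values())
    bits = [0] * n_bits
    for identifier in seen:
        bits[identifier % n_bits] = 1   # fold identifiers onto bit positions
    return bits

# Ethanol-like toy graph: C-C-O chain, atomic numbers as initial invariants.
adjacency = {0: [1], 1: [0, 2], 2: [1]}
invariants = {0: 6, 1: 6, 2: 8}
fp = ecfp_like_fingerprint(adjacency, invariants)
print(len(fp), sum(fp))
```

Unlike the learned representations of a GNN, everything here is fixed and non-differentiable, yet such fingerprints remain a remarkably strong baseline in molecular property prediction.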
Prediction of Parkinson subtypes at COXIC 2020
Last week I attended the COXIC seminar (a joint Oxford-Imperial seminar focused on networks and complex systems) organised by Florian Klimm from Imperial College London (and former OPIG member!). We had several interesting talks at the seminar. However, one of them caught my eye more than the rest: the talk by Dr Sanjukta Krishnagopal (UCL), titled Predicting Parkinson’s Sub-types through Trajectory Clustering in Bipartite Networks, of which I will give a quick overview. Hope you like it (at least) as much as I did!
This blogpost is based on these two articles:
- Sanjukta Krishnagopal, Rainer von Coelln, Lisa Shulman, Michelle Girvan. “Identifying and predicting Parkinson’s disease subtypes through trajectory clustering via bipartite networks”. PLoS ONE (2020)
- Sanjukta Krishnagopal. “Multi-layer Trajectory Clustering Network Algorithm for Disease Subtyping”. Biomedical Physics & Engineering Express (2020)
Robust gene coexpression networks using signed distance correlation
Even within well-studied organisms, many genes lack useful functional annotations. One way to generate such functional information is to infer biological relationships between genes/proteins, using a network of gene coexpression data that includes functional annotations. However, the lack of trustworthy functional annotations can impede the validation of such networks. Hence, there is a need for a principled method to construct gene coexpression networks that capture biological information and are structurally stable even in the absence of functional information.
In my latest paper, we introduce the concept of signed distance correlation as a measure of dependency between two variables and apply it to generate gene coexpression networks. Distance correlation offers a more intuitive approach to network construction than commonly used methods such as Pearson correlation. We propose a framework to generate self-consistent networks using signed distance correlation purely from gene expression data, with no additional information. We analyse data from three different organisms to illustrate how networks generated with our method are more stable and capture more biological information compared to networks obtained from Pearson or Spearman correlations.
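A minimal NumPy sketch of the core quantity, under my own simplifying assumptions (univariate samples, sign taken from the Pearson correlation; the paper's full framework for building and pruning the network is not reproduced here): distance correlation is computed from doubly-centred pairwise distance matrices, and the sign distinguishes positively from negatively coexpressed genes.

```python
import numpy as np

def distance_correlation(x, y):
    """Distance correlation between two 1-D samples: doubly centre the
    pairwise distance matrices, then normalise the distance covariance
    by the distance variances (Székely et al.'s estimator)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    a = np.abs(x[:, None] - x[None, :])          # pairwise distances in x
    b = np.abs(y[:, None] - y[None, :])          # pairwise distances in y
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()   # double centring
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean()
    return np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y))

def signed_distance_correlation(x, y):
    """Attach the sign of the Pearson correlation, so anti-correlated
    expression profiles receive negative scores."""
    sign = np.sign(np.corrcoef(x, y)[0, 1])
    return sign * distance_correlation(x, y)

# Perfectly linear relationships give |signed dCor| = 1.
x = np.linspace(0, 1, 50)
print(signed_distance_correlation(x, -2 * x + 1))   # ≈ -1.0
print(signed_distance_correlation(x, 3 * x))        # ≈ 1.0
```

Plain distance correlation is always non-negative, which is why the sign has to be reattached before thresholding a coexpression network into positive and negative edges.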
EEGor on Proteins: A Brain-based Perspective on Crowd-sourced Protein Structure Prediction
EEG-based Brain-Computer Interfaces (BCIs) are becoming increasingly popular, with products such as the Muse Headband and g.tec’s Unicorn Hybrid Black taking off, while in the protein-folding space, Foldit and distributed/crowd-computing efforts like Folding@home don’t seem to be talked about as much as they once were.
Game-ification is still just as effective a tool to harness human ingenuity as it once was, so perhaps what is needed is a new approach to crowd-folding efforts that can tap into the full potential of the human mind to manipulate and visualise new 3D structures, by drawing inspiration directly from the minds of users…
Science in the Time of COVID-19
If you are reading this blog, I am sure you will agree that science and research are essential, even more so in the context of a pandemic. Concepts such as PCR, antibody, and herd immunity are slowly entering people’s vocabulary. This makes me quite happy, but it also highlights the lack of scientific knowledge among the general population.
Robust networks to study omics data
One of the challenges that biology-related sciences are facing is the exponential increase of data. Nowadays, thanks to all the sequencing techniques available, we are generating more data than we can study. We all love genomic, epigenomic, transcriptomic, proteomic, … , glycomic, lipidomic, and metagenomic studies because of how rich they are. However, most of the time, the analysis of the results uses only a fraction of all the generated data. For example, it is quite common to study the transcriptome of an organism in different environments and then just focus on identifying which 2 or 3 genes are upregulated. This type of analysis does not exploit the data to its full extent, and this is where network analysis makes its appearance!
When OPIGlets leave the office
Hi everyone,
My blogpost this time around is a list of conferences popular with OPIGlets. You are highly likely to see at least one of us attending or presenting at these meetings! I’ve tried to make it as exhaustive as possible (with thanks to Fergus Imrie!), listing conferences in upcoming chronological order.
(Most descriptions are slightly modified snippets taken from the official websites.)
Just Call Me EEGor
Recently, I was lucky enough to assist in (who am I kidding…obstruct) a sleep and anaesthesia study aimed at monitoring participants by Electroencephalogram (EEG) in various states of consciousness. The study, run by Dr Katie Warnaby of The Anaesthesia Neuroimaging Research Group at The Nuffield Department of Clinical Neuroscience, makes use of both EEG and functional Magnetic Resonance Imaging (fMRI). The research aim is to learn about the effects anaesthesia has on the brain and, in so doing, help us both understand ourselves and understand how to most effectively monitor patients undergoing surgery.