Data can often be represented naturally as a graph, and being able to apply a deep learning architecture directly to that data, without first converting it into a different representation, is an appealing idea. Graph neural networks (GNNs) have become a standard part of the ML toolbox, but navigating the world of architectures available out-of-the-box can be a daunting task. A great place to start looking is PyTorch Geometric, which provides an extensive list of readily available GNN layers and tutorials on how to use them in your standard PyTorch models. There are many things to consider when choosing a GNN layer, but the two considerations that I think are a great place to start are expressiveness and edge feature handling. In general, it is hard to predict what will work best for the task at hand, so it is best to try a wide range of different layers. This blog post is meant as a brief introduction to what I would have found useful to know before I started using GNNs, and as a starting point for exploring the GNN literature.
Expressiveness
Broadly, GNN layers fit into three categories based on the way propagation is performed: convolutional, attentional and message-passing. These categories describe how the feature vector of a node is updated in a GNN layer based on the feature vectors of its neighbouring nodes. The expressive power of the layers increases from convolutional to attentional to message-passing; in fact, convolutional layers are special cases of attentional layers, and attentional layers are special cases of message-passing layers. In convolutional layers, the importance of a neighbouring node is fixed and precalculated, while in attentional layers the importance is feature-dependent and learned. I recommend reading Petar Veličković's paper Everything is Connected: Graph Neural Networks, as well as the chapter on GNNs in Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges by Bronstein et al., for a concise summary of the different flavours of GNNs with mathematical rigour.
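To make the three flavours concrete, here is a minimal sketch of one PyTorch Geometric layer from each category; the channel sizes and the small MessagePassing subclass are illustrative, not taken from any particular paper:

```python
import torch
from torch import nn
from torch_geometric.nn import GCNConv, GATConv, MessagePassing

# Convolutional flavour: neighbour importances are fixed by the graph structure.
conv_layer = GCNConv(in_channels=16, out_channels=32)

# Attentional flavour: neighbour importances are learned, feature-dependent scalars.
attn_layer = GATConv(in_channels=16, out_channels=32, heads=4)

# Message-passing flavour: each message is an arbitrary learned function of
# both endpoints' feature vectors (a minimal MessagePassing subclass).
class SimpleMPLayer(MessagePassing):
    def __init__(self, in_channels, out_channels):
        super().__init__(aggr="sum")
        self.msg_mlp = nn.Sequential(
            nn.Linear(2 * in_channels, out_channels),
            nn.ReLU(),
        )

    def forward(self, x, edge_index):
        return self.propagate(edge_index, x=x)

    def message(self, x_i, x_j):
        # x_i are the receiving nodes' features, x_j the senders' features.
        return self.msg_mlp(torch.cat([x_i, x_j], dim=-1))
```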
The fact that message-passing layers are more expressive does not necessarily mean that they are always the right choice for you. They can, for example, require a lot of memory, be harder to train and be less interpretable. Attentional GNN layers are my layers of choice when working with molecular data, because they are more scalable than message-passing layers and I have found that they outperform standard convolutional layers. It is important to emphasise that these three categories do not capture all the variation among the GNN layers available in PyTorch Geometric, so trying a few different layers from each category is preferable.
Edge feature handling
Where do edge feature vectors fit into the picture? When working with molecular data, edges can represent covalent or non-covalent bonds, and describing those interactions can be crucial to the task at hand, so the selection of GNN layers should reflect that. The GNN cheatsheet lists the GNN layers available in PyTorch Geometric and indicates whether each layer supports edge feature vectors. It is also worth noting that edge features are treated differently from node features in PyTorch Geometric: they are only used to update node features and remain static across GNN layers, i.e. they are not updated themselves. Beyond whether edge features are supported at all, it is important to look into how they are incorporated into the node feature update. Take, for example, two attentional GNN layers: GAT, developed by Veličković et al. (2017), and TransformerConv, developed by Shi et al. (2020), whose propagation rules are sketched below. In GAT, the edge features influence the node feature update only through the attention coefficients, whereas in TransformerConv the edge features also enter the node feature update directly.
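Sketched roughly from the PyTorch Geometric documentation (single attention head, with normalisation details and biases omitted; the weight matrices and the attention vector are the layer's learned parameters), the GAT update is

$$\mathbf{x}_i' = \sum_{j \in \mathcal{N}(i) \cup \{i\}} \alpha_{i,j}\,\Theta\,\mathbf{x}_j,\qquad \alpha_{i,j} = \operatorname{softmax}_j\Big(\operatorname{LeakyReLU}\big(\mathbf{a}^\top[\Theta\,\mathbf{x}_i \,\|\, \Theta\,\mathbf{x}_j \,\|\, \Theta_e\,\mathbf{e}_{i,j}]\big)\Big),$$

while the TransformerConv update is

$$\mathbf{x}_i' = W_1\,\mathbf{x}_i + \sum_{j \in \mathcal{N}(i)} \alpha_{i,j}\big(W_2\,\mathbf{x}_j + W_6\,\mathbf{e}_{i,j}\big),\qquad \alpha_{i,j} = \operatorname{softmax}_j\!\left(\frac{(W_3\,\mathbf{x}_i)^\top(W_4\,\mathbf{x}_j + W_6\,\mathbf{e}_{i,j})}{\sqrt{d}}\right),$$

so the edge feature vector $\mathbf{e}_{i,j}$ reaches GAT's node update only through $\alpha_{i,j}$, but also appears inside TransformerConv's aggregated messages. In code, both layers accept an edge_attr argument once edge_dim is set; a minimal usage sketch with toy tensors (sizes chosen arbitrarily):

```python
import torch
from torch_geometric.nn import GATConv, TransformerConv

x = torch.randn(4, 16)                     # 4 nodes with 16 features each
edge_index = torch.tensor([[0, 1, 2, 3],   # source nodes
                           [1, 0, 3, 2]])  # target nodes
edge_attr = torch.randn(4, 8)              # 8 features per edge

gat = GATConv(in_channels=16, out_channels=32, heads=1, edge_dim=8)
trf = TransformerConv(in_channels=16, out_channels=32, heads=1, edge_dim=8)

out_gat = gat(x, edge_index, edge_attr=edge_attr)  # edge features enter only the attention scores
out_trf = trf(x, edge_index, edge_attr=edge_attr)  # edge features also enter the messages themselves
```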