Recently, I was able to attend Martin Grohe’s talk on The Logic of Graph Neural Networks. Professor Grohe of RWTH Aachen University, is a titan of the fields of Logic and Complexity theory. Even so, he is modest about his achievements, and I was tickled when it was pointed out to me that the theorem he refers to as “a little complex”, one of his crowning achievements, involves a four-hundred page long book of a proof.
The theorem relates to the Weisfeiler-Lehmann (WL) algorithm, an algorithm for determining whether two graphs are equivalent (i.e. isomorphic). The algorithm has deep connections with combinatorics, complexity theory and first order logic. A system of logic that is remarkably similar to the relations present in ontologies such as the Gene Ontology (GO), which is commonly used to compare and predict protein function. Kernelised methods and other WL-based metrics present a new and possibly logically “complete” way to potentially compare the functions of proteins and infer their similarity.
But what is the WL algorithm? As stated above the WL algorithm is a graph equivalence test. Two graphs G and H are said to be equivalent if they have the same number of connected components and their edge connectivity is retained. The WL is an incomplete test of the equivalence of G and H that works by iterative recolouring according to the colour of their neighbours, starting with a colour (or number) corresponding to the number of neighbours. Below is a flow diagram depicting two iterations of the WL algorithm performed on the nodes of the graph (i.e. WL-1)…
The WL algorithm is incomplete in the general case in that different stable colouring histograms necessarily imply that G and H are not equivalent, however if the histograms are the same this does not imply that G and H are in fact equivalent. However, for specific graph classes Grohe and others have shown that, higher order abstractions (involving the edges and edge-tuples of G and H) are a complete test of equivalence. As as directed acyclic graphs (DAGs), ontologies represent a very particular class of graphs that may be just the sort that are easily accessible to lower order WL-type algorithms. Successful WL-based and graphical methods already exist in the chemoinformatics space for testing the differences between complexes. Now be the right time to start applying the wealth of WL-based algorithms (many of which attempt to circumvent some of its computational shortcomings) that are coming out right now to large scale protein networks!