Monthly Archives: March 2025

The Sprawl: Slogs in Scribing and Software

“Dead shopping malls rise like mountains beyond mountains. And there’s no end in sight.”

Régine Chassagne

Sometimes I wonder whether my PhD would have been simpler if I had broken up the findings into three smaller papers. In the end there were 7 main figures, 7 supplementary figures, 5 supplementary tables and one supplementary data section in one solitary publication: the contents of a 3-year, 3-month tour through the helper T cell response to the inner proteins of the flu virus. The experimental work comprised crystal structures, cell assays, tetramer staining and TCR sequencing. During the following years, as it was batted back and forth between last authors, different journals and reviewers, I continually reworked the figures and added extra bioinformatic analyses. I was fortunate that others in the lab kindly performed some in vivo experiments which helped cement the findings. It all started in January 2014, but the paper wasn’t published until July 2020. There are many terms which could be used to describe how the process of writing and re-writing felt as it dragged on through my 3-year postdoc; for the purposes of this very public blog, I will refer to it as “a slog.”


Combining Multiple Comparisons Similarity plots for statistical tests

Following on from my previous blopig post, Garrett gave the very helpful suggestion of combining Multiple Comparisons Similarity (MCSim) plots to reduce information redundancy. For example, this is an MCSim plot from my previous blog post:

This plot shows effect sizes from a statistical test (specifically Tukey HSD) between mean absolute error (MAE) scores for different molecular featurization methods on a benchmark dataset. Red shows that the method on the y-axis has a greater average MAE score than the method on the x-axis; blue shows the inverse. There is redundancy in this plot, as the same information is displayed in both the upper and lower triangles. Instead, we could plot both the effect size and the p-values from a test on the same MCSim.
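As a rough illustration of that idea, here is a minimal sketch that builds one matrix with effect sizes in the upper triangle and p-values in the lower triangle. The function name `combine_triangles`, the choice of which triangle holds which quantity, and the numbers in the example are all assumptions made for illustration; they are not results from the post or its benchmark.

```python
import numpy as np


def combine_triangles(effect, pvalues):
    """Put effect sizes above the diagonal and p-values below it.

    Hypothetical helper: `effect` and `pvalues` are square matrices of
    pairwise results (method i vs method j). The diagonal is set to NaN
    because a method is never compared with itself.
    """
    effect = np.asarray(effect, dtype=float)
    pvalues = np.asarray(pvalues, dtype=float)
    if effect.ndim != 2 or effect.shape[0] != effect.shape[1]:
        raise ValueError("effect must be a square matrix")
    if effect.shape != pvalues.shape:
        raise ValueError("effect and pvalues must have the same shape")

    n = effect.shape[0]
    combined = np.full((n, n), np.nan)
    upper = np.triu_indices(n, k=1)   # strictly above the diagonal
    lower = np.tril_indices(n, k=-1)  # strictly below the diagonal
    combined[upper] = effect[upper]
    combined[lower] = pvalues[lower]
    return combined


# Illustrative values only: three made-up methods, not real benchmark output.
effect = np.array([
    [0.00, 0.12, -0.05],
    [-0.12, 0.00, -0.17],
    [0.05, 0.17, 0.00],
])
pvalues = np.array([
    [1.00, 0.03, 0.40],
    [0.03, 1.00, 0.002],
    [0.40, 0.002, 1.00],
])
combined = combine_triangles(effect, pvalues)
print(combined)
```

When drawing this, each triangle would probably need its own colour scale, for example a diverging one for the effect sizes and a sequential one for the p-values, since the two quantities live on different ranges.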


Geometric Deep Learning meets Forces & Equilibrium

Introduction

Graphs provide a powerful mathematical framework for modelling complex systems, from molecular structures to social networks. In many physical and geometric problems, nodes represent particles, and edges encode interactions, often acting like springs. This perspective aligns naturally with Geometric Deep Learning, where learning algorithms leverage graph structures to capture spatial and relational patterns.

Understanding energy functions and the forces derived from them is fundamental to modelling such systems. In physics and computational chemistry, harmonic potentials, which penalise deviations from equilibrium positions, are widely used to describe elastic networks, protein structures, and even diffusion processes. The Laplacian matrix plays a key role in these formulations, linking energy minimisation to force computations in a clean and computationally efficient way.
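To make the link between the Laplacian, the energy and the forces concrete, here is a small sketch under simplifying assumptions that are mine rather than the post's: every edge is a spring with zero rest length and a shared stiffness `k`, so the energy is E = (k/2) tr(XᵀLX) and the force is F = -kLX, where X stacks the node coordinates row by row. The function names and the three-node chain are hypothetical.

```python
import numpy as np


def graph_laplacian(n_nodes, edges):
    """Unweighted graph Laplacian L = D - A for an undirected edge list."""
    L = np.zeros((n_nodes, n_nodes))
    for i, j in edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0
    return L


def harmonic_energy(X, L, k=1.0):
    """E = (k/2) * tr(X^T L X) = (k/2) * sum over edges of |x_i - x_j|^2."""
    return 0.5 * k * np.trace(X.T @ L @ X)


def harmonic_forces(X, L, k=1.0):
    """F = -dE/dX = -k * L X, one force vector per node (row)."""
    return -k * (L @ X)


# Hypothetical example: a three-node chain 0 - 1 - 2 in the plane.
edges = [(0, 1), (1, 2)]
X = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
    [1.0, 2.0],
])
L = graph_laplacian(3, edges)
energy = harmonic_energy(X, L)
forces = harmonic_forces(X, L)
print(energy)
print(forces)
```

Here the two springs have squared lengths 1 and 4, so the energy with k = 1 is 2.5, and the forces sum to zero because the springs only act between nodes. Springs with non-zero equilibrium lengths would need an extra term and are not covered by this sketch.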

By formalising these interactions using matrix notation, we gain not only a compact representation but also a foundation for more advanced techniques such as Langevin dynamics, normal mode analysis, and graph-based neural networks for physical simulations.
