IWD 2021 and the Gender Pay Gap

Throughout the pandemic, the statistics on division of childcare and home-schooling responsibilities have been shocking: mothers are taking on 150% more homeschooling than fathers (1), while 71% of working mothers’ furlough applications were rejected (2).  A third of working mothers reported having lost  some or all work due to a lack of childcare during the pandemic, with this figure rising to 44% for  BAME mothers. On top of this, 90% of the UK’s 2 million single parents are women (3). These unequal divisions are threatening to undo decades of progress towards gender equality.

In April 2019, the pay gap between men and women in the UK was 17.3% (4), and at the current rate of gender pay gap reduction, the gap will not be closed until 2052 (5).  The causes of this gap continue to be unequal caring responsibilities,  more women in low-paid work and (illegal) discrimination. BAME women are also subject to the ethnicity pay gap.  While this varies regionally and by ethnicity, in London in 2018 the overall figure was 23% (6).

Continue reading

Better understanding of correlation

Although correlation is often used as the linear relationship between two sets of points, I will in the following text use it more broadly to mean any relationship between two sets of points.

You have tasked yourself with finding the correlation between the different features in your dataset. Your purpose could be to remove highly correlated features or just improve your understanding of your data. Nonetheless, calculating and using the Pearson Correlation Coefficient (PCC) or the Spearman’s rank Correlation Coefficient (SCC) to get an overview of the correlations might be the first thing that comes to your mind.

Unfortunately, both of these are limited to linear (PCC) or monotonic (SCC) relationships. In datasets with many and complex features, many of them will be highly correlated, just not linearly (or monotonic). Instead these correlations can be non-linear which, as seen in the third row in the below figure, does not get detected with PCC.

Figure: PCC of different sets of x and y points. https://en.wikipedia.org/wiki/Correlation_and_dependence
Continue reading

ORDER!: Returning bond order information to your docked poses

John Bercow Order Remix - YouTube

Common docking software, such as AutoDock Vina or AutoDock 4, require the ligand and receptor files to be converted into the PDBQT format. Once a correct pose has been identified, the pose will be produced also as a .pdbqt file.

Continue reading

Commercialising your research: Where to start?

If you look at some of the biggest technology companies in the world, from Google and Facebook to hardware companies like Dell or even biotech unicorns like Oxford’s own Oxford Nanopore, all of them started on university campuses. If you are a researcher interested in finding out how to make the first steps to commercialise your research here is a quick guide:

Continue reading

Peer Review: reviewing as an early career researcher

Peer review is an important component of academic research and publishing, but it can feel like an opaque process, especially for those not directly involved. I am very fortunate to have been able to participate in the peer review of multiple papers, despite being very early in my career, through support from my supervisors and a mentoring program run by Sense about Science with Nature Communications. Here are some of the things I have learned.

Continue reading

The Coronavirus Antibody Database: 10 months on, 10x the data!

Back in May 2020, we released the Coronavirus Antibody Database (‘CoV-AbDab’) to capture molecular information on existing coronavirus-binding antibodies, and to track what we anticipated would be a boon of data on antibodies able to bind SARS-CoV-2. At the time, we had found around 300 relevant antibody sequences and a handful of solved crystal structures, most of which were characterised shortly after the SARS-CoV epidemic of 2003. We had no idea just how many SARS-CoV-2 binding antibody sequences would come to be released into the public domain…

10 months later (2nd March 2021), we now have tracked 2,673 coronavirus-binding antibodies, ~95% with full Fv sequence information and ~5% with solved structures. These datapoints originate from 100s of independent studies reported in either the academic literature or patent filings.

The entire contents CoV-AbDab database as of 2nd March 2021.
Continue reading

On The Logic of GOing with Weisfeiler-Lehman

Recently, I was able to attend Martin Grohe’s talk on The Logic of Graph Neural Networks. Professor Grohe of RWTH Aachen University, is a titan of the fields of Logic and Complexity theory. Even so, he is modest about his achievements, and I was tickled when it was pointed out to me that the theorem he refers to as “a little complex”, one of his crowning achievements, involves a four-hundred page long book of a proof.

The theorem relates to the Weisfeiler-Lehmann (WL) algorithm, an algorithm for determining whether two graphs are equivalent (i.e. isomorphic). The algorithm has deep connections with combinatorics, complexity theory and first order logic. A system of logic that is remarkably similar to the relations present in ontologies such as the Gene Ontology (GO), which is commonly used to compare and predict protein function. Kernelised methods and other WL-based metrics present a new and possibly logically “complete” way to potentially compare the functions of proteins and infer their similarity.

The Gene Ontology follows a simple set of rules, very similar to first order logic. From the GO Database Description
Continue reading

Ten Simple Rules For Solving Any Problem

Welcome! Take three deep breaths, each time expel the air through your nose with force. Now you are ready for this adventure. Let us dive right in and reflect on the premise of this blog post. 

I, personally, dislike the word “solve”. What does it even mean? And that is already the second time I have used that word. The word solve implies all kinds of nonsense, such as completion or the existence of a solution. Let us recast it as: new insights, positive reframing or simply “ah-ha!”. These “problems” can be anything too: emotional ones (external or internal), scientific or research ones, artistic ones, writing ones. If you feel like it, just call it a problem.

Whoever is writing this blog post, he certainly does talk a lot … We should really get going or we will run out of time. You know what, let us start again. Welcome to:

Continue reading

Plotly for interactive 3D plotting

An recently wrote a post on how to use the seaborn library. I really like seaborn and use it a lot for 2D plots. However, recently I have been dealing with 3D data and have found plotly to be best. When used in a jupyter notebook, it allows you to easily generate 3D interactive plots. This is extremely useful to visualize structural data.

Continue reading