Picture the following: the year is 1923, and it’s a sunny afternoon at a posh garden party in Cambridge. Among the polite chatter, one Muriel Bristol, a phycologist studying the mechanisms by which algae acquire nutrients, mentions she has a preference for tea poured over milk, as opposed to milk poured over tea. In a classic example of women not being able to express even the most insignificant preference without an opinionated man telling them they’re wrong, Ronald A. Fisher, a local statistician (later turned eugenicist who dismissed the notion of smoking cigarettes being dangerous as ‘propaganda’, mind you) decides to put her claim to the test with an experiment. Bristol is given eight cups of tea and asked to classify them as milk first or tea first. Luckily, she correctly identifies all eight of them, and gets to happily continue about her life (presumably until the next time she dares mention a similarly outrageous and consequential opinion like a preferred toothpaste brand or a favourite method for filing papers). Fisher, on the other hand, is incentivized to develop Fisher’s exact test, a statistical significance test used in the analysis of contingency tables.
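For a sense of the arithmetic behind the test, here is a minimal sketch. It assumes the classic design of four milk-first and four tea-first cups, with the taster told the split; the text above only says eight cups, so the 4/4 split and the function name are assumptions made for illustration.

```python
from math import comb

def p_exact_matches(k, n_milk_first=4, n_total=8):
    """Probability of picking exactly k of the true milk-first cups by pure guessing.

    Assumes the taster selects n_milk_first cups out of n_total (hypergeometric).
    """
    n_tea_first = n_total - n_milk_first
    ways = comb(n_milk_first, k) * comb(n_tea_first, n_milk_first - k)
    return ways / comb(n_total, n_milk_first)

# Chance of classifying every cup correctly by luck alone: 1 in 70.
print(f"{p_exact_matches(4):.4f}")  # 0.0143
```

Under these assumptions, a perfect score has roughly a 1.4% chance of happening by guessing.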
Paths that you need to know for compiling
Compiling and running applications on Linux involves more than just writing code. Developers must also understand the intricacies of environment variables and command-line tools that dictate where compilers and runtime environments look for necessary files. In this post, we will cover some of them.
Default Search Paths
- Header Files: Compilers like `gcc` and `g++` typically look for header files in standard directories such as `/usr/include` or `/usr/local/include`. These are the places where most system and third-party libraries install their header files.
- Libraries: For libraries, the linker (`ld`) searches in directories like `/usr/lib`, `/usr/local/lib`, and sometimes in more specific directories that depend on the machine’s architecture (like `/usr/lib/x86_64-linux-gnu` on 64-bit systems).
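The idea of a search path can be sketched in a few lines: directories are tried in order and the first match wins. This is only an illustration of the lookup logic, not how `gcc` or `ld` are implemented; the function name `find_first` and the directory list are assumptions, and the real list and order depend on the compiler and distribution.

```python
from pathlib import Path

def find_first(filename, search_dirs):
    """Return the first path in search_dirs that contains filename, or None."""
    for directory in search_dirs:
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate
    return None

# Illustrative header directories taken from the text above (assumed order).
HEADER_DIRS = ["/usr/local/include", "/usr/include"]

if __name__ == "__main__":
    # Prints the resolved path on a typical Linux system, or None if not found.
    print(find_first("stdio.h", HEADER_DIRS))
```

Because the first hit wins, a header installed in an earlier directory shadows one with the same name in a later directory.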
PyRosetta for RFdiffusion
I will not lie: I often struggle to find a snippet of code that did something in PyRosetta or I spend hours facing a problem caused by something not working as I expect it to. I recently did a tricky project involving RFdiffusion and I kept slipping on the PyRosetta side. So to make future me, others, and ChatGPT5 happy, here are some common operations to make working with PyRosetta for RFdiffusion easier.
Exploring multilingual programming
Python is a prominent language in the ML and scientific computing space, and for good reason. Python is easy to learn and readable, and it offers a vast selection of libraries such as NumPy for numerical computation, Pandas for data manipulation, SciPy for scientific computing, TensorFlow and PyTorch for deep learning, along with RDKit and Open Babel for cheminformatics. It is understandably an appealing choice for developers and researchers alike. However, a closer look at many common Python libraries reveals their foundations in C++.
Revisiting C++ Advantages
Many Python libraries, including TensorFlow, PyTorch, and RDKit, rely heavily on C++. C++ allows developers to manage memory and CPU resources more effectively than Python, making it a good choice when handling large volumes of data at a fast pace. A previous post on this blog discusses C++’s speed, its utility in GPU programming through CUDA, and the complexities of managing its libraries. Despite the steeper learning curve and verbosity compared to Python, the performance benefits of C++ are undeniable, especially in contexts where execution speed and memory management are critical.
Rust: A New Contender for High-Performance Computing
Mounting a remote file system with SSHFS
If you’re working with data stored on a remote server, you might not want to (or even have the space to) copy data to your local file system when you work on it. Instead, we can use SSHFS to mount a remote file system via SSH, allowing us to read and write data on the remote file system without manually copying files.
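As a rough sketch of what mounting looks like in practice, the snippet below builds the basic `sshfs [user@]host:[dir] mountpoint` command line. The user, host, and paths are hypothetical placeholders, and the helper names are made up for this example; `fusermount -u` is the usual way to unmount on Linux (macOS uses `umount` instead).

```python
import shlex

def sshfs_mount_command(user, host, remote_dir, mount_point):
    """Build the argument list for: sshfs user@host:remote_dir mount_point"""
    return ["sshfs", f"{user}@{host}:{remote_dir}", str(mount_point)]

def unmount_command(mount_point):
    """Build the Linux command that unmounts a FUSE mount point."""
    return ["fusermount", "-u", str(mount_point)]

# Hypothetical values, for illustration only.
cmd = sshfs_mount_command("alice", "server.example.org", "/data/project", "/home/alice/remote_data")
print(shlex.join(cmd))
# sshfs alice@server.example.org:/data/project /home/alice/remote_data
```

To actually mount, the list can be passed to `subprocess.run(cmd, check=True)`, provided `sshfs` is installed and the mount point is an existing, empty directory.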
Quickly (and lazily) scale your data processing in Python
Do you use pandas for your data processing/wrangling? If you do, and your code involves any data-heavy steps such as data generation, exploding operations, featurization, etc., then it can quickly become inconvenient to test your code.
- Inconvenient compute times (>tens of minutes). Perhaps fine for a one-off, but over repeated test iterations your efficiency and focus will take a hit.
- Inconvenient memory usage. Perhaps your dataset is too large for memory, or loads in but then causes an OOM error during a mid-operation memory spike.
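One generic way around the memory problem is to stream the data in chunks, so that only one chunk is held in memory at a time. The sketch below uses only the standard library and is not necessarily the approach the full post takes; `iter_chunks` and `chunked_mean` are hypothetical helper names.

```python
from itertools import islice

def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size items from any iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk

def chunked_mean(values, chunk_size=1000):
    """Mean of a stream of numbers, holding only one chunk in memory at a time."""
    total, count = 0.0, 0
    for chunk in iter_chunks(values, chunk_size):
        total += sum(chunk)
        count += len(chunk)
    return total / count if count else float("nan")

print(chunked_mean(range(1, 11), chunk_size=3))  # 5.5
```

The same pattern works for any aggregation that can be updated incrementally, such as counts, sums, or minima and maxima.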
Mapping derivative compounds to parent hits
Whereas it is easy to say in a paper “Given the HT-Sequential-ITC results, 42 led to 113, a substituted decahydro-2,6-methanocyclopropa[f]indene”, it is frequently rather trickier to figure out algorithmically which atoms map to which. In Fragmenstein, for the placement route, for example, a lot goes on behind the scenes, yet for some cases a human-provided mapping may be required. Here I discuss how to get the mapping from Fragmenstein and what goes on behind the scenes.
Conference Summary: MGMS Adaptive Immune Receptors Meeting 2024
On 5th April 2024, over 60 researchers braved the train strikes and gusty weather to gather at Lady Margaret Hall in Oxford and engage in a day full of scientific talks, posters and discussions on the topic of adaptive immune receptor (AIR) analysis!
How to write a review paper as a first year PhD student
As a first year PhD student, it is not an uncommon thing to be asked to write a review paper on your subject area. It is both a great way to get acquainted with your research field and to get the background portion of your thesis completed early. However, it can seem like a daunting task to go from knowing almost nothing about your research field to producing something of interest for experts who have spent years studying your subject matter.
In my first year, I was exactly in this position and I found very little online to help guide this process. Thus, here is my reflective look at writing a review paper that will hopefully help someone else in the future.
How can FemTech help close the gender health gap?
An excellent previous blog post from Sarah [1] describes the gender data gap and touches on the fact that women experience poorer healthcare outcomes. This arises from, amongst other things, the historical exclusion of women from clinical trials and this idea of the ‘male default’, where, for example, drug dosages and diagnostic thresholds are benchmarked against men, or even surgical instruments are designed to fit male hands [2]. I thought I would follow up on Sarah’s blog post and discuss how FemTech can help to close this gender health gap.