Fail fast

While scrolling through my Instagram reels feed, I came across a reel of Jensen Huang, NVIDIA’s CEO, talking about the need to fail fast, which motivated me to write a post. ‘Fail fast’ is a recent piece of advice I have been hearing since I embarked on my PhD; fail fast on the research directions that we plan to pursue so that we can understand the difficulties and limitations of the research problems and methods used which will in turn give us more time to finetune our problem and develop more nuanced approaches. Since childhood, most of us have been taught that failures eventually lead to success and that persevering towards success is critical. However, one thing that I could not come to terms with is the narrative of several failures ‘magically’ leading to success. If you were destined to be successful, why would you even fail? And also, for every failure-to-success story we hear, there are many other stories of failure that we don’t.

Continue reading →

Making your figures more accessible

You might have created the most esthetic figures for your last presentation with a beautiful colour scheme, but have you considered how these might look to someone with colourblindness? Around 5% of the gerneral population suffer from some kind of color vision deficiency, so making your figures more accessible is actually quite important! There are a range of online tools that can help you create figures that look great to everyone.

Continue reading →

Plotext: The Matplotlib Lookalike That Breaks Free from X Servers

Imagine this: you’ve spent days computing intricate analyses, and now it’s time to bring your findings to life with a nice plot. You fire up your cluster job, scripts hum along, and… matplotlib throws an error, demanding an X server it can’t find. Frustration sets in. What a waste of computation! What happened? You just forgot to add the -X to your ssh command, or it may be just that X forwarding is not allowed in your cluster. So you will need to rerun your scripts, once you have modified them to generate a file that you can copy to your local machine rather than plotting it directly.

But wait! Plotext to the rescue! This Python package provides an interface nearly identical to matplotlib, allowing you to seamlessly transition your plotting code without sacrificing functionality. But why choose Plotext over the familiar matplotlib? The key lies in its text-based backend. This means it is just printing characters in your console to generate the plots, making it ideal for cluster environments where X servers are often absent or restricted. What do those plots look like? Here is an example:

Continue reading →

In defence of chaos

I commend you on your skepticism, but even the skeptical mind must be prepared to accept the unacceptable when there is no alternative. If it looks like a duck, and quacks like a duck, we have at least to consider the possibility that we have a small aquatic bird of the family Anatidæ on our hands.
Douglas Adams

It’s not every day that someone recommends a new whizzbang note-taking software. It’s every second day, or third if you’re lucky. They all have their bells and whistles: Obsidian turns your notes into a funky graph that pulses with information, the web of complexity of your stored knowledge entrapping your attention as you dazzle in its splendour while also the little circles jostle and bounce in decadent harmony. Notion’s aesthetic simplicity belies its comprehensive capabilities, from writing your notes so you don’t need to, to exporting to the web so that the rest of us can read what you didn’t write because you didn’t need to. To pronounce Microsoft OneNote requires only five syllables, efficiently cramming in two extra words while only being one bit slower to say than the mysterious rock competitor. Apple notes can be shared with all the other Apple people who live their happy Apple lives in happy Apple land – and sometimes this even works!

Continue reading →

Working with PDB Structures in Pandas

Pandas is one of my favourite data analysis tools working in Python! The data frames offer a lot of power and organization to any data analysis task. Here at OPIG we work with a lot of protein structure data coming from PDB files. In the following article I will go through an example of how I use pandas data frames to analyze PDB data.

Continue reading →

Navigating the world of GNN layers with PyTorch Geometric

Data can often naturally be represented in a graph format and being able to directly employ a deep learning architecture on that data without finding a different representation is an appealing idea. Graph neural networks (GNNs) have become a standard part of the ML toolbox but navigating the world of different architectures available out-of-the-box can be a daunting task. A great place to start looking for architectures is with PyTorch Geometric, which provides an extensive list of readily available GNN layers and tutorials on how to use them in your standard PyTorch models. There are many things to consider when choosing a GNN layer, but the two considerations that I think are a great place to start are expressiveness and edge feature handling. In general, it is hard to predict what will work best for the task at hand and hence it’s optimal to try a wide range of different layers. This blogpost is meant as a brief introduction for what I would find useful to know before I started using GNNs, and a starting point for exploring the GNN literature.

Continue reading →

250 Trips on the Oxford Tube: what I’ve learnt

The Oxford Tube is a bus service that shuttles people between Oxford and London taking approximately 1 hour and 30 minutes. I have now taken the bus over 250 times which is approximately 375 hours or a fortnight of my life.

In this time spent on the bus, I have discovered some tips and tricks that make the journey ever so slightly more bearable. I shall share them so that others can optimise their experience. Enjoy!

Continue reading →

Optimising Transformer Training

Training a large transformer model can be a multi-day, if not multi-week, ordeal. Especially if you’re using cloud compute, this can be a very expensive affair, not to mention the environmental impact. It’s therefore worth spending a couple days trying to optimise your training efficiency before embarking on a large scale training run. Here, I’ll run through three strategies you can take which (hopefully) shouldn’t degrade performance, while giving you some free speed. These strategies will also work for any other models using linear layers.

I wont go into too much of the technical detail of any of the techniques, but if you’d like to dig into any of them further I’d highly recommend the Nvidia Deep Learning Performance Guide.

Training With Mixed Precision

Training with mixed precision can be as simple as adding a few lines of code, depending on your deep learning framework. It also potentially provides the biggest boost to performance of any of these techniques. Training throughput can be increase by up to three-fold with little degradation in performance – and who doesn’t like free speed?

Continue reading →

How to make ML your entire personality

In our silly little day-to-day lives in over in stats, we forget how accustomed we all are to AI being used in many of the things we do. Going home for the holidays, though, I was reminded that the majority of people (at least, the majority of my family members) don’t actually make most of their choices according to what a random, free AI tool suggests for them. Unfortunately, though, I do! Here are some of my favourite non-ChatGPT free tools I use to make sure everyone knows that working in ML is, in fact, my entire personality.

Continue reading →

A Seq2Seq model for ETF forecasting

Owing to the misguided belief that I can achieve the impossible, I decided to build a model with the goal of beating the stock market.

Strap in, we’re about to get rich.

Machine learning is increasingly being employed by hedge funds to help mitigate risk and identify patterns and opportunities, whether this is for optimisation of algo trading strategies, fraud detection, high-frequency trading, or sentiment analysis. Arguably the most obvious, difficult, and naïve application of fintech ML is direct stock market forecasting – sounds like the perfect place to start.

Target

First things first, we need to decide on a stock to forecast. Volatility provides opportunities, but predictable volatility is even better. We need a security that swings in response to actual, reported events, and one whose trends roughly move somehow with other stocks – our hypothesis being that wider events in the market can be used to forecast a single security. SPDR GLD seems like a reasonable option – gold is such a popular hedge against global instability it’s price usually moves in the opposite direction to stocks such as DJIA or SP500 and moves with global disaster.

Gold price (/oz) in Pounds from 1980-2024

Continue reading →

Oxford Protein Informatics Group

or "OPIG" to friends

Fail fast

Making your figures more accessible

Plotext: The Matplotlib Lookalike That Breaks Free from X Servers

In defence of chaos

Working with PDB Structures in Pandas

Navigating the world of GNN layers with PyTorch Geometric

250 Trips on the Oxford Tube: what I’ve learnt

Optimising Transformer Training

Training With Mixed Precision

How to make ML your entire personality

A Seq2Seq model for ETF forecasting

Target