Optimising Transformer Training

Training a large transformer model can be a multi-day, if not multi-week, ordeal. Especially if you’re using cloud compute, this can be a very expensive affair, not to mention the environmental impact. It’s therefore worth spending a couple of days trying to optimise your training efficiency before embarking on a large-scale training run. Here, I’ll run through three strategies you can take which (hopefully) shouldn’t degrade performance, while giving you some free speed. These strategies will also work for any other models using linear layers.

I won’t go into too much of the technical detail of any of the techniques, but if you’d like to dig into any of them further I’d highly recommend the NVIDIA Deep Learning Performance Guide.

Training With Mixed Precision

Training with mixed precision can be as simple as adding a few lines of code, depending on your deep learning framework. It also potentially provides the biggest speed-up of any of these techniques. Training throughput can be increased by up to three-fold with little degradation in model performance – and who doesn’t like free speed?
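
To give a feel for what mixed precision trades away, here is a minimal, framework-free sketch rather than real training code. It uses Python’s standard `struct` module to round values to half precision and shows why half-precision setups are commonly paired with loss scaling: a very small gradient underflows to zero in half precision unless it is scaled up first. The names `to_fp16`, `fp16_grad` and `LOSS_SCALE`, and the scale value itself, are assumptions made for illustration and are not taken from any particular framework.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


# Hypothetical scale factor, chosen only for this illustration.
LOSS_SCALE = 2.0 ** 16


def fp16_grad(grad: float, scale: float = 1.0) -> float:
    """Store grad * scale in half precision, then unscale in full precision."""
    return to_fp16(grad * scale) / scale


tiny_grad = 1e-8

# Without scaling, the gradient is too small for half precision and is lost.
unscaled = fp16_grad(tiny_grad)

# With scaling, the value survives the round trip with a small rounding error.
scaled = fp16_grad(tiny_grad, LOSS_SCALE)

print(f"without scaling: {unscaled}")
print(f"with scaling:    {scaled}")
```

In a real framework the scaling is applied to the loss before the backward pass, and the gradients are unscaled before the optimiser step; the sketch only mimics the rounding behaviour on a single number.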

How to make ML your entire personality

In our silly little day-to-day lives over in stats, we forget how accustomed we all are to AI being used in many of the things we do. Going home for the holidays, though, I was reminded that the majority of people (at least, the majority of my family members) don’t actually make most of their choices according to what a random, free AI tool suggests for them. Unfortunately, though, I do! Here are some of my favourite non-ChatGPT free tools I use to make sure everyone knows that working in ML is, in fact, my entire personality.

A Seq2Seq model for ETF forecasting

Owing to the misguided belief that I can achieve the impossible, I decided to build a model with the goal of beating the stock market.

Strap in, we’re about to get rich.

Machine learning is increasingly being employed by hedge funds to help mitigate risk and identify patterns and opportunities, whether this is for optimisation of algo trading strategies, fraud detection, high-frequency trading, or sentiment analysis. Arguably the most obvious, difficult, and naïve application of fintech ML is direct stock market forecasting – sounds like the perfect place to start.

Target

First things first, we need to decide on a security to forecast. Volatility provides opportunities, but predictable volatility is even better. We need a security that swings in response to actual, reported events, and one whose trends roughly move somehow with other stocks – our hypothesis being that wider events in the market can be used to forecast a single security. SPDR GLD seems like a reasonable option – gold is such a popular hedge against global instability that its price usually moves in the opposite direction to indices such as the DJIA or S&P 500, and rises with global disaster.

Gold price (/oz) in Pounds from 1980-2024

Tips and Tricks to correct a CUDA Toolkit installation in Conda

On the western side of Oxfordshire are the Cotswolds, a pleasant hill range with a curious etymology: the hills of the goddess Cuda (maybe, see footnote). Cuda is a powerful yet wrathful goddess, and staying on her good side does feel like druidry. The first druidic test is getting software to work: the wild magic makes the rules of this test change continually. Therefore, I am writing a summary of what works as of late 2023.

AlphaGeometry: are computers taking over math?

Last week, Google DeepMind announced AlphaGeometry, a novel deep learning system that is able to solve geometry problems of the kind presented at the International Mathematical Olympiad (IMO). The work is described in a recent Nature paper, and is accompanied by a GitHub repo including full code and weights.

This paper has caused quite a stir in some circles. Well, at least the kind of circles that you tend to get in close contact with when you work at a Department of Statistics. Just as folks in structural biology wondered three years ago, those who earn a living by veering into the mathematical void and crafting proofs were wondering if their jobs may also have a close-by expiration date. I found this quite interesting, so I decided to read the paper and try to understand it. To motivate myself, I set out to present the paper at an upcoming journal club and to write this blog post.

So, let’s ask, what has actually been achieved and how powerful is this model?

What has been achieved

The image that has been making the rounds this time is the following benchmark:

The stuff MDAnalysis didn’t implement: CPU Parallel HOLE conductance analysis

Some time ago, I needed to find a way to computationally estimate conductance values for every protein frame from several molecular dynamics (MD) trajectories.

In a previous post, I wrote about how to clean outliers from the resulting instant conductance timeseries. But I never described how I generated these timeseries.

In this post, I will show how you can parallelise the computation of instant conductance given an MD trajectory. I will touch on the difficulties of this process, and on why I had to implement a custom tool for it even though MDAnalysis already seems to have a routine of this sort. Finally, I will provide two Python scripts that you can easily adapt to run your parallel calculations – for which I’ll provide some important notes you don’t wanna skip.

Violin plots of conductance distributions from 64 molecular dynamics trajectories with 1000 frames each.

Taking Equivariance in deep learning for a spin?

I recently went to Sheh Zaidi’s brilliant introduction to Equivariance and Spherical Harmonics and I thought it would be useful to cement my understanding of it with a practical example. In this blog post I’m going to start with serotonin in two coordinate frames, and build a small equivariant neural network that featurises it.

PhD as a mother

As a mother currently pursuing my doctorate, I often encounter the belief that higher education is not the ideal time for parenthood. In this post, I want to share my personal experience, offering a different perspective.

A year ago, I began my doctorate with a two-and-a-half-month-old baby. When I received the acceptance email from Oxford, I was thrilled – a dream come true. However, this raised a question: could I pursue this dream while pregnant? I believed in balancing motherhood and academic aspirations, and my advisor’s encouragement reinforced this belief. We, as a family, moved from Israel to England, adjusting to this new chapter.

It hasn’t been easy. Physically, post-pregnancy recovery and sleepless nights were tough. Emotionally, I constantly struggle with guilt over balancing academic and maternal responsibilities. If I focus on my daughter, I worry about neglecting my research; if I concentrate on my studies, I feel like a bad mother. The logistics of managing a household, especially when being the primary caregiver, added another layer of complexity. Motherhood often feels isolating, as not everyone around me can relate to my situation.

Yet, doctoral studies have offered unexpected advantages. The flexibility allows me to align my work with my daughter’s schedule, often during nights or weekends. This means I can compensate for lost time without impacting others, unlike in a regular job. Interestingly, this flexibility leads to more time spent with my daughter than if I had a typical job. Moreover, the challenges of motherhood put academic obstacles into perspective. The best part of my day is always the hug from my daughter after a day of work.

As I keep moving forward with my PhD, here are some key tips that have helped me so far:

  1. Flexible Scheduling: Organize daily tasks, including household chores, within specific hours to enhance efficiency.
  2. Creating a Supportive Environment: Having a support system, be it your partner or friends, is crucial. Address practical issues early on, like daycare and babysitters, and don’t be shy to ask for help.
  3. Aligning Expectations with Your Supervisor: Communicate your limitations early to avoid misunderstandings.
  4. Practice Compassion: Acknowledge that you can’t do everything and be kind to yourself.

In the race of life, there never seems to be a “right” time for children. Whether it’s career progression or personal aspirations, the timing is always challenging. However, if you feel ready, that is the right time for you.

OPIGmas, 2023

Our annual, end-of-Michaelmas OPIG celebrations took place at the start of December in the MCR (Middle Common Room) at Lady Margaret Hall.

OPIGmas is a much-anticipated combination of pot luck, Secret Santa, and party games.

Perhaps Jay’s megaphone topped the list of gag gifts…

On National AI strategies


Recently, I have become quite interested in how countries have been shaping their national AI strategies or frameworks. Since the launch of ChatGPT, several concerns have been raised about AI safety and how such groundbreaking AI technologies could augment or adversely affect our daily lives. To address the public’s concerns and set standards and practices for AI development, some countries have recently released their national AI frameworks. As a budding academic researcher in this space who is keen to make AI more useful for medicine and healthcare, I am particularly interested in two aspects of the few frameworks I have looked at (specifically those of the US, UK and Singapore): the multi-stakeholder approach and the focus on AI education. I will delve further into both in this post.