Author Archives: Broncio Aguilar-Sanjuan

About Broncio Aguilar-Sanjuan

Research Software Engineer at Oxford Protein Informatics Group (OPIG) 🐷

Five-word stories about a world where AI dominates the world

“For sale: baby shoes, never worn.” ~ Ernest Hemingway??

This is a six-word story famously misattributed to Ernest Hemingway. According to Wikipedia, the story first appeared in 1906, when Hemingway was 7 years old, and was only attributed to him in 1991, 30 years after his death. So, no chance it was his.

Regardless of its origin, I found this type of story very creative.

In this blog post, as the title says, I will dare to push the boundary and present 5-word stories on the topic of AI taking over the world, BUT with a humorous spin.

Continue reading →

My take on the Collaborations Workshop (CW) 2024

At the end of April, I attended CW 2024. This yearly hybrid event, organised by the Software Sustainability Institute (SSI), has been running since 2011! The event brings people together to discuss best practices and the future of software in research. This year’s themes were (1) AI/ML tools for Science, (2) Citizen Science and (3) Environmental sustainability.

As a Research Software Engineer (RSE) working with OPIG, I was curious to attend and find out what I could bring back to the group, as most people here work on AI/ML applications. In this blog post, I share the bits of the event that resonated with me most and that I found most relevant to my group.

Continue reading →

The stuff MDAnalysis didn’t implement: CPU Parallel HOLE conductance analysis

Some time ago, I needed to find a way to computationally estimate conductance values for every protein frame from several molecular dynamics (MD) trajectories.

In a previous post, I wrote about how to clean outliers from the resulting instant conductance timeseries. But I never described how I generated these timeseries.

In this post, I will show how you can parallelise the computation of instant conductance given an MD trajectory. I will touch on the difficulties of this process, and on why I had to implement a custom tool for it even though MDAnalysis already seems to provide a routine of this sort. Finally, I will provide two Python scripts that you can easily adapt to run your parallel calculations, along with some important notes you don’t wanna skip.
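To give a flavour of the general idea, here is a minimal sketch of frame-level parallelism using only the Python standard library. It is not one of the two scripts from the post and it does not call MDAnalysis or HOLE: `conductance_for_frame` is a hypothetical placeholder that returns a dummy number, and `split_frames`, `process_chunk` and `conductance_timeseries` are names made up for this illustration. The sketch assumes frames can be analysed independently, so the trajectory is split into contiguous blocks and each block goes to its own worker.

```python
from concurrent.futures import ProcessPoolExecutor


def split_frames(n_frames, n_chunks):
    """Split frame indices 0..n_frames-1 into at most n_chunks contiguous ranges."""
    size, extra = divmod(n_frames, n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional frame each.
        stop = start + size + (1 if i < extra else 0)
        if stop > start:
            chunks.append(range(start, stop))
        start = stop
    return chunks


def conductance_for_frame(frame_index):
    """Hypothetical placeholder for the real per-frame analysis."""
    return 1.0 + 0.01 * frame_index  # dummy value, NOT a real conductance


def process_chunk(frames):
    """Worker: analyse one block of frames, keeping each frame index."""
    return [(i, conductance_for_frame(i)) for i in frames]


def conductance_timeseries(n_frames, n_workers=4, executor_cls=ProcessPoolExecutor):
    """Return one value per frame, in frame order."""
    chunks = split_frames(n_frames, n_workers)
    with executor_cls(max_workers=n_workers) as pool:
        # Executor.map returns results in the order the chunks were submitted.
        results = list(pool.map(process_chunk, chunks))
    return [value for chunk in results for _, value in chunk]
```

When using the default process pool, call `conductance_timeseries(...)` from under an `if __name__ == "__main__":` guard so worker processes can import the module safely. In a real workflow, each worker would typically open its own handle on the trajectory rather than share one across processes.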

Violin plots of conductance distributions from 64 molecular dynamics trajectories with 1000 frames each.

Continue reading →

What the heck are TPUs?

I recently became curious about TPUs (Tensor Processing Units), a specialised type of hardware for training machine- and deep-learning models. This fancy chip can provide very high gains for anyone aiming to perform really massive parallelisation of AI tasks such as training, fine-tuning, and inference.

In this blog post, I will touch on what a TPU is, why it could be useful for AI applications when compared to GPUs and briefly discuss associated opportunity costs.

What’s a TPU?

Continue reading →

Writing a BLOPIG Post With ChatGPT: A Personal Take on Using AI for Assisted Writing

Disclaimer: I used ChatGPT to improve the writing style of this article, in combination with some personal curation before obtaining a final version.

You’ve probably heard it all already, from ChatGPT writing code and doing proofreading for you to a rap battle between OPIG’s Antibodies and Small Molecules groups, and more.

Whether you like it or not, ChatGPT has unleashed people’s creative side regarding applications and attempts to find shortcuts. Questionable? Absolutely!

In this BLOPIG post, I show how I used ChatGPT to easily write a post summarising some material of my own intellectual property, which I presented as part of my group meeting talk. Mainly, I list some personal thoughts on the ethical concerns around using ChatGPT to assist your writing.

To start off, I passed content from my own publication draft to ChatGPT, asking it to generate a blog post in plain English for BLOPIG. The outcome:

Not bad.

But, it made me realise a number of things:

With great power comes great responsibility [Uncle Ben – Spiderman].
You are responsible for the ethics that go into using ChatGPT. Are you faking expertise? Are you actually being lazy or just being efficient? Think twice (or many more times) about whether you’re doing the right thing.
It can significantly reduce the number of writing iterations but don’t take it at face value.
Can you actually trust the plain output? No.
Never take its output as the ground truth, as Large Language Models such as ChatGPT often produce biased writing outputs.
Keep in mind that whatever you produce as a scientist will be picked up by others and, if incorrect, is prone to spread misinformation. It is OK to reduce mechanical iterations, but it’s NOT OK to skip quality control.
Be open about it.
You don’t want to set the wrong example for your colleagues. So, mention if you used it and how you used it. It is fine to encourage efficiency, but not to incentivise a culture of scientific misconduct and plagiarism. Don’t skip the step of producing quality ideas on your own. This is such a concern that publishers like Elsevier have already reacted by publishing guidelines contemplating this possibility, while Springer Nature is working on ways to spot AI-generated outputs.

The bottom line

What are the dos and don’ts of using ChatGPT?

Yes, use it to have fun. Yes, use it to proofread or polish your writing. Yes, use it to summarise your own ideas. No, don’t use it to do the analysis and interpretation of your results. No, don’t copy and paste its direct output into your publication. No, don’t hide that you used it. Finally, NO, you can’t add ChatGPT as a contributing author!

Cleaning outliers in conductance timeseries from molecular dynamics

Have you ever had an annoying dataset that looks something like this?

or even worse, just several of them

In this blog post, I will introduce basic techniques that you can implement in Python to identify and clean outliers. The objective will be to get something more eye-pleasing (and, more importantly, less troublesome for further data analysis) like this
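As a rough illustration of what "identify and clean" can mean in practice, here is a minimal standard-library sketch. It uses a modified z-score based on the median absolute deviation (MAD), which is one common choice and not necessarily the technique used in the full post. The function names, the 3.5 threshold and the decision to replace flagged points with the median of the remaining points are all assumptions made for this example.

```python
from statistics import median


def mad_outlier_mask(values, threshold=3.5):
    """Return True for each point whose modified z-score exceeds `threshold`."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread around the median: nothing can be flagged reliably.
        return [False] * len(values)
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [0.6745 * abs(v - med) / mad > threshold for v in values]


def clean_series(values, threshold=3.5):
    """Replace flagged points with the median of the unflagged points."""
    mask = mad_outlier_mask(values, threshold)
    kept = [v for v, bad in zip(values, mask) if not bad]
    fill = median(kept)
    return [fill if bad else v for v, bad in zip(values, mask)]


# Toy series (made-up numbers) with one obvious spike at index 4.
series = [1.0, 1.1, 0.9, 1.05, 50.0, 0.95, 1.0]
cleaned = clean_series(series)
```

Median-based statistics are used here because a few extreme spikes barely shift them, unlike the mean and standard deviation. Whether to replace, interpolate or simply drop flagged points depends on what the downstream analysis needs.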

Continue reading →

How to make your own singularity container zero fuss!

In this blog post, I’ll show you guys how to make your own shiny container for your tool! Zero fuss(*) and in FOUR simple steps.

As an example, I will show how to make a singularity container for one of our public tools, ANARCI, the antibody numbering tool that everyone in OPIG, as well as external users, is familiar with. If not, check the web app and the GitHub repo here and here.

(*) Provided you have your own Linux machine with sudo permissions, otherwise, you can’t do it – sorry. Same if you have a Mac or Windows – sorry again.
BUT, there are workarounds for these cases such as using the remote singularity builder here, for which you only need to sign up and create an account, and the use of Virtual Machines (VMs), as described here.

Continue reading →

The SARS-CoV-2 protein spike glycosylation not only shields but primes binding by providing structural stability too

Yep, it is very well known that the sugar coating (aka glycosylation) of viruses makes them invisible to the immune system. The strategy is so effective that, as in the case of HIV, whose spike is almost entirely covered by glycans, it makes the virus very difficult for the human immune system to target.

Unsurprisingly, coronaviruses such as SARS, MERS, and SARS-CoV-1(2) benefit from this evolutionary strategy. But there is now also evidence that sugars stabilise their spikes by glueing the spike chains together, making the spikes effective binders and hence the viruses infectious.

This is the major finding of this paper, which presents very interesting results from all-atom MD simulations of a fully glycosylated model of the SARS-CoV-2 spike protein embedded in a realistic viral membrane. The researchers aimed to look into the stability of the spike protein chains (A, B, and C) in the “open” and “closed” conformations, and at how this stability changed upon mutating key residues, to test how glycans sitting in the inter-chain space affect it. They also aimed to quantify the glycans’ shielding effect against molecules ranging in size from 2 to 15 Angstroms, i.e., from small-sized to peptide- and antibody-sized molecules.

Continue reading →