Category Archives: Software & Services

LLM Coding Tools – An Overview

We’ve come a long way since GitHub Copilot first showed us what AI-assisted coding could look like. These days, there’s a whole ecosystem of LLM coding tools out there, each with its own strengths and approach. In this blog post, I’ll give you a quick overview to help you figure out which one might work best for your workflow.

Level 1: Interactive Code Assistance

Continue reading →

Diagnostics on the Cutting Edge, Software in the Stone Age: A Microbiology Story

The need to treat and control infectious diseases has challenged humanity for millennia, driving a series of remarkable advancements in diagnostic tools and techniques. One of the earliest known legal texts, the Code of Hammurabi, references the visual and tactile diagnosis of leprosy. For centuries, the distinct smell of infected wounds was used to identify gangrene, and in Ancient Greece and Rome, the balance of the four humors (blood, phlegm, black bile, and yellow bile) was a central theory in diagnosing infections.

The invention of the compound microscope in 1590 by Hans and Zacharias Janssen, and its refinements by Robert Hooke and Antonie van Leeuwenhoek, marked a turning point as it enabled the direct observation of microorganisms, thereby linking diseases to their microbial origins. Louis Pasteur’s introduction of liquid media aided Joseph Lister in identifying microbes as the source of surgical infections, whilst Robert Koch’s experiments with Bacillus anthracis firmly established the connection between specific microbes and diseases.

Continue reading →

Do not forget to add your data folder to .gitignore

It is good practice not to commit a data folder to version control if the data is available elsewhere and you do not want to track changes to it. But do not forget to also add an entry for this folder to .gitignore, because otherwise Git iterates over all the files in the folder when checking for file changes, which may take a long time if there are many files.
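Adding the entry by hand is a one-line change (a line such as `data/` in .gitignore). If you set up many repositories, it can also be scripted. Below is a minimal sketch, assuming the folder is called `data` and sits at the top level of the repository; the helper name `ensure_ignored` is hypothetical and not part of Git or any library.

```python
from pathlib import Path


def ensure_ignored(repo_dir: str, folder: str = "data") -> bool:
    """Append `folder/` to the .gitignore in repo_dir unless it is already listed.

    Returns True if .gitignore was modified, False if the entry was present.
    The folder name "data" is an assumption made for illustration.
    """
    entry = folder.strip("/") + "/"
    gitignore = Path(repo_dir) / ".gitignore"
    text = gitignore.read_text() if gitignore.exists() else ""

    # Treat "data" and "data/" as equivalent existing entries.
    existing = {line.strip() for line in text.splitlines()}
    if entry in existing or entry.rstrip("/") in existing:
        return False

    # Start on a fresh line if the file does not end with a newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with gitignore.open("a") as fh:
        fh.write(f"{prefix}{entry}\n")
    return True
```

Calling `ensure_ignored(".")` from the repository root adds `data/` once; calling it again leaves the file unchanged.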

Continue reading →

Making your code pip installable

aka when to use a CustomBuildCommand or a CustomInstallCommand when building Python packages with setup.py

Bioinformatics software is complicated, and often a little bit messy. Recently I found myself wading through a Python package building quagmire and thought I could share something I learnt about when to use a custom build command and when to use a custom install command. I have also provided some information about how to copy executables to your package installation bin. ChatGPT wrote the initial skeleton draft of this post, and I have corrected and edited it.

Next time you need to create a pip installable package yourself, hopefully this can save you some time!

Continue reading →

My take on the Collaborations Workshop (CW) 2024

At the end of April, I attended the CW 2024. This yearly hybrid event organised by the Software Sustainability Institute (SSI) has been running since 2011! The event brings people together to discuss best practices and the future of software in research. This year’s event themes were (1) AI/ML tools for Science, (2) Citizen Science and (3) Environmental sustainability.

As a Research Software Engineer (RSE) working with OPIG, I was curious to attend and find out what I could bring back that would be useful to the group, as most people work on AI/ML applications. In this blog post, I share a few parts of the event that resonated with me and that I found most interesting and relevant to my group.

Continue reading →

Environmentally sustainable computing

Did you know that you, as a scientist, are estimated to have a carbon footprint between 2 and 12 times higher than the per-person carbon budget set to keep global warming below 1.5 °C [1]?

Background

Global temperatures are rising. This has direct effects on the planet and contributes to increasing humanitarian emergencies. These include more frequent and intense heatwaves, wildfires, and floods [2]. The impact of climate change is already severe, with around 20 million internally displaced persons in 2023 alone due to such disasters [3].

Global warming and climate change are caused by the emissions of carbon dioxide and methane, known as carbon emissions. There are different ways in which you could minimise your carbon footprint. For example, I try to reduce energy usage at home, eat mainly plant-based food, and travel by train instead of by plane to visit family and for holidays and conferences. However, until I organised a Green Lecture with the Department of Statistics Green Team, I never thought of my computational PhD as a major contributor to my carbon footprint. That doesn’t mean the work I, and all other scientists, do is not important and necessary. But the lecture on principles for environmentally sustainable research given by Loic Lannelongue made me aware of the carbon costs of computing, which I would like to share with you.

Continue reading →

What can you do with the OPIG Immunoinformatics Suite? v3.0

OPIG’s growing immunoinformatics team continues to develop and openly distribute a wide variety of databases and software packages for antibody/nanobody/T-cell receptor analysis. Below is a summary of all the latest updates (follows on from v1.0 and v2.0).

Continue reading →

9th Joint Sheffield Conference on Cheminformatics

Over the next few days, researchers from around the world will be gathering in Sheffield for the 9th Joint Sheffield Conference on Cheminformatics. As one of the organizers (wearing my Molecular Graphics and Modeling Society ‘hat’), I can say we have an exciting array of speakers and sessions:

De Novo Design
Open Science
Chemical Space
Physics-based Modelling
Machine Learning
Property Prediction
Virtual Screening
Case Studies
Molecular Representations

It has traditionally taken place every three years, but despite the global pandemic it is returning this year, once again in person in the excellent conference facilities at The Edge. You can download the full programme in iCal format, and here is the conference calendar:

Continue reading →

Ligands of CASF-2016

CASF-2016 is a commonly used benchmark for docking tools. Unfortunately, some of the provided ligand files cannot be loaded using RDKit (version 2022.09.1) but there is an easy remedy.

Continue reading →

SAbBox in 2023: ImmuneBuilder and more!

For several years now, we have distributed the SAbDab database and SAbPred tools as a virtual machine, SAbBox, via Oxford University Innovation. This virtual machine allows a user to utilise the tools and database locally, allowing for high-throughput analysis and keeping confidential data within a local network. Initially distributed under a commercial licence, the platform proved popular and, in 2020, we introduced a free academic licence to enable our academic colleagues to use our tools and database locally.

Following requests from users, in 2021 we released a new version of the platform packaged as a Singularity container. This included all of the features of SAbBox, allowing Linux users to take advantage of the near bare-metal performance of Singularity when running SAbPred tools. Over the past year, we have made lots of improvements to both SAbBox platforms, and have more work planned for the coming year. I’ll briefly outline these developments below.

Continue reading →

Oxford Protein Informatics Group

or "OPIG" to friends
