It is good practice not to commit a data folder to version control if the data is available elsewhere and you do not want to track changes of the data. But do not forget to also add an entry for this folder to .gitignore
because otherwise git iterates over all the files in the folder when checking for file changes, which may take a long time if there are many files.
Category Archives: Software & Services
Making your code pip installable
aka when to use a CutomBuildCommand or a CustomInstallCommand when building python packages with setup.py
Bioinformatics software is complicated, and often a little bit messy. Recently I found myself wading through a python package building quagmire and thought I could share something I learnt about when to use a custom build command and when to use a custom install command. I have also provided some information about how to copy executables to your package installation bin. **ChatGPT wrote the initial skeleton draft of this post, and I have corrected and edited.
Next time you need to create a pip installable package yourself, hopefully this can save you some time!
Continue readingMy take on the Collaborations Workshop (CW) 2024
At the end of April, I attended the CW 2024. This yearly hybrid event organised by the Software Sustainability Institute (SSI) has been running since 2011! The event brings people together to discuss best practices and the future of software in research. This year’s event themes were (1) AI/ML tools for Science, (2) Citizen Science and (3) Environmental sustainability.
As a Research Software Engineer (RSE) working with OPIG, I felt a great curiosity to attend and find out what I could bring of use to the group, as most people work on AI/ML applications. In this blog post, I share a few bits of the event which resonated with me and I found most interesting and relevant to share with my group.
Continue readingEnvironmentally sustainable computing
Did you know that it is approximated that you, a scientist, have a carbon footprint which is between 2 and 12 times higher than the set carbon budget per person to keep global warming below 1.5 °C [1]?
Background
Global temperatures are rising. This has direct effects on the planet and contributes to increasing humanitarian emergencies. These include more frequent and intense heatwaves, wildfires, and floods [2]. The impact of climate change is already severe, with around 20 million internal displaced persons in 2023 alone due to those disasters [3].
Global warming and climate change are caused by the emissions of carbon dioxide and methane, known as carbon emissions. There are different ways in which you could minimise your carbon footprint. For example, I try to reduce the energy usage in the house, try eating mainly plant-based, and travel by train instead of by plane to family and for holidays and conferences. However, up until organising a Green Lecture with the Department of Statistics Green Team I never thought of my computational PhD as a major contributor to my carbon footprint. That doesn’t mean the work I, and all other scientists, do is not important and necessary. But the lecture on principles for environmentally sustainable research given by Loic Lannelongue made me aware of carbon costs of computing, which I would like to share with you.
Continue readingWhat can you do with the OPIG Immunoinformatics Suite? v3.0
OPIG’s growing immunoinformatics team continues to develop and openly distribute a wide variety of databases and software packages for antibody/nanobody/T-cell receptor analysis. Below is a summary of all the latest updates (follows on from v1.0 and v2.0).
Continue reading9th Joint Sheffield Conference on Cheminformatics
Over the next few days, researchers from around the world will be gathering in Sheffield for the 9th Joint Sheffield Conference on Cheminformatics. As one of the organizers (wearing my Molecular Graphics and Modeling Society ‘hat’), I can say we have an exciting array of speakers and sessions:
- De Novo Design
- Open Science
- Chemical Space
- Physics-based Modelling
- Machine Learning
- Property Prediction
- Virtual Screening
- Case Studies
- Molecular Representations
It has traditionally taken place every three years, but despite the global pandemic it is returning this year, once again in person in the excellent conference facilities at The Edge. You can download the full programme in iCal format, and here is the conference calendar:
Continue readingLigands of CASF-2016
CASF-2016 is a commonly used benchmark for docking tools. Unfortunately, some of the provided ligand files cannot be loaded using RDKit (version 2022.09.1) but there is an easy remedy.
Continue readingSAbBox in 2023: ImmuneBuilder and more!
For several years now, we have distributed the SAbDab database and SAbPred tools as a virtual machine, SAbBox, via Oxford University Innovation. This virtual machine allows a user to utilise the tools and database locally, allowing for high-throughput analysis and keeping confidential data within a local network. Initially distributed under a commercial licence, the platform proved popular and, in 2020, we introduced a free academic licence to enable our academic colleagues to use our tools and database locally.
Following requests from users, in 2021 we released a new version of the platform packaged as a Singularity container. This included all of the features of SAbBox, allowing Linux users to take advantage of the near bare-metal performance of Singularity when running SAbPred tools. Over the past year, we have made lots of improvements to both SAbBox platforms, and have more work planned for the coming year. I’ll briefly outline these developments below.
Continue readingNaga101: A Guide to Getting Started with (OPIG) Slurm Servers
Over the past months, I’ve been working with a few new members of OPIG, which left me answering (and asking) lots of questions about working with Slurm. In this blog post, I will try to cover key, practical basics to interacting with servers that are set up on Slurm.
Over the past months, I’ve been working with a few new members of OPIG, which left me answering (and asking) lots of questions about working with Slurm. In this blog post, I will try to cover key, practical basics to interacting with servers that are set up on Slurm.
Slurm is a workload manager or job scheduler for Linux, meaning that it helps with allocating resources (eg CPUs and GPUs) on a server to users’ jobs.
To note, all of the commands and files shown here are run from a so-called ‘head’ node, from which you access Slurm servers.
1. Entering an interactive session
Unlike many other servers, you cannot access a Slurm server via ‘ssh’. Instead, you can enter an interactive (or ‘debug’) session – which, in OPIG, is limited to 30 minutes – via the srun command. This is incredibly useful for copying files, setting up environments and checking that your code runs.
srun -p servername-debug --pty --nodes=1 --ntasks-per-node=1 -t 00:30:00 --wait=0 /bin/bash
2. Submitting jobs
While the srun command is easy and helpful, many of the jobs we want to run on a server will take longer than the debug queue time limit. You can submit a job, which can then run for a longer (although typically still capped) time but is not interactive, via sbatch.
Continue reading5th Artificial Intelligence in Chemistry Symposium
The lineup for the Royal Society of Chemistry’s 5th “Artificial Intelligence in Chemistry” Symposium (Thursday-Friday, 1st-2nd September 2022) is now complete for both oral and poster presentations. It really is a fantastic selection of topics and speakers and it is clear this event is now a highlight of the scientific calendar. Our very own Prof. Charlotte M. Deane, MBE will be giving a keynote.
It marks a return to in-person meetings: it will be held at Churchill College, Cambridge, with a conference dinner at Trinity Hall.
More details are here: https://www.rscbmcs.org/events/aichem22/.
Registration for in person attendance is open until Monday 29th August 17:00 (BST).
It is also possible to register for virtual attendance; the meeting will be broadcast on Zoom.