Throughout my PhD I’ve needed nice PyMOL visualizations, but struggled to quickly and easily make the pictures I wanted. I’ve used Claire Marks‘ blopig post, Making Pretty Pictures in PyMOL, many times and wanted to expand it with what I’ve learned to make satisfying visualizations quickly!
Continue readingCategory Archives: How To
Controlling PyMol from afar
Do you keep downloading .pdb
and .sdf
files and loading them into PyMol repeatedly?
If yes, then PyMol remote might be just for you. With PyMol remote, you can control a PyMol session running on your laptop from any other machine. For example, from a Jupyter Notebook running on your HPC cluster.
Continue readingCross referencing across LaTeX documents in one project
A common scenario we come across is that we have a main manuscript document and a supplementary information document, each of which have their own sections, tables and figures. The question then becomes – how do we effectively cross-reference between the documents without having to tediously count all the numbers ourselves every time we make a change and recompile the documents?
The answer: cross referencing!
Continue readingUsing Jupyter on a remote slurm cluster
Suppose we need to do some interactive analysis in a Jupyter notebook, but our local machine lacks the power. We have access to a slurm cluster, but we can’t SSH from the head node to the worker node; we can only SSH from the worker node to the head node. Can we still interact with a Jupyter notebook running on the worker node? As it happens, the answer is “yes” – we just need to do some reverse SSH tunnelling.
Continue readingNavigating Hallucinations in Large Language Models: A Simple Guide
AI is moving fast, and large language models (LLMs) are at the centre of it all, doing everything from generating coherent, human-like text to tackling complex coding challenges. And this is just scratching the surface—LLMs are popping up everywhere, and their list of talents keeps growing by the day.
However, these models aren’t infallible. One of their most intriguing and concerning quirks is the phenomenon known as “hallucination” – instances where the AI confidently produces information that is fabricated or factually incorrect. As we increasingly rely on AI-powered systems in our daily lives, understanding what hallucinations are is crucial. This post briefly explores LLM hallucinations, exploring what they are, why they occur, and how we can navigate them and get the most out of our new favourite tools.
Continue readingTesting python (or any!) command line applications
Through our work in OPIG, many of our projects come in the form of code bases written in Python. These can be many different things like databases, machine learning models, and other software tools. Often, the user interface for these tools is developed as both a web app and a command line application. Here, I will discuss one of my favourite tools for testing command-line applications: prysk!
Continue readingThe XChem trove of protein–small-molecules structures not in the PDB
The XChem facility at Diamond Light Source is truly impressive feat of automation in fragment-based drug discovery, where visitors comes clutching a styrofoam ice box teeming with apo-form protein crystals, which the shifter soaks with compounds from one or more fragment libraries and a robot at the i04-1 beamline kindly processes each of the thousands of crystal-laden pins, while the visitor enjoys the excellent food in the Diamond canteen (R22). I would especially recommend the jambalaya. Following data collection, the magic of data processing happens: the PanDDA method is used to find partial occupancy in the density, which is processed semi-automatedly and most open targets are uploaded in the Fragalysis web app allowing the ligand binding to be studied and further compounds elaborated. This collection of targets bound to hundreds of small molecules is a true treasure trove of data as many have yet to be deposited in the PDB, making it a perfect test set for algorithm design: fragments are notorious fickle to model and deep learning models cannot cheat by remembering these from the protein database.
Continue readingMDAnalysis: Work with dynamics trajectories of proteins
For a long time crystallographers and subsequently the authors of AlphaFold2 had you believe that proteins are a static group of atoms written to a .pdb file. Turns out this was a HOAX. If you don’t want to miss out on the latest trend of working with dynamic structural ensembles of proteins this blog post is exactly right for you. MDAnalysis is a python package which as the name says was designed to analyse molecular dyanmics simulation and lets you work with trajectories of protein structures easily.
Continue readingMaking your code pip installable
aka when to use a CutomBuildCommand or a CustomInstallCommand when building python packages with setup.py
Bioinformatics software is complicated, and often a little bit messy. Recently I found myself wading through a python package building quagmire and thought I could share something I learnt about when to use a custom build command and when to use a custom install command. I have also provided some information about how to copy executables to your package installation bin. **ChatGPT wrote the initial skeleton draft of this post, and I have corrected and edited.
Next time you need to create a pip installable package yourself, hopefully this can save you some time!
Continue readingConverting or renaming files, whilst still maintaining the directory structure
For various reasons we might need to convert files from one format to another, for instance from lossless FLAC to MP3. For example:
ffmpeg -i lossless-audio.flac -acodec libmp3lame -ab 128k compressed-audio.mp3
This could be any conversion, but it implies that the input file and the output file are in the same directory. What if we have a carefully curated directory structure and we want to convert (or rename) every file within that structure?
find . -name “*.whateveryouneed” -exec somecommand {} \; is the tool for you.
Continue reading