Category Archives: How To

How to do things. doh.

How ChatGPT changed my writing as an ESL speaker

It’s not always easy to live in an Anglophone scientific world when English isn’t your first language. When careers are built upon the ability to communicate ideas clearly and eloquently, struggling to find the right words can be a real hindrance to explain your science in a way that is taken seriously. Contrary to popular belief, it’s not something you can simply “work” on. Often, it doesn’t matter how many books you’ve read, how many years of education you have, or how articulate you are in your original language — your brain will refuse to summon the right expression, or get stuck in a construction that a native speaker would never use. Struggling with a second language is very much a biological phenomenon.

The standard recommendation for ESL (English as a Second Language) speakers has long been to ask a native colleague to read through any text that needs to be published or submitted somewhere (such as an article or a grant application). Well-intentioned as this advice may be, there are multiple problems with it. Lingua franca or not, only 15% of the world population speaks English, of which only 5% are native speakers — meaning that for most scientists not working in Anglophone countries, the option is rarely available. Even when available, it is unreasonable to expect these colleagues to add charitable proof-reading to their workload simply because they happened to be born speaking a different language. But, most importantly, I have always felt — and I want to emphasize that I truly believe most people who issue this kind of advice to be well-intentioned — that the underliying message sounds too much like “you need vetting by a member of our select linguistic club if you want your ideas to be taken seriously“.

Continue reading →

BRICS Decomposition in 3D

Inspired by this blog post by the lovely Kate, I’ve been doing some BRICS decomposing of molecules myself. Like the structure-based goblin that I am, though, I’ve been applying it to 3D structures of molecules, rather than using the smiles approach she detailed. I thought it may be helpful to share the code snippets I’ve been using for this: unsurprisingly, it can also be done with RDKit!

I’ll use the same example as in the original blog post, propranolol.

*1DY4: CBH1 IN COMPLEX WITH S-PROPRANOLOL*

First, I import RDKit and load the ligand in question:

Continue reading →

The ultimate modulefile for conda

Environment modules is a great tool for high-performance computing as it is a modular system to quickly and painlessly enable preset configurations of environment variables, for example a user may be provided with modulefile for an antiquated version of a tool and a bleeding-edge alpha version of that same tool and they can easily load whichever they wish. In many clusters the modules are created with a tool called EasyBuild, which delivered an out-of-the-box installation. This works for things like a single binary, but for conda this severely falls short as there are many many configuration changes needed.

Continue reading →

LaTeX Beamer Template with Logos

Alternative Title: The tragic story of how I got trapped making slides with latex.

Typically after giving a presentation at least one person will approach me and ask if they could have access to my custom latex template to make slides with beamer that don’t look rubbish.

TL;DR Yes you can: https://github.com/npqst/latex-beamer-template

Continue reading →

Creating a Personal Website

Personal websites are a great and increasingly important way to build your online presence. Along with professional social media pages, such as on LinkedIn and Twitter, a website can provide a boost to your career and/or job search.

This blog post is based on my recent experience creating a personal website, following guidelines from Lewis’ talk at the OPIG Retreat last year (thank you Lewis!). The method I used and will cover here, based on an HTML5 UP! template and GitHub pages, is free and fast.

Why have a personal website?

Improves your online presence and brand
Boost for your career, including by allowing potential future employers to find you
Share things you have accomplished or are interested in

Continue reading →

Cleaning outliers in conductance timeseries from molecular dynamics

Have you ever had an annoying dataset that looks something like this?

or even worse, just several of them

In this blog post, I will introduce basic techniques you can use and implement with Python to identify and clean outliers. The objective will be to get something more eye-pleasing (and mostly less troublesome for further data analysis) like this

Continue reading →

Unreasonably faster notes, with command-line fuzzy search

A good note system should act like a second brain:

Accessible in seconds
Adding information should be frictionless
Searching should be exhaustive – if it’s there, you must find it

The benefits of such a note system are immense – never forget anything again! Search, perform the magic ritual of Copy Paste, and rejoice in the wisdom of your tried and tested past.

But how? Through the unreasonable effectiveness of interactive fuzzy search. This is how I have used Fuz, a terminal-based file fuzzy finder, for about 4 years.

Briefly, Fuz extracts all text within a directory using ripgrep, enables interactive fuzzy search with FZF, and returns you the selected item. As you type, the search results get narrowed down to a few matches. Files are opened at the exact line you found. And it’s FAST – 100,000 lines in half a second fast.

Using *Fuz* to quickly add a code-snippet in our note directory – then retrieving it with fuzzy-search. Here, on how to read FASTA files with Biopython, conveniently added to a file called biopython.py.

Continue reading →

Naga101: A Guide to Getting Started with (OPIG) Slurm Servers

Over the past months, I’ve been working with a few new members of OPIG, which left me answering (and asking) lots of questions about working with Slurm. In this blog post, I will try to cover key, practical basics to interacting with servers that are set up on Slurm.

Slurm is a workload manager or job scheduler for Linux, meaning that it helps with allocating resources (eg CPUs and GPUs) on a server to users’ jobs.

To note, all of the commands and files shown here are run from a so-called ‘head’ node, from which you access Slurm servers.

1. Entering an interactive session

Unlike many other servers, you cannot access a Slurm server via ‘ssh’. Instead, you can enter an interactive (or ‘debug’) session – which, in OPIG, is limited to 30 minutes – via the srun command. This is incredibly useful for copying files, setting up environments and checking that your code runs.

srun -p servername-debug --pty --nodes=1 --ntasks-per-node=1 -t 00:30:00 --wait=0 /bin/bash

2. Submitting jobs

While the srun command is easy and helpful, many of the jobs we want to run on a server will take longer than the debug queue time limit. You can submit a job, which can then run for a longer (although typically still capped) time but is not interactive, via sbatch.

Continue reading →

Code your own molecule sketcher in 4 easy steps!

Drawing molecules on your laptop usually requires access to proprietary software such as ChemDraw (link) or free websites such as PubChem’s online sketcher (link). However, if you are feeling adventurous, you can build your personal sketcher in React/Typescript using the Ketcher package!

Ketcher is an open-source package that allows easy implementation of a molecule sketcher into a web application. Unfortunately, it does require TypeScript so the script to run it cannot be imported directly into an HTML page. Therefore we will set up a simple React app to get it working.

The sketcher is very sleek and has a vast array of functionality, such as choosing any atom from the periodic table and being able to directly import molecules from either SMILES or Mol2/SDF file format into the sketcher. These molecules can then be edited and saved to a new file in the chemical file format of your choosing.

Continue reading →

How to build a Python dictionary of residues for each molecule in PyMOL

Sometimes it can be handy to work with multiple structures in PyMOL using Python.

Here’s a snippet of code you might find useful: we iterate over all the α-carbon atoms in a protein and append to a list tuples such as (‘GLY’, 1). The dictionary, ‘reslist’, returns a list of residue names and indices for each molecule, where the key is a string containing the name of the molecule.

from pymol import cmd

# Create a list of all the objects, called 'mpls':
mols = cmd.get_object_list('*')

# Create an empty dictionary that will return a list of residues
# given the name of the molecule object
reslist = {}

# Set the dictionaries to be empty lists
for m in mols:  reslist[m] = []

# Use PyMOL's iterate command to go over every α-Carbon and append 
# a tuple consisting of the each residue's residue name ('resn') and
# residue index ('resi '):
for m in mols:  cmd.iterate('%s and n. ca'%m, 'reslist["%s"].append((resn,int(resi)))'%m)

This script assumes you only have protein molecules loaded, and ignores things like chain ID and insertion codes.

Once you have your list of residues, you can use it with the cmd.align command, e.g., to align a particular residue to a reference structure.

Oxford Protein Informatics Group

or "OPIG" to friends

Category Archives: How To

How ChatGPT changed my writing as an ESL speaker

BRICS Decomposition in 3D

The ultimate modulefile for conda

LaTeX Beamer Template with Logos

Alternative Title: The tragic story of how I got trapped making slides with latex.

Creating a Personal Website

Cleaning outliers in conductance timeseries from molecular dynamics

Unreasonably faster notes, with command-line fuzzy search

Naga101: A Guide to Getting Started with (OPIG) Slurm Servers

Code your own molecule sketcher in 4 easy steps!

How to build a Python dictionary of residues for each molecule in PyMOL