CCK-18 is Going Virtual

We are going virtual! Our next Comp Chem Kitchen, CCK-18, will be held as a Zoom webinar on Friday, March 27, 2020, from 5 to 6 pm. We are delighted to announce that Prof. Andreas Bender from the University of Cambridge will be speaking, as well as Dr Vicky Hellon from F1000 Research. To attend the CCK-18 webinar, you must sign up for a free Eventbrite ticket (limit 100).

Converting Miles to Kilometres – An inefficient but neat method

Picture this: You’re a zealous acolyte of the metric system, with a rare affliction that makes multiplying decimal numbers impossible. You’re on holiday in the UK, where road signs give distances in miles. Heathens! How can you efficiently estimate the number of kilometres without multiplying by approximately 1.60934?

Continue reading

Visualisation of very large high-dimensional data sets as minimum spanning trees

Large high-dimensional data sets are frequently used in the chemical and biological sciences. For example, the ChEMBL database contains millions of bioactive molecules from the scientific literature, and their associated biological assay data are widely used for drug discovery. Visualising such databases helps us understand the structure of the data.

Continue reading

Bayesian Optimization and Correlated Torsion Angles—in Small Molecules

Our collaborator, Prof. Geoff Hutchison from the University of Pittsburgh, recently took part in the Royal Society of Chemistry’s 2020 Twitter Poster Conference to highlight the great work carried out by one of my DPhil students, Lucian Leung Chan, on the application of Bayesian optimization to conformer generation:

Lightning-fast Python code

Scientific code is never fast enough. We need the results of that simulation before that pressing deadline, or that meeting with our advisor. Computational resources are scarce, and competition for a spot in the computing nodes (cough, cough) can be tiresome. We need to squeeze every ounce of performance. And we need to do it with as little effort as possible.

Continue reading

Coronavirus

A zoonosis is an infectious disease that has jumped from a non-human animal to humans.

A painting by David S. Goodsell showing coronavirus in pink and purple. Secreted mucus (greenish threads), antibodies (yellow/orange Y-shapes), and several small immune system proteins (orange) from the lungs’ respiratory cells surround it. © 2020, David S. Goodsell.

The coronavirus disease 2019 (COVID-19) is one such zoonosis, and is caused by severe acute respiratory syndrome coronavirus 2 (SARS coronavirus 2, SARS-CoV-2, or 2019-nCoV). This virus is very similar to the SARS virus that emerged in 2003. Its recent emergence has resulted in a WHO-declared public health emergency of international concern.

Continue reading

Considering Containers? – Go for Singularity

Docker is an excellent containerisation system ideally suited to production servers. It allows you to do one small thing but do it well. For example, you can break a large blog up into individually maintained containers for a web server, a database and (say) a WordPress instance. However, due to inherent security woes, Docker doesn’t play nicely with multi-tenanted machines, the kind that are the bread and butter of researchers and HPC users. That’s where Singularity steps in.

Continue reading

Molecular dynamics analysis in MDAnalysis

Any opportunity to use rigorously tested and supported analysis tools rather than in-house code is, in my opinion, an opportunity you owe it to yourself to explore.

My preferred tool for analyzing the output of molecular dynamics (MD) simulations is MDAnalysis, a Python library that provides robust and easy-to-use tools for analyzing the most common file formats output by MD packages (including PDB, DCD, COR, and XTC). But, of course, MDAnalysis can analyze any PDB file, not just one output by an MD simulation. There may be an opportunity in your workflow to incorporate MDAnalysis to save time or to provide more robust error handling than whatever in-house code you currently use.

Continue reading

State of the art in AI for drug discovery: more wet-lab please

The medchem community has been rather skeptical of ML approaches for the drug discovery pipeline, especially those focused on hit-to-lead optimization. One of the main reasons is the way many ML publications benchmark their models: historic datasets are split into two parts, with the larger part used to train and the smaller to test ML models. To standardize that validation process, computational chemists have constructed widely used benchmark datasets such as the DUD-E set, a common standard for protein-ligand binding classification tasks. Common criticism from medicinal chemists centers on the main problem associated with benchmark datasets: the absence of direct lab validation.

Continue reading

Using SLURM a little bit more efficiently

Your research group slurmified their servers? You basically have two options now.

Either you install all your necessary things on one of the slurm nodes within an interactive session, e.g.:

srun -p funkyserver-debug --pty --nodes=1 --ntasks-per-node=1 -t 00:10:00 --wait=0 /bin/bash

and always specify this node by adding the ‘#SBATCH --nodelist=funkyserver.cpu.do.work’ line to your sbatch scripts, or you set up some template scripts that help you install all your requirements on multiple nodes, so you can enjoy the benefits of the slurm system.

Here is how I did it; comments and suggestions welcome!

Step 1: Create an sbatch template file (e.g. sbatch_job_on_server.template_sh) on the submission node that does what you want. In the ‘#SBATCH --partition’ or ‘--nodelist’ lines use a placeholder, e.g. ‘<server>’, instead of funkyserver.
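The placeholder idea in Step 1 can be sketched in a few lines of Python. This is only an illustration under assumptions, not the scripts used in this post: the helper names (`render_template`, `write_job_scripts`), the output file naming and the second server name `groovyserver` are all hypothetical, and the example template simply reuses the partition and node naming pattern from the commands above.

```python
from pathlib import Path

# Placeholder string used in the template, as suggested in Step 1.
PLACEHOLDER = "<server>"

# Hypothetical template contents; the partition/nodelist naming pattern
# is an assumption based on the funkyserver example above.
EXAMPLE_TEMPLATE = """#!/bin/bash
#SBATCH --partition=<server>-debug
#SBATCH --nodelist=<server>.cpu.do.work
#SBATCH --time=00:10:00
echo "running on $(hostname)"
"""


def render_template(template_text, server):
    """Return the template with every placeholder replaced by `server`."""
    return template_text.replace(PLACEHOLDER, server)


def write_job_scripts(template_text, servers, out_dir):
    """Write one filled-in sbatch script per server and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for server in servers:
        # Hypothetical naming scheme: one script per server.
        path = out_dir / f"sbatch_job_on_{server}.sh"
        path.write_text(render_template(template_text, server))
        paths.append(path)
    return paths
```

Calling `write_job_scripts(EXAMPLE_TEMPLATE, ["funkyserver", "groovyserver"], "jobs")` would leave one script per server in `jobs/`, each of which could then be submitted with `sbatch`. Keeping the rendering step separate from the file writing makes it easy to inspect a filled-in script before anything is submitted.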

For example, for installing the same conda environment on all nodes that you want to work on:

Continue reading