An recently wrote a post on how to use the seaborn library. I really like seaborn and use it a lot for 2D plots. However, I have recently been working with 3D data and have found plotly to be the best tool for the job. When used in a Jupyter notebook, it lets you easily generate interactive 3D plots, which is extremely useful for visualizing structural data.
Monthly Archives: January 2021
Ribosome occupancy profiles are conserved between structurally and evolutionarily related yeast domains
A shameless plug: OPIG blog readers may want to take a look at our recent publication in Bioinformatics. Consider giving it a read if the summary below grabs your attention.
Many proteins are now known to fold during their synthesis through the process known as co-translational folding. Translation is an inherently non-equilibrium process – one consequence of this fact is that the speed of translation can radically influence the ability of proteins to fold and function. In this paper we compare ribosome occupancy profiles between related domains in yeast to test the hypothesis that evolutionarily related proteins with similar native folds should tend to have similar translation speed profiles to preserve efficient co-translational folding. We find strong evidence in support of this hypothesis at the level of individual protein domains and across a set of 664 pairs of related domains for which we are able to compute high-quality ribosome occupancy profiles.
To find out more, view the Advance Article at Bioinformatics.
Seaborn 101
Seaborn is a Python data visualization library built on matplotlib (https://seaborn.pydata.org/). I would like to share some guidance and code to help you get started with drawing plots using this library! I will be using the ‘flights’ dataset from Seaborn (https://github.com/mwaskom/seaborn-data) as an example.
Calculating symmetrised small molecule RMSDs using graph automorphisms in Python with GEMMI and NetworkX
When a ring flips, how do we calculate RMSD?
This seemingly simple question leads to a very interesting problem! If we take a benzene molecule, say, and rotate it by 180 degrees, we end up with exactly the same molecule. But if our data structure labels each atom and we apply the same transformation to the atomic positions, the numbering does not reflect that symmetry. If we were then to calculate the RMSD naively, it would be huge, even though the two molecules are, chemically speaking, identical.
How can we make our RMSD calculations reflect these symmetries?
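One way to answer this, sketched under a few assumptions: the molecule is a NetworkX graph whose nodes are numbered 0..N-1 in the same order as the coordinate rows and carry an `element` attribute, and the two poses are already in the same reference frame (no superposition is performed, as is common when comparing a docked pose with a crystal pose). Every automorphism of the graph, enumerated here with NetworkX's `GraphMatcher`, is a relabelling of atoms that preserves elements and bonds, so the symmetry-corrected RMSD is the minimum RMSD over all of them. This is not necessarily how the full post does it (GEMMI is left out entirely), and `symmetry_corrected_rmsd` is a hypothetical name.

```python
import networkx as nx
import numpy as np
from networkx.algorithms.isomorphism import GraphMatcher


def rmsd(coords_a, coords_b):
    """Plain RMSD between two (N, 3) arrays whose rows are in matching order."""
    diff = np.asarray(coords_a, dtype=float) - np.asarray(coords_b, dtype=float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def symmetry_corrected_rmsd(coords_a, coords_b, graph):
    """Minimum RMSD over all element-preserving automorphisms of `graph`."""
    coords_b = np.asarray(coords_b, dtype=float)
    matcher = GraphMatcher(
        graph, graph, node_match=lambda n1, n2: n1["element"] == n2["element"]
    )
    best = np.inf
    for mapping in matcher.isomorphisms_iter():
        # mapping[i] = j: atom i of pose A is compared with atom j of pose B
        order = [mapping[i] for i in range(len(coords_b))]
        best = min(best, rmsd(coords_a, coords_b[order]))
    return best


# Benzene ring carbons only (hydrogens omitted) as a 6-cycle, C-C bond ~1.39 Å
ring = nx.cycle_graph(6)
nx.set_node_attributes(ring, "C", "element")
angles = np.deg2rad(np.arange(6) * 60.0)
pose_a = np.column_stack([1.39 * np.cos(angles), 1.39 * np.sin(angles), np.zeros(6)])

# Rotate by 60 degrees about z but keep the atom labels: a chemically identical pose
c, s = np.cos(np.pi / 3), np.sin(np.pi / 3)
rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
pose_b = pose_a @ rot.T

naive = rmsd(pose_a, pose_b)  # large, because the labels moved with the atoms
corrected = symmetry_corrected_rmsd(pose_a, pose_b, ring)  # essentially zero
```

The number of automorphisms can grow quickly for molecules with many equivalent groups (for example explicit hydrogens on several methyls), so in practice it may help to strip hydrogens before enumerating them.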
Tracking machine learning projects with Weights & Biases
Optimising machine learning models requires extensive comparison of architectures and hyperparameter combinations. There are many frameworks that make it easier to log and visualise performance metrics across model runs. I recently started using Weights & Biases. Below, I give a brief overview of some basic code snippets to add to your machine learning Python code to get started with this tool.
Miniproteins – small but mighty!
Proteins come in all shapes and sizes, ranging from thousands of amino acids in length to fewer than 20. However, smaller size does not mean reduced importance. Miniproteins, commonly defined as proteins fewer than 100 amino acids long, are receiving increasing attention for their potential as pharmaceuticals. A recent paper by David Baker’s group put miniproteins in the spotlight: the authors designed miniproteins that bind the SARS-CoV-2 spike protein as strongly as an antibody would, at a tiny fraction of the size (Cao et al., 2020). These miniproteins are much cheaper to manufacture than antibodies (as they can be expressed in bacteria) and can be highly stable (melting temperatures above 90 °C are possible, meaning they can easily be stored at room temperature). The most promising miniprotein developed by the Baker group (LCB1) is currently being tested as a prophylactic nasal spray to protect against SARS-CoV-2 infection. These promising results, and the speed with which progress was made, bring the vast potential of miniproteins in healthcare to the fore.
Making Pretty Pictures with PyMOL
There are few things I like more in our field than the opportunity to make a really nice image of a protein structure. Don’t judge me, but I’ve been known to spend the occasional evening in front of the TV with a cup of tea and PyMOL open in front of me! I’ve presented on the subject at a couple of our research group retreats and have wanted to write it up as a blog post for a while. This is the last opportunity I will have, since I will be leaving in just a few weeks’ time after nearly eight years (!) as an OPIGlet. So, here goes: my tips and tricks for making pretty pictures with PyMOL!
Ray Tracing
set ray_trace_mode, number
I always ray trace my images to make them higher quality. It can take a while for large proteins, but it’s always worth it! My favourite setting is 1, but 3 can be fun to make things a bit more cartoon-ish.
You can also improve the quality of the image by increasing the ‘surface_quality’ and ‘cartoon_sampling’ settings.
Fragment-to-Lead Successes in 2019
In this blogpost, I want to highlight the excellent work by Jahnke and collaborators. For the past 5 years, they have published an annual perspective covering fragment-to-lead success stories from the previous year. Very helpfully, their work includes a table detailing the hit fragment(s) and lead molecule, together with key experimental results and parameters.
AIRR Community Meeting V – December 2020
We attended the virtual Adaptive Immune Receptor Repertoire (AIRR) Community Meeting in early December. The three-day conference, usually held every 18 months, covered a range of research talks, software demonstrations and poster presentations on the latest TCR and BCR (antibody) research. While we missed certain elements of the last AIRR Community Meeting (namely focaccia), it was a really interesting meeting, and the technology all ran very smoothly.
Given our current research on SARS-CoV-2 antibodies, we particularly enjoyed the work presented by Armita Nourmohammad from the University of Washington on “Dynamics of BCR in Covid”, based on the preprint on medRxiv. The research identified 34 significantly expanded rare clonal lineages shared among patients with SARS-CoV-2, which are potential candidates for involvement in the COVID-19 immune response. In particular, the analysis assesses whether an antibody sequence identified in different individuals (known as a shared or public sequence) is likely to be found simply because of inherent biases in antibody recombination. Shared sequences that are calculated to be unlikely to be shared by chance may instead reflect a response to a common exposure such as SARS-CoV-2, rather than turning up randomly in the antibody repertoire. In this way, Nourmohammad and colleagues found ‘rare’ antibodies that were present in more individuals than would statistically be expected, and which might therefore be worthy of further experimental analysis.
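To make the idea of "more individuals than would statistically be expected" concrete, here is a deliberately simplified null model, not the method used in the preprint: assume a sequence with generation probability `p_gen` is drawn independently `M` times per repertoire, so it appears in a given individual with probability q = 1 - (1 - p_gen)^M, and the number of individuals (out of n) who carry it follows Binomial(n, q). A small upper-tail probability flags a sequence that is shared more often than chance alone would suggest. The function names and numbers are illustrative.

```python
import math


def prob_in_repertoire(p_gen: float, repertoire_size: int) -> float:
    """P(sequence generated at least once in `repertoire_size` independent draws)."""
    # 1 - (1 - p_gen)**M, computed stably for very small p_gen
    return -math.expm1(repertoire_size * math.log1p(-p_gen))


def sharing_p_value(p_gen: float, repertoire_size: int,
                    n_individuals: int, n_observed: int) -> float:
    """Upper tail P(X >= n_observed) for X ~ Binomial(n_individuals, q)."""
    q = prob_in_repertoire(p_gen, repertoire_size)
    return sum(
        math.comb(n_individuals, k) * q**k * (1 - q) ** (n_individuals - k)
        for k in range(n_observed, n_individuals + 1)
    )


# A rare sequence seen in 3 of 10 individuals is very surprising under this model,
# whereas a frequently generated one is expected to be widely shared.
rare = sharing_p_value(p_gen=1e-12, repertoire_size=100_000, n_individuals=10, n_observed=3)
common = sharing_p_value(p_gen=1e-4, repertoire_size=100_000, n_individuals=10, n_observed=3)
```

Real analyses have to estimate `p_gen` from a model of V(D)J recombination and account for sampling depth, which this sketch takes as given.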
A theme common to a short talk and poster by Hadas Neuman (Bar-Ilan) and a poster by Kenneth Hoehn (Yale) was class-switching dynamics revealed by phylogenetic inference (from IgM to IgA in the human gut in the former, and to IgE and IgG4 in a paediatric patient with peanut allergy in the latter). Kenneth Hoehn’s poster also looked at B-cell differentiation during HIV infection; all of this can be read about in this preprint. The methods developed in the paper for discrete trait analysis of differentiation, isotype switching and B-cell migration are implemented in the new R package dowser (https://bitbucket.org/kleinstein/dowser), which is part of the Immcantation suite (http://immcantation.org).
It was also really nice to see evidence of the burgeoning use of single-cell sequencing for immune repertoire profiling, with posters by Igor Snapkov (UiO), Indu Khatri (Leiden University Medical Centre) and Nick Borcherding (Washington University in St. Louis) all using single-cell technologies, and a talk by Ivelin Georgiev on LIBRA-seq.
If you missed the conference and your interest has been piqued, some of the conference talks are available on the AIRRC YouTube channel.
We look forward to AIRRC6, Dec 7 – 11, 2021!
Sarah and Eve