Category Archives: Journal Club

How to write a review paper as a first year PhD student

As a first-year PhD student, it is not uncommon to be asked to write a review paper on your subject area. It is a great way both to get acquainted with your research field and to get the background portion of your thesis completed early. However, it can seem daunting to go from knowing almost nothing about your research field to producing something of interest to experts who have spent years studying your subject matter.

In my first year, I was in exactly this position, and I found very little online to help guide the process. So here is my reflective look at writing a review paper, which will hopefully help someone else in the future.


The evolution, evolvability and engineering of gene regulatory DNA

Catching up on the literature is one of the highlights of my job as a scientist. True, you can sometimes be overwhelmed by how much you don’t know, or wonder whether we really need another paper showing that protein-ligand scoring functions don’t work. And yet, sometimes you find excellent research that you can’t help but regard with a mixture of awe and envy. At a recent group meeting, I discussed one such paper from the research group of Aviv Regev at MIT, in which the authors combine computation and experiment impressively to address some basic questions in gene regulation and evolution. Here is why I think it’s excellent.

The authors are interested in promoters, short DNA sequences that precede genes and regulate how frequently those genes are expressed. In short, promoters are binding sites for transcription factors, a family of proteins that in turn recruit RNA polymerase to transcribe DNA into RNA. The rate of gene transcription then determines, albeit indirectly, the rate at which a protein is produced. This may sound simple, but it is also roughly where our understanding stops. The human genome encodes some 1,600 different transcription factors (~6-7% of protein-coding genes), and their inner workings are still not well understood.


A handful of lesser-known Python libraries

There are more Python libraries than you can shake a stick at, but here are a handful that don’t get much love and may save you some brain power, compute time or both.

Fire is a library that turns your normal Python functions into command-line utilities with no more than a couple of additional lines of copy-and-paste code. Being able to access your functions immediately from the command line is amazingly helpful when you’re making quick and dirty utilities, and it saves you from reaching for the nuclear option of getopt.
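As a rough illustration of how little code this takes, here is a minimal sketch. The `greet` function, its arguments and the `main` wrapper are hypothetical names made up for this example; only the `fire.Fire(...)` call comes from the library itself, which needs to be installed separately (`pip install fire`).

```python
def greet(name="World", shout=False):
    """Return a greeting. Fire exposes the arguments as --name and --shout."""
    message = f"Hello, {name}!"
    return message.upper() if shout else message


def main():
    # Imported lazily so that greet() stays usable even without Fire installed.
    import fire  # third-party package: pip install fire

    # Fire inspects greet's signature and builds the command-line interface.
    fire.Fire(greet)
```

To use it as a script, call `main()` under the usual `if __name__ == "__main__":` guard. Assuming the file is saved as `greet.py` (again, a made-up name), something like `python greet.py --name=Ada --shout` should then call `greet` with those arguments and print whatever it returns.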


The Smallest Allosteric System

Allostery is still a poorly understood but very general mechanism in the protein world. In principle, an allosteric event occurs when a ligand (small or big) binds to a certain site of a protein and something (activity or function) changes at a different, distant site. A well-known example is G-protein-coupled receptors, which transmit such an allosteric signal even across a membrane. But the two sites do not have to be that far apart. As part of the Protein Folding and Dynamics series, I recently watched a talk by Peter Hamm (Zurich), who presented work on an allosteric system that I thought was very interesting because it was small and, most importantly, controllable.

PDZ domains are peptide-binding domains, often part of multi-domain proteins. For the work presented, the researchers used the PDZ3 domain, which is a bit special: it has an additional (third) C-terminal α-helix (α3-helix) that packs against the side of the domain opposite the binding pocket. Previous work (Petit et al. 2009) had shown that removing the α3-helix changed ligand affinity but not the PDZ structure; the major changes were instead entropic in nature. Peter Hamm’s group linked an azobenzene-derived photoswitch to that α3-helix: in its cis configuration the photoswitch stabilises the α3-helix, and in trans it destabilises it (see Figure 1).

Figure 1: PDZ3 domain (purple) and photoswitch (red) have different affinities for the peptide ligand (green), depending on the photoswitch’s isomerisation state (and temperature). From Bozovic, O., Jankovic, B. & Hamm, P. Sensing the allosteric force. Nat Commun 11, 5841 (2020). https://doi.org/10.1038/s41467-020-19689-7

C is for Cysteines (plus a fun quiz)

At group meeting a few weeks ago I presented the paper “Landscape of Non-canonical Cysteines in Human VH Repertoire Revealed by Immunogenetic Analysis” from Prabakaran and Chowdhury. The paper investigates the frequency, location and patterns of cysteines in human antibody sequences. Cysteines are important amino acids found in proteins, including antibodies; thanks to the reactive sulfhydryl group in their side chain, they can form disulphide bonds with other cysteines.


Learning from Biased Datasets

The beauty and the downfall of learning-based methods are one and the same: the data used for training largely determines the quality of any model or system.

While there have been numerous algorithmic advances in recent years, the most successful applications of machine learning have been in areas where either (i) you can generate your own data in a fully understood environment (e.g. AlphaGo/AlphaZero), or (ii) data is so abundant that you’re essentially training on “everything” (e.g. GPT-2/3, CNNs trained on ImageNet).

These two categories cover only a narrow range of applications, and most data falls into neither. Unfortunately, when that is the case (and sometimes even when you are in one of those rare cases), your data is almost certainly biased; you just may or may not know it.
