Category Archives: Commentary

Why you should care about startups as a researcher

I was recently awarded the EIT Health Translational Fellowship, which funds DPhil projects with the goal of commercializing the research, addressing the funding gap between academic research and seed funding. To win, I had to deliver a five-minute startup pitch in front of a panel of investors and scientific experts to convince them that my DPhil project has impact as well as commercial viability. Besides the £5,000 prize, the fellowship included a week-long training course on how to improve your pitch, address pain points in your business strategy, and so on. I found the whole experience incredibly rewarding, and the skills I picked up are valuable even as a researcher. In summary, this is why I think you should care about the startup world as a researcher.


Chat bots and the Turing test

When I recently tested the voice activation features on my phone, I was extremely impressed with how well it understood not only the actual words I was saying, but also the context. The last time I had used voice control was years ago, when the technology was still in its infancy: the voice recognition software understood only a specific set of commands, most of them hard-coded. Given the impressive advances machine learning has brought to voice recognition and natural language processing, to the point where I can say, “Hey Google, can you give me a list of the best BBQ restaurants near me?” and my phone will actually understand and oblige, it is interesting that we still struggle with a language-based technology that has been around for ages: chatbots.


A new way of eating too much

Fresh off the pages of Therapeutic Advances in Endocrinology and Metabolism comes a warning no self-respecting sweet tooth should ignore.

“Liquorice is not just a candy,” write a team of ten from Chicago. “Life-threatening complications can occur with excess use.” Hold on to your teabags. Liquorice – the Marmite of sweets – is about to become a lot more sinister.


Oh lord, they comin’ – a diversity of units

For scientists, units are like money: a few people obsess about them, but the less you have to think about units, the better. And, like switching a bank account, changing your units is usually tiresome and complicated for little real advantage. But spare a thought for the many units that have been lost to the inexorable march of scientific advance, and for the few that are still in regular use.


Getting home: An ordeal of flight cancellations (& what they actually cost)

Last week, a sizeable flock of OPIGlets went to ISMB in Basel. Also last week, a storm and a radar tower problem over London Gatwick (LGW) and London Heathrow (LHR) led to four of those OPIGlets being stranded in Switzerland. This is a (somewhat accurate) timeline of their ordeal:


On the Virtues of the Command Line

Wind the clock back about 50 years, and you would have found the DSKY interface, with its display (DS) and keyboard (KY), quite familiar. It was the frontend to the guidance computer used on the Apollo missions, the machine that ultimately allowed Neil Armstrong to utter that celebrated “One small step for [a] man, one giant leap for mankind.” The device was, in effect, a command line.


Seeing the Mesoscale

There’s a range of scales that is really hard for us to see. Techniques like X-ray crystallography and, increasingly, cryo-electron microscopy let us see molecules in atomic detail. Microscopes reveal organelles in cells, but seeing the molecular ‘trees’ in the cellular ‘forest’ requires a synthesis of knowledge. David Goodsell was one of the first to show us the emergent beauty of the cell at the molecular level, and work carried out in the Molecular Graphics Laboratory at The Scripps Research Institute under the direction of Art Olson has led to 3D molecular modeling tools like ePMV, autoPACK and cellPACK.

One of the fruits of this labor is the Visual Guide to the Cell, part of the Allen Cell Explorer. It’s well worth a look: you can explore 3D representations of the cell in a web browser.

The Protein World

This week’s issue of Nature has a wonderful “Insight” supplement titled, “The Protein World” (Vol. 537 No. 7620, pp 319-355). It begins with an editorial from Joshua Finkelstein, Alex Eccleston & Sadaf Shadan (Nature, 537: 319, doi:10.1038/537319a), and introduces four reviews, covering:

  • the computational de novo design of proteins that spontaneously fold and assemble into desired shapes (“The coming of age of de novo protein design“, by Po-Ssu Huang, Scott E. Boyken & David Baker, Nature, 537: 320–327, doi:10.1038/nature19946). Baker et al. point out that much of protein engineering until now has involved modifying naturally-occurring proteins, but assert, “it should now be possible to design new functional proteins from the ground up to tackle current challenges in biomedicine and nanotechnology”;
  • how the cellular proteome, a dynamic structural and regulatory network that constantly adapts to the needs of the cell, can become imbalanced through genetic alterations, ranging from chromosome imbalance to oncogene activation, that change the speed, fidelity and capacity of protein biogenesis and degradation systems. Understanding these complex systems can help us to develop better ways to treat diseases such as cancer (“Proteome complexity and the forces that drive proteome imbalance“, by J. Wade Harper & Eric J. Bennett, Nature, 537: 328–338, doi:10.1038/nature19947);
  • the new challenger to X-ray crystallography, the workhorse of structural biology: cryo-EM. Cryo-electron microscopy has undergone a renaissance in the last 5 years thanks to new detector technologies, and is starting to give us high-resolution structures and new insights about processes in the cell that are just not possible using other techniques (“Unravelling biological macromolecules with cryo-electron microscopy“, by Rafael Fernandez-Leiro & Sjors H. W. Scheres, Nature, 537: 339–346, doi:10.1038/nature19948); and
  • the growing role of mass spectrometry in unveiling the higher-order structures and composition, function, and control of the networks of proteins collectively known as the proteome. High resolution mass spectrometry is helping to illuminate and elucidate complex biological processes and phenotypes, to “catalogue the components of proteomes and their sites of post-translational modification, to identify networks of interacting proteins and to uncover alterations in the proteome that are associated with diseases” (“Mass-spectrometric exploration of proteome structure and function“, by Ruedi Aebersold & Matthias Mann, Nature, 537: 347–355, doi:10.1038/nature19949).

Baker points out that the majority of de novo designed proteins are designed around a single, deep minimum-energy state, and that we have a long way to go to mimic the subtleties of naturally-occurring proteins: allostery, signalling, and even recessed binding pockets for small molecules, functional sites and hydrophobic binding interfaces all present their own challenges. Only by increasing our understanding and developing better models and computational tools will we be able to accomplish this.

A program to aid primary protein structure determination – 1962 style

This year, OPIG have been running a series of weekly lectures on papers we consider to be seminal in the field of protein informatics. I initially started looking at “Comprotein: A computer program to aid primary protein structure determination” as it is one of the earliest (1960s) papers discussing a computational method for determining the primary structure of proteins. Many bioinformaticians use these well-formed, tidy, sterile arrays of amino acids as the input to their work, for example:

MGLSDGEWQL VLNVWGKVEA DIPGHGQEVL IRLFKGHPET LEKFDKFKHL KSEDEMKASE DLKKHGATVL TALGGILKKK GHHEAEIKPL AQSHATKHKI PVKYLEFISE CIIQVLQSKH PGDFGADAQG AMNKALELFR KDMASNYKEL GFQG
(For those of you playing at home, that’s myoglobin.)

As the OPIG crew come from diverse backgrounds and frequently ask questions well beyond my area of expertise, I needed to do some background reading, if for nothing other than posterior-covering. Though I’m not a researcher by trade any more, I began to realise that, despite all the lectures/classes/papers/seminars I’d been exposed to about the clever things you can do with a sequence once you have it, I didn’t know how you would actually go from a bunch of cells expressing (amongst a myriad of other molecules) the protein you were interested in to the neat array of characters shown above. So, without further ado:

The first stage in obtaining your protein is cell lysis, and there’s not much in it for the cell.
Mangle your cells using chemicals, enzymes, sonication or a French press (not your coffee one).

The second stage is producing a crude extract by centrifuging the above cell-mangle. This, terrifyingly, appears to be done at between 10,000 g and 100,000 g, and removes the cellular debris, leaving it as a pellet at the bottom of the container, with the supernatant containing little but a mix of the proteins that were present in the cytoplasm, along with some additional macromolecules.

Stage three is to purify the crude extract. Depending on the properties of the protein you’re interested in, one or more of the following steps are required:

  • Reverse-phase chromatography, to separate based on hydrophobicity
  • Ion-exchange chromatography, to separate based on charge
  • Gel filtration, to separate based on size

If all of the above are performed, the sequences of these variously charge-/size-/polarity-sorted proteins will still be unknown, but they will now be separated into fractions based upon their properties. This is where the third stage departs from science and lands squarely in the realm of art: the detergents/protocols/chemicals/enzymes/temperatures/pressures of the above techniques all differ depending on the hydrophobicity/charge/animal source of the protein one is aiming to extract.

Since at this point we still don’t know the sequence, working out the concentrations of the various constituent amino acids is a useful next step. One of the simplest methods of determining the amino acid composition of a protein is to follow a procedure similar to this:

Heat the sample in 6 M HCl at 110 °C for 18–24 h (or more) to fully hydrolyse all the peptide bonds. Peptide bonds known to be more stable, such as those involving valine, isoleucine and leucine, may require an extended period (over 72 h). This, however, can degrade Ser/Thr/Tyr/Trp/Gln and Cys, which will subsequently skew the results. An alternative is to raise the pressure in the vessel, allowing temperatures of 145–155 °C for 20–240 minutes.

TL;DR: Take the glassware that’s been lying about your lab since before you were born, put 6 M hydrochloric acid in it and bring it to the boil. Take one painstakingly purified and still totally unknown protein and put it in your boiling hydrochloric acid. Seal the glassware in order to use it as a pressure vessel. Retreat swiftly whilst the apparatus builds up the appropriate pressure and cleaves the protein as required. What could go wrong?

At this point I wondered whether the almost exponential growth in PDB entries was due to humanity’s herd of biochemists having been thinned until those remaining were simply several generations’ worth of lucky.

Once we have an idea of how many of each type of amino acid make up our protein, we can potentially rebuild it. At this point, however, it’s as if we have a jigsaw puzzle: we’ve got all the pieces, and each piece can only be one of a limited selection of colours (thus making it a combinatorial problem), but we’ve no idea what the pattern on the box should be. To further complicate matters, since this isn’t being done on just a single copy of the protein at a time, it’s as if someone has put multiple copies of the same jigsaw into the box.
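To get a feel for the scale of that puzzle, here is a short Python sketch (my addition, not anything from the paper) that takes the myoglobin sequence shown above, throws away the order, as hydrolysis does, and counts how many distinct sequences share the resulting composition via the multinomial coefficient.

from collections import Counter
from math import factorial, prod

# Sperm whale myoglobin, as given above (spaces removed).
SEQ = ("MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASE"
       "DLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKH"
       "PGDFGADAQGAMNKALELFRKDMASNYKELGFQG")

# Amino acid analysis gives us, at best, this: counts, with no order.
composition = Counter(SEQ)
print(composition.most_common(5))

# Number of distinct sequences sharing this composition: the
# multinomial coefficient n! / (n_A! * n_C! * ... * n_Y!).
n = len(SEQ)
orderings = factorial(n) // prod(factorial(c) for c in composition.values())
print(f"{n} residues -> a number of orderings {len(str(orderings))} digits long")

For the 154 residues above, the count of possible orderings runs to well over a hundred digits: that is the jigsaw we are facing.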

Once we have all the pieces, a second technique is needed to determine the actual sequence. Though invented in 1950, Edman degradation appears not to have been a particularly widespread protocol, or at least it wasn’t at the National Biomedical Research Foundation, from which the above paper emerged. This method of degradation tags the N-terminal amino acid and cleaves it from the rest of the protein; the released residue can then be identified and the protocol repeated. Whilst this would otherwise be ideal, it suffers from a few issues: it takes about an hour per cycle, it only works reliably on sequences of up to about 30 amino acids, and it doesn’t work at all for proteins whose N-terminus is bonded or buried.
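As a toy illustration of how that cycle limit bites (a simulation of the bookkeeping only, not of the chemistry), the protocol amounts to a loop with a hard iteration budget:

def edman_degradation(peptide: str, max_cycles: int = 30):
    # Identify and remove the N-terminal residue, one cycle at a time
    # (roughly an hour each in reality), stopping at the ~30-cycle
    # reliability limit.
    for cycle in range(min(max_cycles, len(peptide))):
        yield cycle + 1, peptide[cycle]  # (cycle number, residue released)

for cycle, residue in edman_degradation("MGLSDGEWQLVLNVWGK"):
    print(f"cycle {cycle:2d}: released {residue}")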

Instead, the purified protein is cleaved into a number of fragments at known points using a single enzyme. For example, trypsin will cleave on the carboxyl side of arginine and lysine residues. A second copy of the protein is then cleaved using a different enzyme, which cuts at different points. These individual fragments are then sorted as above and their individual (non-sequential) compositions determined.
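The trypsin rule is easy to state in code. Here is a minimal Python sketch of an in silico digest (my illustration; it also includes the commonly used refinement that trypsin tends not to cut when the residue after the K or R is a proline):

import re

def trypsin_digest(sequence: str) -> list[str]:
    # Cut after K or R (the carboxyl side), but not before a proline.
    return [frag for frag in re.split(r"(?<=[KR])(?!P)", sequence) if frag]

# The first 32 residues of the myoglobin sequence above:
print(trypsin_digest("MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIR"))
# -> ['MGLSDGEWQLVLNVWGK', 'VEADIPGHGQEVLIR']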

For example, suppose we have a protein with the true sequence ABCDE,
which is cleaved by two different enzymes to give:
Enzyme 1: (A, B, C) and (D, E)
Enzyme 2: (A, B) and (C, D)

We can see that the (C, D) fragment produced by Enzyme 2 overlaps with both the (A, B, C) and (D, E) fragments produced by Enzyme 1. However, as we don’t know the order in which the amino acids appear within each fragment, there are a number of different sequences consistent with these digests (enumerated programmatically in the sketch below):

Possibility 1 : A B C D E
Possibility 2 : B A C D E
Possibility 3 : E D C A B
Possibility 4 : E D C B A
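To make the combinatorics concrete, here is a brute-force Python sketch of that overlap logic. This is my illustration, not the Comprotein program itself; it assumes Enzyme 2 also yields the single-residue fragment (E), which the example leaves implicit, and that for each fragment we know its composition but neither its internal order nor its position along the chain.

from itertools import permutations

def consistent(seq: str, digest: list[str]) -> bool:
    # True if seq can be cut into contiguous pieces whose unordered
    # compositions match the digest's fragments, in some fragment order.
    # (Assumes the fragment lengths sum to len(seq).)
    for frag_order in permutations(digest):
        pos = 0
        for frag in frag_order:
            if sorted(seq[pos:pos + len(frag)]) != sorted(frag):
                break
            pos += len(frag)
        else:
            return True  # every fragment matched
    return False

digest1 = ["ABC", "DE"]      # Enzyme 1: compositions only
digest2 = ["AB", "CD", "E"]  # Enzyme 2, with the assumed E fragment

hits = sorted(
    s for s in ("".join(p) for p in permutations("ABCDE"))
    if consistent(s, digest1) and consistent(s, digest2)
)
print(hits)  # -> ['ABCDE', 'BACDE', 'EDCAB', 'EDCBA']

Running it recovers exactly the four possibilities above; with realistic fragment sizes and repeated residues, the number of candidates explodes, which is the sort of bookkeeping Comprotein was written to help with.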

At this point the paper comments that such a result highlights to the biochemist that the molecule requires further refinement. Sadly, the above example, whilst relatively simple, doesn’t capture the whole host of other issues which plague the biochemist in their search for an exact sequence.