Monthly Archives: April 2018

My experience with (semi-)automating organic synthesis in the lab

After three years of not touching a single bit of glassware, I have recently donned the white coat and stepped back into the Chemistry lab. I am doing this for my PhD project, to make some of the follow-up compounds that my pipeline suggests. However, this time there is a slight difference – I am doing reactions with the aid of a liquid handler robot, the Opentrons. This is my first encounter with (semi-)automated synthesis and definitely a very exciting opportunity! (Thanks to my industrial sponsor, Diamond Light Source!)

A picture of the Opentrons machine I have been using to do some organic reactions. Picture taken from https://opentrons.com/robots.

Opentrons is primarily used by biologists, and the company’s goal is to make a platform for easily sharing protocols and reproducing each other’s work (I think we can all agree how nice this would be!). They provide a very easy-to-use API, intended to be accessible to any bench scientist with basic computer skills. From my experience so far, this has been the case: I found it extremely easy to pick up and write my own protocols for chemical reactions. Here is the command that will: (1) pick up a new pipette tip; (2) transfer a volume from source1 to destination1; (3) drop the pipette tip in the trash; (4) pick up a new pipette tip; (5) transfer a volume from source2 to destination2; (6) drop the pipette tip in the trash.

pipette.transfer(volume, [source1, source2], [destination1, destination2], new_tip='always')
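To make the six steps concrete, here is a minimal plain-Python sketch of how such a transfer call breaks down into individual robot actions. It does not use the Opentrons library at all: the function name expand_transfer and the step wording are made up for illustration, and the handling of new_tip='once' is my assumption about what reusing a single tip would look like.

```python
def expand_transfer(volume, sources, destinations, new_tip="always"):
    """List the actions a paired transfer breaks down into (illustration only)."""
    if len(sources) != len(destinations):
        raise ValueError("sources and destinations must pair up one-to-one")
    if new_tip not in ("always", "once"):
        raise ValueError("this sketch only models new_tip='always' or 'once'")

    steps = []
    last = len(sources) - 1
    for i, (src, dst) in enumerate(zip(sources, destinations)):
        # With 'always' every transfer gets a fresh tip; with 'once' only the first.
        if new_tip == "always" or i == 0:
            steps.append("pick up new tip")
        steps.append(f"transfer {volume} uL from {src} to {dst}")
        # With 'always' the tip is discarded after each transfer; with 'once' at the end.
        if new_tip == "always" or i == last:
            steps.append("drop tip in trash")
    return steps


steps = expand_transfer(50, ["source1", "source2"], ["destination1", "destination2"])
for number, step in enumerate(steps, start=1):
    print(f"({number}) {step}")
```

With new_tip='always' this prints the same six steps as listed above; switching to 'once' drops the middle tip change, which saves tips but risks cross-contamination between reagents.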

But of course not everything is plain sailing – there are many challenges you will encounter when using an automated pipette. The robot is a liquid handler – it cannot handle solids, so any solids need to be pre-weighed and/or made into solution beforehand. Further difficulties arise from the properties of the solvent it is handling, for example:

  • Dripping – low boiling point solvents tend to drip more.
  • Viscosity of liquids causes issues with not drawing up the correct amount of liquid – more viscous liquids require longer times to aspirate and if aspiration is too quick then air pockets may be drawn up.

Here is a GIF I made of a dry run I was doing with the robot (sorry for the slight shake, this was recorded on my phone in the lab… See their website for professional footage of the robot!)

My (shaky) footage of a dry run I was performing with the Opentrons.

The Curious Case Of A Human Chimera

In my role as a PhD student in the OPIG group, I integrate and analyse data from various biological, chemical and data sources. As I am interested in the intersection between chemistry, biology and daily life, it seems suitable that my next BLOPIG posts will discuss and highlight how biological phenomena have either influenced law or history.

Connection between Law and Biology – The Curious Case Of A Human Chimera
Our scene opens in a dark lab, where a scientist injects himself with an unknown substance. The voice-over notes that they created a monster named “Chimera” while searching for their hero “Bellerophon”. This is the famous opening scene of the movie “Mission Impossible II”, where we are introduced to the dangerous bioweapon “Chimera”, a combination of multiple diseases. As the “Chimera” is a beast from Ancient Greek mythology, with a lion’s head, a goat’s body, and a serpent’s tail, the naming of this bioweapon seems appropriate.

What does this dangerous mixture of multiple diseases, an ancient mythological monster and the promised connection between law and biology have in common?

Apart from a really bad joke, the term “Chimera” is an actual term in biology to describe a biological entity of multiple diverse components, e.g. a human organism, whose cells are composed of distinct genotypes.
In the case of tetragametic chimerism, human chimeras thus carry two distinct sets of the usual twenty-three chromosome pairs, spread across different cell populations, and as such, their organs and tissues are constructed according to the DNA present in the respective organ or tissue.
Tetragametic chimerism occurs by the fertilization of two ova by two spermatozoa, which develop into zygotes. These zygotes then subsequently fuse into one organism, which continues to develop into an organism with two sets of DNA.1-2

But how did such a biological phenomenon like a chimera enter the court of law?

The Romans famously defined that the mother of a child is the one who gives birth to it (Mater semper certa est, which can be translated as “The mother is always certain”). I would like to point out that in the times of in-vitro fertilization, this principle is no longer viable, since a child can now have both a genetic mother and a birth mother.3
This principle was also put to the test in 2002, when Lydia Fairchild applied for welfare for her two children and her third, unborn child in the US. Paternity tests were conducted on all children to prove her ex-partner’s paternity. While the tests proved the paternity of the father without a doubt, Lydia was shown to be no genetic match to her children.

Lydia Fairchild was accused of being a “welfare fraud” or a surrogate, and the judge ordered that she had to give birth to her third child in front of witnesses. Blood samples were taken immediately, and these revealed that Lydia Fairchild did not share DNA with this child either, despite giving birth to it. Now accused of being a surrogate, Lydia’s case looked dire.
Fortunately, Lydia’s lawyer read a journal article about a similar case involving a woman named Karen Keegan.2, 4-5 Karen, a 52-year-old woman, had renal failure. As she needed a kidney replacement, Karen’s sons underwent histocompatibility testing to see whether they could donate. Yet the genetic tests showed that only one of her three sons was related to her.1 Material from her entire body was tested for genetic matches to her sons’ DNA, but only genetic material from her thyroid matched her sons.2
Ultimately, the researchers concluded that Karen was a tetragametic chimera, born of the fusion of her zygote and her twin sibling in her mother’s womb. As Dr. Lynne Uhl, a pathologist and doctor of transfusion medicine at Beth Israel Deaconess Medical Center in Boston, said:
“In her blood, she was one person, but in other tissues, she had evidence of being a fusion of two individuals.”6

Subsequently, scientists collected Lydia’s cell material from various body parts and tested for a genetic match with her children. The DNA from her cervical smear was found to be a match, while the DNA collected from her skin and hair was not. Additionally, DNA samples from Lydia’s mother matched her children’s DNA.4-5

Interestingly, while both Lydia and Karen were carrying two sets of DNA as a result of prenatal fusions with their twins, they didn’t show any phenotypic sign of being a chimera, e.g. different skin types or the so-called Blaschko lines.7-8

 

  1. https://www.scientificamerican.com/article/3-human-chimeras-that-already-exist/
  2. Yu, N. et al. Disputed maternity leading to identification of tetragametic chimerism. N. Engl. J. Med. 346, 1545–1552 (2002).
  3. https://en.wikipedia.org/wiki/Mater_semper_certa_est
  4. https://pictorial.jezebel.com/one-person-two-sets-of-dna-the-strange-case-of-the-hu-1689290862
  5. https://web.archive.org/web/20140301211020/http://www.essentialbaby.com.au/life-style/nutrition-and-wellbeing/when-your-unborn-twin-is-your-childrens-mother-20140203-31woi.html
  6. http://abcnews.go.com/Primetime/shes-twin/story?id=2315693
  7. https://jamanetwork.com/journals/jamadermatology/fullarticle/419529
  8. http://biologicalexceptions.blogspot.co.uk/2015/09/when-youre-not-just-yourself.html

All links were last viewed on 24.04.2018.

My next blog post: Can a mismatch in maternal DNA threaten a government? How Biology can Influence History.

Measuring correlation

Correlation is defined as how close two variables are to having a dependence relationship with each other. At first sight, it looks kind of simple, but there are two main problems:

  1. Apart from the obvious situations (e.g. correlation = 1), it is difficult to say whether two variables are correlated or not (e.g. correlation = 0.7). For instance, would you be able to say if the variables X and Y from the following two plots are correlated?
  2. There are different measures of correlation, and they may not agree when comparing different distributions. As an example, which plot shows a higher correlation? The answer depends on how you measure the correlation: if you use Pearson correlation you would pick A, whereas if you choose Spearman correlation you would pick B.

Here, I will explain some of the different correlation measures you can use:

Pearson product-moment correlation coefficient

  • What does it measure? Only linear dependencies between the variables.
  • How is it obtained? By dividing the covariance of the two variables by the product of their standard deviations. (It is defined only if both of the standard deviations are finite and nonzero.) \rho _{X,Y}={\frac {\operatorname {cov} (X,Y)}{\sigma _{X}\sigma _{Y}}}
  • Properties:
  1. ρ (X,Y) = +1 : perfect direct (increasing) linear relationship (correlation).
  2. ρ (X,Y) = -1 : perfect decreasing (inverse) linear relationship (anticorrelation).
  3. In all other cases, ρ (X,Y) indicates the degree of linear dependence between the variables. As it approaches zero there is less of a relationship (closer to uncorrelated).
  4. Only gives a perfect value when X and Y are related by a linear function.
  • When is it useful? For the case of a linear model with a single independent variable, the coefficient of determination (R squared) is the square of r, Pearson’s product-moment coefficient.
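As a quick illustration of the formula and properties above, here is a minimal pure-Python sketch (no libraries assumed; the function name pearson and the toy data are mine). It shows that a linear relationship gives exactly 1, while a monotonic but non-linear one falls short of 1.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors of the covariance and the standard deviations cancel out.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("undefined when either variable has zero variance")
    return cov / sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]
linear = [2 * v + 1 for v in x]   # perfectly linear in x
cubic = [v ** 3 for v in x]       # increasing, but not linear

print(pearson(x, linear))
print(pearson(x, cubic))
```

The cubic case is the kind of relationship where Pearson under-reports the dependence even though y is completely determined by x.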

 

Spearman’s rank correlation coefficient:

  • What does it measure? How well the relationship between two variables can be described using a monotonic function (a function that only goes up or only goes down).
  • How is it obtained? As the Pearson correlation between the rank values of the two variables.

{\displaystyle r_{s}=\rho _{\operatorname {rg} _{X},\operatorname {rg} _{Y}}={\frac {\operatorname {cov} (\operatorname {rg} _{X},\operatorname {rg} _{Y})}{\sigma _{\operatorname {rg} _{X}}\sigma _{\operatorname {rg} _{Y}}}}}

If (and only if) all n ranks are distinct integers, it can also be computed using the popular formula:

{\displaystyle r_{s}={1-{\frac {6\sum d_{i}^{2}}{n(n^{2}-1)}}}.}

Where di is the difference between the two ranks of each observation.

  • Properties:
  1. rs (X,Y) = +1:  X and Y are related by any increasing monotonic function.
  2. rs (X,Y) = -1:  X and Y are related by any decreasing monotonic function.
  3. The Spearman correlation increases in magnitude as X and Y become closer to being perfect monotone functions of each other.
  • When is it useful? It is appropriate for both continuous and discrete ordinal variables. It can be used to look for non-linear dependence relationships.
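The two routes to the Spearman coefficient described above can be sketched in plain Python as follows. The helper names (ranks, spearman, spearman_shortcut) and the toy data are my own; ties are given the average of their positions, which is a common convention but an assumption here, and the shortcut formula is only valid when there are no ties.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for position in range(start, end + 1):
            result[order[position]] = shared
        start = end + 1
    return result


def spearman(x, y):
    """General definition: Pearson correlation of the rank values."""
    return pearson(ranks(x), ranks(y))


def spearman_shortcut(x, y):
    """1 - 6*sum(d_i^2) / (n(n^2 - 1)); only valid when all ranks are distinct."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


x = [1, 2, 3, 4, 5]
cubic = [v ** 3 for v in x]    # monotonic, non-linear
shuffled = [2, 1, 4, 3, 5]     # mostly increasing, two swaps

print(spearman(x, cubic))
print(spearman(x, shuffled), spearman_shortcut(x, shuffled))
```

For the cubic data the ranks of x and y are identical, so the coefficient is 1 even though the relationship is not linear.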

Kendall’s tau coefficient

  • What does it measure? The ordinal association between two measured quantities.
  • How is it obtained?

{\displaystyle \tau ={\frac {({\text{number of concordant pairs}})-({\text{number of discordant pairs}})}{n(n-1)/2}}.}

Any pair of observations (xi, yi) and (xj, yj) is said to be concordant if the ranks for both elements agree. That happens if xi-xj and yi-yj have the same sign. If their signs are different, the pair is considered discordant.

  • Properties:
  1. τ (X,Y) = +1: The agreement between the two rankings is perfect (i.e., the two rankings are the same)
  2. τ (X,Y) = -1: The disagreement between the two rankings is perfect (i.e., one ranking is the reverse of the other)
  3. If X and Y are independent, then we would expect the coefficient to be approximately zero.
  • When is it useful? It is appropriate for both continuous and discrete ordinal variables. It can be used to look for non-linear dependence relationships.
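Counting concordant and discordant pairs is easy to do by brute force. The sketch below follows the formula above directly (this variant is usually called tau-a); the function name and data are mine, and I have assumed that tied pairs simply count as neither concordant nor discordant.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant - discordant) / (n(n-1)/2), checking every pair of observations."""
    concordant = 0
    discordant = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        product = (xi - xj) * (yi - yj)
        if product > 0:        # differences have the same sign
            concordant += 1
        elif product < 0:      # differences have opposite signs
            discordant += 1
        # product == 0 is a tie and is counted as neither (assumption)
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


x = [1, 2, 3, 4, 5]
shuffled = [2, 1, 4, 3, 5]     # two adjacent swaps
reverse = [5, 4, 3, 2, 1]

print(kendall_tau(x, shuffled))
print(kendall_tau(x, reverse))
```

For the shuffled data, 8 of the 10 pairs are concordant and 2 are discordant, giving (8 - 2) / 10 = 0.6. The loop is quadratic in n, which is fine for small samples.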

Distance correlation:

  • What does it measure? Both linear and nonlinear association between two random variables or random vectors.
  • How is it obtained? By dividing the variables’ distance covariance by the product of their distance standard deviations:

\operatorname {dCor}(X,Y)={\frac {\operatorname {dCov}(X,Y)}{{\sqrt {\operatorname {dVar}(X)\,\operatorname {dVar}(Y)}}}},

The distance covariance is defined as:

{\displaystyle \operatorname {dCov} _{n}^{2}(X,Y):={\frac {1}{n^{2}}}\sum _{j=1}^{n}\sum _{k=1}^{n}A_{j,k}\,B_{j,k}.}

Where:

{\displaystyle A_{j,k}:=a_{j,k}-{\overline {a}}_{j\cdot }-{\overline {a}}_{\cdot k}+{\overline {a}}_{\cdot \cdot },\qquad B_{j,k}:=b_{j,k}-{\overline {b}}_{j\cdot }-{\overline {b}}_{\cdot k}+{\overline {b}}_{\cdot \cdot },}

{\begin{aligned}a_{{j,k}}&=\|X_{j}-X_{k}\|,\qquad j,k=1,2,\ldots ,n,\\b_{{j,k}}&=\|Y_{j}-Y_{k}\|,\qquad j,k=1,2,\ldots ,n,\end{aligned}}

where || ⋅ || denotes Euclidean norm.

  • Properties:
  1. dCor (X,Y) = 0 if and only if the random vectors are independent.
  2. dCor (X,Y) = 1: Perfect dependence between the two distributions.
  3. dCor (X,Y) is defined for X and Y in arbitrary dimension.
  • When is it useful? It is appropriate for finding any kind of dependence relationship between the two variables, including when X and Y have different dimensions.
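The double-centred distance matrices are easier to follow in code than in subscripts. Below is a minimal pure-Python sketch for the one-dimensional case (so the Euclidean norm is just an absolute difference); the function names and the toy data are mine. The example uses y = x² on a symmetric range, where Pearson correlation is exactly zero but the distance correlation is clearly not.

```python
from math import sqrt


def centred_distances(values):
    """Pairwise distance matrix with row, column and grand means removed (A or B above)."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in dist]
    grand_mean = sum(row_means) / n
    # The matrix is symmetric, so column means equal row means.
    return [[dist[j][k] - row_means[j] - row_means[k] + grand_mean
             for k in range(n)] for j in range(n)]


def dcov_squared(a, b):
    """Squared sample distance covariance: mean of the element-wise products."""
    n = len(a)
    return sum(a[j][k] * b[j][k] for j in range(n) for k in range(n)) / n ** 2


def distance_correlation(x, y):
    a = centred_distances(x)
    b = centred_distances(y)
    dvar_x = dcov_squared(a, a)
    dvar_y = dcov_squared(b, b)
    if dvar_x == 0 or dvar_y == 0:
        return 0.0
    # dCor^2 = dCov^2(X,Y) / sqrt(dVar^2(X) * dVar^2(Y))
    return sqrt(max(dcov_squared(a, b), 0.0) / sqrt(dvar_x * dvar_y))


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    var_x = sum((p - mean_x) ** 2 for p in x)
    var_y = sum((q - mean_y) ** 2 for q in y)
    return cov / sqrt(var_x * var_y)


x = [-2, -1, 0, 1, 2]
parabola = [v ** 2 for v in x]

print(pearson(x, parabola))
print(distance_correlation(x, parabola))
```

Note that this is the sample statistic: on finite data it will rarely be exactly zero even for independent variables, so in practice it is usually paired with a permutation test.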

OPIGTREAT

On the 19th of March OPIG set off on our group retreat – henceforth referred to as the OPIGTREAT.

We kicked off a little late as apparently Saulo and check in times are not a good combination (though he is an expert at reversing on an icy road).

Jin and Flo gave the first talk on web programming, specifically Flask and D3. If I understood correctly, Flask is a web development framework for Python that runs everything on the server side, whereas D3 stands for Data-Driven Documents, which appears to be a way of making very pretty things.

Garrett then gave us an impressive overview of the area of docking, thinking about whether docking had improved in the last 10 years. He discussed how docking can be used to predict both the binding mode (the orientation and conformation) and the binding affinity. The state of the art appears to be that if we are docking a small molecule into approximately the correct binding site, a native-like pose can be identified, but binding affinity prediction remains challenging in all cases.

Mark then attempted the impossible: he tried to give a talk explaining how to give a good talk, in this case in the context of public engagement and taking our work out to schools. I am now versed in the 4 Ms: Manageable, Measurable, Made first and Most Important. I am also weirdly aware that my head shouldn’t move when I am teaching.

Elliot then took us through how we should judge a PDB structure, a really useful skill for everyone in the group. He described measures such as resolution, B factors, Rfree, clash score, Ramachandran outliers, sidechain outliers and RSRZ outliers. Interesting facts that I collected: the average resolution of an X-ray structure in the PDB is ~2 Å and the average Rfree is 0.25. I also learnt of the existence of PDB-REDO, a service that re-refines datasets in the PDB.

Saulo and the Fergi were up next and they treated us all to a short talk and then a Jupyter notebook practical on machine learning. They discussed supervised, unsupervised and reinforcement learning, giving examples of each and how and when they should/could be used. Claire and I then learnt a great deal about Jupyter notebooks, the most important thing being to press shift enter. Useful fact: “out-of-bag” is a method for measuring the error of random forests, where each tree is scored using all data points apart from those used to make that tree.

The evening finished with a film about the evil iniquities of smoking (very high brow stuff!?!).

The second day began with Bernhard (a visitor from the far-off land of Barcelona these days) talking to us about his latest research project. As this is his story – no details in the blog.

Claire then gave an update of the talk she gave at the last OPIGTREAT – how to make “stuff” pretty. Obviously a popular topic as we all wish to display our data and findings in a way that is easily interpretable as well as visually appealing. Claire took us through some of the tools to use like ggplot and Pymol – showed us where to find the lists of useful commands and then showed us the types of images you could make if you really put some thought into it.

Anne was up next. She discussed the challenges and opportunities of integrating heterogeneous data sources and came up with a lot of data sources to think about, ranging from protein structures, protein interactions and small molecule structures to drug safety, drug targets, functional annotation and pathways. One thing to remember: probably don’t tell your boss when she should or shouldn’t be taking notes……

It was then the turn of team networks Javi, James and Lyuba who walked us through the basics of networks and expanded on their uses across multiple data types in biology. They mentioned areas from simple motifs to protein structure, MD simulations, ontologies, disease prediction, drug target identification…. We then had a practical to check we had understood the power of networks! The networks under consideration were dolphins, Myoglobin structure, Facebook data and the mystery voter network (where we discovered that Fergus the first in no way tried to rig the vote for what film to watch).

That afternoon I visited the bird sanctuary just down the road, others went to a gin distillery or on a walk. Top quote of the afternoon was from James “I want the birds to eat from my pants”. I believe he is from one of those countries that has the misguided belief that pants means trousers. Actually I could have a different top quote from Alex about somebody being a cheap ride in his dreams but I think I should pass over that one.

That evening we were treated to a fragment-based drug discovery extravaganza headed up by Hannah, Susan and Joe. They took us through the use of fragments for drug discovery and then we attempted a practical. I seem to remember that Claire and I once again excelled at shift enter on the Jupyter notebook.

That evening we had a pub quiz, which apparently ended in a draw between all the teams playing. I feel that Claire and Flo as quizmasters might have made a minor miscalculation. I was happy though as I ended up with the minions bowl and cup. I also managed to persuade several grown men to jump and smash chocolate eggs on their heads on the ceiling.

Next morning Alex and Matt were up first. In their talk they demonstrated not only their knowledge of the future immunotherapy repertoire but also their ability to finish each other’s sentences. They gave a really excellent overview of current immunotherapies, where the field is moving and what the future might be. Facts to store in the head: the first ever approved antibody (AB) therapeutic was Muromonab (1986). Currently the most successful is Humira (Adalimumab) from AbbVie, worth 18.4b dollars in 2017; this is a fully human AB for autoimmune diseases that binds to the mediator of inflammation TNF-alpha.

Next up were Catherine and Lucian, who discussed distributed computing in PySpark. They started by explaining why distributed computing is going to become so important. Basic info: by 2025, 100 million to 2 billion human genomes will have been sequenced, which is 2 – 40 exabytes of data. They discussed distributed vs centralised computing, and PySpark compared to Hadoop. There was a practical, but Mark had to perform solo for the audience, leading to one of the top photos of the whole OPIGTREAT.

As a punishment for being in charge I gave the final talk where I discussed future research direction and how you decide what those might be.

So with thanks to all of the group that concludes the OPIGTREAT report.

New avenues in antibody engineering

Hi everyone,

In this blog post I would like to review an unusual antibody scaffold that can potentially give rise to a new avenue in antibody engineering. Here, I will discuss a couple of papers that complement each other’s research.

My DPhil is centered on antibody NGS (Ig-seq) data analysis. I always map an antibody sequence to its structure, as the three-dimensional antibody configuration dictates its function, a piece of information that cannot be obtained from just the nucleotide or amino acid sequence. When I work with human Ig-seq data, I bear in mind that antibodies are composed of two pairs of light and heavy chains that tune the antibody towards its cognate antigen. Recently, however, Tan et al. found that antibody repertoires of people who live in malaria-endemic regions have adopted an unusual property to defend the body from the pathogen (1). Several studies followed up on this discovery to further dissect this as yet uncharacterized property of antibodies.

Malaria parasites in the erythrocytic stage produce RIFIN proteins that are displayed on the surface of the erythrocytes. The main function of RIFINs is to bind to the LAIR1 receptors that are found on the surface of immune cells. The LAIR1 receptor is inhibitory, so binding it leads to inhibition of the immune system. The endogenous ligand of the LAIR1 receptor is collagen, which is found on the surface of body cells. This makes sure that immune cells will not be activated against the body’s own cells. Activating the LAIR1 receptors is one of the escape mechanisms that the malaria parasite has evolved.

Tan et al. (1) showed that, in an evolutionary arms race between human and malaria, our immune system has harnessed the ability of RIFINs to bind to LAIR1 against the parasite itself. By doing single B cell isolation and sequencing, it was discovered that antibodies, which are the effector molecules of our immune system, can incorporate the LAIR1 protein into their structure. Given what we know about antibody engineering, the idea of incorporating a 100 amino acid long protein into the antibody structure is very hard to comprehend. Sequences of these antibodies showed that the LAIR1 insertion was introduced into CDR-H3. Recently, the crystal structure of this construct has become available (2). The crystal structure revealed that the LAIR1 insertion is indeed structurally functional. All 5 of the antibody’s other canonical CDRs interact with the LAIR1 protein and its linkers to accommodate the insertion. The CDR-L3 forms two disulfide bonds with the linker to orientate the LAIR1 protein in such a way that it will interact with RIFINs. It is worth stressing that the LAIR1 sequence differs from the wild type, but the structure is very similar (<0.5 Å RMSD). The change in sequence and structure is crucial so that the LAIR1-containing antibody does not interact with collagen, but only with RIFINs.

Pieper et al. (3) set out to interrogate the modalities of LAIR1 insertion into antibody structures, using single-cell sequencing as well as NGS of the antibody switch region. It turns out that human antibodies can accommodate two types of insertion modality and can also form camelid-like antibodies. The insertion of LAIR1 can happen in CDR-H3, leading to the loss of antibody binding to its cognate antigen. The other modality is the incorporation of the LAIR1 protein into the switch region of the antibody. This kind of insertion does not interfere with the binding properties of the Fv domain, which leads to the creation of bi-specific antibodies. The last finding was the insertion of LAIR1 into an antibody structure where the D, J and most of the V genes, as well as the light chain, were deleted. The resultant scaffold is structurally viable and only possesses the heavy chain. Hence, it is evidence that human antibodies can also form camelid-like antibodies. Interestingly, these insertions into the switch region are not exclusive to people who live in malaria-endemic regions. NGS of the switch region from European donors showed that around 1 in 1000 antibody sequences had an insertion of varying length. These insertions originate from both intergenic and genic regions on different chromosomes.

To sum up, it is very intriguing that our immune system has evolved to create camelid-like and bi-specific antibodies. It will be very informative to try to crystallize these structures to see how these antibodies accommodate the insertion of LAIR1. Current antibody NGS data analysis primarily concentrates on the heavy chain due to sequencing technology limitations. It would be invaluable if we could sequence the entire heavy chain as well as the adjacent switch region to see how our immune system matures and activates against pathogens.

 

  1. Tan J, Pieper K, Piccoli L, Abdi A, Foglierini M, Geiger R, Maria Tully C, Jarrossay D, Maina Ndungu F, Wambua J, et al. A LAIR1 insertion generates broadly reactive antibodies against malaria variant antigens. Nature (2016) 529:105–109. doi:10.1038/nature16450
  2. Hsieh FL, Higgins MK. The structure of a LAIR1-containing human antibody reveals a novel mechanism of antigen recognition. Elife (2017) 6: doi:10.7554/eLife.27311
  3. Pieper K, Tan J, Piccoli L, Foglierini M, Barbieri S, Chen Y, Silacci-Fregni C, Wolf T, Jarrossay D, Anderle M, et al. Public antibodies to malaria antigens generated by two LAIR1 insertion modalities. Nature (2017) 548:597–601. doi:10.1038/nature23670

 

Helpful resources for people studying therapeutic antibodies

My work within OPIG involves studying therapeutic antibodies. It can be tough to find information about these commercial molecules, often known by unintelligible developmental names until the later stages of clinical trials. Their structures are frequently absent, as one might expect, but even their sequences are sometimes a nightmare to get hold of! Below is a list of resources that I have found particularly helpful.

IDENTITIES OF RELEVANT ANTIBODIES

1. Wikipedia (don’t judge!) is an extremely helpful resource to get started. It has the following lists:

(a) A list of FDA-approved therapeutic monoclonal antibody therapies
(b) A more general list of therapeutic, diagnostic and preventive monoclonal antibodies (includes some things that have been withdrawn)

2. The Antibody Society has a list of FDA/EU approved antibodies and antibodies to watch on their website. NB: This is only available to members of the society (free for students and other concessions, standard membership is $100pa).

3. The journal ‘mAbs’ also has a series of ‘Antibodies to Watch in [Year]’ papers. Here are the ones for 2016, 2017 and 2018.

SEQUENCES

4. 137 clinical-stage (post-phase I) mAb sequences can be found in the SI of this paper by Jain et al.

5. A slightly outdated (last updated Nov 2016), but still extremely useful, resource of antibody sequences is this FASTA list, written by Dr Martin’s Group at UCL.

SEQUENCES & STRUCTURES

6. The IMGT monoclonal antibody database (mAb-DB) has been possibly the most helpful resource. This includes 798 entries of both therapeutics and non-therapeutics, so it’s helpful to get a list of the antibodies you are interested in first. You can search it with a wide range of parameters, including antibody name. A typical antibody result will include its mAb-DB ID, INN details, common & developmental names, species, receptor type and isotype, sequence (via the “IMGT/2Dstructure-DB” link), target, clinical trials details and – if available – the 3D structure (via the “IMGT/3Dstructure-DB” link).

7. SAbDab has a continually-updated section for all therapeutic antibody structures deposited in the PDB.

CURRENT STATUS OF THE THERAPEUTIC

8. Search the therapeutic name on AdisInsight or Pharmacodia to see its current clinical trial status, and whether or not it has been withdrawn.

I just wanted TensorFlow

Finally got TensorFlow to install on my Mac. You’d be tempted to think, “Jin, it’s just a pip install, surely?”

No, macOS begs to differ! You see, if you’re on a slightly older macOS version like I was (10.12), then you’d still be using TLS 1.0 – long story short, any request that pip makes to PyPI over TLS 1.0 gets rejected. And this cutoff came into effect something like a week ago – SAD! If you have macOS 10.13 or later, TLS should be set to 1.2 so you need not worry.

TL;DR:

  1. Get a new version of pip (10.0); see Stack Overflow post.
  2. Install any dependencies for pip as necessary by doing tons of source compilations.
  3. Install desired package(s) as necessary.
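If you want to check whether you are likely to hit this before pip starts failing, a rough sketch like the one below asks your Python interpreter which SSL library it is linked against and whether it exposes TLS 1.2. The helper name supports_tls_1_2 is mine, and treating the presence of ssl.PROTOCOL_TLSv1_2 as the indicator is an assumption: it tells you what the interpreter was built with, not what PyPI will actually accept.

```python
import ssl


def supports_tls_1_2():
    """True if this interpreter's ssl module was built with TLS 1.2 support."""
    return hasattr(ssl, "PROTOCOL_TLSv1_2")


# Version string of the SSL library Python is linked against.
print("Linked SSL library:", ssl.OPENSSL_VERSION)
print("TLS 1.2 available:", supports_tls_1_2())
```

If the second line prints False, upgrading pip (step 1 above) or using a Python build linked against a newer SSL library is probably where to start.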

Biophysical Society 62nd Annual Meeting

In February I was very fortunate to attend the Biophysical Society 62nd Annual Meeting, which was held in San Francisco – my first real conference and my first trip to North America. Despite arriving with the flu, I had a great time! The conference took place over five days, during which there were manageable 15-minute talks covering a huge range of Biophysics-related topics, and a few thousand more posters on display (including mine). With almost 6,500 attendees, it was also large enough to slip across the road to the excellent SF Museum of Modern Art without anyone noticing.

The best presentation of the conference was, of course, Saulo’s talk on integrating biological folding features into protein structure prediction [1]. Aside from that, here are a few more of my favourites:

Folding proteins from one end to the other
Micayla A. Bowman, Patricia L. Clark [2]

Here in the COFFEE (COtranslational Folding Family of Expert Enthusiasts) office, we love to talk about the vectorial nature of cotranslational folding and how it contributes to the efficiency of protein folding in vivo. Micayla Bowman and Patricia Clark have created a novel technique that will allow the effects of this vectorial folding to be investigated specifically in vitro.

The Clp complex grabs, unfolds and degrades proteins (diagram from [3]). ClpX, the translocase unit of this complex, was used to recapitulate vectorial protein refolding in vitro for the first time.

ClpX is an AAA+ molecular motor that grabs proteins and translocates them through its pore. In vivo, its role is to denature substrates and feed them to an associated protease (ClpP) [3]. Bowman & Clark have used protein tags to initiate translocation of the target protein through ClpX, resulting in either N-C or C-N vectorial refolding.

The YKB construct used to demonstrate the vectorial folding mediated by ClpX (diagram from [4]).

They demonstrate the effect using YKB, a construct with two mutually exclusive native states: YK-B (fluoresces yellow) and Y-KB (fluoresces blue) [4]. In vitro refolding results in an equal proportion of yellow and blue states. Cotranslational folding, which proceeds in the N-C direction, biases towards the yellow (YK-B) state. C-N refolding in the presence of ClpX and ATP biases towards the blue (Y-KB) state. With this neat assay, they demonstrate that ClpX can mediate vectorial folding in vitro, and they plan to use the assay to investigate its effect on protein folding pathways and yields.

An ambiguous view of protein architecture
Guillaume Postic, Charlotte Perin, Yassine Ghouzam, Jean-Christophe Gelly [Poster abstract: 5, Paper: 6]

This work addresses the ambiguity of domain definition by assigning multiple possible domain boundaries to protein structures. Their automated method, SWORD (Swift and Optimised Recognition of Domains), performs protein partitioning via the hierarchical clustering of protein units (PUs) [7], which are smaller than domains and larger than secondary structures. The structure is first decomposed into protein units, which are then merged depending on the resulting “separation criterion” (relative contact probabilities) and “compactness” (contact density).

Their method is able to reproduce the multiple conflicting definitions that often exist between domain databases such as SCOP and CATH. Additionally, they present a number of cases for which the alternative domain definitions have interesting implications, such as highlighting early folding regions or functional subdomains within “single-domain” structures.

Alternative SWORD domain delineations identify (R) an ultrafast folding domain and (S,T) stable autonomous folding regions within proteins designated single-domain by other methods [6]

Dual function of the trigger factor chaperone in nascent protein folding
Kaixian Liu, Kevin Maciuba, Christian M. Kaiser [8]

The authors of this work used optical tweezers to study the cotranslational folding of the first two domains of the five-domain protein elongation factor G.

In agreement with a number of other presentations at the conference, they report that interactions with the ribosome surface during the early stages of translation slow folding by stabilising disordered states, preventing both native and misfolded conformations. They found that the N-terminal domain (G domain) folds independently, while the subsequent folding of the second domain (domain II) requires the presence of the folded G domain. Furthermore, while partially extruded, unfolded domain II destabilises the native G domain conformation and leads to misfolding. This is prevented in the presence of the chaperone Trigger factor, which protects the G domain from unproductive interactions and unfolding by stabilising the native conformation. This work demonstrates interesting mechanisms by which Trigger factor and the ribosome can influence the cotranslational folding pathway.

Optical tweezers are used to interrogate the folding pathway of a protein during stalled cotranslational folding. Mechanical force applied to the ribosome and the N-terminus of the nascent chain causes unfolding events, which can be identified as sudden increases in the extension of the chain. (Figure from [9])
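As a rough illustration of what "sudden increases in the extension" means in practice, here is a minimal sketch that flags jumps in an extension trace. The function name, the 5 nm threshold and the trace values are all made up for this example; real analyses of tweezers data are considerably more careful about noise and drift.

```python
def find_unfolding_events(extension_nm, min_jump_nm=5.0):
    """Return indices where the extension rises by more than min_jump_nm
    relative to the previous point (hypothetical threshold)."""
    return [
        i
        for i in range(1, len(extension_nm))
        if extension_nm[i] - extension_nm[i - 1] > min_jump_nm
    ]


# Made-up trace (in nm) with two sudden increases.
trace = [100.0, 100.4, 99.8, 112.1, 112.5, 111.9, 125.0]
events = find_unfolding_events(trace)
print(events)  # [3, 6]
```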

Predicting protein contact maps directly from primary sequence without the need for homologs
Thrasyvoulos Karydis, Joseph M. Jacobson [10]

The prediction of protein contacts from primary sequence is an enormously powerful tool, particularly for predicting protein structures. A major limitation is that current methods using coevolution inference require a large multiple sequence alignment, which is not possible for targets without many known homologous sequences.
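For anyone who has not met contact maps before, the sketch below builds one from known coordinates: two residues count as "in contact" if their representative atoms are closer than a distance cutoff. The 8 Å cutoff is a commonly used convention and an assumption here, as are the function name and the toy coordinates; none of this comes from the talk. Contact prediction methods try to produce this kind of matrix from sequence alone.

```python
import math


def contact_map(coords, threshold=8.0, min_separation=1):
    """Binary, symmetric contact map from one 3D point per residue.

    threshold is in angstroms (8.0 is an assumed, commonly used cutoff);
    min_separation skips pairs that are too close in sequence.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < threshold:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


# Made-up coordinates: four residues spaced 3.8 angstroms apart on a line.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
cmap = contact_map(toy_coords)
for row in cmap:
    print(row)
```

In this toy chain the first and last residues are 11.4 Å apart, so they are the only pair not marked as a contact.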

In this talk, Thrasyvoulos Karydis presented CoMET (Convolutional Motif Embeddings Tool), a tool to predict protein contact maps without a multiple sequence alignment or coevolution data. They extract structural and sequence motifs from known sequence-structure pairs, and use a Deep Convolutional Neural Network to associate sequence and structure motif embeddings. The method was trained on 137,000 sequence-structure pairs with a maximum of 256 residues, and is able to recreate contact map patterns with low resolution from primary sequence alone. There is no paper on this yet, but we’ll be looking out for it!


1. de Oliveira, S.H. and Deane, C.M., 2018. Exploring Folding Features in Protein Structure Prediction. Biophysical Journal, 114(3), p.36a.
2. Bowman, M.A. and Clark, P.L., 2018. Folding Proteins From One End to the Other. Biophysical Journal, 114(3), p.200a.
3. Baker, T.A. and Sauer, R.T., 2012. ClpXP, an ATP-powered unfolding and protein-degradation machine. Biochimica et Biophysica Acta (BBA)-Molecular Cell Research, 1823(1), pp.15-28.
4. Sander, I.M., Chaney, J.L. and Clark, P.L., 2014. Expanding Anfinsen’s principle: contributions of synonymous codon selection to rational protein design. Journal of the American Chemical Society, 136(3), pp.858-861.
5. Postic, G., Périn, C., Ghouzam, Y. and Gelly, J.C., 2018. An Ambiguous View of Protein Architecture. Biophysical Journal, 114(3), p.46a.
6. Postic, G., Ghouzam, Y., Chebrek, R. and Gelly, J.C., 2017. An ambiguity principle for assigning protein structural domains. Science advances, 3(1), p.e1600552.
7. Gelly, J.C. and de Brevern, A.G., 2010. Protein Peeling 3D: new tools for analyzing protein structures. Bioinformatics, 27(1), pp.132-133.
8. Liu, K., Maciuba, K. and Kaiser, C.M., 2018. Dual Function of the Trigger Factor Chaperone in Nascent Protein Folding. Biophysical Journal, 114(3), p.552a.
9. Liu, K., Rehfus, J.E., Mattson, E. and Kaiser, C., 2017. The ribosome destabilizes native and non‐native structures in a nascent multi‐domain protein. Protein Science.
10. Karydis, T. and Jacobson, J.M., 2018. Predicting Protein Contact Maps Directly from Primary Sequence without the Need for Homologs. Biophysical Journal, 114(3), p.36a.