The Curious Case Of A Human Chimera

In my role as a PhD student in the OPIG group, I integrate and analyse data from various biological, chemical and data sources. As I am interested in the intersection between chemistry, biology and daily life, it seems suitable that my next BLOPIG posts will discuss and highlight how biological phenomena have either influenced law or history.

Connection between Law and Biology – The Curious Case Of A Human Chimera
Our scene opens in a dark lab, where a scientist injects himself with an unknown substance. The voice-over notes that they created a monster named “Chimera” while searching for their hero “Bellerophon”. This is the famous opening scene of the movie “Mission Impossible II”, where we are introduced to the dangerous bioweapon “Chimera”, a combination of multiple diseases. As the Chimera is a beast from Ancient Greek mythology, with a lion’s head, a goat’s body, and a serpent’s tail, the naming of this bioweapon seems appropriate.

What does this dangerous mixture of multiple diseases, an ancient mythological monster and the promised connection between law and biology have in common?

Apart from a really bad joke, the term “Chimera” is an actual term in biology to describe a biological entity of multiple diverse components, e.g. a human organism, whose cells are composed of distinct genotypes.
In the case of tetragametic chimerism, human chimeras thus carry two genetically distinct sets of twenty-three chromosome pairs instead of the “usual” single set, and as such, each organ or tissue is built according to the DNA of the cell line it derives from.
Tetragametic chimerism occurs when two ova are fertilized by two spermatozoa and develop into two zygotes. These zygotes subsequently fuse, and the result continues to develop into a single organism with two sets of DNA.1-2

But how did such a biological phenomenon like a chimera enter the court of law?

The Romans famously defined that the mother of a child is the one who gives birth to it (Mater semper certa est, which can be translated as “The mother is always certain”). I would like to point out that in the times of in-vitro fertilization, this principle is no longer viable, since a child can now have both a genetic mother and a birth mother.3
This principle was challenged in 2002, when Lydia Fairchild applied for state welfare in the US for her two children and her third, unborn child. Paternity tests were conducted on the children to prove her ex-partner’s paternity. While the tests proved his paternity without a doubt, Lydia was shown to be no genetic match to her children.

As Lydia Fairchild stood accused of “welfare fraud” or of being a surrogate, the judge ordered that she had to give birth to her third child in front of witnesses. Blood samples were taken immediately, which revealed that Lydia Fairchild did not share DNA with this child either, despite giving birth to it. With Lydia now accused of being a surrogate, her case looked dire.
Fortunately, Lydia’s lawyer read a journal article about a similar case involving a woman named Karen Keegan.2, 4-5 Karen, a 52-year-old woman, had renal failure. As she needed a kidney transplant, Karen’s sons underwent histocompatibility testing as potential donors. Yet the genetic tests showed that only one of her three sons was related to her.1 Material from her entire body was tested for genetic matches to her sons’ DNA, but only genetic material from her thyroid matched her sons.2
Ultimately, the researchers concluded that Karen was a tetragametic chimera, born of the fusion of her zygote and her twin sibling in her mother’s womb. As Dr. Lynne Uhl, a pathologist and doctor of transfusion medicine at Beth Israel Deaconess Medical Center in Boston, said:
“In her blood, she was one person, but in other tissues, she had evidence of being a fusion of two individuals.”6

Subsequently, scientists collected Lydia’s cell material from various body parts and tested for a genetic match with her children. The DNA from her cervical smear was found to be a match, while the DNA collected from her skin and hair was not. Additionally, DNA samples from Lydia’s mother matched her children’s DNA.4-5

Interestingly, while both Lydia and Karen were carrying two sets of DNA as a result of prenatal fusions with their twins, they didn’t show any phenotypic sign of being a chimera, e.g. different skin types or the so-called Blaschko lines.7-8

 

  1. https://www.scientificamerican.com/article/3-human-chimeras-that-already-exist/
  2. Yu, N. et al. Disputed maternity leading to identification of tetragametic chimerism. New England Journal of Medicine 346, 1545–1552 (2002).
  3. https://en.wikipedia.org/wiki/Mater_semper_certa_est
  4. https://pictorial.jezebel.com/one-person-two-sets-of-dna-the-strange-case-of-the-hu-1689290862
  5. https://web.archive.org/web/20140301211020/http://www.essentialbaby.com.au/life-style/nutrition-and-wellbeing/when-your-unborn-twin-is-your-childrens-mother-20140203-31woi.html
  6. http://abcnews.go.com/Primetime/shes-twin/story?id=2315693
  7. https://jamanetwork.com/journals/jamadermatology/fullarticle/419529
  8. http://biologicalexceptions.blogspot.co.uk/2015/09/when-youre-not-just-yourself.html

All links were last viewed on the 24.04.2018.

My next blog post: Can a mismatch in maternal DNA threaten a government? How Biology can Influence History.

I just wanted TensorFlow

Finally got TensorFlow to install on my Mac. You’d be tempted to think, “Jin, it’s just a pip install, surely?”

No, macOS begs to differ! You see, if you’re on a slightly older macOS version like I was (10.12), then you’d still be using TLS 1.0 – long story short, when querying PyPI via pip to get any packages over TLS 1.0, your requests will get rejected. And this cutoff came into effect something like a week ago – SAD! If you have macOS 10.13 or later, TLS 1.2 should be used by default, so you need not worry.

TL;DR:

  1. Get a new version of pip (10.0); see Stack Overflow post.
  2. Install any dependencies for pip as necessary by doing tons of source compilations.
  3. Install desired package(s) as necessary.
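
If you want to check whether a given Python build is likely to be affected before fighting with pip, a small sketch like the one below can help. It only uses the standard `ssl` module and reports which OpenSSL library the interpreter is linked against and whether TLS 1.2 is available. The helper name `tls_report` is made up for illustration, and the result describes the interpreter you run it with, which is assumed (not guaranteed) to be the same one pip uses.

```python
import ssl


def tls_report():
    """Summarise the TLS support of the OpenSSL library this Python links against."""
    return {
        "openssl_version": ssl.OPENSSL_VERSION,
        # PROTOCOL_TLSv1_2 is only defined when the linked OpenSSL supports TLS 1.2
        "supports_tls_1_2": hasattr(ssl, "PROTOCOL_TLSv1_2"),
    }


if __name__ == "__main__":
    for key, value in tls_report().items():
        print(f"{key}: {value}")
```

If `supports_tls_1_2` comes back as `False`, upgrading pip alone is unlikely to be enough, since the limitation sits in the SSL library underneath it.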

Fun with Proteins and 3D Printing!

When I’m not postdoc-ing, as part of my job I’m also involved with teaching at the Doctoral Training Centre here in Oxford. I mainly teach the first-year students of the Systems Approaches to Biomedical Science CDT – many members of this group are doing (or have done) their DPhils through this program (including myself!). Recently, I and some other OPIGlets were responsible for two modules called Structural Biology and Structure-Based Drug Discovery, and as part of those modules we arranged a practical session on 3D printing.

Most of the time, the way we ‘see’ protein structures is through a computer screen, using visualisation software such as PyMOL. While useful, these virtual representations have their limitations – since the screen is flat, it’s difficult to get a proper feel for the structure1, and seeing how your protein could interact and form assemblies with others is difficult. Physical, three-dimensional models, on the other hand, allow you to get hands-on with your structure, and understand aspects of your protein that couldn’t be gained from simply looking at images. Plus, they look pretty cool!

This year, I printed three proteins for myself (shown in the photo above). Since my most recent work has focused on transmembrane proteins, I felt it was only right to print one – these are proteins that cross membranes, usually to facilitate the transport of molecules in and out of the cell. I chose the structure of a porin (top of the photo), which (as the name suggests) forms a pore in the cell membrane to allow diffusion across it. This particular protein (1A0S) is a sucrose-specific porin from a type of bacteria called Salmonella typhimurium, and it has three chains (coloured blue, pink and purple in the printed model), each of which has a beta barrel structure. You can just about see in the photo that each chain has regions which are lighter in colour – these are the parts that sit in the cell membrane layer; the darker regions are therefore the parts that stick out from the membrane.

My second printed model was the infamous Zika virus (bottom right). Despite all the trouble it has caused in recent years, in my opinion the structure of the Zika virus is actually quite beautiful, with the envelope proteins forming star-like shapes in a highly symmetrical pattern. This sphere of proteins contains the viral RNA. The particular structure I used to create the model (5IRE) was solved using cryo-electron microscopy, and required aligning over 10,000 images of the virus.

Finally, I printed the structure of a six-residue peptide that’s probably only interesting to me… Can you tell why?!2

 

1 – However, look at this link for an example of looking at 3D structures using augmented reality!

2 – Hint: Cysteine, Leucine, Alanine, Isoleucine, Arginine, Glutamic Acid…

The Seven Summits

Last week my boyfriend Ben Rainthorpe returned from Argentina having successfully climbed Aconcagua – the highest mountain in South America. At a staggering 6963 m above sea level it is the highest peak outside Asia. The climb took 20 days in total with a massive 14 hours of hiking and climbing on summit day.

Aconcagua is part of the mountaineering challenge known as the Seven Summits. This is achieved by summiting the highest mountain in each of the seven continents. This was first successfully completed in 1985 by Richard Bass. In 1992 Junko Tabei became the first woman to complete the challenge. In December Ben quit his job as a primary teacher to follow his dream of achieving this feat. Which mountains constitute the seven summits is debated and there are a number of different lists. In addition the challenge can be extended by including the highest volcano in each continent.

The Peaks:

1. Kilimanjaro – Africa (5895 m)

Kilimanjaro is usually the starting point for the challenge. At 5895 m above sea level and with no technical climbing required, it is a good introduction to high altitude trekking. However, this often means it is underestimated, and the most common cause of death on the mountain is altitude sickness.

2. Aconcagua – South America (6963 m)

The next step up from Kilimanjaro, Aconcagua is the second highest of the Seven Summits. However, the lack of technical climbing makes it a good second peak to ascend after Kilimanjaro, although crampons and ice axes are required and the trek takes three weeks instead of one.

3. Elbrus – Europe (5642 m)

Heralded as the Kilimanjaro of Europe, Elbrus even has a chair lift part of the way up! This mountain is regularly underestimated, causing a high number of fatalities per year. Due to snowy conditions, crampons and ice axes are once again required. Some believe that Elbrus should not count as the European peak and that Mont Blanc should be summited instead – a much more technical and dangerous climb.

4. Denali – North America (6190 m)

Denali is a difficult mountain to summit. Although slightly lower than other peaks, the distance from the equator means the effects of altitude are more keenly felt. More technical skills are needed. In addition there are no porters to help carry additional gear so climbers must carry a full pack and drag a sled.

5. Vinson Massif – Antarctica (4892 m)

Vinson is difficult because of its location rather than any technical climbing. The costs of going to Antarctica are great and the conditions are something to be battled with.

6. Puncak Jaya – Australasia (4884 m) or Kosciuszko – Australia (2228 m)

The original Seven Summits included Mount Kosciuszko of Australia – the shortest and easiest climb on the list. However it is now generally agreed that Puncak Jaya is the offering from the Australasia continent. Despite being smaller than others on the list this is the hardest of the seven to climb with the highest technical rating. It is also located in an area that is highly inaccessible to the public due to a large mine, and is one of the few where a rescue by helicopter is not possible.

7. Everest – Asia (8848 m)

Everest is the highest mountain in the world at 8848 m above sea level. Many regard the trek to Everest Base Camp as challenge enough. Some technical climbing is required as well as bottled oxygen to safely reach altitudes of that level. One of the most dangerous parts is the Khumbu Icefall which must be traversed every time the climbers leave base camp. As of 2017 at least 300 people have died on Everest – most of their bodies still remain on the mountain.

Ben has now climbed two of the Seven Summits. His immediate plans are to tackle Elbrus in July (which I might try and tag along to) and Vinson next January. If you are interested in his progress, check out his Instagram (@benrainthorpe).

TCR Database

Back-to-back posting – I wanted to talk about the growing volume of TCR structures in the PDB. A couple of weeks ago, I presented my database to the group (STCRDab), which is now available at http://opig.stats.ox.ac.uk/webapps/stcrdab.

Unlike other databases, STCRDab is fully automated and updates on Fridays at 9 AM (GMT), downloading new TCR structures and annotating them with the IMGT numbering (this also applies to MHCs!). Although the dataset is significantly smaller than, say, the set of antibody structures (currently 3000+ structures and growing), the recent approval of CAR therapies (Kymriah, Yescarta) and the rise of interest in TCR engineering (e.g. Glanville et al., Nature, 2017; Dash et al., Nature, 2017) point toward the value of TCR structures.

Feel free to read more in the paper, and here are some screenshots. 🙂

STCRDab front page.

Look! 5men, literally.

Possibly my new favourite PDB code.

STCRDab annotates structures automatically every Friday!

ABodyBuilder and model quality

Currently I’m working on developing a new strategy to use FREAD within the ABodyBuilder pipeline. While running some tests, I’ve realised that there were some minor miscalculations of the CDR loops’ RMSD values in my paper.

To start with, the main message of the paper remains the same; the overall quality of the models (Fv RMSD) was correct, and still is. ABodyBuilder isn’t necessarily the most accurate modelling methodology per se, but it’s unique in its ability to estimate RMSD. ABodyBuilder would still be capable of doing this calculation regardless of what the CDR loops’ RMSD may be. This is because the accuracy estimation looks at the RMSD data and places a probability that a new model structure would have some RMSD value “x” (given the CDR loop’s length). Our website has now been updated in light of these changes too.
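
To make the idea of an RMSD probability concrete, here is a minimal sketch of one way such an estimate could be formed: take RMSD values seen previously for loops of a given length and ask what fraction fall at or below a threshold “x”. This is an illustration of the general idea only and is not the code ABodyBuilder runs; the function name and the RMSD values are hypothetical.

```python
from bisect import bisect_right


def prob_rmsd_at_most(observed_rmsds, threshold):
    """Empirical P(RMSD <= threshold) from past RMSD values for one loop length."""
    if not observed_rmsds:
        raise ValueError("need at least one observed RMSD")
    ordered = sorted(observed_rmsds)
    # bisect_right counts how many observed values are <= threshold
    return bisect_right(ordered, threshold) / len(ordered)


# Hypothetical RMSD values (in angstroms) for CDR loops of a single length
past_rmsds = [0.5, 0.8, 1.1, 1.4, 2.0, 2.6, 3.1, 4.0]
print(prob_rmsd_at_most(past_rmsds, 2.0))
```

Because the estimate is built from whatever RMSD data it is given, correcting the underlying RMSD values changes the numbers that come out but not the way the estimate is made.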

Update to Figure 2 of the paper.

Update to Figure S4 of the paper.

Update to Figure S5 of the paper.

Paper review: “Inside the black box”

There are nearly 17,000 Oxford students on taught courses. They turn up reliably every October. We send them to an army of lecturers and tutors, drawn from every rank of the research hierarchy. As members of that hierarchy, we owe it to the students – all 17,000 of them – to teach them as best we can.

And where can we learn the most about how to teach? There are 438,000 professional teachers in the UK. Maybe people who spend all of their working time on the subject might have good strategies to help people learn.

The context of the paper

Teachers obsess over assessment. Assessment is the process by which teachers figure out what students have learned. It is probably true that assessment is the only reason we have classrooms at all.

Inside the Black Box is in the vanguard of recent changes in educational thinking. Modern teaching regards good pedagogy as a practical skill. Like other types of performance, it depends on a specific set of concrete actions which can be taught and learned. Not everyone is a natural teacher – but nearly everyone can become a competent teacher.

Formative assessment is the focus of Inside the Black Box. The article argues that this process, in which teachers figure out what students know and tell them how it’s going wrong, is essential to good classroom practice.

What is the black box?

The black box is the classroom. After societal convulsions over class sizes, funding deficits, curriculum reforms, and examination structure, it’s time – says the article, in 2001 – that we focus on what actually goes on inside the classroom. These social changes, it says, adjust the inputs to the black box, and society expects better things out of the black box. But what if changing the inputs makes the work inside the black box harder? Don’t we have an obligation to figure out what needs to happen to get students to learn?

The article touches three questions:

  • Is there evidence that improving formative assessment raises standards?
  • Is there evidence that there is room for improvement?
  • Is there evidence about how to improve formative assessment?

The answers are yes, yes, and yes. In meta-analyses of educational experiments, formative assessment consistently raises standards. These experiments match the experience of teachers, who know that the least effective lessons are those which do not respond to students’ needs. Standard observations – such as those from Ofsted – ask teachers three questions in turn: what are they learning, how do you know, and what are you doing about it?

The second question – is there room for improvement? – is one they address in great detail in the context of primary and secondary education. Some criticisms (the giving of grades for its own sake, unintentional encouragement of “rote or superficial learning”, relentless competition between students) seem applicable in different parts of our university context. A greater weakness is a lack of emphasis. People engaged in university teaching frequently center the delivery of knowledge instead of learning, an idea exacerbated by our obsession with lectures and masked by the long lag between those lectures and the exams in which we assess them.

Recommendations

Inside the Black Box makes specific recommendations for instructors about how to engage in formative assessment. Those recommendations – unusually, for an item in the educational literature – are specific and detailed. But rather than focus on them, it is worth examining three themes which run across the article.

The overriding focus is the importance of formative assessment. If we care about what students learn, then we’ve got to be checking what it is that they actually are learning. Opportunities for formative assessment should be “designed into any piece of teaching”. In extremis, this idea has interesting implications for the institution of lectures, which generally lack them entirely.

A subsidiary idea is the importance of setting clear objectives for learning. Too many students view learning as a series of exercises rather than a step in the formation of a coherent body of knowledge. The overarching direction should be made clear. And on a more detailed level, we need to be explicit about what outcomes we want our students to obtain so that they know whether they are making satisfactory progress. Formative assessment must make reference to expectations, and formative self- or peer assessment becomes impossible if those expectations are not well-understood.

And this discussion ties into a final point: when students truly apply themselves to the task of learning, their self-perception and self-esteem become bound up in it. Ineffective expectation-setting and insufficient clarity about the means for improvement result in students feeling demotivated, which causes them to revise their goals downward. They put in less effort and achieve outcomes that are worse. These effects are costly and can be avoided by effective formative assessment.

Inside the Black Box is a diversion from our diet of scientific articles, but I think it is worth our attention. Pedagogy is difficult to get right. In the university context, good practice is the subject of little attention and rarely assessed. Thinking about good assessment means that our students benefit.

But all communication activities are a form of teaching. Really good teachers communicate really well. When good communication happens, everyone benefits, inside and outside the black box.

Typography in graphs.

Typography [tʌɪˈpɒɡrəfi]
    n.: the style and appearance of printed matter.

Perhaps an often glossed-over aspect of making graphs, having the right font goes a long way. Not only do we get to use a “pretty” font that we like, there is also the aesthetic satisfaction of having everything (e.g. in a PhD thesis) in the same font, i.e. both the text and the graphs.

Fonts can be divided into two types: serif and sans-serif. Basically, serif fonts are those where the letters have little “bits” at the end; think of Times New Roman or Garamond as the classic examples. Sans-serif fonts are those that lack these bits, giving a more “blocky”, clean finish – think of Arial or Helvetica as classic examples.

Typically, serif fonts are better for books/printed materials, whereas sans-serif fonts are better for web/digital content. So what about graphs? Especially those that may go out into the public domain (whether through publishing or on a web site)?

This largely boils down to user preference, and choosing the right font is not trivial. Supposing that you have a font you like (say, from Google Fonts), there are a few things you need to do (e.g. make sure that your TeX distribution and Illustrator have the font). However, this post is concerned with how we can use custom fonts in a graph generated by Matplotlib, and why this is useful. My favourite picks for fonts include Roboto and Palatino.

The default font in matplotlib isn’t the prettiest (I think) for publication/keeping purposes, but I digress…

To start, let’s generate a histogram of 1000 random numbers from a normal distribution.
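
The plotting code itself is not shown in this post, so the following is a minimal sketch of what such a histogram could look like. The seed, bin count, axis labels and output file name are arbitrary choices made for illustration.

```python
import random

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt

# 1000 draws from a standard normal distribution (mean 0, standard deviation 1)
random.seed(0)
values = [random.gauss(0.0, 1.0) for _ in range(1000)]

fig, ax = plt.subplots()
counts, bin_edges, _ = ax.hist(values, bins=30)
ax.set_xlabel("Value")
ax.set_ylabel("Count")
ax.set_title("1000 draws from a normal distribution")
fig.savefig("histogram.pdf")
```

Saving to PDF is deliberate here: the font type matplotlib embeds only becomes visible in vector output.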

The default font in matplotlib, Bitstream Vera Sans, isn’t the prettiest thing on earth. It does the job, but it isn’t my go-to choice if I can change it. Plus, with lots of journals asking for Type 1/TrueType fonts for images, there’s even more reason to change this anyway (matplotlib, by default, generates graphs using Type 3 fonts!). If we now change to Roboto or Palatino, we get the following:

Sans-serif Roboto.

Serif font Palatino.

Basically, the bits we need to include at the beginning of our code are here:

# Import matplotlib's options-setting helpers
from matplotlib import rc, rcParams

# Set PDF font type to TrueType (42) - not necessary but useful for publications
rcParams['pdf.fonttype'] = 42

# Use ONE of the two options below, depending on the font you want.

# For sans-serif
rc("font", **{"family": "sans-serif", "sans-serif": ["Roboto"]})

# For serif - matplotlib uses sans-serif family fonts by default
# To render serif fonts, you also need to tell matplotlib to use LaTeX in the backend.
rc("font", **{"family": "serif", "serif": ["Palatino"]})
rc("text", usetex = True)

This not only guarantees that images are generated using a font of our choice, but it gives a Type 1/TrueType font too. Ace!
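
One caveat: matplotlib can only use a font it can actually find, and otherwise it falls back to its default. A quick sanity check, sketched below, is to look the name up in the list of fonts matplotlib's font manager has discovered. The helper name `font_available` is made up for illustration, and the check only covers fonts matplotlib renders itself; with usetex enabled, the font is handled by your LaTeX installation instead.

```python
from matplotlib import font_manager


def font_available(name):
    """Return True if matplotlib's font manager knows a font family called `name`."""
    known = {entry.name for entry in font_manager.fontManager.ttflist}
    return name in known


for family in ("Roboto", "Palatino", "DejaVu Sans"):
    print(family, "->", "found" if font_available(family) else "not found")
```

If a freshly installed font is reported as not found, matplotlib's font cache may simply be out of date.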

Happy plotting.

Biological Space – a starting point in in-silico drug design and in experimentally exploring biological systems

What is the “biological space” and why is this space so important for all researchers interested in developing novel drugs? In the following, I will first establish a definition of the biological space and then highlight its use in computationally developing novel drug compounds and as a starting point in the experimental exploration of biological systems.

While chemical space has been defined as the entirety of all possible chemical compounds which could ever exist, the definition of biological space is less clear. In the following, I define biological space as the area(s) of chemical space that possess biologically active (“bioactive”) compounds for a specific target or target class1. As such, they can modulate a given biological system and subsequently influence disease development and progression. In the literature, this space has also been called “biologically relevant chemical space”2.

Only a small percentage of the vast chemical space has been estimated to be biologically active and thus relevant for drug development, so randomly searching for bioactive compounds in chemical space with no prior information resembles the search for “the needle in a haystack”. Hence, it should come as no surprise that bioactive molecules are often used as a starting point in in-silico explorations of biological space.
The plethora of in-silico methods for this task includes similarity and pharmacophore searching methods3-6 for novel compounds, scaffold-hopping approaches to derive novel chemotypes7-8 or the development of quantitative structure-activity relationships (QSAR)9-10 to explore the interplay between the 3D chemical structure and its biological activity towards a specific target.
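
As a rough illustration of the similarity-searching idea, the sketch below ranks a small library of compounds against a known bioactive query using the Tanimoto coefficient, a commonly used similarity measure for molecular fingerprints. Everything concrete here is an assumption made for the example: fingerprints are represented as plain sets of “on” bits, and the compound names and bit values are invented. Real applications would derive fingerprints from chemical structures with a cheminformatics toolkit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of 'on' bits."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_by_similarity(query_fp, library):
    """Sort (name, score) pairs by decreasing similarity to the query."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical fingerprints: each integer stands for one structural feature bit
query = {1, 4, 7, 9}
library = {
    "compound_A": {1, 4, 7, 8},
    "compound_B": {2, 3, 5},
    "compound_C": {1, 4, 7, 9, 12},
}
for name, score in rank_by_similarity(query, library):
    print(f"{name}: {score:.2f}")
```

Compounds at the top of such a ranking are the ones most likely to sit in the same region of chemical space as the query, which is why a known bioactive molecule makes a useful starting point.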

The biological space comprises small molecules which are active on specific targets. If researchers want to explore the role of targets in a given biological system experimentally, they can use small molecules which are potent and selective towards a specific target (and thus confined to a particular area in chemical space)11-12.
Due to their high selectivity (e.g. a greater than 30-fold selectivity over proteins of the same family12), these so-called “tool compounds” can help establish the biological tractability of a target – the relationship between the target and a given phenotype – and its clinical tractability – the availability of biomarkers11. They are thus highly complementary to methods such as RNAi, CRISPR12 and knock-out animals11. Consequently, tool compounds are used in drug target validation, and the information they provide on the biological system can increase the probability of a successful drug11. Most importantly, tool compounds are particularly useful for annotating targets in currently unexplored biological systems and are thus important for novel drug development13.
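
To show what a fold-selectivity threshold means in practice, here is a small sketch. It assumes selectivity is expressed as the ratio of a compound's IC50 against an off-target protein to its IC50 against the intended target, which is a common convention but is not spelled out in the text above. The function names and potency values are hypothetical.

```python
def fold_selectivity(ic50_target_nm, ic50_off_target_nm):
    """Ratio of off-target to on-target IC50; larger means more selective."""
    if ic50_target_nm <= 0 or ic50_off_target_nm <= 0:
        raise ValueError("IC50 values must be positive")
    return ic50_off_target_nm / ic50_target_nm


def passes_selectivity(ic50_target_nm, family_ic50s_nm, min_fold=30.0):
    """True if the compound exceeds `min_fold` selectivity over every family member."""
    return all(
        fold_selectivity(ic50_target_nm, off) > min_fold for off in family_ic50s_nm
    )


# Hypothetical potencies in nM: 10 nM on target, weaker on two related proteins
print(fold_selectivity(10.0, 500.0))
print(passes_selectivity(10.0, [500.0, 2000.0]))
print(passes_selectivity(10.0, [500.0, 150.0]))
```

In the last call the compound is only 15-fold selective over one family member, so it would fall short of a 30-fold criterion even though it is highly selective over the other.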

  1. Sophie Petit-Zeman, http://www.nature.com/horizon/chemicalspace/background/figs/explore_b1.html, accessed on 03.07.2016.
  2. Koch, M. A. et al. Charting biologically relevant chemical space: a structural classification of natural products (SCONP). Proceedings of the National Academy of Sciences of the United States of America 102, 17272–17277 (2005).
  3. Stumpfe, D. & Bajorath, J. Similarity searching. Wiley Interdisciplinary Reviews: Computational Molecular Science 1, 260–282 (2011).
  4. Bender, A. et al. How Similar Are Similarity Searching Methods? A Principal Component Analysis of Molecular Descriptor Space. Journal of Chemical Information and Modeling 49, 108–119 (2009).
  5. Ai, G. et al. A combination of 2D similarity search, pharmacophore, and molecular docking techniques for the identification of vascular endothelial growth factor receptor-2 inhibitors. Anti-Cancer Drugs 26, 399–409 (2015).
  6. Willett, P., Barnard, J. M. & Downs, G. M. Chemical Similarity Searching. Journal of Chemical Information and Computer Sciences 38, 983–996 (1998).
  7. Sun, H., Tawa, G. & Wallqvist, A. Classification of scaffold-hopping approaches. Drug Discovery Today 17, 310–324 (2012).
  8. Hu, Y., Stumpfe, D. & Bajorath, J. Recent Advances in Scaffold Hopping: Miniperspective. Journal of Medicinal Chemistry 60, 1238–1246 (2017).
  9. Cruz-Monteagudo, M. et al. Activity cliffs in drug discovery: Dr Jekyll or Mr Hyde? Drug Discovery Today 19, 1069–1080 (2014).
  10. Bradley, A. R., Wall, I. D., Green, D. V. S., Deane, C. M. & Marsden, B. D. OOMMPPAA: A Tool To Aid Directed Synthesis by the Combined Analysis of Activity and Structural Data. Journal of Chemical Information and Modeling 54, 2636–2646 (2014).
  11. Garbaccio, R. & Parmee, E. The Impact of Chemical Probes in Drug Discovery: A Pharmaceutical Industry Perspective. Cell Chemical Biology 23, 10–17 (2016).
  12. Arrowsmith, C. H. et al. The promise and peril of chemical probes. Nature Chemical Biology 11, 536–541 (2015).
  13. Fedorov, O., Müller, S. & Knapp, S. The (un)targeted cancer kinome. Nature Chemical Biology 6, 166–169 (2010).