Proteins are fascinating. They are ubiquitous in living organisms, carrying out all kinds of functions: from structural support to unbelievably powerful catalysis. And yet, despite their ubiquity, we are still bemused by their functioning, not to mention by how they came to be. As computational scientists, our research at OPIG is mostly about modelling proteins in different forms. We are a very heterogeneous group that leverages approaches of diverse scale: from modelling proteins as nodes in a complex interaction network, to full atomistic models that help us understand how they behave.
Exciting new studies in OAS
Hi everyone!
Today is the day for another blog post from me. Here, I would like to give you an update on new studies that were deposited in the Observed Antibody Space (OAS) resource, and take a closer look at one of these studies. To date, we have curated 57 studies in OAS, where we provide raw nucleotide and numbered amino acid sequences for download. These amino acid sequences have been filtered using ANARCI parsing, which ensures that the sequences align to the respective species HMM profiles and do not have unusual indels or frameshifts. More than 660 million numbered amino acid sequences are deposited in OAS, where every sequence keeps a link to its corresponding nucleotide sequence. Recently we added two more studies to OAS: Sheng et al. (2017) and Setliff et al. (2018). We numbered roughly 2.8 and 46 million sequences in the Sheng et al. and Setliff et al. studies, respectively. In this blog post, I would like to talk more about the uniqueness of the Setliff et al. data.
OPunting 2018
Hi everyone!
Today is the day to present to you my belated blog post on OPIG punting (or OPunting for short). I promise I was not procrastinating on writing it. I am currently not in Oxford, as I am visiting beautiful Zurich, as shown by the photo below.
The Curious Case Of A Human Chimera
In my role as a PhD student in the OPIG group, I integrate and analyse data from various biological, chemical and data sources. As I am interested in the intersection between chemistry, biology and daily life, it seems suitable that my next BLOPIG posts will discuss and highlight how biological phenomena have either influenced law or history.
Connection between Law and Biology – The Curious Case Of A Human Chimera
Our scene opens in a dark lab, where a scientist injects himself with an unknown substance. The voice-over notes that they created a monster named “Chimera” while searching for their hero “Bellerophon”. This is the famous opening scene of the movie “Mission Impossible II”, where we are introduced to the dangerous bioweapon “Chimera”, a combination of multiple diseases. As the “Chimera” is a beast from Ancient Greek mythology, with a lion’s head, a goat’s body, and a serpent’s tail, the naming of this bioweapon seems appropriate.
What does this dangerous mixture of multiple diseases, an ancient mythological monster and the promised connection between law and biology have in common?
Apart from a really bad joke, the term “Chimera” is an actual term in biology to describe a biological entity of multiple diverse components, e.g. a human organism, whose cells are composed of distinct genotypes.
In the case of tetragametic chimerism, human chimeras thus carry two distinct genomes, each with the “usual” set of twenty-three chromosome pairs, and as such, their organs and tissues are constructed according to the DNA present in the respective organ or tissue.
Tetragametic chimerism occurs by the fertilization of two ova by two spermatozoa, which develop into zygotes. These zygotes then subsequently fuse into one organism, which continues to develop into an organism with two sets of DNA.1-2
But how did such a biological phenomenon like a chimera enter the court of law?
The Romans famously defined that the mother of a child is the one who gives birth to it (Mater semper certa est, which can be translated as “The mother is always certain”). I would like to point out that in the times of in-vitro fertilization, this principle is no longer viable, since a child can now have both a genetic mother and a birth mother.3
This principle was challenged in 2002, when Lydia Fairchild applied to her US state for welfare for her two children and her third, unborn child. Paternity tests were conducted on all children to prove her ex-partner’s paternity. While the tests proved the paternity of the father without a doubt, Lydia was shown to be no genetic match to her children.
The judge, suspecting Lydia Fairchild of welfare fraud or of being a surrogate, ordered that she give birth to her third child in front of witnesses. Blood samples were taken immediately, which revealed that Lydia Fairchild did not share DNA with this child either, despite giving birth to it. With Lydia now accused of being a surrogate, her case looked dire.
Fortunately, Lydia’s lawyer read a journal article about a similar case involving a woman named Karen Keegan.2, 4-5 Karen, a 52-year-old woman, had renal failure. As she needed a kidney transplant, Karen’s sons underwent histocompatibility testing to see whether they could donate. Yet the genetic tests showed that only one of her three sons was related to her.1 Material from her entire body was tested for genetic matches to her sons’ DNA, but only genetic material of her thyroid matched her sons.2
Ultimately, the researchers concluded that Karen was a tetragametic chimera, born of the fusion of her zygote and her twin sibling in her mother’s womb. As Dr. Lynne Uhl, a pathologist and doctor of transfusion medicine at Beth Israel Deaconess Medical Center in Boston, said:
“In her blood, she was one person, but in other tissues, she had evidence of being a fusion of two individuals.”6
Subsequently, scientists collected Lydia’s cell material from various body parts and tested for a genetic match with her children. The DNA from her cervical smear was found to be a match, while the DNA collected from her skin and hair was not. Additionally, DNA samples from Lydia’s mother matched her children’s DNA.4-5
Interestingly, while both Lydia and Karen were carrying two sets of DNA as a result of prenatal fusions with their twins, they didn’t show any phenotypic sign of being a chimera, e.g. different skin types or the so-called Blaschko lines.7-8
- https://www.scientificamerican.com/article/3-human-chimeras-that-already-exist/
- Yu, N. et al. Disputed maternity leading to identification of tetragametic chimerism. N. Engl. J. Med. 346, (2002).
- https://en.wikipedia.org/wiki/Mater_semper_certa_est
- https://pictorial.jezebel.com/one-person-two-sets-of-dna-the-strange-case-of-the-hu-1689290862
- https://web.archive.org/web/20140301211020/http://www.essentialbaby.com.au/life-style/nutrition-and-wellbeing/when-your-unborn-twin-is-your-childrens-mother-20140203-31woi.html
- http://abcnews.go.com/Primetime/shes-twin/story?id=2315693
- https://jamanetwork.com/journals/jamadermatology/fullarticle/419529
- http://biologicalexceptions.blogspot.co.uk/2015/09/when-youre-not-just-yourself.html
All links were last viewed on the 24.04.2018.
My next blog post: Can a mismatch in maternal DNA threaten a government? How Biology can Influence History.
I just wanted TensorFlow
Finally got TensorFlow to install on my Mac. You’d be tempted to think, “Jin, it’s just a pip install, surely?”
No, macOS begs to differ! You see, if you’re on a slightly older macOS version like I was (10.12), then you’d still be using TLS 1.0 – long story short, when querying PyPI via pip to get any packages over TLS 1.0, your requests will get rejected. And this cutoff came into force something like a week ago – SAD! If you have macOS 10.13 or later, TLS should be set to 1.2 so you need not worry.
TL;DR:
- Get a new version of pip (10.0); see Stack Overflow post.
- Install any dependencies for pip as necessary by doing tons of source compilations.
- Install desired package(s) as necessary.
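Before going through the steps above, it can help to check what your Python can actually negotiate. The snippet below is only a rough diagnostic sketch: it inspects the `ssl` module of whichever interpreter runs it, and the helper name `tls_support_summary` is made up for this post. It assumes that the interpreter you run it with is the same one pip uses.

```python
import ssl


def tls_support_summary():
    """Report which OpenSSL build Python is linked against and whether it offers TLS 1.2."""
    # ssl.HAS_TLSv1_2 exists on Python 3.7+; on older interpreters fall back to
    # checking for the PROTOCOL_TLSv1_2 constant, which is only defined when the
    # underlying SSL library supports TLS 1.2.
    has_tls12 = getattr(ssl, "HAS_TLSv1_2", hasattr(ssl, "PROTOCOL_TLSv1_2"))
    return {"openssl": ssl.OPENSSL_VERSION, "tls_1_2": bool(has_tls12)}


if __name__ == "__main__":
    summary = tls_support_summary()
    print("Linked against: {}".format(summary["openssl"]))
    if summary["tls_1_2"]:
        print("This Python's ssl module offers TLS 1.2.")
    else:
        print("This Python's ssl module lacks TLS 1.2; an old pip will likely be rejected by PyPI.")
```

If the second message shows up, that is a strong hint that you are in the same situation I was in, and that upgrading pip is the place to start.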
Fun with Proteins and 3D Printing!
When I’m not postdoc-ing, as part of my job I’m also involved with teaching at the Doctoral Training Centre here in Oxford. I mainly teach the first-year students of the Systems Approaches to Biomedical Science CDT – many members of this group are doing (or have done) their DPhils through this program (including myself!). Recently, I and some other OPIGlets were responsible for two modules called Structural Biology and Structure-Based Drug Discovery, and as part of those modules we arranged a practical session on 3D printing.
Most of the time, the way we ‘see’ protein structures is through a computer screen, using visualisation software such as PyMOL. While useful, these virtual representations have their limitations – since the screen is flat, it’s difficult to get a proper feel for the structure1, and seeing how your protein could interact and form assemblies with others is difficult. Physical, three-dimensional models, on the other hand, allow you to get hands-on with your structure, and understand aspects of your protein that couldn’t be gained from simply looking at images. Plus, they look pretty cool!
This year, I printed three proteins for myself (shown in the photo above). Since my most recent work has focused on transmembrane proteins, I felt it was only right to print one – these are proteins that cross membranes, usually to facilitate the transport of molecules in and out of the cell. I chose the structure of a porin (top of the photo), which (as the name suggests) forms a pore in the cell membrane to allow diffusion across it. This particular protein (1A0S) is a sucrose-specific porin from a type of bacteria called Salmonella typhimurium, and it has three chains (coloured blue, pink and purple in the printed model), each of which has a beta barrel structure. You can just about see in the photo that each chain has regions which are lighter in colour – these are the parts that sit in the cell membrane layer; the darker regions are therefore the parts that stick out from the membrane.
My second printed model was the infamous Zika virus (bottom right). Despite all the trouble it has caused in recent years, in my opinion the structure of the Zika virus is actually quite beautiful, with the envelope proteins forming star-like shapes in a highly symmetrical pattern. This sphere of proteins contains the viral RNA. The particular structure I used to create the model (5IRE) was solved using cryo-electron microscopy, and required aligning over 10,000 images of the virus.
Finally, I printed the structure of a six-residue peptide, that’s probably only interesting to me… Can you tell why?!2
1 – However, look at this link for an example of looking at 3D structures using augmented reality!
2 – Hint: Cysteine, Leucine, Alanine, Isoleucine, Arginine, Glutamic Acid…
The Seven Summits
Last week my boyfriend Ben Rainthorpe returned from Argentina having successfully climbed Aconcagua – the highest mountain in South America. At a staggering 6963 m above sea level it is the highest peak outside of Asia. The climb took 20 days in total with a massive 14 hours of hiking and climbing on summit day.
Aconcagua is part of the mountaineering challenge known as the Seven Summits. This is achieved by summiting the highest mountain in each of the seven continents. This was first successfully completed in 1985 by Richard Bass. In 1992 Junko Tabei became the first woman to complete the challenge. In December Ben quit his job as a primary teacher to follow his dream of achieving this feat. Which mountains constitute the seven summits is debated and there are a number of different lists. In addition the challenge can be extended by including the highest volcano in each continent.
The Peaks:
1. Kilimanjaro – Africa (5895 m)
Kilimanjaro is usually the starting point for the challenge. At 5895 m above sea level and no technical climbing required it is a good introduction to high altitude trekking. However, this often means it is underestimated and the most common cause of death on the mountain is altitude sickness.
2. Aconcagua – South America (6963 m)
The next step up from Kilimanjaro, Aconcagua is the second highest of the seven summits. The lack of technical climbing required makes it a good second peak to ascend after Kilimanjaro, although crampons and ice axes are needed and the trek takes three weeks instead of one.
3. Elbrus – Europe (5642 m)
Heralded as the Kilimanjaro of Europe, Elbrus even has a chair lift part of the way up! This mountain is regularly underestimated, causing a high number of fatalities per year. Due to snowy conditions crampons and ice axes are once again required. Some believe that Elbrus should not count as the European peak and instead Mont Blanc should be summited – a much more technical and dangerous climb.
4. Denali – North America (6190 m)
Denali is a difficult mountain to summit. Although slightly lower than other peaks, the distance from the equator means the effects of altitude are more keenly felt. More technical skills are needed. In addition there are no porters to help carry additional gear so climbers must carry a full pack and drag a sled.
5. Vinson Massif – Antarctica (4892 m)
Vinson is difficult because of the location rather than any technical climbing. The costs of going to Antarctica are great and the conditions are something to be battled with.
6. Puncak Jaya – Australasia (4884 m) or Kosciuszko – Australia (2228 m)
The original Seven Summits included Mount Kosciuszko of Australia – the shortest and easiest climb on the list. However it is now generally agreed that Puncak Jaya is the offering from the Australasia continent. Despite being smaller than others on the list this is the hardest of the seven to climb with the highest technical rating. It is also located in an area that is highly inaccessible to the public due to a large mine, and is one of the few where a rescue by helicopter is not possible.
7. Everest – Asia (8848 m)
Everest is the highest mountain in the world at 8848 m above sea level. Many regard the trek to Everest Base Camp as challenge enough. Some technical climbing is required as well as bottled oxygen to safely reach altitudes of that level. One of the most dangerous parts is the Khumbu Icefall which must be traversed every time the climbers leave base camp. As of 2017 at least 300 people have died on Everest – most of their bodies still remain on the mountain.
Ben has now climbed two of the Seven Summits. His immediate plans are to tackle Elbrus in July (which I might try and tag along on) and Vinson next January. If you are interested in his progress check out his Instagram (@benrainthorpe).
TCR Database
Back-to-back posting – I wanted to talk about the growing volume of TCR structures in the PDB. A couple of weeks ago, I presented my database to the group (STCRDab), which is now available at http://opig.stats.ox.ac.uk/webapps/stcrdab.
Unlike other databases, STCRDab is fully automated and updates on Fridays at 9AM (GMT), downloading new TCR structures and annotating them with the IMGT numbering (also applies for MHCs!). Although the size of the data is significantly smaller than, say, the number of antibody structures (currently at 3000+ structures and growing), the recent approval of CAR therapies (Kymriah, Yescarta), and the rise of interest in TCR engineering (e.g. Glanville et al., Nature, 2017; Dash et al., Nature, 2017) point toward the value of structures.
Feel free to read more in the paper, and here are some screenshots. 🙂
Look! 5men, literally.
ABodyBuilder and model quality
Currently I’m working on developing a new strategy to use FREAD within the ABodyBuilder pipeline. While running some tests, I realised that there were some minor miscalculations of the CDR loops’ RMSD values in my paper.
To start with, the main message of the paper remains the same; the overall quality of the models (Fv RMSD) was correct, and still is. ABodyBuilder isn’t necessarily the most accurate modelling methodology per se, but it’s unique in its ability to estimate RMSD. ABodyBuilder would still be capable of doing this calculation regardless of what the CDR loops’ RMSD may be. This is because the accuracy estimation looks at the RMSD data and places a probability that a new model structure would have some RMSD value “x” (given the CDR loop’s length). Our website has now been updated in light of these changes too.
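To make the idea of accuracy estimation a bit more concrete, here is a minimal sketch in Python 3. It is not ABodyBuilder’s actual code: it assumes a hypothetical benchmark table of RMSDs from previously built models, grouped by CDR loop length, and estimates the probability that a new loop of that length comes in at or below a threshold “x” as the empirical fraction of past models that did. All names and numbers are made up for illustration.

```python
from bisect import bisect_right

# Hypothetical benchmark: CDR loop length -> RMSDs (in Angstrom) of past models.
# These values are invented for illustration only.
BENCHMARK_RMSD = {
    10: [0.6, 0.8, 1.1, 1.4, 2.3],
    14: [1.2, 1.9, 2.5, 3.1, 3.8, 4.4],
}


def prob_rmsd_below(loop_length, threshold, benchmark=BENCHMARK_RMSD):
    """Empirical P(RMSD <= threshold | loop length) from past models."""
    rmsds = sorted(benchmark[loop_length])
    # bisect_right counts how many benchmark RMSDs are <= threshold.
    return bisect_right(rmsds, threshold) / len(rmsds)


if __name__ == "__main__":
    for length in sorted(BENCHMARK_RMSD):
        p = prob_rmsd_below(length, 1.5)
        print("Loop length {}: P(RMSD <= 1.5 A) = {:.2f}".format(length, p))
```

With these toy numbers, a 10-residue loop gets a probability of 0.80 of being within 1.5 Å, while a 14-residue loop gets about 0.17. The estimate depends only on the benchmark RMSD distribution for that loop length.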
Paper review: “Inside the black box”
There are nearly 17,000 Oxford students on taught courses. They turn up reliably every October. We send them to an army of lecturers and tutors, drawn from every rank of the research hierarchy. As members of that hierarchy, we owe it to the students – all 17,000 of them – to teach them as best we can.
And where can we learn the most about how to teach? There are 438,000 professional teachers in the UK. Maybe people who spend all of their working time on the subject might have good strategies to help people learn.
The context of the paper
Teachers obsess over assessment. Assessment is the process by which teachers figure out what students have learned. It is probably true that assessment is the only reason we have classrooms at all.
Inside the Black Box is in the vanguard of recent changes in educational thinking. Modern teaching regards good pedagogy as a practical skill. Like other types of performance, it depends on a specific set of concrete actions which can be taught and learned. Not everyone is a natural teacher – but nearly everyone can become a competent teacher.
Formative assessment is the focus of Inside the Black Box. The article argues that this process, in which teachers figure out what students know and tell them how it’s going wrong, is essential to good classroom practice.
What is the black box?
The black box is the classroom. After societal convulsions over class sizes, funding deficits, curriculum reforms, and examination structure, it’s time – says the article, in 2001 – that we focus on what actually goes on inside the classroom. These social changes, it says, adjust the inputs to the black box, and society expects better things out of the black box. But what if changing the inputs makes the work inside the black box harder? Don’t we have an obligation to figure out what needs to happen to get students to learn?
The article touches on three questions:
- Is there evidence that improving formative assessment raises standards?
- Is there evidence that there is room for improvement?
- Is there evidence about how to improve formative assessment?
The answers are yes, yes, and yes. In meta-analyses of educational experiments, formative assessment consistently raises standards. These experiments match the experience of teachers, who know that the least effective lessons are those which do not respond to students’ needs. Standard observations – such as those from Ofsted – ask teachers to answer “what are they learning?”, then “how do you know?”, and then “what are you doing about it?”
The second question – is there room for improvement? – is one they address in great detail in the context of primary and secondary education. Some criticisms (the giving of grades for its own sake, unintentional encouragement of “rote or superficial learning”, relentless competition between students) seem applicable in different parts of our university context. A greater weakness is a lack of emphasis. People engaged in university teaching frequently center the delivery of knowledge instead of learning, an idea exacerbated by our obsession with lectures and masked by the long lag between those lectures and the exams in which we assess them.
Recommendations
Inside the Black Box makes specific recommendations for instructors about how to engage in formative assessment. Those recommendations – unusually, for an item in the educational literature – are specific and detailed. But rather than focus on them, it is worth examining three themes which run across the article.
The overriding focus is the importance of formative assessment. If we care about what students learn, then we’ve got to be checking what it is that they actually are learning. Opportunities for formative assessment should be “designed into any piece of teaching”. In extremis, this idea has interesting implications for the institution of lectures, which generally lack them entirely.
A subsidiary idea is the importance of setting clear objectives for learning. Too many students view learning as a series of exercises rather than a step in the formation of a coherent body of knowledge. The overarching direction should be made clear. And on a more detailed level, we need to be explicit about what outcomes we want our students to obtain so that they know whether they are making satisfactory progress. Formative assessment must make reference to expectations, and formative self- or peer assessment becomes impossible if those expectations are not well-understood.
And this discussion ties into a final point: when students truly apply themselves to the task of learning, their self-perception and self-esteem become bound up in it. Ineffective expectation-setting and insufficient clarity about the means for improvement result in students feeling demotivated, which causes them to revise their goals downward. They put in less effort and achieve outcomes that are worse. These effects are costly and can be avoided by effective formative assessment.
Inside the Black Box is a diversion from our diet of scientific articles, but I think it is worth our attention. Pedagogy is difficult to get right. In the university context, good practice is the subject of little attention and rarely assessed. Thinking about good assessment means that our students benefit.
But all communication activities are a form of teaching. Really good teachers communicate really well. When good communication happens, everyone benefits, inside and outside the black box.