Bringing practical bioinformatics to high school classrooms

Back in July a litter of OPIGlets went rooting for interesting science at ISMB/ECCB 2019 in Basel, Switzerland. When not presenting, working on my sunburn, or paying nine Francs for a beer, I made a point to attend talks outside my usual bubble of machine learning and drug discovery. In particular, I spent the latter half of the conference in the Education track, and am very glad I did. I love teaching, and am always excited to learn from more experienced educators and trainers. Today I’m going to talk about a fantastic presentation by Stevie Bain from the University of Edinburgh about introducing practical bioinformatics to high school biology classrooms through the 4273pi project.

4273pi is a custom distribution of the Debian-based Raspbian operating system for the Raspberry Pi, originally developed for teaching undergraduate bioinformatics at the University of St Andrews. Life science faculties often lack the infrastructure to deliver practical bioinformatics training to large cohorts. After all, maintaining a teaching lab of Linux machines is difficult to justify if most of your users work on Windows! The Raspberry Pi neatly sidesteps this problem by providing a computational environment tailored for bioinformatics teaching, which can be connected to the existing peripherals in a computer lab without the need to pester the sysadmin about dependencies or security. The entire project, including the operating system, bioinformatics tools, and course materials, is freely available as a 32GB SD card image.

Bain described their experiences running practical workshops in high schools at two different levels: Higher/Advanced Higher (ages 16-17, comparable to AS/A2 level in the rest of the UK) and National 4 & 5 (ages 14-15, comparable to GCSE level). All the materials used for the workshops are available on the website; definitely worth checking out if you’re interested in teaching.

For Higher and Advanced Higher classes, the students are introduced to the NCBI website and BLAST. Students are provided with an unidentified gene sequence and asked to run a BLAST search to identify the gene. It turns out that the sequence corresponds to the GULO gene in mouse, which codes for L-gulonolactone oxidase, an enzyme responsible for vitamin C synthesis. Students are then asked to repeat the search, restricting it to the human genome. This time there are no matches, and Bain excitedly informed us that the students were quick to realise that this means humans are unable to synthesize vitamin C. This result neatly leads into discussions about nutrition, another component of the curriculum.

The students then repeat this exercise using the command line version of BLAST on the Raspberry Pi, leading to discussions about the use of bioinformatics tools for automated searches. The ‘history’ function of BLAST serves as a neat introduction to the notion of record-keeping and reproducibility in computational research, concepts the students are already intimately familiar with through the keeping of lab books during experimental work in class.
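To give a flavour of what an automated command-line search could look like, here is a minimal Python sketch. It is not taken from the 4273pi course materials: the function names, file names, database name, and E-value cut-off are all hypothetical, and it assumes the BLAST+ `blastn` program with its standard tabular output (`-outfmt 6`).

```python
import subprocess

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def build_blastn_command(query_path, database, max_evalue=1e-5):
    """Build the argument list for a blastn search with tabular output.

    query_path and database are placeholders chosen for illustration.
    """
    return [
        "blastn",
        "-query", query_path,
        "-db", database,
        "-evalue", str(max_evalue),
        "-outfmt", "6",
    ]


def run_blastn(query_path, database, max_evalue=1e-5):
    """Run blastn and return its tabular output (requires BLAST+ installed)."""
    command = build_blastn_command(query_path, database, max_evalue)
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


def parse_outfmt6(text):
    """Turn tabular BLAST output into a list of dictionaries, one per hit."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(OUTFMT6_COLUMNS, line.split("\t")))
        for numeric in ("pident", "evalue", "bitscore"):
            hit[numeric] = float(hit[numeric])
        hits.append(hit)
    return hits


def best_hit(hits):
    """Return the hit with the highest bit score, or None if there are none."""
    return max(hits, key=lambda h: h["bitscore"]) if hits else None
```

On a machine with BLAST+ and a local database, something like `best_hit(parse_outfmt6(run_blastn("unknown.fasta", "mouse_db")))` would report the top match. Saving the exact command alongside the output is the scripted equivalent of keeping a lab book.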

For National 4 & 5 students, the Raspberry Pi wasn’t used; instead the workshop focused on introducing students to bioinformatics through the NCBI website and running BLAST online. Students were provided with a set of sequences extracted from a pork sausage and asked to determine what animals the meat in the sausage came from. Unsurprisingly, it was mostly pork, but there were also matches for chicken, and even a bit of human! At this stage in secondary education, students have experience of experimental work, and were sufficiently familiar with the concept of contamination to recognise that the non-pig matches were (hopefully!) just contamination from the butcher’s shop. What’s nice about this exercise is that although it’s much simpler than BLASTing the GULO gene on a Raspberry Pi, it still gives students the opportunity to interpret their results, and the low levels of contamination naturally lead into a discussion about E-values and the reliability of results obtained via computational methods.
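For readers who haven’t met E-values before, a small Python sketch may help. It uses the standard relationship between a normalised bit score S' and the expected number of chance hits, E = m × n × 2^(−S'), where m is the query length and n is the database length. The function names, the example hits, and the cut-off are made up for illustration and are not results from the workshop.

```python
from collections import Counter


def expected_chance_hits(bit_score, query_length, database_length):
    """E-value from a bit score: E = m * n * 2 ** (-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


def tally_species(hits, max_evalue=1e-5):
    """Count hits per species, keeping only those at or below the cut-off.

    hits is a list of (species, evalue) pairs.
    """
    return Counter(species for species, evalue in hits if evalue <= max_evalue)


# Hypothetical sausage results: invented values, purely illustrative.
example_hits = [
    ("pig", 1e-80),
    ("pig", 1e-60),
    ("chicken", 1e-30),
    ("human", 0.5),  # weak match, plausibly chance alone
]
counts = tally_species(example_hits)
```

Two things fall out of the formula: each extra bit of score halves the E-value, and doubling the size of the database doubles it. That is why the same alignment can look convincing against a small database and unremarkable against a large one.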

Bain and colleagues have already reached high school students across every region of Scotland, and show no signs of slowing down. During her talk, Bain presented a map of the schools visited, annotated with deprivation levels as defined by the Scottish Index of Multiple Deprivation. Although the project has reached students across the length and breadth of Scotland, schools in the most deprived areas were under-represented among those visited. To address this, in addition to expanding the reach of the project to less-visited regions of Scotland, subsequent workshops will target schools in the most deprived areas. A glimmer of hope, perhaps, in these trying times.
