Diagnostics on the Cutting Edge, Software in the Stone Age: A Microbiology Story

The need to treat and control infectious diseases has challenged humanity for millennia, driving a series of remarkable advancements in diagnostic tools and techniques. One of the earliest known legal texts, the Code of Hammurabi, references the visual and tactile diagnosis of leprosy. For centuries, the distinct smell of infected wounds was used to identify gangrene, and in Ancient Greece and Rome, the balance of the four humors (blood, phlegm, black bile, and yellow bile) was a central theory in diagnosing infections.

The invention of the compound microscope in 1590 by Hans and Zacharias Janssen, and its refinements by Robert Hooke and Antonie van Leeuwenhoek, marked a turning point as it enabled the direct observation of microorganisms, thereby linking diseases to their microbial origins. Louis Pasteur’s introduction of liquid media aided Joseph Lister in identifying microbes as the source of surgical infections, whilst Robert Koch’s experiments with Bacillus anthracis firmly established the connection between specific microbes and diseases.

As microbiology advanced, diagnosing pathogens based on morphology alone proved inadequate. Immunological methods, such as the Widal test for typhoid, developed in 1896, laid the groundwork for modern serological tools like ELISA and lateral flow immunoassays, which are widely used to detect diseases such as malaria, COVID-19, and influenza. An often-overlooked development was the introduction in the 1970s of commercial systems for antimicrobial susceptibility testing (AST), such as bioMérieux’s Vitek, which standardized quantitative MIC testing to guide therapy, and the Biolog system, which combined extensive data collection with the concept of continuous improvement through evolving data, a key principle of modern diagnostic predictions.

Kary Mullis’ invention of PCR propelled microbial diagnostics into the genetic era, with early applications in detecting TB and HIV. Whole genome sequencing (WGS), however, offers unparalleled resolution for identifying pathogens and resistance mechanisms. The first bacterial genome, that of Haemophilus influenzae, was sequenced in 1995, and WGS quickly became essential for epidemiology and outbreak investigations. By 2014, the WHO had endorsed WGS for global TB surveillance and antimicrobial resistance (AMR) diagnostics. In 2016, Public Health England and the CDC implemented WGS for routine microbial diagnostics and surveillance of foodborne pathogens.

However, despite the immense potential of such methods and data, the release and packaging of widely regarded gold-standard software and reference datasets have not kept pace with the advances the field has made over the last 10-15 years.

Catalogues of mutation effects underpin prediction systems

The majority of algorithms and systems that predict drug efficacy from genetic data rely on variant calling followed by variant lookups in large reference tables or databases, known as catalogues. These contain either resistance-associated alleles or, in the case of pathogens without mobile genetic elements, such as Mycobacterium tuberculosis (Mtb), specific mutations and their associated effects. Over the past 10 years, catalogues have been released whose diagnostic scope and resolution are typically limited by the data available for that particular pathogen. I have grouped these into three broad categories (a minimal sketch of the lookup mechanism follows the list):

  1. Low-resolution, heuristic tables or databases curated from a wide range of studies in the literature, each often using different AST methods and without any real phenotype standardisation. Examples include HCV-GLUE (1), MalariaGEN Pf7’s resistance mapping (2), and the WHO report on antimalarial drug efficacy (2010-2019) (3).
  2. Catalogues that go a step further in terms of reliability: still curated from the literature, but using the raw data and/or stricter curation rules, often including quantitative information and statistics. These can still contain considerable bias, however, as they often mix phenotypes from different assay types, and there is no guarantee of strain or geographic coverage. Examples include the HIV Drug Resistance Database (HIVDB) (4), the Comprehensive Antibiotic Resistance Database (CARD) (5), and the WHO’s summaries of amino acid substitutions associated with reduced inhibition by neuraminidase inhibitors (NAIs) (6) and polymerase acidic inhibitors (PAIs) (7).
  3. Catalogues developed from sequencing and AST results generated by a few large, global clinical studies that use standardised phenotyping methods; fewer than a handful exist. Not only are these in theory less biased towards any particular method or lineage, but they are developed directly from quantitative data and provide measures of predictive performance. The WHO TB catalogue (8) is a good example, and the reliability of its underlying data has facilitated its implementation in prediction systems like TB-Profiler (9) and Mykrobe (10), which previously used reference tables that would have fallen into the first category.
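To make the lookup mechanism concrete, here is a minimal Python sketch of catalogue-based prediction. The structure and entries are illustrative only and not drawn from any particular catalogue, although rpoB S450L and katG S315T are well-known Mtb resistance markers.

```python
# Minimal sketch of catalogue-based resistance prediction. Entries are
# illustrative; no real catalogue is this simple.
CATALOGUE = {
    ("rpoB", "S450L"): {"drug": "rifampicin", "effect": "R"},
    ("katG", "S315T"): {"drug": "isoniazid", "effect": "R"},
}

def predict(variants):
    """Return per-drug calls from a list of (gene, mutation) variant calls.

    Any catalogued resistance-associated variant makes the drug call "R";
    variants absent from the catalogue contribute nothing, which is exactly
    why catalogue coverage, not the lookup logic, drives performance.
    """
    calls = {}
    for gene, mutation in variants:
        entry = CATALOGUE.get((gene, mutation))
        if entry is not None and entry["effect"] == "R":
            calls[entry["drug"]] = "R"
    return calls

print(predict([("rpoB", "S450L"), ("gyrA", "D94G")]))  # {'rifampicin': 'R'}
```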

In an ideal world, diagnostic prediction systems would utilise catalogues falling into the more reliable third category. However, for most pathogens comprehensive datasets do not yet exist, and as a result, the first two categories often represent the best available approach. Despite what I perceive to be their limitations in a clinical setting, these catalogues have been instrumental in advancing research and shaping national treatment policies.

These distinctions, however, do underscore the significant leap forward the WHO TB catalogue represents in terms of standardisation and global coverage. Yet this achievement also highlights areas where progress is still needed. The packaging and release of certain catalogues fall short of modern expectations for sustainability, usability, and long-term impact, raising several key concerns that must be addressed.

How easy are they to use?

Diagnostic decisions are unlikely to rely on manually looking up variants in a table. Instead, catalogues are typically embedded in software that uses lookups to generate drug resistance predictions. A key factor in determining the utility of a catalogue is therefore whether it is machine-readable and easily parsed. Many systems, such as HIVDB, HCV-GLUE, and CARD, have recognized this need and offer neat offline Python or CLI APIs for querying their underlying databases. This is particularly valuable for pathogens like HIV and HCV, which exhibit high genetic variability and complex resistance patterns, making simple lookup tables inadequate for storing the necessary depth of information. HIVDB, for example, provides tab-delimited datasets, R scripts for parsing, online query tools, and even an in-browser editor for customizing data schemas. Similarly, HCV-GLUE generates JSON and HTML reports designed for easy integration into bioinformatics pipelines.
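To illustrate why structured reports matter, here is a toy example of consuming a JSON resistance report in a pipeline. The schema below is a hypothetical simplification for demonstration, not HCV-GLUE’s actual output format.

```python
import json

# A hypothetical, heavily simplified resistance report. HCV-GLUE's real JSON
# schema is richer, but the principle is the same: structured output drops
# straight into downstream pipeline code with no bespoke parsing.
report = json.loads("""
{
  "sample": "patient_042",
  "drug_scores": [
    {"drug": "sofosbuvir",
     "category": "resistance_possible",
     "substitutions": ["NS5B:S282T"]}
  ]
}
""")

for score in report["drug_scores"]:
    print(report["sample"], score["drug"], score["category"],
          ",".join(score["substitutions"]))
```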

Unfortunately, none of the catalogues released by the WHO for TB, influenza, or malaria are easily parsed. The TB catalogue is released as a large, complex Excel spreadsheet, while the influenza and malaria catalogues are embedded in static PDF reports. The WHO TB catalogue is cumbersome to use directly because it relies on multi-step matching against Excel files or a VCF, requiring users to normalise their variants and manually process concatenated fields. It lacks coverage for certain deletions and rare variants, and its reliance on Excel parsing is overcomplicated, inefficient, and unintuitive. Bioinformaticians are therefore forced to laboriously write code to parse the table into simpler, more flexible, more manageable data structures.

Although a database and API are probably overkill for TB variants, even something as simple as a JSON object or CSV file would vastly improve usability by enabling direct annotation of variants and easy conversion to formats more suitable for individual pipelines, as the sketch below illustrates.
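Here is a sketch of the one-off flattening exercise bioinformaticians currently have to perform. The file, sheet, and column names are hypothetical stand-ins: the real spreadsheet’s layout (and its concatenated fields) differs between catalogue versions.

```python
import pandas as pd

# One-off flattening of the Excel catalogue into a tidy CSV. The file, sheet,
# and column names below are hypothetical stand-ins for the real layout.
df = pd.read_excel("who_tb_catalogue.xlsx", sheet_name="catalogue")

tidy = (
    df.rename(columns={"final_confidence_grading": "grading"})
      .loc[:, ["drug", "gene", "variant", "grading"]]
      .dropna(subset=["variant"])
)

# A flat CSV (or JSON) like this is all most diagnostic pipelines need.
tidy.to_csv("tb_catalogue.csv", index=False)
```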

How reliable is the software?

The WHO are not the only adherents to Excel spreadsheets, however. ECOFFinder (11), the EUCAST gold-standard method for calculating epidemiological cut-off values (ECOFFs, used to identify phenotypically non-wild-type populations), is also packaged as an Excel-based software tool, with an interface that has remained unchanged since its inception in 2003. Not only is an Excel-based application unwieldy, it is also incompatible with bioinformatic pipelines. It cannot be programmatically tested, which should absolutely be a prerequisite for software that generates breakpoints underpinning clinical predictions, regardless of how widely it is adopted. The sketch below shows how little code a testable ECOFF-style calculation actually requires.
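The following is a minimal sketch, not a reimplementation of ECOFFinder: it deliberately simplifies ECOFFinder’s iterative fitting procedure to a single normal fit on log2-transformed MICs, purely to show that the logic is small enough to live in version-controlled code with an accompanying test.

```python
import numpy as np
from scipy import stats

def ecoff(mics, quantile=0.99):
    """Crude ECOFF-style estimate: fit a normal distribution to log2(MIC) of
    a presumed wild-type population and return the chosen upper quantile,
    rounded up to the next doubling dilution. A simplification of what
    ECOFFinder does, shown only to illustrate testability.
    """
    log2_mic = np.log2(np.asarray(mics, dtype=float))
    mu, sigma = stats.norm.fit(log2_mic)
    return float(2.0 ** np.ceil(stats.norm.ppf(quantile, loc=mu, scale=sigma)))

def test_ecoff_recovers_known_distribution():
    # With log2(MIC) ~ N(-1, 1), the 99th percentile sits at ~1.33 on the
    # log2 scale, so the ECOFF should round up to the 4 mg/L dilution.
    rng = np.random.default_rng(0)
    mics = 2.0 ** rng.normal(-1.0, 1.0, size=5000)
    assert ecoff(mics) == 4.0
```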

In the case of the WHO TB catalogues, the need for programmatic testing of the algorithms used to generate them from sequencing and AST data is even more pressing, as these algorithms are not publicly available. While most catalogues are curated from the literature, the code for protocols that rely on custom logic to derive resistance classifications from raw data or AST values should be made available, not only for validation but also to enable other researchers to use the method.

Fortunately, this lack of transparency is uncommon. The code for most software used to build catalogues, such as CARD, or downstream prediction tools like RGI, TB-Profiler, and Mykrobe, is publicly available through tightly version-controlled GitHub repositories. Unfortunately, the majority of this code remains untested.

Can the catalogues be reproduced?

Many catalogues curated from smaller studies in the literature define mutation effects based on evidence from just one or a handful of in vitro or clinical studies. MalariaGEN Pf7’s catalogue, for instance, is extensive, with over 16,000 samples of tightly controlled genomic data. However, because in vitro testing has played a limited role in monitoring malaria drug resistance, its variant classifications derive from resistance markers identified in just a single study per drug. Such databases underscore the importance of building catalogues from sufficient, high-quality data, since unbiased mutation-effect predictions depend on AST data produced using standardised methods.

It is therefore a shame that, for the few catalogues generated from high-quality AST data, transparency has been discarded and the training data have not been released. Such gatekeeping prevents third parties from reproducing the classifications, and without the data and the algorithm, one cannot benchmark one’s own method on the same training set. If neither the algorithm nor the data is released, it does beg the question: why not?

Data sharing is admittedly uncommon in medical fields, where researchers often treat data as a private preserve. Nevertheless, for pathogens where global AST projects are rare, certain researchers, such as those behind HIVDB, have gone to immense trouble to track down the authors of smaller studies to gain access to, and release, the quantitative data behind those analyses and their reference sets. That many funding agencies have published guidelines on making raw data publicly available for validation and re-use further emphasises the need for data transparency.

Are the catalogues validated?

All resistance prediction systems publish some form of validation. Ideally, performance would be assessed on an independent test set, as demonstrated by tools like Mykrobe and TB-Profiler. However, as fields like machine learning and protein/chemo-informatics have shown, true “independence” is more nuanced than simply holding out a separate set of samples from the training set. Microbial diagnostics has a long way to go before this distinction is made, and there is seldom enough data to create a separate test set at all. In that situation, methods like cross-validation can still provide some measure of predictive uncertainty (sketched below), and where independent validation is lacking, HCV-GLUE’s approach of stating this explicitly is to be encouraged, to avoid unrealistic expectations in the clinic.
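A minimal sketch of that cross-validation protocol, on entirely synthetic data: the catalogue is rebuilt on each training fold (here with a deliberately naive majority-vote rule) and the fold-to-fold spread in sensitivity gives a rough measure of predictive uncertainty.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(1)

# Synthetic data: each isolate carries one candidate mutation (IDs 0-19,
# of which 0-4 are truly resistance-associated) and a binary phenotype.
# The point is the protocol, not the data.
mutation = rng.integers(0, 20, size=400)
phenotype = ((mutation < 5) & (rng.random(400) < 0.9)) | (rng.random(400) < 0.05)

sensitivities = []
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train, test in skf.split(mutation.reshape(-1, 1), phenotype):
    # "Rebuild the catalogue" on each training fold: classify a mutation as
    # resistance-associated if most isolates carrying it are resistant.
    resistant = {
        m for m in np.unique(mutation[train])
        if phenotype[train][mutation[train] == m].mean() > 0.5
    }
    predicted = np.isin(mutation[test], sorted(resistant))
    actual = phenotype[test]
    sensitivities.append((predicted & actual).sum() / actual.sum())

print(f"sensitivity: {np.mean(sensitivities):.2f} +/- {np.std(sensitivities):.2f}")
```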

These systems rely on querying catalogues, however, so there is no real abstraction between the reference set and the prediction: the catalogue, not the prediction algorithm, is the key determinant of sensitivity and specificity. I would accordingly argue that catalogues themselves should be released with validation metrics calculated from the simplest possible prediction protocol (see the sketch below). The WHO TB catalogue is one of the only catalogues to do this (although its metrics are calculated on the training data), and researchers are encouraged to adopt a similar mindset.
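For illustration, the “simplest possible protocol” amounts to one binary call per isolate-drug pair taken straight from a catalogue lookup, scored against the AST phenotype; the inputs here are hypothetical.

```python
def sensitivity_specificity(predicted, actual):
    """Headline metrics for the simplest possible protocol: binary
    resistant/susceptible calls from catalogue lookup vs AST phenotypes."""
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)          # true positives
    tn = sum(not p and not a for p, a in pairs)  # true negatives
    fp = sum(p and not a for p, a in pairs)      # false positives
    fn = sum(not p and a for p, a in pairs)      # false negatives
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical lookup calls vs phenotypes for one drug:
sens, spec = sensitivity_specificity(
    [True, True, False, False, True],
    [True, False, False, False, True],
)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")  # 1.00 / 0.67
```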

Summary

Catalogues are developed in various formats and serve a range of purposes. Their design and utility are heavily constrained by the availability and standardization of phenotypic and AST data, which remain limited for many pathogens. Initiatives like the CRyPTIC global clinical data collection project and the WHO TB catalogue represent significant progress in integrating genomics-driven diagnostics into clinical practice. The packaging of certain catalogues, however, has taken a step backward in terms of software sustainability.

A “good release” does not necessarily require complex online tools or databases. Even smaller, less-ambitious catalogues can adhere to fundamental principles to enhance reliability and usability. To this end, I have compiled a non-exhaustive list of considerations to help maximise the reliability and adoption of reference tables and datasets. While not every point applies to all catalogues, the overarching principles should remain relevant.

Many organisations with access to less, and lower-quality, data have successfully prioritised reliability and ease of use. There is no reason why these best practices cannot be universally adopted.

Key practical considerations for clinical end-use

  1. Can the catalogue be parsed easily?
    • By human and machine?
    • Are the catalogues in formats compatible with standard diagnostic pipelines?
    • Are APIs available for querying and updating the catalogue programmatically?
  2. Can the catalogues be statistically reproduced?
    • Are the algorithms released?
    • Is the raw and/or curated data released?
  3. Are the catalogues clearly reliable/trustworthy?
    • Are independent validation results released for classifications?
    • Is classification evidence readily available (i.e. is metadata included in the catalogue)?
    • Are they version controlled?
  4. Is the underlying software/algorithm/method released in a modern, intuitive framework?
    • Can it easily be run on new data?
    • Can it be plugged into independent software?
    • Is it easy to run for a non-specialist audience, and is there guidance?
    • Is the software version controlled?
    • Is the software tested?
    • Is there comprehensive documentation for the algorithm, formats, and data inputs/outputs?
  5. Are the catalogues regularly updated?
    • Is there an audit trail/version control for updates?
    • Can the catalogue be easily updated with new data, particularly by someone else should the current team drop off?
  6. Are licensing and ethical considerations clearly stated?
    • Are data use rights clearly stated?
    • Are ethical considerations for patient-derived data clearly addressed?
  7. Are biases and limitations acknowledged?
    • Are potential biases in the data or algorithm clearly stated?
    • Are the limitations of the method and catalogue transparently stated?

Bibliography

  1. MRC-University of Glasgow Centre for Virus Research. HCV-GLUE. [Online] 2024. https://hcv-glue.cvr.gla.ac.uk/
  2. Wellcome Sanger Institute. Drug resistance markers to inferred resistance status (PDF). MalariaGEN Pf7. [Online] 2024. https://www.malariagen.net/resource/34/.
  3. WHO. Report on antimalarial drug efficacy, resistance and response: 10 years of surveillance (2010-2019). 2020. ISBN 978-92-4-001281-3.
  4. Stanford University. HIV Drug Resistance Database. [Online] 2025. https://hivdb.stanford.edu/.
  5. McMaster University. The Comprehensive Antibiotic Resistance Database. [Online] 2024. https://card.mcmaster.ca/home.
  6. WHO. Summary of neuraminidase (NA) amino acid substitutions assessed for their effects on inhibition by neuraminidase inhibitors (NAIs). 2024.
  7. WHO. Summary of polymerase acidic protein (PA) amino acid substitutions assessed for their effects on PA inhibitor (PAI) baloxavir susceptibility. 2024.
  8. WHO. Catalogue of mutations in Mycobacterium tuberculosis complex and their association with drug resistance, 2nd ed. 2023. ISBN 978-92-4-008241-0.
  9. Phelan, Jody. TB Profiler. [Online] 2019. https://tbdr.lshtm.ac.uk/.
  10. EMBL-EBI. Mykrobe. [Online] 2020. https://www.mykrobe.com/.
  11. EUCAST. MIC and zone diameter distributions and ECOFFs. [Online] 2024. https://www.eucast.org/mic_and_zone_distributions_and_ecoffs.
