To address some of the karmic imbalance created by computational scientists complaining about other people’s code, I am listing here some (not all) of other people’s code that I love.
IgBLAST
IgBLAST is a sequence alignment tool for immunoglobulin sequences implemented in the NCBI C++ toolkit – it applies the classic BLAST algorithm to searching immunoglobulin germline gene databases. It always impresses me how quickly it works. The paper is here, and the authors are Jian Ye, Ning Ma, Thomas L. Madden and James M. Ostell.
Change-O
The Change-O collection of tools is essential software for immunoinformaticians. You can use it to parse IgBLAST output into a number of useful formats which can be quality-filtered. Further down the pipeline, their tools can perform clonal assignment, germline reconstruction and finally lineage tree assignment. It is speed-optimised, has been very extensively benchmarked, and I love both how well documented it is and how up-to-date it is kept. Here is the paper, the authors being: Namita T. Gupta, Jason A. Vander Heiden, Mohamed Uduman, Daniel Gadala-Maria, Gur Yaari and Steven H. Kleinstein.
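To give a flavour of what "clonal assignment" means, here is a minimal sketch of one common heuristic: bucket sequences by V gene, J gene and junction length, then single-linkage cluster each bucket on normalised Hamming distance. This is my own toy illustration, not Change-O's code or API. The function name `assign_clones`, the record keys (`v_call`, `j_call`, `junction`) and the 0.15 threshold are all assumptions chosen for the example.

```python
from collections import defaultdict


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_clones(records, max_norm_dist=0.15):
    """Return one clone id per record (hypothetical helper, illustration only).

    Records sharing V gene, J gene and junction length are bucketed together,
    then single-linkage clustered on normalised Hamming distance.
    """
    buckets = defaultdict(list)
    for idx, rec in enumerate(records):
        key = (rec["v_call"], rec["j_call"], len(rec["junction"]))
        buckets[key].append(idx)

    clone_ids = [None] * len(records)
    next_id = 0
    for members in buckets.values():
        unassigned = set(members)
        while unassigned:
            # Start a new clone from any unassigned sequence...
            seed = unassigned.pop()
            clone_ids[seed] = next_id
            stack = [seed]
            # ...and grow it through chains of close neighbours.
            while stack:
                cur = records[stack.pop()]["junction"]
                length = max(len(cur), 1)  # guard against empty junctions
                close = [
                    other for other in unassigned
                    if hamming(cur, records[other]["junction"]) / length
                    <= max_norm_dist
                ]
                for other in close:
                    unassigned.remove(other)
                    clone_ids[other] = next_id
                    stack.append(other)
            next_id += 1
    return clone_ids


# Made-up example records; the sequences are illustrative, not real data.
records = [
    {"v_call": "IGHV1-2", "j_call": "IGHJ4", "junction": "TGTGCGAGAGATTACTGG"},
    {"v_call": "IGHV1-2", "j_call": "IGHJ4", "junction": "TGTGCGAGAGATTATTGG"},
    {"v_call": "IGHV1-2", "j_call": "IGHJ4", "junction": "TGTGCGCCCCCCTACTGG"},
    {"v_call": "IGHV3-23", "j_call": "IGHJ4", "junction": "TGTGCGAGAGATTACTGG"},
]
clone_ids = assign_clones(records)
```

The first two records differ at a single junction position and end up in the same clone; the third is too far away, and the fourth uses a different V gene, so each of those gets its own clone. The real tools do far more than this (and do it much faster), which is rather the point of this post.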
IGoR
IGoR fits recombination models to antibody and TCR sequences. I particularly love this software because it makes something otherwise fairly inaccessible (fitting Bayesian models using expectation-maximisation algorithms) usable by just about anybody – installation is simple and the documentation is so clear and comprehensive. The paper is here and the authors are Quentin Marcou, Thierry Mora and Aleksandra M. Walczak.
ANARCI
I must have used ANARCI a million times and I don’t know where I would be without it, so it has to be on this list. It numbers immunoglobulin amino acid sequences according to a choice of numbering schemes: Kabat, Chothia, Extended Chothia, IMGT or AHo. It was written by James Dunbar when he was in OPIG (paper here). As well as being indispensable to my research, ANARCI has such a cool name, even if it did get a JABBA award.
SCALOP
This is another piece of software written by an “OPIGlet”, Catherine Wong. SCALOP is a canonical form predictor that is 800 times faster than previous approaches with state-of-the-art performance. I dig this code for both its speed and its elegance. The SCALOP paper is here and the authors are Wing Ki Wong (Catherine), Guy Georges, Francesca Ros, Sebastian Kelm, Alan P. Lewis, Bruck Taddese, Jinwoo Leem and Charlotte M. Deane.
Biopython
Biopython is a set of Python tools for just about any computational biology problem you have. This one is so deeply ingrained in my psyche that I almost forgot about it – but again it is extremely well-documented and validated code, which means that you can do stuff like read PDB files or perform sequence alignments without spending any mental energy (something I absolutely take for granted). The paper is here and the authors are Peter J. A. Cock, Tiago Antao, Jeffrey T. Chang, Brad A. Chapman, Cymon J. Cox, Andrew Dalke, Iddo Friedberg, Thomas Hamelryck, Frank Kauff, Bartek Wilczynski and Michiel J. L. de Hoon.
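As a small illustration of the "without any mental energy" point, here is a sketch of a global pairwise alignment score using Biopython's `Bio.Align.PairwiseAligner`. It assumes Biopython is installed; the toy sequences and the scoring values are made up for the example and are not recommendations.

```python
from Bio import Align

# Configure a global aligner with explicit, illustrative scores so the
# result does not depend on library defaults.
aligner = Align.PairwiseAligner()
aligner.mode = "global"
aligner.match_score = 1
aligner.mismatch_score = -1
aligner.open_gap_score = -2
aligner.extend_gap_score = -1

# Two identical toy sequences: eight matches.
identical_score = aligner.score("ACGTACGT", "ACGTACGT")

# One substitution: seven matches and one mismatch.
one_mismatch_score = aligner.score("ACGTACGT", "ACGTTCGT")

print(identical_score, one_mismatch_score)
```

With these scores the identical pair scores 8 and the single-substitution pair scores 6, since opening a gap would cost more than accepting the mismatch.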
These are just a handful of pieces of code written by other people that I use most days and that I am very grateful for. There are lots of other pieces of code but I think these are the ones I have used most recently.
All best wishes,
Eve