GPGPUs for bioinformatics

As the clock speed of computer Central Processing Units (CPUs) began to plateau, their data and task parallelism was expanded to compensate. These days (2013) it is not uncommon to find upwards of a dozen processing cores on a single CPU, with each core capable of performing 8 calculations as a single operation. Graphics Processing Units (GPUs) were originally intended to assist CPUs by providing hardware optimised for rendering highly parallel graphical data into a frame buffer. As graphical models became more complex, it became difficult to provide a single piece of hardware which implemented an optimised design for every model and every calculation the end user might desire. Instead, GPU designs evolved to be more readily programmable and to exhibit greater parallelism. Top-end GPUs are now equipped with over 2,500 simple cores and are programmed through dedicated frameworks such as CUDA or OpenCL. This newfound programmability gave users the freedom to take non-graphics tasks which would otherwise have saturated a CPU for days and run them on the highly parallel hardware of the GPU. This technique proved so effective for certain tasks that GPU manufacturers have since begun to tweak their architectures to suit not just graphics processing but also more general purpose tasks, thus beginning the evolution of the General Purpose Graphics Processing Unit (GPGPU).

Improvements in data capture and model generation have caused an explosion in the amount of bioinformatic data now available, and its volume is increasing faster than CPUs are increasing in either speed or parallelism. An example of this is the graph of the number of proteins stored in the Protein Data Bank per year. To process this vast volume of data, many of the common tools for structure prediction, sequence analysis, molecular dynamics and so forth have been ported to the GPGPU. The following tools are GPGPU enabled and offer significant speed-up compared to their CPU-based counterparts:

| Application | Description | Expected Speed-Up | Multi-GPU Support |
|---|---|---|---|
| Abalone | Models molecular dynamics of biopolymers for simulations of proteins, DNA and ligands | 4-29x | No |
| ACEMD | GPU simulation of molecular mechanics force fields, implicit and explicit solvent | 160 ns/day (GPU version only) | Yes |
| AMBER | Suite of programs to simulate molecular dynamics on biomolecules | 89.44 ns/day JAC NVE | Yes |
| BarraCUDA | Sequence mapping software | 6-10x | Yes |
| CUDASW++ | Open-source software for Smith-Waterman protein database searches on GPUs | 10-50x | Yes |
| CUDA-BLASTP | Accelerates NCBI BLAST for scanning protein sequence databases | 10x | Yes |
| CUSHAW | Parallelized short read aligner | 10x | Yes |
| DL-POLY | Simulates macromolecules, polymers, ionic systems, etc. on a distributed memory parallel computer | 4x | Yes |
| GPU-BLAST | Local search with fast k-tuple heuristic | 3-4x | No |
| GROMACS | Simulation of biochemical molecules with complicated bond interactions | 165 ns/day DHFR | No |
| GPU-HMMER | Parallelized local and global search with profile Hidden Markov models | 60-100x | Yes |
| HOOMD-Blue | Particle dynamics package written from the ground up for GPUs | 2x | Yes |
| LAMMPS | Classical molecular dynamics package | 3-18x | Yes |
| mCUDA-MEME | Ultrafast scalable motif discovery algorithm based on MEME | 4-10x | Yes |
| MUMmerGPU | An open-source high-throughput parallel pairwise local sequence alignment program | 13x | No |
| NAMD | Designed for high-performance simulation of large molecular systems | 6.44 ns/day STMV, 585x 2050s | Yes |
| OpenMM | Library and application for molecular dynamics for HPC with GPUs | Implicit: 127-213 ns/day; Explicit: 18-55 ns/day DHFR | Yes |
| SeqNFind | A commercial GPU-accelerated sequence analysis toolset | 400x | Yes |
| TeraChem | A general purpose quantum chemistry package | 7-50x | Yes |
| UGENE | Open-source Smith-Waterman for SSE/CUDA, suffix array based repeats finder and dotplot | 6-8x | Yes |
| WideLM | Fits numerous linear models to a fixed design and response | 150x | Yes |
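
The speed-up column mixes two kinds of figure: relative speed-ups (such as 10x) and, for the molecular dynamics packages, absolute throughput in nanoseconds of simulated time per day of wall-clock time (ns/day). As a rough guide to reading the latter, the sketch below converts a throughput into an estimated run time. The function name and the 1,000 ns target are assumptions made purely for illustration; only the 165 ns/day figure comes from the table.

```python
def wall_clock_days(target_ns, throughput_ns_per_day):
    """Estimate the wall-clock days needed to simulate target_ns nanoseconds.

    Hypothetical helper: assumes throughput stays constant for the whole run.
    """
    if throughput_ns_per_day <= 0:
        raise ValueError("throughput must be positive")
    return target_ns / throughput_ns_per_day


# 165 ns/day is the GROMACS DHFR figure quoted in the table;
# the 1,000 ns (1 microsecond) target is an assumed example length.
days = wall_clock_days(1000.0, 165.0)
print(f"Estimated run time: {days:.1f} days")
```

Because ns/day figures depend on the benchmark system (DHFR, JAC, STMV and so on), they are only comparable between packages when the same benchmark is used.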

It is important to note, however, that because GPGPUs handle floating point arithmetic differently from CPUs, results can and will differ between architectures, so a direct, bit-for-bit comparison is not possible. Instead, interval arithmetic may be useful as a sanity check that the results generated on the GPU are consistent with those from a CPU-based system.
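
One reason results differ is that floating point addition is not associative, so the order in which a parallel device combines values changes the rounding. The article does not say how an interval check would be carried out; what follows is only a minimal CPU-side sketch under stated assumptions. It sums the same values in serial order and in a tree (pairwise) order, then checks both against a simple interval bound. All function names are hypothetical, and `math.nextafter` requires Python 3.9 or later.

```python
import math


def sequential_sum(values):
    """Add values left to right, as a simple serial loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def pairwise_sum(values):
    """Add values as a tree reduction, an ordering typical of parallel hardware."""
    if not values:
        return 0.0
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])


def interval_sum(values):
    """Return (lo, hi) bounds that enclose the exact sum of the inputs.

    After each addition the bound is pushed outward by one representable
    step, which covers the rounding error of that addition.
    """
    lo = hi = 0.0
    for v in values:
        lo = math.nextafter(lo + v, -math.inf)
        hi = math.nextafter(hi + v, math.inf)
    return lo, hi


values = [0.1] * 10
seq = sequential_sum(values)
pair = pairwise_sum(values)
lo, hi = interval_sum(values)

print("serial:", seq, "pairwise:", pair, "identical:", seq == pair)
print("both inside interval:", lo <= seq <= hi and lo <= pair <= hi)
```

The two orderings give different answers for the same inputs, yet both fall inside the interval. In the same spirit, a GPU result lying far outside an interval computed on the CPU would suggest a genuine discrepancy, not ordinary rounding.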
