GPGPUs for bioinformatics
As the clock speed of computer Central Processing Units (CPUs) began to plateau, their data and task parallelism was expanded to compensate. These days (2013) it is not uncommon to find upwards of a dozen processing cores on a single CPU, with each core capable of performing 8 calculations as a single operation. Graphics Processing Units (GPUs) were originally intended to assist CPUs by providing hardware optimised to speed up the rendering of highly parallel graphical data into a frame buffer. As graphical models became more complex, it became difficult to provide a single piece of hardware which implemented an optimised design for every model and every calculation the end user might desire. Instead, GPU designs evolved to be more readily programmable and to exhibit greater parallelism. Top-end GPUs are now equipped with over 2,500 simple cores and are programmed through dedicated frameworks such as CUDA or OpenCL. This newfound programmability gave users the freedom to take non-graphics tasks which would otherwise have saturated a CPU for days and run them on the highly parallel hardware of the GPU. This technique proved so effective for certain tasks that GPU manufacturers have since begun to tweak their architectures to be suitable not just for graphics processing but also for more general purpose tasks, thus beginning the evolution of the General Purpose Graphics Processing Unit (GPGPU).
Improvements in data capture and model generation have caused an explosion in the amount of bioinformatic data which is now available, and this data is increasing in volume faster than CPUs are increasing in either speed or parallelism. An example of this can be found here, which displays a graph of the number of proteins stored in the Protein Data Bank per year. To process this vast volume of data, many of the common tools for structure prediction, sequence analysis, molecular dynamics and so forth have now been ported to the GPGPU. The following tools are now GPGPU-enabled and offer significant speed-up compared to their CPU-based counterparts:
Application | Description | Expected Speed-Up or Throughput | Multi-GPU Support |
Abalone | Models molecular dynamics of biopolymers for simulations of proteins, DNA and ligands | 4-29x | No |
ACEMD | GPU simulation of molecular mechanics force fields, implicit and explicit solvent | 160 ns/day GPU version only | Yes |
AMBER | Suite of programs to simulate molecular dynamics on biomolecules | 89.44 ns/day JAC NVE | Yes |
BarraCUDA | Sequence mapping software | 6-10x | Yes |
CUDASW++ | Open source software for Smith-Waterman protein database searches on GPUs | 10-50x | Yes |
CUDA-BLASTP | Accelerates NCBI BLAST for scanning protein sequence databases | 10x | Yes |
CUSHAW | Parallelized short read aligner | 10x | Yes |
DL-POLY | Simulate macromolecules, polymers, ionic systems, etc on a distributed memory parallel computer | 4x | Yes |
GPU-BLAST | Local search with fast k-tuple heuristic | 3-4x | No |
GROMACS | Simulation of biochemical molecules with complicated bond interactions | 165 ns/day DHFR | No |
GPU-HMMER | Parallelized local and global search with profile Hidden Markov models | 60-100x | Yes |
HOOMD-Blue | Particle dynamics package written from the ground up for GPUs | 2x | Yes |
LAMMPS | Classical molecular dynamics package | 3-18x | Yes |
mCUDA-MEME | Ultrafast scalable motif discovery algorithm based on MEME | 4-10x | Yes |
MUMmerGPU | An open-source high-throughput parallel pairwise local sequence alignment program | 13x | No |
NAMD | Designed for high-performance simulation of large molecular systems | 6.44 ns/day STMV 585x 2050s | Yes |
OpenMM | Library and application for molecular dynamics for HPC with GPUs | Implicit: 127-213 ns/day; Explicit: 18-55 ns/day DHFR | Yes |
SeqNFind | A commercial GPU Accelerated Sequence Analysis Toolset | 400x | Yes |
TeraChem | A general purpose quantum chemistry package | 7-50x | Yes |
UGENE | Open source Smith-Waterman for SSE/CUDA, suffix array based repeats finder and dotplot | 6-8x | Yes |
WideLM | Fits numerous linear models to a fixed design and response | 150x | Yes |
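The table mixes two kinds of figure: relative speed-ups (such as 10x) and, for the molecular dynamics packages, absolute throughput in nanoseconds of simulated time per day of wall-clock time (ns/day). As a rough aid to reading them, the sketch below converts each kind into a wall-clock estimate. The function names, the 1 µs simulation target and the 24-hour CPU baseline are illustrative assumptions, not figures from any of the tools listed.

```python
def wall_clock_days(target_ns, throughput_ns_per_day):
    """Days of wall-clock time needed to simulate `target_ns` nanoseconds."""
    if throughput_ns_per_day <= 0:
        raise ValueError("throughput must be positive")
    return target_ns / throughput_ns_per_day


def accelerated_hours(baseline_hours, speed_up):
    """Estimated run time after applying a relative speed-up factor."""
    if speed_up <= 0:
        raise ValueError("speed-up must be positive")
    return baseline_hours / speed_up


if __name__ == "__main__":
    # Hypothetical target: 1 microsecond (1000 ns) of simulated time
    # at the 160 ns/day throughput quoted in the table.
    print(wall_clock_days(1000, 160), "days")
    # Hypothetical 24-hour CPU job with a quoted 10x speed-up.
    print(accelerated_hours(24, 10), "hours")
```

Both conversions assume the quoted figure holds for the whole run. In practice throughput depends on the system size, the benchmark and the hardware, which is why the table names the benchmark (DHFR, STMV, JAC NVE) alongside several of the ns/day values.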
It is important to note, however, that because GPGPUs handle floating point arithmetic differently from CPUs, results can and will differ between architectures, so a direct, bit-for-bit comparison is not possible. Instead, interval arithmetic may be useful to sanity-check that the results generated on the GPU are consistent with those from a CPU-based system.
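One reason results differ is that floating point addition is not associative, so a parallel reduction that adds values in a different order from a serial loop can round differently. The sketch below runs entirely on the CPU in plain Python (3.9 or later, for `math.nextafter`) and only imitates the two summation orders; no GPU is involved. All function names are hypothetical. It builds a simple interval enclosing the exact sum by widening each partial result outwards by one representable value, then checks whether a result computed in another order falls inside it.

```python
import math
import random


def sequential_sum(values):
    """Left-to-right accumulation, as a simple serial loop would do."""
    total = 0.0
    for v in values:
        total += v
    return total


def pairwise_sum(values):
    """Tree-shaped reduction, one order a parallel reduction might use."""
    if not values:
        return 0.0
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])


def interval_sum(values):
    """Return (lo, hi) bounds that enclose the exact sum.

    Each rounded addition is nudged one float outwards, so the lower
    bound never rises above, and the upper bound never falls below,
    the exact running total.
    """
    lo = hi = 0.0
    for v in values:
        lo = math.nextafter(lo + v, -math.inf)
        hi = math.nextafter(hi + v, math.inf)
    return lo, hi


def consistent(result, interval):
    """True if `result` lies inside the closed interval (lo, hi)."""
    lo, hi = interval
    return lo <= result <= hi


if __name__ == "__main__":
    random.seed(0)
    values = [random.uniform(0.0, 1.0) for _ in range(10000)]
    serial = sequential_sum(values)
    tree = pairwise_sum(values)
    bounds = interval_sum(values)
    print("serial  :", repr(serial))
    print("pairwise:", repr(tree))
    print("bounds  :", bounds)
    print("pairwise result inside bounds:", consistent(tree, bounds))
```

A result outside the interval would suggest a genuine discrepancy rather than ordinary rounding, while a result inside it is merely consistent with the reference. Real interval arithmetic libraries use directed rounding and cover more operations than addition, so this is an illustration of the idea rather than a validation tool.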