Interface Online, Center for Information Technology (CIT)

Spring 2009 [Number 243]


The NIH Biowulf Cluster: Supercomputing for Intramural Scientists

Ten years ago, a cluster of 80 ordinary desktop computers arrived at NIH for the purpose of solving complex biomedical problems. Since then, this cluster—known as Biowulf—has supported the NIH mission by providing intramural scientists with a world-class scientific supercomputing system. But where did Biowulf come from? How did it get its name? Is Biowulf right for your research project? And what is new for Biowulf in 2009?

In the beginning: Beowulf

In 1994, computer scientists at NASA Goddard linked a group of inexpensive off-the-shelf personal computers together with Ethernet, creating a computing cluster with a cost-effectiveness that rivaled existing supercomputers. They called their project Beowulf.

This first Beowulf cluster was a 24-node cluster that cost $57,000, a fraction of the cost of the commercial supercomputers then available. As off-the-shelf, or commodity, computers became even less expensive and open-source software such as the Linux operating system became available, “Beowulf-class” systems grew in popularity. In the last 15 years, as this graph from the Top 500 organization shows, clusters have come to dominate supercomputing, with 400 clusters among the world’s top 500 computers in 2008.

A graph showing architecture share over time, from 1993 to 2008, for the world's top 500 computers.

(Image from www.top500.org, reproduced with permission)

From Beowulf to Biowulf

In 1999, DCSS’ Scientific Computing Branch put a Beowulf-class system of 80 nodes into production and named it Biowulf, a Beowulf for Bioscience. Since then, the NIH Biowulf cluster has increased its processing power to more than 6,300 processors. It is currently used by NIH intramural scientists for projects ranging from molecular dynamics simulations of protein structures and genome-wide association studies to electron microscopy image analysis and statistical calculations.

30% of the Biowulf cluster (48 processors) in 1999; and 10% of the cluster (600 processors) in 2009.

Biowulf today

The Biowulf cluster is heterogeneous, incorporating several generations of nodes and three different interconnects. Some applications on the cluster are best suited to a particular kind of node, and because the Biowulf cluster runs many different types of biomedical applications, the heterogeneity is an advantage in the NIH environment.

Biowulf’s supporting hardware includes file-system servers and a tape library. The cluster runs the Linux operating system and uses the PBS batch system for job submission and scheduling. A wide array of compilers is available for those who wish to develop programs or build code. The staff that manages and supports the cluster includes both system administrators and scientists, ensuring that the Biowulf staff can address both the scientific and the computing aspects of scientific computing.
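
As a rough illustration of what batch submission on such a system involves, the sketch below (in Python) writes a minimal PBS job script and hands it to qsub. The job name, PBS directives, and the analysis command are hypothetical placeholders; the Biowulf User Guide describes the options actually supported on the cluster.

import subprocess
from pathlib import Path

# Minimal sketch of PBS batch submission (assumes the qsub command is on
# the PATH). The job name, directives, and the analysis command below are
# placeholders, not actual Biowulf settings.
job_script = """#!/bin/bash
#PBS -N example_job
#PBS -j oe
cd $PBS_O_WORKDIR
./my_analysis input.dat > output.dat
"""

script_path = Path("example_job.sh")
script_path.write_text(job_script)

# qsub prints the identifier of the newly queued job on success.
result = subprocess.run(["qsub", str(script_path)],
                        capture_output=True, text=True, check=True)
print("Submitted job:", result.stdout.strip())

Once a job has been queued, standard PBS tools such as qstat can be used to follow its progress.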

Parallel jobs or “swarms”?

Initially, the Biowulf staff assumed that most jobs on the system would be parallel jobs, but within a few months users had found a novel way to use this new resource: running “swarms,” that is, large numbers of independent single-threaded jobs.

Biomedical projects often lend themselves to these “swarms” of computation. For example, a scientist might want to analyze 100,000 DNA sequences with a series of standard programs. Most sequence analysis programs are not parallelized, and the analysis of each sequence is independent of the next, so running them all at once as a swarm of independent jobs makes sense.

Another example is image processing: NIH scientists analyze images from electron microscopy, PET and CT scans, and MRIs, which may require independently running an image-processing program on each image. To enable easier submission of such jobs, the “swarm” program was developed in-house at NIH.
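
As a concrete, though hypothetical, illustration, the sketch below builds the kind of command file a sequence-analysis swarm starts from: one independent command per input file, one command per line. The directory layout, the use of the legacy blastall program, and the swarm invocation shown in the closing comment are assumptions for illustration; the swarm documentation on the Biowulf site describes the actual interface.

from pathlib import Path

# Hypothetical layout: one FASTA file per sequence set in ./sequences/.
# Each line of the resulting command file is an independent,
# single-threaded job.
sequence_dir = Path("sequences")
commands = [
    f"blastall -p blastp -d nr -i {fasta} -o {fasta.with_suffix('.out')}"
    for fasta in sorted(sequence_dir.glob("*.fasta"))
]

Path("blast.swarm").write_text("\n".join(commands) + "\n")
print(f"Wrote {len(commands)} independent commands to blast.swarm")

# The command file would then be handed to the in-house swarm utility,
# for example:  swarm -f blast.swarm   (shown for illustration only;
# see the Biowulf documentation for the actual options).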

However, parallel jobs still use a large proportion of cycles on the cluster. Molecular dynamics programs such as NAMD, Charmm, Gromacs, and Amber are parallelized, and since such simulations are typically very long-running, they benefit greatly from this parallelization.

Publish or perish!

The productivity of this massive computational resource is ultimately measured in scientific publications. The first publications that cited use of the Biowulf cluster appeared in 2000, about 18 months after the cluster went into production. Since then, Biowulf users have continued to publish extensively, with over 80 publications in 2008 alone.

A symposium focusing on recent research citing the Biowulf cluster was held on February 3, 2009, with nine NIH researchers from diverse scientific fields speaking about the computational research they have conducted on the Biowulf cluster. Videocasts from the symposium are available at http://biowulf.nih.gov/symposium


Recent developments

2008: Focus on Molecular Dynamics. Molecular dynamics simulations of the behavior of atoms in protein structures are both computing-intensive and communications-intensive, and such jobs benefit from a high-performance network between the nodes. This type of job accounts for a significant proportion of the CPU cycles used on the cluster, so every evolution of Biowulf has included some nodes connected by a high-speed, low-latency network such as Myrinet and, more recently, Infiniband. In 2008, the entire hardware upgrade was targeted at molecular dynamics jobs, and we added almost 1,800 processors connected by Infiniband to the cluster.

2009: Focus on Storage. In the last decade, the storage needs of many scientific projects on the Biowulf cluster have grown from hundreds of gigabytes (GB) to multiple terabytes (TB). For example, a study from NIA and NIMH on the functional consequences of human genetic variation on brain function involves 193 human brain samples, with 13,000 transcripts measured in each of 4 brain regions. The initial analysis required 1 TB, and the subsequent steps will require at least 5 TB. To enable such projects, the 2009 upgrade to the cluster has focused on storage, with an additional 500 TB becoming available in late spring 2009.

When do you need to use Biowulf?

  • If you have long jobs such as protein simulations that would take months or years on a desktop computer.

  • If you have large numbers of independent jobs, such as running Repeatmasker and Blast on many thousands of sequences (a back-of-the-envelope sketch after this list illustrates the scale).

  • If your jobs require large memory. For example, a Gaussian job requiring 20 GB of memory can be run on the Biowulf "fat node," a special node which is designed for large-memory jobs.

  • If your jobs can run in parallel on a distributed-memory system such as Biowulf. Many molecular dynamics programs are parallelized, and running a job on 128 processors can enable large simulations that would otherwise not be possible.
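
To get a feel for the scale behind the second item in this list, here is a back-of-the-envelope sketch comparing a large batch of independent analyses run one after another on a single desktop processor with the same batch spread across many cluster processors. The job count, per-job runtime, and processor count are made-up round numbers, not Biowulf measurements.

# Back-of-the-envelope arithmetic for a "swarm" of independent jobs.
# All numbers below are illustrative assumptions, not Biowulf benchmarks.
n_jobs = 100_000         # e.g. one analysis per DNA sequence
minutes_per_job = 2.0    # assumed runtime of a single analysis
processors = 500         # assumed share of the cluster in use at once

serial_hours = n_jobs * minutes_per_job / 60
# Ideal case: the jobs are fully independent, so they spread evenly.
cluster_hours = serial_hours / processors

print(f"One desktop processor: roughly {serial_hours / 24:.0f} days")
print(f"{processors} cluster processors: roughly {cluster_hours:.1f} hours")

The same arithmetic motivates the list that follows: a strictly serial job cannot be spread across processors at all, and a handful of short jobs does not justify the batch-system overhead.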

When should you not use Biowulf?

  • "Serial" jobs: (those where each successive process depends on the previous step) will not benefit from being run on a system like Biowulf, since such jobs cannot utilize multiple processors. Such jobs will run just as fast on a desktop system.

  • Small numbers of short jobs. If your proposed project requires limited computational power, it's not worth the overhead of running via a batch job on Biowulf. Use Helix (http://helix.nih.gov) instead!

  • Interactive jobs. The Biowulf batch system is most useful for jobs that can run as detached processes, that do not depend on an interactive connection, and that do not require large amounts of graphical data to be transmitted over the network. Thus, programs that can only run via a GUI are generally not suitable for the cluster.

Useful URLs

How to get a Helix/Biowulf account: http://helix.nih.gov/Documentation/accounts.html
Applications installed on Biowulf: http://biowulf.nih.gov/apps/
The Biowulf User Guide: http://biowulf.nih.gov/user_guide.html

Questions? Contact us

The Biowulf staff welcomes your questions. To ask us about the Biowulf cluster, the suitability of jobs, available resources and applications, or anything else you’d like to know about Biowulf, please send email to staff@helix.nih.gov
