Spring 2009 [Number 243]
**The NIH Biowulf Cluster: Supercomputing for Intramural Scientists**

Ten years ago, a cluster of 80 ordinary desktop computers arrived at NIH for the purpose of solving complex biomedical problems. Since then, this cluster, known as Biowulf, has supported the NIH mission by providing intramural scientists with a world-class scientific supercomputing system. But where did Biowulf come from? How did it get its name? Is Biowulf right for your research project? And what is new for Biowulf in 2009?

**In the beginning: Beowulf**

In 1994, computer scientists at NASA Goddard linked a group of inexpensive off-the-shelf personal computers together with Ethernet, creating a computing cluster with a cost-effectiveness that rivaled existing supercomputers. They called their project Beowulf. This first Beowulf cluster consisted of 24 nodes and cost $57,000, a fraction of the cost of the commercial supercomputers then available. As off-the-shelf, or commodity, computers became even less expensive and open-source software such as the Linux operating system became available, "Beowulf-class" systems grew in popularity. In the last 15 years, as this graph from the Top500 organization shows, clusters have come to dominate supercomputing, with 400 clusters among the world's top 500 computers in 2008.

*(Image from www.top500.org, reproduced with permission.)*

**From Beowulf to Biowulf**

In 1999, DCSS's Scientific Computing Branch put a Beowulf-class system of 80 nodes into production; it was called Biowulf, a Beowulf for Bioscience. Since then, the NIH Biowulf cluster has increased its processing power to over 6,300 processors. It is currently used by NIH intramural scientists for projects ranging from molecular dynamics simulations of protein structures and genome-wide association studies to electron microscopy image analysis and statistical calculations.

*Photos: 30% of the Biowulf cluster (48 processors) in 1999, and 10% of the cluster (600 processors) in 2009.*

**Biowulf today**

The Biowulf cluster is heterogeneous, incorporating several generations of nodes and three different interconnects. Some applications on the cluster are best suited to a particular kind of node, and because Biowulf runs many different types of biomedical applications, this heterogeneity is an advantage in the NIH environment. Biowulf's supporting hardware includes file-system servers and a tape library. The cluster runs the Linux operating system and uses the PBS batch system for job submission and scheduling. A wide array of compilers is available for those who wish to develop programs or build code. The staff that manages and supports the cluster includes both system administrators and scientists, ensuring that the Biowulf staff can address both the scientific and the computing aspects of scientific computing.

**Parallel jobs or "swarms"?**

Initially, the Biowulf staff had assumed that most jobs on the system would be parallel jobs, but within a few months users had found a novel way to use the new resource: running "swarms", large numbers of independent single-threaded jobs. Biomedical projects often lend themselves to these swarms of computation. For example, a scientist might want to analyze 100,000 DNA sequences with a series of standard programs. Most sequence-analysis programs are not parallelized, and the analysis of each sequence is independent of the next, so running them all at once as a swarm of independent jobs makes sense.
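As a concrete illustration, the short script below is a minimal sketch of how such a swarm of sequence-analysis jobs might be prepared: it writes a command file with one independent, single-threaded command per line. The `sequences` directory, the `analyze_seq` program and its options, and the `seq_jobs.swarm` file name are hypothetical placeholders, not part of the Biowulf documentation.

```python
#!/usr/bin/env python
"""Minimal sketch: build a swarm command file, one independent job per line.

The 'sequences' directory, the 'analyze_seq' program, and its options are
hypothetical placeholders; substitute the sequence-analysis tool you actually
run. The finished file is then handed to the in-house swarm utility for
submission (see the Biowulf documentation for the exact invocation).
"""
import glob
import os

SEQ_DIR = "sequences"        # one FASTA file per sequence (hypothetical layout)
CMD_FILE = "seq_jobs.swarm"  # command file to generate (hypothetical name)

with open(CMD_FILE, "w") as out:
    for fasta in sorted(glob.glob(os.path.join(SEQ_DIR, "*.fasta"))):
        result = os.path.splitext(fasta)[0] + ".out"
        # Each line is a self-contained, single-threaded command; when the
        # file is submitted as a swarm, every line runs as an independent job.
        out.write("analyze_seq --in %s --out %s\n" % (fasta, result))

print("Wrote %s with one command per input sequence" % CMD_FILE)
```

Because every line is independent, the batch system can spread these commands across whatever processors happen to be free; the in-house swarm program described below was written to make submitting exactly this kind of job easy.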
Another example is image processing: NIH scientists analyze images from electron microscopy, PET and CT scans, and MRIs, which may require independently running an image-processing program on each image. To make it easier to submit such jobs, the "swarm" program was developed in-house at NIH.

However, parallel jobs still account for a large proportion of the cycles used on the cluster. Molecular dynamics programs such as NAMD, CHARMM, GROMACS, and Amber are parallelized, and since such simulations are typically very long-running, they benefit greatly from this parallelization.

**Publish or perish!**

The productivity of this massive computational resource is ultimately measured in scientific publications. The first publications citing use of the Biowulf cluster appeared in 2000, about 18 months after the cluster went into production. Since then, Biowulf users have continued to publish extensively, with over 80 publications in 2008 alone. A symposium focusing on recent research citing the Biowulf cluster was held on February 3, 2009, with nine NIH researchers from diverse scientific fields speaking about the computational research they have conducted on the cluster. Videocasts from the symposium are available at http://biowulf.nih.gov/symposium.

**Recent developments**

**2008: Focus on Molecular Dynamics.** Molecular dynamics simulations of the behavior of atoms in protein structures are both computing-intensive and communications-intensive, and such jobs benefit from a high-performance network between the nodes. Because this type of job accounts for a significant proportion of the CPU cycles used on the cluster, every generation of Biowulf has included some nodes connected by a high-speed, low-latency network such as Myrinet and, more recently, InfiniBand. In 2008, the entire hardware upgrade was targeted at molecular dynamics jobs, adding almost 1,800 InfiniBand-connected processors to the cluster.

**2009: Focus on Storage.** In the last decade, the storage needs of many scientific projects on the Biowulf cluster have grown from hundreds of gigabytes (GB) to multiple terabytes (TB). For example, a study from NIA and NIMH on the functional consequences of human genetic variation for brain function involves 193 human brain samples, with 13,000 transcripts measured in each of four brain regions. The initial analysis required 1 TB, and the subsequent steps will require at least 5 TB. To enable such projects, the 2009 upgrade to the cluster has focused on storage, with an additional 500 TB to become available in late spring 2009.

**When do you need to use Biowulf?**
**When should you not use Biowulf?**
**Useful URLs**

How to get a Helix/Biowulf account: http://helix.nih.gov/Documentation/accounts.html

**Questions? Contact us**

The Biowulf staff welcomes your questions. To ask us about the Biowulf cluster, the suitability of jobs, available resources and applications, or anything else you'd like to know about Biowulf, please send email to staff@helix.nih.gov.
Published by the Center for Information Technology, National Institutes of Health