Biowulf at the NIH
Scientific Applications on Biowulf

The Biowulf staff maintains several programs, packages and databases for our users. Below is a non-exhaustive list of software available on Biowulf with site-specific instructions on how to run a given package on the cluster, including links to vendor/author provided documentation if applicable.

BLAST, developed at NCBI, is a set of programs to find similarity between a query protein or DNA sequence and a sequence database. A scheme for efficiently running a large number of sequence files against a variety of BLAST databases has been implemented on Biowulf.

The EMBOSS package is a comprehensive suite of sequence analysis software that can perform sequence alignment, motif identification, pattern analysis, and more.

BLAT is a DNA/protein sequence analysis program written by Jim Kent at UCSC. It is designed to quickly find sequences of 95% or greater similarity that are 40 bases or longer, and it may miss more divergent or shorter alignments. It will find perfect sequence matches of 33 bases, and sometimes finds them down to 22 bases. On proteins, BLAT finds sequences of 80% or greater similarity that are 20 amino acids or longer. In practice, DNA BLAT works well on primates and protein BLAT on land vertebrates.

The fasta program package contains many programs for searching DNA and protein databases and one program (prss) for evaluating statistical significance from randomly shuffled sequences.

MEME is designed to discover motifs (highly conserved regions) in groups of related DNA or protein sequences, and MAST searches sequence databases using motifs.

NestedMICA is a method for discovering over-represented short motifs in large sets of strings. Typical applications include finding candidate transcription factor binding sites in DNA sequences.

Profile hidden Markov models (profile HMMs) can be used to do sensitive database searching using statistical descriptions of a sequence family's consensus. HMMER uses profile HMMs for several types of homology searches.

PolyPhen-2 (Polymorphism Phenotyping v2) is a software tool which predicts possible impact of amino acid substitutions on the structure and function of human proteins using straightforward physical and evolutionary comparative considerations.

RandFold computes the probability that, for a given RNA sequence, the Minimum Free Energy (MFE) of the secondary structure is different from a distribution of MFE computed with random sequences.
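The comparison RandFold makes can be illustrated with a small shuffling sketch. This is only an illustration of the idea, not RandFold's implementation: `toy_energy` is a hypothetical stand-in for a real MFE calculation (it simply counts mirror positions that could pair), and the function and parameter names are assumptions made for the example.

```python
import random

# Base pairs treated as "pairable" in this toy model (Watson-Crick plus G-U).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def toy_energy(seq):
    """Hypothetical stand-in for an MFE calculation (NOT a real folding energy).

    Scores -1 for every position i that could pair with its mirror position,
    so a perfect hairpin-like sequence gets the lowest score.
    """
    half = len(seq) // 2
    return -sum(1 for i in range(half) if (seq[i], seq[-1 - i]) in PAIRS)


def shuffled_p_value(seq, n_shuffles=999, energy=toy_energy, seed=0):
    """Fraction of shuffled sequences whose energy is at or below the original.

    The +1 terms count the original sequence itself, so the result is never 0.
    """
    rng = random.Random(seed)
    observed = energy(seq)
    letters = list(seq)
    at_or_below = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # mononucleotide shuffle, composition preserved
        if energy("".join(letters)) <= observed:
            at_or_below += 1
    return (at_or_below + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    print(shuffled_p_value("GGGGAAAACCCC"))
```

A small value suggests that the original ordering scores lower than most shuffled orderings with the same composition. A real analysis would replace `toy_energy` with the energy reported by an RNA folding program.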

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns). On average, almost 50% of a human genomic DNA sequence currently will be masked by the program.
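The masking step described above can be sketched in a few lines. This is a minimal illustration, assuming the repeat annotations are already available as 0-based, half-open `(start, end)` intervals; that interval format and the function names are assumptions for the example and do not reproduce RepeatMasker's own input or output formats.

```python
def mask_intervals(seq, intervals, mask_char="N"):
    """Return seq with every base inside the given intervals replaced.

    Intervals are 0-based and half-open: (2, 5) covers indices 2, 3 and 4.
    Coordinates outside the sequence are clipped rather than raising.
    """
    bases = list(seq)
    for start, end in intervals:
        start = max(start, 0)
        end = min(end, len(bases))
        for i in range(start, end):
            bases[i] = mask_char
    return "".join(bases)


def masked_fraction(seq, mask_char="N"):
    """Fraction of positions in seq that carry the mask character."""
    return seq.count(mask_char) / len(seq) if seq else 0.0


if __name__ == "__main__":
    masked = mask_intervals("ACGTACGTAC", [(2, 5)])
    print(masked, masked_fraction(masked))
```

Note that `masked_fraction` also counts any N already present in the input, so it is only meaningful for sequences without ambiguous bases.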

pMDR is a parallel implementation for Multifactor dimensionality reduction (MDR) for detecting gene-gene and gene-environment interactions.

A collection of executables from UCSC have been compiled on Biowulf. The programs perform a multitude of tasks from simple number crunching to highly specific sequence analysis and database construction. The executables are located in the directory /usr/local/ucsc on biowulf.

A list of all nucleotide, protein, structural, and other databases available on the system for BLAST, FASTA, etc., along with their update status.

A large variety of scientific databases are maintained in several formats on Helix/Biowulf, including parts of the 1000 Genomes data, human, mouse, and other genomes, and NCBI nonredundant protein and nucleotide databases. See here for a full list and update status.

ABySS is a de novo, parallel, paired-end sequence assembler that is designed for short reads. The parallel version is implemented using MPI and is capable of assembling larger genomes.

BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files.

bamUtil is a repository that contains several programs that perform operations on SAM/BAM files. All of these programs are built into a single executable, bam.

The BEDTools utilities allow one to address common genomics tasks such as finding feature overlaps and computing coverage. In addition, one can develop sophisticated pipelines that answer complicated research questions by "streaming" several BEDTools together.

Blat-like Fast Accurate Search Tool (BFAST) facilitates the fast and accurate mapping of short reads to reference sequences. Specifically, BFAST was designed to facilitate whole-genome resequencing, where mapping billions of short reads with variants is of utmost importance. BFAST supports both Illumina and ABI SOLiD data, as well as any other Next-Generation Sequencing Technology (454, Helicos).

SOLiD Bioscope provides a command line interface for running application-specific sequence analysis tools.

A collection of various Perl scripts that utilize BioPerl modules for use in bioinformatics analysis.

Bowtie is an ultrafast, memory-efficient short read aligner geared toward quickly aligning large sets of short DNA sequences (reads) to large genomes.

Burrows-Wheeler Alignment (BWA) is a fast, lightweight tool that aligns short sequences to a sequence database, such as the human reference genome. BWA excels in speed: mapping 2 million high-quality 35bp short reads against the human genome can be done in 20 minutes.

Creates genomic builds, calls SNPs, detects indels, and counts reads from data generated from one or more sequencing runs.

Cis-regulatory Element Annotation is a tool designed to characterize genome-wide protein-DNA interaction patterns from ChIP-chip and ChIP-Seq of both sharp and broad binding factors.

Cgatools provide tools for downstream analysis of Complete Genomics data. The focus is to provide command line utilities. The general areas of functionality include genome comparison, format conversion, and reference tools.

Circos is a program for the generation of publication-quality, circularly composited renditions of genomic data and related annotations. Circos is particularly suited for visualizing alignments, conservation and intra and inter-chromosomal relationships. Also, Circos is useful to visualize any type of information that benefits from a circular layout. Thus, although it has been designed for the field of genomics, it is sufficiently flexible to be used in other data domains.

Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples.

deFuse is a software package for gene fusion discovery using RNA-Seq data. The software uses clusters of discordant paired end alignments to inform a split read alignment analysis for finding fusion boundaries. The software also employs a number of heuristic filters in an attempt to reduce the number of false positives and produces a fully annotated output for each predicted fusion.

A program for calling small indels from short-read sequence data. It is currently designed to handle only Illumina data.

eXpress is a streaming tool for quantifying the abundances of a set of target sequences from sampled subsequences. Example applications include transcript-level RNA-Seq quantification, allele-specific/haplotype expression analysis (from RNA-Seq), transcription factor binding quantification in ChIP-Seq, and analysis of metagenomic data.

FastQC is a quality control tool for high-throughput sequence data.

The fastQValidator validates the format of fastq files.

Fastx-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing.

FreeBayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms), indels (insertions and deletions), and MNPs (multi-nucleotide polymorphisms) smaller than the length of a short-read sequencing alignment.

An efficient fusion aligner that aligns reads spanning fusion junctions directly to the genome without prior knowledge of potential fusion regions.

A computational framework to identify fusion transcripts from paired-end RNA-Seq data.

First, a structured software library that makes it easy to write efficient analysis tools for next-generation sequencing data; second, a suite of tools for working with human medical resequencing projects such as 1000 Genomes and The Cancer Genome Atlas.

GMAP: A Genomic Mapping and Alignment Program for mRNA and EST Sequences.
GSNAP: Genomic Short-read Nucleotide Alignment Program.

HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and ChIP-Seq analysis.

HTSeq is a Python package that provides infrastructure to process data from high-throughput sequencing assays.

A modular bioinformatics data analysis tool for performing off-instrument secondary and tertiary analyses on sequence data generated by Life Technologies instruments.

lumpy is software for finding and categorizing structural variation in genome sequencing data.

Pedigree Drawing is a pedigree drawing program designed to handle large and complex pedigrees with an emphasis on readability and aesthetics.

Model-based Analysis of ChIP-Seq (MACS) is used on short reads sequencers such as Genome Analyzer (Illumina / Solexa). MACS empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library construction size estimates, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome sequence, allowing for more sensitive and robust prediction. MACS compares favorably to existing ChIP-Seq peak-finding algorithms and can be used for ChIP-Seq with or without control samples.

Accurate mapping of RNA-seq reads for splice junction discovery.

Maq is software that builds mapping assemblies from short reads generated by next-generation sequencing machines. It is designed particularly for the Illumina-Solexa 1G Genetic Analyzer, and has preliminary functions to handle ABI SOLiD data.

Whole Genome Shotgun and EST Sequence Assembler for Sanger, 454 and Solexa / Illumina

miRanda is an algorithm for finding genomic targets for microRNAs. MiRanda was developed at the Computational Biology Center of Memorial Sloan-Kettering Cancer Center.

MOSAIK is a reference-guided assembler that can work with FASTA, FASTQ, Illumina Bustard & Gerald, or SRF file formats and outputs phrap ace and GigaBayes gig formats.

Probabilistic analysis and design of RNA-Seq experiments for identifying isoform regulation

Picard comprises Java-based command-line utilities that manipulate SAM files, and a Java API (SAM-JDK) for creating new programs that read and write SAM files. Both SAM text format and SAM binary (BAM) format are supported.

NCBI SRA Toolkit

We have the whole package (novoalign, novoalignMPI, novoalignCS, novoalignCSMPI, novomethyl, novobarcode etc). Novoalign is an aligner for single-ended and paired-end reads from the Illumina Genome Analyser. Novoalign finds global optimum alignments using full Needleman-Wunsch algorithm with affine gap penalties whilst performing at the same or better speed than aligners that are limited to two mismatches and no insertions or deletions.

optiCall is designed to make accurate genotype calls across the minor allele frequency spectrum. Using intensity information from across multiple individuals and multiple SNPs when calling genotypes allows it to call both rare and common variants accurately.

The PCAP program is intended for large-scale assembly of genomic sequences with quality values and with or without forward-reverse read pairs.

Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data. It uses a pattern growth approach to identify the breakpoints of these variants from paired-end short reads.

A library and toolset for working with human genetic variation data

Genome-wide association analysis of imputed data

Ray is parallel software that computes de novo genome assemblies of next-gen sequencing data using the Message Passing Interface (MPI).

The RSEG software package is aimed at analyzing ChIP-Seq data, especially for identifying genomic regions and their boundaries marked by diffusive histone modification markers, such as H3K36me3 and H3K27me3.

Comprehensively evaluates RNA-seq datasets generated from clinical tissues or other well-annotated organisms such as mouse, fly and yeast.

A modular framework to analyze RNA-Seq data using compact and anonymized data summaries.

RUM is an alignment, junction calling, and feature quantification pipeline specifically designed for Illumina RNA-Seq data.

SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.

Scripture is a method for transcriptome reconstruction that relies solely on RNA-Seq reads and an assembled genome to build a transcriptome ab initio. The statistical methods to estimate read coverage significance are also applicable to other sequencing data. Scripture also has modules for ChIP-Seq peak calling.

Scythe uses a Naive Bayesian approach to classify contaminant substrings in sequence reads. It considers quality information, which can make it robust in picking out 3'-end adapters, which often include poor quality bases.

SHRiMP is a software package for aligning genomic reads against a target genome. It was primarily developed with the multitudinous short reads of next generation sequencing machines in mind, as well as Applied Biosystem's colourspace genomic representation.

Sickle is a windowed adaptive trimming tool for FASTQ files using quality.
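The general idea of windowed quality trimming can be sketched as follows. This is a simplified illustration of 3'-end trimming only, not Sickle's actual algorithm: the cut rule, the default `window` and `threshold` values, and the function name are assumptions chosen for the example.

```python
def trim_3prime(seq, quals, window=4, threshold=20):
    """Truncate a read at the first window whose mean quality is too low.

    seq   -- read sequence as a string
    quals -- list of per-base quality scores (same length as seq)

    The read is cut at the start of the first window whose mean quality
    falls below `threshold`; if no window does, it is returned unchanged.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    n = len(quals)
    if n == 0:
        return seq, quals
    w = min(window, n)  # shrink the window for very short reads
    for start in range(n - w + 1):
        if sum(quals[start:start + w]) / w < threshold:
            return seq[:start], quals[:start]
    return seq, quals


if __name__ == "__main__":
    read = "ACGTACGTAC"
    scores = [30, 30, 30, 30, 30, 30, 10, 10, 10, 10]
    print(trim_3prime(read, scores))
```

Because the cut is placed at the window start, a good base immediately before a run of poor bases can be trimmed along with them; production tools refine the cut position further.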

SOAP has evolved from a single alignment tool into a tool package that provides a full solution for next-generation sequencing data analysis. Currently, it consists of a new alignment tool (SOAPaligner/soap2), a re-sequencing consensus sequence builder (SOAPsnp), an indel finder (SOAPindel), a structural variation scanner (SOAPsv), and a de novo short-read assembler (SOAPdenovo). A GPU-accelerated alignment tool (SOAP3/GPU) is being implemented.

SOLiD Software Suite provides software tools for data processing and analysis generated on SOLiD Analyzer. It supports multiple applications, is integrable with custom analysis pipelines and can complete primary (image acquisition and quality control) and secondary (alignment to a reference genome, base calling, and SNP identification) analysis of fragment and mate-paired experiments.

A program for the detection of structural variation events from whole-genome sequenced read-pair data.

SpliceMap is a de novo splice junction discovery and alignment tool. It offers high sensitivity and support for arbitrary RNA-seq read lengths.

SSAHA2 (Sequence Search and Alignment by Hashing Algorithm) is a pairwise sequence alignment program designed for the efficient mapping of sequencing reads onto genomic reference sequences. Reads from most sequencing platforms (ABI-Sanger, Roche 454, Illumina-Solexa) and a range of output formats (SAM, CIGAR, PSL etc.) are supported. A pile-up pipeline for analysis and genotype calling is available as a separate package.

STAR aligns RNA-seq reads to a reference genome using uncompressed suffix arrays.

Designed to annotate, visualize, and analyze the genetic variants identified through next-generation sequencing studies, including whole-genome sequencing (WGS) and exome sequencing studies.

TopHat is a program that aligns RNA-Seq reads to a genome in order to identify exon-exon splice junctions. It is built on the ultrafast short read mapping program Bowtie.

TopHat-Fusion is an enhanced version of TopHat with the ability to align reads across fusion points, which results from the breakage and re-joining of two different chromosomes, or from rearrangements within a chromosome.

USeq is a collection of software tools for both low- and high-level analysis of next-generation, ultra-high-throughput signature sequencing data from the Solexa, SOLiD, and 454 platforms. Initial emphasis: ChIP-Seq and RNA-Seq with FDR estimations.

A software tool for identifying SNPs and indels in massively parallel sequencing of individual and pooled samples.

It can be used to perform common tasks with VCF files such as file validation, file merging, intersecting, complements, etc.

VEGAS is a free program for performing gene-based tests for association using the results from genetic association studies. It annotates SNPs to corresponding genes, produces a gene-based test statistic, and then uses simulation to calculate an empirical gene-based p-value.

The Vienna RNA Package consists of a C code library and several stand-alone programs for the prediction and comparison of RNA secondary structures.

BEAST (Bayesian Evolutionary Analysis Sampling Trees) is a cross-platform program for Bayesian MCMC analysis of molecular sequences.

ChromoPainter is a tool for finding haplotypes in sequence data. ChromoCombine is a tool to help manage the large number of files generated when running ChromoPainter in parallel on a large number of separate compute nodes. fineSTRUCTURE is a fast and powerful algorithm for identifying population structure using dense sequencing data.

These programs implement statistical techniques used to map genes and find the approximate location of disease genes.

The EIGENSOFT package combines functionality from our population genetics methods and our EIGENSTRAT stratification correction method.

Loki is a linkage analysis package, primarily for large and complex pedigrees, which uses Markov chain Monte Carlo (MCMC) techniques to avoid many of the computational problems that prevent exact computational methods being used for large pedigrees.

MERLIN uses sparse trees to represent gene flow in pedigrees and is one of the fastest pedigree analysis packages around.

MrBayes performs Bayesian estimation of phylogeny.

PAUP* (Phylogenetic Analysis Using Parsimony) is a software package for inference of evolutionary trees.

Tools for the statistical analysis of family-based association studies (FBAT).

QIIME is an open source software package for comparison and analysis of microbial communities, primarily based on high-throughput amplicon sequencing data (such as SSU rRNA) generated on a variety of platforms, but also supporting analysis of other types of data (such as shotgun metagenomic data).

GERMLINE is a program for discovering long shared segments of Identity by Descent (IBD) between pairs of individuals in a large population.

PLINK is a whole-genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

MACH 1.0 is a Markov Chain based haplotyper. It can resolve long haplotypes or infer missing genotypes in samples of unrelated individuals.

Mach2qtl performs QTL analysis based on imputed dosages/posterior_probabilities.

IMPUTE is a program for estimating ("imputing") unobserved genotypes in SNP association studies.

Random Jungle is a fast implementation of RandomForest(TM) for high dimensional data, that can be used for GWAS data.

Fast and accurate haplotype inference software.

A statistical genetics computer application for haplotype, parametric linkage, non-parametric linkage (NPL), identity by descent (IBD) and mistyping analyses on any size of pedigree. SimWalk2 uses Markov chain Monte Carlo (MCMC) and simulated annealing algorithms to perform these multipoint analyses.

A package of software to perform several kinds of statistical genetic analysis, including linkage analysis, quantitative genetic analysis, and covariate screening.

AMBER (Assisted Model Building with Energy Refinement) is a package of molecular simulation programs. Version 9 is currently installed on Biowulf. Major programs in the AMBER package include sander, gibbs, nmode, and LEaP.

APBS (Adaptive Poisson-Boltzmann Solver) is a software package for the numerical solution of the Poisson-Boltzmann equation (PBE), one of the most popular continuum models for describing electrostatic interactions between molecular solutes in salty, aqueous media.

A suite of automated docking tools. It is designed to predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure.

CHARMM (Chemistry at HARvard Molecular Mechanics) is a program which supports a wide range of theoretical modeling calculations of the structure and dynamics of biological molecules. In addition to energy minimization and molecular dynamics simulations, Monte Carlo sampling, use of genetic algorithms, and several interfaces to quantum codes (AM1, GAMESS) are available or under development. Recent CHARMM versions have been made available for use on Biowulf, as a joint effort between NHLBI/LBC Computational Biophysics Section and CBER/OVRR Biophysics Lab and with the support of Biowulf Staff. Multiple executables are available for each version, in order to support larger molecular systems, and the different types of parallel communications available on Biowulf, i.e. ethernet and Myrinet 2000. The support files are also available for the above versions, e.g. version .doc files, and the standard topology and parameter files.

GAMESS is a program for ab initio quantum chemistry. Briefly, GAMESS can compute wavefunctions ranging from RHF, ROHF, UHF, GVB, and MCSCF, with CI and MP2 energy corrections available for some of these. Analytic gradients are available for these SCF functions, for automatic geometry optimization, transition state searches, or reaction path following. Computation of the energy hessian permits prediction of vibrational frequencies. A variety of molecular properties, ranging from simple dipole moments to frequency dependent hyperpolarizabilities may be computed.

Gaussian09 is a series of electronic structure programs performing computations starting from the basic laws of quantum mechanics. Gaussian can predict energies, molecular structures, vibrational frequencies for systems in the gas phase and in solution, and it can model them in both their ground state and excited states.

A versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is primarily designed for biochemical molecules like proteins and lipids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions (that usually dominate simulations) many groups are also using it for research on non-biological systems, e.g. polymers.

The Multiscale Modeling Tools for Structural Biology (MMTSB) Tool Set is a novel set of utilities and programming libraries that provide new enhanced sampling and multiscale modeling techniques for the simulation of proteins and nucleic acids. The tool set interfaces with the existing molecular modeling packages CHARMM and Amber for classical all-atom simulations, and with MONSSTER for lattice-based low-resolution conformational sampling.

NAMD is a parallel molecular dynamics program for UNIX platforms designed for high-performance simulations in structural biology. It is developed by the Theoretical Biophysics Group at the Beckman Center, University of Illinois. NAMD is particularly well suited to Beowulf clusters, as it was specifically designed to run efficiently on parallel machines.
VMD, the molecular visualization program integrated with NAMD, is also available on Helix and Biowulf.

Q-Chem is a comprehensive ab initio quantum chemistry package for accurate predictions of molecular structures, reactivities, and vibrational, electronic and NMR spectra.

A limited number of Schrödinger applications (such as MacroModel, Jaguar, and QikProp) are available through the Molecular Modeling Interest Group.

TURBOMOLE is a fast quantum chemical program package that is very stable and requires little memory and disk space. It consists of a series of modules and tools. Portions of the code are optimized for parallel use.

An efficient search engine for identifying MS/MS peptide spectra by searching libraries of known protein sequences. OMSSA scores significant hits with a probability score developed using classical hypothesis testing, the same statistical method used in BLAST.

An MS/MS database search tool specifically designed to address two crucial needs of the proteomics community: post-translational modification identification and search speed.

The GAUSS Mathematical and Statistical System is a fast matrix programming language designed for computationally intensive tasks, which has a wide variety of statistical, mathematical and matrix handling routines.

Matlab integrates mathematical computing, visualization, and a powerful language to provide a flexible environment for technical computing.

Mathematica is a fully integrated environment for technical and scientific computing. Mathematica combines numerical and symbolic computation, visualization, and programming in a single, flexible interactive system.

GNU Octave is an open-source language for numerical calculations that has a command-line interface and can interpret many (but not all) Matlab scripts. It is not license-limited and so can be used for many simultaneous independent runs.

OpenBUGS is a software package for performing Bayesian inference Using Gibbs Sampling.

R (the R Project) is a language and environment for statistical computing and graphics. R is similar to S, and provides a wide variety of statistical and graphical techniques (linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, ...)

The SAS System is an integrated, hardware-independent system of applications software for data access, management, statistical analysis and report writing. The Base SAS windowing environment provides a full-screen facility for interacting with all parts of a SAS program.

Scilab is an open-source alternative to Matlab which includes hundreds of mathematical functions and the ability to interactively add C/Fortran programs. It includes a Matlab->Scilab converter.

AFNI (Analysis of Functional NeuroImages) is a set of C programs for processing, analyzing, and displaying functional MRI (FMRI) data - a technique for mapping human brain activity.

Bsoft is a collection of programs and a platform for development of software for image and molecular processing in structural biology. Problems in structural biology are approached with a highly modular design, allowing fast development of new algorithms without the burden of issues such as file I/O. It provides an easily accessible interface, a resource that can be and has been used in other packages.

EMAN is a suite of scientific image processing tools aimed primarily at the transmission electron microscopy community, though it is beginning to be used in other fields as well.

EMAN2 is the successor to EMAN1. It is a broadly based greyscale scientific image processing suite with a primary focus on processing data from transmission electron microscopes.

FreeSurfer is a set of automated tools for reconstruction of the brain's cortical surface from structural MRI data, and overlay of functional MRI data onto the reconstructed surface.

FSL is a comprehensive library of image analysis and statistical tools for FMRI, MRI and DTI brain imaging data.

A package for the modeling of atomic resolution structures into low-resolution density maps e.g. from electron microscopy, tomography, or small angle X-ray scattering.

TORTOISE (Tolerably Obsessive Registration and Tensor Optimization Indolent Software Ensemble) is for processing diffusion MRI data.

Xplor-NIH is a structure determination program which builds on the X-PLOR program, including additional tools for NMR analysis. The advantage of running Xplor-NIH on Biowulf would be to spawn a large number of independent refinement jobs which would run on multiple Biowulf nodes.

POV-Ray (Persistence of Vision Raytracer) is a high-quality tool for creating three-dimensional graphics. Raytraced images are publication-quality and 'photo-realistic', but are computationally expensive, so large images can take many hours to create. POV-Ray images can also require more memory than many desktop machines can handle. To address these concerns, a parallelized version of POV-Ray has been installed on the Biowulf system.

CNS provides the most commonly used algorithms in macromolecular structure determination.

HADDOCK (High Ambiguity Driven protein-protein DOCKing) is an approach for predicting protein-protein complex structures that makes use of biochemical and/or biophysical interaction data such as chemical shift perturbation data resulting from NMR titration experiments or mutagenesis data.

The Rosetta++ software suite can perform de novo protein structure predictions, identify low free energy sequences for target protein backbones, predict the structure of a protein-protein complex from the individual structures of the monomer components, incorporate NMR data into the basic Rosetta protocol to accelerate the process of NMR structure prediction, and more...

Chemical-Shift-ROSETTA is a robust protocol to use NMR chemical shifts for de novo protein structure generation by SPARTA-based selection of protein fragments from the PDB, in conjunction with a regular ROSETTA Monte Carlo assembly and relaxation method.

ZDOCK predicts protein-docking models, and uses a fast Fourier transform to search all possible binding modes for proteins, evaluating based on shape complementarity, desolvation energy, and electrostatics.

Command-line homology model builder (written by Jason (Zhexin) Xiang) on par with MODELER. To use, type nest at the prompt.

Neuron is a simulation environment for modeling individual neurons and networks of neurons. It provides tools for conveniently building, managing, and using models in a way that is numerically sound and computationally efficient. It is particularly well-suited to problems that are closely linked to experimental data, especially those that involve cells with complex anatomical and biophysical properties.

Swarm is a program designed to simplify submitting a group of commands to the Biowulf cluster. Some programs do not scale well and thus are not suited to true parallelization. Other programs may be such that each individual job is very short, but many such jobs need to be run. Such programs are well suited to running 'swarms of single-threaded jobs'. The Swarm program simplifies this process. See the documentation for details.
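Preparing a group of independent single-threaded commands can be sketched as below. The sketch assumes that the group is described by a plain-text file with one command per line; the program name `myprog`, its `--in`/`--out` options, and the file names are hypothetical placeholders. The Swarm documentation gives the actual command-file format and submission syntax.

```python
import shlex


def build_command_lines(inputs, program="myprog"):
    """Build one shell command per input file.

    `program` and its --in/--out options are hypothetical placeholders;
    substitute the real command line of the application being run.
    """
    lines = []
    for path in inputs:
        quoted_in = shlex.quote(path)            # protect spaces and shell metacharacters
        quoted_out = shlex.quote(path + ".out")
        lines.append(f"{program} --in {quoted_in} --out {quoted_out}")
    return lines


def write_command_file(lines, path):
    """Write the commands to a text file, one command per line."""
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")


if __name__ == "__main__":
    commands = build_command_lines(["sample1.fa", "sample2.fa"])
    write_command_file(commands, "commands.txt")
    print("\n".join(commands))
```

Generating the file programmatically keeps the per-job commands consistent when there are hundreds or thousands of inputs.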

Modules are a convenient and effective way to dynamically set up the environment for different applications.

A list of the scripting languages that are available, such as Perl, Python, PHP, and others.

A collection of utility programs is also available on the Biowulf cluster.