Biowulf at the NIH
Mach 1.0 on Biowulf

MACH 1.0 is a Markov chain based haplotyper. It can resolve long haplotypes or infer missing genotypes in samples of unrelated individuals.

Mach was developed by Goncalo Abecasis at the University of Michigan. [Mach website]

Small numbers of simultaneous Mach jobs (fewer than 3) are most easily run on Helix. Mach on Biowulf is intended for large numbers of simultaneous jobs, or for Mach jobs that will run for a long time.

Minimac is a low-memory, computationally efficient implementation of the MaCH algorithm for genotype imputation. It is related to Mach and is loaded as part of the Mach module. [Minimac webpage]

ChunkChromosome is a helper utility for minimac and MaCH. It can be used to facilitate analyses of very large datasets in overlapping slices. It is loaded as part of the Mach module. [ChunkChromosome webpage]

Setting up a swarm of Mach jobs

It is easiest to run a large number of simultaneous Mach jobs via swarm.

The Mach environment can be set up with the 'module load mach1' command. This will load the latest version:

[user@biowulf]$ module load mach1

[user@biowulf]$ module list
Currently Loaded Modulefiles:
  1) mach1/1.0.18

To load a specific version, use the modules commands to see available versions and load one, as in the example below:

[user@biowulf]$ module avail mach1

---------------- /usr/local/Modules/3.2.9/modulefiles --------------
mach1/1.0.12  mach1/1.0.17  mach1/1.0.18

[user@biowulf]$ module load mach1/1.0.17

[user@biowulf]$ module list
Currently Loaded Modulefiles:
  1) mach1/1.0.17

Create a swarm command file containing one Mach command per line, for example:

#------- this file is cmdfile ------------------
mach1 --datfile sample1.dat --pedfile sample1.ped
mach1 --datfile sample2.dat --pedfile sample2.ped
mach1 --datfile sample3.dat --pedfile sample3.ped
mach1 --datfile sample4.dat --pedfile sample4.ped
[...]
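
For many samples, the command file can be generated rather than typed by hand. The following is a minimal sketch, assuming each sample has a matching pair of files named <sample>.dat and <sample>.ped in one directory (a naming convention taken from the example above, not a Mach requirement). The function name make_cmdfile is hypothetical.

```shell
# Sketch: write one mach1 command per sample into a swarm command file.
# Assumption: inputs are named <sample>.dat and <sample>.ped in the same directory.
make_cmdfile() {
    dir="$1"    # directory holding the .dat/.ped pairs
    out="$2"    # swarm command file to write
    : > "$out"  # start with an empty command file
    for dat in "$dir"/*.dat; do
        [ -e "$dat" ] || continue          # no .dat files matched the pattern
        base="${dat%.dat}"                 # strip the .dat suffix
        if [ -f "$base.ped" ]; then
            echo "mach1 --datfile $base.dat --pedfile $base.ped" >> "$out"
        else
            echo "skipping $dat: no matching .ped file" >&2
        fi
    done
}

# Example call for the current directory:
#   make_cmdfile . cmdfile
```

Samples without a matching .ped file are reported on standard error and left out of the command file, so the resulting file should be reviewed before it is submitted.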

If each Mach process requires less than 1 GB of memory, submit this to the batch system with the command:

swarm -f cmdfile --module mach1/1.0.18

If each Mach process requires more than 1 GB of memory, use

swarm -g # -f cmdfile --module mach1/1.0.18
where '#' is the number of gigabytes of memory required by each Mach process.
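
As an illustration of the two cases above, the sketch below builds the swarm command line for a given per-process memory requirement. It uses only the swarm options shown in this section (-g, -f, --module). The function name swarm_cmd, the 4 GB figure, and the choice to treat exactly 1 GB like the default case are assumptions made for the example.

```shell
# Sketch: print the swarm command for a given memory requirement in whole GB.
# Assumption: a requirement of exactly 1 GB is treated like the default case.
swarm_cmd() {
    mem_gb="$1"    # memory needed by each Mach process, in whole gigabytes
    cmdfile="$2"   # swarm command file
    if [ "$mem_gb" -gt 1 ]; then
        echo "swarm -g $mem_gb -f $cmdfile --module mach1/1.0.18"
    else
        echo "swarm -f $cmdfile --module mach1/1.0.18"
    fi
}

# Each Mach process needs about 4 GB (an arbitrary example value).
swarm_cmd 4 cmdfile   # prints: swarm -g 4 -f cmdfile --module mach1/1.0.18
```

The function only prints the command so that it can be checked before being run on Biowulf.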

The swarm program will package the commands for best efficiency and send them to the batch system.

Documentation

MACH tutorial
Swarm documentation