BIONET.MOLBIO.GENE-LINKAGE
FREQUENTLY ASKED QUESTIONS (FAQ)
as of 4/29/96

Local copy of Dean Flanders' FAQ from lenti.med.umn.edu

Page Updated: Friday, 07-Jun-1996 09:15:41 EDT

1.0 - FAQ ADMINISTRATIVE INFORMATION

2.0 - INFORMATION RESOURCES

3.0 - GENE-LINKAGE SOFTWARE OVERVIEW

4.0 - LINKAGE PACKAGE SPECIFIC INFORMATION

5.0 - COMPUTER ADMINISTRATION AND OPTIMIZATION

6.0 - MOLECULAR BIOLOGY ISSUES IN LINKAGE ANALYSIS

1.0) FAQ ADMINISTRATIVE INFORMATION

1.1) Where can I obtain the bionet.molbio.gene-linkage FAQ?

It is available by anonymous FTP from lenti.med.umn.edu in /pub/linkage. The best way to view the FAQ is via the WWW, from http://lenti.med.umn.edu/linkage/linkage.html. The FAQ is also available via gopher at lenti.med.umn.edu in /Biologically Related Information/Linkage Analysis. The FAQ will also be posted to the USENET groups bionet.molbio.gene-linkage and news.answers on the 1st and 15th of each month.

1.2) Who created the bionet.molbio.gene-linkage FAQ?

Darrell Root (rootd@ohsu.edu) started the bionet.molbio.gene-linkage FAQ in May 1994 to share information and experiences that may be useful to other people involved in linkage analysis. I am Dean Flanders (dean@lenti.med.umn.edu), the current maintainer of the FAQ; I began my tenure in December 1994. The FAQ will never serve as a short course in linkage analysis; instead, it will ideally help beginners get started in the area and help experts avoid mistakes that others have already made. The information in this FAQ comes not only from Darrell and me but from a large number of people who work in linkage analysis. Their names are listed in section 1.4.

1.3) How can I help improve this FAQ?

Please send any information that you think would benefit people who are just beginning in linkage or who have been doing linkage for years to linkage@lenti.med.umn.edu. Also, if there is information you would like to see, or if you find errors in this FAQ, please let us know by sending email to linkage@lenti.med.umn.edu. If you would like something changed or added, please send it in a format that can be quickly incorporated into the FAQ, for example by correcting the errors in the relevant section of the FAQ and emailing it back to the FAQ maintainer.

1.4) Contributors to this FAQ.

David Adler, John Attwood, Michael Boehnke, Marcia Brott, Don Bowden, Michael Braverman, Lucien Bachner, Young B Choi, Kevin Crawford, Dave Curtis, Peter Doris, Bennett Dyke, David Featherstone, Dean Flanders, Jonathan Haines, Rob Harper, Pierre Janssens, David Kikuchi, Wentian Li, Tim Little, Tara Matise, Eli Meir, Mike Miller, Jurg Ott, Darrell Root, Alex Schaffer, Robert Stodola, Frank Visser, Dan Weeks, Ellen Wijsman, Scott Wildenberg, Matthias Wjst, and Kim Worley.

1.5) When was the FAQ last updated?

The FAQ was last updated on 1996/04/29. Each section should indicate the month and year in which it was last updated. In addition, a chronological list of updates, with direct links to the updated parts of the FAQ, is maintained at http://lenti.med.umn.edu/linkage/gefaqup.html.

2.0) INFORMATION RESOURCES

2.1) What anonymous-FTP sites have programs/utilities useful for linkage analysis?

At present there is no single site that serves as a repository for all linkage software. The best way to find FTP site information is therefore to read the software package descriptions below, which should provide all of the necessary FTP details.

2.2) What books are helpful when learning about linkage analysis?

2.3) What WWW sites have useful linkage information?

This is in no way an attempt to list the explosion of WWW sites of biological interest on the Internet, but it is a listing of some of the major ones and ones of particular interest in linkage analysis.

http://www.yahoo.com/Science/Biology/Genetics/, this is a list of sites related to genetics that is kept very up to date.

http://www.gdb.org/Dan/DOE/intro.html, this is a short course of sorts that gives some very basic information on how to go about gene mapping.

http://lenti.med.umn.edu/linkage/linkage.html, which serves as a linkage analysis home page, has links to all of the WWW sites listed here, as well as to gopher servers and a hypertext version of the FAQ.

http://www.genethon.fr, the Genethon Center, Genethon's home page.

http://www.chlc.org, the Cooperative Human Linkage Center, CHLC's home page.

http://gdbwww.gdb.org has a version of GDB available and access to OMIM.

http://www.pathology.washington.edu has human and mouse standard idiograms. The idiograms are useful for making illustrations for gene mapping and for constructing abnormal chromosomes. The PostScript idiograms can be manipulated band by band with illustration software such as Adobe Illustrator, Aldus FreeHand, Canvas, and Altsys Virtuoso.

http://www.gene.ucl.ac.uk/~john/programs.html contains software by John Attwood.

http://www.gene.ucl.ac.uk/packages/dcurtis/ contains software by Dave Curtis.

http://linkage.cpmc.columbia.edu has a lot of useful information on linkage analysis; in particular it offers information on software, the course offered by J. Ott, and the Linkage Newsletter.

2.4) What gopher sites have useful linkage information?

One gopher server, maintained at lenti.med.umn.edu, has links to other gophers of interest in linkage analysis as well as to gopher servers with other biologically related information. The path to it is Biologically Related Information/Genetic Linkage Analysis.

2.5) What "linkage centers" make information and assistance available to researchers?

One such center is the Cooperative Human Linkage Center (CHLC). Its goal is to generate a high-resolution map of the human genome and to distribute this information rapidly to the genome community. CHLC is identifying more human markers and developing high-resolution framework maps. Information about CHLC is available via gopher from gopher.chlc.org, from http://www.chlc.org and ftp://ftp.chlc.org, and by email from info-server@chlc.org or help@chlc.org. Among other things, CHLC provides primer selection and linkage analysis via email; information on those services can be obtained by sending email to primer-server@chlc.org and linkage-server@chlc.org.

David Featherston (davidf@caos.kun.nl) of the Dutch EMBnet Node is starting a linkage analysis service offering software availability and support/advice initially, possibly training, and perhaps consultancy. At present the service has MapMaker/EXP 3.0b, MapMaker/QTL 1.1, Lathrop and Lalouel's LINKAGE package, and Schaffer's FASTLINK package. Users with Genomics Package accounts at the CAOS/CAMM Center can therefore run these programs on the Center's fast computers to analyze their data sets. Please contact David Featherston if you are interested in more information about such an account.

A major European center is the Human Genome Mapping Project Resource Centre in Hinxton, England. It is funded by the Medical Research Council and has a broad range of software and databases available, mainly focused on the Human Genome Project. In the area of linkage analysis it has the following programs available: FASTLINK, CRIMAP, MAP, MAPMAKER, HOMOZ, PEDPACK, APM, SIMLINK, FASTMAP, COMDS, DOLINK & QDB, HANDLINK, GAS, and Jurg Ott's collection of programs. The aim is to make all major (Unix-based) gene linkage packages available to its users. The Centre also gives courses on linkage analysis. More information about the Centre can be obtained from its home page: http://www.hgmp.mrc.ac.uk/. If you want to register as a user, send email to admin@hgmp.mrc.ac.uk for a registration form. For more information about the gene-linkage services you can contact Frank Visser (fvisser@hgmp.mrc.ac.uk).

INFOBIOGEN: This is the French GDB node, which also offers a linkage server and assistance with linkage analysis. It uses LINKAGE, FASTLINK, and other programs running on a SPARCcenter 2000E with 1 GB of RAM, 4 GB of swap, and 6 CPUs. For further information contact Lucien Bachner at bachner@infobiogen.fr or look at the web site http://www.infobiogen.fr/.

2.6) What journals are useful for linkage analysis?

American Journal of Human Genetics, Annals of Human Genetics, Computer Applications in Biosciences (CABIOS), Genomics, Genetic Epidemiology, Human Genome News (available by gopher from gopher.gdb.org), Human Genome Project Journal, Human Heredity, Journal of Computational Biology, Nature Genetics.

2.7) What courses are offered on linkage analysis?

There are three primary courses on human linkage analysis offered during the year. One is a four-day course offered once per year by Drs. Margaret Pericak-Vance and Jonathan Haines. The next course will be offered in late April 1996 in Boston. The course focuses on the overall design of a human disease gene mapping study, with particular emphasis on the problems of common/complex disorders. It covers clinical classification; pedigree ascertainment, collection, and follow-up; basic linkage techniques; linkage and association analysis for complex disorders; laboratory techniques for genotyping; and gene characterization. The course emphasizes the global decision-making process rather than the details of specific techniques. For more information write to Genetic Methods Course; c/o Dr. Margaret Pericak-Vance; Division of Neurology, Box 2900; Duke University Medical Center; Durham, NC 27710, or send email to genclass@genemap.mc.duke.edu. The remaining two courses are both offered by Jurg Ott on the software used for human linkage. One is a beginner's course, and the other is an advanced course for those familiar with the linkage analysis software. These courses are offered several times throughout the year; for more information contact Katherine Montague/Jurg Ott; Columbia University, Unit 58; 722 West 168th Street; New York, NY 10032. You can also fax (212) 568-2750, call (212) 960-2507, or email km165@columbia.edu for more information.

A new beginner's-level linkage course will be offered in French on October 24-25, 1995 by INFOBIOGEN in Villejuif, a southern suburb of Paris. It is free for all academic institutions. For further information contact Lucien Bachner at bachner@infobiogen.fr or linkage@infobiogen.fr.

3.0) GENE-LINKAGE SOFTWARE OVERVIEW

3.1) What database management programs do people use for linkage data?

Note that some pedigree drawing programs also serve as databases in addition to drawing pedigrees; see the next question in the FAQ for descriptions of those packages.

CEPH DBMS: The CEPH DataBase Management System is specifically designed for chromosome mapping with CEPH style pedigrees. It can output data in ped.out format for the LINKAGE package. This program can now be picked up via anonymous FTP from ftp.cephb.fr in pub/ceph_genotype_db.

DOLINK: This custom database program by D. Curtis manages genetic data and sets up input files for linkage analysis. DOS and Windows versions are available from ftp.gene.ucl.ac.uk, together with the C++ source, which allows compilation on a Unix host running X and possibly on a Macintosh.

File Express: This is a DOS shareware database which can be used to hold data for DOLINK (largely superseded by QDB). It is available as fe51-a/b/c.zip via FTP from ftp.gene.ucl.ac.uk in /pub/packages/dcurtis.

LABMAN and LINKMAN: These are linkage analysis databases for holding linkage data and exporting it in various formats for linkage analysis. They are available via anonymous FTP from lenti.med.umn.edu in /pub/linkage/labman. These databases were developed by P. Adams of Columbia University.

LINKSYS: This custom database program was written by J. Attwood and S. Bryant. Although they continue to use it, J. Attwood suggests using DOLINK instead. LINKSYS is not currently available at any FTP site.

Map Manager: This is a Macintosh program that helps analyze the results of genetic mapping experiments using backcrosses, intercrosses, or recombinant inbred strains. It also has tools for the statistical analysis of experiments. The program was created by K. F. Manly at the Roswell Cancer Institute and is available via FTP from mcbio.med.buffalo.edu in /pub/MapMgr.

QDB: This is a database program available as DOS and Windows versions and with C++ source allowing compilation for X and possibly Macintosh. It is available as qdb16a.zip via FTP from ftp.gene.ucl.ac.uk in /pub/packages/dcurtis.

3.2) What programs are available for pedigree drawing?

One of the tricks of managing individuals in a mapping study is getting the database you use to export your family data in a format that pedigree drawing programs accept as input. Linking the two in this way can be of great assistance. However, some pedigree drawing programs include a database as part of the package.

CYRILLIC: This is a pedigree editor for Windows with facilities for including marker data, from which it can produce input files for LINKAGE. Because it is Windows-based, entering a pedigree is very efficient. Each individual also has an associated data form where you can store names and other pertinent data. It can also interface with most standard PC databases. This program is not in the public domain and is available from Cherwell Scientific Publishing. For more information send email to csp@sable.ox.ac.uk; they will be happy to send you a demo of the program. Version 2 of Cyrillic is expected in late summer of 1995.

FTREE: This is a DOS pedigree program written by R. Go at the University of Alabama.

GENETREE: GeneTree 1.0 is a DOS package which provides a convenient way to draw family tree diagrams suitable for genetics or genealogy. The package consists of the GeneTree program, which draws pedigree diagrams using a command language, and SC, a menu-driven program that facilitates creation of GeneTree commands. GeneTree and SC are made available with program manuals, examples of family tree diagrams, and a GeneTree Quick Reference Guide. GeneTree is written in C. Note that it is a DRAWING program and does not compute genetic parameters. The GeneTree program is available from wijsman@max.u.washington.edu at a price of $125 (because of licensing fees from a private company which wrote one of the drivers used in the program).

KINDRED: This new DOS database program, distributed by Epicenter Software, is specifically designed for linkage analysis. A free demo is available by calling (818)-304-9487. In addition to database duties, this program will draw pedigrees, haplotype marker data, and can output data in LINKAGE format.

PEDPAK: This package is designed to handle large datasets for animals. The package was written and distributed by Alan Thomas, who is in Bath, England. The software is not public domain and must be purchased.

Pedigree/Draw: This is a Macintosh-based program written by B. Dyke, P. Mamelka, and J. MacCluer. It is available from bdyke@darwin.sfbr.org or Pedigree/Draw; Department of Genetics; Southwest Foundation for Biomedical Research; P.O. Box 28147; San Antonio, TX 78228-0147. The current version is 4.4; an upgrade from a previous version costs $10. Documentation costs $10 printed, and the full package including documentation costs $45. A script which converts LINKAGE format to Pedigree/Draw format is available via anonymous FTP at ftp.ee.pdx.edu in /pub/users/cat/rootd/convert.new.

PEDRAW: This is a pedigree drawing program for DOS written by D. Curtis and available via FTP from ftp.gene.ucl.ac.uk in /pub/packages/dcurtis. The most current version is called pedraw16.zip. A companion program, PEDHELP, provides pop-up help for PEDRAW.

PAP: The Pedigree Analysis Package (PAP) is a set of FORTRAN 77 programs for computing likelihoods and simulating phenotypes of genetic models on pedigrees. It is available via gopher from corona.med.utah.edu in Publicly Accessible Software, probes(sts), etc./software/pap.

3.3) What linkage analysis helper programs are available?

CEPH2CRI: This program converts the output from the CEPH DBMS into the format usable in CRI-MAP. It can be found at ftp.gene.ucl.ac.uk in /pub/packages/linkage_utils.

EASISTAT: This DOS statistics package contains EASIGRAF, which draws graphs of lod scores from the output of FASTMAP. The lod scores first need to be run through the TABLE utility, which is included in the DOLINK and FASTMAP packages. It is available as estat21.zip via anonymous FTP from ftp.gene.ucl.ac.uk in /pub/packages/dcurtis.

FIRSTORD: A demonstration of a method for preliminary ordering of loci based on two-point lod scores. It is available as a DOS executable and C source, called first11.zip, from ftp.gene.ucl.ac.uk in /pub/packages/dcurtis.

LINKMEND: A program for converting LINKAGE-format files to MENDEL-format files. It is available by anonymous FTP from watson.hgen.pitt.edu as linkmend.tar.Z.

MAP: A program to convert LINKMAP output into a table of multipoint lod scores. It is available by anonymous FTP from watson.hgen.pitt.edu as map.tar.Z.

PEDPREP: A program for converting a MENDEL-format pedigree file ('pedm.dat') to a Pedigree/Draw file for graphical display on a Macintosh. It is available by anonymous FTP from watson.hgen.pitt.edu as pedprep.tar.Z.

RECODE: A program for recoding character or sized-allele data into numbered-allele data. It is available by anonymous FTP from watson.hgen.pitt.edu as recode.tar.Z.
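The FAQ does not describe RECODE's actual rules, so the following is only a minimal Python sketch of the general idea of turning sized alleles into numbered alleles. It assumes that distinct fragment sizes are numbered 1, 2, 3, ... from smallest to largest and that 0 marks an untyped allele; the function name recode_genotypes is hypothetical and is not taken from RECODE.

```python
from typing import Dict, List, Tuple

Genotype = Tuple[int, int]  # two allele values; 0 means untyped (assumed convention)


def recode_genotypes(genotypes: List[Genotype]) -> Tuple[List[Genotype], Dict[int, int]]:
    """Recode sized alleles (e.g. fragment lengths in bp) as allele numbers 1..k.

    Sizes are numbered in ascending order; this ordering is an assumption,
    not necessarily the convention RECODE uses.
    """
    observed = sorted({a for g in genotypes for a in g if a != 0})
    allele_map = {size: number for number, size in enumerate(observed, start=1)}
    # .get(a, 0) leaves untyped alleles (0) as 0, since 0 is never a key
    recoded = [(allele_map.get(a1, 0), allele_map.get(a2, 0)) for a1, a2 in genotypes]
    return recoded, allele_map


if __name__ == "__main__":
    sized = [(152, 156), (156, 160), (0, 0), (152, 152)]
    numbered, mapping = recode_genotypes(sized)
    print(mapping)   # {152: 1, 156: 2, 160: 3}
    print(numbered)  # [(1, 2), (2, 3), (0, 0), (1, 1)]
```

Keeping the returned mapping alongside the recoded file makes it possible to translate allele numbers back to sizes later.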

3.4) Why are some programs used primarily for human chromosome mapping, while others are used for human disease mapping?

Any family can be used for chromosome mapping, so CEPH has picked a particular family "shape" and generated a large database with these families. Programs designed for chromosome mapping can be optimized for using these families, reducing the time needed for calculations. Only families afflicted with a disease can be used for disease gene mapping. As a result, programs designed for disease gene mapping need to be able to deal with arbitrary pedigrees. In addition, these programs need to be able to handle incomplete penetrance.

3.5) What programs are used for physical mapping?

CLINKAGE: This is the special version of the LINKAGE programs for 3-generation CEPH pedigrees and codominant markers. The PC and VAX versions are available by FTP from linkage.cpmc.columbia.edu. The Unix version is available from corona.med.utah.edu.

CHROMLOOK: This is a program for generating haplotypes of marker data in nuclear pedigrees with all individuals genotyped. It identifies both maternal and paternal recombination events and provides the resulting haplotypes and recombinants in an easy-to-read format. It should be available via an FTP server sometime this summer. It was written by Jonathan Haines, who can be contacted at haines@helix.mgh.harvard.edu.

CINTMAX: This program is an extensively modified version of CILINK. It uses map functions to model the transmission of gametes from parent to child. Some of these map functions are multilocus feasible, and so can be used with more than 3 loci at a time. It is available by anonymous FTP from watson.hgen.pitt.edu as cintmax.tar.Z.
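The FAQ does not say which map functions CINTMAX uses. Purely as an illustration of what a map function does, here is a Python sketch of two standard ones, Haldane's (which assumes no interference) and Kosambi's, converting a map distance d in Morgans to a recombination fraction r and back. These are not claimed to be CINTMAX's functions.

```python
import math


def haldane_r(d: float) -> float:
    """Haldane map function: recombination fraction for map distance d (Morgans)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def haldane_d(r: float) -> float:
    """Inverse Haldane: map distance (Morgans) for 0 <= r < 0.5."""
    return -0.5 * math.log(1.0 - 2.0 * r)


def kosambi_r(d: float) -> float:
    """Kosambi map function: recombination fraction for map distance d (Morgans)."""
    return 0.5 * math.tanh(2.0 * d)


def kosambi_d(r: float) -> float:
    """Inverse Kosambi: map distance (Morgans) for 0 <= r < 0.5."""
    return 0.25 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


if __name__ == "__main__":
    d = 0.10  # 10 cM
    print(f"Haldane r = {haldane_r(d):.4f}")  # 0.0906
    print(f"Kosambi r = {kosambi_r(d):.4f}")  # 0.0987
```

Both functions give r close to d for small distances and approach 0.5 for large ones; they differ in how quickly they do so.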

CRI-MAP: This program has been used for chromosome mapping for years. It has options which can generate maps, calculate order probabilities, and print out recombination data. It works on .gen files with data from CEPH-style families. It is written in K&R-style C, and the author, Phil Green, has successfully run it on Unix, DOS, VMS, and Macintosh systems. It is not available via anonymous FTP. Phil Green distributes CRI-MAP freely ONLY to academics/academic institutions. Contact him at: Phil Green; Molecular Biotechnology Dept., FJ-20; Fluke Hall on Mason Rd.; Univ. of Washington; Seattle, WA 98195; USA; Phone (206) 685-4341; Fax (206) 685-7344; or email phg@u.washington.edu.

FASTMAP: This program produces a quick approximation to multipoint lod scores. It is available as a DOS executable and C source as fstmap11.zip from ftp.gene.ucl.ac.uk in /pub/packages/dcurtis.

MULTIMAP: This LISP-based expert system uses a customized version of CRI-MAP to create a chromosome map. It is available via anonymous FTP from chimera.gene.cwru.edu. The authors T. Matise, M. Perlin, and A. Chakravarti continue to improve the code, add new functions, and provide excellent support. When used with the CRI-MAP chrompic option (to find double recombinants that identify possible errors), it is incredibly useful. It is Unix-only (supported for DEC Ultrix, HP 9000, and Suns). The customized CRI-MAP version (called LISPCRI) is distributed at the FTP site, but was not meant to be used independently of MULTIMAP.

MAPMAKER: Dr. Eric Lander; Whitehead Institute; 9 Cambridge Center; Cambridge, MA 02142; mapm%mitwibr@mitvma.mit.edu. MAPMAKER is available via FTP at genome.wi.mit.edu in /pub/mapmaker3.

RHMAP: It is a set of three FORTRAN 77 programs that provide the means for a complete statistical analysis of RH mapping data. RH2PT is a program for data description and two-point analysis. It provides estimates of locus-specific retention probabilities and pairwise breakage probabilities, two-point lod scores for linkage of the various marker pairs, and linkage groups. RHMAP is now also available at the following URL http://www.sph.umich.edu/group/statgen/software. If you would like email notification of updates please send email to boehnke@umich.edu.

3.6) What programs are used for disease gene mapping?

APM: The Affected Pedigree Member Method distribution contains the new APM programs, a new file conversion utility, and a histogram/statistics generator. To build the entire distribution, you need C, Pascal, and FORTRAN compilers, and a make utility is also helpful. The programs which are built include: APM, a program to calculate the single locus statistic over one or several marker loci; SIM, a program to simulate pedigrees and, using output files of APM, test for asymptotic normality of the null distribution; APMMULT, a program to generate the multilocus statistic; SIMMULT, a program like SIM but which simulates recombination and uses the output of APMMULT; CHAPM, a program to convert LINKAGE files to APM files, or APM files of one format to APM files of another format; and HIST, a program to compute various statistical figures, plot a histogram, and compute empirical p-values. The APMember package by D. Weeks is available via anonymous FTP from watson.hgen.pitt.edu. Pre-compiled executables of the APM programs for SunOS and Solaris are also available there as newapm.sunos.tar.Z and newapm.solaris.tar.Z.

CLUMP: A Monte Carlo method for assessing significance of a case-control association study with a multi-allelic marker, available as DOS executable and C source. It is available as clump.zip via anonymous FTP from ftp.gene.ucl.ac.uk in /pub/packages/dcurtis.
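To show what "Monte Carlo significance" means for a 2 x K case-control table of allele counts, here is a minimal Python sketch. It is not CLUMP's code and shows only a plain Pearson chi-square on the table as given; the function names are hypothetical. The case/control labels of individual allele copies are shuffled, which keeps both the row and column totals fixed, and the p-value is the proportion of simulated tables whose statistic is at least as large as the observed one (counting the observed table itself).

```python
import random
from typing import List, Tuple


def chi_square_2xk(cases: List[int], controls: List[int]) -> float:
    """Pearson chi-square for a 2 x K table of allele counts.

    Columns with no observations are skipped; both rows must be non-empty.
    """
    n_case, n_ctrl = sum(cases), sum(controls)
    n = n_case + n_ctrl
    stat = 0.0
    for a, b in zip(cases, controls):
        col = a + b
        if col == 0:
            continue
        for obs, row in ((a, n_case), (b, n_ctrl)):
            expected = row * col / n
            stat += (obs - expected) ** 2 / expected
    return stat


def monte_carlo_p(cases: List[int], controls: List[int],
                  n_sims: int = 10000, seed: int = 1) -> Tuple[float, float]:
    """Observed chi-square and its Monte Carlo p-value under random labelling."""
    rng = random.Random(seed)
    observed = chi_square_2xk(cases, controls)
    k = len(cases)
    # one entry per allele copy, holding the allele's column index
    pool = [j for j, (a, b) in enumerate(zip(cases, controls)) for _ in range(a + b)]
    n_case = sum(cases)
    hits = 0
    for _ in range(n_sims):
        rng.shuffle(pool)
        sim_cases, sim_controls = [0] * k, [0] * k
        for i, allele in enumerate(pool):
            # the first n_case shuffled copies are relabelled as cases
            if i < n_case:
                sim_cases[allele] += 1
            else:
                sim_controls[allele] += 1
        # small tolerance so ties are not lost to floating-point noise
        if chi_square_2xk(sim_cases, sim_controls) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_sims + 1)


if __name__ == "__main__":
    stat, p = monte_carlo_p([30, 5, 5], [5, 30, 5], n_sims=2000)
    print(f"chi-square = {stat:.2f}, Monte Carlo p = {p:.4f}")
```

The Monte Carlo approach avoids relying on the asymptotic chi-square distribution, which can be unreliable when a multi-allelic marker has many rare alleles and the table is sparse.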

ESPA: This is a program used for extended sib pair analysis. It comes in a DOS version and can only look at markers containing 5 alleles. It was written by Lodewijk Sandkuijl and can be obtained by writing to him at Voorstraat 27; Delft 2611 JK; THE NETHERLANDS.

ERPA: A program for carrying out nonparametric linkage analysis, available as a DOS executable and C source. It is available as erpa12.zip via anonymous FTP at ftp.gene.ucl.ac.uk in /pub/packages/dcurtis.

FASTLINK: This is a much faster implementation of the main programs in LINKAGE (LODSCORE, ILINK, MLINK, LINKMAP) in C. The code is faster due to the use of new and better algorithms for the time intensive parts of the computation. FASTLINK is distributed by A. A. Schaffer from the FTP site softlib.cs.rice.edu (cd pub/fastlink). Version 1 of FASTLINK was instigated by R. W. Cottingham Jr. with implementation done by R. M. Idury and A. A. Schaffer. Version 2 of FASTLINK includes further improvements implemented by A. A. Schaffer, S. K. Gupta, and K. Shriram, with guidance from R. W. Cottingham Jr. Version 2 includes the capability to recover gracefully from a crash of the computer on which FASTLINK is running. FASTLINK was initially intended for UNIX machines, but the distribution now includes instructions for porting to VMS as well as a version for DOS. FASTLINK allows you to compile in "fast" or "slow" mode (the slow version of FASTLINK is still much faster than the old LINKAGE programs). The "fast" version uses lots of memory, but uses the extra memory to contain some of the intermediate results which are repetitively recalculated in the "slow" version (and the old linkage package). Best speed can be obtained by setting up 300 megs of virtual memory on a Unix workstation and using the "fast" version. Schaffer maintains a mailing list of fastlink users (fastlink-list@cs.rice.edu) to answer queries and keep users up to date. Schaffer, Gupta, and other colleagues at Rice University have implemented parallel versions of FASTLINK for either a shared-memory multiprocessor or a network of UNIX workstations. This version is now available as FASTLINK 2.3P at the above mentioned FTP site. Write to schaffer@cs.rice.edu for more information.

GAS: It provides facilities for reading, writing, sectioning and performing statistical analyses on phenotypic and genotypic data and one of its features is sib pair analysis. It has been developed within the Department of Medicine at Oxford University and is available via FTP from well.ox.ac.uk in the directory pub/genetics/gas.

GREGOR: It is a piece of DOS based software for producing simulated genetic data. It does not perform linkage analysis, but it may be useful for testing methods or assumptions about linkage analysis. GREGOR is operated by a series of hierarchical menus that permit the user to define hypothetical genetic scenarios (gene positions and effects) and produce simulated data-sets for a variety of population structures. GREGOR is available by FTP from the site sifon.cc.mcgill.ca in pub/McGill-Contrib. Questions should be directed to the authors tinker@agradm.lan.mcgill.ca or mather@agradm.lan.mcgill.ca.

LINKAGE: This package of programs was developed by M. Lathrop with help from J. M. Lalouel, C. Julier, and J. Ott. The LINKAGE package consists of several analysis programs and several utility programs. Versions are available for DOS, OS/2, VAX, and Unix platforms. The analysis programs include MLINK, which calculates two-point lod scores at fixed recombination fractions; LINKMAP, which calculates multipoint lod scores at fixed distances; and ILINK, which estimates the recombination fraction giving the highest lod score. Unix versions are available via gopher from corona.med.utah.edu in Publicly Accessible Software, probes(sts), etc./software/linkage; DOS and VMS versions are available from linkage.cpmc.columbia.edu, or on floppy disks by writing to: Katherine Montague/Jurg Ott; Columbia University, Unit 58; 722 West 168th Street; New York, NY 10032. Send pre-formatted DOS disks if you request LINKAGE by mail. You can send email to km165@columbia.edu if you need more information regarding mail requests for the LINKAGE package.
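To make the MLINK/ILINK distinction concrete, here is a minimal Python sketch of a two-point lod score for the simplest possible case only: r recombinant and n non-recombinant meioses that are phase-known and fully informative. The LINKAGE programs compute likelihoods on general pedigrees, so this is not their algorithm; it only shows the quantity being tabulated at fixed recombination fractions (as MLINK does) and maximized (as ILINK does):

Z(theta) = log10[ theta^r (1 - theta)^n / (1/2)^(r+n) ], maximized at theta = r/(r+n), capped at 0.5.

```python
import math


def lod(theta: float, r: int, n: int) -> float:
    """Two-point lod score Z(theta) for r recombinant and n non-recombinant
    phase-known meioses, with 0 <= theta <= 0.5."""
    if r > 0 and theta == 0.0:
        return float("-inf")  # any recombinant is impossible at theta = 0
    log_like = (r * math.log10(theta) if r else 0.0) + n * math.log10(1.0 - theta)
    # subtract the log-likelihood under no linkage (theta = 0.5)
    return log_like - (r + n) * math.log10(0.5)


def theta_hat(r: int, n: int) -> float:
    """Maximum-likelihood recombination fraction, restricted to [0, 0.5]."""
    return min(r / (r + n), 0.5)


if __name__ == "__main__":
    r, n = 2, 18
    for theta in (0.0, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4):  # fixed values, MLINK-style
        print(f"theta={theta:.2f}  lod={lod(theta, r, n):.3f}")
    best = theta_hat(r, n)  # ILINK-style estimate
    print(f"theta_hat={best:.2f}  max lod={lod(best, r, n):.3f}")  # 0.10, 3.197
```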

LIPED: This DOS program written by J. Ott calculates probabilities for linkage between disease markers and genetic markers. Its input file differentiates between phenotypes and genotypes. As a result, this program is easiest to use when your data is from "old-style" genetic-markers (such as blood phenotype data). This was one of the first programs to do linkage analysis calculations, the LINKAGE package is more commonly used now.

SAGE: Statistical Analysis Package for Genetic Epidemiology is composed of 18 programs: AGEON: Estimating the Distribution of Age-of-Onset, ASSOC: marker-trait Associations in Pedigree Data, BCROSS: Genetic Hypothesis for Quantitative Data on Inbred strains, their F1 and Backcross(es), CLUSTR: Power Transformation to Obtain Normality and Homoscedasticity from Clustered Data, FCOR: Family Correlations, FSP: Family Structure Program, LODLINK: Lod Score Linkage Analysis, MAPLOC: Mapping a Disease Related Trait Relative to a Set of Linked markers, MAXFUN: Function maximization Subroutine, REGC,REGD,REGTL,REGTN: Segregation Analysis Programs, RELATE: Relationship to Proband, SIBPAL: Sib-Pair Linkage Analysis, and DBSORT, RENUM, SPLIT: Toolkit Programs. Author Dr. R.C. Elston, address Department of Biometry and Genetics; Louisiana State University Medical Center; 1901 Perdido Street; New Orleans, Louisiana 70112, USA. The email contact address is sage@haldne.biogen.lsumc.edu. It is available for the following operating systems: VAX, SunOS 4.1.x, Apple Macintosh II, and DOS. This package is not shareware and must be purchased.

X-LINKED APM: X-linked version of the APM programs (single-marker), see APM above for more information on APM. It is available by anonymous FTP from watson.hgen.pitt.edu as xlinkapm.tar.Z. Also, xlinkapm.readm is available there, which is a readme about the X-linked version of the APM programs.

3.7) What programs are available for running linkage simulations?

FASTSLINK: This program is just like SLINK (see SLINK below), but it uses the enhancements incorporated into FASTLINK. It is available via anonymous FTP from watson.hgen.pitt.edu.

SIMAPM: This is the SLINK-based simulation program for the APM package. It is a hacked-together package which only runs under Unix. You will need FORTRAN, Pascal, and C compilers to use it. It is available via anonymous FTP from watson.hgen.pitt.edu.

SIMLINK: This FORTRAN program developed by L. Ploughman and M. Boehnke simulates linkage analysis on a family and estimates the probability, or power, of detecting linkage in that family. It allows the researcher to determine whether a family is informative enough to detect linkage. SIMLINK requires large quantities of memory. It was written for DOS, but has been ported to many platforms. It is available from: Michael Boehnke; Department of Biostatistics; School of Public Health; University of Michigan; Ann Arbor, MI 48109-2029. No postage money or blank disks are needed to have SIMLINK sent to you. SIMLINK may be available via anonymous FTP soon. For further information send email to boehnke@umich.edu. SIMLINK is now also available at the following URL http://www.sph.umich.edu/group/statgen/software. If you would like email notification of updates please send email to boehnke@umich.edu.

SLINK: It is a Pascal program developed by D. Weeks, M. Lathrop, and J. Ott. It is similar to SIMLINK. It is more general than SIMLINK in that it allows for partial marker typing at the locus to be generated, but it runs slower than SIMLINK. It is available from linkage.cpmc.columbia.edu and watson.hgen.pitt.edu or on floppies (use the same address as for LINKAGE).

3.8) What programs are available to help detect errors in linkage data?

The linkage packages themselves will usually detect obvious errors in linkage data, such as impossible phenotypes and genotypes and obvious errors in pedigrees. Typically the programs simply grind to a halt, letting you fix the error and try again until the run succeeds. However, errors that "make sense" to linkage programs will not be detected.

GENO: This is a genotype entry/edit tool that allows you to easily enter and manipulate genotyping data. You can also check the quality of your data with the built-in Mendelian inheritance checker. The author of the program is Matt Stephenson, who can be reached at stephenm@bioimage.mfldclin.edu. The program is available via FTP from dgabby.mfldclin.edu in /pub/geno.
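GENO's own checker is not described in this FAQ. The following Python sketch only illustrates the basic idea of a Mendelian inheritance check for one father-mother-child trio at a codominant locus, assuming genotypes are pairs of allele numbers with 0 meaning untyped; the function names are hypothetical. A trio check like this cannot catch inconsistencies that only appear when several relatives are considered together, for example five or more distinct alleles among the children of two untyped parents.

```python
from typing import Tuple

Genotype = Tuple[int, int]  # two allele numbers; 0 means untyped (assumed convention)


def is_typed(g: Genotype) -> bool:
    return g[0] != 0 and g[1] != 0


def can_transmit(parent: Genotype, allele: int) -> bool:
    """An untyped parent could have transmitted any allele."""
    return not is_typed(parent) or allele in parent


def trio_consistent(father: Genotype, mother: Genotype, child: Genotype) -> bool:
    """True if the child's genotype can be formed from one paternal and one maternal allele."""
    if not is_typed(child):
        return True
    c1, c2 = child
    return (can_transmit(father, c1) and can_transmit(mother, c2)) or (
        can_transmit(father, c2) and can_transmit(mother, c1)
    )


if __name__ == "__main__":
    print(trio_consistent((1, 2), (3, 4), (1, 3)))  # True
    print(trio_consistent((1, 2), (3, 4), (1, 2)))  # False: the mother carries neither 1 nor 2
    print(trio_consistent((0, 0), (3, 4), (1, 3)))  # True: father untyped
```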

GENOCHECK: This is an error-checking program designed to identify individuals and loci that are likely to contain errors. The statistical method was designed to identify typing errors, but is general enough to pinpoint any unlikely genotype that is still consistent with Mendelian inheritance. The author is Dr. Margaret Gelder Ehm. The program is available by FTP from softlib.cs.rice.edu in /pub/GenoCheck and is written for Unix.

3.9) What programs help me recode genetic markers?

DOLINK can downcode alleles automatically, although its main use is to prepare files for LINKAGE from a database. In addition, P. Adams's packages LABMAN and LINKMAN have features for recoding alleles.

4.0) LINKAGE PACKAGE SPECIFIC INFORMATION

4.1) How do I get my CEPH data into CRI-MAP format?

You can output the file in LINKAGE format and use link2gen in CRI-MAP. The disadvantage is that your marker names are separated from your data, so it is easy to get them mixed up. Alternatively, you can output the file in ped.out format and use CEPH2CRI, mentioned earlier in the FAQ, to do the conversion.

4.2) How do you calculate MAXHAP?

MAXHAP is the maximum possible number of haplotypes in your analysis. To compute it, multiply together the number of alleles at each locus used in a particular run: not all loci in your dataset, just the loci used in that particular calculation. Remember that the affection status locus counts as two alleles, regardless of the number of liability classes. For example, suppose a dataset has an affection status locus with liability classes, marker A with 3 alleles, marker B with 4 alleles, and marker C with 5 alleles. If your run is a LINKMAP run on affection status, marker A, and marker B, then your MAXHAP must be at least 2*3*4=24.
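The rule above is just a product over the loci in the run. A minimal Python sketch using the hypothetical dataset from the example (the dictionary layout and function name are illustrative, not part of LINKAGE or FASTLINK):

```python
from math import prod
from typing import Dict, Iterable

# Hypothetical dataset from the example; the affection locus always counts
# as two alleles, no matter how many liability classes it has.
ALLELE_COUNTS: Dict[str, int] = {"affection": 2, "A": 3, "B": 4, "C": 5}


def maxhap(allele_counts: Dict[str, int], loci_in_run: Iterable[str]) -> int:
    """Product of allele counts over only the loci used in this run."""
    return prod(allele_counts[locus] for locus in loci_in_run)


print(maxhap(ALLELE_COUNTS, ["affection", "A", "B"]))  # 24
```

Note that marker C does not enter the product because it is not part of this run; including it would inflate MAXHAP to 120.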

FASTLINK 2.3P includes an auxiliary program called ofm (optimize for maxhap), which can automatically recompile the desired program with the ideal value of maxhap under the following assumptions: you are using UNIX or VMS (not DOS); you are running ILINK, LINKMAP, or MLINK (not LODSCORE); the main script is produced by the LINKAGE auxiliary program LCP; and the locus file is produced by the LINKAGE auxiliary program PREPLINK. See README.ofm in the FASTLINK distribution.

4.3) When should you use binary coding instead of numeric allele coding?

Usually there is no advantage to coding disease loci in binary notation rather than in numeric notation with liability classes, and binary coding is generally harder to work with because people find it unintuitive. Some codominant phenotypes lend themselves to binary coding; for example, ABO blood types: A (101), B (011), O (001), AB (111), and unknown (000). Since you cannot distinguish AO from AA at the phenotype level, you code both genotypes as (101), presence of A and O. In reality O represents the absence of both A and B; however, do not code it as (000), since that would mean unknown. Use of binary codes has decreased since DNA markers came into use, because DNA markers allow one to type an individual's genotype directly. You can use binary codes if you have phenotypic data that does not allow exact discrimination of the underlying genotype, coding the presence (1) or absence (0) of factors such as the A and B antigens. Binary codes can represent loci with codominant or dominant modes of inheritance, while allele number notation is suitable only for codominant loci. Few people use binary factor notation; they either use allele numbers for codominant loci or affection status notation for dominant loci. The main reason binary factor notation is still used is that CEPH's database is in that notation.
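The ABO codes above follow from one rule: each allele carries a set of factors, and a phenotype records which factors are present, i.e. the bitwise OR of the two alleles' factor vectors. Because every allele carries the third factor, no typed individual is ever (000). A minimal Python sketch, assuming the factor order (A, B, O) used above; the names are illustrative:

```python
from itertools import combinations_with_replacement

# Factor vectors in the order (A, B, O); every allele carries the third factor.
ALLELE_FACTORS = {"A": (1, 0, 1), "B": (0, 1, 1), "O": (0, 0, 1)}


def phenotype(allele1: str, allele2: str) -> str:
    """Binary factor phenotype: a factor is present if either allele carries it."""
    f1, f2 = ALLELE_FACTORS[allele1], ALLELE_FACTORS[allele2]
    return "".join(str(x | y) for x, y in zip(f1, f2))


for a1, a2 in combinations_with_replacement("ABO", 2):
    print(a1 + a2, phenotype(a1, a2))
```

This prints AA 101, AB 111, AO 101, BB 011, BO 011, and OO 001: AA and AO collapse to the same code, as do BB and BO, which is exactly the ambiguity binary notation is meant to represent.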

4.4) What do you do when allele frequencies do not add up to 1, for example, when some alleles are not present in the pedigree under study?

The best approach is to specify n+1 alleles, where n is the number of alleles actually observed in the pedigree. Use the correct population frequencies for the n observed alleles, and for the (n+1)th allele use 1 minus the sum of the frequencies of the observed alleles.
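A minimal Python sketch of the n+1 rule (the function name is hypothetical); it refuses input whose observed frequencies already exceed 1 and clamps tiny negative remainders caused by floating-point rounding:

```python
from typing import List


def add_unobserved_allele(observed_freqs: List[float]) -> List[float]:
    """Append an (n+1)th allele carrying 1 minus the sum of the observed frequencies."""
    total = sum(observed_freqs)
    if total > 1.0 + 1e-9:
        raise ValueError("observed allele frequencies sum to more than 1")
    # max() guards against a tiny negative value from floating-point rounding.
    return observed_freqs + [max(0.0, 1.0 - total)]


print(add_unobserved_allele([0.25, 0.25, 0.125]))  # [0.25, 0.25, 0.125, 0.375]
```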

4.5) I use LINKAGE and/or FASTLINK, what references should I cite in my papers?

FASTLINK users should cite:

Cottingham, R. W. Jr., Idury, R. M., and Schaffer, A. A. "Faster Sequential Genetic Linkage Computations." American Journal of Human Genetics. 53:252-263, 1993.

Schaffer, A. A., Gupta, S. K., Shriram, K., and Cottingham, R. W. Jr. "Avoiding Recomputation in Linkage Analysis." Human Heredity. 44:225-237, 1994.

In addition, all FASTLINK and LINKAGE users should also cite the LINKAGE papers:

Lathrop, G. M., Lalouel, J. M., Julier, C., and Ott, J. "Strategies for Multilocus Linkage Analysis in Humans." Proceedings of the National Academy of Sciences USA. 81:3443-3446, 1984.

Lathrop, G. M., and Lalouel, J. M. "Easy Calculations of LOD Scores and Genetic Risks on Small Computers." American Journal of Human Genetics. 36:460-465, 1984.

Lathrop, G. M., Lalouel, J. M., and White, R. L. "Construction of Human Linkage Maps: Likelihood Calculations for Multilocus Linkage Analysis." Genetic Epidemiology. 3:39-52, 1986.

4.6) What is recoding of alleles all about anyway?

One of the problems with highly polymorphic markers is that their large number of alleles can increase the computational requirements by several orders of magnitude. This can put some LOD score computations out of reach of DOS computers and make them take many days on higher-end systems. It is therefore important to use methods that reduce the number of alleles in your calculations, and recoding does exactly that.

The method of recoding alleles described by J. Ott in the Annals of Human Genetics, 42:255-257 (1978) works very well, but can only be used when the mode of inheritance of the disease is known. An article by M. Braverman, inspired by Ott's original work, in Computers and Biomedical Research, 18:24-36 (1985) extends allele recoding in two ways: 1) it allows pedigrees of arbitrary structure, and 2) it allows missing or partially known marker phenotypes. It is usually possible to recode marker alleles to some extent even when the mode of inheritance of the disease is not known, since what is still desired for the marker is a labeling that preserves the available information about the source of each marker allele. However, where the full ancestry of alleles cannot be traced in a pedigree, it is important that the recoded alleles keep the allele frequencies appropriate to the original alleles. In a complex disorder, this may not be possible.

Another method applies when, for example, a marker has 14 alleles in the general population but only 9 alleles appear in the study population: you can collapse the functional number of alleles to 9 or 10. With 9 alleles, the observed frequencies are usually adjusted to sum to 1 by dividing each one by the sum of the observed frequencies. With 10 alleles, the observed frequencies stay the same and the unobserved alleles are collapsed into a single allele whose frequency is the sum of theirs. Rescaling the frequencies of the 9 observed alleles does not produce quite correct results. Consider the unlikely example of a huge pedigree in which only the most recent generation is typed and the 9 observed alleles all have very low, equal frequencies. If distantly related affected relatives share alleles, there is reasonable support for linkage, since the alleles are rare. But if we rescale the frequencies to 1/9 per allele, sharing of alleles is no longer so unlikely. Coding the marker with 10 alleles produces correct results, since it gives the same LOD scores as coding the marker with all 14 alleles.
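The difference between the two options is easy to see numerically. This Python sketch uses hypothetical numbers matching the example: 9 observed alleles, each with population frequency 0.02, with the remaining 0.82 spread over the 5 unobserved alleles. It compares the chance that a founder carries a given observed allele under each coding, assuming Hardy-Weinberg proportions; the function names are illustrative.

```python
from typing import List


def collapse_unobserved(observed: List[float]) -> List[float]:
    """Keep observed frequencies; lump all unobserved alleles into one extra allele."""
    return observed + [1.0 - sum(observed)]


def rescale_observed(observed: List[float]) -> List[float]:
    """Divide each observed frequency by their sum (the approach cautioned against)."""
    total = sum(observed)
    return [f / total for f in observed]


def carrier_probability(freq: float) -> float:
    """Chance that a founder in Hardy-Weinberg equilibrium carries the allele."""
    return 1.0 - (1.0 - freq) ** 2


observed = [0.02] * 9  # hypothetical: 9 rare alleles seen in the pedigree
collapsed = collapse_unobserved(observed)  # 10 alleles, observed ones unchanged
rescaled = rescale_observed(observed)      # 9 alleles, each now 1/9

print(f"collapsed: {carrier_probability(collapsed[0]):.4f}")  # collapsed: 0.0396
print(f"rescaled:  {carrier_probability(rescaled[0]):.4f}")   # rescaled:  0.2099
```

Rescaling makes each rare allele look more than five times as common in founders, which is why allele sharing among distant relatives then carries much less evidence for linkage.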

4.7) What do you do when you get thetas greater than 0.5 when using LINKAGE?

This seems to occur when the GEMINI optimization procedure in an ILINK or LODSCORE run converges to a local optimum with theta greater than 0.5 because the starting theta values were too high. This can easily be fixed by modifying the starting theta directly with LCP or by editing the LCP-generated script. You can also modify the starting value with PREPLINK or by editing the data file containing the allele and disease frequencies. This can be an iterative process: change the starting theta by an order of magnitude at a time until reasonable thetas are obtained. Be careful not to set the initial thetas too low either, since this can also produce erroneous values. You can also run MLINK to examine what happens at different thetas and so determine the best starting theta.
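The MLINK check in the last sentence amounts to evaluating the LOD score at a handful of fixed theta values and starting the optimization near the best one. The Python sketch below illustrates that idea only and is not LINKAGE code: it assumes the simplest possible data, N phase-known meioses of which R are recombinant, for which LOD(theta) = R*log10(theta) + (N-R)*log10(1-theta) + N*log10(2). The theta list is an arbitrary example.

```python
import math
from typing import Sequence, Tuple


def lod(recombinants: int, meioses: int, theta: float) -> float:
    """LOD = log10[L(theta) / L(0.5)] for phase-known meioses."""
    if theta <= 0.0:
        # theta = 0 is impossible once any recombinant is seen.
        return float("-inf") if recombinants else meioses * math.log10(2)
    return (recombinants * math.log10(theta)
            + (meioses - recombinants) * math.log10(1.0 - theta)
            + meioses * math.log10(2))


def best_starting_theta(recombinants: int, meioses: int,
                        thetas: Sequence[float] = (0.001, 0.01, 0.05, 0.1,
                                                   0.2, 0.3, 0.4, 0.49),
                        ) -> Tuple[float, float]:
    """Return (theta, LOD) for the grid point with the highest LOD score."""
    return max(((t, lod(recombinants, meioses, t)) for t in thetas),
               key=lambda pair: pair[1])


print(best_starting_theta(recombinants=2, meioses=20)[0])  # 0.1
```

Starting the optimizer near the grid maximum rather than near 0.5 should make it less likely to wander into the theta > 0.5 region described above.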

5.0) COMPUTER ADMINISTRATION AND OPTIMIZATION

5.1) How can I increase the speed of the LINKAGE/FASTLINK package on my workstation?

1. Use FASTLINK, which is the C version of the LINKAGE package with a few algorithmic improvements. It can increase the speed of your calculations by an order of magnitude.

2. Set up plenty of paging space (also called swap space), which uses the hard drive as virtual memory; 300 megs is usually plenty. Then use the "fast" versions of FASTLINK.

3. Use GCC, the GNU/Free Software Foundation C compiler, to compile FASTLINK. GCC produces machine code that runs about 10% faster than code from Sun's C compiler.

4. Install the generic small kernel instead of the generic kernel. The generic kernel has device files for almost everything, and can slow the system down. The generic small kernel is configured for a system without many devices and without many users. Installing a generic small kernel is an option during system installation on Sun workstations.

5. Reconfigure your kernel so that it has only the devices you need. This should give a small improvement in overall system speed, but if you are already running the generic small kernel, the additional improvement may be too small to be worth the trouble. If the generic small kernel is insufficient for your system, this step is a must: the generic kernel will slow down your workstation significantly, and most of its device support is unnecessary.

6. Don't run your linkage analyses in the background, because running programs in the background gives them a lower priority. Either do the runs in the foreground, or use the root password to nice the pedin process by -3 to compensate (negative nice values give a higher priority). According to the Sun documentation, nicing below -10 can interfere with the operating system and actually reduce the process's speed, and running at the standard default level of 0 is usually sufficient. If you need to log out, you can use the screen command to "detach" a session so that you can log out without your programs terminating; later you can log back in and "reattach" the session, which continued to run while you were logged out. The screen command is available from prep.ai.mit.edu and is also on the O'Reilly Unix Power Tools CD-ROM. Note that some people recommend the opposite approach: running background jobs at nice +19 (!), so that they do not interfere with other normal processes such as logins.

7. Runs with 100% penetrance can run faster than runs with incomplete penetrance. Of course, this won't work if you have an unaffected obligate carrier. In addition, incomplete penetrance runs may be necessary for your analysis to be sound.

8. Change the block size of your file system. You can increase the performance of a file system by increasing the block size, which decreases the number of read/write operations. A block device, such as a hard disk, usually reads or writes a whole block of data at a time, so if you expect to use large files, large blocks will be an advantage. The trade-off is space lost to partially filled fragments, since a larger block size also requires a fragment size larger than 1024 bytes, for example 2048. Each file, or the tail end of a file, then occupies at least one 2048-byte fragment, so a file of 100 bytes still occupies 2048 bytes. In short, bigger blocks give faster access to big files, at the cost of bigger fragments and more wasted space.
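The space cost is simple arithmetic: allocation is rounded up to a whole number of fragments. A rough Python sketch, assuming every file is allocated in whole fragments and ignoring indirect blocks and other metadata; the function names and file sizes are illustrative:

```python
from typing import Iterable


def allocated_bytes(file_size: int, fragment_size: int) -> int:
    """Disk space used when allocation is rounded up to whole fragments."""
    # Integer ceiling division avoids floating-point error on large files.
    fragments = (file_size + fragment_size - 1) // fragment_size
    return fragments * fragment_size


def wasted_bytes(file_sizes: Iterable[int], fragment_size: int) -> int:
    """Total bytes lost to partially filled fragments."""
    return sum(allocated_bytes(s, fragment_size) - s for s in file_sizes)


print(allocated_bytes(100, 2048))                       # 2048
print(wasted_bytes([100, 5000, 300_000_000], 1024))     # 1300
print(wasted_bytes([100, 5000, 300_000_000], 2048))     # 4372
```

Doubling the fragment size roughly doubles the waste per file, so the penalty matters mostly on file systems holding many small files; for a handful of very large files it is negligible.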

9. It has been noted that you can increase the speed of programs which create/access large files in the /tmp directory by creating a tmpfs file system.

10. Of course, buying more RAM will increase your speed. It's been said that increasing RAM from 16 to 32 megs will result in a large increase in speed, and increasing RAM from 32 to 64 megs will result in a significant increase. However, increasing beyond 64 megs is not particularly helpful.

6.0) MOLECULAR BIOLOGY ISSUES IN LINKAGE ANALYSIS

6.1) What screening sets are available for linkage analysis?

For humans there are the Weber lab screening sets: 3, 3A, 4, 4A, 5, 5A, and 6. Primers for the markers in these sets are available from Research Genetics, in both unlabeled and fluorescent dye-conjugated forms. Information on these screening sets can be downloaded via FTP from dgabby.mfldclin.edu in /pub.