National Center for Biotechnology Information
 
Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources


GENOME MAPPING: A GUIDE TO THE GENETIC HIGHWAY WE CALL THE HUMAN GENOME

 

Imagine you're in a car driving down the highway to visit an old friend who has just moved to Los Angeles. Your favorite tunes are playing on the radio, and you haven't a care in the world. You stop to check your maps and realize that all you have are interstate highway maps—not a single street map of the area. How will you ever find your friend's house? It's going to be difficult, but eventually, you may stumble across the right house.

 

This scenario is similar to the situation facing scientists searching for a specific gene somewhere within the vast human genome. They have available to them two broad categories of maps: genetic maps and physical maps. Both genetic and physical maps provide the likely order of items along a chromosome. However, a genetic map, like an interstate highway map, provides an indirect estimate of the distance between two items and is limited to ordering certain items. One could say that genetic maps serve to guide a scientist toward a gene, just like an interstate map guides a driver from city to city. On the other hand, physical maps mark an estimate of the true distance, in measurements called base pairs, between items of interest. To continue our analogy, physical maps would then be similar to street maps, where the distance between two sites of interest may be defined more precisely in terms of city blocks or street addresses. Physical maps, therefore, allow a scientist to more easily home in on the location of a gene. An appreciation of how each of these maps is constructed may be helpful in understanding how scientists use these maps to traverse that genetic highway commonly referred to as the "human genome".

 
 

PART I: GENETIC MAPS

Types of Landmarks Found on a Genetic Map

Genetic maps use landmarks called genetic markers to guide researchers on their gene hunt.

Just like interstate maps have cities and towns that serve as landmarks, genetic maps have landmarks known as genetic markers, or "markers" for short. The term marker is used very broadly to describe any observable variation that results from an alteration, or mutation, at a single genetic locus. A marker may be used as a landmark on a map if, in most cases, that stretch of DNA is inherited from parent to child according to the standard rules of inheritance. Markers can lie within genes that code for a noticeable physical characteristic, such as eye color, or for a less noticeable trait, such as a disease. DNA-based reagents can also serve as markers. These types of markers are found within the non-coding regions of genes and are used to detect unique regions on a chromosome. DNA markers are especially useful for generating genetic maps when occasional, predictable mutations during meiosis (the formation of gametes such as egg and sperm) lead, over many generations, to a high degree of variability in the DNA content of the marker from individual to individual.

 
Commonly Used DNA Markers
  • RFLPs, or restriction fragment length polymorphisms, were among the first developed DNA markers. RFLPs are defined by the presence or absence of a specific site, called a restriction site, for a bacterial restriction enzyme. This enzyme breaks apart strands of DNA wherever they contain a certain nucleotide sequence.
  • VNTRs, or variable number of tandem repeat polymorphisms, occur in non-coding regions of DNA. This type of marker is defined by the presence of a nucleotide sequence that is repeated several times. In each case, the number of times a sequence is repeated may vary.
  • Microsatellite polymorphisms are defined by a variable number of repetitions of a very small number of base pairs within a sequence. Oftentimes, these repeats consist of the nucleotides, or bases, cytosine and adenine. The number of repeats for a given microsatellite may differ between individuals, hence the term polymorphism: the existence of different forms within a population.
  • SNPs, or single nucleotide polymorphisms, are individual point mutations, or substitutions of a single nucleotide, that do not change the overall length of the DNA sequence in that region. SNPs occur throughout an individual's genome.
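
To make two of these marker types concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a laboratory method: the toy sequences are invented, the function names are hypothetical, and the restriction site GAATTC (the EcoRI recognition sequence) is simply one familiar example. For simplicity the cut is placed at the start of the site, whereas a real enzyme cuts at a fixed position within it.

```python
import re


def longest_repeat_run(sequence: str, unit: str = "CA") -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


def fragment_lengths(sequence: str, site: str = "GAATTC") -> list[int]:
    """Cut `sequence` at every occurrence of `site` and return the fragment lengths.

    Simplifying assumption: the cut falls at the first base of the site.
    """
    seq = sequence.upper()
    cuts = [match.start() for match in re.finditer(re.escape(site), seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Microsatellite: two invented alleles that differ only in their CA repeat count.
allele_short = "TTT" + "CA" * 5 + "GGG"
allele_long = "TTT" + "CA" * 8 + "GGG"
print(longest_repeat_run(allele_short), longest_repeat_run(allele_long))  # 5 8

# RFLP: one invented allele carries the restriction site, the other has lost it.
with_site = "AAAA" + "GAATTC" + "TTTTTT"
without_site = "AAAA" + "GAGTTC" + "TTTTTT"
print(fragment_lengths(with_site))     # [4, 12]
print(fragment_lengths(without_site))  # [16]
```

In both cases the marker is read out as a difference in length: the repeat count for the microsatellite, and the pattern of fragment sizes for the RFLP.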
 
 

From Linkage Analysis to Genetic Mapping

Early geneticists recognized that genes are located on chromosomes and believed that each individual chromosome was inherited as an intact unit. They hypothesized that if two genes were located on the same chromosome, they were physically linked together and were inherited together. We now know that this is not always the case. Studies conducted around 1910 demonstrated that very few pairs of genes displayed complete linkage. Pairs of genes were either inherited independently or displayed partial linkage—that is, they were inherited together sometimes, but not always.

It is the behavior of chromosomes during meiosis that determines whether two genes will remain linked.

During meiosis, the process whereby gametes (eggs and sperm) are produced, the two members of each chromosome pair become physically close. The chromosome arms can then undergo breakage and exchange segments of DNA, a process referred to as recombination or crossing-over. If recombination occurs, each chromosome found in the gamete will consist of a "mixture" of material from both members of the chromosome pair. Thus, recombination events directly affect the inheritance pattern of the genes involved.

Because one cannot physically see crossover events, it is difficult to determine with any degree of certainty how many crossovers have actually occurred. But, using the phenomenon of co-segregation of alleles of nearby markers, researchers can reverse-engineer meiosis and identify markers that lie close to each other. Then, using a statistical technique called genetic linkage analysis, researchers can infer a likely crossover pattern, and from that an order of the markers involved. Researchers can also infer an estimate for the probability that a recombination occurs between each pair of markers.

 
An allele is one of the variant forms of a DNA sequence at a particular locus, or location, on a chromosome. Co-segregation of alleles refers to alleles at different loci being passed on together during meiosis. If a marker tends to "travel" with the disease status, the marker and the disease are said to co-segregate.
 
A monotonic function is one that moves in only one direction: it either never decreases or never increases.

If recombination occurs as a random event, then two markers that are close together should be separated less frequently than two markers that are more distant from one another. The recombination probability between two markers, which can range from 0 to 0.5, increases monotonically as the distance between the two markers increases along a chromosome. Therefore, the recombination probability may be used as a surrogate for ordering genetic markers along a chromosome. If you then determine the recombination frequencies for different pairs of markers, you can construct a map of their relative positions on the chromosome.
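
The reasoning above can be sketched in a few lines of Python. Two assumptions are ours and not part of the original text: the recombination fractions are invented, and the conversion to map distance uses Haldane's map function, one commonly used monotonic relationship that assumes crossovers occur independently. The helper names are hypothetical.

```python
import math


def haldane_cm(r: float) -> float:
    """Convert a recombination fraction (0 <= r < 0.5) to centimorgans.

    Assumption: Haldane's map function, d = -1/2 * ln(1 - 2r) Morgans.
    """
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def order_three(markers, rf):
    """Order three markers from their pairwise recombination fractions.

    `rf` maps frozenset({a, b}) to the recombination fraction between a and b.
    The pair that recombines most often is taken to be the two outer markers.
    """
    outer = max(rf, key=rf.get)
    middle = (set(markers) - outer).pop()
    left, right = sorted(outer)
    return [left, middle, right]


# Invented recombination fractions for three markers A, B and C.
rf = {
    frozenset("AB"): 0.05,
    frozenset("BC"): 0.10,
    frozenset("AC"): 0.14,
}
print(order_three(["A", "B", "C"], rf))  # ['A', 'B', 'C']
for pair, r in rf.items():
    print(sorted(pair), round(haldane_cm(r), 2), "cM")
```

Note that the A-to-C fraction (0.14) is a little less than the sum of the two shorter intervals (0.15). Recombination fractions are not additive over longer distances because double crossovers go undetected, which is why a map function is applied before distances are summed.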

 
Linkage maps can tell you where markers are in relation to each other on the chromosome, but the actual "mileage" between those markers may not be so well defined.

But alas, predicting recombination is not so simple. Although crossovers are random, they are not uniformly distributed across the genome or any chromosome. Some chromosomal regions, called recombination hotspots, are more likely to be involved in crossovers than other regions of a chromosome. This means that genetic map distance does not always indicate physical distance between markers. Despite these qualifications, linkage analysis usually correctly deduces marker order, and distance estimates are sufficient to generate genetic maps that can serve as a valuable framework for genome sequencing.

 
 

Linkage Studies in Patient Populations: Genetic Maps and Gene Hunting

In humans, genetic diseases are frequently used as gene markers, with the disease state being one allele and the healthy state the second allele.

In humans, data for calculating recombination frequencies are obtained by examining the genetic makeup of the members of successive generations of existing families, an approach termed human pedigree analysis. Linkage studies begin by obtaining blood samples from a group of related individuals. For relatively rare diseases, scientists find a few large families that have many cases of the disease and obtain samples from as many family members as possible. For more common diseases where the pattern of disease inheritance is unclear, scientists identify a large number of affected families and take samples from four to thirty close relatives. DNA is then harvested from all of the blood samples and screened for the presence, or co-inheritance, of two markers. One marker is usually the gene of interest, generally associated with a physically identifiable characteristic. The other is usually one of the detectable DNA markers mentioned earlier, such as a microsatellite. A computerized analysis is then performed to determine whether the two markers are linked and approximately how far apart they are. The value of the genetic map here is that an inherited disease can be located on the map by following the inheritance of a DNA marker that is present in affected individuals but absent in unaffected individuals, even though the molecular basis of the disease may not yet be understood and the gene(s) responsible may not yet be identified.
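
The text does not say which statistic the computerized analysis uses. One standard choice in linkage analysis is the LOD score, which compares how likely the observed data are if the markers are linked at recombination fraction theta with how likely they are if the markers are unlinked (theta = 0.5). The sketch below assumes the simplest possible case, in which every meiosis can be scored unambiguously as recombinant or non-recombinant. The counts are invented and the function names are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score for linkage at recombination fraction `theta`.

    Assumes phase-known, fully informative meioses, so the likelihood is
    theta**k * (1 - theta)**(n - k), compared against theta = 0.5.
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    k, n = recombinants, meioses
    log_linked = k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)
    log_unlinked = n * math.log10(0.5)
    return log_linked - log_unlinked


def best_theta(recombinants: int, meioses: int) -> float:
    """Grid search (0.01 to 0.50) for the theta with the highest LOD score."""
    grid = [i / 100 for i in range(1, 51)]
    return max(grid, key=lambda t: lod_score(recombinants, meioses, t))


# Invented data: 2 recombinants observed among 20 informative meioses.
theta_hat = best_theta(2, 20)
print(theta_hat, round(lod_score(2, 20, theta_hat), 2))  # 0.1 3.2
```

By convention, a LOD score of 3 or more is usually taken as evidence of linkage. Real pedigree analysis is considerably more involved, because phase is often unknown and some family members are unavailable or uninformative.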

 
 

Genetic Maps as a Framework for Physical Map Construction

Genetic maps are also used to generate the essential backbone, or scaffold, needed for the creation of more detailed human genome maps. These detailed maps, called physical maps, further define the DNA sequence between genetic markers and are essential to the rapid identification of genes.

 
 

PART II: PHYSICAL MAPS

 

Types of Physical Maps and What They Measure

Physical maps can be divided into three general types: chromosomal or cytogenetic maps, radiation hybrid (RH) maps, and sequence maps. The different types of maps vary in their degree of resolution, that is, the ability to measure the separation of elements that are close together. The higher the resolution, the better the picture.

The lowest-resolution physical map is the chromosomal or cytogenetic map, which is based on the distinctive banding patterns observed by light microscopy of stained chromosomes. As with genetic linkage mapping, chromosomal mapping can be used to locate genetic markers defined by traits observable only in whole organisms. Because chromosomal maps are based on estimates of physical distance, they are considered to be physical maps. Yet, the number of base pairs within a band can only be estimated.

RH maps and sequence maps, on the other hand, are more detailed. RH maps are similar to linkage maps in that they show estimates of distance between genetic and physical markers, but that is where the similarity ends. RH maps are able to provide more precise information regarding the distance between markers than can a linkage map.

The physical map that provides the most detail is the sequence map. Sequence maps show genetic markers, as well as the sequence between the markers, measured in base pairs.

 
 

How Are Physical Maps Made and Used?

RH Mapping

RH mapping, like linkage mapping, shows an estimated distance between genetic markers. But, rather than relying on natural recombination to separate two markers, scientists use breaks induced by radiation to determine the distance between two markers. In RH mapping, a scientist exposes DNA to measured doses of radiation, and in doing so, controls the average distance between breaks in a chromosome. By varying the degree of radiation exposure to the DNA, a scientist can induce breaks between two markers that are very close together. The ability to separate closely linked markers allows scientists to produce more detailed maps. RH mapping provides a way to localize almost any genetic marker, as well as other genomic fragments, to a defined map position, and RH maps are extremely useful for ordering markers in regions where highly polymorphic genetic markers are scarce.
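
As a rough illustration of the idea, consider a panel of hybrid cell lines, each scored for whether it retained a given marker (1) or not (0). Two markers that lie close together are rarely separated by a radiation-induced break, so they tend to be retained or lost together. The sketch below uses that intuition only: the panel data and marker names are invented, and the simple discordance fraction stands in for the statistical estimates of breakage frequency that real RH analysis relies on.

```python
def discordance(panel: dict[str, str], marker_a: str, marker_b: str) -> float:
    """Fraction of hybrids in which exactly one of the two markers was retained."""
    scores_a, scores_b = panel[marker_a], panel[marker_b]
    if len(scores_a) != len(scores_b):
        raise ValueError("markers must be scored on the same hybrids")
    differing = sum(a != b for a, b in zip(scores_a, scores_b))
    return differing / len(scores_a)


# Invented retention patterns across ten hybrid cell lines (1 = retained).
panel = {
    "M1": "1101001011",
    "M2": "1101001111",
    "M3": "0111001100",
}
print(discordance(panel, "M1", "M2"))  # 0.1
print(discordance(panel, "M2", "M3"))  # 0.4
print(discordance(panel, "M1", "M3"))  # 0.5
```

In this toy panel M1 and M2 are almost always retained together, suggesting they are close neighbors, while M1 and M3 are separated most often, which is consistent with the order M1, M2, M3.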

 
Polymorphic refers to the existence of two or more forms of the same gene, or genetic marker, with each form being too common in a population to be merely attributable to a new mutation. A polymorphism is useful as a genetic marker because it sometimes enables researchers to distinguish which allele was inherited.
 

Scientists also use RH maps as a bridge between linkage maps and sequence maps. In doing so, they have been able to more easily identify the location(s) of genes involved in diseases such as spinal muscular atrophy and hyperekplexia, more commonly known as "startle disease".

 
 

Sequence Mapping

Sequence tagged site (STS) mapping is another physical mapping technique. An STS is a short DNA sequence that has been shown to be unique. To qualify as an STS, the exact location and order of the bases of the sequence must be known, and the sequence must occur only once in the chromosome being studied, or only once in the genome as a whole if the DNA fragment set covers the entire genome.
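
The uniqueness requirement can be pictured as a simple search. The sketch below counts how often a candidate sequence appears in a target sequence on either strand and accepts it only if there is exactly one hit. The sequences are invented and far shorter than real STSs, the function names are hypothetical, and an exact-match search is a simplification of how uniqueness is established in practice.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite DNA strand, read 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_occurrences(target: str, query: str) -> int:
    """Count occurrences of `query` in `target`, including overlapping ones."""
    count, start = 0, target.find(query)
    while start != -1:
        count += 1
        start = target.find(query, start + 1)
    return count


def is_unique_sts(target: str, candidate: str) -> bool:
    """True if `candidate` matches `target` exactly once, counting both strands."""
    target, candidate = target.upper(), candidate.upper()
    hits = count_occurrences(target, candidate)
    opposite = reverse_complement(candidate)
    if opposite != candidate:  # avoid double-counting palindromic sequences
        hits += count_occurrences(target, opposite)
    return hits == 1


# Invented toy "chromosome"; AGGCTT appears twice, GACCAGTA only once.
chromosome = "TTGACCAGTAAGGCTTCATGAGGCTTAC"
print(is_unique_sts(chromosome, "GACCAGTA"))  # True
print(is_unique_sts(chromosome, "AGGCTT"))    # False
```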

 
Common Sources of STSs
  • Expressed sequence tags (ESTs) are short sequences obtained by analysis of complementary DNA (cDNA) clones. Complementary DNA is prepared by converting mRNA into double-stranded DNA and is thought to represent the sequences of the genes being expressed. An EST can be used as an STS if it comes from a unique gene and not from a member of a gene family in which all of the genes have the same, or similar, sequences.
  • Simple sequence length polymorphisms (SSLPs) are arrays of repeat sequences that display length variations. SSLPs that are polymorphic and have already been mapped by linkage analysis are particularly valuable because they provide a connection between genetic and physical maps.
  • Random genomic sequences are obtained by sequencing random pieces of cloned genomic DNA or by examining sequences already deposited in a database.
 
 

To map a set of STSs, a collection of overlapping DNA fragments from a chromosome is digested into smaller fragments using restriction enzymes, agents that cut up DNA molecules at defined target points. The data from which the map will be derived are then obtained by noting which fragments contain which STSs. To accomplish this, scientists copy the DNA fragments using a process known as "molecular cloning". Cloning involves the use of a special technology, called recombinant DNA technology, to copy DNA fragments inside a foreign host.

First, the fragments are united with a carrier, also called a vector. After introduction into a suitable host, the DNA fragments can then be reproduced along with the host cell DNA, providing unlimited material for experimental study. An unordered set of cloned DNA fragments is called a library.

Next, the clones, or copies, are assembled in the order they would be found in the original chromosome by determining which clones contain overlapping DNA fragments. This assembly of overlapping clones is called a clone contig. Once the order of the clones in a chromosome is known, the clones are placed in frozen storage, and the information about the order of the clones is stored in a computer, providing a valuable resource that may be used for further studies. These data are then used as the base material for generating a lengthy, continuous DNA sequence, and the STSs serve to anchor the sequence onto a physical map.
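
The assembly step can be sketched as a small puzzle: given which STSs each clone contains, find an STS order in which every clone covers an unbroken run of neighboring STSs. The Python below tries every possible order, which is only practical for a handful of STSs; real mapping software uses far more efficient methods and must tolerate the errors described in the next section. The clone and STS names are invented, and the function names are hypothetical.

```python
from itertools import permutations


def is_consecutive(order, sts_set) -> bool:
    """True if the STSs in `sts_set` sit next to one another in `order`."""
    positions = sorted(order.index(sts) for sts in sts_set)
    return bool(positions) and positions[-1] - positions[0] + 1 == len(positions)


def consistent_orders(clones: dict[str, set[str]]) -> list[list[str]]:
    """Return every STS order in which each clone spans a contiguous block."""
    all_sts = sorted(set().union(*clones.values()))
    return [
        list(order)
        for order in permutations(all_sts)
        if all(is_consecutive(order, content) for content in clones.values())
    ]


def clone_order(clones: dict[str, set[str]], order: list[str]) -> list[str]:
    """Arrange clones left to right by the position of their first STS."""
    return sorted(clones, key=lambda name: min(order.index(s) for s in clones[name]))


# Invented STS content for three overlapping clones.
clones = {
    "cloneA": {"sts1", "sts2"},
    "cloneB": {"sts2", "sts3"},
    "cloneC": {"sts3", "sts4"},
}
orders = consistent_orders(clones)
print(orders[0])                       # ['sts1', 'sts2', 'sts3', 'sts4']
print(clone_order(clones, orders[0]))  # ['cloneA', 'cloneB', 'cloneC']
```

Two orders fit these data, the one shown and its mirror image, because STS content alone cannot tell which end of the contig points toward which end of the chromosome. That is one reason contigs still need to be oriented against other maps.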

 
 

The Need to Integrate Physical and Genetic Maps

As with most complex techniques, STS-based mapping has its limitations. In addition to gaps in clone coverage, DNA fragments may become lost or mistakenly mapped to a wrong position. These errors may occur for a variety of reasons. A DNA fragment may break, resulting in an STS that maps to a different position. DNA fragments may also get deleted from a clone during the replication process, resulting in the absence of an STS that should be present. Sometimes a clone composed of DNA fragments from two distinct genomic regions is replicated, leading to DNA segments that are widely separated in the genome being mistakenly mapped to adjacent positions. Lastly, a DNA fragment may become contaminated with host genetic material, once again leading to an STS that will map to the wrong location. To help overcome these problems, as well as to improve overall mapping accuracy, researchers have begun comparing and integrating STS-based physical maps with genetic, RH, and cytogenetic maps. Cross-referencing different genomic maps enhances the utility of a given map, confirms STS order, and helps order and orient evolving contigs.
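
One simple form of cross-referencing is to check whether the markers shared by two maps appear in the same order on both. The sketch below does only that. The marker names and positions are invented, the function name is hypothetical, and real map integration also has to weigh how reliable each map is before deciding which one is in error.

```python
def order_conflicts(map_a: dict[str, float], map_b: dict[str, float]) -> list[tuple[str, str]]:
    """List pairs of shared markers whose relative order differs between two maps.

    Each map is a dict of marker name -> position, in any unit (cM, bp, ...).
    """
    shared = sorted(set(map_a) & set(map_b), key=map_a.get)
    conflicts = []
    for i, first in enumerate(shared):
        for second in shared[i + 1:]:
            # `first` precedes `second` on map A; flag the pair if map B disagrees.
            if map_b[first] > map_b[second]:
                conflicts.append((first, second))
    return conflicts


# Invented positions: a genetic map in centimorgans, a physical map in base pairs.
genetic_map = {"D1": 0.0, "D2": 4.5, "D3": 9.8, "D4": 15.2}
physical_map = {"D1": 100_000, "D2": 2_300_000, "D4": 5_100_000,
                "D3": 6_400_000, "D5": 7_000_000}
print(order_conflicts(genetic_map, physical_map))  # [('D3', 'D4')]
```

A flagged pair does not say which map is wrong. It only marks a region where a clone, an STS assignment, or a linkage estimate deserves a second look.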

 
 

NCBI and Map Integration

Comparing the many available genetic and physical maps can be a time-consuming step, especially when trying to pinpoint the location of a new gene. Without the use of computers and special software designed to align the various maps, matching a sequence to a region of a chromosome that corresponds to the gene location would be very difficult. It would be like trying to compare 20 different interstate and street maps to get from a house in Ukiah, California to a house in Beaver Dam, Wisconsin. You could compare the maps yourself and create your own travel itinerary, but it would probably take a long time. Wouldn't it be easier and faster to have the automobile club create an integrated map for you? That is the goal behind NCBI's Human Genome Map Viewer.

 
 

NCBI's Map Viewer: A Tool for Integrating Genetic and Physical Maps

The NCBI Map Viewer provides a graphical display of the available human genome sequence data as well as sequence, cytogenetic, genetic linkage, and RH maps. Map Viewer can simultaneously display up to seven maps, selected from a large set of maps, and allows the user access to detailed information for a selected map region. Map Viewer uses a common sequence numbering system to align sequence maps and shared markers as well as gene names to align other maps. You can use NCBI's Map Viewer to search for a gene in a number of genomes, by choosing an organism from the Map Viewer home page.

 

Map Viewer Getting Started

Need help using the NCBI Map Viewer? Try GETTING STARTED, a quick "how-to guide" on NCBI data mining tools designed for the novice user. GETTING STARTED using Map Viewer provides:
  • information on how you can use Map Viewer
  • descriptions of the Map Viewer layout
  • step-by-step information on using the Map Viewer
  • shortcuts for getting to where you need to go
 
 
Revised: March 29, 2004.