Human genome data and search tips

Homo sapiens genome data and search tips

Revised October 5, 2011

The Map Viewer help document describes how to use the Map Viewer software. This page describes the data available for the human genome and the search tips specific to that organism. You can also return to the Homo sapiens genome view search page. Separate documents provide information about the process used to assemble and annotate the human genome sequence, release notes for each build of the genome, and statistics for the current build. The Map Viewer home page allows you to search the genome data of any organism represented in Map Viewer.

Scope of Data

Integrated data from various sources
Finished and draft sequence data
Contig assembly and annotation process (separate document)
Frequency of updates to Map Viewer data

Available Maps

Sequence Maps

Assembly

Assembly regions

Clone

Component
Contig
CpG Island
Ensembl Genes
Ensembl Transcripts
FISH Clone (seq)
GenBank DNA
Gene
Model Transcripts

NCI Clones

Phenotype
RefSeq Transcripts
Repeats
RNA Maps

Bos taurus RNA
Gallus gallus RNA
Homo sapiens RNA
Mus musculus RNA
Primates RNA

Rattus norvegicus RNA
Sus scrofa RNA

STS
TCAG Genes
TCAG Transcripts
Homo sapiens UniGene Clusters
Variation

Cytogenetic Maps

FISH Clone

Genes_Cytogenetic
Ideogram
Mitelman Breakpoint
Morbid/Disease
NCI FISH Clone

Genetic Maps

deCODE
Genethon
Marshfield

Radiation Hybrid Maps

GeneMap99-G3
GeneMap99-GB4
NCBI RH
Stanford G3
TNG
Whitehead-RH

Other Maps

Whitehead-YAC

Types of mapped objects, and maps on which they can be found

Clones
Components of Sequence Assembly
CpG Islands
Expression Data
GenBank Accessions
Genes
Phenotypes
Polymorphisms
STSs

Legend

Verbose Display Mode
Orientation
Links to Related Resources (OMIM, sv, pr, dl, ev, hm)
STS maps

Colored dots indicate uniqueness of STS positions
Polymorphism Column
Detailed Marker Information

Constructing queries

Searchable Terms

Text Terms (symbols, names, MIM numbers, etc.)
Truncation

Map Positions or Regions

Allowable Values
View/Download Chromosome Region

Map Data (Data as Table View link)
Sequence Data ("Download/View Sequence/Evidence" and sv, pr, dl, ev links)

Query options

Boolean Operators
Search Fields
Properties - limiting retrieval to mapped objects that have certain attributes
Advanced Search Page

Search Tips

Show Linked Entries: finding associated objects on other maps

FTP Data

Map Data
Sequence Data

Constructing URLs (these links lead to the general Map Viewer Help document)

Perform a search
Display a specific mapped object or chromosomal region

Scope of Data

Integrated Data from Various Sources

The Entrez Map Viewer integrates human sequence and map data from a variety of sources. The types of maps include sequence, cytogenetic, genetic linkage, radiation hybrid, and YAC contig. The next section, on Available Maps, provides additional detail about each source. The maps are integrated with each other as described in the Show Connections section of the general Map Viewer help document. The sequence data include both finished and draft high throughput genomic sequences (HTGs), as described below.
Separate documents provide an introduction to the information infrastructure developed at NCBI to integrate the various types of data generated by the Human Genome Project, the process used to assemble and annotate the human genome sequence, release notes for each build of the genome, and statistics for the current build.

Finished and draft sequence data

The sequence data include both finished and draft high throughput genomic sequences (HTGs). On the Component map, finished (phase 3) HTG sequences are shown in blue, and draft HTG sequences (phase 1 and 2) are shown in orange. Definitions of the various phases are provided on the HTGs home page.

Multiple algorithms are still being tested to assemble contigs from a combination of draft and finished genomic sequence, and to identify the genes, markers, SNPs, and other features on that sequence. Therefore, sequence and/or features present in one version may not appear in the next. A separate document provides more detail about NCBI's Contig Assembly and Annotation Process.

Frequency of Updates to Map Viewer Data

Currently, the Map Viewer data files are updated with each full re-annotation run, which is approximately once per year. The update frequency will increase in the future, as development continues. The data are also available on NCBI's FTP site.

Available Maps

The maps for human include:

Map Name

Description

Sequence Maps

Assembly
Assembly Region

The Assembly and Assembly Region maps allow users to visualize all of the sequence data available for a given region of the genome, separated by assembly. Full chromosome assemblies are shown on the Assembly map, and region assemblies are shown on the Assembly Region map.

Data are currently available from the following assemblies:

GRC assembly (reference) - the chromosome sequence assembled from finished and draft HTG sequences by the Genome Reference Consortium. The GRC assembly is composed of:

Primary Assembly - the complete reference assembly representing all chromosomes.
Seven alternate loci representing the MHC region - these alternate assemblies are specific to chromosome 6.

ALT_REF_LOCI_1 - APD haplotype
ALT_REF_LOCI_2 - COX haplotype, which is similar to the DR52 haplotype annotated on build 36.3
ALT_REF_LOCI_3 - DBB haplotype
ALT_REF_LOCI_4 - MANN haplotype
ALT_REF_LOCI_5 - MCF haplotype
ALT_REF_LOCI_6 - QBL haplotype
ALT_REF_LOCI_7 - SSTO haplotype, which is similar to the DR53 haplotype annotated on build 36.3

ALT_REF_LOCI_8 - an alternate haplotype assembly for the CTG9 region on chromosome 4.
ALT_REF_LOCI_9 - an alternate haplotype assembly for the MAPT region on chromosome 17.
PATCHES - alternate assemblies for some regions of the Primary Assembly. Some patches will replace regions of the Primary Assembly in a future full assembly update (FIX PATCHES), whereas others will become additional alternate loci (NOVEL PATCHES).

HuRef - pure whole genome shotgun assembly of a single individual, described by Levy, et al., 2007 (PMID 17803354).
CRA_TCAGchr7v2 - the assembly of chromosome 7 from The Center for Applied Genomics (TCAG).

The assembly map also acts as a filter through which all of the other sequence maps are viewed, allowing you to see the annotations that have been placed on the sequence data from each assembly.

When viewing the Assembly map, vertical lines indicate regions of the genome where sequence data from other assemblies is available. Blue lines represent the current (selected) assembly, and orange lines represent other assemblies. The GRC primary assembly is not represented. The alternate loci and patches are displayed as assembly regions instead of assembly units (e.g. MHC rather than ALT_REF_LOCI_1-7).

By default, the other sequence maps that are available for an organism display the features that have been annotated on the reference assembly.

The Maps&Options dialog box allows you to change the assembly being displayed in the Assembly map and any other sequence maps. Instructions on how to do this are provided in the section on Customizing The Display / Maps & Options Dialog Box / Select One or More Assemblies to Display.

Example: human Assembly map for chromosome 6, displayed beside the Genes_Sequence map. The Reference assembly is shown as a vertical blue line on the Assembly map and spans the length of the chromosome. The Celera and HuRef assemblies are shown as vertical orange lines. The ALT_REF_LOCI_1-7 assemblies are also shown as vertical orange lines and represent a curated sequence region for alternate haplotypes in the Major Histocompatibility Complex. The Genes_Sequence map, by default, shows the genes that have been mapped onto the reference assembly. To see the genes that have been annotated on the other assemblies, follow the instructions provided in the section on Select One or More Assemblies to Display.

Clone

Alignment of BAC end sequences to the assembled genomic sequence. During the alignment process, BAC ends are aligned to the genome and the best placement is selected, with the requirement that at least 50% of the BAC end align to the genome with >90% identity. If a BAC end sequence has two or more equally good best placements on the genome, each location is used for clone placement. Clones shown in blue have an unambiguous best placement, whereas clones shown in black have multiple possible placements. Clones shown in green have discordant end alignments, and clones shown in orange have multiple placements with discordant end alignments.
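The placement filter and color scheme above can be sketched in Python. This is a minimal sketch under stated assumptions, not NCBI's pipeline: the `EndHit` record, its `score` field, and the function names are hypothetical, and whether a clone's two ends are discordant is passed in as a flag rather than computed.

```python
from dataclasses import dataclass


@dataclass
class EndHit:
    """One alignment of a BAC end sequence to the genome (hypothetical record)."""
    chrom: str
    start: int
    aligned_fraction: float  # fraction of the BAC end covered by the alignment
    identity: float          # percent identity over the aligned part
    score: float             # score used to rank candidate placements (assumption)


def passes_filter(hit: EndHit) -> bool:
    # At least 50% of the BAC end must align with >90% identity.
    return hit.aligned_fraction >= 0.5 and hit.identity > 90.0


def best_placements(hits: list[EndHit]) -> list[EndHit]:
    # Keep every passing hit tied for the top score; each one is used for placement.
    kept = [h for h in hits if passes_filter(h)]
    if not kept:
        return []
    top = max(h.score for h in kept)
    return [h for h in kept if h.score == top]


def clone_color(n_placements: int, discordant_ends: bool) -> str:
    # Color scheme of the Clone map as described above.
    multiple = n_placements > 1
    if discordant_ends:
        return "orange" if multiple else "green"
    return "black" if multiple else "blue"


hits = [
    EndHit("1", 100_000, 0.80, 95.0, 500.0),
    EndHit("7", 250_000, 0.80, 95.0, 500.0),  # tie: a second best placement
    EndHit("3", 900_000, 0.40, 99.0, 900.0),  # fails the 50% coverage rule
]
best = best_placements(hits)
print(len(best), clone_color(len(best), discordant_ends=False))  # 2 black
```

Note that the high-scoring third hit is discarded before ranking, because the coverage and identity requirement is applied first.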

When the Clone map is displayed as the Master map, the verbose display provides the clone name (linked to the Clone Registry database) and the BAC end sequence accessions (linked to dbGSS).

Component

Components of the human genome assembly. Shows the placement of individual GenBank sequence entries that were used to generate the genomic contigs. This represents a tiling path for the human genome sequence, based on the relationship of overlapping clones. It is assembled using the method described by the Genome Reference Consortium.

Finished (phase 3 HTG) GenBank sequence records are shown in blue. Draft GenBank sequence records (phase 1 and 2 HTG) are shown in orange. The High-Throughput Genomic Sequences (HTG) page provides additional information about the various phases.

Note: This map was called the "GenBank" map through build 26 of the human genome data. The map name was changed to "Component" in build 27, December 2001. At that time, the "GenBank DNA" map, described below, was also added.

Contig

Shows the chromosomal placement of contigs that have been assembled at NCBI using finished and draft high-throughput genomic (HTG) sequence data. Any individual contig can be assembled from finished sequence (phase 3 HTG), draft sequence (phase 1 and 2 HTG), or a mixture of both. Contig regions made from finished sequence are shown in blue, while regions made from draft sequence are shown in orange. The data are assembled using the method described by the Genome Reference Consortium. The High-Throughput Genomic Sequences page provides additional information about the various phases of HTG sequence data used in the assembly.

Note: The Component map shows the individual GenBank records used in assembling the contigs.

CpG Island

Shows regions of high G + C content on the assembled genome sequence. CpG islands are identified using the algorithm and "relaxed" cutoffs of Takai and Jones, 2002. The resulting islands are then converted into a histogram that reflects the frequency of islands in non-overlapping 1, 10, or 100 kb intervals (depending on the viewing magnification), displayed on a log10 scale.

relaxed criteria

200 bp min length
50% or higher G + C content
0.60 or higher observed CpG / expected CpG
post-processing: merge islands that are <= 100 bp apart
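The relaxed criteria and the merge step can be checked with a short Python sketch. It only tests a candidate region against the cutoffs listed above; it does not reproduce the Takai and Jones window-scanning procedure. The observed/expected ratio uses the common formula (CpG count × length) / (C count × G count); half-open coordinates, binning islands by their start position, and all function names are assumptions.

```python
from collections import Counter


def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (G+C fraction, observed/expected CpG ratio) for one region."""
    s = seq.upper()
    n = len(s)
    if n == 0:
        return 0.0, 0.0
    c, g = s.count("C"), s.count("G")
    observed = sum(1 for i in range(n - 1) if s[i:i + 2] == "CG")
    expected = c * g / n  # CpGs expected if C and G occurred independently
    return (c + g) / n, (observed / expected if expected else 0.0)


def meets_relaxed_criteria(seq: str) -> bool:
    # 200 bp minimum length, >= 50% G+C, >= 0.60 observed/expected CpG.
    gc, obs_exp = cpg_stats(seq)
    return len(seq) >= 200 and gc >= 0.50 and obs_exp >= 0.60


def merge_islands(islands: list[tuple[int, int]], max_gap: int = 100) -> list[tuple[int, int]]:
    """Merge half-open (start, end) islands that are <= max_gap bp apart."""
    merged: list[list[int]] = []
    for start, end in sorted(islands):
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def island_histogram(islands: list[tuple[int, int]], bin_size: int) -> dict[int, int]:
    """Count islands per non-overlapping bin (bin index = start // bin_size)."""
    return dict(Counter(start // bin_size for start, _ in islands))


print(meets_relaxed_criteria("CG" * 100))                 # True
print(merge_islands([(0, 300), (350, 600), (800, 1000)]))  # [(0, 600), (800, 1000)]
```

The histogram returns raw counts; plotting them on a log10 scale, as the map does, is left to the display layer.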

Ensembl Genes (assembly specific)

Map of annotated genes provided by Ensembl, based on the latest Ensembl release available at the time of the build. This map is only available for the reference assembly.

Ensembl Transcripts (assembly specific)

Map of annotated transcripts provided by Ensembl, based on the latest Ensembl release available at the time of the build. This map is only available for the reference assembly.

FISH Clone (seq)

Localization of FISH-mapped clones. Clones placed onto the genomic sequence using their clone insert sequences, in either finished or draft form, are shown as thick lines. Clones placed based on the best placement of their BAC end sequences (from the GSS division of GenBank) are shown as thin lines, as described for the Clone map. When a clone insert sequence was used for contig assembly and spans a large region (e.g., > 1 Mb), the clone is also marked as spanning that region.

When the FISH Clone (seq) map is displayed as the Master map, the verbose display provides the clone name (linked to the Clone Registry database), the BAC end sequence accessions (linked to dbGSS), and the cytogenetic location. Clones can also be viewed based on cytogenetic positions on the FISH Clones map.

GenBank DNA

Shows the placement of human genomic DNA sequences from GenBank that were not used in the assembly of contigs. The line indicates the maximal extent of the alignment, and does not reflect gaps in the alignment. The clone alignments are color coded according to the type of sequence: blue for HTGS phase 3 or finished clones; orange for HTGS phase 1 or 2 clones; green for whole genome shotgun (WGS) contigs; and black for other sequences.

Gene
Genes that have been annotated on the genomic contigs. This includes known and putative genes placed as a result of alignments of mRNAs to the contigs, and gene predictions.

If multiple models exist for a single gene, corresponding to splicing variants, the Gene_Sequence map presents a flattened view of all the exons that can be spliced together in various ways. For example, if one splice variant uses exons 1, 3, 4, and another splice variant uses exons 2, 3, 4, the Gene_Sequence map shows exons 1, 2, 3, 4. (In comparison, the RefSeq Transcript map shows what combinations of exons are valid based on mRNA sequences from RefSeq and GenBank.)
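The flattened view is the union of the exons used by all splice variants. A minimal sketch, assuming exons are identified by number as in the example above; real annotation works with genomic coordinates, where exons with different boundaries would also need interval merging, which this sketch does not attempt. The function name is hypothetical.

```python
def flatten_exons(variants: list[list[int]]) -> list[int]:
    """Union of the exons used by any splice variant, in order."""
    return sorted({exon for variant in variants for exon in variant})


variants = [[1, 3, 4], [2, 3, 4]]
print(flatten_exons(variants))  # [1, 2, 3, 4]  (Gene_Sequence-style view)
print(variants)                 # [[1, 3, 4], [2, 3, 4]]  (per-variant, transcript-style view)
```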

Genes shown on the left of the grey line are transcribed in the - orientation (from bottom up), and those on the right in the + orientation (from top down).

When Gene_Sequence is selected as the Master map, the verbose display (detailed labeling, shown by default) includes arrows to the right of each gene name that indicate its direction of transcription, as well as links to:

OMIM - Online Mendelian Inheritance in Man

sv - sequence viewer

pr - protein

dl - view/download sequence data from a chromosome region

ev - evidence viewer

hm - HomoloGene

Additional information about these links is also provided below, under view/download sequence data from a chromosome region.

Gene models are shown in different colors, depending on the quality of the alignment of the defining RefSeq RNAs to the reference genome and on whether the coding sequence is maintained given that placement.

Color - Description (sequence identity and CDS effect)

Blue - The gene model CDS has very good agreement with the genome (identical or few mismatches).

Green - The gene model aligns to the genome, but the CDS predicted where it aligns has poor identity with the CDS annotated on the defining mRNA (similar).

Brown - The CDS of the model has poor or no alignment to a CDS on the genome (poor or none).

Black - The gene model was provided by an outside source (external).

Additional Notes:

In general, a gene model is shown in blue if there is a clean alignment between a RefSeq or GenBank mRNA sequence and the genomic sequence, and if there is an exact match between the protein product that was annotated in the mRNA sequence record and the conceptual translation of the genomic sequence gene model.

A gene model is shown in brown if there is some discrepancy between the mRNA sequence and the gene model, either in the alignment of the two and/or in their protein products. Examples of the former can include gaps, or the alignment of an mRNA to two or more genomic regions. Examples of the latter can include differences between the amino acid sequence given in an mRNA sequence record and the conceptual translation of the corresponding gene model, or premature termination of a coding region in the genomic sequence. Both of those can be caused by base pair mismatches between the mRNA and genomic sequence.

Models with Interim GeneIDs (evidence code I) may be paralogs, genes not yet curated, duplications because of assembly errors, or pseudogenes. The genome assembly and annotation pipeline assigns interim IDs when there is no unambiguous solution to what they should be. Interim GeneIDs for protein-coding genes are associated with RefSeq XM_* accessions (model mRNAs), although supporting alignments may (or may not) include RefSeq NM_* accessions (known mRNAs). The RefSeq web site contains more information about RefSeq and RefSeq accessions.

Model Transcripts

Models generated by Gnomon.
Gnomon uses a combination of homology searching (protein and transcript alignments) and ab initio modeling to predict both complete and partial coding sequences. Please note that this process may not accurately represent alternatively spliced transcripts. The labels on the map are linked to the protein record of the highest scoring match to the model's predicted protein. Gnomon models are also included in the Gene and RefSeq Transcript maps, in regions where known RefSeq transcripts have not yet been identified.

Models are color coded based on their level of support:

Color (category) - Description

Blue (complete support) - every exon is supported by a transcript or protein alignment

Green (partial support) - model constructed from a combination of alignments and ab initio modeling

Dark brown (pseudogene) - model has frameshifts and/or premature stops

Light brown (ab initio) - model based entirely on ab initio prediction
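The color categories can be expressed as a small classifier. This is a sketch under assumptions, not Gnomon's code: support is reduced to one boolean per exon, and checking frameshifts or premature stops first (so they override the support categories) is an assumed precedence that the table does not state. The function name is hypothetical.

```python
def gnomon_category(exon_supported: list[bool], frameshift_or_stop: bool) -> tuple[str, str]:
    """Return (category, color) for a model from per-exon support flags."""
    if frameshift_or_stop:  # assumed to take precedence over support level
        return "pseudogene", "dark brown"
    if exon_supported and all(exon_supported):
        return "complete support", "blue"
    if any(exon_supported):
        return "partial support", "green"
    return "ab initio", "light brown"


print(gnomon_category([True, True, True], False))   # ('complete support', 'blue')
print(gnomon_category([True, False, True], False))  # ('partial support', 'green')
```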

NCI Clone

A subset of the clones shown on the FISH Clone (seq) map, consisting of those from the NCI.

Phenotype

Shows the placement of loci associated with phenotypes on the assembled human genome sequence. Phenotypes include those described in Online Mendelian Inheritance in Man (OMIM), and quantitative trait loci (QTLs).

OMIM - While the OMIM resource itself shows the location of phenotypes (when known) in cytogenetic coordinates, the Phenotype map shows the location in sequence coordinates. This makes it easier, when querying by a disease name, to tell whether the phenotype has been placed on a sequence map at all.

If the phenotype is associated with a known gene, the placement of the gene is determined by aligning its sequence data to the human genome sequence.

QTLs - If the phenotype is placed by linkage or association to mapped markers, the phenotype is placed by the position of that marker or markers. The data are represented as single points along the chromosome, as each QTL is currently associated with the marker that gave the highest LOD score. At present, there is no step to extend the range defined by the markers to reflect the level of confidence in any boundary marker.

RefSeq Transcripts
Diagrams of the RefSeq RNAs that are mapped on the genomic contigs. Known RefSeq transcripts have accession prefixes beginning with NM_ or NR_, and model RefSeq transcripts have accession prefixes beginning with XM_ or XR_. The Transcript map and Gene_Sequence map are built in the same way, using the same types of evidence, described above. However, the Gene_Sequence map shows a view of all the exons in a gene, while the Transcript map shows the combinations of exons (i.e., splice variants) that are valid, based on mRNA sequences.
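The prefix rule above is easy to apply programmatically. A minimal Python sketch; the function name is hypothetical and the accessions in the usage lines are illustrative strings only.

```python
KNOWN_PREFIXES = ("NM_", "NR_")  # known RefSeq transcripts
MODEL_PREFIXES = ("XM_", "XR_")  # model RefSeq transcripts


def refseq_transcript_kind(accession: str) -> str:
    acc = accession.strip().upper()
    if acc.startswith(KNOWN_PREFIXES):  # str.startswith accepts a tuple of prefixes
        return "known"
    if acc.startswith(MODEL_PREFIXES):
        return "model"
    return "not a RefSeq transcript"


for acc in ["NM_000546.5", "XR_001234.1", "AF123456.1"]:
    print(acc, refseq_transcript_kind(acc))
```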

Repeats

Position of repetitive elements, calculated by RepeatMasker v3.2.6 with these flags:

-w (invokes MaskerAid)
-no_is
-cutoff 255
-frag 20000
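For reference, the flags above can be assembled into an argument list in Python. This sketch only builds and prints the command; actually running it (for example with `subprocess.run(argv, check=True)`) would require a local RepeatMasker 3.2.6 installation. The input file name is hypothetical, and the comments describe only what this page states about each flag.

```python
import shlex


def repeatmasker_argv(fasta_path: str) -> list[str]:
    """Build a RepeatMasker command line using the flags listed above."""
    return [
        "RepeatMasker",
        "-w",              # per this page, invokes MaskerAid
        "-no_is",
        "-cutoff", "255",
        "-frag", "20000",
        fasta_path,        # hypothetical input FASTA file
    ]


argv = repeatmasker_argv("chr21.fa")
print(shlex.join(argv))
# RepeatMasker -w -no_is -cutoff 255 -frag 20000 chr21.fa
```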

RNA Maps

The RNA maps show mRNAs from a given organism aligned to the assembled human genomic sequence, which has been repeat-masked and dusted. Each alignment is the single best placement for that sequence in the current build of the human genome. The maps can be queried by sequence accession.

The RNA maps include:

Bos taurus RNA - individual Bos taurus (cow) mRNAs aligned to the assembled human genome.
Gallus gallus RNA - individual Gallus gallus (chicken) mRNAs aligned to the assembled human genome.
Homo sapiens RNA - individual Homo sapiens (human) mRNAs and ESTs aligned to the assembled human genome.
Mus musculus RNA - individual Mus musculus (mouse) mRNAs aligned to the assembled human genome.
Primates RNA - individual primate (non-human) mRNAs aligned to the assembled human genome.
Rattus norvegicus RNA - individual Rattus norvegicus (rat) mRNAs aligned to the assembled human genome.
Sus scrofa RNA - individual Sus scrofa (pig) mRNAs aligned to the assembled human genome.

The RNA maps differ from the Hs UniGene map in that they display the alignments (thicker lines) and putative introns (thinner lines) of the mRNAs best placed at each position. Green lines indicate ESTs; blue lines indicate cDNAs. In contrast, the UniGene map is a summary of probable splicing events, with links to the UniGene clusters that contain those sequences.

STS

Placement of STSs from a variety of sources onto the genomic data using Electronic PCR (e-PCR). The markers are from RHdb, GDB, GeneMap'99 (gene-based markers), the Stanford G3 RH map (both gene and non-gene markers), the TNG map, the Whitehead RH map and YAC maps (both gene and non-gene markers), the Genethon genetic map, the Marshfield genetic map, and several chromosome-specific maps, such as the NHGRI map for chromosome 7 and the Washington University map of chromosome X.
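The idea behind electronic PCR is to search a sequence for a primer pair that would amplify a product of the expected size. The Python sketch below illustrates that idea with exact primer matches only; the real e-PCR program tolerates mismatches and uses indexed word lookups, and the parameters NCBI used are not given on this page. Primers are taken as written 5' to 3', and all names and the `margin` default are assumptions.

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_all(text: str, pattern: str) -> list[int]:
    """Start offsets of all exact (possibly overlapping) matches."""
    return [m.start() for m in re.finditer(f"(?={re.escape(pattern)})", text)]


def e_pcr(genome: str, fwd: str, rev: str, size: int, margin: int = 50) -> list[tuple[int, int, str]]:
    """Report (start, end, strand) of virtual products about `size` bp long.

    Coordinates are 0-based, half-open, on the plus strand of `genome`.
    """
    g = genome.upper()
    hits = []
    # Plus-strand product: fwd as written, then the reverse complement of rev.
    # Minus-strand product: the same search with the primers swapped.
    for left, right, strand in ((fwd, rev, "+"), (rev, fwd, "-")):
        left_site, right_site = left.upper(), revcomp(right)
        for i in find_all(g, left_site):
            for j in find_all(g, right_site):
                end = j + len(right_site)
                if j >= i + len(left_site) and abs((end - i) - size) <= margin:
                    hits.append((i, end, strand))
    return hits


fwd, rev = "ACGTTGCAAG", "GGATCCTTAA"
genome = "N" * 10 + fwd + "N" * 30 + revcomp(rev) + "N" * 10
print(e_pcr(genome, fwd, rev, size=50, margin=0))  # [(10, 60, '+')]
```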

TCAG Genes (assembly-specific)
Map of annotated genes provided by The Center for Applied Genomics (TCAG) at the Hospital for Sick Children on their assembly of chromosome 7.

TCAG Transcripts (assembly-specific)
Map of annotated transcripts provided by The Center for Applied Genomics (TCAG) at the Hospital for Sick Children on their assembly of chromosome 7.

Homo sapiens UniGene Clusters

The UniGene map shows human mRNA and EST sequences aligned to the assembled human genomic sequence, which has been repeat-masked and dusted. Only ESTs supplied with orientation are used. Each alignment is the single best placement for that sequence in the current build of the human genome.

The display of the UniGene map varies according to the span of sequence being displayed.

For large spans of sequence (greater than 10 million bases), the Map Viewer displays histograms that show the density of ESTs and mRNAs aligned to a region, the UniGene clusters to which they belong, and the number of sequences from each UniGene cluster.

For smaller spans of sequence (i.e., higher resolutions, showing less than 10 million bases), the Map Viewer displays the above information plus blue lines that indicate exon/intron structure:

thick blue lines indicate aligned regions (putative exons)
thin blue lines indicate connections between aligned regions (putative introns). Regions are connected if they come from a single transcript, or from a set of 'chained' transcripts that share at least one common intron/exon splice junction. (For example, if transcript B shares one intron/exon splice junction with transcript A, and a different splice junction with transcript C, then A, B, and C will be chained together into one transcript.)
a light grey bar shades the region that encompasses all the alignments consistent with a given set of evidence (putative mRNA), and therefore indicates the span of a model

Alignments are grouped by common structure. If two or more transcripts share at least one intron/exon splice junction, the alignments of those transcripts are merged into a single model. If two or more transcripts do not share any intron/exon splice junction, they are shown as separate models.
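Chaining transcripts that share splice junctions is a connected-components problem, so a union-find (disjoint-set) structure handles the transitive case (A shares with B, B shares with C) directly. This is a sketch of that general technique, not NCBI's implementation: representing each junction as a hashable coordinate pair, and the transcript names, are assumptions.

```python
from collections import defaultdict


def chain_transcripts(junctions: dict[str, set[tuple[int, int]]]) -> list[list[str]]:
    """Group transcripts that share at least one splice junction, transitively."""
    parent = {t: t for t in junctions}

    def find(t: str) -> str:
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving keeps trees shallow
            t = parent[t]
        return t

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    owner: dict[tuple[int, int], str] = {}  # first transcript seen with each junction
    for t in sorted(junctions):
        for j in junctions[t]:
            if j in owner:
                union(owner[j], t)
            else:
                owner[j] = t

    groups: dict[str, list[str]] = defaultdict(list)
    for t in sorted(junctions):
        groups[find(t)].append(t)
    return sorted(groups.values())


example = {
    "A": {(100, 200)},
    "B": {(100, 200), (300, 400)},  # shares one junction with A, another with C
    "C": {(300, 400)},
    "D": {(900, 1000)},             # shares nothing: separate model
}
print(chain_transcripts(example))  # [['A', 'B', 'C'], ['D']]
```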

The UniGene map differs from the maps labeled Xx_RNA in that it shows a summary of probable splicing events. The 'RNA' maps (not to be confused with the RefSeq 'RNA' map) show the mRNAs best placed at that position.

Variation

Alignment of genetic variation data from dbSNP onto the genomic sequence.

Cytogenetic Maps

FISH Clones

BAC clones that were mapped to cytogenetic bands using fluorescence in situ hybridization (FISH). When FISH Clones is viewed as the master map, the source of the FISH data is indicated in parentheses. These clones have also been aligned to the genomic sequence data on the FISH Clone (seq) map.

Genes_Cytogenetic

Cytogenetic locations of genes as reported in Entrez Gene, which includes map locations from OMIM, the Human Gene Nomenclature Committee, and other collaborators.

Ideogram

Ideogram of the G-banding pattern at the 850 band resolution.

Mitelman Breakpoint

Genome-wide map of chromosomal breakpoints, based on the Mitelman Database of Chromosome Aberrations in Cancer, by Drs. Mitelman, Mertens, and Johansson, http://cgap.nci.nih.gov/Chromosomes/Mitelman.

Morbid

Cytogenetic map locations of disease genes described in OMIM.

NCI FISH Clone

A subset of clones shown on the FISH Clones map that are from the NCI.

Note: the genes on all cytogenetic maps are ordered based on cytogenetic band. At present, order within a band is not being calculated.

Genetic Linkage Maps

deCODE

deCODE high-resolution genetic map, from deCODE genetics, Iceland. The map has a total length of 2161.71 cM and is described by Kong, A., et al. in "A high-resolution recombination map of the human genome," Nat Genet., 2002 Jul;31(3):225-6.

Genethon

Microsatellite map, described by Dib, C., et al. in "A comprehensive genetic map of the human genome based on 5,264 microsatellites," Nature, 1996 Mar 14;380(6570):152-4.

Marshfield

Comprehensive human linkage map incorporating >8000 polymorphic markers. Total sex-averaged genetic distance is 3500 cM (Broman et al., "Comprehensive human genetic maps: Individual and sex-specific variation in recombination," American Journal of Human Genetics, 1999 63:861-869).

Radiation Hybrid Maps

GeneMap99-G3

7,061 STS markers mapped onto the G3 RH panel by the International Radiation Hybrid Consortium (Schuler GD, et al., Science, October 25, 1996, and Deloukas, et al., Science, October 23, 1998).
Scale = cR₁₀₀₀₀.
Total number of centiRays across the genome = 125,853.
Resolution = 42 cR₁₀₀₀₀ per megabase.
The GeneMap'99 home page provides additional details about the project.

GeneMap99-GB4

45,758 STS markers mapped onto the GB4 RH panel by the International Radiation Hybrid Consortium (Schuler GD, et al., Science, October 25, 1996, and Deloukas, et al., Science, October 23, 1998).
Scale = cR₃₀₀₀.
Total number of centiRays across the genome = 11,524.
Resolution = 3.84 cR₃₀₀₀ per megabase.
The GeneMap'99 home page provides additional details about the project.

NCBI RH

NCBI Integrated Radiation Hybrid Map contains 23,723 markers from both the G3 and GB4 RH panels of GeneMap'99. Those markers were mapped with respect to 1084 framework markers (a subset of markers common to the G3 and GB4 panels). All markers from both panels were interpolated onto the GB4 scale. The article by R. Agarwala et al. provides detail about the integration strategy, as well as the methods used to evaluate the quality of the integrated map.

Stanford G3

Includes 11,458 STS markers (both gene-based and non-gene-based) mapped onto the G3 RH panel.
Scale = cR₁₀₀₀₀.
Total number of centiRays across the genome = 124,349.
Resolution = 41.5 cR₁₀₀₀₀ per megabase.
A subset of the markers from this map was used in the GeneMap99-G3 map.

Stanford TNG

The TNG map includes over 37,000 markers.
Scale = cR₅₀₀₀₀.
Resolution: 1 cR₅₀₀₀₀ is approximately 2 kbp.
On average, there is one ordered STS per 94 kbp.

Whitehead-RH

Includes 6,193 STS markers mapped onto the GB4 RH panel.
Scale = cR₃₀₀₀.
Total number of centiRays across the genome = 11,042.
Resolution = 3.7 cR₃₀₀₀ per megabase.

Note: the RH maps described above are static and will not be updated with additional markers.
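The scale figures above can be cross-checked with simple arithmetic: dividing each map's total centiRays by its average resolution (cR per megabase) gives the genome length the map implies, roughly 3,000 Mb in every case. The same resolution converts an RH distance into an approximate physical distance. This is a sketch; the function names are hypothetical, and because these are genome-wide averages, local cR-to-kb ratios can differ considerably.

```python
# (total centiRays across the genome, centiRays per megabase), from the list above
RH_MAPS = {
    "GeneMap99-G3": (125_853, 42.0),
    "GeneMap99-GB4": (11_524, 3.84),
    "Stanford G3": (124_349, 41.5),
    "Whitehead-RH": (11_042, 3.7),
}


def implied_genome_mb(total_cr: float, cr_per_mb: float) -> float:
    """Genome length implied by a map's total length and average resolution."""
    return total_cr / cr_per_mb


def cr_to_kb(distance_cr: float, cr_per_mb: float) -> float:
    """Convert an RH distance to an approximate physical distance in kb."""
    return distance_cr * 1000.0 / cr_per_mb


for name, (total, per_mb) in RH_MAPS.items():
    print(f"{name}: ~{implied_genome_mb(total, per_mb):,.0f} Mb")

# TNG: 1 cR50000 is approximately 2 kbp, i.e. about 500 cR50000 per megabase.
print(cr_to_kb(1, 500.0))  # 2.0
```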

Other Maps

Whitehead-YAC

STS content map of 10,850 STS markers placed onto 16,494 YACs, with an average intermarker distance of 276 kilobases. The scale shown on the ruler for this map indicates the ordinal position from the top of the chromosome. For example, a unit of 30 represents the 30th marker from the top of the chromosome.

Note: In addition to the maps listed above, NCBI offers some additional mapping information resources. For example, a comparative Human/Mouse Homology Map is not displayed in the Entrez Map Viewer, but is available for your use. The NCBI Site Map lists a number of resources for various organisms in the Genomes and Maps section.

Types of objects and maps on which they can be found

Clones

Sequence maps

Clone
Component (formerly GenBank)

Components of Sequence Assembly

Sequence maps

Component (formerly GenBank)

CpG Islands

Sequence maps

CpG Island

Expression Data

Sequence maps

Hs_UniGene (human mRNA aligned to genomic contigs; note that alignments of mouse mRNAs to human contigs are in the Mm_RNA map)

GenBank Accessions

Sequence maps

Component (formerly GenBank)
GenBank DNA
Genes_Sequence
UniGene

Cytogenetic maps

Genes_Cytogenetic

Genes

Sequence maps

Genes_Sequence
Ab initio
UniGene

Cytogenetic maps

Genes_Cytogenetic

Phenotypes

Sequence maps

Phenotype

Cytogenetic maps

Mitelman Breakpoint
Morbid

Polymorphisms

Sequence maps

Variation

STSs

Sequence maps

STS

Genetic linkage maps

Genethon
Marshfield

Radiation hybrid maps

GeneMap99-G3
GeneMap99-GB4
NCBI RH
Stanford G3
Stanford TNG
Whitehead-RH

Legend

Verbose Mode

By default, the master map at the right side of the display is shown in verbose mode, which provides descriptive information (as available) for each object on the master map.

Orientation

Plus strand - Genes shown to the right of the grey line are transcribed in the + orientation (from top down); contigs with a + orientation are read from top down.

Minus strand - Genes shown to the left of the grey line are transcribed in the - orientation (from bottom up); contigs with a - orientation are read from bottom up.

Unknown (?) - The orientation of the map element is unknown.

Links to Related Resources

Each map element displayed in your search results will be associated with a number of links (when available) that lead to additional information. The links include:

Linked Text Link Action Description

Map element	Map View	The results of a search list the map elements that contain your search term. Those elements can be present in one or more maps. Following the link for a particular map element leads to a graphical view of the chromosomal region that contains the element.
OMIM	Online Mendelian Inheritance in Man	Links to the corresponding entry in Online Mendelian Inheritance in Man, a continuously updated catalog of human genes and genetic disorders.
sv	Sequence Viewer	Graphically shows the position of the map element within the sequence region. The display includes a graphic depiction of the coding region (CDS), RNA, and gene features that have been annotated on that sequence region. A 2 kb section of sequence is shown below that, with corresponding graphic annotations of the features. The left and right arrows at either end of the sequence data allow you to move upstream and downstream.
pr	Protein	Links to the corresponding protein sequence record in the Entrez Protein database.
dl	Download Sequence	Opens a form that allows you to download a region of a chromosome. The form has two parts: (1) the top part allows you to enter chromosome coordinates in text boxes, and (2) the bottom part displays the NT_* contigs (or portions of them) that are found in that chromosome region. Note that part 1 shows the position (base span) of the region on the chromosome, and part 2 shows the position of the region on the contig. The "strand" column for each contig shows whether that contig is on the plus or minus strand of the chromosome. Therefore, if a contig is on the minus strand, increasing the value of the 3' chromosome coordinate will decrease the value of the 5' contig coordinate. The options to "Display, Save to Disk, and View Evidence" allow you to view the individual contigs in the region (or portions of them, depending on the chromosome region specified). By default, the dl link beside each gene displays the chromosome and contig coordinates for the span of that gene. To view or save additional sequence data upstream and downstream of the gene, adjust the chromosome coordinates and press the "Change Region" button. Note that the contig coordinates will also change.
ev	Evidence Viewer	Graphical display of the biological evidence supporting a particular gene model. It displays all RefSeq models, GenBank mRNAs, annotated known or potential transcripts, and ESTs that align to the genomic sequence region of interest.
hm	HomoloGene	A resource of curated and calculated orthologs for genes as represented by UniGene or by annotation of genomic sequences.
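The strand note in the dl row can be checked with a little arithmetic. The following Python sketch assumes 1-based, inclusive coordinates and a contig whose chromosome span is already known; the function names and the example numbers are hypothetical and are not part of Map Viewer.

```python
def chrom_to_contig(pos: int, contig_chr_start: int, contig_chr_end: int,
                    strand: str) -> int:
    """Map a 1-based chromosome position to a 1-based contig position.

    Assumes the contig occupies chromosome positions
    contig_chr_start..contig_chr_end (inclusive).
    """
    if not contig_chr_start <= pos <= contig_chr_end:
        raise ValueError("position lies outside this contig")
    if strand == "+":
        return pos - contig_chr_start + 1
    if strand == "-":
        # On a minus-strand contig, contig position 1 sits at the chromosome end.
        return contig_chr_end - pos + 1
    raise ValueError("strand must be '+' or '-'")


def region_on_contig(chr_from: int, chr_to: int, contig_chr_start: int,
                     contig_chr_end: int, strand: str) -> tuple[int, int]:
    """Return the (5', 3') contig span for the chromosome region chr_from..chr_to."""
    a = chrom_to_contig(chr_from, contig_chr_start, contig_chr_end, strand)
    b = chrom_to_contig(chr_to, contig_chr_start, contig_chr_end, strand)
    return (min(a, b), max(a, b))
```

With a minus-strand contig spanning chromosome positions 1,000,001-2,000,000, the region 1,500,001-1,600,000 maps to contig positions 400,001-500,000; raising the 3' chromosome coordinate to 1,610,000 lowers the 5' contig coordinate to 390,001.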

STS Maps Legend

Colored dots indicate uniqueness of STS positions

The rightmost edge of the verbose display includes columns of colored dots that indicate which maps have data for each marker. The color of each dot indicates whether an STS has been mapped to a unique position on that map:

green dot

marker has been mapped to only one location on the chromosome being displayed

green dot with black slash

marker has been mapped to multiple locations on the chromosome being displayed

green and yellow dot

marker has been mapped to the chromosome being displayed, and also to another chromosome

yellow dot

marker has been mapped to one location, but on a different chromosome from the one being displayed

yellow dot with black slash

marker has been mapped to multiple locations on a different chromosome from the one being displayed

For example, if you are viewing chromosome 2, a yellow dot indicates that the map named in the column header has placed that marker in a single location on another chromosome.
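The dot legend amounts to a small decision rule per map column. This sketch assumes the placements of one marker on one map are available as a list of chromosome names, one entry per placement (a hypothetical input format). The legend does not say how a marker placed once on each of two different non-displayed chromosomes is drawn; the sketch assumes the yellow dot with black slash.

```python
def sts_dot(placements: list[str], displayed_chr: str) -> str:
    """Return the legend symbol for one marker in one map column."""
    if not placements:
        return "no dot"  # the map has no data for this marker
    here = placements.count(displayed_chr)
    elsewhere = len(placements) - here
    if here and elsewhere:
        return "green and yellow dot"
    if here == 1:
        return "green dot"
    if here > 1:
        return "green dot with black slash"
    if elsewhere == 1:
        return "yellow dot"
    # Assumption: several placements off the displayed chromosome, whether on
    # one other chromosome or on several, are drawn as a yellow dot with a slash.
    return "yellow dot with black slash"
```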

Polymorphism Column

The polymorphism column indicates whether the marker has been used to detect a polymorphism, with Y for yes and N for no.

Detailed Marker Information

To see detailed mapping information about a marker, follow the link for that marker to its UniSTS record.

Constructing queries

Searchable Terms

Text terms

The viewer supports searching on any text term that may describe an element on any map. These include:

symbols
alternate symbols
A search for D2S2300 will retrieve the marker named AFM261YB1. Both names refer to the same primer pair (GDB uses the former and Genethon the latter), so they are treated as synonyms and either one will retrieve the marker.
current names or text words that are part of names
A search for actin will retrieve map objects containing actin in their descriptions.
If multiple terms are entered, they are automatically combined with a Boolean AND; adjacency searches are not supported at present. For example, a query entered as "cell adhesion" will be processed as cell AND adhesion and will retrieve records whose descriptions contain cell matrix adhesion as well as cell adhesion. The section on Boolean Operators provides information about additional options.
object identifiers
object types
additional searchable terms, listed in the Search Fields section of this document
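The implicit AND described above can be approximated as follows. This is a sketch assuming case-insensitive, whole-word matching against a description; Map Viewer's actual tokenization is not documented here, and the function name is hypothetical.

```python
import re


def matches_all_terms(query: str, description: str) -> bool:
    """Return True if every word of the query occurs somewhere in the description.

    Words are compared case-insensitively and may appear in any order and at
    any distance, so no adjacency (phrase) matching takes place.
    """
    description_words = set(re.findall(r"\w+", description.lower()))
    query_words = re.findall(r"\w+", query.lower())
    return all(word in description_words for word in query_words)
```

Because quotation marks are discarded, a quoted "cell adhesion" behaves exactly like cell adhesion, which mirrors the lack of adjacency searching.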

The search program looks for the query term in all of the maps mentioned above.

Truncation

Search terms can also be truncated at the right end only, using an asterisk (*) as a wild card to represent zero to many characters. See the truncation section of the general Map Viewer Help document for more details.
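Right-end truncation can be modeled by translating the asterisk into a regular expression. The sketch below assumes case-insensitive matching against whole names; the helper names are hypothetical.

```python
import re


def truncation_pattern(term: str) -> "re.Pattern[str]":
    """Compile a search term in which '*' may appear only as the final character."""
    if "*" in term[:-1]:
        raise ValueError("truncation is supported at the right end only")
    if term.endswith("*"):
        # '*' stands for zero or more characters after the fixed prefix.
        return re.compile(re.escape(term[:-1]) + ".*", re.IGNORECASE)
    return re.compile(re.escape(term), re.IGNORECASE)


def matching_names(term: str, names: list[str]) -> list[str]:
    """Return the names that the (possibly truncated) term matches in full."""
    pattern = truncation_pattern(term)
    return [name for name in names if pattern.fullmatch(name)]
```

D12S23 is not returned for D2S23*, because the asterisk only extends the term to the right.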

Map Positions

As noted in the Search By Position section of the Entrez Map Viewer general help document, there are three main ways to search by map position from the Map View of a chromosome:

enter a range of interest in the Region text boxes in the sidebar
click on the region of interest in the chromosome thumbnail graphic in the sidebar
click on a region of interest in the enlarged Map View of the chromosome

Allowable Values

For human, the following types of map positions can be entered in the Region text boxes noted in option 1:

cytogenetic bands - if the master map is a cytogenetic map, you can enter band numbers in formats such as 9p23 to 9p11, or 22q12.1 to 22q13.2. The range search will not work if the chromosome number is omitted from one or both of the bands (e.g., 9p23 to p11 will not work).

symbols - you can enter gene symbols, marker names, or their alternate symbols to display the region of the chromosome between those mapped elements. Note that both mapped elements must be present on maps that share the same coordinate system for the range search to work properly.

numerical positions - can be used if the master map is a genetic map, radiation hybrid map, YAC map, or sequence map. It is not necessary to specify units. The Map Viewer will interpret the range in the units of the master map (centiMorgans, centiRays, ordinal units, or bases, respectively). For example:

cM - if the master map is a genetic linkage map, the search program interprets the region number as a centiMorgan (cM) unit (e.g., 1 or 12 or 12.5)

cR - if the master map is a radiation hybrid map, the program interprets the region number as a centiRay (cR) unit (e.g., 10 or 10.5 or 234).

ordinal units - if the master map is a YAC map, the program interprets the region number as an ordinal from the top of the chromosome

base pairs - if the master map is a sequence map, you can enter base pair positions in the following types of formats:
1000000 or 1,000,000 or 1M or 1000K or 1.1M or 1.2K
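The two text formats accepted by the Region boxes can be checked before a query is submitted. This Python sketch is a client-side illustration only; the regular expressions are assumptions that cover the examples listed above, not Map Viewer's own parser.

```python
import re
from decimal import Decimal

# Chromosome (1-22, X or Y), arm (p or q), band number with optional sub-band.
_BAND = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d+(?:\.\d+)?)$", re.IGNORECASE)
# Digits with optional thousands separators and decimals, then an optional K or M.
_BP = re.compile(r"^(\d[\d,]*(?:\.\d+)?)([KM]?)$", re.IGNORECASE)
_SCALE = {"": 1, "K": 1_000, "M": 1_000_000}


def is_valid_band_range(start: str, end: str) -> bool:
    """Both bands must carry a chromosome number, e.g. 9p23 and 9p11."""
    return bool(_BAND.match(start.strip()) and _BAND.match(end.strip()))


def parse_bp(text: str) -> int:
    """Convert entries such as 1,000,000, 1000K or 1.1M to a base-pair count."""
    m = _BP.match(text.strip())
    if not m:
        raise ValueError(f"unrecognized base-pair value: {text!r}")
    # Decimal avoids floating-point rounding for values such as 1.1M.
    number = Decimal(m.group(1).replace(",", ""))
    return int(number * _SCALE[m.group(2).upper()])
```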

It is not necessary to enter a value in both Region text boxes. If you enter a value (e.g., 9q21) only in the upper box, the Map Viewer will display the region of the chromosome from that point to the q-arm telomere. If you enter a value only in the lower box, the Map Viewer will display the region from the p-arm telomere to that value.
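For a sequence master map, the one-sided behavior can be expressed as filling in the missing bound with a telomere. The sketch assumes 1-based base-pair positions, with position 1 at the p-arm telomere and a hypothetical chromosome length standing in for the q-arm telomere; rejecting a reversed range is a choice of this sketch, not documented behavior.

```python
from typing import Optional, Tuple


def resolve_region(upper: Optional[int], lower: Optional[int],
                   chrom_length: int) -> Tuple[int, int]:
    """Return the (start, end) base-pair span implied by the two Region boxes.

    A missing upper value defaults to the p-arm telomere (position 1); a missing
    lower value defaults to the q-arm telomere (position chrom_length).
    """
    start = 1 if upper is None else upper
    end = chrom_length if lower is None else lower
    if not 1 <= start <= end <= chrom_length:
        # Choice made for this sketch: reject reversed or out-of-range spans.
        raise ValueError("region must satisfy 1 <= start <= end <= chromosome length")
    return start, end
```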

View/Download Chromosome Region

You can view or download map and sequence data for chromosome regions from the graphic displays, as explained below, or by FTP.

Map Data

Map data for a chromosome region or a complete chromosome can be viewed or downloaded with the Data as Table View option, which is accessible from the blue sidebar of a chromosome display. The Table View shows tab-delimited output for the chromosomal region shown in the graphic display, for each map shown in that display. If only sequence maps are displayed, the Table View also offers the option of viewing or downloading the data for the complete set of sequence maps, even if only a subset was shown in the graphic display.
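Because the Table View output is tab-delimited, it can be loaded with Python's standard csv module. The sketch assumes the first non-blank line is a header row; the column names in the test data are hypothetical placeholders, not the actual Table View headers.

```python
import csv
import io


def read_table_view(text: str) -> list[dict[str, str]]:
    """Parse tab-delimited Table View output into one dictionary per row.

    Assumes the first non-blank line is the header row; blank lines are skipped.
    Quoting is disabled so that quotation marks in descriptions are kept verbatim.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    reader = csv.DictReader(io.StringIO("\n".join(lines)),
                            delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(row) for row in reader]
```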

Sequence Data

Sequence data for a chromosome region of interest can be downloaded by following either of these links in the graphic display of sequence maps:

A "Download/View Sequence/Evidence" link is displayed above the map graphics when a sequence map is displayed as the master map. That link leads to a page that allows you to download or view data for every contig in the entire chromosome region being displayed.

A "dl" (download) link appears in the "Links" column when Gene_Sequence is the master map. A dl link is provided for each element on that map, and leads to a page that allows you to download or view data for the contig region that contains that map element.

When the Gene_Sequence map is the master map, the Links column also includes the following links for viewing and/or downloading sequence data in a selected region. (The OMIM and hm links are also shown for map elements on the Gene_Sequence map, but they are not listed in the table below because they point to Online Mendelian Inheritance in Man and HomoloGene, respectively, rather than to sequence data.)

sv	Sequence Viewer	Graphically shows the position of the map element within the sequence region, including the coding region (CDS), RNA, and gene features that have been annotated on that region. A 2 kb section of sequence is shown below that, with corresponding graphic annotations of the features. The left and right arrows at either end of the sequence data allow you to move upstream and downstream.
pr	Protein	Links to the corresponding protein sequence record in the Entrez Protein database.
dl	Download Sequence	Opens a form that allows you to download a region of a chromosome. The form has two parts: (1) the top part allows you to enter chromosome coordinates in text boxes, and (2) the bottom part displays the NT_* contigs (or portions of them) that are found in that chromosome region. Note that part 1 shows the position (base span) of the region on the chromosome, and part 2 shows the position of the region on the contig. The "strand" column for each contig shows whether that contig is on the plus or minus strand of the chromosome. Therefore, if a contig is on the minus strand, increasing the value of the 3' chromosome coordinate will decrease the value of the 5' contig coordinate. The options to "Display, Save to Disk, and View Evidence" allow you to view the individual contigs in the region (or portions of them, depending on the chromosome region specified). By default, the dl link beside each gene displays the chromosome and contig coordinates for the span of that gene. To view or save additional sequence data upstream and downstream of the gene, adjust the chromosome coordinates and press the "Change Region" button. Note that the contig coordinates will also change.
ev	Evidence Viewer	Displays the biological evidence supporting a particular gene model. It displays all RefSeq models, GenBank mRNAs, annotated known or potential transcripts, and ESTs that align to the genomic sequence region of interest.

Query options

Boolean Operators

If multiple terms are entered, they will automatically be combined with a Boolean AND, as mentioned in the Text Terms section above. Adjacency searches are not supported at present. For example, a query entered as cell adhesion will be processed as cell AND adhesion and will retrieve records with descriptions that contain cell matrix adhesion as well as cell adhesion.
You can choose to use any Boolean operators (AND, OR, NOT) in your query. Boolean operators must be written in upper case.
The general syntax for a Boolean Query is:
term[field] BOOLEAN term[field] BOOLEAN term[field]
The available search fields and their corresponding abbreviations (qualifiers) are listed below.
By default, Boolean operators are processed from left to right. The order in which Entrez processes a search statement can be changed by enclosing individual concepts in parentheses. The terms inside the parentheses are processed first as a unit and then incorporated into the overall strategy. Additional details about Boolean Operators are provided in the Entrez Help document.
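The left-to-right rule and the effect of parentheses can be illustrated with a small evaluator over sets of record IDs. This is a sketch under stated assumptions: the index is a hypothetical in-memory dictionary, AND, OR and NOT are treated as binary operators, adjacent terms receive an implicit AND, lower-case operator words are treated as ordinary search terms, and field qualifiers are not interpreted.

```python
import re

# Parentheses are single tokens; anything else up to whitespace or a parenthesis is a word.
TOKEN = re.compile(r"\(|\)|[^\s()]+")
OPERATORS = ("AND", "OR", "NOT")


def evaluate(query: str, index: dict[str, set[int]]) -> set[int]:
    """Evaluate a Boolean query strictly left to right over an inverted index."""
    tokens = TOKEN.findall(query)
    pos = 0

    def operand() -> set[int]:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("query ended where a term was expected")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = expression()  # a parenthesized group is evaluated first
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1
            return result
        if tok == ")" or tok in OPERATORS:
            raise ValueError(f"unexpected token {tok!r}")
        # Copy so later in-place set operations never modify the index.
        return set(index.get(tok.lower(), set()))

    def expression() -> set[int]:
        nonlocal pos
        result = operand()
        # No operator binds tighter than another: each is applied as it is read.
        while pos < len(tokens) and tokens[pos] != ")":
            op = tokens[pos]
            if op in OPERATORS:
                pos += 1
            else:
                op = "AND"  # adjacent terms are joined by an implicit AND
            right = operand()
            if op == "AND":
                result &= right
            elif op == "OR":
                result |= right
            else:  # NOT keeps records in the left set that are absent from the right
                result -= right
        return result

    result = expression()
    if pos != len(tokens):
        raise ValueError("unbalanced parentheses")
    return result
```

With the sample index used in the checks, cell OR receptor AND adhesion is read as (cell OR receptor) AND adhesion, whereas cell OR (receptor AND adhesion) evaluates the parenthesized part first, so the two queries return different sets.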

Search fields

If desired, you can restrict the search for a term to a particular field by placing the field qualifier in square brackets [] after the term. It is not necessary to include a space between the search term and the field specifier.
If no field qualifier is used, the system will search all fields. For example, a search for cancer will retrieve records which contain that term in any field. A search for cancer[dis] will only retrieve records which contain that term in the disease field.
"Disease" refers to diseases on the OMIM Morbid Map and the Mitelman Breakpoint Map.
Terms can be combined with Boolean operators, as described above.
The Advanced Search page also lets you restrict your search to specific fields and limit retrieval to mapped objects that have desired properties.

Search field	Description	Qualifier
accession	the nucleotide accession of a GenBank component or the nucleotide or protein accessions for RefSeqs	[accession], [acc], [accn]
chromosome	the chromosome number	[chr]
disease	disease or Mitelman breakpoint name	[dis]
id	the integer identifier for a particular type of object; useful in combination with type	[id]
map name	the name of the map (The general Map Viewer Help document provides a list of map names. Use the character string in the "URL value" column.)	[map_name], [map]
MIM number	the MIM number for a phenotype or gene from Online Mendelian Inheritance in Man	[mim]
properties	various attributes associated with a mapped object; additional details in the Properties section below	[prop]
symbol	the gene symbol or other short name; includes clone names, marker names, and alternate symbols (also referred to as aliases or synonyms; see Text Terms section above for example)	[sym]
title	gene, disease, or Mitelman breakpoint names; includes symbols	[title], [ti], [titl]
type	type of mapped object; most useful in combination with id. Options are: clone, component, contig, gs_tran, gene, mim, mitel, snp, sts, tag, transcript, unigene	[obj_type]
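Field-qualified clauses can be assembled programmatically from the qualifiers in the table above. The helper names below are hypothetical, and the sketch picks one qualifier per field where the table lists several.

```python
# Qualifiers from the Search fields table; where several are listed, one is chosen.
QUALIFIERS = {
    "accession": "acc",
    "chromosome": "chr",
    "disease": "dis",
    "id": "id",
    "map name": "map",
    "mim number": "mim",
    "properties": "prop",
    "symbol": "sym",
    "title": "ti",
    "type": "obj_type",
}
BOOLEAN_OPERATORS = ("AND", "OR", "NOT")


def clause(term: str, field: str) -> str:
    """Return term[qualifier] for a search field named in the table."""
    try:
        qualifier = QUALIFIERS[field.lower()]
    except KeyError:
        raise ValueError(f"unknown search field: {field!r}") from None
    return f"{term}[{qualifier}]"


def build_query(*clauses: str, operator: str = "AND") -> str:
    """Join clauses with a single Boolean operator, which must be upper case."""
    if operator not in BOOLEAN_OPERATORS:
        raise ValueError("operator must be AND, OR, or NOT in upper case")
    return f" {operator} ".join(clauses)
```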

Properties: limiting retrieval to mapped objects that have certain attributes

The Advanced Search page allows you to limit retrieval to mapped elements that have certain attributes, or properties, listed below. The properties listed under "Variation only" apply only to mapped markers associated with reported genetic variations.
"Disease" refers to diseases on the OMIM Morbid Map and the Mitelman Breakpoint Map.

Property	Description
disease_known	mapped object associated with a known disease; data in this category are currently available only for genes
on_seq	mapped object that is present on one of the sequence maps
in_clone	mapped object that falls within the boundaries of a FISH-mapped clone
in_gene	Any part of the marker position on a sequence map is within a 2 kb interval 5' of the most 5' feature of the gene (CDS, mRNA, gene), OR the marker position is within a 500 base interval 3' of the most 3' feature of the gene. Both strands of sequence are examined for gene features, so a marker can potentially be a variation on multiple genes at a single location.
has_NM	mapped object connected with a RefSeq mRNA (which has an accession number in the format NM_123456)
has_STS	mapped object that is not an STS but is connected to an STS (e.g., a gene or clone that contains known STSs)
has_snp	mapped object that contains a reported single nucleotide polymorphism (SNP), insertion, deletion, or other small variation [or mapped marker that has a link (see show connections) to a marker on the Variation map] [or mapped marker that is associated with a gene that has...]
Variation only -- the properties below apply only to objects on the Variation map
in_transcript	Any part of the marker position overlaps with an mRNA location (or overlaps with a UTR/intron when the mRNA feature is missing), BUT the marker position is not within the coding region of the transcript.
in_CDS	Any part of the marker position overlaps with a coding sequence (CDS) region (or overlaps with an exon region in the unlikely case that an exon is annotated but the CDS is missing).
has_genotype	A variation object that is connected with genotype information.
has_linkout	A variation object for which submitter links are available.
het_80_90	heterozygosity of 80-90%
het_90plus	heterozygosity of >90%
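The in_gene window can be written as an interval-overlap test that respects gene orientation. This sketch reads the definition as the gene's own span widened by 2 kb on its 5' side and 500 bases on its 3' side, uses 1-based closed intervals, and summarizes each gene by the outermost coordinates of its CDS, mRNA, and gene features; those representation choices and the names below are assumptions.

```python
from dataclasses import dataclass

UPSTREAM_BP = 2_000   # window 5' of the most 5' gene feature
DOWNSTREAM_BP = 500   # window 3' of the most 3' gene feature


@dataclass
class Gene:
    start: int   # lowest chromosome coordinate of any CDS, mRNA, or gene feature
    end: int     # highest chromosome coordinate of any such feature
    strand: str  # "+" or "-"


def in_gene(marker_start: int, marker_end: int, gene: Gene) -> bool:
    """True if any base of the marker lies in the gene span widened on both sides."""
    if gene.strand == "+":
        # On the plus strand the 5' end is the low coordinate.
        lo, hi = gene.start - UPSTREAM_BP, gene.end + DOWNSTREAM_BP
    else:
        # On the minus strand the 5' end is the high coordinate.
        lo, hi = gene.start - DOWNSTREAM_BP, gene.end + UPSTREAM_BP
    return marker_start <= hi and marker_end >= lo


def genes_for_marker(marker_start: int, marker_end: int,
                     genes: dict[str, Gene]) -> list[str]:
    """Check genes on both strands, so one marker may be reported for several genes."""
    return [name for name, g in genes.items() if in_gene(marker_start, marker_end, g)]
```

Because genes on both strands are checked, a marker lying between a plus-strand gene and a nearby minus-strand gene can be reported for both.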

Advanced Search Page

The Advanced Search page lets you use a number of query options by checking boxes or radio buttons that represent various search fields, properties, and object types. It also lets you limit your query to one or more chromosomes.
The Advanced Search page is accessible from the header region of the genome view page (described in the general Map Viewer Help document).

Search Tips

Show Linked Entries:
finding associated objects on other maps

The human genome search page provides a "Show linked entries" option under the text box. If that option is not checked, the search system retrieves only map elements that contain your search term in their descriptions. If it is checked, the search system retrieves those elements plus associated map elements that do not necessarily contain the search string. Examples are below.

Note: Do not use the "Show linked entries" option if you expect your search to retrieve a large number of map elements; the linked entries can make the results extremely long.

Making connections between disease phenotypes on the Morbid map and STSs

To find STSs associated with a disease specific phenotype:

search for a term associated with the phenotype, such as "PSORS7" (for psoriasis susceptibility 7); alternatively, you can search for its MIM number 605606
this will retrieve map elements that contain the search term in their descriptions; however, it will not retrieve associated elements, such as STS markers, unless they also contain the search term. In many cases, they do not.
to retrieve those associated elements, check the "show linked entries" option under the "search for" box, and press the "Find" button again
the search output will now include map elements on other maps, such as STSs, SNPs, and clones, which did not necessarily contain your search term in their descriptions.

To find a disease phenotype associated with specific STSs:

search for an STS name, for example, D1S197 (or D1S200 or D1S207)
this will retrieve the STS with that name (as well as STSs on other maps that use alternate names for the same primer pair)
to retrieve the associated phenotype, check the "show linked entries" option under the "search for" box, and press the "Find" button again
the search output will now include elements on other maps, such as phenotypes on the Morbid map, SNPs, clones, etc., which did not necessarily contain your search term in their descriptions.

To see a list of all the disease phenotypes from OMIM that have links to associated STSs:

on the human genome search page, select the advanced search option
in the "Type of mapped object" section, select "clear," then check only "OMIM"
in the "Search only records" section, select "with STS known"
press the "Find" button
the search output will list all the Morbid map elements that contain links to STSs
to then see the STSs and other map elements associated with those Morbid map phenotypes, check the "show linked entries" option under the "search for" box, and press the "Find" button again

How links are made between disease phenotypes on the Morbid map and STSs

For known genes, links are established in an automated way by using e-PCR to compare the data in UniSTS against the mRNAs for those genes.
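e-PCR looks for an STS's primer pair in a sequence at roughly the expected spacing. The sketch below is a deliberately simplified illustration, not a reimplementation of the e-PCR program: it uses exact matches only, scans the plus strand only, and uses a hypothetical default size tolerance. The real program is more permissive (for example, it can be configured to tolerate mismatches).

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def simple_epcr(sequence: str, forward: str, reverse: str,
                expected_size: int, tolerance: int = 50) -> list[tuple[int, int]]:
    """Find plus-strand STS hits as 0-based, half-open (start, end) spans.

    A hit is an exact match to the forward primer followed downstream by an
    exact match to the reverse complement of the reverse primer, with the
    product length within expected_size +/- tolerance.
    """
    seq = sequence.upper()
    fwd = forward.upper()
    rev_rc = reverse_complement(reverse.upper())
    hits = []
    i = seq.find(fwd)
    while i != -1:
        j = seq.find(rev_rc, i + len(fwd))
        while j != -1:
            size = j + len(rev_rc) - i
            if size > expected_size + tolerance:
                break  # later matches would only give longer products
            if size >= expected_size - tolerance:
                hits.append((i, j + len(rev_rc)))
            j = seq.find(rev_rc, j + 1)
        i = seq.find(fwd, i + 1)
    return hits
```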

For disease phenotypes with no known genes, links are based on published references. If an article about a disease cites an STS, a link to that STS is provided in Map Viewer through the "show linked entries" function.

FTP Data

FTP Map Data

The Map Viewer data are available in the ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/mapview/ directory of NCBI's FTP site.

Map data can also be downloaded for a complete chromosome or chromosome region by using the Data as Table View option, which is accessible from the blue sidebar of a chromosome display. More information about this option is provided in View/Download Chromosome Region, above.

FTP Sequence Data

The ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/ directory of the NCBI FTP site contains one folder for each chromosome, which includes genomic contigs (NT_* records) built from finished and unfinished sequence data. The contigs are available in various formats, described below. The contig assembly and annotation process is described in a separate document.

hs_chr*.asn ASN.1 format

hs_chr*.fa.gz FASTA format

hs_chr*.gbk.gz GenBank flat file format

hs_chr*.gbs GenBank summary format
(this format does not contain sequence data, but instead contains a "CONTIG" field, showing how the contig is assembled from individual GenBank accessions)
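A minimal Python sketch for reading the gzip-compressed FASTA files (hs_chr*.fa.gz), assuming standard FASTA layout with ">" header lines. Header text is returned unparsed because its exact format is not described here, and records are yielded one at a time rather than all held in memory together.

```python
import gzip
from typing import Iterator, Tuple


def read_fasta_gz(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a gzip-compressed FASTA file."""
    header, chunks = None, []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.rstrip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []  # start a new record
            elif line:
                chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        yield header, "".join(chunks)
```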

In addition, sequence data can be downloaded from the graphic display of sequence maps, as explained in View/Download Chromosome Region, above.

Constructing URLs to link to Map Viewer

If you would like to create WWW links to the Map Viewer, instructions for constructing URLs are given in the general Map Viewer Help document. You can construct URLs that either perform a search or display a specific mapped object or chromosomal region.

Questions or Comments?
Write to the NCBI Service Desk

Object Location	Symbol	Meaning
Plus strand		Genes shown to the right of the grey line are transcribed in the + orientation (from top down); contigs with a + orientation are read from top down
Minus strand		Genes shown to the left of the grey line are transcribed in the - orientation (from bottom up); contigs with a - orientation are read from bottom up
Unknown	?	The orientation of the map element is unknown.
