Genes & Expression

Databases

A collection of genomics, functional genomics, and genetics studies and links to their resulting datasets. This resource describes project scope, material, and objectives and provides a mechanism to retrieve datasets that are often difficult to find due to inconsistent annotation, multiple independent submissions, and the varied nature of diverse data types which are often stored in different databases.

Database that groups biomedical literature, small molecules, and sequence data in terms of biological relationships.

ClinVar

A resource under development to provide a public, tracked record of reported relationships among human variation and observed health status. ClinVar has a projected launch date for late 2011. 

Consensus CDS (CCDS)

A collaborative effort to identify a core set of human and mouse protein coding regions that are consistently annotated and of high quality.

An archive and distribution center for the description and results of studies which investigate the interaction of genotype and phenotype. These studies include genome-wide association (GWAS), medical resequencing, molecular diagnostic assays, as well as association between genotype and non-clinical traits. 

A searchable database of genes, focusing on genomes that have been completely sequenced and that have an active research community to contribute gene-specific data. Information includes nomenclature, chromosomal localization, gene products and their attributes (e.g., protein interactions), associated markers, phenotypes, interactions, and links to citations, sequences, variation details, maps, expression reports, homologs, protein domain content, and external databases.

A public functional genomics data repository supporting MIAME-compliant data submissions. Array- and sequence-based data are accepted and tools are provided to help users query and download experiments and curated gene expression profiles.

Stores curated gene expression and molecular abundance DataSets assembled from the Gene Expression Omnibus (GEO) repository. DataSet records contain additional resources, including cluster tools and differential expression queries.

Stores individual gene expression and molecular abundance Profiles assembled from the Gene Expression Omnibus (GEO) repository. Search for specific profiles of interest based on gene annotation or pre-computed profile characteristics.

Genes and Disease

Summaries of information for selected genetic disorders with discussions of the underlying mutation(s) and clinical features, as well as links to related databases and organizations.

Genetic Testing Registry (GTR)

A voluntary registry of genetic tests and laboratories, with detailed information about the tests such as what is measured and analytic and clinical validity.  GTR also is a nexus for information about genetic conditions and provides context-specific links to a variety of resources, including practice guidelines, published literature, and genetic data/information. The initial scope of GTR includes single gene tests for Mendelian disorders, as well as arrays, panels and pharmacogenetic tests.

A gene homology tool that compares nucleotide sequences between pairs of organisms in order to identify putative orthologs. Curated orthologs are incorporated from a variety of sources via the Gene database.

A database of  human genes and genetic disorders. NCBI maintains current content and continues to support its searching and integration with other NCBI databases.  However, OMIM now has a new home at omim.org, and users are directed to this site for full record displays.

Probe

A public registry of nucleic acid reagents designed for use in a wide variety of biomedical research applications, together with information on reagent distributors, probe effectiveness, and computed sequence similarities.

A collection of human gene-specific reference genomic sequences. RefSeq gene is a subset of  NCBI’s RefSeq database, and are defined based on review from curators of locus-specific databases and the genetic testing community. They form a stable foundation for reporting mutations, for establishing consistent intron and exon numbering conventions, and for defining the coordinates of other biologically significant variation. RefSeqGene is a part of the Locus Reference Genomic (LRG) Collaboration.

A database that provides sets of transcript sequences that appear to come from the same transcription locus (gene or expressed pseudogene), together with information on protein similarities, gene expression, cDNA clone reagents, and genomic location.

UniGene Library Browser

This database contains libraries of Expressed Sequence Tags (ESTs) organized by organism, tissue type and developmental stage.

Downloads

FTP: Gene

This site contains three directories: DATA, GeneRIF and tools. The DATA directory contains files listing all data linked to GeneIDs along with subdirectories containing ASN.1 data for the Gene records. The GeneRIF (Gene References into Function) directory contains PubMed identifiers for articles describing the function of a single gene or interactions between products of two genes. Sample programs for manipulating gene data are provided in the tools directory. Please see the README file for details.

FTP: Gene Expression Omnibus (GEO) Profiles and Datasets

This site contains GEO data in two formats: SOFT (Simple Omnibus in Text Format) and MINiML (MIAME Notation in Markup Language). Summary text files and supplementary data are also available. Please see the README.TXT file for more information.

FTP: HomoloGene

This site contains data for each build of HomoloGene, beginning with build 35. Complete data for each build are provided in XML, and a data summary is provided in tab-delimited text format.

FTP: UniGene

This site contains individual directories for each organism with data in UniGene. The data for each species includes the unique sequence for each UniGene cluster, all sequences in each cluster in FASTA format and library information for the cluster. See the README file for further details.

Submissions

BioProject Submission

An online form that provides an interface for researchers, consortia and organizations to register their BioProjects. This serves as the starting point for the submission of genomic and genetic data for the study. The data does not need to be submitted at the time of BioProject registration.

Database of Genotype and Phenotype (dbGaP) Data Submission Policies

Guidelines and requirements for submitting genotype and phenotype association data to dbGaP.

Gene Expression Omnibus (GEO) Web Deposit

Submit expression data, such as microarray, SAGE or mass spectrometry datasets to the NCBI Gene Expression Omnibus (GEO) database.

GeneRIF

GeneRIF provides a simple mechanism to allow scientists to add to the functional annotation of genes in the Gene database.

Submission Portal

A single entry point for submitters to link to and find information about all of the data submission processes at NCBI. Currently, this serves as an interface for the registration of BioProjects and BioSamples and submission of data for WGS and GTR. Future additions to this site are planned.

Tools

BLAST RefSeqGene

Performs a BLAST search of the genomic sequences in the RefSeqGene/LRG set. The default display provides ready navigation to review alignments in the Graphics display.

Digital Differential Display (DDD)

A tool for comparing EST profiles in order to identify genes with significantly different expression levels.

Tools that provide access to data within NCBI's Entrez system outside of the regular web query interface. They provide a method of automating Entrez tasks within software applications. Each utility performs a specialized retrieval task, and can be used simply by writing a specially formatted URL.

Gene Expression Omnibus (GEO) BLAST

Tool for aligning a query sequence (nucleotide or protein) to GenBank sequences included on microarray or SAGE platforms in the GEO database.

An integrated application for viewing and analyzing sequence data. With Genome Workbench, you can view data in publically available sequence databases at NCBI, and mix these data with your own data.

Provides special browsing capabilities of maps and assembled sequences for a subset of organisms. You can view and search an organism's complete genome, display maps, and zoom into progressively greater levels of detail, down to the sequence data for a region of interest.

Phenotype-Genotype Integrator (PheGenI)

Supports finding human phenotype/genotype relationships with queries by phenotype, chromosome location, gene, and SNP identifiers. Currently includes information from dbGaP, the NHGRI GWAS Catalog, and GTeX. Displays results on the genome, on sequence, or in tables for download.

Sequence Viewer

Provides a configurable graphical display of a nucleotide or protein sequence and features that have been annotated on that sequence. In addition to use on NCBI sequence database pages, this viewer is available as an embeddable webpage component. Detailed documentation including an API Reference guide is available for developers wishing to embed the viewer in their own pages.