
GEO Overview

General overview

GEO is an international public repository that archives and freely distributes microarray, next-generation sequencing, and other forms of high-throughput functional genomics data submitted by the research community.

The three main goals of GEO are to:

  1. Provide a robust, versatile database in which to efficiently store high-throughput functional genomic data (see Data organization)
  2. Offer simple submission procedures and formats that support complete and well-annotated data deposits from the research community (see Submission guide)
  3. Provide user-friendly mechanisms that allow users to query, locate, review and download studies and gene expression profiles of interest (see Query and analysis)

Please see the GEO Documentation listings to find more information about various aspects of GEO.

Data organization

GEO records are organized as follows:

[Figure: Schematic overview of GEO data submission]
Platform

Platform records are supplied by submitters

A Platform record is composed of a summary description of the array or sequencer and, for array-based Platforms, a data table defining the array template. Each Platform record is assigned a unique and stable GEO accession number (GPLxxx). A Platform may reference many Samples that have been submitted by multiple submitters.

Example Platform record »

A Text description of the array or sequencer
B Text tab-delimited table of the array template
Sample

Sample records are supplied by submitters

A Sample record describes the conditions under which an individual Sample was handled, the manipulations it underwent, and the abundance measurement of each element derived from it. Each Sample record is assigned a unique and stable GEO accession number (GSMxxx). A Sample entity must reference only one Platform and may be included in multiple Series.

Example Sample record »

C Text description of the biological sample and protocols to which it was subjected
D Text tab-delimited table of processed hybridization result
(may optionally include raw data columns)
E Original raw data file, or processed sequence data file
Series

Series records are supplied by submitters

A Series record links together a group of related Samples and provides a focal point and description of the whole study. Series records may also contain tables describing extracted data, summary conclusions, or analyses. Each Series record is assigned a unique and stable GEO accession number (GSExxx).

Example Series record »

F Text description of the overall experiment
G Tar archive of original raw data files, or processed sequence data files
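
The relationships among the three submitter-supplied record types can be sketched in code. The following is a minimal illustration only, not GEO's actual schema: the class names, field names, accession values, and the series_containing helper are all hypothetical. It shows a Sample referencing exactly one Platform while belonging to more than one Series.

```python
from dataclasses import dataclass, field


@dataclass
class Platform:
    accession: str      # GPLxxx (value below is hypothetical)
    description: str


@dataclass
class Sample:
    accession: str      # GSMxxx
    description: str
    platform: Platform  # a Sample references exactly one Platform


@dataclass
class Series:
    accession: str      # GSExxx
    description: str
    samples: list = field(default_factory=list)


def series_containing(sample, all_series):
    """Return the accessions of every Series that includes the given Sample."""
    return [s.accession for s in all_series if sample in s.samples]


# Hypothetical records, for illustration only.
platform = Platform("GPL1", "example array")
s1 = Sample("GSM1", "control", platform)
s2 = Sample("GSM2", "treated", platform)

# One Sample may be included in multiple Series.
series_a = Series("GSE1", "full study", [s1, s2])
series_b = Series("GSE2", "control-only subset", [s1])

print(series_containing(s1, [series_a, series_b]))
```

Many Samples may point at the same Platform object, mirroring the statement that a Platform may reference Samples from multiple submitters.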

Selected primary records are further processed into higher-level DataSet and gene Profile records:

DataSet

DataSet records are assembled by GEO curators

As explained above, a GEO Series record is an original submitter-supplied record that summarizes an experiment. GEO staff reassemble these data into GEO DataSet records (GDSxxx).

A DataSet represents a curated collection of biologically and statistically comparable GEO Samples and forms the basis of GEO's suite of data display and analysis tools.

Samples within a DataSet refer to the same Platform, that is, they share a common set of array elements. Value measurements for each Sample within a DataSet are assumed to be calculated in an equivalent manner, that is, considerations such as background processing and normalization are consistent across the DataSet. Information reflecting experimental factors is provided through DataSet subsets.
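
The shared-Platform requirement can be expressed as a simple check. This is a sketch under stated assumptions, not a description of how GEO curators assemble DataSets: the shared_platform function and the mapping of Sample accessions to Platform accessions are hypothetical.

```python
def shared_platform(sample_to_platform):
    """Return the one Platform accession common to all Samples.

    Raises ValueError if the Samples refer to zero or several Platforms,
    in which case they could not form a single DataSet under the rule above.
    """
    platforms = set(sample_to_platform.values())
    if len(platforms) != 1:
        raise ValueError(f"expected exactly one Platform, found {sorted(platforms)}")
    return platforms.pop()


# Hypothetical accessions, for illustration only.
candidate = {"GSM1": "GPL1", "GSM2": "GPL1", "GSM3": "GPL1"}
print(shared_platform(candidate))
```

Equivalence of value processing (background correction, normalization) is not something a check like this can verify; it covers only the Platform condition.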

Both Series and DataSets are searchable using the GEO DataSets interface, but only DataSets form the basis of GEO's advanced data display and analysis tools including gene expression profile charts and DataSet clusters. Not all submitted data are suitable for DataSet assembly and we are experiencing a backlog in DataSet creation, so not all Series have corresponding DataSet record(s).

For more information, see the About GEO DataSets page.

Example DataSet record »

H Cluster image
Profile

Profiles are derived from DataSets

A Profile consists of the expression measurements for an individual gene across all Samples in a DataSet. Profiles can be searched using the GEO Profiles interface.
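
Conceptually, a Profile is one gene's row taken across every Sample column of a DataSet. The sketch below illustrates that idea with a hypothetical in-memory value table; the gene identifiers, Sample accessions, values, and the profile function are invented for illustration and do not reflect GEO's file formats.

```python
# Hypothetical DataSet values: one entry per Sample, keyed by gene identifier.
dataset = {
    "GSM1": {"geneA": 1.0, "geneB": 5.0},
    "GSM2": {"geneA": 2.0, "geneB": 4.0},
    "GSM3": {"geneA": 3.0, "geneB": 6.0},
}


def profile(dataset, gene):
    """Collect one gene's measurement from every Sample in the DataSet."""
    return {sample: values[gene] for sample, values in dataset.items()}


print(profile(dataset, "geneA"))
```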

For more information, see the About GEO Profiles page.

Example Profile records »

I Profile image

Query and Analysis

GEO data can be retrieved and analyzed in several ways:

  • To look at a particular GEO record for which you have the accession number, use the GEO accession box located on the GEO homepage or at the top of each GEO record.
  • To download data, see the various options described on the Download GEO data page.
  • To quickly locate data relevant to your interests, search GEO DataSets and GEO Profiles:
    • GEO DataSets is a study-level database which users can search for studies relevant to their interests. The database stores descriptions of all original submitter-supplied records, as well as curated DataSets. More information about GEO DataSets and how to interpret GEO DataSets results pages can be found on the About GEO DataSets page.

    • GEO Profiles is a gene-level database which users can search for gene expression profiles relevant to their interests. More information about GEO Profiles and how to interpret GEO Profiles results pages can be found on the About GEO Profiles page.

    GEO DataSets and GEO Profiles searches may be performed effectively by simply entering appropriate keywords and phrases into the search box. However, given the large volumes of data stored in these databases, it is often useful to perform more refined queries in order to filter down to the most relevant data. Examples and full details about how to perform sophisticated queries are provided on the Querying GEO DataSets and GEO Profiles page. Additionally, the Limits and Advanced Search tools, linked at the head of the GEO DataSets and GEO Profiles pages, assist greatly in the construction of complex queries.

  • Once you have identified a DataSet of interest, there are several features on the DataSet record that help identify interesting gene expression profiles within that study, including a t-test tool and clusters. Full information about these features is provided on the About GEO DataSets page.
  • Once you have identified gene expression profiles of interest, there are several links on the Profile records that help identify additional genes of interest, including similarly expressed genes or genes within close proximity on the chromosome. Full information about these links is provided on the About GEO Profiles page.
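
For scripted lookups by accession number, the record type can be inferred from the accession prefix described under Data organization. The sketch below is illustrative only: the record_type and record_url helpers are hypothetical, and the acc.cgi address is an assumption about the accession viewer's URL rather than something stated on this page.

```python
import re
from urllib.parse import urlencode

# Prefixes as given in the Data organization section.
RECORD_TYPES = {"GPL": "Platform", "GSM": "Sample", "GSE": "Series", "GDS": "DataSet"}
ACCESSION_RE = re.compile(r"^(GPL|GSM|GSE|GDS)(\d+)$")

# Assumed address of the GEO accession viewer; verify before relying on it.
ACCESSION_VIEWER = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"


def record_type(accession):
    """Map an accession such as 'GSE1' to its record type by prefix."""
    match = ACCESSION_RE.match(accession.strip().upper())
    if match is None:
        raise ValueError(f"not a recognized GEO accession: {accession!r}")
    return RECORD_TYPES[match.group(1)]


def record_url(accession):
    """Build a viewer URL for a validated accession (no network access)."""
    record_type(accession)  # raises ValueError on malformed input
    return ACCESSION_VIEWER + "?" + urlencode({"acc": accession.strip().upper()})


print(record_type("GSM1"), record_url("GSM1"))
```

The functions only construct strings, so they can be used to pre-validate a list of accessions before any request is made.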
Last modified: May 11, 2012