
Frequently Asked Questions

Submission

What is GEO?

The Gene Expression Omnibus (GEO) is a public repository that archives and freely distributes microarray, next-generation sequencing, and other forms of high-throughput functional genomic data submitted by the scientific community. In addition to data storage, a collection of web-based interfaces and applications is available to help users query and download the studies and gene expression patterns stored in GEO. For more information about various aspects of GEO, please see our documentation listings and publications.

Why should I submit my data to GEO?

There are several good reasons for submitting your data to us. The most likely reason is that the journal in which you are publishing your research requires deposit of microarray data to a MIAME-compliant public repository like GEO. We endeavor to make data deposit procedures as straightforward as possible and will provide as much assistance as you require to get your data submitted to GEO. If you have problems or questions about the submission procedures, just e-mail us at geo@ncbi.nlm.nih.gov and one of our curators will quickly get back to you. In addition to satisfying possible journal requirements for publication, there are other significant benefits to depositing data with GEO. Your data receive long-term archiving at a centralized repository and are integrated with other NCBI resources, which greatly increases their usability and visibility. You may also include links back to your own project websites within your submission, again increasing the visibility of your research. Journal publication is not a requirement for data submission to GEO.

How do I submit my data to GEO?

To submit data, you first need to establish your identity with us by setting up your own GEO account, with a private username and password. The contact information you supply will be displayed on your GEO records. As explained on the Submitting data page there are several deposit formats you can use to submit your data to GEO. These include spreadsheets, plain text, and XML formats. Regardless of the submission method you choose, the final GEO records will look the same and contain equivalent information. If you have any problems with the submission process, please do not hesitate to e-mail us at geo@ncbi.nlm.nih.gov and we will be happy to provide assistance.

When do I submit my data to GEO?

Many journals require accession numbers for microarray or sequence data before acceptance of a paper for publication. Also, reviewers and editors may need access to your data during the review process. Thus, data should be deposited in GEO before a manuscript describing the data is sent to a journal for review. GEO processing time is approximately 5 business days after completion of submission, so it is important to make your submission well in advance of when you require the accession numbers for your manuscript. Your records may remain private until your data are published. Once your submissions have been approved, you can cite the GEO accession number(s) in your manuscript, and you can generate an access link by which editors and reviewers can access your private submissions.

When will my data receive GEO accession numbers?

Processing normally takes approximately 5 business days after completion of submission. After you complete the submission, your data are put into a queue to await review by a curator. Please understand that we receive hundreds of study submissions per week, and processing times can vary depending on submission volume. Thus it is important to make your submission well in advance of when you require the accession numbers for your manuscript. If format or content problems are identified with your submission, a curator will contact you by e-mail explaining how to address the issue. Please address the issues raised by curators; failure to do so may result in processing delays or removal of the records. Once your records pass review, the curator will send you an e-mail confirming your GEO accession numbers and their release dates. If you do not receive an e-mail from us within 5 business days of your submission, please first check your spam or junk e-mail folders, because some systems recognize GEO e-mail correspondence as spam. If nothing is there, e-mail us at geo@ncbi.nlm.nih.gov to inquire about your submission. Do not quote GEO accession numbers in manuscripts until you have received an approval e-mail notice from a GEO curator.

I'm a reviewer, how do I access and evaluate pre-publication data?

Reviewers should expect to receive a reviewer URL with the manuscript. This URL allows anonymous, confidential access to the private GEO records cited in the paper. Detailed information is provided in these Guidelines for reviewers and journal editors.

Does GEO support MIAME?

Yes. GEO encourages submitters to supply MIAME-compliant data. To assist submitters in providing data that comply with MIAME, GEO submission procedures are designed to closely follow the MIAME checklist. If you provide all requested information, your submission will be MIAME-compliant. Processing delays may occur if your submission lacks critical MIAME components. Submitters and reviewers are encouraged to refer to the MIAME checklist and to use it as a guide to determine what information should be included when describing a microarray study. Note that MIAME compliance is determined by the content provided, not by the submission format or route. If you have any comments or concerns regarding this issue, please e-mail us at geo@ncbi.nlm.nih.gov.

What kinds of data will GEO accept?

GEO was designed around the common features of most of the high-throughput and parallel molecular abundance-measuring technologies in use today. These include data generated from microarray and high-throughput sequence technologies, for example:

  • Gene expression profiling by microarray or next-generation sequencing (see examples)
  • Non-coding RNA profiling by microarray or next-generation sequencing (see examples)
  • Chromatin immunoprecipitation (ChIP) profiling by microarray or next-generation sequencing (see examples)
  • Genome methylation profiling by microarray or next-generation sequencing (see examples)
  • Genome variation profiling by array (arrayCGH) (see examples)
  • SNP arrays (see examples) (see human subject FAQ)
  • Serial Analysis of Gene Expression (SAGE) (see examples)
  • Protein arrays (see examples)

The GEO database has a flexible and open design that is responsive to developing trends. If you have questions about whether GEO can accept your data type, please do not hesitate to contact us.

Does GEO store raw data?

Yes. All microarray submitters are required to provide raw data with their submissions. Raw data facilitates the unambiguous interpretation of the data and potential verification of the conclusions as described in the MIAME guidelines. Raw data may be supplied either within the Sample record data tables or as external supplementary data files, e.g., Affymetrix CEL or GenePix GPR scan files. Supplementary data files for public records are made available from the GEO FTP site.

How are submitters authenticated?

In their first submission to GEO, submitters are asked to create a GEO account, with a confidential username and password. This account can be used to submit additional data in the future without re-entering contact information, as well as to authenticate the submitter when updating or editing an existing GEO accession number. We will send all e-mail correspondence, approvals, and reminders to the contact e-mail addresses provided in the GEO account - please be sure to inform us if your contact e-mail address changes (please see How can I make edits to my contact information?).

Can I keep my data private while my manuscript is being prepared or under review?

Yes. GEO records may remain private until a manuscript describing the data is published (journal publication is not a requirement for data submission to GEO). During the submission process you are prompted to specify a release date for your records. Although the maximum allowable limit is three years, this date may be brought forward or pushed back at any time (please see How can I make corrections to data that I already submitted?) - or you can e-mail us at any time to request a change of release date. This feature allows a submitter to deposit data and receive a GEO accession number to quote in a manuscript before the data become public. We will send you an e-mail reminder 10 days before the scheduled release date, inviting you to postpone the release date as necessary. It is important to inform us as soon as your manuscript is published so that we can release your records and link them with PubMed. Submitters also have the opportunity to create a private access link that allows collaborators or reviewers confidential, read-only access to private data before manuscript publication.

Can I keep my data private after my manuscript is published?

No. If GEO accession numbers are quoted in a publication, the records must be released so that the data are accessible to the scientific community. If GEO accession numbers are found to be quoted in a publication before the scheduled release date, GEO staff are obligated to release those records, even if a second manuscript describing the same data is pending.

How can I allow reviewers access to my private records?

After your records have been approved, you can create an access link to your private submissions using the "Click here to create a reviewer access link" option near the top of your Series (GSExxx) record. The reviewer URL that is generated can be sent to the journal editor, who will circulate it to reviewers requiring access to your private data.

How can I make corrections to data that I already submitted?

You may perform updates and edits at any time to any of your submissions. Please refer to the Updating your GEO records page for instructions.

How can I delete my records?

Only GEO staff can remove records from the database; it is necessary to e-mail us at geo@ncbi.nlm.nih.gov to request deletion of specific accession numbers. Please keep in mind that updating records is preferable to deleting records (see How can I make corrections to data that I already submitted? section above). Note that a validation-only tool is available on the Direct Deposit page - this feature allows you to validate and test your SOFT or MINiML files without actually submitting (and then deleting) the records. If the accessions in question have been published in a manuscript, we cannot delete the records. Rather, a comment will be added to the record indicating the reason the submitter requested withdrawal of the data, and the record content adjusted/deleted accordingly.

Can I submit data derived from human subjects?

For all studies involving human subjects, it is the submitter's responsibility to ensure that the data and files supplied to GEO protect participant privacy in accordance with all applicable laws, regulations and institutional policies. Make sure to remove any direct personal identifiers from your submission. These identifiers are listed in http://privacyruleandresearch.nih.gov/research_repositories.asp, footnote 1. If there are patient privacy concerns regarding making data fully public through GEO, please submit to NCBI's dbGaP database. dbGaP has controlled access mechanisms and is an appropriate resource for hosting sensitive patient data.

How can I make edits to my contact information?

After logging in with your username and password, follow the View your account link on the home page where you will find an Edit button. Edits to contact information will be applied immediately to all existing records submitted under that account. If you need the contact information to remain unedited on existing records, but different contact details to appear on new records, it is necessary to open a separate account and submit new data under that account.

Can I submit an extracted or summary subset of data?

No. Sample records should be supplied as complete hybridization tables and Platform records should contain meaningful, trackable sequence identifier information. The principal reason we maintain this archive and the rationale behind many journals' requirement for data deposit into GEO is so that the community can access and comprehensively re-examine data that form the basis of scientific reporting. Therefore, we do not accept partial data sets. We do understand the various reasons and difficulties some researchers have with sharing data. However, the demand from users and journal editors together with our need to maintain a useful and transparent database has led to our policy of only accepting complete data sets. If you have any questions or concerns regarding this issue, please e-mail us.

Query and search

Who can use GEO data?

Anybody can access and download public GEO data. There are no login requirements. For more information, please read these copyright and data disclaimers.

What kinds of retrievals are possible in GEO?

There are several ways to retrieve GEO data; please see the Query and analysis overview and the Download GEO data instructions for details. These methods include performing simple or sophisticated queries of the GEO DataSets and GEO Profiles databases, entering a valid GEO accession number in the Accession Display bar, browsing the list of current GEO repository contents, and downloading data from the GEO FTP site.
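As a rough illustration of retrieval by accession number, the sketch below recognises the record-type prefix of an accession and builds the corresponding Accession Display address. It assumes the usual prefixes (GSE for Series, GDS for DataSets, GPL for Platforms, GSM for Samples) and the `acc.cgi` address with an `acc` parameter; the helper name `describe_accession` is hypothetical and is not part of any GEO tool.

```python
import re
from urllib.parse import urlencode

# Assumed prefix-to-record-type mapping. GSE, GDS and GPL appear in this FAQ;
# GSM for Sample records is an assumption added for completeness.
RECORD_TYPES = {"GSE": "Series", "GDS": "DataSet", "GPL": "Platform", "GSM": "Sample"}

# Assumed address of the Accession Display page.
ACC_URL = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"


def describe_accession(accession):
    """Return (record type, Accession Display URL) for a GEO accession string."""
    acc = accession.strip().upper()
    match = re.fullmatch(r"(GSE|GDS|GPL|GSM)(\d+)", acc)
    if match is None:
        raise ValueError(f"not a recognised GEO accession: {accession!r}")
    return RECORD_TYPES[match.group(1)], f"{ACC_URL}?{urlencode({'acc': acc})}"


# Only builds the address; nothing is downloaded here.
print(describe_accession("GPL96"))
```

The function only formats a string, so it can be used to prepare links in bulk before deciding which records to open or download.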

How can I query and analyze GEO data?

Once you have found a curated DataSet or Series of interest, there are several features available that help identify interesting gene expression profiles within that study. Curated DataSets include a find genes feature, cluster heatmaps and a t-test sample comparison tool. Once you have identified gene expression profile charts of interest, there are several types of neighbors links on the Profile records that help identify related genes of interest. If no curated DataSet is available, it may be appropriate to analyze the Series using GEO2R, which compares groups of Samples and identifies differentially expressed genes. Alternatively, if you prefer to perform your own analysis using your favorite microarray software package, the value matrix tables within the DataSet full SOFT files available from the DataSet records, or the Series Matrix files linked at the foot of Series records, may prove suitable.
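For users who want to load a Series Matrix file into their own analysis, a minimal parsing sketch follows. It assumes the common layout of these files: tab-delimited text in which metadata lines start with `!` and the value table sits between `!series_matrix_table_begin` and `!series_matrix_table_end`. The sample text, accession numbers and probe names are made up for illustration, and `read_series_matrix` is a hypothetical helper, not a GEO-provided function.

```python
import csv
import io


def read_series_matrix(text):
    """Split Series Matrix text into metadata lines, sample IDs and a value table."""
    metadata, rows, in_table = [], [], False
    for line in text.splitlines():
        if line.startswith("!series_matrix_table_begin"):
            in_table = True
        elif line.startswith("!series_matrix_table_end"):
            in_table = False
        elif in_table:
            rows.append(line)
        elif line.startswith("!"):
            metadata.append(line)
    if not rows:
        raise ValueError("no value table found")
    # The table is tab-delimited; identifiers may be wrapped in double quotes.
    header, *body = csv.reader(io.StringIO("\n".join(rows)), delimiter="\t")
    samples = header[1:]
    values = {
        row[0]: {s: (float(v) if v else None) for s, v in zip(samples, row[1:])}
        for row in body
    }
    return metadata, samples, values


# Made-up example content; real files are downloaded from Series records.
SAMPLE = (
    '!Series_title\t"Hypothetical study"\n'
    "!series_matrix_table_begin\n"
    '"ID_REF"\t"GSM0000001"\t"GSM0000002"\n'
    '"probe_a"\t7.5\t8.25\n'
    '"probe_b"\t3.0\t\n'
    "!series_matrix_table_end\n"
)

metadata, samples, values = read_series_matrix(SAMPLE)
print(samples, values["probe_a"])
```

Empty cells are kept as `None` rather than guessed, so downstream code can decide how to treat missing values.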

Can GEO data be accessed programmatically?

Yes. Users can take advantage of NCBI's Entrez programming utilities to access data stored in GEO DataSets and GEO Profiles. Additionally, BioConductor users may be interested in the GEOquery package, which parses GEO SOFT files for integration with BioConductor 'R' analysis resources; see the publication.
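As a minimal sketch of programmatic access, the example below builds an Entrez ESearch request against GEO DataSets. It assumes the standard E-utilities address and the database names `gds` (GEO DataSets) and `geoprofiles` (GEO Profiles); the function names `geo_esearch_url` and `fetch_ids` are hypothetical. The fetch step needs network access and is defined but not called here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base address of the Entrez ESearch utility.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def geo_esearch_url(term, db="gds", retmax=20):
    """Build an ESearch URL for GEO DataSets ('gds') or GEO Profiles ('geoprofiles')."""
    if db not in ("gds", "geoprofiles"):
        raise ValueError(f"unexpected database name: {db!r}")
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def fetch_ids(url):
    """Run the search and return the list of matching record IDs (requires network)."""
    with urlopen(url) as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]


# Only the request address is printed; call fetch_ids(url) to run the search.
url = geo_esearch_url("GPL96[GEO Accession]")
print(url)
```

Keeping URL construction separate from the network call makes the query easy to inspect before it is sent; please consult the E-utilities usage guidelines before issuing requests in bulk.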

Can I get notified when new data is available?

Yes. This can be accomplished using a My NCBI account; register here. Once you are registered and logged in to My NCBI, construct a search for data relevant to your interests in GEO DataSets. For example, if you are only interested in studies performed on Platform GPL96 search with GPL96[GEO Accession]; to see any apoptosis studies search with apoptosis; or if you want to see all new studies search with all[filter]. Next to the search box, you should see a Save Search option. You will be presented with the option to receive e-mail alerts when new data matching your search criteria have been added to the database. This database is updated daily.

Can I cite data I find in GEO as evidence to support my own research?

Yes. Users often cite data they find in GEO to support their own studies. Please see the list of third-party usage citations and the guidelines for Citing data you find in GEO.

What is the difference between a Series and a DataSet?

A GEO Series (GSExxx) is an original submitter-supplied record that summarizes a study. These data are reassembled by GEO staff into curated GEO DataSets (GDSxxx). A DataSet represents a collection of biologically- and statistically-comparable Samples processed using the same Platform. Information reflecting experimental variables is provided through DataSet subsets. Both Series and DataSets are searchable using the GEO DataSets interface, but only DataSets form the basis of GEO's advanced data display and analysis tools, including gene expression profile charts and DataSet clusters; see the Data organization document for more information. Not all submitted data are suitable for DataSet assembly, and we are experiencing a backlog in DataSet creation, so not all Series have a corresponding DataSet record(s). When a curated DataSet is not available, it may be appropriate to analyze the Series using GEO2R, which compares groups of Samples and identifies differentially expressed genes.

Why can't I find gene profile charts or DataSet clusters for my study of interest?

As explained in the What is the difference between a Series and a DataSet? FAQ above, suitable submitter-supplied GEO Series records are reassembled by GEO staff into curated DataSets. At periodic intervals, these DataSets are then indexed and loaded into GEO Profiles and GEO DataSets, which allows users to query gene names, visualize charts and clusters, and more. If your Series of interest has not yet been assembled into a DataSet these features will not be available, but it may be appropriate to analyze the Series using GEO2R which compares groups of Samples and identifies differentially expressed genes.

What do the red bars and blue squares represent in GEO profile charts?

In GEO Profile charts, the red bars represent values extracted from original GEO Sample records as supplied by submitters. For single-channel data, values are assumed to be submitted as normalized signal count data, reflecting the relative measure of abundance of each transcript. For Affymetrix data, the "detection call" (A=absent, P=present, M=marginal) data are taken into consideration, if supplied (absent calls are faded out). For dual-channel experiments, values are normalized log ratios, and SAGE values reflect "tags per million" counts. The blue squares represent the percentile-ranked value of a spot compared to all other spots within that Sample. That is, all values within each Sample are rank ordered and placed into rank percentile 'bins'. This gives an indication of the relative expression level of that gene compared to all other genes on the array. Value profiles are plotted on a scale that fits each individual gene, whereas rank data are always plotted on a scale of 0-100%.
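To make the rank idea concrete, here is a small sketch of percentile ranking within a single Sample. It is an illustration only: the exact binning GEO uses is not described here, so the linear 0-100 scaling, the handling of ties, and the function name `percentile_ranks` are all assumptions, and the spot names and values are made up.

```python
def percentile_ranks(values):
    """Map each spot ID to a percentile rank (0-100) within one Sample.

    Assumed scheme: sort spots by value, then scale positions linearly so the
    lowest value gets 0 and the highest gets 100. Tied values keep their
    input order, which is a simplification.
    """
    ordered = sorted(values, key=values.get)
    n = len(ordered)
    if n == 1:
        return {ordered[0]: 100.0}
    return {spot: 100.0 * i / (n - 1) for i, spot in enumerate(ordered)}


# Made-up values for five spots in one hypothetical Sample.
sample_values = {"a": 5.0, "b": 120.0, "c": 40.0, "d": 900.0, "e": 12.0}
print(percentile_ranks(sample_values))
```

Because ranks are computed within each Sample, they stay on a 0-100 scale even when the raw values of different Samples are on very different scales.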

Why can't I find supplementary/raw data for my study of interest?

Supplementary data are made available for download from GEO's public FTP site and from links at the bottom of Series records and GEO DataSets results pages. All microarray submitters have been asked to provide supplementary data (for example, Affymetrix .cel files) to accompany their GEO records. If supplementary data links are not provided for your study of interest, we suggest that you contact the submitter directly and encourage them to supply supplementary data files to GEO so that we may make them available to the scientific community.

What data types are provided with next-generation sequence submissions?

Processed sequence data files: GEO hosts processed sequence data files either within the Sample tables, or linked at the bottom of Sample and Series records as Supplementary files on the FTP site. Requirements for processed data files are not yet fully standardized and will depend on the nature of the study, but typically can include alignment, peak, and/or count data.

Raw sequence data files: Raw data have been uploaded to NCBI's Sequence Read Archive (SRA) database, and linked from the bottom of Sample and Series records. The SRA database has ceased to generate static 'fastq' format dumps for download. Instead, data are made available in 'sra' format, which users can download and convert into their desired format using the SRA SDK toolkit. For more information, see the SRA Download Guide. If you have questions about SRA format or the SRA toolkit, please contact the SRA team at sra@ncbi.nlm.nih.gov.

What is GEO BLAST?

The GEO BLAST tool queries the GEO Profiles database for molecular abundance profiles of interest based on nucleotide sequence similarity. The GEO BLAST database contains all GenBank sequences represented on selected microarray Platforms or SAGE libraries in GEO. This interface is helpful in identifying sequence homologs of interest, e.g., related gene family members or for cross-species comparisons.

Last modified: September 5, 2012