Metagenome Submission Guide

Introduction

Uncultured microorganisms comprise the majority of the planet's biological diversity. However, because of the varied environments and conditions in which these organisms live, many of them cannot be cultured by standard techniques. Culture-independent methods are essential for understanding the genetic diversity, population structure, and ecological roles of these microorganisms.

Metagenomics is the culture-independent genomic analysis of a community of microorganisms. It provides a community-wide assessment of metabolic function and bypasses the need for the isolation and lab cultivation of individual species. The analysis of metagenomic data provides a way to identify new organisms and isolate complete genomes from unculturable species that are present within an environmental sample.

Metagenome projects in the BioProject database may consist of raw sequence reads collected from an ecological or organismal source (submitted to the Trace Archive or Sequence Read Archive); assembled contigs and/or scaffolds derived from the raw sequence data, including partial genomes from taxonomically defined organisms (submitted as a WGS project); and, in some cases, supporting sequences such as 16S ribosomal RNAs or fosmids (submitted to regular GenBank). All of these sequences are linked by a common BioProject ID.

This guide explains how to submit a metagenome project to GenBank, including information on registering BioProjects and submitting sequences to the Trace Archive, Sequence Read Archive and GenBank.

Separate guidelines are available for bacterial genome submissions and eukaryotic genome submissions.

If you do not understand any of the instructions presented here or you have questions, please contact us by email at genomes@ncbi.nlm.nih.gov prior to creating your submission.

Table of Contents

  1. Register your Metagenome Project
  2. Submitting sequences to the Trace and Sequence Read Archives
  3. Submitting a WGS project
  4. Other types of metagenome sequence data (eg, fosmids, 16S rRNA)
  5. What happens next

Register your Metagenome Project

Please register your metagenome project on the BioProject registration page as a Metagenome BioProject prior to preparing your submission to GenBank. A locus-tag prefix will be assigned that should be used if annotation is included with your sequences. If your project only involves the sequencing of a single gene (eg, 16S ribosomal RNA), it should be described as a Targeted Locus/Loci BioProject (see below for additional information about submitting these types of sequences to GenBank).

Each metagenome project that is registered is assigned a BioProject ID, which will appear on all sequence records associated with that project. Please use this BioProject ID in any correspondence regarding your metagenome project as well as include it within your sequence submissions.

Please also include information about your project, including a detailed description of the isolation source and the scope of the project.

Submitting sequences to the Trace and Sequence Read Archives

Raw sequence data should be submitted to the Trace Archive or NCBI Sequence Read Archive (SRA).

Traditional gel-capillary reads (including DNA sequence chromatograms, base calls, and quality estimates obtained from sequencing instruments like the ABI 3730) should be submitted to the Trace Archive. Sequence reads obtained using next-generation sequencers (eg, 454, Illumina, ABI SOLiD, Helicos) should be deposited in the Sequence Read Archive.
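
The platform rule above can be sketched as a small lookup. This is only an illustration: the platform labels and the function name archive_for_platform are hypothetical, and only the grouping of instruments into the two archives comes from this guide.

```python
# Hypothetical platform labels; only the grouping into archives is taken from the guide.
CAPILLARY_PLATFORMS = {"abi 3730"}
NEXT_GEN_PLATFORMS = {"454", "illumina", "abi solid", "helicos"}


def archive_for_platform(platform):
    """Return the archive the guide names for raw reads from this platform."""
    key = platform.strip().lower()
    if key in CAPILLARY_PLATFORMS:
        return "Trace Archive"
    if key in NEXT_GEN_PLATFORMS:
        return "Sequence Read Archive"
    # Platforms not listed in the guide are not guessed at.
    raise ValueError(f"Unrecognized platform {platform!r}; contact sra@ncbi.nlm.nih.gov")
```

For example, archive_for_platform("Illumina") returns "Sequence Read Archive", while a platform that the guide does not list raises an error rather than being assigned to an archive.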

Submitters must include the BioProject ID within the ancillary field ncbi_project_id (Trace Archive) or project_id (Sequence Read Archive) when formatting their submission. Contact sra@ncbi.nlm.nih.gov for questions about submitting to the Trace Archive and NCBI Sequence Read Archive.
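
As a minimal sketch of the requirement above, the following helper sets the BioProject ID under the field name that matches the target archive. It assumes the ancillary fields are held in a plain dictionary before being written to a submission file; the function name and the example ID are hypothetical, and the actual submission file formats are defined by each archive, not by this snippet.

```python
# Field names are the ones given in the guide; everything else is illustrative.
PROJECT_ID_FIELD = {
    "Trace Archive": "ncbi_project_id",
    "Sequence Read Archive": "project_id",
}


def add_bioproject_id(fields, archive, bioproject_id):
    """Return a copy of the ancillary fields with the BioProject ID set."""
    if archive not in PROJECT_ID_FIELD:
        raise ValueError(f"Unknown archive: {archive!r}")
    updated = dict(fields)  # leave the caller's dictionary unchanged
    updated[PROJECT_ID_FIELD[archive]] = bioproject_id
    return updated
```

Calling add_bioproject_id({}, "Trace Archive", "12345") with the placeholder ID "12345" yields a dictionary whose only key is ncbi_project_id.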

Submitting a WGS project

Contigs that have been assembled from raw reads can be submitted as a WGS project. In addition, we may accept raw reads that have not been assembled into contigs if a significant number of assembled contigs are also included in the WGS project. However, reads shorter than 200 bp should not be included unless they are part of multi-component scaffolds. Multiple alignments of contigs can be submitted to the Trace Assembly Archive. These records map sequence reads to the consensus sequences that were submitted as contigs of the WGS project.
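
The length rule above can be checked before a submission is prepared. The sketch below assumes the sequences are already loaded into a dictionary of name to sequence string, and that the submitter knows which sequences belong to multi-component scaffolds; the function and parameter names are hypothetical.

```python
MIN_LENGTH_BP = 200  # threshold stated in the guide


def keep_for_wgs(sequences, in_multi_component_scaffold):
    """Drop sequences shorter than 200 bp unless they are part of a multi-component scaffold.

    sequences: dict mapping sequence name to sequence string.
    in_multi_component_scaffold: set of names that belong to such scaffolds.
    """
    return {
        name: seq
        for name, seq in sequences.items()
        if len(seq) >= MIN_LENGTH_BP or name in in_multi_component_scaffold
    }
```

A 199 bp read that is not part of any scaffold is dropped, while a 50 bp read listed in in_multi_component_scaffold is kept.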

Records in a WGS project can contain annotations, and the entire project is updated as sequencing progresses. Supercontig or assembly information can be sent in AGP format, which allows us to create CON records that indicate how the pieces of the WGS submission fit together. More information about generating AGP files for submission can be found here.
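
To illustrate what such assembly information looks like, the sketch below builds the rows for one scaffold. It assumes the nine-column, tab-delimited layout of AGP version 2.0, with component type W for WGS contigs and N for gaps of known size. The 100 bp gap length, the "scaffold" gap type, and the "paired-ends" linkage evidence are placeholder assumptions, not requirements from this guide, so check the AGP specification and validate any real file before submitting.

```python
def agp_lines(scaffold_id, contigs, gap_length=100):
    """Build AGP rows for one scaffold (assumed AGP 2.0 column layout).

    contigs: list of (contig_id, length, orientation) tuples in scaffold order.
    gap_length: assumed size of every gap between adjacent contigs.
    """
    rows = []
    position = 1  # AGP coordinates are 1-based and inclusive
    part = 1
    for index, (contig_id, length, orientation) in enumerate(contigs):
        if index > 0:
            # Gap row between the previous contig and this one.
            gap_end = position + gap_length - 1
            rows.append([scaffold_id, position, gap_end, part, "N",
                         gap_length, "scaffold", "yes", "paired-ends"])
            position = gap_end + 1
            part += 1
        # Component row: the whole contig, from base 1 to its full length.
        end = position + length - 1
        rows.append([scaffold_id, position, end, part, "W",
                     contig_id, 1, length, orientation])
        position = end + 1
        part += 1
    return ["\t".join(str(value) for value in row) for row in rows]
```

Two contigs of 500 bp and 300 bp produce three rows: the first contig at positions 1 to 500, a gap at 501 to 600, and the second contig at 601 to 900.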

Individual contigs within a WGS project that have not been identified taxonomically will be assigned a specific metagenome taxonomy designation. These sequences will remain in GenBank within the ENV division. If the organism has been identified, then the sequence will have a specific taxid that corresponds to that organism. Please contact us at genomes@ncbi.nlm.nih.gov before generating your submission so that we can advise how to include the various types of source information.

Other types of metagenome sequence data

Metagenome projects can include other types of sequence data such as 16S ribosomal RNA, fosmid sequences, and/or GSS data. Please include the BioProject ID in any correspondence or submission files that are sent to GenBank. The organism names for these sequences will be "uncultured" (eg, "uncultured bacterium") rather than "metagenome", though they will have the same BioProject ID as all the sequences submitted for the project. These types of data can be submitted to GenBank as follows:

  • 16S ribosomal RNA sequences can be submitted to GenBank using the sequence submission tools Sequin or tbl2asn. Prepared submissions should be emailed to gb-sub@ncbi.nlm.nih.gov or uploaded via Sequin MacroSend. Accession Numbers are generally assigned within two working days.
  • Fosmid end sequences and GSS data can be submitted to the GSS division of GenBank. These sequences must not be annotated. Please contact batch-sub@ncbi.nlm.nih.gov with any questions you have regarding this type of submission.
  • Fosmids, BACs, and other genomic fragments assembled from raw reads and/or annotated sequences should be submitted to GenBank, as described above for 16S ribosomal RNA sequences. Please contact gb-admin@ncbi.nlm.nih.gov with any questions you have regarding these types of submissions as well as for instructions regarding their format.
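
The three routes listed above can be summarized as a lookup table. The dictionary keys and the function name are hypothetical labels chosen for this sketch; the destinations and email addresses are the ones given in the list.

```python
# Keys are illustrative labels; destinations and addresses come from the list above.
ROUTES = {
    "16S rRNA": ("GenBank, prepared with Sequin or tbl2asn",
                 "gb-sub@ncbi.nlm.nih.gov"),
    "fosmid end or GSS": ("GSS division of GenBank, without annotation",
                          "batch-sub@ncbi.nlm.nih.gov"),
    "assembled fosmid or BAC": ("GenBank, prepared with Sequin or tbl2asn",
                                "gb-admin@ncbi.nlm.nih.gov"),
}


def route_for(data_type):
    """Return (destination, email address) for a supporting data type."""
    if data_type not in ROUTES:
        raise ValueError(f"No route listed for {data_type!r}")
    return ROUTES[data_type]
```

Note that the address for 16S rRNA is where prepared submissions are sent, whereas the other two addresses are for questions about the submission.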

What happens next

Once we receive your metagenome submission(s), a member of our staff will conduct an initial review and contact you by email. After we have assigned Accession Numbers to the sequence records associated with your metagenome project and contacted you about any issues, we will prepare the project for release to the public database.

You can choose to have your metagenome project and sequence submission(s) released immediately or kept confidential until a certain date or publication of the work, whichever comes first. If you wish your metagenome project to be held until publication, we ask that you provide us with the expected publication date and also notify us in a timely manner of the upcoming publication and the relevant citation details. This will allow us to coordinate the release of your metagenome project and sequence submission(s) with the appearance of the paper. Please provide at least two weeks' notice of any upcoming publication.

Last updated: 2012-08-27T09:28:31-04:00