Annotation Examples

mRNA sequence

Relevant feature information for an mRNA (cDNA) sequence encoding a protein:
We strongly suggest that you provide as much of the above information as possible to ensure the most complete annotation of your sequence. If any of this information is not known, please inform us.

Example:

Homo sapiens prolidase (PEPD) mRNA, complete cds.

                      source          1..1888
                                      /organism="Homo sapiens"
                                      /chromosome="19"
                                      /map="19q12-q13.2"
                                      /cell_type="fibroblasts"
                                                 
                      gene            1..1888
                                      /gene="PEPD"
                                                 
                      CDS             17..1498
                                      /gene="PEPD"
                                      /EC_number="3.4.13.9"
                                      /note="imidodipeptidase"
                                      /product="prolidase"
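
The CDS location above can be sanity-checked with a little arithmetic. The following is a minimal sketch, not part of any submission tool; the function name span_length is hypothetical. It assumes a simple 'start..end' location with 1-based, inclusive coordinates and checks that the CDS length is a whole number of codons.

```python
def span_length(location: str) -> int:
    """Return the number of bases covered by a simple 'start..end' location."""
    start, end = (int(part) for part in location.split(".."))
    if start > end:
        raise ValueError(f"start {start} is after end {end}")
    # Feature table coordinates are 1-based and inclusive.
    return end - start + 1


cds_length = span_length("17..1498")
codons, leftover = divmod(cds_length, 3)
print(cds_length, codons, leftover)  # 1482 494 0
```

A leftover of 0 is what one would expect for a complete CDS; a non-zero value usually suggests a coordinate typo or a partial feature.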
       
         

Prokaryotic gene

Relevant feature information for a prokaryotic genomic sequence encoding a protein:
We strongly suggest that you provide as much of the above information as possible to ensure the most complete annotation of your sequence. If any of this information is not known, please inform us.

Example:

Escherichia coli RecA protein (recA) gene, complete cds.

                 source          1..3300
                                 /organism="Escherichia coli"
                                 /strain="K-12"
                                 
                 gene            783..1961
                                 /gene="recA"
                                 
                 CDS             783..1961
                                 /gene="recA"
                                 /function="DNA repair protein"
                                 /product="RecA protein"
          

Eukaryotic gene

Relevant feature information for a eukaryotic genomic sequence encoding a protein:
We strongly suggest that you provide as much of the above information as possible to ensure the most complete annotation of your sequence. If any of this information is not known, please inform us.

Example:

Caenorhabditis elegans tyrosine kinase PTK-2 (ptk-2) gene, complete cds.

                 source          1..3180
                                 /organism="Caenorhabditis elegans"

                 gene            211..3011
                                 /gene="ptk-2"
                                 
                 mRNA            join(211..288,533..703,763..890,940..1024,
                                 1084..1380,1838..1962,2018..2099,2301..3011)
                                 /gene="ptk-2"
                                 /product="protein kinase PTK-2"
                                 
                 CDS             join(250..288,533..703,763..890,940..1024,
                                 1084..1380,1838..1962,2018..2099,2301..2456)
                                 /gene="ptk-2"
                                 /product="protein kinase PTK-2"               
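
For a spliced feature, the join() location lists each exon segment in order. As a rough illustration (the helper names parse_join and introns are hypothetical, and the regular expression handles only exact coordinates without '<' or '>' markers), the segments of the CDS above can be summed and the introns derived from the gaps between them:

```python
import re


def parse_join(location: str) -> list[tuple[int, int]]:
    """Split 'join(a..b,c..d,...)' (or a bare 'a..b') into (start, end) pairs."""
    pairs = re.findall(r"(\d+)\.\.(\d+)", location)
    return [(int(start), int(end)) for start, end in pairs]


def introns(segments):
    """Return the gaps between consecutive segments as (start, end) pairs."""
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(segments, segments[1:])]


cds = parse_join("join(250..288,533..703,763..890,940..1024,"
                 "1084..1380,1838..1962,2018..2099,2301..2456)")
coding_bases = sum(end - start + 1 for start, end in cds)
print(coding_bases, coding_bases % 3)  # 1083 0
print(introns(cds)[0])                 # (289, 532)
```

The same helpers work on the mRNA location, which shares its internal exon boundaries with the CDS and differs only at the two ends.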
         

rRNA and/or ITS

Relevant feature information for a genomic sequence containing structural RNAs and/or spacers:
We strongly suggest that you provide as much of the above information as possible to ensure the most complete annotation of your sequence. If any of this information is not known, please inform us.

Example:

Saccharomyces cerevisiae 18S ribosomal RNA gene, partial sequence; internal transcribed spacer 1, 5.8S ribosomal RNA gene and internal transcribed spacer 2, complete sequence; and 28S ribosomal RNA gene, partial sequence.

                 source          1..540
                                 /organism="Saccharomyces cerevisiae"
                                 /strain="UMD 334"

                 rRNA            <1..5
                                 /product="18S ribosomal RNA"
                                 
                 misc_RNA        6..178
                                 /product="internal transcribed spacer 1"
                                 
                 rRNA            179..377
                                 /product="5.8S ribosomal RNA"
                                 
                 misc_RNA        378..519
                                 /product="internal transcribed spacer 2"
                                 
                 rRNA            520..>540
                                 /product="28S ribosomal RNA"
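
In this example the rRNA and spacer features tile the sequence end to end, with '<' and '>' marking features that extend beyond the sequenced region. A small sketch of how that tiling could be checked follows; the function names are hypothetical and only simple 'a..b' locations are supported.

```python
import re

# Feature keys and locations copied from the example above.
FEATURES = [
    ("rRNA", "<1..5"),
    ("misc_RNA", "6..178"),
    ("rRNA", "179..377"),
    ("misc_RNA", "378..519"),
    ("rRNA", "520..>540"),
]


def bounds(location):
    """Return (start, end) for 'a..b', ignoring '<' and '>' partial markers."""
    match = re.fullmatch(r"<?(\d+)\.\.>?(\d+)", location)
    if match is None:
        raise ValueError(f"unsupported location: {location!r}")
    return int(match.group(1)), int(match.group(2))


def tiling_problems(features, seq_length):
    """List places where consecutive features leave a gap or overlap."""
    problems = []
    expected_start = 1
    for key, location in features:
        start, end = bounds(location)
        if start != expected_start:
            problems.append(f"{key} {location}: expected start {expected_start}")
        expected_start = end + 1
    if expected_start != seq_length + 1:
        problems.append(f"features end at {expected_start - 1}, not {seq_length}")
    return problems


print(tiling_problems(FEATURES, 540))  # prints [] for the example above
```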

            

Promoter region

Relevant feature information for promoter, genomic 5' flanking sequence, or genomic 3' flanking sequence:
We strongly suggest that you provide as much of the above information as possible to ensure the most complete annotation of your sequence. If any of this information is not known, please inform us.

Example:

Homo sapiens enhancer-binding protein 2 (EBP2) gene, promoter region and partial cds.
  
              source          1..3061
                              /organism="Homo sapiens"
                              /chromosome="15"
                              /map="15q13"
                              /cell_line="H441"
                              /tissue_type="lung"
                             
              gene            1..>3061
                              /gene="EBP2"
                              
              promoter        1..2947
                              /gene="EBP2"
                              
              TATA_signal     2918..2923
                              /gene="EBP2"
                             
              mRNA            2948..>3061
                              /gene="EBP2"
                              /product="enhancer-binding protein 2"
                             
              5'UTR           2948..3010
                              /gene="EBP2"
                             
              CDS             3011..>3061
                              /gene="EBP2"
                              /product="enhancer-binding protein 2"
         

Viral sequence

Relevant feature information for a viral sequence:
We strongly suggest that you provide as much of the above information as possible to ensure the most complete annotation of your sequence. If any of this information is not known, please inform us.

Example:

Human adenovirus 3 strain RKI-4263/07 hexon (H) gene, partial cds.  

           source          1..1520
                           /organism="Human adenovirus 3"
                           /mol_type="genomic DNA"
                           /strain="RKI-4263/07"
                           /serotype="3"
                           /host="Homo sapiens"
                           /db_xref="taxon:45659"
                           /country="Germany"
                           /collection_date="Apr-2007"

           gene            <1..>1520
                           /gene="H"
                           
           CDS             <1..>1520
                           /note="major capsid protein"
                           /codon_start=1
                           /product="hexon"
        

HIV-1

Relevant feature information for an HIV-1 sequence:

We strongly suggest that you provide as much of the above information as possible to ensure the most complete annotation of your sequence. If any of this information is not known, please inform us.

Example:

HIV-1 isolate X clone 5601 from USA, complete genome.

               source          1..9720
                               /organism="Human immunodeficiency virus type 1"
                               /clone="5601"
                               /isolate="X"
                               /country="USA"

               LTR             1..634

               gene            789..2291
                               /gene="gag"

               CDS             789..2291
                               /gene="gag"
                               /product="gag protein"

               gene            2084..5095
                               /gene="pol"
                               
               CDS             2084..5095
                               /gene="pol"
                               /product="pol protein"

               gene            5040..5618
                               /gene="vif"
                               
               CDS             5040..5618
                               /gene="vif"
                               /product="vif protein"

               gene            5558..5848
                               /gene="vpr"

               CDS             5558..5848
                               /gene="vpr"
                               /product="vpr protein"

               gene            5829..8476
                               /gene="tat"

               CDS             join(5829..6043,8386..8476)
                               /gene="tat"
                               /product="tat protein"

               gene            5968..8660
                               /gene="rev"

               CDS             join(5968..6043,8386..8660)
                               /gene="rev"
                               /product="rev protein"

               gene            6060..6305
                               /gene="vpu"
                               
               CDS             6060..6305
                               /gene="vpu"
                               /product="vpu protein"

               gene            6223..8802
                               /gene="env"
                               /pseudo
                               
               gene            8804..9070
                               /gene="nef"
                               
               CDS             8804..9070
                               /gene="nef"
                               /product="nef protein"

               LTR             9086..9719
               
               polyA_signal    9612..9617
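
Several gene spans in the HIV-1 example overlap, which is expected for this genome. The sketch below lists the overlapping pairs; the GENES table is copied from the gene features above, and the overlap helper is a hypothetical name. It compares whole gene spans, so for tat and rev the intron between the two exons is included.

```python
from itertools import combinations

# Gene spans (start, end) copied from the gene features in the example.
GENES = {
    "gag": (789, 2291),
    "pol": (2084, 5095),
    "vif": (5040, 5618),
    "vpr": (5558, 5848),
    "tat": (5829, 8476),
    "rev": (5968, 8660),
    "vpu": (6060, 6305),
    "env": (6223, 8802),
    "nef": (8804, 9070),
}


def overlap(a, b):
    """Return the number of bases shared by two inclusive (start, end) spans."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


for (name1, span1), (name2, span2) in combinations(GENES.items(), 2):
    shared = overlap(span1, span2)
    if shared:
        print(f"{name1}/{name2}: {shared} bases shared")
```

For instance, gag and pol share positions 2084..2291 (208 bases), while env and nef do not overlap at all.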

          

Influenza viruses

Relevant feature information for Influenza sequences:

For Influenza A and B submissions, use the Influenza Virus Resource Annotation webtool to create a feature table: http://www.ncbi.nlm.nih.gov/genomes/FLU/Database/annotation.cgi

Example:

Influenza A virus (A/Wisconsin/28/2011(H1N1)) segment 8 nuclear export protein (NEP) and nonstructural protein 1 (NS1) genes, complete cds.

             source          1..864
                             /organism="Influenza A virus (A/Wisconsin/28/2011(H1N1))"
                             /mol_type="viral cRNA"
                             /strain="A/Wisconsin/28/2011"
                             /serotype="H1N1"
                             /host="Homo sapiens"
                             /segment="8"
                             /country="USA"
                             /collection_date="01-Dec-2011"
                             /note="C1 passage(s)"

             gene            1..838
                             /gene="NEP"                
                             /gene_synonym="NS2"
                           
             CDS             join(1..30,503..838)
                             /gene="NEP"
                             /note="nonstructural protein 2"
                             /product="nuclear export protein"

             gene            1..660
                             /gene="NS1"                
                           
             CDS             1..660
                             /gene="NS1"
                             /product="nonstructural protein 1"
                           
          

Transposon or insertion sequence

Relevant feature information for transposons or insertion sequences:
We strongly suggest that you provide as much of the above information as possible to ensure the most complete annotation of your sequence. If any of this information is not known, please inform us.

Example:

Bacillus subtilis strain RS2 transposon BLT transposase (tnpA) gene, complete cds.

             source          1..1221
                             /organism="Bacillus subtilis"
                             /strain="RS2"

             repeat_region   21..1127
                             /rpt_type=dispersed
                             /mobile_element="transposon: BLT"

             repeat_region   21..61
                             /rpt_type=inverted
                           
             gene            128..1034
                             /gene="tnpA"                
                           
             CDS             128..1034
                             /gene="tnpA"
                             /product="transposase"
                           
             repeat_region   1085..1127
                             /rpt_type=inverted

        

Microsatellite sequence

Relevant feature information for a microsatellite sequence:
We strongly suggest that you provide as much of the above information as possible to ensure the most complete annotation of your sequence. If any of this information is not known, please inform us.

Example #1:

Chorthippus parallelus clone IIB-G5 microsatellite sequence.

             source          1..288
                             /organism="Chorthippus parallelus"
                             /mol_type="genomic DNA"
                             /db_xref="taxon:37639"
                             /clone="IIB-G5"

             repeat_region   1..288
                             /rpt_type=tandem
                             /satellite="microsatellite"
        

Example #2:

Noturus exilis voucher KU 40271 microsatellite Noex254 sequence.

             source          1..556
                             /organism="Noturus exilis"
                             /mol_type="genomic DNA"
                             /specimen_voucher="KU 40271"
                             /db_xref="taxon:61323"
                             /clone="Noex_02_03_H06"
                             /PCR_primers="fwd_seq: catgtttgcacaaagggaaa, rev_seq:
                             atgtggatgcagattgtgga"

             repeat_region   77..100
                             /rpt_type=tandem
                             /rpt_unit_range=77..100
                             /rpt_unit_seq="ca"
                             /satellite="microsatellite:Noex254"
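
In Example #2 the repeat region 77..100 spans 24 bases, which corresponds to 12 copies of the "ca" unit. The following sketch shows one way such a run could be located in a sequence before annotating it. The function name longest_tandem_run and the toy sequence are illustrative assumptions, not the actual Noturus exilis data.

```python
import re


def longest_tandem_run(sequence, unit):
    """Return (start, end, copies) of the longest run of `unit`, or None.

    Coordinates are 1-based and inclusive, as in a feature table.
    """
    pattern = re.compile(f"(?:{re.escape(unit)})+", re.IGNORECASE)
    best = max(pattern.finditer(sequence),
               key=lambda m: m.end() - m.start(),
               default=None)
    if best is None:
        return None
    copies = (best.end() - best.start()) // len(unit)
    return best.start() + 1, best.end(), copies


# Toy sequence: 3 flanking bases, 12 copies of "ca", 3 flanking bases.
toy = "ggt" + "ca" * 12 + "ttg"
print(longest_tandem_run(toy, "ca"))  # (4, 27, 12)
```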
        

Repeat regions

Relevant feature information for sequences containing repeat regions:
We strongly suggest that you provide as much of the above information as possible to ensure the most complete annotation of your sequence. If any of this information is not known, please inform us.

Example:

Homo sapiens repeat regions

             source          1..2050
                             /organism="Homo sapiens"
                             /chromosome="6"
                             /map="6q25"
                             
             repeat_region   8..126
                             /rpt_type=dispersed
                             /rpt_family="B2" 
                                               
             repeat_region   197..344
                             /rpt_type=direct
                             /rpt_unit_range=197..220
                                                  
             repeat_region   389..673
                             /rpt_family="AluSx"
                             /rpt_type=dispersed
                             
             repeat_region   847..876
                             /rpt_type=tandem
                             /rpt_unit_seq="ca"
                             /satellite="microsatellite:BT21"
                             
        

Pseudogene

Relevant feature information for a pseudogene sequence:
We strongly suggest that you provide as much of the above information as possible to ensure the most complete annotation of your sequence. If any of this information is not known, please inform us.

Example:

Mus musculus DNA methyltransferase (Dmt1) pseudogene, complete sequence.

             source          1..2131
                             /organism="Mus musculus"
                             /strain="SvJ/129"
                             
             gene            123..1444
                             /gene="Dmt1"
                             /note="DNA methyltransferase 1"
                             /pseudo
        

Translocation and/or fusion protein

Relevant feature information for a sequence resulting from a chromosomal translocation: if the translocation results in a fusion protein, please include:
We strongly suggest that you provide as much of the above information as possible to ensure the most complete annotation of your sequence. If any of this information is not known, please inform us.

Example:

Homo sapiens SYT/SSX4 fusion protein mRNA, complete cds.

             source          1..2935
                             /organism="Homo sapiens"
                             /tissue_type="sarcoma"
                             /map="t(18;X)(q11.2;p11.2)"

             source          1..1242
                             /organism="Homo sapiens"
                             /chromosome="18"
                             /map="18q11.2"

             CDS             1..1479
                             /product="SYT/SSX4 fusion protein"

             source          1243..2935
                             /organism="Homo sapiens"
                             /chromosome="X"
                             /map="Xp11.2"

             3'UTR           1480..2935
        

Cloning vector

Relevant feature information for a cloning vector:
We strongly suggest that you provide as much of the above information as possible to ensure the most complete annotation of your sequence. If any of this information is not known, please inform us.

Example:

Cloning vector pRB223, complete sequence.

             source          1..4361
                             /organism="Cloning vector pRB223"
                             
             gene            86..1276
                             /gene="tet"
                             
             CDS             86..1276
                             /gene="tet"
                             /product="tetracycline resistance protein"

             RBS             1905..1909
                             /note="Shine-Dalgarno sequence"
                             
             rep_origin      2535
                             
             gene            complement(3293..4194)
                             /gene="bla"
                             
             CDS             complement(3293..4153)
                             /gene="bla"
                             /product="beta-lactamase"

             misc_feature    4069..4125
                             /note="multiple cloning site"

             RBS             complement(4161..4165)
                             /gene="bla"     
                             /note="Shine-Dalgarno sequence"

             promoter        complement(4188..4194)
                             /gene="bla"
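
Features written with complement() lie on the opposite strand, so their sequence is the reverse complement of the bases at those coordinates. A minimal sketch of extracting such a feature follows. The function name extract is hypothetical, the toy sequence is made up for illustration, and only simple 'a..b' and 'complement(a..b)' locations are handled.

```python
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def extract(sequence, location):
    """Return the bases for 'a..b' or 'complement(a..b)' (1-based, inclusive)."""
    on_complement = location.startswith("complement(") and location.endswith(")")
    if on_complement:
        location = location[len("complement("):-1]
    start, end = (int(part) for part in location.split(".."))
    piece = sequence[start - 1:end]
    if on_complement:
        # Complement each base, then reverse to read 5' to 3' on the other strand.
        piece = piece.translate(COMPLEMENT)[::-1]
    return piece


toy = "aaacatggg"
print(extract(toy, "4..6"))              # cat
print(extract(toy, "complement(4..6)"))  # atg
```

On the complement strand, higher coordinates are upstream. That is why the bla promoter (4188..4194) and RBS (4161..4165) have larger coordinates than the start of the bla CDS at 4153.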
        

Gapped sequence

A gapped sequence includes both known, directly sequenced data and unknown data. The unknown sections of sequence are represented by strings of 'nnn' between the known, directly sequenced, contiguous data. All pieces of a gapped sequence must be from the same source and be in the same orientation and in the correct order.
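
As an illustration of how the runs of 'n' relate to gap features, the sketch below scans a sequence for such runs and reports their coordinates and lengths. The function name gap_features and the min_run threshold are assumptions made for this example, and the sequence is a toy one.

```python
import re


def gap_features(sequence, min_run=3):
    """Return (start, end, length) for each run of 'n' of at least min_run bases.

    Coordinates are 1-based and inclusive, as in a feature table.
    """
    pattern = re.compile(r"n{%d,}" % min_run, re.IGNORECASE)
    return [(m.start() + 1, m.end(), m.end() - m.start())
            for m in pattern.finditer(sequence)]


toy = "acgt" + "n" * 10 + "ttga"
for start, end, length in gap_features(toy):
    print(f"gap  {start}..{end}  /estimated_length={length}")
# gap  5..14  /estimated_length=10
```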

Relevant feature information for a gapped sequence:
We strongly suggest that you provide as much of the above information as possible to ensure the most complete annotation of your sequence. If any of this information is not known, please inform us.

Example:

Homo sapiens MHC class I antigen (HLA-B) gene, HLA-B_458_01445 allele, exons 2, 3 and partial cds.

         source          1..788
                         /organism="Homo sapiens"
                         /mol_type="genomic DNA"
                         /db_xref="taxon:9606"

         gene            <1..>788
                         /gene="HLA-B"
                         /allele="HLA-B_458_01445"

         mRNA            join(<1..270,513..>788)
                         /gene="HLA-B"
                         /allele="HLA-B_458_01445"
                         /product="MHC class I antigen"

         CDS             join(<1..270,513..>788)
                         /gene="HLA-B"
                         /allele="HLA-B_458_01445"
                         /codon_start=3
                         /product="MHC class I antigen"
                         /protein_id="ACR38915.1"
                         /db_xref="GI:238055051"
                         /translation="SHSMRYFDTAMSRPGRGEPRFISVGYVDDTQFVRFDSDAASPRE
                         EPRAPWIEQEGPEYWDRNTQIFKTNTQTDRESLRNLRGYYNQSEAGSHTLQSMYGCDV
                         GPDGRLLRGHDQSAYDGKDYIALNEDLRSWTAADTAAQITQRKWEAARVAEQDRAYLE
                         GTCVEWLRRYLENGKDTLERA"

         exon            1..270
                         /gene="HLA-B"
                         /allele="HLA-B_458_01445"
                         /number=2

         gap             271..512
                         /estimated_length=242

         exon            513..788
                         /gene="HLA-B"
                         /allele="HLA-B_458_01445"
                         /number=3
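
The /codon_start=3 qualifier in the CDS above indicates that translation of this partial CDS begins at the third base of the feature. A short sketch of the resulting arithmetic follows; the function name complete_codons is hypothetical. The segments are the two joined CDS intervals, and the gap between them is not part of the CDS.

```python
def complete_codons(segments, codon_start):
    """Return (codons, leftover_bases) for a CDS given its reading-frame offset."""
    total = sum(end - start + 1 for start, end in segments)
    # codon_start is 1, 2, or 3; skip the bases before the first full codon.
    usable = total - (codon_start - 1)
    return divmod(usable, 3)


codons, leftover = complete_codons([(1, 270), (513, 788)], codon_start=3)
print(codons, leftover)  # 181 1
```

The 181 complete codons agree with the length of the /translation shown in the example; the one leftover base belongs to a codon that continues beyond the 3' end of the partial CDS.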
      

Phylogenetic or population set

Relevant feature information for population or phylogenetic studies:

A set comprises a group of sequences that represent the same gene or locus in different organisms or in different isolates, strains, or clones of the same organism. A set can be, for example, phylogenetic (different organisms), population (same organism), or environmental (unclassified or unknown organisms).


We strongly suggest that you provide as much of the above information as possible to ensure the most complete annotation of your sequence. If any of this information is not known, please inform us.

EST submissions

Please submit directly to dbEST: the EST division of GenBank.

GSS submissions

Please submit directly to dbGSS: the GSS division of GenBank.

STS submissions

Relevant feature information for STS submissions:
We strongly suggest that you provide as much of the above information as possible to ensure the most complete annotation of your sequence. If any of this information is not known, please inform us.

HTGS submissions

Requirements for HTGs submissions:

FLICs submissions

Relevant feature information for FLIC submissions:
We strongly suggest that you provide as much of the above information as possible to ensure the most complete annotation of your sequence. If any of this information is not known, please inform us.