Modifiers for FASTA Definition Lines

General Format

Source information contained within FASTA definition lines can be assigned automatically to the appropriate feature or descriptor by Sequin or tbl2asn. Listed below are the currently available modifiers. You may include as many modifiers as you like, but each must be enclosed in its own pair of square brackets. The name of the modifier must be written exactly as shown in the list below. An example of a string of modifiers is:

[organism=Mus musculus] [strain=BALB/c] [chromosome=5] [sex=male] [tissue-type=testis] [moltype=mRNA]

Do not insert hard returns between the bracketed modifiers: the FASTA definition line must be a single line of text. If you have trouble importing your FASTA sequences, please confirm that your editing software has not inserted a hard return.
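As an illustration of the rules above, the following Python sketch assembles a definition line from a set of modifiers and rejects values that contain a hard return. The function name `build_defline` and the sequence identifier `Seq1` are hypothetical and are not part of Sequin or tbl2asn; the sketch does not check modifier names against the list below.

```python
def build_defline(seq_id, modifiers):
    """Return a single-line FASTA definition line with bracketed modifiers.

    seq_id and the function itself are illustrative assumptions, not an NCBI API.
    """
    parts = [">" + seq_id]
    for name, value in modifiers.items():
        text = str(value)
        # The definition line must be one line of text, so reject hard returns.
        if "\n" in text or "\r" in text:
            raise ValueError(f"hard return in value for modifier {name!r}")
        # Each modifier is enclosed in its own pair of brackets.
        parts.append(f"[{name}={text}]")
    return " ".join(parts)


defline = build_defline(
    "Seq1",  # hypothetical sequence identifier
    {
        "organism": "Mus musculus",
        "strain": "BALB/c",
        "chromosome": 5,
        "sex": "male",
        "tissue-type": "testis",
        "moltype": "mRNA",
    },
)
print(defline)
# >Seq1 [organism=Mus musculus] [strain=BALB/c] [chromosome=5] [sex=male] [tissue-type=testis] [moltype=mRNA]
```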

Source Modifier List

Descriptions of these modifiers can be found in the Sequin help documentation. These source modifiers should be used in the format

    [modifier=text]

  • acronym
  • anamorph
  • authority
  • bio-material
  • biotype
  • biovar
  • breed
  • cell-line
  • cell-type
  • chemovar
  • chromosome
  • clone
  • clone-lib
  • collected-by
  • collection-date
  • common
  • country
  • cultivar
  • culture-collection
  • dev-stage
  • ecotype
  • endogenous-virus-name
  • forma
  • forma-specialis
  • fwd-PCR-primer-name
  • fwd-PCR-primer-seq
  • genotype
  • group
  • haplogroup
  • haplotype
  • host
  • identified-by
  • isolate
  • isolation-source
  • lab-host
  • lat-lon
  • linkage-group
  • map
  • note
  • organism
  • pathovar
  • plasmid-name
  • plastid-name
  • pop-variant
  • rev-PCR-primer-name
  • rev-PCR-primer-seq
  • segment
  • serogroup
  • serotype
  • serovar
  • sex
  • specimen-voucher
  • strain
  • sub-species
  • subclone
  • subgroup
  • substrain
  • subtype
  • synonym
  • teleomorph
  • tissue-lib
  • tissue-type
  • type
  • variety

The culture-collection modifier must use the format "institution code:collection code:culture_id", although the collection code may be omitted. Specimen-voucher and bio-material have optional structured formats.
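A structured culture-collection value can be checked before it is written to the definition line. The sketch below is a minimal example that assumes the fields are separated by colons and that omitting the collection code leaves a two-field value ("institution code:culture_id"). The function name and the sample values are hypothetical, and the check does not verify that the institution code is a registered one.

```python
def parse_culture_collection(value):
    """Split a culture-collection value into (institution, collection, culture_id).

    Assumption: a value without a collection code has exactly two
    colon-separated fields; collection is then returned as None.
    """
    parts = [p.strip() for p in value.split(":")]
    if len(parts) == 3:
        institution, collection, culture_id = parts
    elif len(parts) == 2:
        institution, culture_id = parts
        collection = None
    else:
        raise ValueError(f"unexpected culture-collection format: {value!r}")
    # The institution code and the culture id are always required.
    if not institution or not culture_id:
        raise ValueError(f"missing institution code or culture id: {value!r}")
    return institution, collection, culture_id


# Sample values are illustrative only.
print(parse_culture_collection("ATCC:26370"))
print(parse_culture_collection("XYZ:Fungi:123"))
```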

Other modifiers do not take any submitter-provided text. The format for these modifiers in the FASTA definition line is

[modifier= ]
or
[modifier=TRUE]

Modifiers using this format are

  • environmental-sample
  • germline
  • metagenomic
  • rearranged
  • transgenic
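To show how the two modifier formats differ, the following sketch reads the bracketed modifiers back out of a definition line and treats the five text-free modifiers as flags. The regular expression, the function name, and the choice to reject any flag value other than an empty string or TRUE are assumptions made for this example; they do not describe how Sequin or tbl2asn parse the line.

```python
import re

# Modifiers that take no submitter-provided text (from the list above).
FLAG_MODIFIERS = {
    "environmental-sample",
    "germline",
    "metagenomic",
    "rearranged",
    "transgenic",
}

# Matches one "[name=value]" group; the value may be empty or blank.
MODIFIER_RE = re.compile(r"\[([^=\[\]]+)=([^\[\]]*)\]")


def parse_modifiers(defline):
    """Return a dict of modifier name -> text, or True for flag modifiers."""
    result = {}
    for name, value in MODIFIER_RE.findall(defline):
        name = name.strip()
        value = value.strip()
        if name in FLAG_MODIFIERS:
            # "[name= ]" and "[name=TRUE]" both mark the flag as set.
            # Rejecting any other value is an assumption of this sketch.
            if value not in ("", "TRUE"):
                raise ValueError(f"{name} takes no text, got {value!r}")
            result[name] = True
        else:
            result[name] = value
    return result


mods = parse_modifiers(">Seq1 [organism=Mus musculus] [germline= ] [transgenic=TRUE]")
print(mods)
```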

Descriptors with Controlled Vocabulary

Many of the descriptors that refer to the molecule sequenced and the genetic code can be edited using the FASTA definition line. In all cases, these descriptors have a controlled vocabulary and should be added only when their values differ from the default value.

  • The default molecule type is genomic DNA. If the submission was derived from mRNA, you can change from the default in the Organism and Sequences page during submission with Sequin or you can add this information to the FASTA definition line. When using tbl2asn to submit an mRNA sequence, you must specify the molecule type in the FASTA definition line. To do this, use

    [moltype=mRNA]

  • To specify a genetic code for an organism that is not yet listed in the NCBI Taxonomy Browser, you can use the modifiers "gcode" or "mgcode" in the FASTA definition line. The inclusion of mgcode is only necessary if the sequence is derived from the mitochondrion. In both cases, the value of the modifier must be the integer assigned to the appropriate genetic code. For example,

    [gcode=1]
    or
    [mgcode=5]

    would set the nuclear genetic code to "The Standard Code" (translation table 1) or the mitochondrial genetic code to "The Invertebrate Mitochondrial Code" (translation table 5).
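Because the genetic code modifiers accept only an integer, the value can be validated before the modifier is written. The sketch below is a minimal example; the function name is hypothetical, and the check only confirms that the value is a positive integer, not that it is a translation table number actually assigned by NCBI.

```python
def genetic_code_modifier(table, mitochondrial=False):
    """Format a gcode or mgcode modifier for a FASTA definition line.

    table is the integer of the translation table; mitochondrial selects
    mgcode instead of gcode. Illustrative helper, not an NCBI API.
    """
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(table, bool) or not isinstance(table, int) or table < 1:
        raise ValueError("genetic code must be a positive integer")
    name = "mgcode" if mitochondrial else "gcode"
    return f"[{name}={table}]"


print(genetic_code_modifier(1))                      # [gcode=1]
print(genetic_code_modifier(5, mitochondrial=True))  # [mgcode=5]
```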

     


    Revised August 26, 2010.