DOE Genomes - Human Genome Project Information


How Many Genes Are in the Human Genome?


Although the completion of the Human Genome Project was celebrated in April 2003 and sequencing of the human chromosomes is essentially "finished," the exact number of genes encoded by the genome is still unknown. October 2004 findings from The International Human Genome Sequencing Consortium, led in the United States by the National Human Genome Research Institute (NHGRI) and the Department of Energy (DOE), reduce the estimated number of human protein-coding genes from 35,000 to only 20,000-25,000, a surprisingly low number for our species (7). Consortium researchers have confirmed the existence of 19,599 protein-coding genes in the human genome and identified another 2,188 DNA segments that are predicted to be protein-coding genes.
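Adding the confirmed and predicted counts quoted above shows where they fall relative to the revised estimate. This is only a check on the figures given here. It is not presented as the Consortium's own derivation of the 20,000-25,000 range:

```latex
19{,}599 + 2{,}188 = 21{,}787, \qquad 20{,}000 \le 21{,}787 \le 25{,}000
```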

In 2003, estimates from gene-prediction programs suggested there might be 24,500 or fewer protein-coding genes (1). The Ensembl genome-annotation system estimates them at 23,299.

When the International Human Genome Sequencing Consortium published its analysis of the draft human genome sequence on February 15, 2001, the paper estimated only about 30,000 to 40,000 protein-coding genes, much lower than previous estimates of about 100,000. This lower estimate came as a shock to many scientists because counting genes was viewed as a way of quantifying genetic complexity. At about 30,000, the human gene count would be only about one and a half times that of the simple roundworm C. elegans, which has about 20,000 genes (2).

Studies since the publication of the draft genome sequence have generated widely different estimates. An analysis by scientists at Ohio State University suggested between 65,000 and 75,000 human genes (3), and another study published in Cell in August 2001 predicted a total of 42,000 (4).

Although the exact number of human genes is still uncertain, a winner of GeneSweep was announced in May 2003. GeneSweep was an informal gene-count betting pool that began at the 2000 Cold Spring Harbor Laboratory Genome Meeting. Bets ranged from around 26,000 to more than 150,000 genes. Since most gene-prediction programs were estimating the number of protein-coding genes at fewer than 30,000, GeneSweep officials decided to declare the contestant with the lowest bet (25,947, by Lee Rowen of the Institute for Systems Biology in Seattle) the winner (1).

It could be years before a truly reliable gene count can be established. Much of the uncertainty arises because predictions are derived from different computational methods and gene-finding programs. Some programs detect genes by looking for distinct patterns that define where a gene begins and ends ("ab initio" gene finding). Other programs look for genes by comparing segments of sequence with those of known genes and proteins (comparative gene finding). Ab initio gene finding tends to overestimate gene numbers because it counts any segment that looks like a gene, whereas comparative gene finding tends to underestimate them because it can recognize only genes similar to those scientists have seen before. Defining a gene is itself problematic: small genes can be difficult to detect, one gene can code for several protein products, some genes code only for RNA, two genes can overlap, and there are many other complications (5).
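As a rough illustration of the two strategies described above, the Python sketch below contrasts them on a made-up sequence. It is a toy that rests on several stated assumptions. "Ab initio" finding is reduced to scanning one DNA strand for open reading frames (an ATG start codon running to an in-frame stop codon), ignoring introns and splicing. "Comparative" finding is reduced to keeping only those frames that share a short subsequence (a k-mer) with a hypothetical list of known genes. Real gene-prediction programs rely on statistical models, splice-site signals, and protein alignments. The function names (`find_orfs`, `shares_kmer`, `comparative_hits`), the `min_codons` and `k` thresholds, and the sample sequences are illustrative inventions, not taken from any actual program.

```python
# Toy contrast between "ab initio" and comparative gene finding.
# Simplifying assumptions: forward strand only, no introns or splicing,
# and a gene is modeled as a single ATG...stop open reading frame (ORF).

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> list:
    """Ab initio-style scan: return (start, end) pairs for every ORF that
    begins with ATG and runs to the first in-frame stop codon, keeping only
    ORFs with at least `min_codons` codons (counting the ATG) before the stop.
    `end` is exclusive and includes the stop codon."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):  # check all three forward reading frames
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate gene at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close the frame and look for the next ATG
    return sorted(orfs)


def shares_kmer(seq: str, known: str, k: int) -> bool:
    """True if `seq` and `known` share at least one length-k substring."""
    known_kmers = {known[i:i + k] for i in range(len(known) - k + 1)}
    return any(seq[i:i + k] in known_kmers for i in range(len(seq) - k + 1))


def comparative_hits(dna: str, orfs: list, known_genes: list, k: int = 6) -> list:
    """Comparative-style filter: keep only ORFs that resemble a known gene,
    here crudely defined as sharing at least one k-mer with it."""
    return [(s, e) for s, e in orfs
            if any(shares_kmer(dna[s:e], g, k) for g in known_genes)]


if __name__ == "__main__":
    # Hypothetical sequence containing two ORFs; only the first resembles
    # the hypothetical "known" gene below.
    dna = "ATGAAACCCGGGTAA" + "ATGTTTGGGCCCTGA"
    known = ["AAACCCGGG"]
    candidates = find_orfs(dna)
    print(candidates)                                 # [(0, 15), (15, 30)]
    print(comparative_hits(dna, candidates, known))  # [(0, 15)]
```

On this input the ORF scan reports two candidate genes, while the comparative filter keeps only the one resembling the known sequence. In miniature, this mirrors the tendency of ab initio methods to count more candidates and of comparative methods to miss genes unlike anything seen before.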

Even with improved genome analysis, computation alone is simply not enough to generate an accurate gene number. Clearly, gene predictions will have to be verified by labor-intensive work in the laboratory before the scientific community can reach any real consensus (6).


Related Websites

NCBI Human Genome - Release notes for the most current build of the human genome from the National Center for Biotechnology Information (NCBI) used in its genome browser called Map Viewer.

UCSC Human Genome Browser Gateway - Genome browser maintained by the Genome Bioinformatics Group of the University of California, Santa Cruz. Human genome data based on the most recent build available from NCBI.

Ensembl Human Genome - The most current human genome release available from the European Bioinformatics Institute's human genome browser. The Ensembl release is derived from the NCBI human genome build.


Related Articles

Nature Web Focus: The Human Genome - Nature Publishing Group maintains this website that links to scientific articles reporting the finished genome sequence for each human chromosome.

Updated Summaries of Public Draft of Human Genome Sequence

  • International Human Genome Sequencing Consortium. 2004. "Finishing the Euchromatic Sequence of the Human Genome," Nature 431, 931-945. Available online.
  • Schmutz, J., et al. 2004. "Human Genome: Quality Assessment of the Human Genome Sequence," Nature 429, 365-368. Available online.

Summary of Public Draft of Human Genome Sequence
Lander, E., et al. 2001. "Initial Sequencing and Analysis of the Human Genome," Nature 409, 860-921. Available online.

Summary of Celera's Draft of Human Genome Sequence
Venter, J. Craig, et al. 2001. "The Sequence of the Human Genome," Science 291, 1304-1351. Available online.

"The Nature of the Number," 2000. Nature Genetics 25, 127-128. (Editorial).

Aparicio, S. 2000. "How to Count...Human Genes," Nature Genetics 25, 129-130.

Ewing, B. and P. Green. 2000. "Analysis of Expressed Sequence Tags Indicates 35,000 Human Genes," Nature Genetics 25, 232-234.

Crollius, H. R., et al. 2000. "Estimate of Human Gene Number Provided by Genome-Wide Analysis Using Tetraodon nigroviridis DNA Sequence," Nature Genetics 25, 235-238.

Liang, F., et al. 2000. "Gene Index Analysis of the Human Genome Estimates Approximately 120,000 Genes," Nature Genetics 25, 239-240.


References

  1. Pennisi, E. 2003. "A Low Number Wins the GeneSweep Pool," Science 300, 1484.
  2. Claverie, J. 2001. "Gene Number. What if There are Only 30,000 Human Genes?" Science 291, 1255-1257.
  3. Briggs, H. 2001. "Dispute Over Number of Human Genes," BBC News Online.
  4. Wright, F., et al. 2001. "A Draft Annotation and Overview of the Human Genome," Genome Biology 2, 1-18.
  5. Pennisi, E. 2003. "Gene Counters Struggle to Get the Right Answer," Science 301, 1040-1041.
  6. Hollon, T. 2001. "Human Genes: How Many?" The Scientist 15, 1.
  7. Stein, L. D. 2004. "Human Genome: End of the Beginning," Nature 431, 915-916. Available online.


Last modified: Friday, September 19, 2008


Document Use and Credits
Publications and webpages on this site were created by the U.S. Department of Energy Genome Program's Biological and Environmental Research Information System (BERIS). Permission to use these documents is not needed, but please credit the U.S. Department of Energy Genome Programs and provide the website http://genomics.energy.gov. All other materials were provided by third parties and not created by the U.S. Department of Energy. You must contact the person listed in the citation before using those documents.

Base URL: www.ornl.gov/hgmis

Site sponsored by the U.S. Department of Energy Office of Science, Office of Biological and Environmental Research, Human Genome Program