Cancer Genetics Overview (PDQ®)

  • Last Modified: 08/08/2012

Genome-wide Association Studies (GWAS)

GWAS are showing great promise in identifying common, low-penetrance susceptibility alleles for many complex diseases,[1] including cancer. This approach can be contrasted with linkage analysis, which searches for genetic-risk variants cosegregating within families that have a high prevalence of disease. While linkage analyses are designed to uncover rare, highly penetrant variants that segregate in predictable inheritance patterns (e.g., autosomal dominant, autosomal recessive, X-linked, and mitochondrial), GWAS are best suited to identify multiple, common, low-penetrance genetic polymorphisms. GWAS are conducted under the assumption that the genetic underpinnings of complex phenotypes, such as prostate cancer, are governed by many alleles, each conferring modest risk. Most genetic polymorphisms genotyped in GWAS are common, with minor allele frequencies greater than 1% to 5% within a given population (e.g., men of European ancestry). GWAS capture a large portion of common variation across the genome.[2,3] The strong correlation between many alleles located close to one another on a given chromosome (called linkage disequilibrium) allows one to “scan” the genome without having to test all 10 million known single nucleotide polymorphisms (SNPs). With GWAS, researchers can test 500,000 to 1 million SNPs per study and ascertain almost all common inherited variants in the genome.
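
As a rough illustration of linkage disequilibrium, the sketch below computes r², one commonly used measure of correlation between two SNPs, from haplotype counts. The function name `ld_r_squared` and the counts are hypothetical and are not taken from any study cited here.

```python
def ld_r_squared(n_AB, n_Ab, n_aB, n_ab):
    """Return r^2 between two biallelic SNPs from four haplotype counts.

    A/a are the alleles at the first SNP, B/b at the second.
    Assumes both SNPs are polymorphic (otherwise the denominator is zero).
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / total   # frequency of allele A
    p_B = (n_AB + n_aB) / total   # frequency of allele B
    p_AB = n_AB / total           # frequency of the A-B haplotype
    d = p_AB - p_A * p_B          # disequilibrium coefficient D
    return d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))


# Hypothetical counts: alleles A and B travel together on most chromosomes.
r2_example = ld_r_squared(450, 50, 50, 450)
print(f"r^2 = {r2_example:.2f}")
```

With these made-up counts r² is about 0.64, so genotyping one SNP captures much of the information carried by the other. This is why a subset of SNPs can stand in for many untyped neighbors.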

In a GWAS, allele frequency for each SNP is compared between cases and controls. Promising signals, in which allele frequencies differ significantly between case and control populations, are validated in replication cohorts. To have adequate statistical power to identify variants associated with a phenotype, large numbers of cases and controls, typically thousands of each, are studied. Since up to 1 million SNPs are evaluated in a GWAS, false-positive findings are expected to occur frequently when using standard statistical thresholds. Therefore, stringent statistical rules are used to declare a positive finding, usually using a threshold of P < 1 × 10⁻⁷.[4-6]
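
The case-control comparison and the need for a stringent threshold can be sketched as follows. This is a minimal illustration, assuming a simple allelic chi-square test with 1 degree of freedom; actual studies may use other test statistics. The function names and allele counts are hypothetical.

```python
import math


def allelic_chi_square(case_minor, case_major, control_minor, control_major):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Returns (chi-square statistic, P value).
    """
    table = [[case_minor, case_major], [control_minor, control_major]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def expected_false_positives(n_tests, alpha):
    """Expected number of false positives if no SNP is truly associated."""
    return n_tests * alpha


# Hypothetical SNP: 2,000 cases and 2,000 controls (4,000 alleles each),
# minor allele frequency 35% in cases versus 30% in controls.
chi2, p = allelic_chi_square(1400, 2600, 1200, 2800)
GENOME_WIDE_THRESHOLD = 1e-7  # threshold quoted in the text

print(f"chi2 = {chi2:.2f}, P = {p:.2e}")
print("passes P < 0.05:", p < 0.05)
print("passes genome-wide threshold:", p < GENOME_WIDE_THRESHOLD)
print("expected false positives at 0.05:", expected_false_positives(1_000_000, 0.05))
print("expected false positives at 1e-7:", expected_false_positives(1_000_000, 1e-7))
```

In this made-up example the SNP would look significant under a conventional P < 0.05 rule but falls short of the genome-wide threshold. Across 1 million tests, a 0.05 cutoff would be expected to yield about 50,000 false positives by chance alone, whereas the stringent cutoff keeps the expected number well below one.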

To date, well over 100 cancer-risk variants have been identified by well-powered GWAS and validated in independent cohorts. These studies have revealed convincing associations between specific inherited variants and cancer risk. However, the findings should be qualified with a few important considerations:

  1. GWAS reported thus far have been designed to identify relatively common genetic polymorphisms. It is very unlikely that an allele with high frequency in the population by itself contributes substantially to cancer risk. This, coupled with the polygenic nature of tumorigenesis, means that the contribution by any single variant identified by GWAS to date is quite small, generally with an odds ratio for disease risk of less than 1.5. In addition, despite extensive genome-wide interrogation of common polymorphisms in tens of thousands of cases and controls, GWAS findings to date do not account for even half of the genetic component of cancer risk.[7]

  2. Variants uncovered by GWAS are not likely to directly contribute to disease risk. As mentioned above, SNPs exist in linkage disequilibrium blocks and are merely proxies for a set of variants–both known and previously undiscovered–within a given block. The causal allele is located somewhere within that linkage disequilibrium block.

  3. Admixture of groups with different ancestry can confound GWAS findings (i.e., a statistically significant finding could reflect a disproportionate number of subjects of a particular ancestry among the cases versus the controls, rather than a true association with disease). Therefore, GWAS are typically powered to analyze a single predominant ancestral group. As a result, many populations remain underrepresented in genome-wide analyses.
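
The confounding described in the third point can be illustrated with a small worked example. The sketch below uses entirely hypothetical allele counts for two ancestry groups, labeled A and B, in which the variant has no association with disease within either group; the function name `odds_ratio` is likewise an assumption made for illustration.

```python
def odds_ratio(case_minor, case_major, control_minor, control_major):
    """Allelic odds ratio from a 2x2 table of allele counts."""
    return (case_minor * control_major) / (case_major * control_minor)


# Hypothetical counts: (case_minor, case_major, control_minor, control_major).
# Group A: minor allele frequency 20% in both cases and controls.
# Group B: minor allele frequency 50% in both cases and controls.
# Most cases come from group B and most controls from group A.
strata = {
    "A": (80, 320, 320, 1280),
    "B": (800, 800, 200, 200),
}

# Within each ancestry group the odds ratio is 1 (no association).
per_group_or = {name: odds_ratio(*counts) for name, counts in strata.items()}

# Pooling the groups without regard to ancestry sums each cell of the table.
pooled_counts = tuple(sum(cells) for cells in zip(*strata.values()))
pooled_or = odds_ratio(*pooled_counts)

for name, value in per_group_or.items():
    print(f"group {name}: OR = {value:.2f}")
print(f"pooled: OR = {pooled_or:.2f}")
```

Here the pooled odds ratio is well above the 1.5 ceiling typical of genuine GWAS findings, even though the variant is unrelated to disease in either group. The apparent signal arises only because ancestry differs between cases and controls, which is why analyses are usually restricted to a single ancestral group.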

The implications of these points are discussed in greater detail in the following PDQ summaries: Genetics of Breast and Ovarian Cancer, Genetics of Colorectal Cancer, and Genetics of Prostate Cancer. Additional details can be found elsewhere.[8]


  1. Wellcome Trust Case Control Consortium: Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447 (7145): 661-78, 2007.

  2. The International HapMap Consortium: The International HapMap Project. Nature 426 (6968): 789-96, 2003.

  3. Thorisson GA, Smith AV, Krishnan L, et al.: The International HapMap Project Web site. Genome Res 15 (11): 1592-3, 2005.

  4. Evans DM, Cardon LR: Genome-wide association: a promising start to a long race. Trends Genet 22 (7): 350-4, 2006.

  5. Cardon LR: Genetics. Delivering new disease genes. Science 314 (5804): 1403-5, 2006.

  6. Chanock SJ, Manolio T, Boehnke M, et al.: Replicating genotype-phenotype associations. Nature 447 (7145): 655-60, 2007.

  7. Ioannidis JP, Castaldi P, Evangelou E: A compendium of genome-wide associations for cancer: critical synopsis and reappraisal. J Natl Cancer Inst 102 (12): 846-58, 2010.

  8. Jorgenson E, Witte JS: Genome-wide association studies of cancer. Future Oncol 3 (4): 419-27, 2007.