
Spatial Data Analysis

NCI develops and extends methodology for spatial data analysis to improve the identification of patterns of cancer rates and trends and to highlight areas in need of cancer control interventions. Areas of active research include:

Environmental Exposure Assessment

  • GIS can provide information about potential environmental exposures that cannot be obtained through traditional epidemiologic methods.
  • A study in south-central Nebraska demonstrated the use of satellite imagery to reconstruct historical crop patterns (Ward et al., Env Health Perspectives, 2000).
  • Figure: Landsat imagery, color infrared display of bands 4, 2, 1
  • Figure: Farm Service Agency (FSA) historical black-and-white aerial photograph with crop locations noted
  • Figure: Classified land cover map developed from the Landsat and FSA images

Example of a land cover map for an epidemiologic study of non-Hodgkin's lymphoma

Example: Epidemiologic Study of Non-Hodgkin's Lymphoma (NHL)

In an epidemiologic study of non-Hodgkin's lymphoma, NCI:

  • mapped residences, then assessed proximity of residences to specific crops;
  • assigned probabilities of exposure based on available pesticide use data for each crop; and
  • demonstrated that zones of potential exposure to agricultural pesticides and proximity measures can be determined for residences.
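
The proximity and exposure-probability steps above can be sketched roughly as follows. This is a minimal illustration, not the study's actual GIS workflow: the field coordinates, crop names, pesticide-use probabilities, the 500 m buffer, and the independence assumption across crops are all hypothetical values chosen for the example.

```python
import math

# Hypothetical crop field centroids in projected coordinates (metres).
FIELDS = [
    {"crop": "corn", "x": 0.0, "y": 0.0},
    {"crop": "soybeans", "x": 400.0, "y": 300.0},
    {"crop": "sorghum", "x": 2000.0, "y": 0.0},
]

# Hypothetical probability that a given pesticide was applied to each crop.
USE_PROBABILITY = {"corn": 0.8, "soybeans": 0.5, "sorghum": 0.3}


def crops_within(residence, fields, radius_m):
    """Return {crop: nearest distance} for crops with a field inside the buffer."""
    rx, ry = residence
    nearest = {}
    for f in fields:
        d = math.hypot(f["x"] - rx, f["y"] - ry)
        if d <= radius_m and d < nearest.get(f["crop"], math.inf):
            nearest[f["crop"]] = d
    return nearest


def exposure_probability(residence, fields, use_probability, radius_m=500.0):
    """Probability that at least one nearby crop was treated.

    Assumes (for illustration only) that use on different crops is independent.
    """
    p_none = 1.0
    for crop in crops_within(residence, fields, radius_m):
        p_none *= 1.0 - use_probability[crop]
    return 1.0 - p_none
```

A residence at `(100.0, 0.0)` lies within 500 m of the corn and soybean fields but not the sorghum field, so only those two crops contribute to its exposure probability.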


Statistical Modeling

Map of SEER registry locations in the US
  • The goal of the cancer incidence prediction project is to model data from NCI cancer registries (which cover 470 counties) in order to predict the number of cases in all states.
  • Hierarchical Poisson regression models are used to characterize associations between cancer incidence/mortality and sociodemographic/lifestyle factors by county.
  • These factors explain spatial variation so well that no spatial correlation term is needed in the model.
  • Extensions of original models:
    • Spatio-temporal prediction of cancer rates by state
    • Predicted incidence is used to predict prevalence
    • Predicted incidence is used to calculate % completeness of case ascertainment for each cancer registry
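
To make the modeling idea concrete, here is a deliberately simplified sketch: a single-level Poisson regression with a population offset and one covariate, fit by Newton-Raphson. It omits the hierarchical (random-effect) structure of the models described above, and the function names, the single covariate, and the completeness formula (reported cases divided by predicted cases) are assumptions made for illustration.

```python
import math


def fit_poisson(counts, population, covariate, iterations=25):
    """Fit log(rate) = b0 + b1 * covariate, with cases ~ Poisson(population * rate)."""
    b0 = math.log(sum(counts) / sum(population))  # start at the pooled rate
    b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for y, n, x in zip(counts, population, covariate):
            mu = n * math.exp(b0 + b1 * x)  # expected cases under current fit
            g0 += y - mu                    # score (gradient) terms
            g1 += (y - mu) * x
            h00 += mu                       # information matrix terms
            h01 += mu * x
            h11 += mu * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det   # Newton step: information^-1 * score
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def predicted_cases(b0, b1, population, covariate):
    """Predicted case counts for areas with known population and covariate."""
    return [n * math.exp(b0 + b1 * x) for n, x in zip(population, covariate)]


def completeness_percent(reported, predicted):
    """Assumed definition: reported cases as a percentage of predicted cases."""
    return 100.0 * reported / predicted
```

In this sketch the model would be fit on counties covered by a registry and then applied, via `predicted_cases`, to counties where only the covariates are available.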

Covariate data available for all counties:

  • cancer mortality rates
  • sociodemographic factors (income, schooling, etc.)
  • medical facilities
  • cancer screening utilization
  • smoking, obesity, no insurance

Output: Predicted Incidence Rates

  • Smoothed by County

    US map of smoothed predicted incidence rates for female breast cancer
  • Absolute Rates

    Maps showing absolute rates of predicted cancer incidence in each US state in 1999, for each sex and for a variety of cancer sites
  • Relative Rates

    Maps showing relative rates of predicted cancer incidence in each US state in 1999, for each sex and for a variety of cancer sites

Pickle, Feuer, Edwards. U.S. Predicted Cancer Incidence, 1999: Complete maps by county and state from spatial projection models. NIH Pub No 03-5435, 2003.


Outlier Detection for Cancer Surveillance

Lung cancer mortality rates among white males, 1950-69

Observed rates:

Map showing non-smoothed observed rates of lung cancer among white males, by US county. County borders are sharply delineated.

Smoothed rates (expected pattern):

Map showing smoothed rates of lung cancer among white males, by US county. Regions are more broadly grouped.
  • Can we detect significant outliers (unusual occurrences) in the number of new cancer cases?
  • Applied an empirical Bayes data mining algorithm to test data (DuMouchel & Pregibon, Proc KDD, 2001; Lincoln Technologies, Inc)
  • Method assumes a Poisson distribution for the number of cases and estimates relative risk = observed/expected
  • Lung cancer mortality, white males, 1950-69
    • Smoothed map provided the expected number of cases per county
    • Algorithm compared the actual number of cases to this expectation
    • Found a known "hot spot" in MT, the site of a copper smelter (Lee & Fraumeni, JNCI, 1969)

      U.S. map showing smoothed lung cancer mortality rates. An arrow indicates a region in Montana with an especially high rate.
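
The observed-versus-expected comparison can be illustrated with a small sketch. Note that this is not the DuMouchel & Pregibon algorithm: it swaps in a simpler method-of-moments gamma-Poisson shrinkage, plus a plain Poisson tail probability, to show how raw relative risks are pulled toward the overall mean before outliers are flagged. All function names are hypothetical.

```python
import math


def relative_risks(observed, expected):
    """Raw relative risk per area: observed / expected."""
    return [o / e for o, e in zip(observed, expected)]


def eb_shrunken_rr(observed, expected):
    """Gamma-Poisson posterior mean relative risks, prior fit by method of moments."""
    total_e = sum(expected)
    m = sum(observed) / total_e                      # prior mean of relative risk
    mean_e = total_e / len(expected)
    s2 = sum(e * (o / e - m) ** 2 for o, e in zip(observed, expected)) / total_e
    a = max(s2 - m / mean_e, 0.0)                    # prior variance (0 = full shrinkage)
    return [(m * m + a * o) / (m + a * e) for o, e in zip(observed, expected)]


def poisson_upper_tail(o, e):
    """P(X >= o) for X ~ Poisson(e), summed in log space for stability."""
    below = sum(
        math.exp(-e + k * math.log(e) - math.lgamma(k + 1)) for k in range(o)
    )
    return max(0.0, 1.0 - below)
```

Areas with small expected counts are shrunk most strongly, so an area is flagged only when its excess is large relative to what Poisson noise alone would produce.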


Cluster Identification

  • Are apparent map clusters real or random noise?
  • SaTScan software identifies the most likely significant cluster in space, time, or both
  • Algorithm: spatial scan statistic for Poisson or Bernoulli event data, adjusted for population heterogeneity and covariates
  • The original version identified circular clusters; the new version also scans for elliptical clusters of various shapes and angles
  • Software:
  • Recently extended to clusters of survival rates

Developed by Martin Kulldorff: Stat in Med, 1995, 1996; Communications in Statistics, 1997; Am J Epidemiology, 1997; Am J Public Health, 1998.
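
As a rough illustration of the Poisson spatial scan statistic, the sketch below grows circular zones around each region centroid and keeps the zone with the largest log likelihood ratio. It is not SaTScan's implementation: it handles only circular windows, makes no covariate adjustment, and omits the Monte Carlo replication normally used to judge significance. The data layout, function names, and the 50% population cap are assumptions for the example.

```python
import math


def poisson_llr(c, e, total):
    """Log likelihood ratio for a zone with c observed and e expected cases.

    `total` is the total number of cases; expected counts are scaled to sum to it.
    """
    if c <= e:
        return 0.0  # scan for elevated rates only
    inside = c * math.log(c / e)
    outside = (total - c) * math.log((total - c) / (total - e)) if c < total else 0.0
    return inside + outside


def most_likely_cluster(regions, max_pop_fraction=0.5):
    """regions: list of dicts with keys x, y, cases, population.

    Returns (llr, sorted member indices) for the highest-scoring circular zone.
    """
    total_cases = sum(r["cases"] for r in regions)
    total_pop = sum(r["population"] for r in regions)
    best = (0.0, [])
    for centre in regions:
        # Order all regions by distance from this candidate centre.
        order = sorted(
            range(len(regions)),
            key=lambda i: math.hypot(
                regions[i]["x"] - centre["x"], regions[i]["y"] - centre["y"]
            ),
        )
        cases = pop = 0
        members = []
        for i in order:
            cases += regions[i]["cases"]
            pop += regions[i]["population"]
            if pop > max_pop_fraction * total_pop:
                break  # zone would exceed the population cap
            members.append(i)
            expected = total_cases * pop / total_pop
            llr = poisson_llr(cases, expected, total_cases)
            if llr > best[0]:
                best = (llr, sorted(members))
    return best
```

To answer whether an apparent cluster is real or random noise, the maximum log likelihood ratio would be compared against its distribution under repeated random reallocation of cases, which this sketch leaves out.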

Examples of likely clusters of breast cancer mortality rates in the US

