National Cancer Institute at the National Institutes of Health

SEER 9 Delay Model

For each cancer site, many combinations of covariates were considered in models predicting delay probabilities. Potential covariates included delay time, year of diagnosis, age at diagnosis, sex, race, and a reporting-year effect (Zou et al., 2009). Candidate models were evaluated by fitting them to the SEER 9 annual submissions from 1983 through 2010, with a maximum delay of 30 years, and then predicting the counts in the 2011 submission. For each cancer site, the model that minimized the sum of squared prediction errors was chosen as the default final model. However, to favor a more parsimonious model, we added a further selection step in which competing models were identified using the following criteria:

  • the competing model had fewer parameters than the default model, and
  • the percent change in prediction error between the competing and default models per extra parameter (i.e., the percent change in prediction error divided by the difference in the numbers of parameters between the two models) was less than 1 percent.
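The default-model choice and the two screening criteria can be sketched in Python. This is a hypothetical illustration, not SEER's code. The `CandidateModel` record and the helper names are invented for the example. Measuring the percent change relative to the default model's sum of squared prediction errors (SSPE) is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class CandidateModel:
    name: str      # hypothetical label for a covariate combination
    n_params: int  # number of estimated parameters
    sspe: float    # sum of squared prediction errors on the held-out submission


def sum_squared_prediction_errors(observed: Sequence[float],
                                  predicted: Sequence[float]) -> float:
    """SSPE between observed and predicted counts for the held-out (2011) submission."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def pct_change_per_extra_param(competitor: CandidateModel,
                               default: CandidateModel) -> float:
    """Percent change in SSPE divided by the difference in parameter counts.
    Assumption: the percent change is taken relative to the default model's SSPE."""
    pct_change = 100.0 * (competitor.sspe - default.sspe) / default.sspe
    return pct_change / (default.n_params - competitor.n_params)


def screen_competitors(
    models: List[CandidateModel], threshold_pct: float = 1.0
) -> Tuple[CandidateModel, List[Tuple[CandidateModel, float]]]:
    """Return the default (minimum-SSPE) model and the competitors meeting both criteria."""
    default = min(models, key=lambda m: m.sspe)
    passing = []
    for m in models:
        if m.n_params < default.n_params:      # criterion 1: fewer parameters
            ratio = pct_change_per_extra_param(m, default)
            if ratio < threshold_pct:          # criterion 2: < 1 percent per extra parameter
                passing.append((m, ratio))
    return default, passing


# Usage with made-up SSPE values (illustrative only):
models = [
    CandidateModel("full", n_params=8, sspe=100.0),
    CandidateModel("reduced", n_params=5, sspe=102.0),
    CandidateModel("small", n_params=3, sspe=110.0),
]
default, competitors = screen_competitors(models)
print(default.name, [(m.name, round(r, 3)) for m, r in competitors])
```

In this toy case "reduced" passes, with 2 percent spread over 3 parameters, or about 0.67 percent per parameter. "small" fails, with 10 percent over 5 parameters, or 2 percent per parameter.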

If more than one competing model met these criteria, the model with the smallest percent change per extra parameter was generally selected. However, if other competing models had fewer parameters than that model and their percent change per extra parameter exceeded the smallest value by no more than 0.02, the competing model with the fewest parameters was selected instead. The chosen model was then refitted using all of the data (1983-2011 submissions, 1981-2009 diagnosis years) to estimate the delay distributions and calculate delay-adjusted estimates of the cancer counts.
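The tie-breaking rule can be sketched as a hypothetical helper, `select_final_model`. It takes the default model and the competitors that already meet both criteria, each paired with its percent change per extra parameter. The 0.02 tolerance comes from the text. Two points are assumptions: the tolerance is read in the same units as that ratio, and ties in parameter count are broken by the smaller ratio.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CandidateModel:
    name: str      # hypothetical label for a covariate combination
    n_params: int  # number of estimated parameters


def select_final_model(
    default: CandidateModel,
    competitors: List[Tuple[CandidateModel, float]],
    tolerance: float = 0.02,
) -> CandidateModel:
    """competitors: (model, percent change per extra parameter) pairs that already
    meet both screening criteria."""
    if not competitors:
        return default  # no parsimonious alternative: keep the minimum-error model
    _, r_min = min(competitors, key=lambda pair: pair[1])
    # Competitors whose ratio is within `tolerance` of the smallest (includes the best one).
    near_best = [(m, r) for m, r in competitors if r - r_min <= tolerance]
    # Prefer the fewest parameters; break parameter-count ties by the smaller ratio.
    chosen, _ = min(near_best, key=lambda pair: (pair[0].n_params, pair[1]))
    return chosen


# Usage with made-up ratios (illustrative only):
default = CandidateModel("full", 10)
a = CandidateModel("A", 9)
b = CandidateModel("B", 7)
c = CandidateModel("C", 6)
competitors = [(a, 0.50), (b, 0.51), (c, 0.60)]
print(select_final_model(default, competitors).name)
```

Model A has the smallest ratio. B is within 0.02 of it and has fewer parameters, so B is chosen. C is about 0.10 above the minimum and is not considered.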

Age-adjusted cancer incidence rates (using the 2000 US standard million population) were then calculated with and without adjustment for reporting delay. Joinpoint linear regression was used to estimate annual percent changes in the 1975-2009 incidence rates for both the delay-adjusted and the unadjusted series. Because the delay distribution was assumed to be complete after 30 years, incidence rates for diagnosis years before 1982 were not delay-adjusted. In the joinpoint analyses, up to five change points (i.e., six trend-line segments) were allowed, and change points could fall either at whole years or midway between diagnosis years. Change points were constrained to be at least 2 years from both the beginning and the end of the data series and at least 2 years apart. Models were fitted by weighted least squares in the Joinpoint Regression software, with weights based on the variances of the age-adjusted incidence rates.
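The quantities in this paragraph can be sketched in Python under stated assumptions. None of the function names come from the Joinpoint software, and the sketch does not search over joinpoint models or choose their number.

* `age_adjusted_rate` applies direct standardization with standard-population weights. The values below are hypothetical, not the 2000 US standard million.
* `wls_apc` fits a single log-linear segment by weighted least squares and converts the slope to an annual percent change. It assumes inverse-variance weights on the log scale.
* `candidate_joinpoints` and `admissible` encode the change-point constraints. They assume the 2-year distances are measured in calendar years.

```python
import math
from typing import List, Sequence


def age_adjusted_rate(counts: Sequence[float], populations: Sequence[float],
                      standard_pop: Sequence[float], per: float = 100_000) -> float:
    """Direct age adjustment: weight age-specific rates by a standard population."""
    total = sum(standard_pop)
    return per * sum((s / total) * (c / p)
                     for c, p, s in zip(counts, populations, standard_pop))


def wls_apc(years: Sequence[float], rates: Sequence[float],
            variances: Sequence[float]) -> float:
    """APC from ln(rate) = b0 + b1*year fitted by weighted least squares.
    Assumption: weights are 1 / Var(ln rate), approximated by rate**2 / Var(rate)."""
    y = [math.log(r) for r in rates]
    w = [r * r / v for r, v in zip(rates, variances)]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, years)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, years, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, years))
    return 100.0 * (math.exp(sxy / sxx) - 1.0)  # slope b1 -> annual percent change


def candidate_joinpoints(first: int, last: int, edge_gap: float = 2.0) -> List[float]:
    """Whole or half years at least `edge_gap` years from each end of the series."""
    n_steps = int(round((last - first - 2 * edge_gap) * 2))
    return [first + edge_gap + k / 2 for k in range(n_steps + 1)]


def admissible(joinpoints: Sequence[float], first: int, last: int,
               max_joinpoints: int = 5, min_gap: float = 2.0) -> bool:
    """At most five change points, each at a whole or half year,
    at least 2 years from both ends and at least 2 years apart."""
    jps = sorted(joinpoints)
    if len(jps) > max_joinpoints:
        return False
    if any(2 * j != int(2 * j) for j in jps):
        return False
    if jps and (jps[0] - first < min_gap or last - jps[-1] < min_gap):
        return False
    return all(b - a >= min_gap for a, b in zip(jps, jps[1:]))


# Usage with made-up data (illustrative only):
years = list(range(2000, 2010))
rates = [100 * 1.02 ** (t - 2000) for t in years]  # exactly 2% growth per year
variances = [4.0] * len(years)
print(round(wls_apc(years, rates, variances), 6))
print(len(candidate_joinpoints(1975, 2009)), admissible([1990.0, 1995.5], 1975, 2009))
```

A series that grows by exactly 2 percent per year recovers an APC of 2, whatever the weights. This is a quick check that the slope-to-APC conversion, 100(e^b1 - 1), is wired correctly.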

Results show that adjusting for delay tends to raise cancer incidence rates in the most recent diagnosis years. Although this adjustment increases the estimated rate of change over those years, it will probably lead to the detection of a new joinpoint only rarely, though this can happen. See Clegg et al. (2002) for details on the impact of reporting-delay adjustment on SEER cancer incidence rates.
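The reason the adjustment mainly affects recent years can be sketched with a simplified model. It assumes that a delay-adjusted count is the reported count divided by the estimated probability that a case has been reported so far. The completeness values and the delay indexing (`d = latest_year - year`) are invented for illustration, and the SEER delay model itself is not reproduced.

```python
from typing import Dict, Sequence

MAX_DELAY = 30  # from the text: reporting is assumed complete after 30 years


def delay_adjusted_counts(observed: Dict[int, float], latest_year: int,
                          completeness: Sequence[float]) -> Dict[int, float]:
    """Inflate each year's reported count by its estimated reporting completeness.

    Hypothetical conventions: delay d = latest_year - diagnosis year, and
    completeness[d] (0 < value <= 1) is the cumulative probability that a case
    is reported within d years. In SEER these come from the fitted delay model."""
    adjusted = {}
    for year, count in observed.items():
        d = latest_year - year
        if d >= MAX_DELAY:
            adjusted[year] = float(count)  # beyond the modeled delay: left unadjusted
        else:
            adjusted[year] = count / completeness[d]
    return adjusted


# Made-up completeness curve: the most recent years (small d) are the least complete.
completeness = [0.90, 0.95, 0.97, 0.98] + [0.99] * (MAX_DELAY - 4)
observed = {2009: 900, 2008: 950, 2000: 990, 1970: 500}
adjusted = delay_adjusted_counts(observed, latest_year=2009, completeness=completeness)
for year in sorted(adjusted):
    print(year, observed[year], round(adjusted[year], 1))
```

The inflation factor 1 / completeness[d] is largest for the latest year and shrinks toward 1 for older years. This pushes recent rates upward and steepens the final trend segment without changing the earlier part of the series.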