Follow-Back Year and Age-Conditional Probability of Cancer

DevCan uses incidence rates that represent the first instance of a specific cancer type for each individual. Follow-back is the process of checking earlier SEER data to ensure that only the first incident case of a cancer type is included for each person. The follow-back year is the earliest year we check for previous occurrences.

We use the same follow-back year for every individual in each data set. For example, when using the SEER 18 data, we may be able to go back to 1973 for a person diagnosed in Iowa, but for a person diagnosed in Kentucky we only have data back to 1995 (the 1995-1999 cases from California excluding SF/SJM/LA, Kentucky, Louisiana and New Jersey are not publicly available). Thus, we use 1995 as the follow-back year for all people in the SEER 18 data.

The purpose of this policy is to prevent biases that could arise from using different follow-back years for different people. One example is a comparison of the age-conditional probabilities of lung cancer between Hispanics and Non-Hispanics for the years 2000-2002. Los Angeles and San Jose-Monterey, which were added to SEER in 1992, have a higher percentage of Hispanics than the rest of SEER. If we checked each individual for earlier occurrences of lung cancer as far back as the data allowed, a higher proportion of Hispanics would be checked only from 1992 to the present, whereas a higher proportion of Non-Hispanics could be checked as far back as 1973. As a result, Non-Hispanics might appear to have a lower incidence of first lung cancer, since more of their cases diagnosed during 2000-2002 would be recognized as not being the first occurrence. By using the same follow-back year for every person in the data set, we avoid such biases.

July through December 2005 Louisiana cases and populations were not included in the incidence rates, but the cases were considered for follow-back purposes (i.e., to identify first instances of a specific type of cancer for individuals).
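The uniform follow-back policy can be illustrated with a minimal Python sketch. It is not DevCan's implementation: the `Tumor` record, the function names, and the sample tumors are hypothetical, and the registry start years are taken from the examples in the text. The sketch picks the latest of the registries' earliest available years as the common follow-back year, then keeps only tumors that are the first of their site for a person within that window.

```python
from collections import namedtuple

# Hypothetical, simplified tumor record (not the SEER file layout).
Tumor = namedtuple("Tumor", ["person_id", "registry", "site", "year"])


def common_follow_back_year(earliest_year_by_registry):
    """Return the first year that every registry in the data set covers."""
    return max(earliest_year_by_registry.values())


def first_incident_cases(tumors, follow_back_year, start_year, end_year):
    """Return tumors diagnosed in [start_year, end_year] that are the first
    of their site for that person, looking back only to follow_back_year."""
    earliest = {}
    for t in tumors:
        if t.year < follow_back_year:
            continue  # outside the follow-back window, so not checked
        key = (t.person_id, t.site)
        if key not in earliest or t.year < earliest[key].year:
            earliest[key] = t
    return [t for t in earliest.values() if start_year <= t.year <= end_year]


# Earliest available years as given in the text's examples.
earliest_years = {"Iowa": 1973, "Los Angeles": 1992, "Kentucky": 1995}

# Made-up tumors, for illustration only.
tumors = [
    Tumor(1, "Iowa", "lung", 1980),
    Tumor(1, "Iowa", "lung", 2001),
    Tumor(2, "Kentucky", "lung", 1996),
    Tumor(2, "Kentucky", "lung", 2002),
    Tumor(3, "Los Angeles", "lung", 2000),
]

uniform_year = common_follow_back_year(earliest_years)
uniform = first_incident_cases(tumors, uniform_year, 2000, 2002)
full_history = first_incident_cases(tumors, 1973, 2000, 2002)
```

In this made-up data, person 1's 2001 tumor counts as a first lung cancer under the uniform 1995 follow-back year but not when the full Iowa history back to 1973 is used. That difference is the registry-dependent effect the uniform policy is meant to remove.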

Each tumor record in the SEER data includes sequence information that allows us to determine whether the diagnosis is the first for that person. If the sequence information indicates that the person had tumors before the follow-back year, we have to make assumptions about those prior tumors. For DevCan's purposes, we assume that these tumors were invasive but cannot be attributed to a particular cancer site. Therefore, using a follow-back period shorter than the maximum available for each registry causes a slight increase in the incidence rates for individual cancer sites and a slight decrease in the incidence rate for all sites combined (invasive only).
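The assumption about prior tumors can be sketched as a small decision rule. This is an illustrative reading of the paragraph above, not DevCan's code: the function name, its parameters, and the idea of passing the number of earlier tumors implied by the sequence information as a plain integer are all hypothetical, and every tumor is assumed to be invasive.

```python
def counts_as_first(tumor_site, prior_count_from_sequence,
                    observed_prior_sites, target):
    """Decide whether a tumor counts as a first incident case for `target`.

    tumor_site: site of the current tumor.
    prior_count_from_sequence: number of earlier tumors implied by the
        record's sequence information (hypothetical simplification).
    observed_prior_sites: sites of earlier tumors actually found within
        the follow-back window.
    target: a specific site name, or "all sites" for all sites combined.
    """
    # Earlier tumors that the sequence information implies but that fall
    # before the follow-back year, so their site is unknown.
    unobserved_prior = prior_count_from_sequence - len(observed_prior_sites)

    if target == "all sites":
        # Unobserved prior tumors are assumed invasive, so any prior tumor
        # means this is not the person's first invasive cancer.
        return not observed_prior_sites and unobserved_prior <= 0

    if tumor_site != target:
        return False
    # Unobserved prior tumors cannot be attributed to a site, so only
    # observed tumors of the same site rule this one out.
    return target not in observed_prior_sites
```

Under this rule, a lung tumor whose sequence information implies one earlier tumor before the follow-back year still counts as a first lung cancer, but not as a first cancer for all sites combined. This matches the direction of the two effects described above.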