Rationale for the Model-Based Small Area Estimates

National surveys that are large enough to provide estimates at the state, health service area, or county level usually have to be conducted by telephone to be feasible from a cost standpoint. These surveys often have lower response rates than would be ideal, and cannot capture the responses of those who do not have telephones. In addition, estimates for many of the 800+ health service areas and the 3000+ counties in the United States are unstable due to small sample sizes in these areas. Other area level estimates have been available, but do not directly address these issues (see Other Area Level Estimates).

This section supports the case for producing model-based estimates of cancer risk factors and screening behaviors for states, health service areas, and counties. The need arises from the possibility that estimates obtained solely from telephone surveys like the Behavioral Risk Factor Surveillance System (BRFSS) have nonresponse bias (due to differences between survey respondents and non-respondents) and/or non-coverage bias (due to differences between those who live in households with and without telephones). The proposed method rests on the assertion that improved estimates can be obtained by using a model to correct for potential nonresponse and non-coverage bias in the estimates from the BRFSS. The section is divided into six sub-sections. In the first three sub-sections, we demonstrate the potential for these two biases by showing both national- and county-level descriptive statistics for the three estimates described in the Methodology section. The final three sub-sections then summarize the following:

  • The county level variation in the non-telephone coverage rate;
  • BRFSS state level response rates and their yearly variation; and
  • County level covariates used in the analysis and their correlation with county prevalence rate estimates.

On this page:

  1. Differences between NHIS Telephone and Non-telephone Households
  2. Comparison of BRFSS to NHIS Telephone and Non-telephone Households
  3. Demonstration of Statistically Significant Differences Between NHIS Telephone-only & BRFSS
  4. Census 2000 Estimate of Non-telephone Households
  5. BRFSS State Level Response Rates
  6. Correlation of County Covariates with Prevalence Rates

Differences between NHIS Telephone and Non-telephone Households

For six outcomes and two time periods, Table 2 presents national level prevalence estimates (with standard errors) and sample sizes for telephone and non-telephone households from the National Health Interview Survey (NHIS). The table also presents the difference in prevalence estimates between telephone and non-telephone households. Large negative differences are observed for all smoking outcomes and large positive differences are observed for cancer screening outcomes. None of the 95% confidence intervals for the difference contain 0, which indicates that the prevalence estimates for telephone and non-telephone households differ significantly in all cases. The large differences in Table 2 indicate potential non-coverage bias for telephone surveys.

Table 2. National level prevalence estimates for telephone and non-telephone households and their difference (with standard errors) from NHIS

| Outcome | Time period | Telephone homes: sample size | Telephone homes: estimate ± standard error | Non-telephone homes: sample size | Non-telephone homes: estimate ± standard error | Difference (telephone - non-telephone) ± standard error |
|---|---|---|---|---|---|---|
| Current smoking: males | 1997-1999 | 39,954 | 25.49 ± 0.29 | 2,482 | 51.39 ± 1.39 | -25.90 ± 1.42 |
| Current smoking: males | 2000-2003 | 51,711 | 24.08 ± 0.25 | 2,666 | 48.31 ± 1.21 | -24.24 ± 1.24 |
| Current smoking: females | 1997-1999 | 53,657 | 21.08 ± 0.23 | 2,302 | 44.34 ± 1.34 | -23.25 ± 1.36 |
| Current smoking: females | 2000-2003 | 68,855 | 19.66 ± 0.21 | 2,332 | 38.87 ± 1.31 | -19.21 ± 1.32 |
| Ever smoking: males | 1997-1999 | 39,954 | 53.33 ± 0.33 | 2,482 | 64.47 ± 1.41 | -11.14 ± 1.45 |
| Ever smoking: males | 2000-2003 | 51,711 | 50.53 ± 0.30 | 2,666 | 61.65 ± 1.10 | -11.12 ± 1.14 |
| Ever smoking: females | 1997-1999 | 53,657 | 40.44 ± 0.28 | 2,302 | 52.69 ± 1.33 | -12.26 ± 1.36 |
| Ever smoking: females | 2000-2003 | 68,855 | 38.66 ± 0.26 | 2,332 | 49.06 ± 1.30 | -10.39 ± 1.32 |
| Mammogram within 2 years | 1997-1999 | 19,733 | 65.65 ± 0.40 | 467 | 37.16 ± 2.76 | 28.49 ± 2.79 |
| Mammogram within 2 years | 2000-2003 | 20,098 | 70.60 ± 0.41 | 391 | 34.75 ± 2.87 | 35.85 ± 2.90 |
| Pap smear test within 3 years | 1997-1999 | 32,392 | 79.77 ± 0.30 | 1,317 | 74.44 ± 1.45 | 5.34 ± 1.48 |
| Pap smear test within 3 years | 2000-2003 | 32,523 | 80.33 ± 0.28 | 1,160 | 73.62 ± 1.60 | 6.70 ± 1.62 |
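
As a rough illustration of how the difference column and its 95% confidence interval relate to the two household estimates, the sketch below combines the two standard errors under the assumption that the telephone and non-telephone estimates are independent. That assumption, the function name `difference_with_ci`, and the normal multiplier 1.96 are ours and are not stated in the source; the source estimates come from a complex survey design and may have been computed differently.

```python
import math


def difference_with_ci(est_a, se_a, est_b, se_b, z=1.96):
    """Return (difference, standard error, (lower, upper)) for est_a - est_b.

    Assumes the two estimates are independent, so their variances add.
    """
    diff = est_a - est_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff, se_diff, (diff - z * se_diff, diff + z * se_diff)


# Current smoking, males, 1997-1999: telephone vs. non-telephone (Table 2).
diff, se_diff, (lower, upper) = difference_with_ci(25.49, 0.29, 51.39, 1.39)
print(f"difference = {diff:.2f} ± {se_diff:.2f}")
print(f"approximate 95% CI = ({lower:.2f}, {upper:.2f})")
print("interval excludes 0:", not (lower <= 0.0 <= upper))
```

For this row the sketch reproduces the tabulated -25.90 ± 1.42, and the interval lies entirely below zero.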

For six outcomes and two time periods, Table 3 presents the national level direct estimates of prevalence (and standard errors) obtained from NHIS telephone households and from BRFSS. It also presents the differences in the prevalence estimates along with their standard errors. The table shows that the smoking prevalence differences between NHIS telephone only and BRFSS for the period 2000-2003 were all negative and significant, while for 1997-1999 the only significant difference was for ever smoking among females. The four screening differences were all negative and statistically significant, and were generally larger in magnitude than the smoking prevalence differences. In both time periods, the BRFSS estimate for mammography is more than six percentage points larger than the NHIS telephone household estimate. The large screening prevalence differences suggest the possibility of nonresponse bias for the BRFSS.

Table 3. National level prevalence estimates for NHIS telephone only, BRFSS, and their difference (with standard errors)

| Outcome | Time period | NHIS telephone only: estimate | NHIS telephone only: st. error | BRFSS: estimate | BRFSS: st. error | NHIS telephone only - BRFSS: estimate | NHIS telephone only - BRFSS: st. error |
|---|---|---|---|---|---|---|---|
| Current smoking: males | 1997-1999 | 25.49 | 0.29 | 25.27 | 0.17 | 0.22 | 0.34 |
| Current smoking: males | 2000-2003 | 24.08 | 0.25 | 24.86 | 0.13 | -0.78* | 0.28 |
| Current smoking: females | 1997-1999 | 21.08 | 0.23 | 20.67 | 0.13 | 0.41 | 0.26 |
| Current smoking: females | 2000-2003 | 19.66 | 0.21 | 20.41 | 0.10 | -0.75* | 0.23 |
| Ever smoking: males | 1997-1999 | 53.33 | 0.33 | 53.33 | 0.19 | 0.00 | 0.38 |
| Ever smoking: males | 2000-2003 | 50.53 | 0.30 | 53.19 | 0.15 | -2.66* | 0.34 |
| Ever smoking: females | 1997-1999 | 40.44 | 0.28 | 41.10 | 0.16 | -0.66* | 0.32 |
| Ever smoking: females | 2000-2003 | 38.66 | 0.26 | 41.56 | 0.12 | -2.90* | 0.29 |
| Mammogram within 2 years | 1997-1999 | 65.65 | 0.40 | 72.98 | 0.18 | -7.33* | 0.44 |
| Mammogram within 2 years | 2000-2003 | 70.60 | 0.41 | 76.66 | 0.17 | -6.06* | 0.44 |
| Pap smear test within 3 years | 1997-1999 | 79.77 | 0.30 | 82.29 | 0.13 | -2.52* | 0.33 |
| Pap smear test within 3 years | 2000-2003 | 80.33 | 0.28 | 83.46 | 0.13 | -3.13* | 0.31 |

* Indicates the difference is statistically significant with p-value<0.05.
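
The significance flags in Table 3 can be approximated from the difference and its standard error alone. The sketch below uses a two-sided test based on the normal approximation. Both the choice of test and the helper name `two_sided_p_value` are assumptions for illustration, since the source does not state which test produced the asterisks.

```python
import math


def two_sided_p_value(diff, se):
    """Return (z, p) for H0: true difference = 0, using a normal approximation."""
    z = diff / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# (difference, standard error) pairs copied from Table 3.
rows = {
    "Current smoking: males, 1997-1999": (0.22, 0.34),
    "Ever smoking: females, 1997-1999": (-0.66, 0.32),
    "Mammogram within 2 years, 1997-1999": (-7.33, 0.44),
}

for label, (diff, se) in rows.items():
    z, p = two_sided_p_value(diff, se)
    flag = "*" if p < 0.05 else ""
    print(f"{label}: z = {z:.2f}, p = {p:.3g} {flag}")
```

Under this approximation the first row is not flagged and the other two are, which agrees with the asterisks in the table.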

Comparison of BRFSS to NHIS Telephone and Non-telephone Households

Figures 1 and 2 investigate whether there are systematic differences at the county level for the three estimates (NHIS telephone, NHIS non-telephone, and BRFSS telephone). These figures show the mean level (dot) and the standard deviation (a measure of the variation) for the three county level direct estimates for male (18+) current smoking prevalence (yearly) and for mammography prevalence (female 40+, 2 year periods), respectively. From these figures we draw the following conclusions:

  • The large differences between the red dots (NHIS non-telephone) and green dots (BRFSS telephone) for both outcomes indicate that non-coverage bias occurs systematically over counties.
  • The differences in levels between the blue dots (NHIS telephone) and green dots (BRFSS telephone) for both outcomes are smaller, and it is hard to tell whether the differences are systematic across counties.

Figure 1: Means and standard deviations of the county level direct estimates for male (18+) current smoking prevalence (yearly)

Figure 2: Means and standard deviations of the county level direct estimates for mammography prevalence (female 40+, 2 year period)


Demonstration of Statistically Significant Differences Between NHIS Telephone-only & BRFSS

Table 3 (above) demonstrates that, at the national level, the NHIS telephone household direct estimate is smaller than the BRFSS direct estimate for both cancer screening outcomes.

In this section, we analyze the county level differences, NHIS telephone only - BRFSS, for mammography for the three year period 1997-1999. Again, we demonstrate that the differences obtained at the national level occur systematically at the county level. In particular, we show that for the vast majority of precise county estimates, the NHIS telephone household estimate is substantially smaller than the BRFSS estimate (equivalently, NHIS telephone only - BRFSS is negative). We show the practical significance of the difference in Figure 3 and the statistical significance of the difference in Figure 4. The two figures share the following features:

  • The two axes indicate the county level effective sample size for the two surveys.
  • The values are only shown for counties with effective sample size of 25 or more for both surveys.
  • The outcome is color coded into 5 levels where dark blue indicates a negative value and dark red a positive value.
  • The difference is most accurately estimated in the upper right quadrant, where the effective sample sizes for both surveys are greater than 100.

The two figures differ in the outcome that is color coded; in particular, we show

  • Figure 3 (practical significance): The outcome is the difference between the NHIS telephone household estimate and the BRFSS estimate (NHIS telephone only - BRFSS) for mammography (labeled Diff in the figure).
  • Figure 4 (statistical significance): The outcome is the standardized value of the differences used in Figure 3. The standardized difference (labeled ZDiff in the figure) is obtained by dividing the difference by its standard error.

In Figure 3, we use 5 percentage points as the threshold for practical significance. The figure label shows that for any value colored dark blue, the difference is less than -5 (equivalently, the NHIS estimate is at least 5 percentage points less than the BRFSS estimate). The dominant color in the figure is dark blue. In the upper right quadrant, ten of the eleven dots (91%) are dark blue. Thus, for most counties, and especially for the counties with precisely estimated differences, the NHIS telephone household estimate is at least 5 percentage points less than the BRFSS estimate.

Figure 3. Percent differences (NHIS telephone only – BRFSS) of mammography, Females 40+ in households with telephones, 1997-1999 for counties with effective sample size >= 25 for both surveys

In Figure 4, we use a standardized difference of two in absolute value (that is, a difference of two standard errors) as the threshold for statistical significance (essentially equivalent to a 0.05 significance level). A dark blue dot indicates a statistically significant difference in which the NHIS estimate is smaller than the BRFSS estimate. In the upper right quadrant (i.e., counties with effective sample sizes greater than 100 for both surveys), six of the eleven dots (55%) are dark blue. Thus, for the counties with precisely estimated differences, the majority of the differences between the two estimates are statistically significant, and in each of these cases the NHIS telephone household estimate is less than the BRFSS estimate.

Figure 4. Percent standardized differences (NHIS telephone only – BRFSS) of mammography, Females 40+ in households with telephones, 1997-1999 for counties with effective sample size >= 25 for both surveys

In summary, for counties with precisely estimated differences (counties with effective sample size greater than 100 for both surveys):

  • The NHIS prevalence estimates are at least 5 percentage points smaller than the BRFSS prevalence estimates in almost all cases.
  • The majority of these differences are statistically significant at the 0.05 significance level.
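
The quantities behind Figures 3 and 4 can be sketched as follows. This is a minimal illustration, not the authors' code: the county records are hypothetical, the names `CountyEstimates` and `compare` are ours, and the standard error of the difference assumes the two surveys are independent. Only the dark blue categories are modeled, because the cut points for the other color levels are not given in the text.

```python
import math
from dataclasses import dataclass


@dataclass
class CountyEstimates:
    name: str
    nhis_est: float     # NHIS telephone household estimate (percent)
    nhis_se: float
    nhis_eff_n: float   # effective sample size, NHIS
    brfss_est: float    # BRFSS estimate (percent)
    brfss_se: float
    brfss_eff_n: float  # effective sample size, BRFSS


def compare(c, min_eff_n=25, precise_eff_n=100, practical=5.0, z_crit=2.0):
    """Return Diff, ZDiff and flags for one county, or None if it is not plotted."""
    smaller_n = min(c.nhis_eff_n, c.brfss_eff_n)
    if smaller_n < min_eff_n:
        return None  # shown only when both effective sample sizes are 25 or more
    diff = c.nhis_est - c.brfss_est
    # Assumption: independent surveys, so the variances add.
    zdiff = diff / math.sqrt(c.nhis_se ** 2 + c.brfss_se ** 2)
    return {
        "county": c.name,
        "Diff": diff,
        "ZDiff": zdiff,
        "dark_blue_fig3": diff < -practical,     # NHIS at least 5 points lower
        "dark_blue_fig4": zdiff < -z_crit,       # significantly lower
        "upper_right": smaller_n > precise_eff_n,
    }


# Hypothetical counties, for illustration only.
counties = [
    CountyEstimates("County A", 64.0, 4.0, 120, 73.0, 2.0, 300),
    CountyEstimates("County B", 70.0, 5.0, 40, 72.0, 3.0, 150),
    CountyEstimates("County C", 60.0, 8.0, 20, 75.0, 2.5, 200),
]

shown = [r for r in (compare(c) for c in counties) if r is not None]
for r in shown:
    print(r)
```

In this made-up data, County C is dropped because its NHIS effective sample size is below 25, and only County A falls in the upper right quadrant.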


Census 2000 Estimate of Non-telephone Households

In the model-based approach, the prevalence rate for all households is calculated through a weighted average of the telephone and non-telephone household estimates (see Methodology for more information). The weight used in the calculation is the proportion of households in the county with a telephone obtained from the Census 2000 long-form question:

"Is there telephone service available in this house, apartment or mobile home from which you can make and receive calls?"

The proportion who answered no to this question is used as the county's telephone non-coverage percentage. Figure 5 provides a county level map of the non-coverage percentage and shows the following:

  • The non-coverage is small for most counties with the majority being less than 3 percent.
  • There are a small number of counties with non-coverage percentages over 7 percent (the largest is 46% in Apache County, Arizona).

Figure 5. U.S. non-telephone service coverage from Census 2000
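
The weighted average described in this sub-section can be sketched as below. The function name `all_household_prevalence` is hypothetical. For illustration only, the inputs are the national NHIS estimates for male current smoking in 1997-1999 from Table 2, not county-level model-based estimates, so the outputs are not actual county results.

```python
def all_household_prevalence(tel_est, nontel_est, noncoverage_pct):
    """Combine telephone and non-telephone prevalence estimates (in percent).

    The weight on the telephone estimate is the proportion of households
    with telephone service, i.e. 1 minus the non-coverage proportion.
    """
    if not 0.0 <= noncoverage_pct <= 100.0:
        raise ValueError("noncoverage_pct must be between 0 and 100")
    w_tel = 1.0 - noncoverage_pct / 100.0
    return w_tel * tel_est + (1.0 - w_tel) * nontel_est


# Illustrative inputs: national NHIS estimates, male current smoking, 1997-1999.
tel, nontel = 25.49, 51.39

low_noncoverage = all_household_prevalence(tel, nontel, 3.0)    # a 3% county
high_noncoverage = all_household_prevalence(tel, nontel, 46.0)  # a 46% county
print(f"3% non-coverage:  {low_noncoverage:.2f}")
print(f"46% non-coverage: {high_noncoverage:.2f}")
```

With these inputs, a county at 3 percent non-coverage stays within about one percentage point of the telephone estimate, while one at 46 percent non-coverage moves roughly twelve points toward the non-telephone estimate.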


BRFSS State Level Response Rates

Figure 6 is a time-series ("spaghetti") plot of the BRFSS state response rates, which are published in the BRFSS Quality Report series, for the period 1997-2003 and shows the following:

  • There is considerable variation in overall levels between state response rates.
  • There is appreciable correlation between response rates over years for most states.

The figure also contains a smoothed curve showing the yearly median and another smoothed curve showing the population weighted mean. The weighted mean lies below the median due to the well-known empirical observation that survey response rates typically decrease with increasing population and urbanicity. That is, the more populous states have lower response rates, so the average response rate weighted by population is lower than the median for each year.
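
A small numerical sketch shows why a population weighted mean can fall below the median when larger states respond at lower rates. The three states, their response rates, and their populations are invented for illustration and are not BRFSS values.

```python
from statistics import median

# (state, response rate, population in millions): hypothetical values only.
states = [
    ("Small state", 0.70, 1.0),
    ("Mid-size state", 0.60, 5.0),
    ("Large state", 0.45, 30.0),
]

rates = [rate for _, rate, _ in states]
total_population = sum(pop for _, _, pop in states)

median_rate = median(rates)
weighted_mean_rate = sum(rate * pop for _, rate, pop in states) / total_population

print(f"median response rate:        {median_rate:.3f}")
print(f"population weighted mean:    {weighted_mean_rate:.3f}")
```

The large state dominates the weighted mean and pulls it toward its own lower rate, while the median treats every state equally.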

Figure 6. BRFSS state response rates for years 1997-2003 with median and population means superposed


Figures 7 and 8 show maps of the state response rates averaged over the two time periods used in our model-based estimates, 1997-1999 and 2000-2003, respectively. There is more potential for nonresponse bias in states with low response rates, but a low response rate does not guarantee nonresponse bias for any outcome. These figures show the potential for nonresponse bias since there is wide variation between the states and the response rates vary about 50%.

Figure 7. BRFSS Average State Response Rate 1997-1999


Figure 8. BRFSS Average State Response Rate 2000-2003


Correlation of County Covariates with Prevalence Rates

In the second level of the hierarchical model, the 26 county covariates listed in Table 1 are used to predict the theoretical prevalence rates for telephone and non-telephone households. The second level of the model allows the estimates from counties with inadequate data to "borrow strength" from other similar counties. To minimize the possible influence of extreme county covariate values in the model-based estimation, we:

  • transformed each covariate, except indicator variables, using the logarithmic transformation, and
  • normalized the transformed values to have mean 0 and standard deviation 1.
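
The two preprocessing steps above can be sketched as follows. The function name `log_standardize` and the income values are hypothetical, and the use of the natural logarithm and the population standard deviation are assumptions, since the source specifies neither. The base of the logarithm does not affect the result, because a change of base rescales every value by the same constant and the standardization removes that constant.

```python
import math
from statistics import mean, pstdev


def log_standardize(values):
    """Log-transform positive values, then scale to mean 0 and standard deviation 1."""
    if any(v <= 0 for v in values):
        raise ValueError("the logarithm requires strictly positive values")
    logs = [math.log(v) for v in values]
    center = mean(logs)
    spread = pstdev(logs)  # assumption: population standard deviation
    return [(x - center) / spread for x in logs]


# Hypothetical county per capita incomes, with one extreme value.
incomes = [18000.0, 22000.0, 25000.0, 31000.0, 60000.0]
normalized = log_standardize(incomes)

print([round(z, 2) for z in normalized])
print("mean:", round(mean(normalized), 6), "sd:", round(pstdev(normalized), 6))
```

Indicator variables would bypass the log step under the procedure described above.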

Figures 9 and 10 show the correlation of two of the county level covariates with selected prevalence rates. Figure 9 shows a scatter plot of the county level BRFSS estimate of the percentage of mammography within 2 years (age 40+) for 1997-1999 versus log of county per capita income. The figure shows a value only for the 1,296 counties (42% of all counties) with a reliable estimate obtained from 25 or more responses per county. The figure shows an approximately linear relationship with a Pearson correlation of 0.71. Since per capita income increases with county socioeconomic status (SES), this figure illustrates the positive correlation between county SES and mammography screening.

Figure 9. Scatter plot of county level BRFSS estimate of mammogram within 2 years (age 40+) vs. log per capita income (normalized) for years 1997-2003: only counties with BRFSS sample size >= 25

Figure 10 shows a scatter plot of the county level BRFSS estimate of the percentage of male current smoking (age 18+) for 1997-1999 versus log of median home value. Again, the figure shows only those counties with a reliable estimate obtained from 25 or more responses. The figure shows an approximately linear relationship with Pearson correlation of -0.60. Since median home value increases with county SES, this figure illustrates the negative correlation between county SES and smoking.

Figure 10. Scatter plot of county level BRFSS estimate of current male smoking (age 18+) vs. log median home value (normalized) for years 1997-2003: only counties with BRFSS sample size >= 25

In summary, these two figures suggest that the county level covariates are correlated with the prevalence rates. This fact is useful in the prediction of the prevalence rate for counties without a substantial sample. The model-based approach uses this correlation in the second stage regression of the theoretical prevalence rates on the 26 county covariates.
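
The correlations reported for Figures 9 and 10 are ordinary Pearson correlations computed over counties with at least 25 responses. The sketch below shows that calculation on invented county records. The data, the tuple layout, and the function name `pearson` are illustrative assumptions and do not reproduce the reported values of 0.71 or -0.60.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical counties:
# (normalized log per capita income, mammography estimate in percent, sample size)
counties = [
    (-1.5, 62.0, 40),
    (-0.5, 68.0, 120),
    (0.0, 71.0, 35),
    (0.5, 72.0, 300),
    (1.5, 79.0, 60),
    (2.0, 50.0, 10),  # dropped below: fewer than 25 responses
]

reliable = [(x, y) for x, y, n in counties if n >= 25]
income = [x for x, _ in reliable]
mammography = [y for _, y in reliable]

r = pearson(income, mammography)
print(f"counties kept: {len(reliable)}, Pearson r = {r:.2f}")
```

Dropping the small-sample county mirrors the sample size restriction applied in both figures.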
