Federal Committee on Statistical Methodology
Office of Management and Budget
Statistical Policy Working Paper 17 - Survey Coverage
MEMBERS OF THE FEDERAL COMMITTEE ON STATISTICAL METHODOLOGY

(April 1990)

Maria E. Gonzalez (Chair), Office of Management and Budget
Yvonne M. Bishop, Energy Information Administration
Warren L. Buckler, Social Security Administration
Charles E. Caudill, National Agricultural Statistics Service
John E. Cremeans, Office of Business Analysis
Zahava D. Doering, Smithsonian Institution
Joseph K. Garrett, Bureau of the Census
Robert M. Groves, Bureau of the Census
C. Terry Ireland, National Computer Security Center
Charles D. Jones, Bureau of the Census
Daniel Kasprzyk, Bureau of the Census
Daniel Melnick, National Science Foundation
Robert P. Parker, Bureau of Economic Analysis
David A. Pierce, Federal Reserve Board
Thomas J. Plewes, Bureau of Labor Statistics
Wesley L. Schaible, Bureau of Labor Statistics
Fritz J. Scheuren, Internal Revenue Service
Monroe G. Sirken, National Center for Health Statistics
Robert D. Tortora, Bureau of the Census


PREFACE

The Federal Committee on Statistical Methodology was organized by the Office of Management and Budget (OMB) in 1975 to investigate methodological issues in Federal statistics. Members of the committee, selected by OMB on the basis of their individual expertise and interest in statistical methods, serve in their personal capacity rather than as agency representatives. The committee conducts its work through subcommittees that are organized to study particular issues and that are open to any Federal employee who wishes to participate in the studies. Working papers are prepared by the subcommittee members and reflect only their individual and collective ideas.

The Subcommittee on Survey Coverage studied the survey errors that can seriously bias sample survey data because of undercoverage of certain subpopulations or because of overcoverage of other subpopulations. The purpose of this report is to heighten the awareness of survey planners and data users regarding the existence and effects of coverage error, and to provide survey researchers with information to evaluate the trade-offs between coverage error and survey costs. The report profiles selected methods for controlling and measuring the effects of coverage errors using examples from Federal sampling frames and surveys. The report includes seven case studies based on Federal surveys that illustrate selected aspects of coverage errors.

The Subcommittee on Survey Coverage was cochaired by Cathryn S. Dippo of the Bureau of Labor Statistics, Department of Labor, and Gary M. Shapiro of the Bureau of the Census, Department of Commerce.


MEMBERS OF THE SUBCOMMITTEE ON SURVEY COVERAGE

Cathryn S. Dippo (Co-chair), Bureau of Labor Statistics (Labor)
Gary M. Shapiro (Co-chair), Bureau of the Census (Commerce)
Raymond R. Bosecker, National Agricultural Statistics Service (Agriculture)
Vicki Huggins, Bureau of the Census (Commerce)
Roy Kass, Energy Information Administration (Energy)
Gary L. Kusch, Bureau of the Census (Commerce)
Melanie Martindale, Defense Manpower Data Center (Defense)
D.E.B. Potter, Agency for Health Care Policy and Research (Health and Human Services)


ACKNOWLEDGMENTS

This report is the result of the collective work and many meetings of the Subcommittee on Survey Coverage. All of the subcommittee members made significant contributions to the text of the report, taking responsibility for various sections of the report during the long period of preparation.

All of the members of the Federal Committee on Statistical Methodology reviewed several drafts and made many important suggestions.
The subcommittee wishes to recognize in particular the valuable contributions made by the following committee members: Yvonne Bishop, Joseph Garrett, Charles Jones, Daniel Kasprzyk, Fritz Scheuren, Monroe Sirken, and Robert Tortora. The subcommittee also benefitted significantly from an outside review of the final draft by Steven Heeringa and Benjamin Tepping.

The subcommittee also thanks the following persons: John Paletta and Richard Pratt for preparing the Current Population Survey and Producer Price Index case studies, respectively; Robert Casady and Charles Cowan for contributing to the section on sample design strategies; and Rosalie Epstein of the Bureau of Labor Statistics for editing the report.


TABLE OF CONTENTS

LIST OF TABLES

LIST OF FIGURES

EXECUTIVE SUMMARY

CHAPTER 1. Coverage errors occurring before sample selection
  1.1. Conceptual or relevance error
  1.2. Frame construction and maintenance
    1.2.1. Classification of frame errors
           Missing elements; clusters of elements appearing on list; blanks or foreign elements; duplicate elements; incorrect auxiliary information
    1.2.2. Frame maintenance
           New frame elements; inactive frame elements; misclassified elements; out-of-scope elements; split-out or combined frame elements
    1.2.3. Match-merging of independent source lists
  1.3. Sample design strategies to minimize coverage error
    1.3.1. Defining target population to equal frame population
    1.3.2. Random-digit dialing sampling
    1.3.3. Multiple frame sampling
    1.3.4. Sampling rare populations
    1.3.5. Estimation procedures
  1.4. Evaluation methods
    1.4.1. Macro-level analysis
    1.4.2. Micro-level analysis

CHAPTER 2. Coverage errors occurring after initial sample selection
  2.1. Incorrect association of frame with reporting unit(s)
    2.1.1. Location errors
    2.1.2. Classification errors
    2.1.3. Temporal errors
  2.2. Listing errors
    2.2.1. Area segment listing errors
           Studies measuring error; an alternative to area listing
    2.2.2. Household listing errors
           Motivational causes; lack of correspondence between survey designer's and respondent's residency concepts; effect of household listing errors; methods for reducing household listing errors
    2.2.3. Nonhousehold listing errors
  2.3. Other nonsampling errors
    2.3.1. Recording errors
    2.3.2. Responses from nonsampled units
    2.3.3. Coverage errors resulting from nonresponse

CONCLUSION

APPENDIX A. CASE STUDIES
  Introduction
  A.1. Annual Survey of Manufactures (ASM)
  A.2. National Long-term Care Survey (NLTCS)
  A.3. National Master Facility Inventory (NMFI)
  A.4. Producer Price Index (PPI)
  A.5. Quarterly Agricultural Surveys (QAS)
  A.6. Monthly Report of Industrial Natural Gas Deliveries
  A.7. Current Population Survey (CPS)

APPENDIX B. GLOSSARY OF ACRONYMS

APPENDIX C. GLOSSARY OF TERMS

REFERENCES


LIST OF TABLES

1.  Selected sampling frames used for Federal surveys
2.  Scope of frame versus population of interest for selected surveys
3.  Reinterview classification of units originally classified as noninterview: October 1966
4.  Reinterview classification of units originally classified as noninterview: April to September 1966
5.  Reinterview classification of units originally classified as noninterview: 1987
6.  Type B rates for the Survey of Income and Program Participation and the Current Population Survey, 1985-87 (percent)
7.  Selected surveys in which the frame sampling unit and the final sampling unit are the same
8.  Selected surveys in which the frame sampling unit and the final sampling unit differ
9.  Examples of surveys requiring field listing
10. Comparison of A.C. Nielsen 1982 field canvass of housing units with 1980 census housing unit counts by block group or enumeration district (National Nielsen Television Index Survey segments only)
11. Number of listing errors found in Labor Force Survey study (Statistics Canada)
12. Reasons units were added and deleted during reinterview, as determined by reconciliation--area segments only: October 1966
13. Estimates of percent net CPS within-household undercoverage relative to the 1980 census for males aged 25 and over by their household status (standard errors in parentheses)
14. 1986 average coverage ratios by age, sex, and race for CPS
15. 1986 average coverage ratios for Hispanics by age and sex for CPS


LIST OF FIGURES

1. Typical physical flow of natural gas from gas well to industrial customer (custody relationship)
2. Possible financial flows (ownership) from gas well to industrial customer (equity relationship)
3. Industrial gas estimates from Form EIA-857 submissions: Total United States


EXECUTIVE SUMMARY

Coverage errors can cause serious biases in estimates based upon sample survey data. Undercoverage may be substantial in many surveys, especially of selected subpopulations. For example, the estimated undercoverage of Hispanic males aged 14 and over is 23 percent in the Current Population Survey (see appendix A.7). In economic surveys, new businesses may be missed at a higher rate than older ones. If the characteristics of the missed portion of the population are very different from those of the covered portion, serious biases in the survey estimates for the total population will result.
The purpose of this report is to heighten the awareness of survey program planners and data users concerning the existence and effects of coverage error and to provide survey researchers with information and guidance on how to assess and improve coverage in sample surveys. The report outlines the possible sources and effects of coverage error by documenting current knowledge of coverage errors in Federal surveys. It also profiles selected methods for controlling, measuring, determining the effects of, and reducing coverage errors using examples from Federal surveys and sampling frames.

This report utilizes a broad definition of coverage error. Some authors have included only errors associated with the sampling frame. Here, however, coverage error is defined to include all possible sources of error which are not classified as observational or content errors (U.S. Department of Commerce 1978b). For example, errors or mistakes leading to noncoverage of target population units (undercoverage), errors or mistakes leading to the inclusion of units which are not members of the target population (overcoverage), and failure to elicit a response for a sampled population unit (nonresponse) are included.

The report narrative is structured to follow the sequential procedures typically used in a survey. Other approaches, including one based upon a typology of sampling units (housing units, persons, and establishments), were considered but discarded because of the complexity of many surveys. (An excellent discussion of the coverage errors in housing unit surveys can be found in United Nations (1982).) The survey process has been divided, for the purpose of this report, into two components. Chapter 1 discusses coverage errors which might occur before the first stage of sampling. Issues associated with the creation and maintenance of sampling frames and the choice of sampling frame and strategy are included. Chapter 2 discusses coverage errors which might occur after the first-stage sampling units are selected. Coverage errors associated with field listing, screening, subsequent sampling operations, interviewing, and processing are presented, along with overcoverage due to volunteer respondents. Nonresponse as an important source of coverage error and bias, particularly in housing unit surveys and mail surveys of establishments, is also discussed.

Each chapter includes a detailed discussion of the circumstances leading to coverage errors. A discussion is also provided regarding the seriousness of the errors, their effects on survey estimates, and methods for controlling, measuring, and improving survey coverage. Numerous studies, which have been conducted to measure either overall frame coverage or the effects on coverage of selected data collection procedures, are cited throughout the report. One large source of coverage error identified in this report is within-household listing of persons. In general, coverage error is a more significant problem in housing unit surveys than in establishment surveys.

Throughout the report, examples are used to illustrate, not to encompass, the diversity of knowledge and experience derived from surveys conducted by Federal agencies.
Although the examples in the text are necessarily brief, a more detailed examination of selected coverage issues is provided in appendix A, which presents illustrative material from the following surveys: Annual Survey of Manufactures, National Long-term Care Survey, National Master Facility Inventory, Producer Price Index, Quarterly Agricultural Surveys, Monthly Report of Industrial Natural Gas Deliveries, and Current Population Survey. Readers are encouraged to compare their current knowledge and practices concerning coverage with those of other Federal agencies as represented by the examples in the report. To assist the reader, glossaries of acronyms and terms are included at the end of this report as appendixes B and C.


CHAPTER 1. COVERAGE ERRORS OCCURRING BEFORE SAMPLE SELECTION

This chapter's goal is to provide a comprehensive set of evaluative tools which will enable users to identify and minimize potential coverage problems associated with a survey research program or specific research project, to assess the strengths and weaknesses of alternative research methodologies as these relate to potential survey coverage error, and to identify overly ambitious research projects and recast them into an achievable framework.

Four major subjects are discussed: conceptual or relevance error, frame construction and maintenance, sample design strategies to minimize coverage error, and coverage evaluation methods. The chapter delineates the thinking, planning, and assessing processes which should precede and inform a complete survey design with its associated sampling plan.

The first section of the chapter contains a discussion on the importance of thinking carefully about prospective research and the necessity of using clear and concise language in the statement defining the research project or program. Attention to correct and clear thinking about, and specification of, research goals, concepts, and targeted population(s), an often neglected or abbreviated phase of research planning, helps to avoid or minimize many coverage problems at the outset.

Types of frame errors, standards for selecting or building a high-quality frame, and the many complex issues associated with correct and thorough frame maintenance, including match-merging independent source lists for updating and correcting frames, are discussed in the section on frame construction and maintenance. Not only are the major and minor problems arising from the failure to maintain frames illustrated with many examples, but evaluative criteria by which potential users of already existing frames may assess the appropriateness and adaptability of these frames for their own surveys are provided. The goal of this section is to provide the tools by which to identify appropriate existing frames, to assess those frames, and to determine when either supplemental or additional frames may be needed, or when a totally new frame must be built.

The third section presents sample design strategies that can minimize coverage errors associated with specific frame weaknesses. Moreover, design strategies for sampling rare populations for which existing frames are incomplete or inefficient are discussed. The section closes with a discussion of estimation procedures which compensate for known coverage error in the sampling frame(s).

Both macro- and micro-level analysis, as methods for measuring frame coverage, are discussed in the last section of the chapter.
The degree of coverage error is measured routinely in many Federal establishment surveys. For example, reconciliations are made at the Bureau of the Census between economic census totals and corresponding totals in the Current Industrial Reports annual survey for census years to measure and improve coverage. Similarly, the National Agricultural Statistics Service (NASS) conducts a continuous survey program for the agriculture sector and compares inventory and production estimates with those obtained in the Census of Agriculture. Administrative data are also used to measure coverage of establishment surveys. For example, the Bureau of Labor Statistics makes annual comparisons between the employment reported to State unemployment insurance systems and establishment employment estimates from the monthly Current Employment Survey.

For the housing unit surveys conducted by the Bureau of the Census, a demographic approach is used to estimate the degree of coverage error. This approach is similar to what is termed demographic analysis, where the coverage of the decennial census rather than of survey data is analyzed using other sources. However, using census data as a benchmark for survey coverage must be done cautiously, since the coverage error detected supplements the coverage error that already exists in the census results.
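The macro-level comparisons just described reduce, in their simplest form, to computing coverage ratios of survey-based totals to independent benchmark totals for the same domains (as in tables 14 and 15). The sketch below, in Python, is a minimal illustration of that arithmetic; the domain labels and all numbers are hypothetical, not figures from any of the surveys cited.

```python
# Minimal sketch of a macro-level coverage check: divide a weighted
# survey estimate of a population total by an independent benchmark
# total for the same domain. Ratios well below 1.0 suggest
# undercoverage; ratios above 1.0 suggest overcoverage or duplication.
# All numbers are hypothetical illustrations, not published figures.

survey_totals = {           # weighted survey estimates (hypothetical)
    "males 14-24": 18_100_000,
    "males 25-44": 30_500_000,
    "males 45+":   29_800_000,
}
benchmark_totals = {        # independent, e.g. census-based, totals (hypothetical)
    "males 14-24": 20_300_000,
    "males 25-44": 32_000_000,
    "males 45+":   30_400_000,
}

for domain, survey in survey_totals.items():
    benchmark = benchmark_totals[domain]
    ratio = survey / benchmark
    shortfall = 100.0 * (1.0 - ratio)   # percent net undercoverage if positive
    print(f"{domain:>12}: coverage ratio {ratio:.3f} "
          f"({shortfall:+.1f} percent net undercoverage)")
```

As the text cautions, a ratio computed against census counts measures coverage relative to the census, so any undercount in the census is folded into the benchmark itself.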
1.1. Conceptual or relevance error

Coverage errors can be caused by incorrect specifications of the concepts to be measured or the population(s) to be targeted by the survey. Incorrect specifications often result from conceptual errors. Some of these are hasty or incomplete thinking concerning the goals of the research, faulty reasoning or incorrect assumptions about the measurable characteristics of targeted groups, and failure to recognize existing information as relevant (or irrelevant) to the specifications being written. Incorrect specifications can sometimes be spotted by their vague, nonspecific, or ambiguous language. These faulty specifications, in turn, can lead to the construction of incomplete, inadequate, or otherwise flawed frames.

Hansen, Hurwitz, and Pritzker (1967) present the general concept of the mean-square error of a survey estimate which includes a term for the "relevance of the survey specifications as related to the requirements." This is the squared difference between a statistic which constitutes the ideal goals of the statistical survey and a statistic based upon the specifications actually set for the survey if carried out precisely according to defined goals. Using vague or ambiguous language in terms of the ideal goals can lead to greater relevance error because this language can increase any difference between ideal goals and actual specifications. Thus, relevance error is a type of coverage error, since failure to specify correctly the concepts to be measured can lead to the construction of a flawed frame.
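In symbols, the decomposition just described can be sketched as follows; the notation is ours rather than Hansen, Hurwitz, and Pritzker's, and the cross-product term is assumed negligible for the purpose of the sketch.

```latex
% Sketch of the mean-square error decomposition described above.
% \theta^*     : the statistic defined by the ideal goals of the survey;
% \theta_S     : the statistic the survey would yield if carried out
%                exactly according to its written specifications;
% \hat{\theta} : the estimate actually produced.
\[
\mathrm{MSE}(\hat{\theta}) \;=\;
E\!\left[(\hat{\theta}-\theta_S)^2\right]
\;+\; \underbrace{(\theta_S-\theta^*)^2}_{\text{relevance (specification) error}}
\;+\; \text{cross-product term.}
\]
```

Vague specifications widen the gap between $\theta_S$ and $\theta^*$, and no amount of sampling precision in the first term can repair that gap.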
To ensure a more useful and complete frame, a clear, precise statement of the research question(s) and population(s) of interest needs to be written down, with careful attention to the exact language used. This is particularly important when the ideal goals are proposed by nontechnical sponsors or appear in enabling legislation. It may even be useful to write down what is not being researched and who is not being targeted, especially if exclusions may be of interest to some client or to researchers generally.

Taking the time to think through the possible meanings of key terms and variables and, if needed, to determine whether and how the population(s) and concepts have been defined and researched by others can reduce duplication of effort, reveal previous conceptual errors, and highlight potential frame construction problems. Even in a recurring survey, a review of the concepts and definitions can be very useful. For example, this effort may reveal changes in how the target population(s) and concepts are being defined by other researchers.

Sometimes, it may even be necessary to devise new terminology or revise definitions rather than to perpetuate the use of terms which now seem too general or otherwise objectionable. For instance, the "black" population of the United States has not always been called "black," and it may soon be preferable to use "African American." A review of shifts over time in race and ethnic concepts used in Federal research reveals various intersecting but nonidentical definitions for the black population, such as "nonwhite" and "colored." Such examples show how the use of vague terms makes it difficult to know who or what has been studied and how, over time, changes in terminology have generally been made to increase detail or specificity (and, thus, measurement accuracy), even at some cost in data continuity.

The language used to denote concepts, key variables, and the like should aim for concreteness and clarity. General equivalents of concepts (such as "education" to stand for "years of school completed") or of populations (such as "children" to stand for "persons aged 17 and under") should be avoided. Although very difficult, making the attempt to write specifications using "standard scientific terminology" rather than the "language of everyday life" wherever possible should help one to avoid vagueness in defining populations and concepts. Vague definitions of populations and concepts tend to create coverage errors because they lead to inappropriate unit inclusions on, or exclusions from, a frame and even to naming a population which cannot be adequately represented on a frame.

A good rule to follow in examining the initial formulation of a problem is to ask a series of questions:

- To what population(s) of units does this problem refer?

Distinguish among populations from which information is sought, those which will be frame units, and those which may be reporting units, if different from the frame units. For example, suppose one wished to do research on "the scholastic achievement (as measured by grades) of children of recent immigrants." In this case, "children of recent immigrants," more suitably specified perhaps as "persons aged roughly 5 to 17 enrolled in Grades 1 through 12 of the U.S. public schools and living in a household in which at least one related head has been resident in the United States 5 or fewer years," would be the population about which information is sought. However, it seems likely that one might need to construct two or more frames in order to reach this population. One of the frames might have U.S. public schools as units, while another might consist of residential addresses to be screened. In this example, reporting units might well consist of two groups, school record keepers and parents or guardians.

- Is (are) this (these) population(s) observable or potentially measurable? How?

Continuing from the example above, one can see that the suggested specification of "children of recent immigrants" takes account of some of the presumably unobservable "children of recent immigrants," such as those who may be homeless and those who may not be currently enrolled in school. Among recent immigrants, those who entered the country illegally may not be observable, nor may those who died following entry, leaving school-age dependents. Sources for obtaining U.S. public school and residential addresses might be lists from various agencies. Thinking through all possible categories of the populations of interest should reveal those subsets which cannot be measured or reached; those whose measurement (observation) might be achieved; and those which seem reachable with some existing or proposed methodology. Thus, the "children" may be reached by means of a housing unit survey, school survey, and/or institutional survey (hospitals, orphanages).

- Are there one or more subsets of this (these) population(s) which cannot be measured/observed in some way? What are these? Would they ever be measurable?

Continuing the example of "children of recent immigrants," some possibly unobservable components of the populations have already been mentioned. The potentially measurable components might be those which cannot be reached now but which might be reached using a methodology that may be prohibitively expensive, such as scanning all death certificates or other sources of information to identify deceased recent immigrants. Thus, it may be useful to distinguish the inherently unobservable from the practically unobservable components of populations of interest.

- Does time enter into the answer to one or more of the questions above, in the sense that the measurable population(s) may change or may have changed?

Continuing the example of "children of recent immigrants," one may find that a change in a legal boundary or definition can turn "internal migrants" into "recent immigrants" or vice versa. This would happen, for example, if Puerto Rico became a U.S. State, thus solving the problem of how technically to classify migrants to the mainland, who would become "internal migrants." Such a change might force a redefinition of the size and location of the populations of interest.

- Have previous efforts been made to build a frame of this (these) population(s)? What problems were encountered in frame construction? Was one of these faulty conceptualization? Which of these problems has been solved?

This series of questions focuses on the need to locate previous research, to attempt to contact those who designed and conducted the research, or to obtain procedural histories about it and to evaluate carefully the definitions and language used by others. An assessment of previous research often reveals use of frames built for other purposes by still earlier researchers, especially when the frames are very expensive to assemble. Information needed for adequate frames may now be available (such as improved school lists) due either to improvements in information processing or to changes in laws regarding availability of administrative data.

Answering this list of questions has several important goals. The first is to decrease the slippage between the conceptual population(s) of interest and the actual units to be included on frames. The second goal is to facilitate the correct use of language, so that what can and cannot be researched is clearly understood.
The third goal is to facilitate the specification of comprehensive and correct rules for frame maintenance. The fourth important goal is to help insure that the population(s) and concept(s) of interest will be defined and measurable, so that one can answer the research question(s) of interest with the greatest accuracy and completeness possible, with minimum coverage error.

In beginning to answer the questions given above (and there may be other useful ones to ask), background preparation might include a brief review of some of the literature on conceptualization. While extended treatments of this topic abound in the philosophy of science literature, statisticians such as Deming (1961) and applied researchers like Blalock (1968) have discussed the importance of conceptualization in more accessible and measurement-oriented works. Deming and others have discussed at length the "true value" of any variable or concept we attempt to measure, noting that there is no inherent "true value" and that the entity that we call the "true value" is a unique outcome of the concepts, assumptions, definitions, and procedures we use to arrive at it. With somewhat more focus on language, Blalock emphasizes the distinction between "concepts by intuition" and "concepts by postulation." These two kinds of concepts, one kind more or less abstract and the other more or less concrete, are linked for research and measurement purposes by assumptions (for example, the assumption that "education" is adequately reflected in "years of school completed"). It is upon these assumptions that researchers sometimes founder, not least because the language used is "everyday language" (for example, "children," "education," "worker").

The language of everyday life frequently is not suited to scientific inquiry. But when research is formulated solely or substantially in everyday language, the population(s) and concepts of interest will be named by this language as well. It is up to the researcher to guard against such usage and to clarify, specify, and define key terms, concepts, and populations, so that adequate frames can be constructed and coverage errors minimized. To build on an earlier example, a research organization might begin preparations for a study which will answer the question of "how children of recent immigrants are faring in school." Such a description of the research question may suffice for press coverage or to quickly summarize the general thrust of the effort for family and friends, but it reveals very little else. What is a measurable child? What does "recent" mean? Who is an immigrant? What is the process of "faring in school"? Is the intention to study a process, an outcome, or a set of outcomes? What is a "school"? Not only does the vague language used tell very little, it actually militates against thinking in clear ways. For example, once the word "school" is used, we probably unwittingly think of our own unique individual "school" experiences, thus creating a tendency to omit by assumption other possibilities for defining "school."

This is not to say that one cannot use ordinary words; indeed, there is no method by which anyone can transcend all the limitations, contradictions, rules, and assumptions which are language itself. It means that there is a distinction to be made between the use of any word for purposes such as casual conversation and for research purposes.
The difference usually lies in the modifiers and extended definitions containing the specificity and detail required for classification and measurement. In regard to frame construction specifically, concepts and population(s) of interest should be defined in such a way as to be observable and measurable in regard to the research question(s). (See appendix A.2 for an example of a carefully defined target population.)

Sometimes it is possible to work with potential survey sponsors in order to gain assistance in formulating the original research question(s). Such discussions often reveal incomplete thinking and allow the researcher to eliminate undoable or excessively costly projects or to modify those involving major frame construction problems. Even after research agendas are set, as with enabling legislation, a meeting with sponsors will reveal intent and can save time and trouble later on. For example, recently proposed legislation called for the establishment of a Consumer Price Index (CPI) for the "elderly," where "elderly" was defined as "all persons 62 and over." In the initial proposal, the definition was "all persons 62 and over and retired." In both definitions, the targeted units were persons, whereas the usual units for interview are "consumer units." Not only do the different definitions imply potentially different sampling frames (and thus different cost levels) but also different procedures for constructing a CPI. Given these problems and others, had this legislation been enacted, it would have been necessary to determine the intent of Congress and establish a working definition for constructing an "elderly" index, so that the resulting research would have provided the information desired.

Finally, in thinking through any research agenda, attention should be paid to exclusions. Some frame (and research) exclusions are recognized and noted in many areas of research. One example might be something like: "This research focuses on immigrants who entered the United States within the last 5 years as identifiable in census data. It does not cover persons missed by the census. Some illegal entrants are included, but not identifiable, in the census data."

However, many exclusions are not noted, partly because researchers do not specify their work precisely, but often because the exclusions do not occur to them or their existence in the real world is unknown to them. In addition, there is some lag between the emergence of new social phenomena and their explicit recognition.

In order to identify exclusions, it is often necessary to examine hidden assumptions and biases about the world. Reexamining topics from the perspectives of several disciplines and actually going out into the field might well be part of this process. As the result of these kinds of efforts and in various other complex ways, new concepts and populations for research do emerge. An example of this is the "hidden" economy.

Despite evidence that some forms of economic activity were not being included in the national accounts or were not the focus of serious research, "official" statistics and economic researchers in general failed until fairly recently to acknowledge such things as bartering, illegal activities of various kinds, and economic phenomena that were associated with other kinds of economic systems. Once the "hidden" economy or some subset or version of it was actually named, then vaguely described, it became easier for people to begin to think in new ways about the workings of the economic world.
Once this kind of "breakthrough" occurred, it became easier to "see" exclusions. Today much more work has been done to identify, define, and attempt to research facets of an economic world which was largely ignored by the statistical establishment in the past. However, for the "hidden" economy, as for any newly emergent topic, this process is by no means complete because inherent (and predictable) problems surrounding the interface between conceptualization and measurement have not yet been resolved.

One example of research intended to address the lack of terminological specificity in work on the "hidden" economy is McDonald's (1984) examination of the charge that Bureau of Labor Statistics employment, price, and productivity indexes are significantly affected by unreported economic activity. McDonald asserts: "Establishing the existence of a subterranean economy ... does not necessarily prove that government statistics are invalid. To determine whether a particular government statistic is affected also requires careful consideration of the way data are gathered ... and the relation between economic activities that may be covered by the survey and those that are not.... many of the critics of government statistics have simply not taken this necessary step" (p. 4). After discussing the most narrow through the most broad ways in which the underground economy had been defined in the literature to date, McDonald examined the extent to which evidence on the underground economy under any of these definitions implied mismeasurement of concepts measured by BLS data series and found that the critics had not proved their case. The importance of this work lies in its attempt to delimit explicitly several crucial interfaces between conceptualization and measurement pertaining to a researchable subject whose accumulated literature exhibited a notable lack of conceptual rigor. McDonald not only provided a solid point of departure for further conceptual and quantitative work on the "hidden" economy, but also pointed out what he had not covered in his assessment.

Of course, it hardly needs mentioning that it takes some creative thinking and observational acuity to "see" and to figure out how to name various forms of formerly unsuspected or illegal economic activity, let alone measure the monetary influence of these activities. Despite this, published economic research should not fail to mention any exclusions of already recognized "hidden" economic activity, where appropriate to the topic, and to state something about the potential effect of such exclusions on the findings at hand, regardless of whether this effect is minimal, large, or unknown. Since "hidden" economic activity has been the subject of a great deal of attention in recent years, even a statement that one or another form of it is irrelevant to the research being reported will reflect a prudent and thoughtful research approach and may prevent certain predictable criticisms.

More generally, a discussion of exclusions should be included in all published research as a matter of course. It should no longer be acceptable to omit mention of subpopulations which cannot be included on a frame. Excluding them from mention might well insure that no future attention will be accorded them and could give the false impression that existing frames are adequate or that new frames may not be needed. Put simply, mentioning exclusions points the way to future research and places the reported research in the correctly limited context.
As a start, it is essential that statistical studies begin with a more extensive interaction between subject matter experts and research methodologists. The gains can be large and may well enable researchers to avoid many of the other coverage problems discussed in this report.


1.2. Frame construction and maintenance

Once a decision is made concerning the target population, either the sample design must be based upon an available sampling frame(s) or the frame(s) must be constructed specifically for the study. Dalenius (1985) notes the following three important properties of a frame:

- Makes it possible to compute estimates concerning a population which is sufficiently "close" to the target population,

- Serves to yield a sample of elements which can be unambiguously identified, and

- Makes it possible to determine how the units in the frame are associated with the elements in the (sampled) population.

The first stage of sampling is usually dependent upon a frame consisting of a physical listing of units. This may be a list of names of individuals, establishments, institutions, counties, cities, streets, etc., or a list of numbers attached to city blocks, land area segments, houses, pages, or any number of other unique, definable entities. However, as Kish (1965, p. 53) notes, a "frame is a more general concept: it includes physical lists and also procedures that can account for all the sampling units without the physical effort of actually listing them." Deming (1960) cites one exception to the making of a list of sampling units, i.e., when a watch is used to sample time intervals during which customers leaving a store are interviewed.

The units listed in the initial frame may not correspond to the units about or from which information is sought. Often, additional frames are needed for successive stages of sampling in order to progress from available sampling units to the units to be contacted or measured. For example, areas may be selected from a listing of all blocks in an area frame. Housing units inside sampled areas may then be listed and sampled in order to achieve a listing of persons to be sampled that are members of the target population from which information is sought.

A more complex example is the procedure for selecting items to be priced in the Consumer Price Index. The sample of priced items is selected from items sold by a sample of outlets which, in turn, was selected from a list of outlets created from information provided by interviews with consumer units in addresses sampled from the decennial census, new construction permits, and area listings. In this case, interviews are conducted in a sample of housing units to create a sample frame of establishments, not a population frame, from which a sample is selected. Within the sampled outlets, probability methods are used to select increasingly more detailed classes of goods until a particular item is selected. A complete list of all the items available for sale is never constructed.

A variety of sampling frames utilized by agencies of the Federal Government is presented in table 1. Associated information related to construction and survey use of the frames is included.
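The block-to-housing-unit-to-person progression described above can be viewed as successive sampling stages, each stage building its frame from the units selected at the stage before. The Python sketch below is a toy illustration of that idea, not any agency's production design; the identifiers, sample sizes, and unit counts are all invented.

```python
import random

random.seed(12345)  # reproducible toy example

# Stage 1 frame: all blocks in an area frame (hypothetical IDs).
blocks = [f"block-{i:03d}" for i in range(1, 201)]
sampled_blocks = random.sample(blocks, 10)

# Stage 2 frame exists only after field listing: housing units are
# listed within each sampled block, then subsampled.
def list_housing_units(block):
    """Stand-in for a field listing operation (returns hypothetical units)."""
    n_units = random.randint(20, 60)
    return [f"{block}/hu-{j:02d}" for j in range(1, n_units + 1)]

sampled_units = []
for block in sampled_blocks:
    units = list_housing_units(block)            # frame built at this stage
    sampled_units.extend(random.sample(units, 5))

# Stage 3 frame: persons rostered within each sampled housing unit.
def roster_persons(unit):
    """Stand-in for within-household listing, itself a known error source."""
    return [f"{unit}/person-{k}" for k in range(1, random.randint(1, 5) + 1)]

persons = [p for unit in sampled_units for p in roster_persons(unit)]
print(f"{len(sampled_blocks)} blocks -> {len(sampled_units)} housing units "
      f"-> {len(persons)} persons in sample")
```

Each stage that builds its frame in the field (listing, rostering) is also a stage at which the coverage errors discussed in chapter 2 can enter.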
In practice, with the exception of area frames consisting of land segments, the target population a sampling frame purports to represent is constantly changing. For a one-time survey, when it is desirable to obtain data for a specific point in time, this fluctuation is not usually critical, assuming the frame represents the near truth relative to the time of interest. It becomes more critical for ongoing surveys. While panel maintenance rules are inevitably applied for such surveys so that the frames remain representative of the changing target populations, these rules are often difficult to apply comprehensively because of funding limitations and/or methodological complexities (see appendixes A.1 and A.4). The result is that, over time, any panel may no longer be representative of the target population that is to be measured. Thus, resampling from a current frame usually occurs regularly for such surveys. A current frame is usually available because a procedure for updating the frame is formulated during the panel survey's design process and so is in place at the time the first sample is selected. This updating assures that the frame remains representative of the target population over time. For example, a universe file has been established for the Producer Price Index survey. The primary purpose of this file is to provide up-to-date establishment information including name, location, industry classification, employment, and other pertinent items. These data are obtained via telephone interviews during a frame refinement process or by personal visits to collect data only from sampled units (see appendix A.4, section III).

Not all frame maintenance procedures address problems of coverage. Some, such as the removal of inactive units, are geared toward sample efficiency. Still, neglecting such procedures can affect coverage. For example, deaths on a frame may be sampled but are not likely to respond. If they are treated as active units and data are imputed for them, bias is introduced. Therefore, it is proper to consider frame maintenance methods in more detail (see section 1.2.2). Before doing so, however, it is useful to note some additional distinguishing features of sampling frames.

Not all sampling frames are maintained over time, even those for ongoing surveys. In fact, the frames created for many sampling operations are discarded once samples are selected and approved. The sample that is representative of the frame at the time it is selected does not remain representative of the population of interest over time, and neither does the frame from which it was drawn, if not maintained. When and if a new sample is selected, it is first necessary to construct a new frame that represents the current target population. An example is the use of the Census of Manufactures as a frame for the selection of the Annual Survey of Manufactures sample. The Census of Manufactures represents the manufacturing establishment population at a point in time and thus is not subject to change until the next Census of Manufactures, in 5 years. It serves as the primary, but not the exclusive, frame source for the Annual Survey of Manufactures, and is itself a derivative of the Standard Statistical Establishment List (SSEL). Once the sample for the Annual Survey of Manufactures is selected, it undergoes coverage updating each year, but no updating to the census frame can be done until the next Census of Manufactures. When the next census is completed, it will serve as the new frame for the next sample selection.
The new census, while conceptually an update of the old census, is in fact developed from the latest version of the SSEL, which itself made use of the prior census results (see table 1).

For other sampling operations, the frames are evolutionary; that is to say, they are not fixed, nor are they instantaneous creations. Instead, they evolve from periodic updates to a previous version of the frame. Each sample is taken to represent the target population at the reference time; however, the frame is maintained and updated to reflect the continuity of changes in the population it covers. In this context, frame maintenance is part of an iterative procedure, with results of a given survey contributing to changes in the frame from which subsequent cycles of samples are drawn. Two examples of this type of frame are the unemployment insurance (UI) file maintained by the Bureau of Labor Statistics (BLS) and the SSEL file maintained by the Bureau of the Census. Both files maintain a current profile (ownership, mailing address, Standard Industrial Classification (SIC) code, physical location, etc.) of economic entities in the United States. The UI file is updated quarterly using employer reports to the UI system. The UI file is supplemented with quarterly data for multiple reporting units, and the SIC and county codes are verified on a rotational basis, one-third of the establishment population each year. The SSEL is updated on a continuous basis using a variety of sources, including administrative records from the Internal Revenue Service (IRS) and Social Security Administration (SSA), the economic censuses conducted every 5 years, the Company Organization Survey conducted annually (except during a census year for most multiunit companies), and the many current economic surveys conducted by the Bureau of the Census. The UI file serves as a sampling frame for most BLS establishment surveys (see appendix A.4), while the SSEL is the underlying frame for most Bureau of the Census economic censuses and surveys (see appendix A.1).

Other examples of evolutionary frames are two frames maintained by the Energy Information Administration (EIA). The Oil and Gas Well Operator List is used as the frame for the Annual Survey of Crude Oil, Natural Gas, and Natural Gas Liquids. A list of firms selling petroleum products is used as the frame for two surveys: the Annual Fuel Oil and Kerosene Sales Report and the Monthly Petroleum Products Price Report. Information to update these frames comes, in part, from responses given by operators and firms on their survey submissions. Information from several other sources, including the triennial Petroleum Product Sales Identification Survey, is also used in adding, deleting, or modifying entries on the appropriate frame.

One other point is worth noting. Files whose primary purpose may be to serve as a sampling frame may serve other functions as well. For example, UI data collected by the States are used primarily to administer the Federal-State UI system. Additionally, the file provides a base from which to estimate the wage and salary component of national personal income and the gross national product. In addition to being a sampling frame, the SSEL serves many other purposes. It must fulfill the needs of many different survey programs with many different requirements. Because of this diversity, the amount of information included is limited, so the SSEL is not always used as the direct establishment frame source for sampling operations at the Bureau of the Census.
For example, the Current Industrial Reports surveys are selected from a frame created from the Census of Manufactures. These surveys are commodity surveys, and, for the most part, the population of interest is all producers of the particular commodities covered by the survey. Primary producers of these commodities can be identified on the SSEL, i.e., those establishments classified in industries which include those commodities, but the SSEL does not contain information on quantity or value of those commodities. More importantly, secondary producers cannot be identified, e.g., a steel plant which also happens to produce leather shoes would not be identified as in the scope of a survey to estimate shoe production. For some surveys, the contribution of secondary producers could be significant. The Census of Manufactures, on the other hand, contains product data which allow all known producers to be identified, and for this reason sampling frames are created directly from it. The underlying basis for this census, of course, is the SSEL.

The remainder of section 1.2 contains a discussion of coverage errors associated with the creation and maintenance of physical lists as sampling frames like those included in table 1. Section 1.2.1 gives a classification of frame errors as put forward by Kish (1965) and modified by Lessler (1980). The problems of maintaining or updating a sampling frame to reflect changes in the covered population over time are addressed in section 1.2.2. The concerns and procedures discussed are also relevant to the creation of a physical list of population elements which is to serve as a sampling frame. Section 1.2.3 treats the special case of frame updating or creation by means of matching and merging multiple lists to create a single more current or complete frame.


1.2.1. Classification of frame errors

Kish (1965) states a "frame is perfect if every element appears on the list separately, once, only once, and nothing else appears on the list," and classifies possible frame errors into four types: missing elements, clusters of elements appearing on the list, blanks or foreign elements, and duplicate elements.

In a detailed presentation of errors associated with frames, Lessler (1980) classifies six types of error: the four types that Kish discusses, plus incorrect auxiliary information and information insufficient to locate target elements. Incorrect auxiliary information can affect the coverage of the frame if the information is used to define subpopulations or subframes. Information insufficient to locate target elements does not reflect a coverage error in the frame, but may result in a coverage error as discussed under rules of association in section 2.1.1.
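Several of these error types can be checked mechanically when both a frame and a reference list of target-population identifiers are available for comparison. The sketch below tallies three of the Kish error types for a toy frame; the identifiers are invented, and a real check presupposes reliable matching of frame records to population units.

```python
from collections import Counter

# Toy reference list of target-population identifiers and a toy frame.
# All identifiers are hypothetical.
target = {"U01", "U02", "U03", "U04", "U05", "U06"}
frame = ["U01", "U02", "U02", "U04", "U09", "U10"]  # note duplicate and foreign IDs

frame_counts = Counter(frame)

missing = target - frame_counts.keys()              # elements absent from the frame
foreign = frame_counts.keys() - target              # blanks/foreign elements
duplicates = {u for u, n in frame_counts.items() if n > 1}

print("Missing elements (undercoverage risk):", sorted(missing))
print("Blanks or foreign elements:           ", sorted(foreign))
print("Duplicate elements:                   ", sorted(duplicates))
```

Note that the first tally is possible here only because the reference list is assumed complete, which is exactly what is unknown in practice; the discussion of missing elements that follows makes this point.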
Missing elements. The omission of units in the target population causes greatest concern. Because units are missing, no examination of any sample from the frame will reveal the nature of the missing component of the population. Research conclusions may be erroneously extended beyond an incomplete frame on the frequently tenuous assumption that missing units are like or very similar to those on the frame. This assumption is to be distinguished from the assumption often made for sample estimation purposes that survey nonrespondents are like respondents. When this assumption about the frame used is not clearly revealed in research reports, the research community receives misinformation, as mentioned in section 1.1.

Missing units are most commonly the result of the following situations: absence from sources used for frame construction, failure to report to an administrative system, births (new to relevant population), and zero units by definition. All of these circumstances might contribute to a conclusion that missing units are not like others which are included in the frame. Because it may be extremely expensive to attempt to obtain complete coverage, an organization may, or should, show the missing component to be a trivial proportion of the total or institute some form of estimation procedure to account for the missing portion of the population. This is especially true when the missing units are suspected of being unlike the included units.

Examples of list frames considered very nearly complete for survey purposes include the UI file of business establishments, the SSEL, the Oil and Gas Well Operator List, the Department of Defense Master Gain/Loss File, and the National Master Facility Inventory (NMFI) for health services. (Refer to table 1 for some selected characteristics of these and other frames.) Some organizations, such as the Department of Agriculture and the Bureau of the Census, maintain area frames that are considered complete, since all areas of land in the United States are contained within the frames. Therefore, all activities occurring within the United States are theoretically reachable through these frames. However, completeness does not imply the frames or the surveys which use these frames are free of coverage errors.

Clusters of elements appearing on list. A frame is ideally composed of individual sampling units with known characteristics which identify or link to reporting units. The initial sampling units may be known to consist of clusters of subunits which can be incorporated into a sampling design. An example would be a listing of single-family dwellings that contains some duplexes. Another example is a list of farm operator names of which the vast majority represent a one-name/one-farm relationship but some represent a one-name/multiple-farm relationship. Jessen (1978) describes four different relationships between what he refers to as frame units and observation units. These various relationships introduce complexity into the survey process. There is a definite possibility for coverage error if field representatives have not been thoroughly trained in the proper procedures for handling clusters of reporting units associated with a single sampling unit.

Blanks or foreign elements. If a frame is created or an existing list modified for a particular one-time survey, elements on the list which are blank or are not members of the population of interest should be removed. However, most Federal surveys are repetitive or ongoing, and many frames are used for more than one survey. Thus, quite often it is appropriate to retain elements on the frame which previously were members of the population of interest for at least one survey. For a discussion of frame creation and maintenance procedures designed to deal with inactive or out-of-scope frame elements, see section 1.2.2.

Duplicate elements. Duplication of units on the frame may result in overcoverage, i.e., some members of the population are represented more than once. Population totals may then be overstated and means could be biased. Moreover, multiple representation of units on a sampling frame leads to sampling inefficiencies. There are, however, survey procedures which may be employed to identify and compensate for frame duplication (e.g., see Gurney and Gonzalez 1972).
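One such compensation reweights sampled units by the number of frame listings that represent them. The notation below is ours, a sketch of the general multiplicity idea rather than the specific estimator of Gurney and Gonzalez (1972).

```latex
% Multiplicity-adjusted estimator of a population total (sketch).
% y_i    : the value for sampled population unit i;
% \pi_i  : the selection probability of the listing that brought unit i in;
% m_i    : the number of frame listings representing unit i (its multiplicity).
\[
\hat{Y} \;=\; \sum_{i \in s} \frac{y_i}{\pi_i \, m_i}.
\]
```

Dividing by $m_i$ undoes the inflated chance of selection that a duplicated unit enjoys, which is why the interview must collect enough identifying information to determine $m_i$, as the next paragraph explains.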
Data collection may be complicated in the face of suspected frame duplication by the necessity of obtaining additional information in order to allow matching with the frame to find other frame elements representing the same population unit. For example, a farm name may be present on the list in addition to the farm operator's name. In the case of partnerships, any enterprise may have multiple representation through the names of individual partners. The necessity of obtaining these names and cross-checking against the frame lengthens the interview and complicates the survey process (see section 2.1.1, Location errors). The Producer Price Index Establishment Universe Maintenance System was developed for the Producer Price Index survey as a means of minimizing duplication as well as other sampling frame problems. It captures all changes made during frame refinement and collection feedback (see appendix A.4, sections III-V).

Undetected duplication resulting from nonsampling errors made during data collection or frame-check activities may result in a biased survey estimate. The extent of the bias depends upon the amount of duplication for which no adjustment is made and the size of the units involved. For example, a business enterprise may exist in the form of a vertically integrated company having a pyramid structure. Individual units may then maintain their own books on number of workers or value of production and contribute to the next higher unit in the structure. The parent unit may have the relevant data pertaining to the entire organization. The effect of not detecting the relationship among these sampling units depends upon which units happen to be included in the sample and how the structures of their operations compare to those of the remainder of the population.

Incorrect auxiliary information. Great care must be exercised when units are intentionally excluded from a frame because they are not thought to be members of the population of interest. Errors in frame variables, like size, type, class, or location of unit, could cause valid units to be excluded. For example, the SSEL file contains a relatively large number of records for which no industry classification has been assigned. These unclassified units become missing units on the various frames which are derived from the SSEL, since frame eligibility is first determined on the basis of industry classification. A major effort is made prior to each census year to code the unclassified units that have accumulated on the SSEL since the previous census, including, as a last resort, mailing an inquiry to an establishment to obtain a description of its activity. Little is done between census years because of the cost, but the business surveys which use the SSEL attempt, on a sample basis, a yearly classification, since experience has shown that most unclassified units are ultimately coded to their domain.

For a discussion of frame creation and maintenance procedures relevant to misclassified elements, see section 1.2.2, Misclassified elements.


1.2.2. Frame maintenance

In this section, frame maintenance procedures are discussed with reference to the kinds of coverage error described in the previous section.
These procedures can be classified as follows:

- Adding new frame elements or births,
- Eliminating or identifying inactive frame elements or deaths,
- Correcting misclassified frame elements,
- Identifying existing frame elements no longer in scope, or in scope for the first time, and
- Determining whether or not elements have combined with other elements or have split from existing elements (e.g., change in ownership, mergers, and divestitures in an economic setting).

Each of these updating procedures is discussed in turn below. The discussions address the effects on the frame of failure to update. Distinctions are made between updating procedures intended to determine the current status of existing frame elements, and those intended to identify elements not previously known to exist. In addition, procedures that update the frame as a whole are distinguished from those that may update only a subset of the frame.

New frame elements. When the research population is dynamic, it is important that the frame which represents it be updated to reflect births. Samples drawn from frames which are not updated for births can result in serious biases, especially if simple weighted estimates are to be used (see discussion of missing elements in section 1.2.1).

One effective method for detecting new units is to canvass periodically the existing frame elements. As an example, the larger (50 or more employees) multiunit companies on the SSEL are canvassed on a yearly basis (with the exception of the census year) via the Company Organization Survey. A proportion of the smaller companies is also canvassed in years other than either the census year or the year following. Companies are queried as to whether or not they have started new operations. However, companies do not always specify whether a newly listed establishment is a new entity (birth) or represents the purchase of an existing plant. If a plant is treated as a birth and sampled when, in fact, it had a chance of selection under another name or code, bias can result. (See appendix A.1 for additional details.)

A second method of identifying new units results from coverage maintenance operations performed for samples selected from the frame. This method, like the first, uses canvassing, but only of the sampled portion of the frame units. As part of the questionnaire administration process in nearly all surveys, inquiries are made about the status of the sampled units and whether any changes in their status have occurred since the last data collection period. Although the inquiries are targeted to sampled units believed not to be births, sometimes incidental information about other units (including births) can be obtained. This is obviously more a random than a systematic approach for identifying new units. Inquiries made in the Annual Survey of Manufactures of single-unit companies provide an example of the use of this approach. Each sampled single-unit company is asked whether any additional plants operate at its location or whether the company owns any additional plants or is owned by someone else. The purpose of these inquiries is to determine whether or not the company is a multiunit company. If the single unit does identify other locations, these may well be establishments which are new or which were not previously known to exist.

Establishments are also added to the SSEL through new employer identification numbers (EIN's) received from the Internal Revenue Service.
New numbers do not necessarily imply new establishments, however, as existing plants often request new numbers. The SSEL does not distinguish between the two. Duplication on the file of a plant under both a new and an old EIN will soon resolve itself, as the old EIN will eventually show no payroll data and will be dropped. Survey designers need to be able to identify the true births, however, and this requires additional work. In the Annual Survey of Manufactures, for example, classification cards are mailed to all manufacturing-coded establishments given a new EIN in an attempt to determine whether the establishments are births or existing plants. Only a sample of true births is added to the survey panel (see appendix A.1, section U).

Administrative records are also used to add establishments to the UI file. New business establishments are required to file with the State employment security agencies. However, there is a time lag between filing and being added to the UI file. Units added to the UI file are not necessarily births. Mergers, changes in ownership, branch offices, etc., may sometimes be assigned new UI account numbers. In an effort to address this problem, State agencies are trying to identify units which are legal predecessors and successors within the UI system. In addition, units which do not meet the legal UI requirements, but are still essentially the same economic units, may be identified as predecessor/successor by the States. In the meantime, the Producer Price Index survey annually uses an automated process whereby the new incoming UI file is compared to the universe file. If an establishment fails to match a unit on the universe file, it is added to the universe file with a special code (see appendix A.4, section III).

The Bureau of Labor Statistics (Grzesiak and Tupek 1987) has conducted several studies of business births in conjunction with its Current Employment Statistics program. The usefulness of the UI file as a sampling frame for new businesses is constrained by the delay between the time a business first hires employees and the time it enters the UI file. A study of all 12,983 UI accounts (the sampling frame for this program) assigned by Florida for 3 months in 1984 found almost 80 percent were new accounts without predecessors. The study focused on determining the length of time between a business's first liability for UI coverage and its entrance on the UI file, which depended on how the State discovered the employer and whether the employer had a predecessor. The median lag-time for all new accounts was found to be 120 days. A study was also conducted in New York to develop a methodology for identifying new businesses using the UI system and to construct new procedures for estimating the employment of new businesses for incorporation into the Current Employment Survey. The median lag-time in New York was also found to be 120 days, with 93 percent of the establishments having fewer than 10 employees.

Record checks with outside sources can also be used to identify birth elements. Generally, these checks do not allow one to distinguish between new establishments and previously missed establishments, but in either event they provide information for updating the file, and reveal elements that had no chance of inclusion in previously selected survey samples. One example of an outside source frequently used in record checks of establishment frames is the trade association list.
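In its simplest form, such a record check amounts to a set comparison between the outside list and the frame on some identifier. The following Python sketch is a minimal illustration only; the record layout and the "ein" field are hypothetical, and, as noted above, a production record check would require clerical review of every unmatched case before any unit is added to the frame.

    # Minimal sketch of a record check against an outside source list.
    # Field names ("ein", "name") are hypothetical stand-ins.

    def record_check(frame, source_list, key="ein"):
        """Return source-list records whose identifier is absent from the frame.

        Unmatched records are only *candidates*: a record check alone cannot
        distinguish a true birth from a unit previously missed or carried on
        the frame under another identifier.
        """
        frame_keys = {rec[key] for rec in frame if rec.get(key)}
        return [rec for rec in source_list
                if rec.get(key) and rec[key] not in frame_keys]

    frame = [{"ein": "12-3456789", "name": "Acme Mfg"}]
    trade_list = [{"ein": "12-3456789", "name": "Acme Manufacturing"},
                  {"ein": "98-7654321", "name": "Beta Plastics"}]

    for candidate in record_check(frame, trade_list):
        print("follow up:", candidate["name"])   # Beta Plastics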
Several methods have been used for identifying birth elements to the National Master Facility Inventory. Each method has relied on State agencies' lists of facilities. (See appendix A.3, section II for details.)

The traditional method for including births in housing unit surveys is to update field-generated listings of sampling or reporting units within sampled geographic areas. Initial listings are usually made just prior to the first interviewing period and are subsequently updated through a recanvass to correct errors and to add newly constructed units.

The Bureau of the Census uses this approach in some geographic areas for the housing unit surveys it conducts. However, in most areas, births are included by sampling building permits. (See appendix A.7, section II for details.) Sampling building permits results in a significantly lower sampling variance, since large housing projects can result in very large clusters of sampling units being added during the alternative field-listing update process.

However, the building permit files do not identify illegal new construction, conversions, and new mobile home placements; nor do they identify new special places, such as dormitories, fraternity houses, boarding houses, and public housing. To illustrate, it was estimated for the 1985 Annual Housing Survey that approximately 25 percent of all new mobile homes were missed (Schwanz 1988a). In the Survey of Income and Program Participation, the undercoverage of new mobile homes in the building permit file was estimated to result in a 1 percent underestimate of the number of households in poverty (Singh 1989). In 1976, the undercoverage of births in the building permit frame was estimated to be about 2.3 percent (U.S. Department of Commerce 1978a). Since the Bureau of the Census normally uses this building permit frame for sample augmentation over a 10-year period, e.g., 1975-85, undercoverage may increase substantially over this time span. (For a more complete description of the procedures used by the Bureau of the Census to identify and sample dwelling units created after the last decennial census, see appendix A.7.)

Another methodology for capturing new housing construction and reducing undercoverage in sampled geographic areas is the half-open interval procedure. Instead of listing all units within the sampled area, a string of k units is listed in a predetermined order. The string begins with a designated unit from the original frame and is bounded by the k-th unit that was reported in the original frame (Montie and MacKenzie 1978). A modification of this procedure was used in the 1977 National Health Care Expenditure Survey and the 1980 National Medical Cost Utilization and Expenditure Survey. Cursory analysis indicates the approach may be limited in its ability to capture new construction (Adams 1989).

Inactive frame elements. Efficient sampling dictates that inactive units or deaths on a sampling frame be identified. In the initial construction of a frame, deaths in the various sources used to construct the frame should be identified and, if needed, removed. Existing frames should be updated periodically to remove or flag units that are no longer active. Failure to identify deaths on a sampling frame does not necessarily result in overcoverage, but, as was noted earlier, biased sample estimates can result if an inactive element is sampled and imputed for when no response is obtained.
It may be desirable to retain inactive units on a frame for a certain period of time because of estimation considerations or because it is desirable to have a history of elements available. When doing so, the inactive units should be identified either through flagging or partitioning to a distinct death subfile. In the Producer Price Index survey frame, a death is defined as either a sampled unit which has been identified as out of business or out of scope, or any existing frame unit that remains unmatched when the next UI file is compared to the universe file. All deaths are removed from the universe file and added to the death file (see appendix A.4, sections III and V).

Two methods, both of which were mentioned in connection with identifying births, are particularly effective in identifying whether units are still active or in existence. The periodic canvass of existing frame elements is one. The frequency of these updates varies. For the COS, a canvass of the multiunit portion of the SSEL is conducted annually (except for the census year) for the large companies (50 or more employees). The smaller companies are periodically canvassed as well, but not on a yearly basis and not all in the same year. The companies are asked to identify any listed plants that are now out of operation. The National Agricultural Statistics Service's list frame is canvassed within 5 years of the preceding canvass. The UI file is updated quarterly using a census of UI reports. Units not paying taxes for some period of time (4-8 quarters) are removed from the frame. On the other hand, fixed frame elements are not canvassed at all, since frames such as the Census of Manufactures are not updated. It should be noted, however, that the companies in the Census are canvassed to update the SSEL.

Inactive elements are also identified through maintenance operations performed on samples drawn from the frame. In nearly all survey panels, sampled units which are no longer active are routinely identified through inquiry. Maintenance operations performed on samples are more likely to reveal changes in status of known elements (from active to inactive) than to reveal births (new elements), although this method can be used for both purposes. The information obtained using this method can then be used to update the source file (frame). The SSEL is continually being updated from information obtained by the many Current Business Reports surveys, the Current Industrial Reports surveys, the Annual Survey of Manufactures, and many other economic surveys. The UI and National Agricultural Statistics Service list files are updated in this fashion as well. It is important to note that this method yields information for updating only the sampled units and for adding new elements revealed through survey operations. It does not permit updating the entire frame.

Misclassified elements. A problem with many frames is not that elements are missing, but that they are misclassified or are not classified at all with respect to one or more variables. This assumes importance if the variable or variables that are misclassified determine either the elements eligible for sampling or the subpopulations for which estimates are produced. Housing occupancy status (vacant or occupied), geographic codes, SIC code, age, race, and gender are examples of such variables.
For economic surveys, where the population distribution of the variable to be estimated is usually extremely skewed, some measure of size is often used in the sample design as either a stratification variable or for sampling with probability proportional to size. Incorrect values for the variable(s) used to derive an establishment's measure of size can result in its being placed in the wrong stratum and in other sampling inefficiencies. To illustrate, in the Annual Survey of Manufactures, which has a probability-proportional-to-size design, an arbitrary certainty stratum is defined to consist of all frame establishments with total employment greater than 250. Since estimates are not published for this stratum, the erroneous inclusion or exclusion of an establishment in this stratum because of an incorrect employment size value does not bias a survey estimate, but the resulting sampling error may be different from what was expected.

Coverage error at the estimation stage due to misclassification will often be confined to sublevels. For example, in economic surveys like the Annual Survey of Manufactures (ASM) with independent sampling across SIC's, an inaccurate SIC code could result in undercoverage if the code identifies the unit as nonmanufacturing. However, if the SIC code is incorrect, but within manufacturing, only industry-level estimates will be affected (see appendix A.1, section II).

Incorrect classifications may result from errors in data input during frame creation, but more often they occur because changes in frame units are never detected by the surveying organization. To update industry and location changes on the UI file, BLS conducts a refiling survey in which SIC and county codes are verified for each unit of the universe on a 3-year rotational cycle.

In the 1977 Economic Censuses, misclassification studies were conducted for both the employer and nonemployer segments of the administrative records frame. For the employer segment, a subsample of 5,505 out-of-scope employer cases on the 1977 Economic Censuses Master Sample were mailed the Economic Census General Schedule to complete. An estimated 3.1 percent of out-of-scope employer establishments, with 0.4 percent of the employees and 0.3 percent of the annual payroll, were found to be misclassified as out of scope (Hanczaryk and Sullivan 1980). An estimated 12 percent of the nonemployer establishments were misclassified as out of scope, resulting in a 20-percent underestimate of nonemployer receipts.

A majority of the employer misclassifications resulted from errors in the SIC on the administrative file. In some industries, an establishment may be composed of distinct but related activities, e.g., construction and real estate. However, it is classified into an industry on the basis of which activity yields the highest percentage of total receipts. For example, an establishment with 45 percent of receipts attributable to construction and 55 percent to real estate sales would be classified as in the real estate industry and excluded from the Economic Censuses. If the percentages had changed between the time of coding and the census, the establishment would be misclassified, since construction was within the scope of the census.

The evaluation study found that many of the establishments misclassified on the basis of out-of-scope SIC codes had in-scope Principal Industrial Activity codes on the administrative file.
Therefore, a significant drop in the misclassification rate could have been achieved through the use of another variable. On the other hand, missing or incorrect tax return Principal Industrial Activity codes were responsible for a majority of the nonemployer misclassifications. Again, in this situation, the use of additional tax return information could have substantially reduced the misclassification rate.

While many of the procedures discussed previously can, to a degree, aid in the identification of misclassified elements, none of them is really intended to address this problem comprehensively. Their effectiveness depends on the specific variable of interest. For example, although the SIC code is part of the information collected in the Company Organization Survey (COS), it is not likely that the companies would routinely verify that the right code is assigned. Nor can the Bureau of the Census determine the validity of the code, since the COS does not collect detailed data.

Another procedure to handle misclassification is used in the Producer Price Index survey. Since some establishments can easily modify their capital equipment to produce a different product, depending on demand, the 4-digit industry classification can change. Thus, industries for which a high proportion of this type of misclassification occurs are treated as related SIC's and sampled at the same time. The sampled establishments are then assigned the proper SIC (see appendix A.4, sections III-V).

An auxiliary variable that is used as a measure of size can sometimes cause coverage error if it is incorrect. For example, in the Producer Price Index, the employment value on the UI file is used as the measure of size. Some establishments are reported with zero employees. In order to ensure that all units have a positive probability of selection within the probability-proportional-to-size sample design, two employees are added to the employment value of each unit. For EIA's survey of active oil and gas well operators, companies on the frame having no known production are sampled, so that they are represented along with those operators having known current production.

Unclassified elements are unique in that it is known at the outset that they exist. Knowing this information and perhaps the reason(s) for lack of classification allows the surveying organization to design a strategy for obtaining codes. This strategy may not necessarily be useful for resolving the lack of classification on all frames. For example, most of the units on the SSEL without SIC codes do not have them because the SSA is unable to assign them a code when their applications for new EIN's are received. Prior to a census year, the Bureau of the Census makes a concerted effort to code these records. This includes identifying key words in an establishment's name which might identify its activity and, when this fails, culminates in the mailing of a classification card to the establishment which asks for a description of its activity. A certain number of records remain uncoded. Although new EIN's are obtained by the Bureau on a continuing basis, little is done for them between census years because of the cost. However, attempts are made to classify a sample of them for the business surveys because most unclassified units fall within their population of interest.

Out-of-scope elements.
Closely related to the problem of misclassification is the problem of out-of-scope elements, i.e., elements that if properly classified would not be part of the population of interest. They differ from the type considered above in that if they were properly classified, they would be dropped from the frame. Out-of-scope cases generally arise because historically they were coded in error. It may be, however, that their status has changed so that they are no longer part of the population of interest (see appendix A.2, section II). As with death elements, the presence of out-of-scope elements on a sampling frame does not result in any biased sample results should they be sampled (assuming the survey processing identifies them as out of scope), but it does compromise the efficiency of the sample.

Split-out or combined frame elements. The composition of elements constituting a frame will often change over time. This is especially true for establishment frames, where, for example, individual plants are bought and sold by companies, two or more companies merge, or companies divest. But it is also true for frames of housing units and households. In these instances, frame maintenance properly includes activities which update or modify the frame to account for compositional changes.

Compositional changes do not necessarily affect the number of units on establishment frames and, thus, the overall coverage of the frames. Indeed, it is likely that no changes in plant activity occur at all. If both the sampling unit and the reporting unit are the establishment, it is really not vital that the corporate owner of the plant be known as far as data collection is concerned. From a coverage point of view, however, ownership may be important because the sample status of a sold establishment often depends upon the status of the buying company. Also, in some economic surveys, establishment records are combined into company records for sampling purposes. Thus, there are a variety of other reasons, some coverage related, which mandate maintenance of proper identification of plant ownership.

For the SSEL, the annual canvass of the multiunit establishments by the COS is the prime basis for maintaining the identification. The COS provides a list for each company of all the known establishments of the company and requests that the company verify and update the list by indicating any new establishments it has opened, the name and seller of any additional plants it has acquired, the name and purchaser of any plants it has sold, and any plants it has closed down. COS processing identifies the new owners (successors) of sold plants and the old owners (predecessors) of bought plants and corrects the records for those companies as well. Similar activities are conducted for virtually every census/sample survey enumeration at whatever level the reporting is done.

The UI file is another example of a broad-based file which is supplemented with quarterly data for multiple reporting units. The frame units that have undergone changes in composition are identified as new owners with predecessor relationships and old owners with successor relationships. Thus, for establishment-based surveys, the company is queried about changes in the operation of each establishment, including whether or not it has been sold or leased to any other company.

In the Producer Price Index survey, four economic characteristics are used to define a unit.
Pertinent establishment data are obtained via telephone interviews during the frame refinement process. Any change in the composition of an establishment is captured on the universe file. If an establishment is split, the new portion is treated as a birth and is added to the universe file with a special code. If a unit is sampled and during the first interview a portion is identified as either split or sold, it is treated as a field-created sampling unit and data are collected (see appendix A.4, sections III and IV).

Agricultural frames generally do not have this kind of multiunit situation. Each operation (farm or ranch) is defined by one common land operating arrangement. This lowest common denominator precludes the necessity of keeping track of elements within farm units. However, bookkeeping arrangements covering farms in a hierarchical management system may necessitate periodic monitoring to be sure each unit is accounted for but not duplicated. A more complete description of counting rules for agricultural surveys is provided in the Quarterly Agricultural Surveys case study (appendix A.5).

Likewise, for the Census of Population and Housing frame, the identification of basic addresses does not change at the transfer of ownership from one person to another. Other identification problems may exist, such as how many separate living quarters really exist at a particular location.

1.2.3. Match-merging of independent source lists

In many of the examples of updating procedures discussed above, it was noted that outside source lists or files were used to update a primary frame. Among the problems arising from the use of such lists not addressed above are those associated with matching and merging such lists to update primary frames. These problems deserve special mention because they affect frame completeness.

The two general classes of error that can occur when combining lists are: Erroneously adding an element already in the frame and erroneously removing a qualified element from the frame. The two types of error are not equally problematic, since a more stringent set of rules can govern the deactivation of a frame element than governs the incorporation of a new element.

The updating process entails some formal matching between the primary frame and the source list. Various identifiers may be utilized in the match operation (Fellegi and Sunter 1969, Scheuren and Oh 1985). These identifiers may have different degrees of precision, ranging from very precise (e.g., EIN) to less precise (e.g., name, address), and it may be that successive matches are attempted on each level of identifiers. At the end of this process, records on the primary frame and on the update list can be allocated into three mutually exclusive parts:

a. Records which are classified as matching (i.e., appear on both frame and source list),
b. Records on the primary frame that do not match to the source list, and
c. Records on the source list that do not match to the frame.

Some records which match may represent false matches. Depending on the quality of the source list, i.e., whether it is truly free of duplicates and out-of-scope units, false matches can lead to failure to add a unit or, more rarely, failure to identify a potential death.

Depending on the conceptual and operational fit between the source list and the in-scope population, the failure to match some frame records to the source list may or may not be a problem.
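The three-part allocation described above can be illustrated with a small sketch. The Python below matches first on a precise identifier and then, for the residue, on a crudely normalized name; the field names and the normalization rule are hypothetical stand-ins, and operational systems rely on formal record linkage methods of the kind developed by Fellegi and Sunter (1969) rather than on exact comparisons of this sort.

    # Minimal sketch of tiered matching: precise key first, cruder key second.
    # Record layouts and the normalization rule are hypothetical.

    def normalize(name):
        """Crude name normalization: lowercase, keep letters and digits only."""
        return "".join(ch for ch in name.lower() if ch.isalnum())

    def allocate(frame, source, keys=("ein", "name")):
        """Allocate records into parts (a) matched, (b) frame-only, (c) source-only."""
        matched = []
        frame_only = list(frame)
        source_only = list(source)
        for key in keys:
            norm = normalize if key == "name" else (lambda v: v)
            index = {norm(r[key]): r for r in frame_only if r.get(key)}
            remaining = []
            for s in source_only:
                hit = index.pop(norm(s[key]), None) if s.get(key) else None
                if hit is not None:
                    matched.append((hit, s))     # part (a)
                    frame_only.remove(hit)
                else:
                    remaining.append(s)
            source_only = remaining
        return matched, frame_only, source_only  # parts (a), (b), (c)

    frame  = [{"ein": "12-3456789", "name": "Acme Mfg"},
              {"ein": None,         "name": "Beta Plastics, Inc."}]
    source = [{"ein": "12-3456789", "name": "ACME MFG"},
              {"ein": None,         "name": "Beta Plastics Inc"},
              {"ein": "55-1234567", "name": "Gamma Tools"}]
    m, f_only, s_only = allocate(frame, source)
    # m holds two matched pairs; s_only holds Gamma Tools, a potential addition.

Even in this toy version, the name pass can produce false matches; as the text notes, the consequences depend on the quality of the source list.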
The ascribed completeness, timeliness, and accuracy of the source list are all important in deciding whether unmatched entities have died and should be eliminated from the frame (or flagged), or whether to leave them on the frame and let sampled deaths be revealed during data collection. If any sampled entity is a death, that fact will become apparent when it cannot be found, although in a mail-out/mail-back survey, any such unit may be presumed to be a nonrespondent for which data are imputed.

Source list records not matched to the frame represent potential additions to it. Although these records can simply be added to the frame, it would be more prudent to try to determine whether they are duplicates of existing units or are out of scope.

Many variants of the process described above occur in survey situations. At one extreme, there is no attempt to determine if firms with newly issued EIN's are already on the SSEL under an older EIN. At the other extreme is the procedure followed for the National Master Facility Inventory of inpatient health facilities, in which names and addresses of facilities from source lists that do not match the primary frame are automatically added to the frame.

Problems related to the timeliness of the source lists can arise. Since the population of interest is not static and events are cumulative, combining untimely information from source lists with that on an existing frame can lead to numerous errors. Ideally, one wants the frame resulting from the match-merge to include all elements in the population of interest and to exclude elements which are not. For example, if a survey is to be conducted of firms currently in business, one does not want to rely on a historical file of all businesses that does not denote firms no longer in business. To do so would be to risk including these firms in the sample. Less error prone, but still problematic, is the use of source lists containing information for units that existed at any single point during the year. Units that exist throughout the year will be included, along with those born during the year. But those that died will also be included. Should such a list be used to update a frame for a sample survey in the succeeding year, units no longer in existence may be included. In such instances, the reasons units no longer exist may be extremely useful pieces of information, if they can be obtained. For example, has a firm simply gone out of business, did its name change, or was it purchased by another company? Otherwise, when deaths are sampled and revealed as deaths during data collection, frame records for these deaths can be flagged as inactive or deleted from the frame, as appropriate. Also, the use of less than timely source lists can result in the addition of unknown out-of-scope units that will remain on the frame to plague subsequent surveys.

1.3. Sample design strategies to minimize coverage error

In the previous section, the discussion focused on coverage error associated with sampling frames. Solutions to problems arising from the limitations of available frame sources are a major challenge to the survey design statistician. Colledge (1989) identifies and discusses 26 specific coverage and classification problems faced by Statistics Canada in its Business Survey Redesign Project, as well as possible alternative solutions.

This section presents some sample design and estimation options available to survey designers in dealing with recognized deficiencies in a frame.
The options discussed are: Defining the target population to equal the frame population, random-digit dialing sampling, multiple frame sampling, sampling rare populations, and estimation procedures.

1.3.1. Defining target population to equal frame population

While it is important not to imply coverage of a wider population than the one covered by the frame(s), it is more important to make concerted efforts to reach every member of the original target population, even if this means using additional frames or more expensive procedures. Only intolerable expense or practical impossibility should be grounds for narrowing the defined target population, as discussed in section 1.1.

Hansen, Hurwitz, and Jabine (1963) provide an example of how a coverage problem for a survey about truck ownership and operation was handled. When it became clear that State motor vehicle registration records did not include all trucks being operated and that coverage of truck registration varied by State, the scope of the study was redefined as registered trucks instead of all trucks.

1.3.2. Random-digit dialing sampling

One household sampling method used to avoid omission of households with telephones is random-digit dialing (RDD) (Waksberg 1978). The use of telephone directories as sampling frames often results in unacceptable levels of undercoverage because the directories omit unlisted numbers, which belong disproportionately to nontypical portions of the population. With RDD, a sample of telephone households is located through the use of randomly generated telephone numbers. In this way, only those households without telephones are omitted. For many surveys, this could be considered a trivial exclusion. In others, differences between telephone and nontelephone households may have a profound effect on the characteristics being measured. For example, measures of poverty and income from entitlement programs would most likely be biased because households in poverty or receiving such income are less likely than other households to have telephones. The collective experiences of numerous researchers and survey statisticians who have used RDD are presented in Groves, et al. (1988).

An extensive discussion of the health characteristics of persons in telephone and nontelephone households is presented by Thornberry and Massey (1988). Data from the National Health Interview Survey indicate that those in the nontelephone U.S. population are more likely to suffer disability days, chronic conditions, and hospitalizations than those in the telephone population. At the same time, those without telephones have fewer visits to physicians and dentists and are much less likely to have private health insurance. These findings are consistent with expectations, given that there are disproportionately more low-income families in the nontelephone population. The authors note that for most characteristics, the differences between the values for the telephone households and the total population are small because 93 percent of all households can be reached by telephone via RDD. However, estimates for certain population subgroups could be severely biased when based on an RDD survey. The authors note that an RDD survey seeking information on preschool-aged children would exclude about 12 percent of them, and also almost one-third of such children living in poverty.
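A minimal sketch of the number-generation step may clarify why unlisted numbers are covered. Assuming a (hypothetical) list of area-code/exchange combinations believed to contain residential numbers, the final four digits are generated at random, so listed and unlisted numbers have the same chance of selection. The Waksberg (1978) design adds a second, clustered stage to reduce the dialing of nonworking numbers, which this sketch omits.

    # Minimal sketch of random-digit-dialing number generation.
    # The exchange list is hypothetical; real designs screen exchanges
    # and use two-stage (Waksberg) selection to limit nonworking numbers.

    import random

    def rdd_sample(exchanges, n, seed=1990):
        rng = random.Random(seed)                 # fixed seed for reproducibility
        numbers = set()
        while len(numbers) < n:
            area, prefix = rng.choice(exchanges)  # pick a known exchange
            suffix = rng.randrange(10000)         # random 4-digit suffix
            numbers.add(f"({area}) {prefix}-{suffix:04d}")
        return sorted(numbers)

    print(rdd_sample([("301", "555"), ("202", "555")], n=5))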
An example of favorable results from using RDD is reported by Williams and Chakrabarty (1983) for the State of Michigan portion of the 1980 National Fishing, Hunting, and Wildlife-Associated Recreation Survey. Parallel surveys were conducted utilizing an RDD sample and a subsample from previous Current Population Survey samples which did not depend upon presence of a telephone. The report points out that "the socioeconomic characteristics and the sportsmen variables between the two studies do not reflect any substantially important differences." However, there were differences in results for "nonconsumptive users," i.e., wildlife-related activities outside of hunting or fishing. These activities were highly related to the geographic location of the user, so the findings may result from the geographically restricted nature of the expired CPS samples compared to the unrestricted nature of the RDD sample.

A study by McGowan (1982) on telephone ownership in the National Crime Survey sample contains evidence that the exclusion of nontelephone households has a significant effect on the measurement of crime victimization in the United States. In this instance, the use of RDD without a supplemental frame to provide a sample of nontelephone households would be unacceptable.

1.3.3. Multiple frame sampling

Coverage may be improved through the use of multiple frames. Sometimes, no single frame fully covers the target population and merging independent source lists would be impractical. In this case, separate probability samples from different frames can be used to expand coverage beyond any available single frame. (Additional frames may also be used to increase sampling efficiency if coverage is already sufficient.) The use of multiple frames entails two assumptions (Hartley 1962):

- Every unit in the population of interest belongs to at least one of the frames, and
- It is possible to record for each sampled unit whether or not it belongs to the other frame(s).

The first assumption requires linkage between the sampling frame units and the target population. Application of rules of association to accomplish this linkage is needed when sampling from any frame (Hansen, Hurwitz, and Jabine 1963). When multiple frames are used, sampling units are often different between frames. This is of no consequence as long as the different sampling units lead to a common reporting unit during the survey. Complete coverage of the reporting units should be equivalent to complete coverage of the population of interest. The rules of association, from sampling units selected to reporting units tabulated, must ensure the representation of each population element once and only once in the final estimates. Field representatives must be equipped with clearly defined rules that can be communicated to respondents to achieve this unique representation.

Difficulty with the first assumption, given a need to use multiple frames, arises because concurrent application of different rules of association may be required of field representatives, depending upon which frame supplied the sampled unit. Potential errors in associating sampling units with reporting units are discussed in section 2.1.

The second assumption requires that frame membership be known for each population unit. Nonoverlapping frames are a special case, wherein each population unit is assumed to be on one and only one frame. The statistical theory for this special case is essentially the same as for stratified sample designs.
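Because each unit appears on exactly one frame, the frames behave like strata: each contributes a simple expansion estimate, and the estimates are added. A minimal sketch, with purely illustrative frame sizes and sample values:

    # Minimal sketch of estimation from nonoverlapping frames, which act as
    # strata. Frame sizes and sample values are illustrative only.

    def expansion_total(N, sample_values):
        """Expansion estimator: (N/n) times the sample total."""
        n = len(sample_values)
        return N / n * sum(sample_values)

    frames = [
        (98_000, [12.0, 7.5, 9.1]),   # e.g., a primary establishment frame
        (2_000,  [3.2, 4.4]),         # e.g., a supplemental frame
    ]
    total = sum(expansion_total(N, values) for N, values in frames)
    print(round(total, 1))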
In the case of nonoverlapping frames, each frame represents a different, unique segment of the total population. The principal consideration here is that the same reporting unit not be included in more than one of the sampling frames. In this way, estimates for the nonoverlapping frames are additive to the greater population of interest. Examples of the use of nonoverlapping frames within government are the Current Employment Statistics survey as well as other establishment surveys conducted by the Bureau of Labor Statistics. The primary frame for employees comes from UI reports filed with State employment security agencies. The UI frame covers about 98 percent of wage and salary employment in the United States. Supplemental, nonoverlapping coverage comes from the Interstate Commerce Commission for interstate railroad employees. Another example is found in the National Cancer Institute's epidemiology studies, where the under-age-65 group is selected from a frame of driver's license records and the over-65 group is selected from a frame of Medicare records.

Most housing unit surveys conducted by the Bureau of the Census, e.g., the American Housing Survey, Current Population Survey, Consumer Expenditure Survey, National Crime Survey, and the Survey of Income and Program Participation, use a combination of frames (U.S. Department of Commerce 1978a). In areas where building permits are required and maintained by a local government and the Census of Population and Housing addresses contain street names and numbers, the census lists are used as the basic sampling frame. A sample of building permits is also selected to cover housing units built after the census. Conceptually, these two frames are nonoverlapping even though they refer to the same land areas. In other areas of the country where permits are not available for sampling or the census address lists are considered inadequate, land areas are sampled, and an address list is created by field representatives. For a discussion of coverage errors in this listing process, see section 2.2.1.

Use of overlapping frames in the application of multiple-frame survey methodology mandates that extraordinary attention be paid to potential errors in the survey process. A population (reporting) unit may fall within any or all frames utilized. Sampling units are selected from each frame and linked by rules of association with corresponding reporting units. Each reporting unit must ultimately be represented exactly once across all frames utilized. This may be accomplished either directly through a matching process to remove duplicates or indirectly by weighting adjustments. (The latter tends to be far less costly.) Duplication because of multiple representation, or omission through failure to account for the unit in at least one of the frames, can result in serious coverage errors.

Sampling from overlapping frames is most commonly done when an area frame and an overlapping list frame are available. The area frame is generally designed to provide complete coverage by including as sampling units all land parcels which encompass the population of interest. The list frame is nearly always incomplete, a common attribute of lists, but its use provides certain sampling efficiencies which enable the multiple frame survey to provide the same precision at a much lower cost than would an area frame survey alone.
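The combination of an overlapping area and list frame can be sketched in a few lines. The code below implements the two-frame estimator written out in section 1.3.5, under the assumption that each sampled unit carries a correctly determined flag for membership in the other frame; all numbers are illustrative. Setting p = 0 (and q = 1) yields the screening form used with a complete area frame and an incomplete list frame.

    # Minimal sketch of the two-frame estimator of section 1.3.5.
    # Each sampled unit is a (value, in_other_frame) pair; data are illustrative.

    def two_frame_total(N_A, sample_A, N_B, sample_B, p):
        """Return Y-hat; p and q = 1 - p weight the two overlap-domain estimates."""
        q = 1.0 - p
        wA, wB = N_A / len(sample_A), N_B / len(sample_B)
        y_a  = wA * sum(v for v, overlap in sample_A if not overlap)
        y_ab = wA * sum(v for v, overlap in sample_A if overlap)
        y_b  = wB * sum(v for v, overlap in sample_B if not overlap)
        z_ab = wB * sum(v for v, overlap in sample_B if overlap)
        return (y_a + p * y_ab) + (y_b + q * z_ab)

    area = [(5.0, False), (2.0, True), (4.0, False)]   # frame A (area) sample
    lst  = [(2.5, True), (3.0, True)]                  # frame B (list) sample,
                                                       # wholly contained in A
    print(two_frame_total(300, area, 40, lst, p=0.0))  # screening form

As the surrounding discussion emphasizes, the estimator is only as good as the overlap determination: a unit wrongly flagged as nonoverlapping is double counted, and a unit wrongly flagged as overlapping is dropped.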
Examples of the area/list dual-frame survey approach may be found in the Department of Agriculture (in nearly all inventory and economic probability surveys conducted by the National Agricultural Statistics Service) and in the Bureau of the Census (in the Monthly Retail Trade Survey and the Services Annual Survey). The Department of Agriculture's application of this approach for the Quarterly Agricultural Surveys of crops, hogs, and grain stocks inventories is a typical illustration of the linkage requirements with multiple frame sampling (see appendix A.5).

An important special case occurs when an existing complete frame is used in conjunction with a list of telephone numbers. This general case has been discussed extensively in the literature. See, for example, Lepkowski and Groves (1986) and Biemer (1983). Important special cases are considered by Lepkowski (1988).

1.3.4. Sampling rare populations

Two procedures to compensate for undercoverage are especially useful for surveys of rare or elusive populations: Network sampling and capture-recapture methodology. Both are briefly described below.

Network sampling used in conjunction with multiplicity estimation (Sirken 1970 and Sirken and Levy 1974) relies on a known set of relationships (or links) between members of the population. Network sampling, unlike more traditional sampling, uses links which extend beyond the usual sampling or reporting unit by building rules for more extensive sampling. One example of such extended rules is the sibling rule, where sampled members are asked not only about themselves, but also about all brothers and sisters not living in the same household.

In network sampling, a sample is drawn from an established frame using a probability sampling procedure. The sample is then contacted and interviewed to determine which sampled members have the characteristic of interest. Sampled members are then asked about the set of related individuals having the characteristic being studied. In this way, several members of the population are covered in one interview. Since this procedure is potentially prone to increased response error or item nonresponse, the names and addresses of the related individuals who are said to have the characteristic of interest are often obtained, so that the individuals can be contacted directly. This technique is best known for its potential to improve efficiency when the characteristic of interest is rare. However, it also has the potential for improving coverage when people are reluctant or unable to provide information about themselves and when the sampling frame is incomplete (Sirken 1983).

A survey to collect data on recent decedents is an example of a population unable to provide information about itself. The traditional methodology would be to collect information at the household that had been the decedent's place of residence. A network sampling approach would be to collect information at the household of a surviving spouse, sibling, or child residing in the county of the decedent, either instead of, or in addition to, the decedent's last place of residence. Sirken (1983) reports on the results of experiments conducted in North Carolina to compare coverage between network sampling and traditional sampling (Sirken and Royston 1976).
The traditional method missed 29 percent of the deaths; reports from decedents' relatives' households alone missed 22 percent of the deaths; and reports from both decedents' former residences and decedents' relatives' households missed 15 percent of the deaths. Emigrants are another group of people for whom network sampling can improve coverage because they cannot report for themselves.

Network sampling can be useful to improve coverage on incomplete sampling frames (Sirken 1983). Persons with no fixed address would usually be missed by traditional sampling but could be identified by relatives or friends. Also, if institutions or Armed Forces barracks are not included in the sampling frames, network sampling can be used to find persons living in these otherwise uncovered places.

Use of network sampling requires that the number of population units eligible to report each sampled individual be known. This number is used in the estimation process to adjust the probability sampling weight for each sampled unit.

Also known as dual system (or multiple system) estimation, capture-recapture methodology assumes that one or several frames have less than perfect coverage of the population, and that the amount of undercoverage is unknown. Capture-recapture methodology is essentially a counting technique and is used to determine the number of individuals in a population, or the number of individuals with a specific characteristic in a known population.

The population to be studied is defined independently of any frames, but at least two overlapping frames are needed to make an estimate of the population size. Membership on any frame is modeled as a stochastic event and, for two frames, membership is also assumed to be an independent event between frames. The two frames are matched, or a sample from one frame is matched to the entirety of the other frame. An estimate is then made using the number of persons estimated to be in the first frame (N.1), the number of persons in the second frame (N.2), and the number found in the match to be in both frames (M). The estimator (Marks, Seltzer, and Krotki 1974, p. 15) of the population size (T) is

t = N.1N.2/M.

For example, if N.1 = 1,000, N.2 = 800, and M = 640 records match, the estimated population size is t = (1,000 × 800)/640 = 1,250.

A number of assumptions are required to satisfy the model which generates this estimator; some of the assumptions can be relaxed if more lists are available for sampling. Lists can include administrative records, but the model requires the assumption that membership in the records system is a random event, an assumption that usually does not hold. (References include: Casady, Nathan, and Sirken (1985); Czaja, Snowden, and Casady (1986); and Cowan, Breakey, and Fischer (1988).)

For a general treatment of strategies for sampling rare populations, see Kalton and Anderson (1986). See also appendix A.2, section III.

1.3.5. Estimation procedures

Estimation procedures which compensate for known coverage error in frames may be used to decrease the bias of survey estimates. Improving frame coverage is always better than using these estimation procedures. One such procedure is ratio estimation or benchmarking; another approach is multiple frame estimation.

The Bureau of Labor Statistics employs a benchmarking procedure to revise monthly employment estimates from the Current Employment Statistics survey (U.S. Bureau of Labor Statistics 1989). Sample estimates are compared each year with later summarizations of mandatory UI reports filed by employers.
The UI data, which serve as a benchmark, are an aggregation from the same source as the microdata used to construct the frame from which the sample was selected, except that the benchmark data are one year newer. Hence, the benchmark file takes into account new firms or changes in industrial classification to ensure more accurate coverage. The completeness of the UI administrative data affords the opportunity to analyze and adjust for frame deficiencies (Thomas 1986).

Most of the current surveys conducted by the Bureau of the Census use ratio estimation to projected population totals by age, sex, and race. For further discussion of the procedure as applied to the Current Population Survey, see appendix A.7, section VI.

The use of multiple overlapping frames requires the use of an estimator which may be written as follows for the two-frame case, where frame sizes are known but the size of the overlap domain is unknown (Hartley 1962):

Y = (N.A/n.A)(y.a + py'.ab) + (N.B/n.B)(y.b + qy''.ab)

where subscripts A and B denote the two sampling frames, N and n are the numbers of population and sampled units, and Y is the total of some variable to be estimated. Subscripted y's are estimated totals from the two frames (y.a based on units uniquely in frame A, y.b for units only in frame B, and y'.ab the estimated total for units in both frames as measured by the frame A sample, while y''.ab applies to units common to both frames from the frame B sample), and p and q are weights which sum to one. In this way, the estimates unique to each frame are added to a weighted combination of those units common to both frames. The parameter p (and hence q) is selected so that variance is minimized subject to a cost function reflecting differences in sampling from each frame.

A common application of this estimator utilizes a complete area frame A and an incomplete but more efficient list frame B to generate a screening multiple frame estimate of the form:

Y = (N.A/n.A)y.a + (N.B/n.B)y''.ab

where the estimate for the units unique to the area frame (nonoverlapping domain) is added to the list-frame estimate for the units common to both frames. In this case, the parameter p, from the general formula above, is zero, and q equals one. Other terms disappear because no units exist on the list that are not contained within the area frame.

It is easy to see in the simple form of the multiple frame estimator the importance of properly determining whether or not a unit is represented by one or both frames. Unrecognized overlap between the frames produces duplication in the estimate, while improper designation of a unit as overlapping results in omission.

1.4. Evaluation methods

One method of measuring the degree of frame coverage error is comparative analysis. Comparative analysis can occur at two levels. The first is a macro-level evaluation, which compares known population values with totals derived from summing characteristics for each sampling frame unit. The second type of analysis is performed at the micro or individual sampling unit level. This most often involves matching of data available from different sources for individual units.

1.4.1. Macro-level analysis

How do totals associated with sampling units compare with other measures of the target population? Suppose we have an area frame. The sum of the areas in individual sampling units or segments should match closely with the measured area of the total frame, e.g., county, State, or other target area.
The National Agricultural Statistics Service electronically digitizes clusters of area sampling units and verifies that the accumulated total is within 0.5 percent of the published land area for each State (Cotter and Nealon 1987).

Tortora (1987) notes that with two frames, one a complete area frame, a process quality control evaluation of a list frame is possible through the use of survey data. For example, list coverage of the number of farms or land in farms can be estimated by the sizes of the overlap and nonoverlap domains from the area frame. Likewise, the number of out-of-scope list units can be estimated from the samples in each frame. Monitored over time, the measures of list frame performance will provide knowledge and control of list coverage.

Similarly, the number of names in a list frame can sometimes be compared with census counts for the population of interest. Generally, the information available on every sampling unit is very limited, and only gross comparisons with known population totals can be made. More often, totals estimated from sample surveys can be compared to similar quantities from other sources in order to provide measurements for the frame. Two of the most common sources are census and administrative files.

Reconciliations are made between economic census totals and corresponding totals from the Current Industrial Reports annual survey for census years. Similarly, the Department of Agriculture conducts a continuous survey program for the agricultural sector and routinely compares inventory and production estimates with those obtained in the agricultural censuses conducted at 5-year intervals by the Bureau of the Census.

The Bureau of the Census utilizes still another macro-level approach for frame completeness evaluation called demographic analysis. With this method, demographic data from various sources are used to develop expected values for the population as a whole and by race, age, and sex to compare with the census counts. This procedure relies on aggregate statistics of birth, death, immigration, emigration, past censuses, Medicare enrollment, and other sources to provide estimates of net census coverage errors for broad categories at State and national levels (Fay, Passel, and Robinson 1988). The estimate of the net undercount of the legally resident population in the 1980 decennial census is 1.0 percent using this procedure (p. 26).

The mandatory UI reports filed by employers with their State employment security agencies are the primary source of information for the BLS Universe File. This file is used both for sampling frame maintenance and during the estimation process. For example, comprehensive totals from the Universe File at the SIC or SIC/size-class level can be used to evaluate the sampling frame inadequacies caused by lack of timeliness for the Current Employment Statistics Survey. Births of new firms (economic units which have begun operations since the time of frame construction) and inaccuracies at detailed levels resulting from changes in SIC codes contribute to differences between the survey frame and the target population. The degree of undercoverage during the time lag until discovery of new units depends upon the number and size of operations entering the target population. During the estimation process, an updated Universe File is used to ratio adjust the estimates; the reference period for the updated Universe File is one year later than for the Universe File used as a sampling frame.
Evaluations of survey data versus target population totals show that only minor revisions apply to Current Employment Statistics Survey results (Thomas 1986).

Several studies have been made of business births and job generation which indicate the importance of measuring employment in new businesses. Roughly 800,000 businesses are formed each year, creating 2,500,000 new jobs. While jobs in new businesses constitute a small fraction of total nonagricultural payroll employment (annual average employment of 104,300,000 in 1988), they are a substantial portion of net new jobs (2,800,000 from 1986 to 1987 and 3,200,000 from 1987 to 1988). An analysis of Dun and Bradstreet credit rating information (Birch 1979) showed that small businesses (20 or fewer employees) accounted for two-thirds of net new jobs between 1969 and 1976. Other studies using Small Business Administration files at the national level or files at the State level have shown that more than half of the net employment growth came from small businesses or business births (Armington and Odle 1981; Teitz, Glasmeier, and Svensson 1981; Connor, Heeringa, and Jackson 1985). These studies show the importance of including new businesses in establishment surveys of employment.

1.4.2. Micro-level analysis

Micro-level analysis of sampling frame units implies direct matching or linkage of the same units found in more than one source. Given a common reference unit, be it person, housing unit, or business, the information available from an administrative file, a census, or a survey source should verify and enhance the data associated with the unit.

The U.S. Department of Commerce's "Report on Statistical Uses of Administrative Records" (1980) includes four case studies of projects utilizing comparative analysis between surveys, census data, and administrative records. All four of these studies utilize matching between files at the individual record level to assess coverage problems and illustrate the kinds of sampling unit evaluations possible across frame sources.

Such record-matching studies are performed for statistical purposes only. In general, strict laws govern the release of unit-level data collected for statistical purposes. For example, Title 13, U.S. Code prohibits the Bureau of the Census from releasing information that allows the identification of survey or census respondents. The same law allows the Bureau of the Census to obtain administrative information from other agencies and organizations in support of its statistical activities, including computer matching. When the statistical and administrative records are matched, the resulting files are afforded the same guarantee against disclosure as the statistical records.

Two of the studies are concerned with the use of multiple administrative record systems to serve as frames for integrated samples and to provide basic information for record-check studies. The first is the Linked Administrative Statistical Sample Project, involving samples from the records systems of the Internal Revenue Service, the National Center for Health Statistics, and the Social Security Administration. In the second project, "The Use of Administrative Records in the Survey of Income and Program Participation (SIPP)," the Department of Health and Human Services and the Bureau of the Census use administrative files from the Aid to Families with Dependent Children program, Supplemental Security Income recipients, and the Basic Educational Opportunity Grants applicants.
The third study describes how the Bureau of the Census uses the Internal Revenue Service tax files for persons aged 17 to 64 years and the Medicare file for persons 65 or over in conjunction with the Current Population Survey sample to provide reliable estimates of coverage error in the 1980 decennial census for certain geographic areas and some socioeconomic categories. The final study, "Record Linkage in the Nonhousehold Sources Program," is a large-scale record check by the Bureau of the Census which matches census returns against administrative records for drivers' licenses from State departments of motor vehicles and against registers of resident aliens supplied by the Immigration and Naturalization Service to reduce census undercoverage.

A comparison of macro- and micro-level analysis procedures for evaluating coverage of the 1980 Census of Population and Housing and results of their application can be found in Fay, et al. (1988).
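At their core, all of these record checks rest on unit-level matching of files. The Python sketch below shows only the skeleton of that logic, with made-up identifiers; operational studies use far more elaborate, often probabilistic, matching with clerical review and the confidentiality safeguards described above.

    # Minimal sketch of a unit-level match between a survey file and an
    # administrative file.  Identifiers are invented for illustration.

    survey_file = {"A101", "A102", "A104", "A107"}
    admin_file = {"A101", "A102", "A103", "A104", "A105"}

    matched = survey_file & admin_file
    survey_only = survey_file - admin_file  # candidate administrative-file undercoverage
    admin_only = admin_file - survey_file   # candidate survey undercoverage

    # Naive coverage rate of the survey relative to the administrative file.
    coverage = len(matched) / len(admin_file)
    print(f"matched: {len(matched)}, survey only: {len(survey_only)}, "
          f"admin only: {len(admin_only)}, survey coverage: {coverage:.0%}")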
CHAPTER 2
COVERAGE ERRORS OCCURRING AFTER INITIAL SAMPLE SELECTION

Errors occurring after the initial selection of a sample from a frame can be classified broadly into three types: Incorrect association of sampling with reporting unit(s), listing errors, and other nonsampling errors.

In the first section of this chapter, the discussion focuses on coverage errors which occur in the process of associating a unit selected to be in the sample with a unit from or about which data are to be collected. Misidentification of the physical unit selected for sample and misclassification of the sampled unit as either a member or nonmember of the eligible population are two types of such errors. Some of the coverage problems inherent in the survey process, which by necessity cannot be instantaneous, are discussed under the rubric of temporal errors.

At some point in the sampling process, a frame or master list of subunits within the last selected unit may not exist, so field data collection agents are asked to compile such a list for further sampling or interviewing. The more stages of listing and sampling required to move from the initial frame to the unit for which a response is desired, the greater the opportunity for mistakes. This is particularly true when sampling control moves from a centralized location to individual field representatives. Listing errors are the topic of section 2.2.

In section 2.3, the discussion focuses on the coverage implications of other types of nonsampling errors, such as recording errors, volunteer responses from nonsampled units, and failure to elicit a response for a sampled population unit (nonresponse).

2.1. Incorrect association of sampling with reporting unit(s)

Sampling units used in the frame construction process often differ from the reporting units which are, or which provide data about, members of the target population. Relating the sampling unit to the reference (reporting) unit for respondents in a concise and unambiguous manner is an important challenge for the survey design statistician. The ability of the field representative and respondent to avoid coverage errors at this stage of the survey process depends upon the clarity with which sampling and reporting units of interest can be defined in survey materials and field representatives' instructions.

The concern here is with the rules which must be applied by field representatives to determine what reporting unit is specified by the sampling unit. Fuzzy rules that do not consistently lead the field representative from a single sampling unit, selected with known probability, to one or more unique reporting units while retaining the correct known probability of inclusion can result in coverage errors. When the rules of association are very complex, coverage errors are more likely to occur. Similarly, survey concepts that are unclear or objectionable to respondents can also impair coverage.

Rules of association delineate the relationship between sampling units and the final reporting units. For the purpose of this report, rule-of-association errors have been classified into three basic types: Location errors, classification errors, and temporal errors.

2.1.1. Location errors

Location errors result from the difficulty of associating sampling units with reporting units when the sampling units are not uniquely or clearly defined or when they are difficult to locate. In a housing unit survey with a good listing of addresses, it is extremely difficult to determine if field representatives have conducted interviews at wrong addresses. Error studies of sampling unit listings for the CPS have shown the overall quality to be quite good. A very small percentage of errors has been attributed to such things as defective address units (0.31 percent), errors in records that caused serious problems for field representatives (0.09 percent), and combinations of two separate addresses into one (0.10 percent) (U.S. Department of Commerce 1978a).

In surveys of establishments or firms, the task of associating sampling units with reporting units is more difficult. For instance, if a part of a larger organizational entity is sampled, the response may reflect the entire firm rather than the sampled part. Sometimes, as has been noted in the Annual Survey of Manufactures, respondents will request forms for additional (unsampled) establishments within their company, then proceed to complete and submit those forms. The same survey has also determined that respondents may combine data from several plants, some of which are not included in the survey, and some of which are individually selected. In an attempt to avoid this problem with one segment of the population in the Annual Survey of Manufactures, single-unit companies are asked whether they have operations at any additional locations. If so, they are asked to list those locations and report minimal data, such as kind of business, sales, payroll, and employment. They are also asked whether data for these units have been included as part of their response. If so, the data are allocated to each of the sampling units during data reduction. The additional locations are researched to determine whether they had any previous chance of selection or had been omitted from the frame.

In overlapping multiple frame surveys, where the opportunity for multiple representation within and between frames is high, appropriate coverage of the population is heavily dependent upon rules of association. These rules must define how to disaggregate data among sampling units in such a way that the reporting unit is represented only once. For example, the reporting unit of interest in most Department of Agriculture surveys is land operated by farmers and ranchers. Farm operators provide the required survey information regarding inventory of crops and livestock for the acres they operate. The list and area frames for these surveys utilize two different sampling units.
Parcels of land sampled from the area frame must be linked with the operators farming the land. Names of potential operators sampled from the list frame must be verified as operators, and the amount of land operated must be determined (Beller 1979 and appendix A.5).

There are then several sources of coverage error which must be addressed by rules for associating sampling units with reporting units within and between the frames used in the Department of Agriculture surveys. Within sampled land segments from the area frame, field representatives must identify the appropriate operator or operators in order for State office personnel to check whether or not that land is also represented in the list frame. For units sampled from the list frame, the proper linkage between sampled name and land operated is necessary to ensure the same land would be identified as overlapping if found in the area sample. Finally, the names of operators and any multiple names associated with land parcels (e.g., farm business names or partners farming together) must be obtained in order to determine which sampled units are common to both frames.

The method of handling duplication within the list frame can influence the rules for matching elements between frames. Each of the names associated with a sampled parcel of land in the area frame must be matched against the list frame. There are then at least three alternative procedures for assigning response data to the overlap and nonoverlap domains.

- Prorate the data to each name associated with the reporting unit so the proportions allocated to the overlap and nonoverlap domains correspond to the proportions of names on the list and not on the list, respectively (partial overlap procedure).

- Associate all data with the list frame so long as any name associated with an area tract of land is found on the list (list-dominant procedure).

- Accept an area parcel as belonging to the list domain only if all names appear together as a single list sampling unit (area-dominant procedure).
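The partial overlap procedure, for instance, can be sketched as follows; the names, list frame contents, and acreage are hypothetical, and the operational rules are more detailed.

    # Sketch of the partial overlap procedure: prorate a reporting unit's
    # data between the overlap and nonoverlap domains in proportion to the
    # share of its associated names found on the list frame.

    names_on_tract = ["J. Smith", "Smith Bros. Farms", "R. Jones"]
    list_frame = {"J. Smith", "Smith Bros. Farms"}  # names present on the list frame

    share_on_list = sum(name in list_frame for name in names_on_tract) / len(names_on_tract)

    reported_acres = 900.0
    overlap_acres = reported_acres * share_on_list           # credited to the overlap domain
    nonoverlap_acres = reported_acres * (1 - share_on_list)  # credited to the nonoverlap domain
    print(f"overlap: {overlap_acres:.0f} acres, nonoverlap: {nonoverlap_acres:.0f} acres")

Under the list-dominant procedure, the entire 900 acres would be assigned to the overlap domain because at least one associated name is on the list; under the area-dominant procedure, it would be assigned to the nonoverlap domain because the names do not appear together as a single list sampling unit.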
2.1.2. Classification errors

In many surveys, the population of interest is a subset of the sample frame. The decision to define a frame unit as within the scope of a survey and to conduct an interview may be made in the office on the basis of administrative data on the sampling frame, by a respondent in a mail survey, or by a field representative who has a set of rules to determine if a sampled unit should be interviewed. The set of rules may take the form of procedures or a screening questionnaire. To the extent frame units or sampled units are misclassified as out of the scope of the survey, undercoverage occurs.

Misrepresentation of out-of-scope units as members of the target population results in overcoverage. This can normally be detected during data collection when a response is obtained. If changes in the sampling units have occurred between construction of the frame and the first interview, the first interview must include procedures to determine whether the unit should be interviewed. If a nonresponding unit is treated as a missing unit through imputation or weighting when it is not a member of the target population, then overcoverage results. In this case, deaths of sampling units present a special difficulty during the time they remain sample nonrespondents.

Some examples of surveys which collect data from only a subset of the sample are given in table 2.

Misclassification in many economic surveys results from inaccurate SIC coding on the administrative files used as sampling frames. Misclassification in most housing unit surveys results from field representatives classifying an occupied housing unit as vacant.

The estimation of misclassification rates is an important tool for evaluating the effects of misclassification on coverage. However, misclassification rates are more often estimated for censuses than surveys. Therefore, the following discussion of misclassification--its magnitude, causes, and effects--is based primarily upon census evaluation projects.

In the 1982 Census of Agriculture, an estimated 3.1 percent of the estimated total number of farms in the United States were missed because they were misclassified as nonfarms, and approximately 3.7 percent were overcounted due to misclassification of nonfarms as farms (U.S. Bureau of the Census 1984). The farms which were missed because of misclassification accounted for approximately 23 percent of all estimated missed farms. Eighty-four percent of the misclassified undercounted farms had less than $10,000 in sales. Misclassification in the Census of Agriculture may result from the respondent misinterpreting the instructions and/or census definitions. For example, some respondents feel their operation is "too small" or "for home use only" and should not be classified as a farm. Others think that because they have no sales of crops or livestock in the reference year, or were only operating a farm for a few weeks in the reference year, they should not respond. Incomplete reporting by respondents and errors in census processing also lead to classification error.

In two separate evaluations of the 1970 Census of Population and Housing, 11.4 percent and 16.5 percent of the units initially enumerated as vacant were misclassified and should have been enumerated as occupied (U.S. Bureau of the Census 1973). The occupied-enumerated-as-vacant misclassification rate was approximately twice as high in multiunit structures as in single-unit structures. As a result of the National Vacancy Check study, 0.5 percent of the total population was imputed to adjust for the misclassification of occupied units as vacant.

As a result of the 1970 census misclassification problems, a follow-up of vacant and deleted housing units was conducted as a coverage improvement procedure in 1980 (U.S. Bureau of the Census 1987). This procedure found that an estimated 17.3 percent of the deleted units should have been enumerated, 7.5 percent as occupied and 9.8 percent as vacant, and that an estimated 11.2 percent of the vacant units were misclassified, 9.1 percent as a result of enumerator error and 2.2 percent as a result of procedural errors. Enumerator errors occurred when enumerators visited housing units occupied before and during the census but classified them as vacant. Procedural errors occurred when housing units had been vacant early in the census-taking period, but were occupied before the end of the census by people who had not been enumerated at previous addresses. As a result of this study, 1,724,087 persons (0.76 percent of the total population) were added to the 1980 decennial census count.

In October 1966, the CPS reinterview survey concentrated on measuring the coverage errors made by field representatives (U.S. Bureau of the Census 1968).
In the CPS, noninterviews are classified as type A if the housing unit is occupied by persons eligible for interview; as type B if the unit is either unoccupied but could become occupied, or occupied solely by persons not eligible for interview; or as type C if the unit is ineligible for the sample. In table 3, a cross-classification of the 325 reinterviews conducted during that time period indicates approximately 10 percent of the units originally classified as vacant should have been classified as occupied.

Table 3. Reinterview classification of units originally classified as noninterview: October 1966

                                Reinterview Classification (percent)
  Original Classification        Type A      Type B      Type C
  Type A                          99.04        0.96           -
  Type B                          10.11       88.83        1.06
  Type C                          12.12           -       87.88

As a comparison, estimates of the reinterview classification of the 2,499 units originally classified as noninterviews during the period April to September 1966, when coverage was not the focus of reinterview, are shown in table 4. In this case, only about 3 percent of the units originally classified as vacant were found to be occupied.

Table 4. Reinterview classification of units originally classified as noninterview: April to September 1966

                                Reinterview Classification (percent)
  Original Classification        Type A      Type B      Type C
  Type A                          96.57        3.26        0.17
  Type B                           3.45       95.87        0.68
  Type C                          12.12           -       87.88

The misclassification error rates for type B noninterviews were found to be higher when the purpose of the reinterview was specifically to examine coverage errors than when the reinterview was not specifically targeting coverage errors. Therefore, the CPS misclassification error rates presented in table 5, based upon the reinterview of 4,940 units in 1987, should be interpreted with this fact in mind (Newbrough 1988). In addition, the reader is cautioned that the relative standard errors for these estimates are high. Standard errors are not available for the 1966 data, but the standard errors for the 1987 percentages are given in parentheses below.

Table 5. Reinterview classification of units originally classified as noninterview: 1987

                                Reinterview Classification (percent)
  Original Classification        Type A          Type B          Type C
  Type A                              -      0.12 (0.05)     0.06 (0.03)
  Type B                    0.43 (0.09)               -      0.67 (0.12)
  Type C                    0.02 (0.02)     0.12 (0.05)               -

(Diagonal entries, for units classified the same way in both interviews, are not shown.)

Beginning in 1985, the type B noninterview rate for the Current Population Survey and the Survey of Income and Program Participation began to increase gradually. Once identified, this became a concern, since most of the units were recorded as vacant. Table 6 provides the type B rates for the Survey of Income and Program Participation and the Current Population Survey for 3 years.

Table 6. Type B rates for the Survey of Income and Program Participation and the Current Population Survey, 1985-87 (percent)

  Survey      1985       1986       1987
  SIPP       14.35      14.43      15.20
  CPS        14.85      15.29      15.74

Two potential reasons for the increase in type B rates were investigated. First, the increase may have resulted from the misclassification of type A noninterviews (refusals from occupied housing units) as type B noninterviews by field representatives. Second, the vacancy rate among housing units may have truly increased from 1985 to 1987. If the first hypothesis were true, misclassification of eligible units could be an important source of survey undercoverage. If the second were true, undercoverage would not occur. The investigation revealed nothing to rule out a real increase in the vacancy rate among residential housing units as the cause for the increase in type B noninterview rates (King 1988).
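Row percentages such as those in tables 3 and 4 are computed directly from the reinterview cross-classification counts. The sketch below uses invented counts, chosen only so that the resulting row percentages reproduce table 3; the off-diagonal percentages are the misclassification rates.

    # Sketch of deriving row-percentage misclassification rates from a
    # reinterview cross-classification.  Counts are invented for illustration.

    # counts[original classification][reinterview classification]
    counts = {
        "Type A": {"Type A": 206, "Type B": 2, "Type C": 0},
        "Type B": {"Type A": 19, "Type B": 167, "Type C": 2},
        "Type C": {"Type A": 4, "Type B": 0, "Type C": 29},
    }

    for original, row in counts.items():
        total = sum(row.values())
        # Off-diagonal entries are the misclassification rates for this row.
        pcts = {k: f"{100 * v / total:.2f}" for k, v in row.items()}
        print(original, pcts)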
In many surveys, screening for eligible respondents takes place at sampled housing units. Volunteer responses may occur if the field representative mistakenly includes families or individual respondents who are not eligible. An error like this may occur when the field representative fails to screen at all or misinterprets the screening requirements as the result of inadequate training. A good example of this type of problem is described in appendix A.2, section M (see also Anderson, Schoenberg, and Haerer 1988).

2.1.3. Temporal errors

Time inherently complicates the complete and accurate representation of any population of interest on a frame. There are many phenomena which affect survey coverage that are magnified by the passage of time. For example, different farm operators are sometimes listed at different times for the same farm. This could simply result from an error in respondents' understanding of the farm operator concept from one frame development to the next, or it could actually reflect a change in farm ownership between the time of one frame creation/update and the next. Rules of association to account for inevitable changes in the population of interest over time are necessary.

In housing unit surveys conducted by the Bureau of the Census, there are problems associated with the use of a separate housing permit frame (see section 1.3.3 for a description of multiple frame sampling and appendix A.7). One problem concerns units built around the time of the decennial census. Those units deemed to be completed at the time of the 1980 census (about April or May 1980) were included in the census frame. New construction units are selected from building permits. A unit for which a permit was issued in November 1979 may or may not have been completed at the time of the census. A special study was undertaken to determine the average time elapsed between permit issuance and construction completion for different categories of housing and by region. The results were used to determine the starting dates for sampling permits, i.e., the point in time when the number of housing units with no chance of selection was approximately equal to the number of housing units with two chances of selection. Thus, there would be little net bias arising from selecting new construction units authorized before the census, if characteristics of units with no chance and two chances of selection were the same (see Statt, et al. 1981).

The length of time involved in establishing a frame, selecting a sample, and fielding a survey inherently affects the survey's coverage, since changes in the population of interest occur from the time of inclusion on the frame to the time of interview(s). Temporal error results when the frame or sample is not updated to represent the population of interest for the survey's reference period. Time-related coverage errors due to field representatives and respondents should be examined from both cross-sectional and longitudinal perspectives. Cross-sectional coverage error results from unaccounted-for changes in the sample population from the time of frame establishment to the first interview. Longitudinal coverage error results from unaccounted-for changes in the sample population from the first to subsequent interviews.
For both situations, the resulting coverage error will supplement the coverage error resulting from any inadequacies of the initial frame (for examples, see appendixes A.2 and A.3).

The length of reference periods, or the periods for which information is collected, also affects coverage. These periods may be defined as some particular time before frame establishment, as the time of interview, or as any time between the two. Since changes in the population of interest occur continuously, there will be discrepancies between the population interviewed and the population of interest which depend on the reference period. Reference periods which differ from the time of frame establishment inherently cause coverage problems. Field representatives and respondents contribute to the discrepancies as a result of confusion in interpreting reference period rules, carelessness in applying those rules, and memory loss. For example, suppose the frame for a housing unit survey is established in January and the households' reference persons interviewed in March are asked to provide household composition for January. The field representative may not convey the reference period correctly, so that the respondent provides March household composition, or the respondent may not accurately recall the household composition in January. In general, using relatively short reference periods should improve respondent recall.

Even though a frame may be representative of the population of interest at a point in time and the sample representative of the frame from which it is selected, it is imperative that the sample for recurring surveys be updated to remain representative of the true target population across time. Coverage errors can occur if the types of procedures discussed in section 1.2.2 on frame maintenance are not followed.

In longitudinal surveys, special rules of association are necessary to account for changes over time. For example, there may be changes in certain attributes of units, so that a sampled unit is a member of the target population in one or more reference periods, but not in other reference periods. Seasonal businesses or items, for example, may be eligible for the survey in reference period A but not in reference period B.

It is fundamental in the design of recurring and longitudinal surveys to anticipate and prepare for births, deaths, mergers, splits, movers, etc., so that a representative sample is maintained over time. Such changes may be accounted for by adjusting interviewing or by adjusting the sample through sample maintenance or weighting. Additional resources may be needed to consistently and correctly update the sample in some surveys. For example, in the 1979 Income Survey Development Program (predecessor of the Survey of Income and Program Participation), it was found that just to follow movers who settled within 50 miles of a sampled primary sampling unit cost 6.66 percent of the total time charged and 10.25 percent of the total mileage charged. In the current Survey of Income and Program Participation design, movers are followed if they move within 100 miles of a sampled PSU or can be contacted by phone (King, Petroni, and Singh 1987).

The National Agricultural Statistics Service employs a frozen domain approach (Bosecker 1984) for its multiple frame surveys. The status of each sampling unit is determined for a base survey reference point (June 1). List and area frames are, in effect, frozen as they existed on June 1 for all following surveys until the next June. All existing farmland has a known probability of selection in one or both frames at the time of the base survey. Very nearly all new farm operations are born out of existing operations, so the National Agricultural Statistics Service allows the inclusion of births after the reference date through the death of an existing sampled unit. Original selection probabilities for the old unit apply to the new unit.
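A minimal sketch of this birth-through-death rule, with a hypothetical unit and selection probability: the successor operation simply inherits the dead unit's probability of selection and, hence, its sampling weight.

    # Sketch of the frozen domain birth rule: a new operation born out of a
    # sampled unit that has died enters the sample in place of that unit,
    # carrying the old unit's selection probability.  Data are hypothetical.

    old_unit = {"id": "farm-017", "status": "dead", "p_select": 0.004}

    if old_unit["status"] == "dead":
        successor = {
            "id": "farm-017-successor",
            "p_select": old_unit["p_select"],      # inherited selection probability
            "weight": 1.0 / old_unit["p_select"],  # inherited sampling weight
        }
        print(f"{successor['id']}: weight = {successor['weight']:.0f}")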
In many surveys, it is the responsibility of the respondent to notify the survey sponsor of changes. This is especially true in mail-out/mail-back surveys. As one example, a major coverage problem occurred in a Department of Energy survey of gas sold as a result of the time lag between sample selection and data collection. Some time after the sample was selected, some companies set up operating subsidiaries to sell their gas. Thus, when the parent companies reported zero gas volumes, undercoverage occurred. The coverage problem has since been rectified by appropriate procedural changes.

Respondents may not inform field representatives of changes, either because concepts are not understood or because respondent burden would increase to an undesirable level. Similarly, field representatives are charged with the responsibility of identifying, tracking, and updating the sample according to specified procedures if such changes occur. Field representatives may be unable to uphold this responsibility if survey concepts are not clear or training is inadequate. Also, field representatives themselves may opt not to update for a known change in status if a significant increase in workload would result.

The sample design also affects the ease with which status can be updated. For example, it should be easier for a field representative to locate a mover or obtain information about births, deaths, splits, and transfers if an area sample design has been adopted rather than a random-digit dialing system, because neighbors or subsequent occupants of the sampled unit may be able to provide the field representative with information. A partial solution is to provide preprinted cards to respondents to be mailed to the central office upon a change in status. This, however, assumes the respondent is willing and able to cooperate in such cases (U.S. Office of Management and Budget 1986).

2.2. Listing errors

Surveys may or may not have the benefit of sampling from a frame containing the final sampling units. The two tables below give a selected representation of each situation in the Federal Government.

Two challenges to the designers of the surveys in table 7 are to ensure that the unit selected is in fact the unit found and interviewed by the field representative and that the respondent reports for only the particular unit selected. Intensive reinterview procedures are usually required to detect errors in meeting these challenges (see section 2.1.3).

Surveys with a target population different from the first-stage field sampling units, like those shown in table 8, obviously face additional difficulties. The more stages involved in the sampling process to proceed from the initial field sampling units through intermediate listings to reach a final sampling unit, the greater the opportunity for coverage errors.

The following table shows for selected Federal surveys the types of units within sampling units which require field listing.
As can be seen by an examination of the table above, some surveys have several stages of field listing. Within area or land segments, listings may be made of residential dwelling units or land controlled by farm operators. Within establishments, lists of products, employees, or occupations may be created. Within addresses, lists of housing units, consumer units, or persons may be created. Whatever the type of unit being listed, the potential for listing error exists.

The types of error include failure to find units which should be listed, failure to classify a unit as being within the scope of the list, listing a unit which is not within the scope of the list, and listing a unit more than once. The causes of error include inaccurate source materials such as maps, incomplete or inaccurate instructions for listing, insufficient training, and failure to follow instructions.

2.2.1. Area segment listing errors

Studies measuring error. The few housing unit survey studies which have been conducted to measure the error in various area segment listing processes reveal that the level of error is relatively small. For example, over the last 10 years, the annual gross and net error rates in listings of area segments for the CPS, as estimated from the ongoing field representative quality control program, range from 1.58 percent (s.d. 0.06 percent) to 4.01 percent (s.d. 0.10 percent) and from 1.70 percent (s.d. 0.10 percent) to -0.50 percent (s.d. 0.06 percent), respectively (Schreiner 1987). The median difference between the number of dwelling units listed in sampled area segments for the National Nielsen Television Index Survey and the 1980 decennial census housing unit counts was found to be -0.4, while the median difference without regard to sign was 4.9 (Hawkes 1985).

Even though the gross error level may not be very high, an examination of the frequency and types of error indicates there are areas where improvement is warranted. The following table shows the distribution of errors found in the Nielsen study (Hawkes 1985), which was conducted in 1982, 27 months after the 1980 decennial census.

Table 10. Comparison of A. C. Nielsen 1982 field canvass of housing units with 1980 census housing unit counts by block group or enumeration district (National Nielsen Television Index Survey segments only)

  Percent Difference       Number of Survey Segments
  Total                                        2,001
  +30 or more                                     74
  +20 to +30                                      40
  +10 to +20                                      95
  +5.5 to +10                                    121
  +2.5 to +5.5                                   157
  +0.5 to +2.5                                   195
  -0.5 to +0.5                                   181
  -2.5 to -0.5                                   271
  -5.5 to -2.5                                   258
  -10 to -5.5                                    257
  -20 to -10                                     198
  -30 to -20                                      86
  Less than -30                                   68

In another study comparing listed units for the CPI Housing Survey to the 1980 decennial census, Jacobs (1986) found that BLS field agents listed approximately 3 percent fewer residential dwelling units than in the 1980 decennial census. Part of the difference, an estimated 2 percent, was due to a difference in scope, i.e., the CPI Housing Survey does not include public, institutional, and military housing. In the CPI Housing Survey study, members of the Washington, DC office staff independently relisted several hundred segments. A line-by-line comparison of 192 segments, the first returned for processing and, therefore, a nonprobability sample heavily skewed toward smaller and more rural areas, indicated that although the total number of units did not vary substantially, the number of units determined to be in common between the two lists was less than 90 percent. The major problem was found to be unnumbered units, where the lack of a house number and variations in written descriptions precluded a match between listings. The next most prevalent problem occurred in multiunit structures, where the exact number of units in a particular structure varied between the two listers. Boundary determination was also a problem, but one much smaller in magnitude than those just mentioned. The primary causes of most of the errors were inadequate procedures, instructions, and training. Some procedures, such as a windshield screening of very large segments, proved to be very error prone. An emphasis on procedures and training in map reading, boundary determination, and listing for unnumbered units and units in multiunit structures was recommended for future listing processes.
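Gross and net error rates of the kind quoted above for the CPS can be illustrated with a short sketch. It assumes the conventional definitions, in which omissions and erroneous inclusions accumulate in the gross rate but offset one another in the net rate; the counts are hypothetical.

    # Sketch of gross and net listing error rates, assuming the usual
    # definitions.  Counts are hypothetical.

    listed = 10000    # units on the field listing
    missed = 180      # units that should have been listed but were not
    erroneous = 130   # listed units that should not have been listed

    gross_rate = (missed + erroneous) / listed  # errors accumulate
    net_rate = (missed - erroneous) / listed    # errors offset
    print(f"gross error rate: {gross_rate:.2%}, net error rate: {net_rate:.2%}")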
Statistics Canada conducted a quality measurement study of its area segment listing process for its Labor Force Survey (LFS) (Joncas 1985). A dependent check of initial lists of 46,857 residential dwellings in over 10,000 segments, which excluded those selected from apartment or special area frames, resulted in an estimated average segment quality of 98.5 percent. That is, 1.5 percent of the units which should have been listed were determined to be missing from the original list. As another measure of quality, 72.8 percent of the segments were estimated to be errorless in the LFS study. The percent of errorless segments varied from 42.4 percent for segments located outside block-faced areas of large cities or rural areas to 80.2 percent for self-representing urban areas or small self-representing urban towns where the segments correspond roughly to blocks, combinations of blocks, or block faces. The distribution of error types found in this study was as follows:

Table 11. Number of listing errors found in Labor Force Survey study (Statistics Canada)

  Type of Error                                                Number
  Total                                                         1,044
  Boundary misinterpretation (not listed, inside boundary)         71
  Portion of road network missed                                    99
  Dwelling thought to be excluded                                  156
  Dwelling located in business structure                            23
  Missed or hidden dwelling                                        224
  Boundary misinterpretation (listed, outside boundary)            298
  Dwelling converted to business                                    28
  Dwelling incorrectly included                                     92
  Error not classified                                              53

An intensive coverage check of the Current Population Survey in 1966-67 found that most of the listing errors in area segments resulted from the incorrect determination of segment boundaries. "In some cases this was due to the interviewer's inability to read maps; in other cases, the map was worn and illegible or the boundaries no longer existed. Other important reasons for differences were failure to update a segment, failure to list units under construction, failure to canvass back roads and impassable roads, and segment was difficult to list." (U.S. Bureau of the Census 1968, p. 37.) The distribution of reasons for errors in the October 1966 portion of this study was as follows:
Table 12. Reasons units were added and deleted during reinterview, as determined by reconciliation--area segments only: October 1966

  Type of Error                                                  Number
  Total                                                             110
  Boundary incorrectly determined (not listed, inside boundary)      38
  Bad map (not listed)                                               16
  Concealed                                                           4
  Appearance was deceiving                                            4
  Miscellaneous addition                                             10
  Error not classified (addition)                                    16
  Boundary incorrectly determined (listed, outside boundary)         11
  Definition of housing unit                                          3
  List not updated at proper time                                     4
  Miscellaneous deletion                                              1
  Error not classified (deletion)                                     3

In the few listing studies for economic surveys which have been conducted, the level of error is somewhat higher. For example, the gross listing error rate for listable (retail, wholesale, and manufacturing) establishments within area segments is estimated at 4.0 percent annually (Konschnik 1987).

The NASS area frame for U.S. agriculture utilizes aerial photographs to accurately delineate land parcels using identifiable boundaries. Estimates of the number of farms are dependent upon identifying every resident in sampled segments who meets the farm definition (sales or potential sales of $1,000 or more in agricultural products). Finding and listing these individuals in high density housing areas is especially difficult and expensive considering the rarity of farm operators in residential neighborhoods. To reduce cost, NASS utilizes a procedure called the skip technique to screen residential segments (typically 0.1 square mile in size). Some houses are accounted for indirectly by asking interviewed persons whether any of their neighbors is engaged in any agricultural activity.

Special studies were conducted in 1986 and 1987 to determine the error rate associated with the skip technique (Matthews 1988). A procedure was instituted whereby each randomly selected house in a subsample received an interview. Estimates for this procedure and the skip technique were then compared. Undercoverage for the number of farms using the skip technique was estimated at 5.4 percent in 1986 and 6.3 percent in 1987. (Commodity estimates are not dependent upon the estimate for number of farms and so are little affected by operators found living away from their agricultural land.)

Budget considerations have prevented using this subsampling procedure. However, a multiplicity sampling procedure based on land operated, excluding land for residential purposes, was under investigation by NASS in 1989 to make listings unnecessary in residential areas (Bosecker and Clark 1988). Farm estimates would be based on an estimator utilizing acreage devoted to agricultural purposes rather than where the operator lives.

In addition to estimating the overall level of undercoverage due to area segment listing errors, two studies have identified some specific biasing effects. Hawkes (1985) noted that the Nielsen listing process resulted in a lower proportion of one-person households than the decennial census. For the Retail Trade Survey, the ratios of net error in sales, caused by field representatives listing sales of establishments that should not have been listed and vice versa, to area sample sales estimates and total sales estimates are estimated as -5.0 percent and -0.3 percent, respectively, for the 12-month period ending October 1986.

In the CPS, a housing unit is defined as "a house, an apartment, a group of rooms, or a single room occupied as a separate living quarters, or, if vacant, intended for occupancy as a separate living quarters."
The CPS makes extensive use of address listings that were compiled for enumeration districts (ED's) during the previous decennial census. Some other ED's are canvassed just prior to the initial CPS interviews. In either case, the field representative must apply the housing unit definition to list the units eligible for sampling. This seems reasonably straightforward for single-family houses, townhouses, and apartments. However, embedded household units, i.e., separate households within a single structure, such as a basement apartment, are a particularly troublesome source of undercoverage error. Field representatives are also expected to determine when tents, railroad cars, lofts, and other unconventional units qualify as housing units.

Discussions with field representatives suggested to Hainer (1987) that field representatives may have incentives for not listing some units. For example, suppose a field representative discovers a basement apartment occupied by a single male. If the field representative lists this as an extra unit, obtaining an interview may be difficult if the occupant is not often home. In most survey organizations, if the field representative ends up with a noninterview for the unit, it would count as a demerit. If the field representative simply does not list the discovered unit, undercoverage will occur and probably no one will ever know.

An alternative to area listing. As an alternative to listing area segments in rural areas and other areas without numbered addresses, the Survey of Income and Education selected individual housing units from the 1970 census address registers, and the field representatives located these units on the basis of whatever information was available (name of 1970 household head, box number, map spotting). A special coverage evaluation in these rural areas and small towns using a successor check of four structures (Marks and Nisselson 1977) estimated 7 to 13 missed housing units in missed structures per 100 enumerated housing units, significantly higher than the estimated 4.6 missed units per 100 enumerated units in missed structures in rural areas estimated in the 1970 census. The apparent poor coverage in rural areas may have been the result of structures considered to be nonresidential or unfit for human habitation in the 1970 census being occupied when the Survey of Income and Education was conducted.

A similar methodology has been used in the 1980's for the American Housing Survey, supplemented by some area sampling to pick up the added units that were believed to be a problem in the Survey of Income and Education. The unable-to-locate rate for the 1985 survey was in the 2.1 to 2.2 percent range for the Northeast and West regions outside metropolitan areas, but was much lower outside metropolitan areas in the Midwest and South regions and within metropolitan areas in all regions (Schwanz 1988a).

2.2.2. Household listing errors

Of greater concern are listing errors caused by respondent reporting error, especially for people (or subunits) within interviewed housing units. In this report, the term respondent reporting error is used to refer to all errors which occur during the interview process, whether they are caused by the field representative, the respondent, vague concepts, faulty instructions, imprecise questions, or the combined effects of several of these. Respondent-related reporting errors are believed to significantly reduce coverage for most household surveys.
Among housing unit surveys, the issue of the relative importance of within-unit misses versus whole-unit misses has been addressed for the National Crime Survey. The National Crime Survey has about 10 percent lower coverage of persons 12 and over, and about 6 percent lower coverage of housing units, than the 1980 census. Alexander (1986) argued that whole-household undercoverage is probably less than 6 percent, since smaller housing units tend to be missed more often than large units. This means that more than 4 percent of the undercoverage occurs within units and results from household listing errors. This is one of the largest single sources of coverage error identified in this report.

Within-unit error is probably more serious for blacks and Hispanics. Hainer, et al. (1988) point out that in the CPS, black female undercoverage is close to the overall undercoverage of 7 percent, but black male undercoverage is about 20 percent, suggesting that most of this undercoverage occurs within unit as a result of household listing errors.

Hainer, et al. (1988) discuss at length the ethnographic research that has been done on household survey coverage. They suggest there are two main causes of respondent reporting error resulting in missed persons, although there have not been any good quantitative studies to show that these are major causes of coverage error.

- Some people, especially black and Hispanic males, are deliberately omitted because of potential loss of household income if their presence in the household were known to authorities.

- There is a lack of correspondence between survey definitions of household residency and how people actually live.

Motivational causes. Hypotheses about deliberate omissions of certain people were first inferred from 1950 census post-enumeration survey data. As part of the Bureau of the Census's research into this, Valentine and Valentine (1971) conducted a study in a predominantly black inner city community. The study consisted of matching the household rosters for 25 units that the Valentines knew well to the reports given to Bureau of the Census field representatives. The Valentines concluded that the field representatives missed 61 percent of the males over 19 years of age, and that "... practically all the significant inaccurate information came from adult females who had some reason for neglecting to mention productive men residing in their domiciles" (Valentine and Valentine 1971). In all 15 households where men were not reported, there was significant welfare income. Many of the unreported men were also engaged in some form of illegal economic activity, e.g., the stolen goods market. The Valentines concluded that concern about losing significant income was the reason for nonreporting.

Studies by Harwood (1970) and Hainer (1987) have come to similar conclusions. Hainer, et al. (1988) summarized: "Hainer (1987) conducted intensive interviews with his long-term informants, focusing on ways to increase the perception of confidentiality.... Hainer's informants were unanimous in their view that virtually any question that was linked to anyone's name was too personal and threatening.... Informants assumed that any information given to one source is shared by all others."

Lack of correspondence between survey designers' and respondents' residency concepts. Hainer, et al. (1988) discuss the effect on coverage of differences between respondents and survey organizations in their cultural assumptions about household structure.
In some cases, there are fairly simple misunderstandings, which might be corrected by better question wording, more careful definitions of concepts, and additional probing by field representatives. The Bureau of the Census has conducted some experimentation in this area (Shapiro 1986). However, more coverage error seems to be caused by fundamental differences in language or behavior than by simple misunderstandings.

Survey organizations generally attempt to assign people a usual residence based on where they live and sleep most of the time. There are some people for whom applying this definition is problematic (Hainer, et al. 1988). Hainer (1987) and other authors have suggested that many black families can be seen as "... a group of people who share membership in a domestic group, and is an exchange network. Residence is not coterminous with 'address.' People may live in the same apartment or house, or they may instead live close by (close enough for daily interaction). Family members share clothes, store them in each of the various apartment 'addresses' shared by the family, and generally eat at the address of the family household head, usually an older black woman" (Hainer, et al. 1988).

Hainer (1987) observed a pattern in his inner city study population in which males tend to leave their families' household when 15 to 17 years old. "They reappear later as husbands or other contributors to household/families, but not in a capacity that allows their presence to be formally acknowledged" (Hainer, et al. 1988).

Hainer (1987) has also noted that family members sometimes disagree among themselves about the composition of the family. "Internal household membership is a matter of sponsorship and role performance.... Members who meet the two criteria from the standpoint of the household head do not always meet with the approval of others in the family" (Hainer, et al. 1988).

Effect of household listing error. As discussed above, there is considerable evidence of deliberate omission of males due to fear of income loss and due to the lack of correspondence between researchers' definitions and peoples' actual residence conditions. However, there is little quantitative data on the importance of these causes. Shapiro (1979) and Pennie (1990) compared March 1980 CPS and official 1980 decennial census tabulations to determine the household relationships of persons missing in the CPS but interviewed in the census. Although the Valentines found that most of the missed males were heads of households, which would suggest that the CPS should have a higher proportion of female-headed households than the census due to deliberate omissions of males, Pennie (1990) found only some evidence for this.

As part of the 1980 census coverage evaluation program, a sample of April 1980 CPS households was matched to the corresponding 1980 census households to determine if persons found in the CPS were enumerated in the census. Fay (1989) looked at the reverse match to determine if persons enumerated in the census could be found in the same household in the CPS. He found that male heads are missed much less frequently than males in the three other major relationship categories, as shown in table 13.
Table 13. Estimates of percent net CPS within-household undercoverage relative to the 1980 census for males aged 25 and over by their household status (standard errors in parentheses)

  Household status         Percent undercoverage
  Head/spouse                         0.8 (0.1)
  Child                              12.1 (1.2)
  Other relative                     17.8 (1.5)
  Nonrelative                        22.7 (1.8)

One possible explanation for Fay's results is that male household heads who may be deliberately omitted tend to be missing in both the decennial census and housing unit surveys conducted by the Bureau of the Census. In contrast, men in other relationship categories may be missed more often in surveys than in the census because of the lack of correspondence between the survey designers' and respondents' residency concepts.

Fay (1989) estimates CPS undercoverage by age, race/ethnicity, tenure, and metropolitan-area status, as well as by household relationship status. He also fits several logistic regression equations which indicate coverage does not vary much by race/ethnicity when these other variables are taken into consideration.

There have been few studies to determine the effects of coverage problems on survey data. Thus, much of the following discussion on the effects of respondent reporting error is tentative and speculative. The discussion is taken from Hainer, et al. (1988) and Shapiro and Kostanich (1988).

One obvious potential effect is for household composition data for black and other minority groups to be biased. Shapiro and Kostanich (1988) report: "in comparison with Census Bureau interviewers, Valentine and Valentine (1971) concluded that 12% of the sampled households were female-headed vs. a Census Bureau estimate of 72%." This is inconsistent with the results of Fay (1989), which indicate that male heads of households who are reported in the census are also likely to be reported in the CPS. Although the Valentines' study is small and probably an extreme case, it does suggest that surveys may be substantially overstating the number of female-headed households, especially for blacks.

Shapiro and Kostanich (1988) give adjusted estimates of black males aged 15 and over by family status. They make direct use of undercoverage rates determined by Fay (1989) from the CPS-census match discussed previously. The authors regard their estimates as speculative because Fay's results are limited to persons enumerated in the census and have other procedural limitations (see Fay 1989 for details). Biases for some categories are quite large. For example, the percentage of black male householders is estimated as 37.4 percent from the March 1985 CPS, but drops to 31.7 percent when adjusted for the effects of undercoverage.

Both deliberate omissions and misses of those with no clear usual residence probably result in significant bias for many estimates. Hainer, et al. (1988) state that deliberate omissions are probably extremely biasing "because the reasons they are missed are so directly related to important personal and household characteristics.... For instance, Clogg, Massagli, and Eliason (1986) discuss the implausible finding from the CPS that school enrollment rates are higher for blacks than for whites, for almost every age-residence category. They speculate that this occurs because of differential undercoverage of black youth, with those attending school more likely to be counted than those who have dropped out."

As evidence of the effects of people missed because they have no clear usual residence, Hainer, et al. (1988) give this example: "... Cook (1985) presents evidence suggesting that the National Crime Survey may underestimate the number of gun assaults by as much as one-third. He offers the explanation that the National Crime Survey does not adequately cover the kinds of people criminologists believe are most likely to be involved in the life of the streets (including participation in criminal activity ...)" (Cook 1985; see also Martin 1981).
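Mechanically, an adjustment of the kind made by Shapiro and Kostanich amounts to weighting each category by the inverse of its estimated coverage. The sketch below applies the table 13 undercoverage rates to hypothetical survey counts; it is a stylized illustration, not the authors' actual procedure.

    # Stylized sketch of an undercoverage adjustment: divide each category's
    # count by its estimated coverage ratio.  Undercoverage rates are those
    # of table 13; the survey counts are hypothetical.

    undercoverage = {"head/spouse": 0.008, "child": 0.121,
                     "other relative": 0.178, "nonrelative": 0.227}
    survey_counts = {"head/spouse": 5000, "child": 900,
                     "other relative": 600, "nonrelative": 400}

    adjusted = {cat: n / (1 - undercoverage[cat])  # inverse-coverage weighting
                for cat, n in survey_counts.items()}

    total = sum(survey_counts.values())
    adj_total = sum(adjusted.values())
    for cat in survey_counts:
        before = 100 * survey_counts[cat] / total
        after = 100 * adjusted[cat] / adj_total
        print(f"{cat}: {before:.1f}% of total before, {after:.1f}% after")

As in the published example, the adjustment shifts the estimated distribution away from the well-covered head/spouse category and toward the categories with higher undercoverage.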
To examine the effects of coverage error on CPS data, Hirschberg, Yuskavage, and Scheuren (1977) used a different estimation method than that normally used in the survey and compared results. Their method differed from the standard method in two ways. First, it adjusted for undercoverage in the 1970 decennial census rather than controlling to census levels unadjusted for census undercount; second, in the March supplement to the CPS, special procedures were used to assure equality between a husband's and wife's weights, in addition to controlling to age-sex-race figures. The Hirschberg, et al. method was intended as an improvement over those procedures. The Hirschberg, et al. comparisons yielded substantial effects for aggregates, as would be expected, since data were controlled to larger population figures. The effects on percentages and rates were much smaller, but in some cases significant. For example, the unemployment rate increased from 4.5 percent to 5.0 percent and the poverty rate increased from 10.6 percent to 10.9 percent. Since Hirschberg, et al. were attempting complex methodology, and since there can be no assurance that their methods were correct or best, their results are difficult to interpret.

Johnston and Wetzel (1969) conducted an earlier CPS study in which they made estimates of the effect of not adjusting for the 1960 census undercount on the regular monthly estimation in the CPS. Though they found substantial effects on aggregates, they found the effect on the unemployment rate to be only 0.1 percent.

Methods for reducing household listing error. There are several means by which the number of people missed because of deliberate nonreporting or misinterpretation of concepts could be reduced. Experimentation is required to determine which of these are effective. The ideas discussed here are taken from Hainer, et al. (1988).

One approach is to change the survey household residency concept or, at least, to ask about it differently. Defining residency according to where a person is currently living or sleeping instead of where they usually live is simpler and probably less error prone for respondents. If it is undesirable or infeasible to abandon the usual residency concept, coverage might still be improved by not asking about it directly. For example, a survey might ask who slept in the unit last night, who usually eats in the unit, or even who spent any time at all in the unit yesterday. Of course, follow-up questions would be needed for such broad questions, as these clearly would otherwise result in overcoverage.

A second approach is to convince respondents that reporting all household residents will not adversely affect them, a quite difficult task for some population subgroups. A prerequisite for this is assuring the confidentiality of survey data and, thus, protection of the privacy of respondents. It is important to truly convince field representatives that confidentiality assurances are taken seriously, since respondents are not likely to be convinced by confidentiality assurances if field representatives themselves are at all doubtful about them.
Improved training of field representatives about the subject of confidentiality is needed.

More discussion of data confidentiality and measures to protect privacy at the beginning of an interview might help allay respondents' fears. Not asking for full household rosters at the beginning of an interview may also help, especially when the questionnaire content is not threatening. Keeping a survey anonymous by not asking for names, or by asking only for first names, may help. Also, in face-to-face interviewing, using local, indigenous field representatives may be better, especially in low-income areas or in areas where different languages are spoken. Finally, for large-scale, frequently conducted surveys, community outreach to publicize the survey and improve public relations, as is done for the decennial census, may help.

2.2.3. Nonhousehold listing errors

In most nonhousehold surveys, the frame sampling unit is equivalent to the final sampling unit (see table 7). For example, in most business surveys, the establishment is both the sampling unit and the reporting unit. Occasionally, the sampling or reporting unit is the company. Coverage error is most likely to occur if a multiunit company is requested to provide data for each establishment that belongs to the population of interest. Failure to include an establishment results in undercoverage. Erroneously including an establishment results in overcoverage. However, there have been few formal studies of listing errors which result from respondent reporting errors in nonhousehold surveys.

Some coverage error resulting from respondent reporting error was found in evaluations of the 1977 Economic Censuses. Reconciliations of same-unit information were made across the Census of Retail Trade, Wholesale Trade, and Service Industries; the Current Business Surveys; and the Bureau of the Census annual business surveys. Sales (receipts) of the larger individual companies were compared among the three data sources. Millions of dollars in differences were found for many companies. About 300 of the companies with the largest differences were investigated in detail. "It was discovered that many of the firms reporting large differences were not covering the same establishments in the census and current surveys, or were not reporting as instructed. Reporting differences also resulted from different people completing the questionnaires, from dissimilar instructions, and from timing differences" (Bernhardt and Helfand 1980). It was also found that for some firms with franchised operations as well as company-owned stores, where data were requested only for the company-owned stores, duplicate sales were reported for franchised stores by the company and the franchiser. Many of the reporting differences, however, were not a result of coverage error. Reconciliation was also done between the 1977 Census of Manufactures and the Current Industrial Reports, but no coverage problems were discovered (Bernhardt and Helfand 1980).

2.3. Other nonsampling errors

This section includes a discussion of some other nonsampling errors which can lead to coverage problems in a survey. The first of these, recording errors, occurs when information correctly known to the survey organization is incorrectly noted. This can lead to noncoverage if the error causes the incorrect exclusion or inclusion of a unit from the sample or frame.
The second, responses from nonsampled units, occurs when unsampled units are substituted for sampled units in the field or when unsampled volunteers respond. In either case, the potential for biased results exists. The final part of this section presents a short discussion of unit nonresponse.

2.3.1. Recording errors

A common error encountered in survey operations is recording error, that is, error arising when information correctly known is incorrectly recorded. Recording errors can be made during the execution of many different survey operations. Fortunately, the future extension of electronic data capture to many of these operations will minimize or eliminate associated recording errors. In the more typical setting, however, respondents, field representatives, screeners, data keyers, and survey analysts can all contribute to this error at various points in the survey process. Of concern here are recording errors that cause a survey unit to be dropped from or added to the sample unintentionally. Recording errors most adversely affect surveys from which small-area data are published, since the incidence of one inadvertent deletion or addition can more readily affect these data. However, while no quantification of coverage recording errors has been found, it is the consensus among program managers that such errors are so rare as to be inconsequential. One reason for this is the fact that most survey programs make it a practice to review all survey deletions and verify that they should be dropped from the survey. If a recording error were the cause of a unit being dropped, the error would likely be discovered in this review. Recording errors are such random events that no systematic program check, except for this type of review, is likely to detect or prevent their occurrence.

There are conceptually three major points in the survey operation where recording errors are most likely to occur: At the time of the interview; at the time of screening, directory operations, and initial data entry; and during the review period when corrections to data records are made and files are updated. Field representative recording error is less a problem for the economic surveys considered in this report, since they are primarily mail-out/mail-back surveys with little or no field representative intervention. However, area-sample cases, delinquent list-sample cases, and certain failed-edit cases are handled by field representatives for the Current Business Surveys. Field representatives, of course, are vital in the execution of demographic surveys.

The probable reason that most recording errors do not result in coverage errors is that most such errors are errors of content rather than of omission or faulty inclusion. Thus, the fact that a field representative incorrectly records the information received about the occupants of a household does not lead to the household or its occupants being omitted, although it may lead to classification errors in subdomain tables. On the other hand, some surveys do employ a size criterion that a unit must satisfy in order to be included in the survey, so that a recording error based on this criterion can cause a unit to be deleted incorrectly. For example, the Pollution Abatement Survey (MA-200) conducted by the Bureau of the Census does not sample plants whose total employment is less than 20. If a plant is erroneously classified as out of scope for the survey due to a recording error, then this clearly is an error leading to undercoverage.
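To make the size-criterion case concrete, the following minimal sketch (in Python, with hypothetical names and figures; it does not represent any actual Bureau of the Census system) shows how a single mis-keyed employment value can silently remove an in-scope plant:

    # Hypothetical illustration: scope is decided from the *recorded* value,
    # so one keying slip can delete an in-scope unit (undercoverage).
    EMPLOYMENT_CUTOFF = 20  # plants below this size are out of scope

    def in_scope(recorded_employment: int) -> bool:
        return recorded_employment >= EMPLOYMENT_CUTOFF

    true_employment = 200   # the plant actually belongs in the survey
    keyed_employment = 2    # keying slip: two digits dropped

    assert in_scope(true_employment)
    if not in_scope(keyed_employment):
        print("Unit erroneously dropped from the survey -> undercoverage")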
The type of recording error more likely to result in coverage error is the incorrect assignment of codes which directly determine whether a sampled unit does or does not belong to the survey's target population. Most surveys employ various coding schemes which denote the location, current status, classification, etc., of the survey units. For example, in the manufacturing surveys of the Bureau of the Census, coverage control codes are assigned in directory operations to indicate any changes in the operations of plants being surveyed. Many of these codes can result in a plant being deleted from the survey, so if a recording error results in the assignment of one of these codes, undercoverage results. Similarly, the SIC classification is most critical in determining the survey status of plants. If this code is incorrectly recorded, a unit may be dropped from the survey as out of scope. Such misrecordings are a possible, but very unusual, source of error in the Current Business Surveys and in the industrial surveys of the Bureau of the Census. In most circumstances, the SIC code is incorrect because some procedure was followed incorrectly, or because secondary information used to determine the code was incorrect, not because the correct code was misrecorded. Determining the right code but entering it incorrectly is still in the domain of rare events. To affect coverage, the recording error must be one that results in the unit being dropped from the sample.

Keying errors, even of codes, have little potential for causing coverage loss because most keying operations are subject to some form of verification, either on a sample or 100-percent basis. Computer edit checks, as employed by the Quarterly Agricultural Surveys for example, reduce even further the possibility of these errors going undetected. Nor are corrections entered directly to a data set via an interactive terminal considered a likely source of error, since the vast majority of such corrections are made to characteristic data only. However, most of these interactive corrections are not independently verified, since for the most part they are made by analysts who do not follow a preset and independent verification regime.

2.3.2. Responses from nonsampled units

The probability of receiving a response for elements not in the sample should be zero, and in most surveys, responses for units not in the sample do not occur. But administrative and clerical oversight can lead to unintentional additions. If a sample of housing units, establishments, or firms is drawn and then subsequently subsampled or refined by cutting selected elements, all records of the sample must reflect those changes. If, through oversight, some of the data records retain the dropped sampled elements, normal conduct and processing of the survey will result in overcoverage. This happened in the 1984 Survey of Income and Program Participation. For budgetary reasons, a sample cut occurred at the fifth of eight interviews. Approximately 80 nonsampled units were interviewed because the dropped units had not been removed from the lists sent to field offices.

Although most surveys seek to avoid responses from volunteer units, some do not. In the Energy Information Administration's Form EIA-23 data system, a survey of oil and gas well operators, a volunteer is defined as a respondent whose assigned identification code was not on the selection list at the time the sample was drawn.
Reporting requirements vary by size of operation, with the largest operators being required to file. Operating affiliates of a corporation are considered individual operators and have selection and reporting requirements based on their size. Throughout the course of the survey, the forms and responses of volunteer respondents are monitored differently by the respondent tracking system than are those of the initially drawn sample. Some of these respondents clearly should not be in the survey. Others, however, should be in the survey but were not selected because the frame can never completely cover the rapidly changing population of interest.

The types of volunteer companies not incorporated in this survey include:

- Companies which filed a Form EIA-23 for the previous report year, but have not been selected in the current year and mistakenly think they have to file for the current report year, and
- Companies which complete a form forwarded by sampled companies because they sold or transferred operations to the nonsampled companies.

Some of the volunteers included are:

- A company which has a large operating affiliate and has not received a form, but is required to file, and
- A large operator which has not received a form but realizes through the Federal Register, its trade association, or some other means, that it should report.

The EIA's Form EIA-23 data system recognizes the possibility of volunteers at the outset and uses the respondent tracking system to help identify them. A respondent tracking system uniquely identifies each element sampled from the frame and follows the records for that respondent from the mailing of the survey form to clean entry on the final data set. At several points during the processing cycle, a list of identification codes of actual respondents is compared with those of companies on the original sample selection list, together with all previously identified and analyzed volunteers. Any new identification codes on the former list represent entities which are then added to the list of outstanding volunteers (a minimal sketch of this comparison appears at the end of this subsection).

The weights on the randomly selected component of the sample from the EIA-23 data system are not adjusted. The appearance of new, previously unknown entities triggers a series of actions which ultimately lead to an improvement in the frame for subsequent survey cycles.

If a sample were drawn from a perfect, current frame, then the theoretical effect of unidentified volunteer respondents would be to bias the survey's results upward. However, since sampling frames are not perfect, the effect of volunteers on coverage is not clear cut. Sometimes, when volunteers respond to surveys which have sampled from imperfect lists, this may lead to improved net coverage.

In other cases, the effect of erroneously including unsampled or unqualified respondents is to decrease slightly the efficiency of the survey and correspondingly increase respondent burden. However, if identified during survey operations, these respondents and the information they provide can be ignored in subsequent analysis. For instance, the 80 respondents mistakenly interviewed in the Survey of Income and Program Participation were identified and subsequently excluded from analysis.
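The identification-code comparison described earlier in this subsection reduces to simple set operations. The following is a minimal sketch in Python (the identifiers and data structures are hypothetical, not the EIA's actual tracking system):

    # Sketch of the volunteer-identification comparison: any respondent ID
    # that is neither on the selection list nor already analyzed as a
    # volunteer is a new, outstanding volunteer.
    sampled_ids = {"OP-0012", "OP-0345", "OP-0790"}   # original selection list
    known_volunteers = {"OP-0555"}                    # previously identified

    def new_volunteers(respondent_ids):
        return respondent_ids - sampled_ids - known_volunteers

    returned_forms = {"OP-0012", "OP-0555", "OP-0901"}  # IDs on forms received
    outstanding = new_volunteers(returned_forms)        # {"OP-0901"}

Each identification code so flagged would then be analyzed and, where appropriate, used to improve the frame for later survey cycles.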
Some survey designs even deliberately provide for the potential inclusion of out-of-scope respondents when failure to do so would result in unacceptable undercoverage. One example of this is the Survey of Neurological Disorders. At the cost of some inefficiency, rules for screening in the housing unit portion of the survey permitted the inclusion of out-of-scope cases, which were later identified in subsequent medical evaluation and discarded from further analysis. This design permitted erroneous inclusions in order to avoid missing true cases, which would have occurred using tighter screening criteria.

2.3.3. Coverage errors resulting from nonresponse

The definition of coverage error being used in this report is very broad and includes any source of error or mistake (other than sampling error) which contributes to the under- or overcoverage of the target population. Under this definition, the failure to elicit a response for a sampled target population unit, or unit nonresponse, can be viewed as another source of coverage error. However, in order to limit the size of this report, the following discussion of coverage errors as a result of unit nonresponse is relatively brief. For a more complete description of unit nonresponse as it contributes to incomplete data, methods to improve survey response, and procedures for adjusting for unit nonresponse, the reader is referred to the 3-volume report of the Panel on Incomplete Data in Sample Surveys (Madow, Nisselson, and Olkin 1983).

The Panel on Incomplete Data made a distinction between undercoverage and unit nonresponse as types of incompleteness using the following definitions.

"Undercoverage occurs if units that should be on the frames or lists from which a sample is selected are not on the lists, if units in the frame or sample are incorrectly classified as ineligible for the survey, or if units are omitted from the sample or skipped by the interviewer" (Madow, et al. 1983, Vol. 1, p. 16).

"Unit nonresponse occurs if a unit is selected for the sample and is eligible for the survey, but no response is obtained for the unit or the obtained response is unusable. There are four primary reasons for unit nonresponse in housing unit surveys:

(1) No one is at the unit when the efforts are made to interview.
(2) The interviewer cannot communicate with the persons in the unit, e.g., because of illness or a language problem.
(3) Total refusal occurs or the interview is broken off by the respondent and the partial response prior to breakoff is classified a refusal.
(4) The responses given by the unit are later classified as unusable" (Madow, et al. 1983, Vol. 1, p. 18).

There are a number of situations in which the distinction between undercoverage and nonresponse error, as defined by the Panel on Incomplete Data, is not clear. For example, a field representative conducts a set of interviews and mails the schedules to a central processing facility, but they never arrive or arrive after the processing deadline. Or, a mail-out/mail-back questionnaire has been stripped of the identifying information required for using the data in estimation. What if, as occurred in the 1980 decennial census, all schedules for a particular area are destroyed by fire? In each of these cases, the sampled units could be classified as nonresponding under (4) above. On the other hand, the loss of sampled cases for a significant proportion of a subdomain of interest results in significant undercoverage for the area, and compensation with nonresponse adjustment methods may not be very satisfactory.
In housing unit surveys where the sampling frame is believed to cover the target population adequately, the incomplete recording of persons within households as discussed in section 2.2.2 above can also be considered a type of item nonresponse, where the item is the list of all persons in the household (Bailar 1984).

Differential unit nonresponse, especially where the probability of response is correlated with the variable of interest, is particularly likely to lead to biases. Consider, for example, the Consumer Expenditure Survey. If all sampled units with income greater than $100,000 were not to respond, the effect on estimates of annual income and expenditures would be no different than if all units with income greater than $100,000 were not represented in the sampling frame. Investigations into the response/nonresponse mechanism need to be made to determine whether nonresponse adjustment methods are adequately compensating for the differential nonresponse or whether, as a result of differential nonresponse, estimates for selected subdomains should not be made.

When general-purpose or national surveys are used to perform specialized analyses of demographic or geographic subpopulations, it is especially important to identify and control for the effects of differential nonresponse. For example, consider a survey designed to estimate and conduct analysis of the population in poverty. If nonresponse for this population is higher than for the general population, under-representation of the population in poverty will result. The data may also be less useful analytically than anticipated if the characteristics of the respondents in poverty differ from the characteristics of the nonrespondents in poverty. Thus, survey planners should carefully plan and analyze results from surveys of specialized populations, since unit coverage and unit nonresponse can be very specialized as well.

Sample attrition in longitudinal surveys can seriously affect coverage when the characteristics of the sampled units which drop out or refuse to continue to participate in the survey differ from those of the sampled units which continue to participate. Attrition is most consequential when the reasons for continued or noncontinued participation are correlated with the objectives of the survey. For example, there is evidence that sample attrition may be related to victimization status in the National Crime Survey (Biderman and Cantor 1984). To the extent victims drop out of the survey at a faster rate than nonvictims, the estimates of victimization from later interviews will be biased (U.S. Office of Management and Budget 1986).

In the Survey of Income and Program Participation, McArthur and Short (1986) found that the characteristics of the sampled units which continued to participate in the survey differed from those which did not. Out of roughly 25 characteristics examined, the following characteristics of all first-interview households were significantly different between those households which continued to participate through the fifth interview and those households which did not: Household monthly income, employment status, marital status, race, age, interview status, tenure, residence, relationship to household reference person, and region. Certainly survey estimates of two crucial variables, household income and labor force status, will be affected by coverage loss due to attrition. Other characteristics being measured by the survey are probably affected as well.
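The bias mechanism underlying both differential nonresponse and differential attrition can be illustrated with a small simulation. In the Python sketch below, the income distribution and response propensities are invented for illustration and are not taken from any of the surveys cited; it shows that when response propensity depends on the study variable, an overall weighting adjustment leaves the estimate biased:

    # Illustrative simulation: response propensity correlated with income.
    import random

    random.seed(1)
    incomes = [random.lognormvariate(10.0, 1.0) for _ in range(100_000)]
    # Assumed mechanism: high-income units respond far less often.
    responders = [y for y in incomes
                  if random.random() < (0.9 if y < 100_000 else 0.4)]

    true_mean = sum(incomes) / len(incomes)
    # A single overall nonresponse adjustment multiplies every respondent
    # weight by the same factor, so the adjusted mean equals the respondent
    # mean -- the bias from differential nonresponse remains.
    adjusted_mean = sum(responders) / len(responders)
    print(f"true mean {true_mean:,.0f}; adjusted estimate {adjusted_mean:,.0f}")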
In surveys conducted by mail, especially surveys of establishments, nonresponse may lead to overcoverage, since nonrespondents are usually assumed to be eligible sampled units from which a response should have been received. Because of this assumption, some form of imputation is performed, either explicitly or implicitly through weighting adjustments. If, in fact, nonrespondents are ineligible (e.g., establishments which are out of business), imputation results in overestimates in the same way that overcoverage does. A more detailed discussion of overcoverage as a result of imputation is given with respect to the Annual Survey of Manufactures in appendix A.1, section II, Deaths. The Annual Survey of Manufactures is generally, but not always, able to identify deaths at some point in time, but this may occur after the point at which current estimates can be corrected.

Another cause of unit nonresponse was noted earlier in this report in the discussion of temporal errors (section 2.1.3). Coverage is influenced by the reference period(s) for which information is collected. In this ever-changing world, when the collection and reference periods are not identical, changes that occur during the periods from frame construction to sample selection, or from sample selection to sample data collection, can all influence coverage.

Nonresponse due to temporal coverage error occurs in both cross-sectional and longitudinal surveys. If a survey has established rules and procedures to maximize coverage, then field representatives and respondents need to interpret and implement those rules correctly. When respondents or field representatives are careless or unable to apply survey rules, coverage error occurs. For example, field representatives should locate the correct sampling unit. The unit might be a family that moved after the sample was selected. If survey rules specify that the family be contacted at the new address but no forwarding address or phone number is available, the field representative is unable to apply survey rules and undercoverage results.

An inverse situation may also occur. For instance, the composition of the family may have changed between the time of sample selection and the first interview of the sampled family unit. The field representative or respondent might purposely or inadvertently identify the existing family as the desired sampled unit. Undercoverage or overcoverage of the desired population could result.

Finally, field representatives in the Survey of Income and Program Participation attempt to trace movers by making inquiries of the persons now living at the sampled address and of mail carriers, rental agents, real estate companies, and post offices, and through telephone directories. As a result of these procedures, Jean and McArthur (1984) found that about 80 percent of movers between the first and second interviews of the 1984 Survey of Income and Program Participation panel were traced. To improve upon this rate, the Bureau of the Census initiated a procedure whereby respondents who moved were asked to return a card with their new address. Unfortunately, even with this, there were no improvements in the response rates of movers, because of respondent apathy (Kalton and Lepkowski 1985).

CONCLUSION

The purpose of this report is to provide information to the Federal and nonfederal sectors about the existence and effects of coverage errors in surveys and to provide guidance on how to assess and improve coverage.
Few studies were found by the authors, however, which actually measure coverage errors in surveys, and even fewer which address the effect of coverage error on survey estimates. Of course, this report does not cover all Federal surveys, and only minimal references to nonfederal surveys are provided. Even so, there appears to be a striking dearth of studies devoted to measuring coverage errors and their effects. It is possible that more studies have been conducted than were found, but if so, they are certainly not readily available. This is almost as serious as not researching coverage error at all, since in either case the information is not available for use in future survey work. The hope is that by increasing awareness, through this report and other media, of the potential for coverage errors in all aspects of survey design and implementation, methodologists will accept the responsibility of routinely conducting and documenting studies to identify and assess survey coverage error.

It is strongly recommended that survey researchers include methods to assess coverage in their research designs as a matter of course. In addition, they should routinely provide a discussion of populations covered and excluded, coverage error, and the measurable effects of this error on estimates when publishing survey findings. When it is not possible to estimate the effect of coverage error, even a qualitative statement about this effect should be made. It should no longer be acceptable to omit mention of coverage and coverage error in data products resulting from Federal surveys.

As to the seriousness of coverage error, the largest single source of coverage error for a housing unit survey cited in this report is an estimated 4-percent undercoverage in the National Crime Survey estimates of persons aged 12 and over due to within-housing-unit listing errors. On the economic side, the largest coverage error due to a single source cited is a 20-percent underestimate in the 1977 Economic Censuses statistic of receipts for nonemployer establishments due to misclassification.

These errors come from single sources. When combined across all sources, errors can become quite serious, even when each individual source is minor. Since we already know that single sources themselves can be significant, the overall effect of all sources of coverage error on survey products is of great concern. The best way to address this is to routinely include studies to evaluate and improve coverage in survey designs.

Because of the diversity of errors which lead to noncoverage, there are many methods for controlling or minimizing coverage error. The methods that apply to most surveys, and which can lead to significant improvements in data quality, are the use of multiple frames to improve coverage at the sampling stage and weighting adjustments to reduce the bias induced by coverage error at the estimation stage. For rare target populations, specialized sampling techniques should be considered.

It should be noted that the report is selective in the surveys discussed and the bibliography prepared. Any methodologist planning a new survey or evaluating coverage in a current survey should delve deeply into survey requirements and limitations. The information and references provided in the report should be a good starting point for many ideas, but should not be used as the only guides.

The importance of defining objectives clearly and planning accurately cannot be overemphasized.
As discussed in chapter 1, survey methodologists need to make concerted efforts to think through thoroughly the objectives and issues of a proposed survey at the outset. Precise language and correct translation from the ideal survey to an implementable plan are vital; otherwise, coverage is in jeopardy. Significant resources should be allocated to the conceptual and planning stages of a survey to minimize the problems and misunderstandings that lead to coverage error. Survey designs should provide for evaluations and procedures to minimize coverage error wherever possible. If the planning is indeed thorough, survey products will be much more accurate and useful, which is the ultimate goal of any survey.

APPENDIX A. CASE STUDIES

Introduction

Each of the following seven case studies illustrates how a selected survey carried out by the Federal Government deals with one or more of the survey coverage problems discussed in this report. Virtually all the case studies include a discussion of methods and procedures used in the routine updating of the survey frame, a prime line of defense against coverage errors in a survey. Each highlights the application of one or more coverage assessment or improvement methods discussed in this report.

The first case study describes coverage in the Bureau of the Census Annual Survey of Manufactures, emphasizing measures taken in the sample design to minimize coverage error, match-merging of several lists during the update of the frame, and classification errors arising during the survey.

The Long-term Care Survey, carried out on behalf of the Department of Health and Human Services, is used to illustrate coverage concerns arising from the use of an administrative data base as the sampling frame. This survey covers the noninstitutional population aged 65 and over which has functional limitations that impede normal daily activities. This second case study focuses upon the problems arising from classification errors, nonhousehold listing errors, and recording errors, and provides a discussion of frame maintenance procedures.

The third case study emphasizes the methods used for developing and evaluating the National Master Facility Inventory, a comprehensive list of inpatient health facilities in the United States. The discussion covers the processes used to match-merge and unduplicate several source lists when updating and maintaining the inventory, as well as means of identifying and analyzing coverage problems.

The Bureau of Labor Statistics Producer Price Index is the subject of the fourth case study. It emphasizes the criteria used and decisions involved in selecting a source data series for the frame, frame maintenance, classification errors, the effects of time and the temporal errors introduced into the frame and the survey, and the analysis and identification of coverage problems.

In the fifth case study, the Quarterly Agricultural Surveys carried out by the National Agricultural Statistics Service of the Department of Agriculture are discussed. These surveys, which provide crop and livestock estimates at the State and national levels, use a dual-frame design. Some of the coverage problems of frame construction and sample design associated with a dual-frame study are discussed.

The sixth case study presents a discussion of coverage problems encountered by the Department of Energy's Energy Information Administration in monitoring monthly deliveries of natural gas to industrial end users.
This case study differs from the others in that the problem upon which it focuses originated from conceptual or relevance error rather than from the more common sources. The case study discusses the identification and analysis of coverage problems, sample design strategies used to minimize the coverage problems, and evaluation methods.

In the last case study, the coverage control and improvement procedures of the Current Population Survey are presented to exemplify those used when the sampling frame is the latest decennial census. Despite the high-coverage properties of the frame for the few months of the census-taking period, great care must be taken to minimize coverage error over the ensuing 10 years, when numerous samples are selected from the census address lists.

APPENDIX A.1. ANNUAL SURVEY OF MANUFACTURES (ASM)

I. Introduction

At 5-year intervals (years ending in 2 or 7), an enumeration of the entire manufacturing establishment population of the United States is undertaken through the Census of Manufactures. The census collects a wide variety of information ranging from very general statistics common to all industries (employment, salaries and wages, costs of materials, inventories, etc.) to detailed statistics on industry-specific products produced and materials consumed. These data are published at various levels of industrial and geographic detail.

Except for the very small single-unit companies, the census is a mail-out/mail-back survey. Data for the small companies are imputed using payroll, shipments, and employment data provided by the administrative records of the Internal Revenue Service.

In intercensus years, a sample survey--the Annual Survey of Manufactures--is conducted. The Annual Survey of Manufactures collects virtually the same type of data on an annual basis as the census, but, in some cases, the data requested are less detailed. For example, in the annual survey, detailed census product codes (7-digit codes) are combined into product class codes (5-digit codes), and data are collected at this level. In general, however, the annual survey can be regarded as a mini-census. (See U.S. Bureau of the Census (1971) for a complete description of the Annual Survey of Manufactures.)

The Census of Manufactures serves as the primary sampling frame for the Annual Survey of Manufactures. Because of the time required to process, review, correct, and finalize the census, however, a 2-year lag exists between the census year and the first survey year of the new panel. For example, the Annual Survey of Manufactures panel selected from the 1987 census was first mailed for survey year 1989. The Annual Survey of Manufactures is also a mail-out/mail-back survey, but, like the census, it maintains a small imputation stratum which is neither sampled nor mailed. This stratum of small units is initially composed of the same administrative records imputed during the census. Each year during its birth processing, however, the Annual Survey of Manufactures adds additional cases to this stratum.

Both the Census of Manufactures and the Annual Survey of Manufactures are establishment (plant) based. A manufacturing establishment, in general, corresponds to a single physical location where manufacturing activity is performed. For the Annual Survey of Manufactures, the establishment serves as both the sampling unit and the reporting unit.
However, a certainty company stratum is also defined whereby, for certain companies, all the manufacturing plants owned by the companies are included in the panel as self-representing. These complete companies are included not because of sampling considerations, but rather for use by other survey programs interested in company data over time. Sample estimates of level are formed using a fixed-base difference estimation methodology, whereby sample estimates of change from the base (census) year to the current year are added to the census value (a small numerical sketch appears at the end of this introduction).

At the time of selection, the Annual Survey of Manufactures panel is representative of the census frame from which it was chosen. This frame includes the manufacturing establishment population at a point in time. This population, however, is constantly changing as new plants are being constructed, existing plants are going out of business or are converting to different activities, plants are sold from one company to another, and companies merge and divest. In addition, the census itself is a less than perfect frame. Among other errors, it is likely to include plants not in the scope of manufacturing, to be missing plants actually in manufacturing, and to include industry-misclassified plants. Coverage problems associated with the census have been presumed to be minimal and primarily confined to the smallest plants, which have little effect on most of the published totals. However, no formal evaluation of these errors has been undertaken.

It is necessary that the Annual Survey of Manufactures attempt to remain a panel representative of the true population over time. To this end, the survey designers maintain and apply a variety of panel maintenance rules. Some of these rules are intended to account for deficiencies that might have existed in the frame itself, while others take into account the evolving nature of the population. The panel maintenance rules are generally unbiased rules, that is, when applied correctly and comprehensively, they introduce no upward or downward bias to the survey estimates. As will be noted in some of the discussion to follow, however, it is not always practical to be comprehensive either in the application of the rules or in the identification of the complete population to which the rules are to be applied. In some cases, in fact, it is judged preferable to lower the mean square error at the expense of introducing some bias. This is particularly true at sub-U.S. levels, where a bias may be intentionally permitted, although not at the total U.S. level. The net result is that some slippage in the coverage of the ASM panel occurs, and, while the coverage loss is believed to be modest in nature, it has not been quantified.
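The fixed-base difference estimator mentioned above can be written in a few lines. The figures in the following sketch are purely illustrative, not actual ASM data:

    # Fixed-base difference estimation of level: add the sample estimate of
    # change since the census to the complete census value. Figures are
    # illustrative only.
    census_total = 1_000_000           # complete base-year (census) value

    # Weighted sample estimates of the same total from the ASM panel in the
    # base year and the current year; the same panel supplies both, so much
    # of the sampling error cancels in the difference.
    sample_base = 980_000
    sample_current = 1_050_000

    estimated_level = census_total + (sample_current - sample_base)  # 1,070,000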
II. Panel maintenance operations

The major panel maintenance operations for the ASM are classified under six headings: Births, deaths, intercensus transfers, ownership changes, census misses, and miscellaneous. Limitations of these operations and their possible effect on the sample are discussed when appropriate.

Births. Conceptually, the treatment of new plants is straightforward. If all new operations in the course of a year could be identified, then a representative panel from these births could be selected and added to the existing panel. The problem lies in identifying the complete population of births. Births are recognized primarily through two procedures.

First, each year a complete list of newly issued employer identification numbers is obtained from the Social Security Administration. EIN's are required for tax purposes, so every employer with paid employees must have one. This list of new EIN's, however, has several limitations as a source of births. For one, an existing company can request a new EIN for a variety of reasons (reorganization, new partner). Therefore, the list contains some EIN's which do not represent new establishments. Secondly, the list finally obtained by the Bureau of the Census is incomplete. Although many firms request EIN's, they do not always inform the Social Security Administration of the nature of their operations. Thus, SSA is unable to assign an SIC code to many of these concerns, and they remain unclassified. This is a significant problem. In recent years, as many as 100,000 unclassified units have been identified, and it is estimated that as many as 6,000-10,000 may be in manufacturing. Thirdly, multiunit companies do not normally request new EIN's when they build new plants. Instead, they include them under EIN's previously assigned.

Units that are assigned a manufacturing SIC code are mailed a classification card intended to verify the SIC and to determine if the plant is a true birth. Plants indicating that they are new operations become part of the imputation stratum if they fall below the administrative record cutoff which defines that stratum. Otherwise, they are added on a sample basis to the ASM mail sample. Plants that are not births are assumed to represent existing plants requesting new EIN's. Whether or not these successor EIN's are included in the ASM is solely dependent upon whether they are in the ASM under their old (predecessor) EIN's. No presumption is made regarding establishments that do not respond to the classification card inquiry. To the extent that they are births, some downward bias is introduced, since they are not represented in the ASM. To the extent that they are successors, no bias results, since the predecessor EIN's are represented in the ASM.

The second procedure utilizes the Company Organization Survey of the large multiunit companies (employment of 50 or more), which among other things requests that the company list any additional plants not preprinted on the COS mail questionnaires and indicate whether the additional locations are newly built, acquired, or under construction. Births detected through the COS are either added in total to the mail portion of the panel or, depending upon their number, are sampled and added. This list, however, has its own deficiencies. For one, not all multiunits are included in the COS each year; a proportion of the small companies are included in each of the 3 years following the year after the census. Secondly, not all companies respond to the survey, and thirdly, many acquired plants are not identified as acquired and are treated as new. Finally, there is the possibility that some multiunit companies are not identified as such and, therefore, are not in the COS file. Typically, such undercoverage would result because the individual plants of the multiunit companies are identified on the SSEL as single units. (The SSEL is the Bureau of the Census' master list of economic entities, which is updated each year by the COS and the EIN additions.) Nonetheless, the COS multiunit list is believed to be close to complete.
In addition to these systematic methods for identifying birth establishments, new establishments may be identified during data collection from sampled units. Each sampled single-unit company is asked whether any additional plants operate at its location, whether the company owns any additional plants, or whether it is owned by someone else. If the establishment is determined to be part of a multiunit company, the additional plants may be birth establishments not previously identified.

It is important to recognize several factors affecting the identification and treatment of birth establishments for the ASM. The births identified from the Social Security Administration lists, which are received on a quarterly basis, typically represent the first three calendar quarters of the survey year and the last quarter of the previous year. Further, because of the lateness of the classification operation, no attempt is made to collect data from the sampled birth establishments for the current year. An imputed record is created using payroll data provided by the Internal Revenue Service.

Likewise, the identification of multiunit company birth plants by the COS occurs so late (the COS is conducted concurrently with the ASM) that no attempt is made to collect data from them until the following year. Since previous attempts to make reasonable imputations from data obtained by the COS have been unsatisfactory, imputations have not been made for these cases during the last several years. This has resulted in a slight downward bias in the ASM estimates. Beginning in 1989, these cases have been included in the mail survey in an attempt to collect current data from them.

Deaths. The treatment of deaths is straightforward from a conceptual viewpoint. As the ASM sample is representative of the manufacturing establishment population, deaths in the ASM panel should be representative of deaths in the population. A number of practical problems arise, however, in the identification of deaths.

Perhaps the foremost problem associated with identifying deaths is exemplified in the saying "dead men tell no tales." In order for one to know that a plant has ceased operations, one has to be told. Since the forms for a given survey year are mailed in January of the following year, many of the out-of-business plants may have long since ceased operations. No one is now present to complete the questionnaire. From the ASM perspective, the plant is merely delinquent, so data for it will eventually be imputed, as for all nonresponding sampled units.

This is not of particular consequence for single-unit companies, since the data are imputed from Internal Revenue Service payroll data, which should reflect the fact that the plant has been inoperative during part of the year. The plant will still be considered active, however, and will likely continue to receive the questionnaire until the Internal Revenue Service payroll data reach zero.
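The overcoverage risk that imputation of delinquents poses for dead units can be shown in a few lines. The following sketch uses illustrative names and figures only; it is not the ASM's actual imputation rule:

    # Illustrative sketch: imputing a delinquent unit from prior-year data
    # overstates the total when the unit is in fact out of business.
    prior_year = {"plant_a": 500, "plant_b": 300, "plant_c": 200}
    responses = {"plant_a": 520, "plant_b": 310}   # plant_c closed; no report

    total = 0
    for plant, prior_value in prior_year.items():
        # Nonrespondents are assumed eligible and imputed from last year.
        total += responses.get(plant, prior_value)
    print(total)   # 1,030 -- but plant_c produced nothing; the truth is 830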
For multiunits, the imputation of deaths as delinquents may assume importance, since the imputation uses the prior year's data, which may be the data for an active, fully operating establishment. Normally, however, since multiunit companies do not go out of business overnight, the company will inform the ASM of individual plants that have ceased operations. It is still important to obtain data for the plants if they were in operation during part of the year, for otherwise zero records will result. On the other hand, when a company does indicate that a plant has gone out of business, there is no means of confirming it. For example, there is no Internal Revenue Service payroll for individual plants of multiunit companies that can be checked, as is the case for single-unit companies. When an entire company shuts down, however, the ASM is particularly vulnerable, for then it is possible that data for every plant of the sampled company will be imputed. The closing of the company will eventually be discovered through other operations, such as the COS canvass, but the timing may be such that the imputed data contribute to the ASM estimates for that year. It may in fact be another year or so before it is realized that the company is no longer operating. However, the infrequency of entire companies shutting down within a short span of time leads to the belief that this is a minor problem.

In summary, it is believed that the identification of death establishments is fairly comprehensive. The procedures safeguard against the erroneous deletion of plants thought to be out of business, since some positive evidence is required before a plant is actually deleted. The more likely source of error is the continuing imputation for plants that are no longer in operation.

Intercensus transfers (ICT's). Occasionally, the major activity of an establishment will change from nonmanufacturing to manufacturing. Usually no actual switch occurs; rather, the establishment was misclassified. Most of these incoming ICT's for the ASM originate in the retail and wholesale trade areas. The identification of ICT's is most likely to occur during census years, when the smaller establishments, those most likely to be misclassified, are most completely canvassed. Of course, the smallest establishments in the imputation stratum are not enumerated and, therefore, are not subject to detection. Unless they are of significant size, the ICT's identified as manufacturing will not be added to the current ASM panel. Thus, some coverage loss occurs. Such establishments are included in the manufacturing census, however, and so will be part of the frame when the next ASM panel is chosen.

In intercensus years, the number of ICT's diminishes significantly, since only those plants belonging to the trade surveys are contacted routinely. This, plus the fact that not all ICT's are added to the ASM when found, results in some coverage loss in these years as well. Intercensus transfers can be regarded as a sample maintenance operation intended to account for deficiencies in the original sampling frame.

Ownership changes. Conceptually, ownership changes should result in no coverage loss. If a plant is in the ASM and is sold to another company, it remains in the ASM with the same sampling weight. A plant not in the ASM that is sold remains out of the ASM.

In practice, applying these rules has not been so straightforward. In past years, it was often difficult to identify the successor and to determine whether the successor was already included on the SSEL or needed to be added. This determination was difficult because information provided by the predecessor was often incomplete or inaccurate, especially for smaller plants, and the manual matching operation involved thousands of separate company records.
Historically, therefore, the ASM managers took into account the practical limitations of attempting to link all ownership changes by establishing size cutoffs, below which no linkage was performed. With the introduction of interactive processing in recent years, the ability to obtain and process this information has been enhanced, so that, beginning with the 1989 survey, linkage is being attempted across the entire mail panel.

Ownership changes occurring among the administrative record cases making up the imputation stratum present no problems. Even if two records appearing in the imputation stratum represent the same plant, the predecessor's record will reflect data up to the point of the change, while the successor's record will reflect data subsequent to the change.

Census misses. On very rare occasions, manufacturing plants are found to be missing not only from the manufacturing population but from the Bureau of the Census master establishment file (SSEL) as well. That is, the plant is completely missed and is not just misclassified. These mistakes are generally the result of human error, for example, misinterpreting company correspondence, using the wrong updating routine, etc., and are minimal in number. These misses, like ICT's, can be considered deficiencies in the original frame.

Miscellaneous. As mentioned in the introductory remarks, decisions are made on the basis of mean square error considerations which affect the representativeness of the sample at sub-U.S. levels. The most prominent example of this is the treatment of noncertainty sampled cases that produce such a different mixture of product classes in a given year that their basic industry classification should be changed. The ASM rule is not to change the industry code but to freeze it at its existing value. The argument advanced is that industry switches are rare events, and it is not likely that the other establishments this weighted case represents also changed to this same new industry code. Although bias is introduced in both industry estimates (but not in the total U.S. estimate), the variance component of the mean square error is substantially smaller than it would be if the change were allowed. Clearly, the sample is no longer truly representative of the industries involved. Certainty cases, which represent only themselves, are permitted to switch industries, but even they must satisfy a rigorous set of tests before the change is permitted. These tests are designed to prevent establishments from oscillating year to year from one industry to another. The freezing of the noncertainty SIC codes is maintained until the next census year (a maximum of 5 years hence), when the correct code can be assigned.

APPENDIX A.2. NATIONAL LONG-TERM CARE SURVEY (NLTCS)

I. Introduction

The 1982 National Long-term Care Survey was designed to provide nationally representative data on the functionally limited population aged 65 and over. It was a detailed personal interview study of the population aged 65 and over who were not living in hospitals, nursing homes, or other institutions and who had functional limitations that impeded daily activities. The survey was sponsored by the Office of the Assistant Secretary for Planning and Evaluation, Department of Health and Human Services; the data were collected by the Bureau of the Census.

In 1984, the Health Care Financing Administration (HCFA) sponsored a follow-on to the 1982 survey, the 1984 National Long-term Care Survey.
The 1984 National Long-term Care Survey was designed to provide information at a second time period for those persons covered in the 1982 survey, as well as a comprehensive picture of the population aged 65 and over in 1984. Data were again collected by the Bureau of the Census.

II. Sampling frame

The target population for the 1982 National Long-term Care Survey sample consisted of noninstitutionalized persons aged 65 and over with limitations in activities of daily living (ADL) (eating, getting in or out of bed, getting in or out of chairs, toileting, dressing, bathing, walking around, or going outside) or limitations in instrumental activities of daily living (IADL) (meal preparation, laundry, light housework, grocery shopping, money management, taking medicine, or making telephone calls) lasting 3 months or longer. Because there was no available list of such persons, and virtually all of them were on the Medicare rolls, the list of Medicare enrollees was a useful frame from which to select the 1982 National Long-term Care Survey sample.

The frame population from which the 1982 National Long-term Care Survey sample was drawn included all Medicare enrollees in sampled geographic areas as of April 1, 1982. In most areas, a 10-percent sample of Medicare enrollees was selected from HCFA's December 1981 Health Insurance Skeleton Eligibility Write-off file. The sample was updated with a 10-percent sample of persons added to the file from January 1 through March 31, 1982. In areas with no December sample, a 50-percent sample was selected from all persons on the March file. The 50-percent sample was selected only in areas thought to require a sampling fraction greater than 10 percent. In total, a sample of 55,767 enrollees was selected. The sample was subsequently reduced in size to about 36,000 persons, who were screened for limitations in activities. Of the 36,000, 6,393 were identified with such limitations and qualified for a detailed interview. Detailed interviews were completed for 6,088 persons.

The target population for the 1984 National Long-term Care Survey sample consisted of all persons aged 65 and over. First, all persons who reported functional limitations during the 1982 screening interview, or who were not screened due to being institutionalized on April 1, 1982, and who survived to 1984, were interviewed regardless of their 1984 functional status. Second, from the original 25,541 persons who did not report functional impairments during the 1982 screening interview (and were not institutionalized), a random sample of 47 percent (12,100) was drawn and subjected to the same screening procedure as in 1982. In addition, a sample of elderly noninstitutionalized residents who turned 65 after the 1982 survey was screened so that a full cross-section of persons aged 65 and over in 1984 could be evaluated.

It should be noted that persons aged 65 and over who were not Medicare enrollees were missing from the sampling frame. Medicare entitlement has always been tied to a person's work history or a spouse's work history. Individuals with no work history (excluding those receiving Supplemental Security Income) have not been entitled to receive Medicare and were thus excluded from the frame. This frame undercoverage was estimated to be no more than 3.7 percent of the 65-and-over population of interest for 1982 and 1984. Undercoverage varied by age, race, and sex.
Generally, undercoverage was greater for black persons less than 85 years old than for the corresponding nonblack persons and, for either race, was greater for those less than 70 years than for those 70 years and over (U.S. Bureau of the Census 1986).

An additional source of undercoverage was the result of geographic interviewing constraints. Sampled persons who were found to have moved beyond 100 miles of any Bureau of the Census field representative were treated as ineligible for interview. The noninterview adjustment procedure did not adjust for these persons, although they could have been treated as noninterviewed eligible persons and included in this procedure. No estimate of the size of the undercoverage resulting from geographic interviewing constraints was made.

III. Screening interviews and their effect on coverage in the NLTCS

For the 1982 National Long-term Care Survey, a brief series of questions was administered by telephone or personal visit interview to the 36,000 persons drawn in the sample from the Medicare enrollment files to screen out those persons without functional limitations. In the screening interviews, questions were asked to determine whether the sampled persons had an ADL or IADL limitation, and whether the limitation had existed or was expected to exist for 3 months.

Self-reporting of limitations during the screening interview had a tendency to create two types of errors:

- The sampled person reported a limitation during the screening interview which was not verified in the personal visit detailed interview--a false positive; and
- The sampled person did not report in the screening interview a limitation that did exist--a false negative.

In the National Long-term Care Survey, there was a 13-percent false positive rate among those persons for whom a detailed interview was completed. False positives were anticipated. The screening questions had been written to cast a broad net in order to minimize the false negatives, since false positives could be identified from information in the detailed interviews and eliminated, if desired, by the data analyst.

No attempt was made to measure the rate of false negatives, inasmuch as this would have required administering the detailed interview to a sufficient number of people who did not report limitations on the screener, which could not have been accomplished within the budget fixed for this survey. The designers of the NLTCS thus assumed that the proportion of false negatives was negligible. In general, this type of assumption must be made with caution, for if the stratum defined to contain the group of elderly with no limitations, i.e., the negative stratum, is large, even a small proportion of false negatives among the negative stratum could constitute a sizable proportion of a rare population, and thus result in coverage error (Kalton and Anderson 1986). It must be noted, however, that any attempt to measure the rate of false negatives in this instance is fraught with methodological problems. For example, elderly persons who reported no limitations during the screening interview, but did report them during a later follow-up interview, may not be false negatives but simply persons showing the effects of the aging process. Nevertheless, the potential for coverage error does exist when using a screening interview to identify a rare population, such as the target population for the NLTCS.
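The Kalton and Anderson point can be made concrete using the survey's own screening counts together with a purely assumed false-negative rate (the 1-percent figure below is an assumption for illustration, not a measured NLTCS quantity):

    # Even a small false-negative rate in a large negative stratum implies
    # sizable undercoverage of a rare target population. The 1-percent rate
    # is assumed; the screening counts are from the 1982 NLTCS.
    screened = 36_000
    screened_in = 6_393                     # reported limitations
    negative_stratum = screened - screened_in          # 29,607 persons

    assumed_false_negative_rate = 0.01
    missed = negative_stratum * assumed_false_negative_rate   # ~296 persons

    undercoverage = missed / (screened_in + missed)
    print(f"{undercoverage:.1%}")           # about 4.4 percent of the target group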
IV. Response rates

In surveys such as the National Long-term Care Survey, the final interviewed sample often differs from the initial sample for two reasons: Some members of the initial sample turn out not to be part of the population of interest, and some of the persons selected for the sample cannot be interviewed.

Of those persons initially selected for the National Long-term Care Survey, 2,472 were not in the population of interest on April 1, 1982. Some were deceased, some were institutionalized, and some lived outside of the country. An ineligibility rate of 6.9 percent indicated that the administrative records and techniques used to provide a list from which to draw the initial sample were relatively accurate.

Of those persons who were in the population of interest on March 31, 1982, 785 had left it before they could be given the screening interview, and an additional 622 persons could not be interviewed for one reason or another. These reasons were: They could not be located; they had moved outside of any geographical area where interviewing was conducted; they were temporarily away from home or unable to respond and no proxy was available to be interviewed in their place; they refused to answer questions; or a host of other reasons.

In all, almost 96 percent of the eligible population in the sample were interviewed.

V. Sampling weights and their effect on coverage

Researchers in the field of long-term care (Spillman 1989) believe problems exist with the 1984 National Long-term Care Survey cross-sectional weights on the public-use tape, resulting in a slight undercount of individuals in the community and a slight overcount of individuals in institutions. Specifically:

- The criteria used to classify individuals as institutional in the 1984 National Long-term Care Survey defined a more restricted population than the population represented by the control total used in ratio estimation. To be classified as institutional in the National Long-term Care Survey, a sampled person's residence had to have at least three unrelated residents and a health professional on duty 24 hours a day. Those who met the criteria for institutionalization, but who did not meet the more restrictive National Long-term Care Survey criteria, were classified as noninstitutional and included in the ratio estimation to noninstitutional control totals. However, the control totals were based on the more encompassing decennial census definition of institutionalization.

- Persons in correctional institutions were explicitly ineligible for the National Long-term Care Survey, but the control total was not adjusted to reflect their exclusion. Such persons account for a minuscule share of the elderly institutional population as a whole (0.3 percent), but represent larger shares of certain subpopulations, for example, 7.7 percent of black males aged 65 to 69.

- There was no attempt to post-stratify the combined institutional and noninstitutional population to an estimate of the total population aged 65 and over. When the slightly too large National Long-term Care Survey institutional control total is added to the noninstitutional population control total, the resulting total of 28.03 million persons is slightly larger than the census estimate of the total resident population aged 65 and over in 1984, 27.97 million.
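The mechanics of the ratio adjustment at issue, and why an overstated control total inflates every weighted estimate, can be sketched in a few lines. Only the two control totals below come from the text; the initial weights are invented for illustration:

    # Ratio estimation scales weights to a control total, so an overstated
    # control inflates all weighted estimates by the same factor.
    base_weights = [1_000.0] * 25_000        # hypothetical initial weights

    def ratio_adjust(weights, control_total):
        factor = control_total / sum(weights)
        return [w * factor for w in weights]

    census_total = 27_970_000                # census figure, aged 65 and over
    used_total = 28_030_000                  # slightly too large combined control

    w_used = ratio_adjust(base_weights, used_total)
    w_census = ratio_adjust(base_weights, census_total)
    inflation = sum(w_used) / sum(w_census) - 1
    print(f"{inflation:.2%}")                # ~0.21%; every estimate inflated alike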
Manton (1988) found fault with the traditional nonresponse adjustment procedure and its subsequent effect on the 1982-1984 longitudinal weights. The nonresponse adjustment resulted in an undercoverage of the disabled in 1982 and an overcoverage of the disabled in 1984, because the 1982 nonrespondents, a group that included 284 persons who became institutionalized after April 1, 1982, may not have responded for health reasons. Manton's solution was to create a special set of longitudinal weights. However, the author pointed out that more sophisticated weights could be created to remove additional bias in the calculation of transition rates by further adjusting for the health selection effects on the sample in 1982, calculating weights that reproduce the appropriate life table experience.

APPENDIX A.3. NATIONAL MASTER FACILITY INVENTORY (NMFI)

I. Introduction

The National Master Facility Inventory, established in 1962-63 by the National Center for Health Statistics (NCHS), is the most comprehensive inventory of inpatient health facilities in the United States (Science Applications International Corporation (SAIC) 1985). The NMFI program provides for the development of a list of names and addresses of all facilities or establishments within the scope of the NMFI and the collection of information which describes the facilities with respect to their size, type, and current business status (U.S. National Center for Health Statistics 1965).

The NMFI serves a twofold purpose: It is the only source of national statistics on the number, types, and geographic distribution of nursing and related-care homes, and it serves as a sampling frame for other health facility and long-term care surveys. These have included NCHS National Nursing Home Surveys (Strahan 1987) and early Hospital Discharge Surveys (U.S. National Center for Health Statistics 1968, SAIC 1985) and the National Center for Health Services Research's (NCHSR) Institutional Population Component of the 1987 National Medical Expenditure Survey (NMES) (Potter, Cohen, and Mueller 1987). The designs for these surveys are characterized as stratified multistage probability designs, with inpatient health facilities selected in the first stage(s) and persons selected within sampled and cooperating facilities in the last stage (Shimizu 1986; Cohen, Flyer, and Potter 1987).

The NMFI has been updated about every 2 years since 1967, using the NCHS Agency Reporting System, survey data obtained from facilities listed on the NMFI, and data obtained from State and national agencies (U.S. National Center for Health Statistics 1986). In 1985, the scope of the NMFI was expanded and data were collected under the name Inventory of Long-term Care Places. The first Inventory of Long-term Care Places survey was conducted in 1986 by NCHSR, in conjunction with NCHS and the Bureau of the Census (Potter, et al. 1987).

The scope of the NMFI inpatient health facilities has changed over time as a result of changes in the health-care industry and because new and better non-NMFI sources of data have become available. All NMFI's have included in the target population facilities defined as nursing and related-care homes. Prior to the 1978 survey, the NMFI also included hospitals (excluding Veterans Administration-operated hospitals) and other types of inpatient health facilities, such as homes for the blind, deaf, mentally retarded, and emotionally disturbed. For the 1980 survey, all identifiable hospital-based nursing homes and extended-care facilities were excluded because NCHS was unable to obtain data on all such facilities.
Hospital data were subsequently obtained from the American Hospital Association. Also included for the first time in 1980 were residential community-care facilities in California and adult foster-care homes in Michigan (U.S. National Center for Health Statistics 1983). The 1982 survey added to the 1980 target population residential-care homes in Florida and Kentucky (U.S. National Center for Health Statistics 1986). For the 1986 Inventory of Long-term Care Places, the scope was further expanded to include facilities for the mentally retarded.

The first list of NMFI names and addresses was assembled in 1962 by merging a number of published and unpublished lists of hospitals and institutions (U.S. National Center for Health Statistics 1965). Sources included State license files for nursing homes and related facilities, directories maintained by national associations, and names of establishments contained in a subset of files maintained by the Public Health Service, the Social Security Administration, and the Bureau of the Census. NCHS then edited the list for duplicate records, and, with the assistance of the Bureau of the Census, undertook a mail survey of establishments to determine the current status, nature of business, and size of the places listed. This procedure tended to maximize coverage and in some instances resulted in duplication on the NMFI (U.S. National Center for Health Statistics 1965).

II. Maintaining the National Master Facility Inventory

Maintaining the NMFI involves adding the names of new facilities, deleting those that go out of business, and obtaining information on facility type and size from those currently in business. In 1964, NCHS initiated work to develop a system, known as the Agency Reporting System (ARS), by which new facilities could be identified and subsequently added to the NMFI (U.S. National Center for Health Statistics 1968). The ARS initially had 365 participating agencies: 323 State and 4 Federal agencies with authority to administer or license facilities within the scope of the NMFI, and 38 national voluntary or commercial organizations which issue lists or directories of inpatient health facilities. All 50 States and the District of Columbia were represented among the State agencies; each State averaged five ARS agencies. These agencies periodically produced lists of new inpatient facilities as part of their regular duties. This new information was then forwarded to NCHS and used to add new facilities, i.e., births, to the NMFI. However, a number of ARS agencies did not identify new facilities but only provided a current list of facilities. Consequently, each new list provided to NCHS had to be matched against the most current NMFI on file to identify new facilities (U.S. National Center for Health Statistics 1968).

To update the information on each facility listed in the NMFI, a mail survey of the ARS-updated NMFI was undertaken. All surveys through 1973 were conducted by NCHS with the assistance of the Bureau of the Census (U.S. National Center for Health Statistics 1986). Survey results, data obtained through field follow-up to nonrespondents, and postmaster returns were used to update and edit the list of NMFI facilities by deleting those that were out of business or out of scope, adding new facilities reported by survey respondents, and modifying data on facility type and size.
This periodically updated data base was used as a sampling frame and as a source of national statistics on inpatient health facilities, particularly nursing and related-care homes.

Beginning with the 1976 survey, two distinct systems were used to update the NMFI. The first system was a continuation of methods used prior to 1976. The second system, the Cooperative Health Statistics System (CHSS), decentralized NMFI data collection from the Federal to the State level. The CHSS agencies, usually State licensing agencies, were responsible for identifying new facilities and collecting updated data on existing facilities. In 1976, 16 States within CHSS collected some or all of the NMFI data. Twenty-six States participated in 1978, and by 1980, 38 States collected NMFI data. By 1982, CHSS had ceased to be active, but arrangements were made with 36 States to obtain their data for the 1982 NMFI.

The most recent NMFI update occurred as part of the 1986 Inventory of Long-term Care Places (U.S. National Center for Health Statistics 1987). The method of updating was a modification of the system used prior to 1976. Letters were sent to over 200 State and national agencies asking them to send NCHS any and all listings that they maintained for nursing and related-care homes and facilities for the mentally retarded. Facilities not appearing on the 1982 NMFI or 1982 National Census of Residential Facilities (NCRF) (Hauber, et al. 1984) were added to form a more recent depiction of the population of interest. The 1982 NCRF was a census of residential facilities for the mentally retarded and a necessary supplement to the NMFI, since the 1982 NMFI excluded facilities for the mentally retarded, a group considered in scope for purposes of the 1986 Inventory of Long-term Care Places. A matching process was performed to remove duplicates from within and between the two files. If there were any doubts as to whether a place was a duplicate, it was retained on the Inventory of Long-term Care Places. This procedure tended to maximize coverage, but its inclusiveness resulted in duplication on the Inventory of Long-term Care Places. The Bureau of the Census then conducted a mail survey of facility administrators to obtain information on current business status, type of facility, population served, and size (i.e., numbers of beds, residents, and annual admissions). Field follow-up of nonrespondents was used to reduce the nonresponse bias. Survey results, data obtained through field follow-up, and postmaster returns were used to update the list of facilities and facility information. This updated data base was used to create the NMES sampling frame (Potter, et al. 1987).

III. Evaluations of the National Master Facility Inventory

The National Center for Health Statistics determined the magnitude of NMFI undercoverage using a Complement Survey (U.S. National Center for Health Statistics 1965), an application of the multiple-frame survey design technique discussed in section 1.3.3. Two frames, the NMFI and an area-sample list, were used. All institutions in the sampled areas were identified, their probabilities of selection were determined, and the institutions were then stratified by absence or presence on the NMFI. The stratum of facilities not on the NMFI was then used to make an unbiased estimate of NMFI undercoverage.
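A minimal sketch of this complement-survey estimator follows. The selection probabilities and match flags are hypothetical (chosen so the example reproduces a 5-percent rate), and the inverse-probability weighting is the natural reading of "probabilities of selection" above, not a documented NMFI program.

    # Sketch of the Complement Survey estimator: institutions found in the
    # area sample are stratified by presence on the NMFI, and the off-NMFI
    # stratum is weighted up by inverse selection probabilities.

    def ht_total(records):
        """Horvitz-Thompson estimate of a count: sum of 1/p over the sample."""
        return sum(1.0 / r["p_select"] for r in records)

    # Hypothetical area sample: 19 institutions matched to the NMFI, 1 not.
    area_sample = [{"p_select": 0.001, "on_nmfi": i < 19} for i in range(20)]

    missed = [r for r in area_sample if not r["on_nmfi"]]
    rate = ht_total(missed) / ht_total(area_sample)
    print(f"estimated gross place undercoverage: {rate:.1%}")   # 5.0%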
The first Complement Survey was conducted in 1962, utilizing the design of the Health Interview Survey. Nonmatched institutions from the area sample were surveyed to collect current information on type of business and period of operation. This process yielded estimates of 5 percent NMFI gross place undercoverage and 2 percent gross bed undercoverage. However, this method of estimating undercoverage was far from ideal because of its large sampling error and errors in field identification of institutions.

A second evaluation of the NMFI occurred in 1966 and involved reconstructing the originally assembled NMFI using the newly developed ARS (U.S. National Center for Health Statistics 1968). This evaluation pointed to a major undercoverage problem: homes for the aged that provided personal care in States that did not license this type of facility and had no regulatory authority over them. In 1966, these States were identified as Connecticut, Idaho, Kansas, Nebraska, South Carolina, and West Virginia. Undercoverage in West Virginia was estimated at 500 facilities. Undercoverage in Idaho and South Carolina was unknown, while undercoverage in all of Connecticut, Kansas, and Nebraska was estimated at less than 25 total facilities (U.S. National Center for Health Statistics 1968).

Additional Complement Surveys were conducted in conjunction with subsequent NMFI surveys (SAIC 1985, Research Triangle Institute 1981, Shimizu 1983). Shimizu reported that, on the basis of the 1982 Complement Survey, 94 percent of the eligible facilities and 98 percent of the eligible beds were included in the sampling frame for the 1985 Nursing Home Survey (Shimizu 1986).

The Cooperative Health Statistics System of updating the NMFI has also been reported as a source of NMFI undercoverage (U.S. National Center for Health Statistics 1986). Among CHSS States, there were differences in definition, scope, and timing of the NMFI surveys because of different State licensing laws. For example, the target population for the 1980 NMFI included such facilities as personal-care homes, homes for the aged, and rest homes, but, in 1980, not all CHSS States licensed these types of facilities, resulting in some undercoverage (U.S. National Center for Health Statistics 1983).

The most recent evaluation of the NMFI began in 1983 and was completed two years later (SAIC 1985). It included a State-by-State review of health facility regulations and site visits to State agencies responsible for licensing or regulating facilities. It did not attempt to determine the degree of NMFI undercoverage as the Complement Surveys did; however, the final report concluded with a recommendation to redesign the procedures by which the NMFI was maintained. It also recommended that NMFI facility inclusion criteria be broadly defined (SAIC 1985).

IV. The National Master Facility Inventory as a Sampling Frame: Experience from the 1987 National Medical Expenditure Survey

As previously noted, the scope of the NMFI was recently expanded and facility data were collected through the 1986 Inventory of Long-term Care Places. The Inventory of Long-term Care Places was created to serve as the sampling frame for the Institutional Population Component (IPC) of the 1987 NMES (U.S. National Center for Health Statistics 1987). The following describes some of the coverage issues associated with the use of the NMFI and the Inventory of Long-term Care Places as sampling frames, and the methods used to correct for the potential bias. The examples rely on the NMES experience.
The NMES IPC was established to provide an assessment of the utilization, costs, sources of payment, and health status of the U.S. population in nursing and related-care homes and in facilities for the mentally retarded. The period of assessment covered calendar year 1987, during which data were collected from a sample of residents in nursing and related-care homes and in facilities for the mentally retarded. The IPC utilized a stratified three-stage probability design, with facilities selected in the first two stages and persons sampled at the last stage (Cohen, et al. 1987, Potter, et al. 1987).

Record linkage and identification of duplicates. As noted previously, the process of updating the NMFI and the Inventory of Long-term Care Places depended on record linkage techniques to assemble the list of facilities. These techniques may have resulted in undercoverage when facilities were erroneously classified as duplicates, or overcoverage when facilities were not properly identified as duplicates. The adopted techniques minimized undercoverage at the cost of potential overcoverage (U.S. National Center for Health Statistics 1965, 1987). Five methods were used with respect to the NMES to correct for this source of potential bias. (A sketch of the computer matching used in the second method follows the list.)

- Incorporated into the instrument for the Inventory of Long-term Care Places mail survey were requests to facility administrators to return any duplicate questionnaires received under different names and/or addresses. Of the 56,720 facilities listed on the Inventory of Long-term Care Places mailing list, the Bureau of the Census classified 2,371 as duplicates on the basis of this respondent information (Potter, et al. 1987). It should be noted that the Bureau of the Census did not validate this information, and that duplicates may have been misreported by respondents and may serve as a potential source of bias.

- Results of the Inventory of Long-term Care Places mail survey were used to create the NMES sampling frame. Part of this work involved an evaluation of the Inventory of Long-term Care Places data to determine if the Inventory contained any duplicate listings. Computer-matching techniques and a visual review of the Inventory of Long-term Care Places data on facility name, address, type, and size resulted in the identification of an additional 1,570 duplicates (Potter, et al. 1987).

- After the NMES sample of 1,714 facilities was selected, a telephone screening operation was conducted to verify facility name and address prior to the first phase of field operations. A few additional duplicates were discovered when respondents indicated that they had already talked to somebody about the survey. Each case was reviewed to assure that sampled units were not erroneously classified as duplicates.

- The results of the screening revealed some remaining potential for duplication. A thorough review of the sample was then conducted to identify any potential duplicates. The potential duplicates, known as the problem pairs or duplicates, were flagged for special handling in the field. Each problem pair was assigned to the same field representative along with specific hand-written instructions on obtaining the information necessary to resolve the problem. This information was then telephoned to a survey statistician at the home office, who determined whether there was a duplicate or two operating units at the same location.

- To assure that the above procedures resulted in the correct identification of duplicates associated with the NMES sample, an additional field procedure was developed and put into operation during the second round of NMES IPC fieldwork. An instrument known as the Duplication Worksheet was developed to identify and classify, for all NMES sampled units, all potential duplicates listed on the Inventory of Long-term Care Places list. A Duplication Worksheet listing all potential duplicate units previously classified as duplicates was created for each sampled unit. Field representatives asked facility administrators if the listed place was the same place as the sampled unit, a previous name and/or address of the sampled unit, a place affiliated with the sampled unit through administrative operating procedures or ownership, or some other place. Field representatives also inquired about any other names and addresses used by the sampled unit. The results from the Duplication Worksheet were used to adjust control totals during facility-level post-stratification adjustments in the NMES IPC.
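The following minimal sketch shows duplicate detection of the general kind used in the second method, comparing facilities on name, address, type, and size. The similarity measure, thresholds, and records are illustrative assumptions, not the actual NMES matching specification.

    # Sketch of duplicate detection on facility name, address, type, and
    # size.  Thresholds and the similarity measure are illustrative.
    from difflib import SequenceMatcher

    def similar(a, b):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def likely_duplicate(f1, f2, name_thresh=0.85, addr_thresh=0.85):
        return (similar(f1["name"], f2["name"]) >= name_thresh
                and similar(f1["address"], f2["address"]) >= addr_thresh
                and f1["type"] == f2["type"]
                and abs(f1["beds"] - f2["beds"]) <= 5)

    a = {"name": "Oakview Nursing Home", "type": "nursing", "beds": 60,
         "address": "12 Elm St, Springfield"}
    b = {"name": "Oak View Nursing Home", "type": "nursing", "beds": 62,
         "address": "12 Elm Street, Springfield"}
    print(likely_duplicate(a, b))   # True: flag the pair for review

Pairs flagged this way would go to the visual review described above rather than being deleted automatically.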
Listing errors. The design of the NMES IPC called for the random sampling of current residents and admissions within sampled and cooperating facilities. Field representatives were responsible for listing and sampling persons by exacting procedures. These processes were subject to error, and the following methods were used to reduce the error.

- Extensive training of the field representatives and supervisory staff was conducted.

- A programmable calculator was used to select the random sample of current residents and admissions. The programmable calculator was much easier for the field representatives to use than older methods of sample selection and, therefore, reduced the potential for error. (A sketch of this kind of within-facility selection follows this list.)

- Built into the programmable calculator was a review function that required field representatives to review all data entered into the calculator, thereby reducing the chance of calculator input error.

- Field representatives were instructed to call the Washington office if problems were encountered with listing or sampling. This procedure ensured that listing and sampling problems would be handled in a uniform manner at the national level rather than being subject to local variations in resolution methodology.

- Upon completion of sampling, all listing forms (except those maintained by the facility for confidentiality reasons) were forwarded to the Washington office for a 100-percent verification check. All errors in listing and sampling were reviewed by a survey statistician for corrective action.

- Ten percent of all listing, sampling, and interviewing was validated in the field by a quality control supervisor.
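The calculator's actual selection program is not documented here, so the following is only a minimal sketch of one standard way to draw such a sample: systematic selection with a random start from the facility's listing of current residents.

    # Sketch of within-facility selection: a systematic random sample of
    # line numbers from a facility's listing.  The interval arithmetic is
    # illustrative; it is not the actual calculator algorithm.
    import random

    def systematic_sample(n_listed, n_wanted, rng):
        interval = n_listed / n_wanted
        start = rng.uniform(0, interval)
        return [int(start + k * interval) + 1 for k in range(n_wanted)]

    rng = random.Random(1987)   # fixed seed so the example is reproducible
    print(systematic_sample(n_listed=120, n_wanted=8, rng=rng))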
Temporal errors. The NMES IPC was designed to provide national estimates of health care utilization and expenditures for the calendar year 1987. During that year, new facilities opened, sampled facilities may have closed, sampled persons may have died, or sampled persons may have transferred out of sampled facilities. The longitudinal nature of the NMES, coupled with the length of time between the creation of the sampling frame and data collection, had implications for survey coverage. The following methods were used to control coverage.

- In the instructions for the Inventory of Long-term Care Places mail survey, facility administrators of home offices were requested to provide the Bureau of the Census with a list of names and addresses of all facilities administered by that home office. Facilities identified by this method were designated as potential births. A second list of potential births was created when Inventory of Long-term Care Places respondents returned altered copies of a single questionnaire with notations that the altered copies were for other facilities under the same administration. The two lists were compared to the original Inventory of Long-term Care Places, and those facilities not appearing were added to the frame.

- Some of the NMES sample facilities were identified during the first field contact as home offices or administrative units. (Field representatives were trained on how to identify these administrative units.) A sampling procedure was developed whereby field representatives listed all units, by size and type, administered by the home office. This information was reviewed by a survey statistician and compared to the Inventory of Long-term Care Places. NMES-eligible units not on the Inventory of Long-term Care Places sampling frame were identified and combined to form "super units" for the final-stage sampling process and were added to the frame.

- Closed facilities (facility deaths) were identified by postmaster returns during the Inventory of Long-term Care Places mail survey, or by field representatives calling State or local licensing agencies or local associations of health-care providers. All such NMES facilities were verified by field supervisors and classified as out of scope for the remainder of the survey.

- Sampled persons who died during the course of the survey were classified as in scope until the date of death.

- Sampled persons residing in sampled facilities that subsequently closed during 1987 were not classified as out of scope, but were followed throughout the year using the NMES IPC Survey of Next-of-Kin.

- Sampled persons who transferred out of a sampled facility to other in-scope facilities were followed to the new facility, while sampled persons who transferred back into the community were followed using the Survey of Next-of-Kin.

APPENDIX A.4. PRODUCER PRICE INDEX (PPI)

I. Introduction

In 1976, the Bureau of Labor Statistics (BLS) commenced work on a major revision of the Wholesale Price Index (WPI). The WPI was a commodity-based index consisting of a judgmentally selected sample of major producers. Due to the manner in which the companies and specific products were chosen, the actual coverage of the WPI fell short of its targeted population. One of the major goals of the revised index, now referred to as the Producer Price Index, was to achieve more complete coverage of the target population.

The Producer Price Index measures changes over time in the selling prices received by goods producers from whoever makes the first purchase. Data collection procedures include a one-time personal visit to elicit cooperation and identify those products which are to be priced, and subsequent monthly mailings to collect current prices for these same products. The many indexes produced each month are organized by Standard Industrial Classification (SIC). (See U.S. Bureau of Labor Statistics (1988) for a complete description of the Producer Price Index survey design.)
For a given industry (SIC), the target population is the complete collection of transactions made by establishments classified in the SIC. An establishment is classified in the SIC in which it generates the most revenue. Sampling is done in two stages. The first stage consists of a sample of business establishments. The second stage consists of a sample of goods or services that are sold by the selected business establishments. In this case study, only the survey coverage problems which occur in the first-stage sample will be discussed.

The Producer Price Index uses a frame which is partitioned by 4-digit SIC. The Unemployment Insurance (UI) file is the primary data source for constructing this frame. Data from secondary sources, such as trade journal publications, previous sample data, telephone calls to the businesses themselves, and interviews with industry authorities, are used to augment the UI data.

II. Evaluation and selection of primary data source

In the early development of the design of the Producer Price Index, various data sources were evaluated and compared in an effort to find the best primary source file for this survey. These potential frames were evaluated using two criteria: completeness and accuracy of auxiliary information (see section 1.2.1). The completeness criterion required that the frame contain all establishments currently engaged in the wholesale selling of goods and services produced at or by the establishment, regardless of the size of the establishment. The accuracy-of-auxiliary-information criterion was based upon both the correctness of the SIC code and the adequacy of a measure of size.

The Producer Price Index publishes indexes at the industry level corresponding to the 4-digit SIC code. In order to serve as a primary source frame, establishment data must include the primary type of business. Ideally, this classification of type would be the SIC code. At a minimum, sufficient data must be present to classify each establishment by SIC code. The most desirable measure of size for the Producer Price Index is the most recent annual value of shipments and receipts for each frame unit. Short of this, reasonably accurate estimates are required. In a departure from usual establishment sampling practice, the Producer Price Index samples groups of establishments operating as a single economic unit, where the establishments are each classified in the same SIC and the prices they charge their customers are set centrally.

To satisfy the criteria given above, a single snapshot of the population frame at a given time is less useful than a source frame that is periodically updated or replaced, since establishments can grow or shrink in size, change SIC, or go out of business, or new establishments can start up after the source frame's reference period.

Most data sources were quickly ruled out because of their lack of completeness. Due to the definition of the scope and type of sampling unit in the Producer Price Index, two source candidates were considered: The Dun and Bradstreet (D & B) file and the UI file.

Comparison of the Dun and Bradstreet file and the Unemployment Insurance file. During a pilot study of four SIC's, the D & B files were used to select samples in two SIC's, while the UI file was used to select samples in two other SIC's. Subsequent analysis showed the UI file to be superior to the D & B file as a sampling frame for the following reasons.
- The UI file was more complete than the D & B file. The coverage of the D & B file varied by SIC, since the file was composed primarily of companies for which a credit check had been made. Although the completeness of the UI file varied from State to State due to differences in filing requirements, most companies were required to file a quarterly report with the State or States in which they operated. However, railroad workers, who were covered by the Railroad Unemployment Insurance Act, and family-owned and operated businesses with no outside employees were exempted from this requirement.

- Classifications of businesses by SIC code were found to be more accurate in the UI file than in the D & B file.

- The D & B file was updated on an individual company basis, where the frequency of the updates depended on the size and importance of the company and the number of credit checks D & B was asked to provide its users. A new UI file was received by BLS on a yearly basis and included updated establishment employment values.

- Because the UI file address was a mailing address used for tax purposes, it did not necessarily correspond to the physical location of the establishment. The D & B file address could also be incorrect, sometimes identifying the company headquarters or a mailing address. The State and county of the establishments were more accurate in the UI file, due to the reporting requirements of each State.

- The UI file was found to be a superior establishment file, while the D & B file was superior as a company file. The D & B file contained data on companies' organizational characteristics, such as central headquarters locations, divisional headquarters locations, and estimated employment and revenue for each company division.

- At the establishment level, the D & B file employment data were not as accurate as the UI file's. These data were not critical to most of the users of the D & B file; therefore, rounding errors, rough estimations, and duplication of reported employment within a company were more likely to occur.

As a result of this comparison, the UI file was selected as the primary data source for the revision of the Producer Price Index. The D & B file was retained for several years, however, for use as a secondary source, particularly during the frame refinement process.

UI file coverage. Due to State UI laws, almost all establishments are required to file a quarterly report with the State in which the establishment is located. Although the filing requirements vary across States, 31 States have adopted the same requirements as the Federal Government: A quarterly payroll of at least $1,500 and at least one paid employee during the preceding 20-week period. Establishments are added to the UI file in the following ways:

- A new establishment applies for a UI number,

- An unemployed worker files a claim and there is no record of his employer on the UI file,

- Field auditors find establishments not currently on the UI file, or

- The Internal Revenue Service furnishes the State with lists of new establishments that have filed for an EIN.

Conversely, establishments can be deleted from the UI file in the following ways:

- If an employer fails to file a quarterly report, an investigation may be made to determine if the company is still in business. If not, the establishment record is deleted from the UI file.

- An establishment notifies the UI office that it is out of business in order to avoid being billed in the future.
- An establishment is in business but has zero employees for four consecutive quarters.

Duplication in the UI file. There are several ways in which the same establishment can be included in the UI file more than once. The State office may simply enter the data more than once, or, occasionally, a company filing a supplemental report to correct an error in the original report will be entered as an original report. A company could also have been absorbed by another company, with each company filing a report during the same quarter.

III. Producer Price Index establishment universe maintenance

Prior to the development of the Producer Price Index Establishment Universe Maintenance System, the UI file was replaced every year with a newer version. Frames for selected SIC's which were refined using the previous year's file were not matched to the newer UI file; thus, most of the refinements to the frame were lost. This caused numerous problems, most notably in the area of frame unit misclassification. Due to the lack of longitudinal tracking over the successive UI files, units could be included in more than one SIC's sampling frame if their SIC changed from one UI filing year to the next. Similarly, other establishments could miss being sampled because of this undetected movement from one SIC to another. Other refinements to the frame, such as corrected addresses and establishment names, were lost whenever a new UI file replaced the old as the primary source file.

The Producer Price Index Establishment Universe Maintenance System has been developed to address the problems described in sections 1.2.2 and 2.1.3 of this report through the longitudinal tracking of individual frame units. With this system, a new UI file is received and edited by BLS each year. Records on the new file are matched to the existing frame records and, in the event of a successful match, the existing frame record is updated with whatever incoming data are deemed to be more current or correct. The initial matching is done by computer, comparing the State, county, and UI number of each record. This matching is done across two files, the active or universe file and the death file. The death file contains a record of establishments discovered to be either out of business or classified in an industry which is outside the scope of the Producer Price Index. Incoming UI records which match records in the death file are maintained in the death file. Incoming UI records which match records in the active file are used to update these existing frame records. After all possible computer matches have been made, the remaining unmatched units in both the incoming UI file and the universe file are compared on a manual basis. Once all possible manual matches have been made, the universe frame records are updated with any information that is more current. Existing frame records which are not in the PPI sample and are unmatched are moved to the death file, since these units are assumed to have gone out of business or to be represented by some unmatched incoming births. The remaining unmatched incoming UI records are considered births to the population and are added to the frame. These birth establishments are assigned a special code to distinguish them from the other frame members. During the yearly matching process, these birth records are put through the manual matching process, in an effort to match them to the newer unmatched, incoming UI file members.
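The computer-matching step just described reduces to a keyed match against the two files. The following minimal sketch shows that logic; the field names and the update rule are illustrative simplifications of the system described above.

    # Sketch of the yearly UI capture: match incoming records to the death
    # and active files on State, county, and UI number; refresh matches;
    # treat unmatched incoming records as candidate births.

    def capture(incoming, active, deaths):
        active_idx = {(r["state"], r["county"], r["ui_no"]): r for r in active}
        death_idx = {(r["state"], r["county"], r["ui_no"]): r for r in deaths}
        births = []
        for rec in incoming:
            key = (rec["state"], rec["county"], rec["ui_no"])
            if key in death_idx:
                death_idx[key].update(rec)    # matched a death: stays on death file
            elif key in active_idx:
                active_idx[key].update(rec)   # refresh SIC, employment, address
            else:
                rec["birth_flag"] = True      # candidate birth; manual match next
                births.append(rec)
        return births

    incoming = [{"state": "MD", "county": "005", "ui_no": "123456",
                 "sic": "2834", "employment": 310}]
    print(capture(incoming, active=[], deaths=[]))   # one flagged birth

Unmatched universe records not in sample would then move to the death file, as described above; that step is omitted from the sketch.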
Effectiveness of the current Producer Price Index Universe Maintenance System. The current Producer Price Index sampling frame contains approximately 620,000 records. In the most recent capture process applied to the incoming UI file, roughly 560,000 records were matched automatically. The manual matching process resulted in approximately 1,500 additional matches. The remaining unmatched units were added to the existing universe data base. Research suggests that not all of these new frame units are actual births to the frame. Some of these incoming UI file records cannot be successfully matched to an existing frame record due to some combination of name, SIC code, ownership, or location changes.

For most frame members, the tracking process consists solely of the yearly capture of new UI file data. In these cases, the new UI file data are used to update the SIC code, employment, and the company name and address. For other frame records, data are obtained from an additional source, such as a telephone interview during frame refinement or a personal visit during the initiation process. Occasionally, these sources will yield conflicting establishment data, making it necessary to choose among them. In these cases, the information obtained from a personal interview is considered more reliable than data obtained via a telephone interview, while data recently obtained over the telephone are generally considered more accurate than the UI file data. Given the need to resample an SIC approximately every 6 years, it is important to retain the information that comes from sources other than the UI file, so long as these data are believed to be more accurate.

One way to judge the relative accuracy of incoming data is through the reference period variable. The reference period is a variable identifying the month and year in which data were obtained for that frame record. Incoming establishment information can only be used to update the present universe data base if its reference period is more recent than that of the existing file. By the time the incoming UI file is captured and used to update the frame, the data are approximately 18 months old. As a result of this time lag, any recent changes to an industry must be properly reconciled in the universe data base during the frame refinement process. At some point, the UI file may catch up with these changes. It then becomes necessary to reconcile the newer UI file data with the data already in the frame file. At present, there is no way of doing this as part of the automated capture and update process.
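In code, the reference-period rule is a single comparison; the sketch below (field names and dates invented) shows the gatekeeping it performs.

    # Sketch of the reference-period rule: incoming data replace frame data
    # only when they were obtained more recently.
    from datetime import date

    def maybe_update(frame_rec, incoming_rec):
        if incoming_rec["ref_period"] > frame_rec["ref_period"]:
            frame_rec.update(incoming_rec)
        return frame_rec

    frame_rec = {"name": "Acme Forge", "ref_period": date(1989, 6, 1)}   # phone refinement
    ui_rec = {"name": "Acme Forge Inc", "ref_period": date(1988, 3, 1)}  # older UI data
    print(maybe_update(frame_rec, ui_rec))   # unchanged: the UI data are older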
Refinement of a single Standard Industrial Classification sampling frame. In preparation for sampling a given SIC, primary and secondary sources are used to carry out a variety of refinement operations. The goal of this frame refinement process is to create a final sampling frame consisting of all economic units producing in the SIC. Primary sources are telephone or written contacts with companies in the SIC. An industry analyst telephones the largest companies in the SIC as a standard part of the refinement process. Information from secondary sources, such as data collected during the previous cycle, industry company lists, and trade journals, is generally verified by a telephone call to the company.

Frame refinement operations include adding and deleting units, transferring units to or from another SIC's frame, splitting frame units into separate establishment records, and combining establishments. Whenever an industry analyst proposes adding a unit to an SIC's sampling frame, a search is made for the unit in other sampling frames, as well as in the particular SIC frame, to see if it can be located as a separate unit or should be split off from an existing record. Refinement of name and address information and other attribute data not connected with the sampling process is postponed until after sample selection, and then performed only on selected units during sample refinement.

A frame unit can appear in more than one SIC frame in a given cycle. This is permitted when there is disagreement as to the most appropriate SIC classification, there has been a recent change in production, or the previous SIC classification is believed to be incorrect. If a sampled unit is out of scope for the SIC for which it was sampled, it will be dropped from that sample.

There are several types of automated report listings that provide assistance at key points of the frame refinement process. A one-time listing produced prior to the beginning of refinement for a given SIC lists the outcomes of all previous-cycle sampled units. Transaction trail lists can be obtained at any stage; these show in chronological order all the operations (adds, deletes, transfers, moves, changes) made on the frame since the beginning of refinement. Frame report listings show the units available for sample selection. Frame exclusion listings show records that are not available for sample selection because they were already selected in another SIC.

IV. Identifying the reporting unit

Although the sampling frame for a given SIC is refined prior to sample selection, it may contain units which have been improperly included, i.e., at least a portion of the unit did not belong to the specified SIC or constituted a separate unit on its own. For such a sampled unit, there are detailed collection procedures for associating the sampling unit with a reporting unit, so as to minimize the loss of coverage. When a sampled unit is fielded for collection, the field representative locates the unit and verifies county, name, address, employment, and production, noting reasons for discrepancies. For sampled units which are clusters of establishments, this information is verified for each establishment in the cluster. A sampled establishment is considered to be correctly identified if an establishment is located which has three of the four defining characteristics of the sampled establishment. The defining characteristics in the Producer Price Index are location, ownership, production, and clientele. If two or more of the defining characteristics do not match, a check is made to see if the sampled establishment is located elsewhere. If it is not found elsewhere, then the establishment represented by the original frame record is considered to be out of business.
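The three-of-four rule lends itself to a direct statement in code. The sketch below is a literal rendering with invented establishment records; the four characteristics are those named above.

    # Sketch of the three-of-four identification rule for sampled units.

    CHARACTERISTICS = ("location", "ownership", "production", "clientele")

    def correctly_identified(sampled, located):
        matches = sum(sampled[c] == located[c] for c in CHARACTERISTICS)
        return matches >= 3

    sampled = {"location": "Toledo OH", "ownership": "Acme Corp",
               "production": "metal stampings", "clientele": "auto makers"}
    located = dict(sampled, ownership="Beta Holdings")   # sold; otherwise unchanged
    print(correctly_identified(sampled, located))        # True: 3 of 4 match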
Although an establishment in a given cluster may be correctly located, it still may not belong in the cluster because it is out of business, out of scope, misclassified, sold to a different company, or part of another records center of the same company. Whenever a portion of a cluster is determined to be sold, misclassified into one of a set of predetermined related SIC's, or part of another records center, the field representative forms a separate sampling unit for that portion, known as a field-created sampling unit. When such a unit is formed, the original sampling unit is a cluster but does not constitute, as was supposed when it was sampled, a single economic unit operating in a single SIC. The new unit is retained in sample, provided it is in the sampled SIC or an SIC which is closely related to the sampled SIC. The purpose of forming field-created sampling units is to ensure that data are collected for all establishments that are legitimately part of the original sampled unit. Another method to minimize coverage loss is the treatment given data for sampled units, or portions of sampled units, that are misclassified into nonrelated SIC's. These sampling units are given another chance of selection in their proper SIC. The results of this process determine whether data collection should proceed for the unit.

V. Collection feedback

As industry samples are selected, fielded, and collected, the collected data are used to maintain the Producer Price Index Establishment Universe Frame. Much of this collection feedback process has been automated, so that many updates to the frame occur without being manually entered. Collection feedback forms, filled out by Washington, DC staff during review of collected data, are used to facilitate the same types of operations to the frame that take place in frame and sample refinement. These forms are used primarily to collect data about establishment clusters and to make changes on the basis of the data. As part of the collection feedback process, the name and address information for sampled units is used to update the frame. Also, units for which the collected and assigned SIC's differ are transferred into the appropriate SIC in the frame, and out-of-scope and out-of-business units are moved to the death file. Sampled units that are called out of business with respect to one SIC because of significant changes to defining characteristics are added again to the active universe file as births in the appropriate SIC. Whenever a field-created sampling unit is formed, the frame is updated accordingly.

VI. Summary

Efforts to improve the accuracy and completeness of the Producer Price Index frame include:

- The continuous maintenance of a frame file,

- The yearly update of the Producer Price Index Establishment Universe file with data from the Unemployment Insurance file,

- The individual refinement of a single SIC's sampling frame, using telephone interviews, trade journal publications, and other secondary data sources,

- The capture and use of data obtained through a personal interview with individual sampled units, and

- Updating during the repricing process.

In addition, certain definitional and procedural changes help to eliminate coverage losses that occur whenever sampling units are incorrectly formed or some of their identifying characteristics (e.g., ownership, production, location, and organizational structure) have changed.

Two new methodologies are being established which will improve the quality of business identification and increase the detail of information on frame units. The first involves the creation of the BLS Universe Data Base (UDB) system, which is patterned after the Producer Price Index Establishment Universe Maintenance System. Currently, BLS maintains a computer system which combines the UI address files to create a national data base to be used by BLS surveys as a sampling frame. In early 1990, the UDB replaced this system. Improvements in the new data base will increase sampling applications and record linkage capabilities.
Beginning with the operation of the UDB, a number of new data elements for reporting units will be stored. These elements will enable the system to trace both transfers of ownership and changes in configuration of reporting units, and to link records more comprehensively during file updates. State files will be provided quarterly instead of annually, and the production cycle will be shortened to allow frame users access to more current UI data.

The second major development is the creation of a new BLS Federal/State statistical enhancement project, the Business Establishment List (BEL). Since 1989, the BEL program has been working to transform UI data into an establishment-based micro-data file with work site identification information and physical location addresses. To accomplish this end, the BEL program has redefined many of the requirements for State collection and reporting of UI data to BLS. Employment and wage data from multiestablishment employers will be reported at the work site level rather than aggregated as one unit. Subunits of multiunit employers will be identified with unique 3-digit reporting unit numbers. Information on these subunits will include primary and secondary names, a work site description, and a physical location address. The increases in business identification information and level of detail on UI files should reduce frame refinement work.

APPENDIX A.5. QUARTERLY AGRICULTURAL SURVEYS (QAS)

I. Introduction

The Quarterly Agricultural Surveys, conducted by the National Agricultural Statistics Service (NASS), provide inventory and production estimates for crops and livestock at State and national levels. The Quarterly Agricultural Survey utilizes two frames: A list frame for sampling efficiency and an area frame for coverage completeness. The sampling unit for the list frame is a name. The sampling unit for the area frame is a parcel of land (segment). The reporting unit in both cases is all land operated by one or more persons under a single land-operating arrangement. Each calendar quarter, a list sample of farm operators (75,000) is contacted by mail, telephone, or personal visit for inventory information on the land they operate. Sampled segments (16,000) selected from the area frame are also screened for farm operators (55,000). The multiple-frame estimator utilized by NASS requires the matching of names between the two frames to identify those in the area frame who had no chance of selection from the list. These are referred to as nonoverlap operators. The nonoverlap domain estimate compensates for the incompleteness of the list, thereby completing coverage of the target agricultural population.

II. List frame construction and maintenance

The purpose of the list frame for the Quarterly Agricultural Survey is to improve sampling efficiency. Names, addresses, phone numbers, and measures of size for farm operators permit stratification for more efficient sample selection and allow the use of less expensive survey methods for more efficient data collection. This list is not expected to be complete. Farming operations go in and out of business too quickly to expect to have a complete list. However, considerable gains in efficiency can be expected from utilizing a list frame containing a significant proportion of the larger operations (Vogel and Bosecker 1974).

Incompleteness of the list is not a coverage problem when the list is backed up by a complete area frame.
List incompleteness contributes to lower sampling efficiency in a two-frame estimator, but use of the area frame removes coverage bias due to omission of population units from the list. Duplication in the list also lessens sampling efficiency, but one may appropriately compensate for coverage bias through the detection of duplicates in the sample (Gurney and Gonzalez 1972).

The list frame for the Quarterly Agricultural Survey was created using numerous input sources of farm operators. Many of the same operators appeared on several source lists. Therefore, two to three times as many names were brought together during creation of the list as eventually made up the list frame. Record linkage procedures based on work by Fellegi and Sunter (1969) and described by Coulter and Mergerson (1978) were used to standardize names and remove duplication in the construction of the final composite list.

Master lists were built for several States in 1979, and all States were using the list frame system by 1982. After the initial creation phase, a continuous maintenance program has been in place to keep the frame current. New operations are added, those no longer operating are deleted, and the data associated with each active operation are updated as new information becomes available.

Six subsystems are utilized to facilitate management and utilization of the list sampling frame. The Source List Editor Subsystem standardizes input records into a common format, reduces matched records to a single record, identifies all components within the name and address of each record, and codes all names as an individual, partnership, or corporation. A Record Linkage Subsystem employs different linkage procedures for each class of names (individual, partnership, or corporate) to group potential duplicates by class into definite links, possible links, and non-links. The Group Resolution Subsystem codes a record to represent each linkage group, matches across the linkage groups and classes, and produces the computer-identified possible matches for visual inspection. Duplication is kept to a minimum by removal of the computer-determined definite links and the identified duplicates among match groups.

The fourth list sampling frame subsystem is Data Select. This program determines the "best" data from among several input sources to attach to the list record. This might be the largest value, the most current value, or the value coming from the best source, depending upon guidelines specified for each of the variables of interest. The Sample Select Subsystem then stratifies the list frame using control variables for many different surveys and selects multiple samples simultaneously.

Finally, the Mail and Maintenance Subsystem is a frame and sample management system used to create mailing labels and/or listings, amend the list frame with transactions from surveys, create special comments or changes for specific surveys, provide a history and track all changes after sample selection, and combine or organize survey samples for special needs, such as elimination of multiple contacts with the same unit for different surveys in the same time period.

Utilization of all components of the list frame system provides the means to maximize list coverage for the agricultural variables of interest and minimize duplication within the list. Remaining undercoverage in the list is compensated for through the area frame sample, and remaining list duplication is adjusted for using information received for the sample.
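The screened multiple-frame estimator that ties the two frames together can be stated compactly. The sketch below uses hypothetical weights and inventory values; its point is that the area sample contributes only its nonoverlap domain, so the two components add without duplication.

    # Sketch of the screened multiple-frame estimator: list-frame total plus
    # the area-frame nonoverlap (not-on-list) total.  Values are hypothetical.

    def weighted_total(sample):
        return sum(w * y for w, y in sample)

    list_sample = [(120.0, 400), (35.0, 2200)]              # (weight, hogs)
    area_sample = [(300.0, 60, False), (300.0, 150, True)]  # last flag: on list?

    nonoverlap = [(w, y) for w, y, on_list in area_sample if not on_list]
    total = weighted_total(list_sample) + weighted_total(nonoverlap)
    print(f"multiple-frame estimate: {total:,.0f} hogs")    # 143,000 hogs

The operation found in the area sample that is also on the list contributes nothing here, because its chance of selection is already carried by the list frame.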
The most serious problem that could befall the list frame in the Quarterly Agricultural Survey multiple-frame context would be for names from the area frame sample to somehow be added to the list. This would compromise the ability to estimate from the area frame the proportion of the population of farm operators who are not on the list. The necessity for independence between the list and area frames is emphasized in all National Agricultural Statistics Service training manuals and classes. A thorough discussion of the potential for list contamination and its consequences is given in Vogel and Rockwell (1977).

III. Area frame construction and maintenance

The primary purpose of the area frame is to provide a probability mechanism for estimating the entire population of crops and livestock in the United States. Since all crop acreage and all livestock are physically located on land, complete representation is assured if the total land area is divided into sampling units. A description of how this is accomplished through land use stratification by State and county throughout the United States is given by Cotter and Nealon (1987).

In the Quarterly Agricultural Survey, the area sample supplements the list sample, accounting for list incompleteness, to provide full coverage of the agricultural population of interest. Typically, the area frame nonoverlap domain covers 10 to 20 percent of the total crop and livestock inventories.

Since complete coverage is a primary function of the area frame, there are a number of control practices in place to ensure all land is represented without duplication or omission. First, a premium is placed on the use of good, identifiable, permanent boundaries which can be marked on maps and photographs and recognized by field representatives at the site. Land use stratification boundaries and sampling unit boundaries are drawn to provide a clear demarcation, even at the expense of some sampling efficiency if necessary.

Second, the areas defined by strata and clusters of sampling units are electronically digitized so the total for the frame can be computed and compared to the known land area for the county and State. The accumulated State area is allowed to vary ±0.5 percent from the published area. Both the original frame materials containing the boundaries and a graphic representation of the digitized boundaries are reviewed for completeness.

Finally, the sampled segments are also digitized to determine land area. Field representatives accumulate reported acres in each segment and compare the reported total against the digitized total. This control ensures complete coverage of each sampled segment.
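The ±0.5-percent control is a simple tolerance check; the sketch below shows it with hypothetical acreages.

    # Sketch of the frame-area control: digitized stratum areas must sum to
    # within 0.5 percent of the published land area for the State.

    def within_tolerance(digitized_total, published_area, tol=0.005):
        return abs(digitized_total - published_area) / published_area <= tol

    digitized_strata = [5_210_000, 12_480_000, 3_150_000]   # acres, by stratum
    print(within_tolerance(sum(digitized_strata), published_area=20_900_000))  # True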
IV. Rules of association

A given area of land may be represented in the Quarterly Agricultural Survey in several ways through the list frame, as well as appearing in the area frame. The operation may have a name unto itself as well as having the name(s) of one or more operators associated with the land. Any of several partners may be sampled to provide the information requested for the same parcel of land.

To control this potential for duplication, there are several rules of association set forth in field representative instructions and in supervising and editing manuals. A list-dominant rule provides for the list frame to account for any land which may be reached through the list frame; that is, an area of land may belong to the area nonoverlap domain only if none of the names associated with the land is represented on the list.

Within the list frame, potential for duplication is controlled through priority rules governing which names associated with a given parcel of land will be considered the dominant sampling unit. All data for an operation will be associated with the list name assigned greatest priority. An operation name, if any, is given top priority because the name tends to be attached to land operated under that title for a longer period of time than the names of individual operators. This is particularly true in the case of managed land, where the operation must have its own name appear on the list to be considered overlap. The name of the hired manager is not used to determine the overlap status of the operation.

In the absence of an operation name, or if the operation name is not on the list, the land area may be represented through the list frame by a combination of the names of the individuals who make up a partnership (second priority) or, finally, by the name of any individual actively considered an operator, alone or in partnership, if that individual participates in making the day-to-day decisions affecting the farming of the land.

Partnership operations present a particularly difficult situation. It must first be determined that a true partnership in operating the land exists, i.e., that more than one person jointly operates the land. Since each partner can report for the operation, a rule is needed to account for the data only once. To do this, the Quarterly Agricultural Survey utilizes a highest-stratum rule. If more than one partner is on the list, the data will be accepted only from the partner in the list frame stratum with the highest number. If more than one partner belongs to the same highest stratum, the data will be divided equally between them. The procedure minimizes the division of data among sampling units and attaches the data to the operators having the largest stratifying control data and highest sampling rate.

By far the largest portion of the list frame consists of names of individuals who operate their own farms. However, an individual may be involved in more than one operating arrangement. According to the established rules of association, individuals should report for each of the different land operations in which they are an active operator. For example, if a person has an individual operation and is a partner in another operation, that individual should provide a report for each operation. Each operation will then be considered separately according to the priority rules governing its representation on the list frame.
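The highest-stratum rule for partnerships can be written out directly. The sketch below uses invented partners and stratum numbers but follows the rule as stated: accept data only from the partner in the highest-numbered stratum, splitting equally on ties.

    # Sketch of the highest-stratum rule for partnership operations.

    def reporting_shares(partners):
        """partners: (name, list stratum) pairs; returns each name's data share."""
        top = max(stratum for _, stratum in partners)
        keepers = [name for name, stratum in partners if stratum == top]
        return {name: 1.0 / len(keepers) for name in keepers}

    print(reporting_shares([("J. Smith", 3), ("K. Jones", 5)]))
    # {'K. Jones': 1.0}
    print(reporting_shares([("J. Smith", 5), ("K. Jones", 5)]))
    # {'J. Smith': 0.5, 'K. Jones': 0.5}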
By far the largest portion of the list frame consists of names of individuals who operate their own farms. However, an individual may be involved in more than one operating arrangement. According to the established rules of association, individuals should report for each of the different land operations in which they are an active operator. For example, if a person has an individual operation and is also a partner in another operation, that individual should provide a report for each operation. Each operation will then be considered separately according to the priority rules governing its representation on the list frame.

V. Error avoidance

Quality control measures during construction of the frames and proper rules of association are only the first steps in ensuring proper coverage. The rules governing the representation of each population unit must be observed during data collection. The Quarterly Agricultural Survey makes use of written instructions, formal training, active supervision, questionnaire prompting, performance evaluations, and reinterview samples to aid and monitor field representative activities. Completed questionnaires are reviewed by office personnel and submitted to computer edit and analysis, both within and between questionnaires.

Problem areas requiring a great deal of attention to minimize coverage errors in the Quarterly Agricultural Survey include:

- Obtaining all names actively associated with the sampling unit,
- Determination of the nonoverlap domain,
- Obtaining an accurate report of the total acres being operated,
- Reporting all data, regardless of ownership, on land operated, and
- Nonresponse.

Beller (1979) documented these areas of concern in "Error Profile for Multiple Frame Surveys."

If field representatives do not obtain all the names appropriate for the sampling unit, the rules of association described earlier cannot be applied properly. Errors could lead to either omission or duplication, depending upon the frame from which the unit was sampled and the status of the missing names on the list frame. Emphasis is given during field representative training, in the instruction manuals, and on the questionnaires to the importance of providing the operation name, if any, and the name(s) of all operators.

If the operation is found in an area frame sampled segment, all of the names are checked against the list. When any of the names is found, the list frame identification number is attached to the corresponding name, and the operation belongs to the overlap domain. If the operation was sampled through the list frame, all other associated names are obtained and checked against the list.

Even if all names are available, it is not always an easy task to determine whether a name from the area frame is the same as one on the list. More than one individual may have the same name near the same location. In these cases, middle initials, telephone numbers, Social Security numbers, and other identifiers help determine true matches. Spellings may differ slightly, or nicknames may have been used. Great care is taken to investigate possible matches. Even after the pairings have been made and the list identification number has been attached to the name from the area sample, another verification on a computer listing of matched names is required. Investigations into the operational application of these rules are reported by Bosecker and Kelly (1975), Hill and Rockwell (1977), and Nealon (1984).

An accurate report of the total acres operated is important to the area frame estimates of list incompleteness, i.e., the nonoverlap domain component. The importance of reported total land stems from the use of a proportional or weighted allocation of data associated with reporting units. The weight is determined by prorating whole-farm data for the area nonoverlap respondent in proportion to the amount of land operated inside the sampled segment versus the total land operated. Complete coverage is achieved and duplication is avoided when the sum of a farm's land parcels across all possible area frame sampling units equals the total farm size, i.e., when the sum of proportional weights equals one.
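The proration itself is simple arithmetic, sketched below in Python with invented acreages: the weight for a segment is the farm's acres inside that segment divided by the total acres operated, so the weights across all segments sum to one.

    # Minimal sketch of the weighted (proportional) allocation: whole-farm
    # data are prorated by the share of the farm's land inside each sampled
    # segment. Acreages and segment labels are invented for illustration.

    def segment_weight(acres_in_segment, total_acres_operated):
        """Proportion of whole-farm data attributed to one segment."""
        return acres_in_segment / total_acres_operated

    farm_total = 1000                                  # total acres operated
    tracts = {"segment 17": 250, "segment 42": 750}    # parcels by segment

    weights = {seg: segment_weight(a, farm_total) for seg, a in tracts.items()}
    print(weights)                   # {'segment 17': 0.25, 'segment 42': 0.75}
    print(sum(weights.values()))     # 1.0 -- full coverage, no duplication

    cattle_on_farm = 120             # a whole-farm item, prorated by weight
    print({seg: w * cattle_on_farm for seg, w in weights.items()})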
The potential for reporting error exists because the respondent may not include all types of land when providing the total farm acres. The portion of the farm inside the sampled area segment is outlined on an aerial photograph, facilitating a full accounting of the acreage. The remaining acreage in the farm is more dependent upon the respondent's concept of the acreage to report. The questionnaire is designed to remind the respondent of all the acres operated, whether owned, rented, or managed, and of all types of land, including woods, waste, and roads, so that the land outside the sampled segment is reported comparably to the land inside.

Once the total land operated has been established, all requested data are to be reported regardless of who owns the crop or livestock commodity. This applies to both area and list sample respondents. Emphasis on this concept is required because a natural inclination of some respondents is to report only what they own. Since coverage for the Quarterly Agricultural Survey depends upon accounting for the variables of interest through the acres where they are located, much effort is expended to ensure compliance with the concept and accuracy in the reported data.

Nonresponse in the Quarterly Agricultural Survey varies by State but typically ranges from 10 to 20 percent. Two types of adjustments are made so that data from only the respondents can be used to make inferences for the total agricultural population. The first procedure adjusts sample sizes downward to the number of respondents by list stratum. The assumption is therefore made that, within each list stratum, nonrespondents share the same agricultural characteristics as respondents.

Evidence that respondents average fewer head of livestock than nonrespondents is provided by Gleason and Bosecker (1978) and Crank (1979). Therefore, a second approach is also used. Information is provided by the field representative, through observation or secondary sources, on the presence or absence of individual commodities for nonrespondents. Through imputation or summary adjustment, this information is used to associate respondent data with nonrespondents having similar operations.

A coverage problem posed by nonrespondents which is sometimes overlooked concerns the status or classification of the sampling unit as a viable operating entity. A simple adjustment of sample sizes for nonresponse assumes that the same proportions of nonrespondents as of respondents are out of business, while imputation may assume a unit is in business. However, there are two main sources of nonresponse--refusals and inaccessibles. Refusals most often have the items of interest (which they do not want to report, so they refuse), while inaccessibles may be in business but unreachable, or may not be found because they are out of business.

Nonrespondents in the Quarterly Agricultural Survey are coded as in or out of business using available information, in the same way individual commodities are coded for their likelihood of existing on an operating unit. Units for which there is no evidence of current operation may therefore be more properly handled as zero-contribution sampling units.
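The first adjustment is equivalent to a ratio reweighting of respondents within each list stratum, as the minimal sketch below illustrates; the stratum counts are invented, and the notation (N_h, n_h, r_h for population, sample, and respondent counts) is introduced here for illustration, not taken from NASS documentation.

    # Minimal sketch of the first nonresponse adjustment: within each list
    # stratum, the sample size is adjusted downward to the number of
    # respondents, which is equivalent to inflating each respondent's
    # design weight by n_h / r_h. Counts are invented for illustration.

    def adjusted_weights(strata):
        """strata: {stratum: (N_h, n_h, r_h)}; returns the
        nonresponse-adjusted weight for each respondent in the stratum."""
        out = {}
        for h, (N_h, n_h, r_h) in strata.items():
            base = N_h / n_h               # design weight before adjustment
            out[h] = base * (n_h / r_h)    # = N_h / r_h after the adjustment
        return out

    strata = {1: (5000, 100, 90), 2: (800, 80, 64), 3: (50, 50, 45)}
    print(adjusted_weights(strata))
    # {1: 55.55..., 2: 12.5, 3: 1.11...} -- respondents carry the full stratum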
VI. Comparative analysis

Many of the commodity totals estimated through the Quarterly Agricultural Survey move through the agricultural marketing channels and are therefore amenable to comparison with administrative data. Some examples include slaughter data for hogs and cattle, milk production for dairy cows, crushings for soybeans, and sales of cotton. Even though all of a commodity may not be accounted for through one source or process, a limited number of possibilities affords the opportunity to construct a balance sheet to account for total production.

For example, survey measurements of soybean production for 1987 plus carryover soybean stocks in storage establish the total available soybeans for the 1988 marketing season. By monitoring exports, soybean processing, seed use, imports, and remaining stocks in storage at the end of the cycle, a reasonable accounting of the uses for the total available soybeans can be made. Because of sampling errors in surveys and imperfections in administrative records, there will be residual or unexplained differences between supply and use. However, the differences fall within reasonable limits and can be monitored over time. Problems in survey coverage become readily apparent with this use of administrative data.
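The balance sheet is ordinary bookkeeping, as the sketch below shows with invented volumes (million bushels); a residual that persistently falls outside reasonable limits is the signal of a coverage or records problem.

    # Minimal sketch of the supply/use balance sheet. All volumes are
    # invented for illustration; supply = production + carryover + imports,
    # and the residual between supply and accounted uses is monitored.

    production   = 1900   # survey estimate of 1987 soybean production
    carryover    = 300    # stocks in storage at the start of the season
    imports      = 5
    supply       = production + carryover + imports

    exports      = 800
    crushings    = 1150   # soybean processing
    seed_use     = 60
    ending_stock = 170
    uses         = exports + crushings + seed_use + ending_stock

    residual = supply - uses
    print(f"supply {supply}, uses {uses}, residual {residual}")
    # A residual persistently outside reasonable limits suggests survey
    # coverage or administrative-record problems worth investigating.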
Another useful source of data for comparison is the Census of Agriculture, conducted by the Bureau of the Census at 5-year intervals. Operators provide inventory numbers for a specific date (December 31) and production statistics for the census year. The census has its own problems in achieving complete coverage and is, of course, also subject to nonsampling errors during data collection. However, the target population is the same as that of the Quarterly Agricultural Survey, and differences between the two measurements lead to useful analysis for evaluating coverage.

The checks and balances which exist for the Quarterly Agricultural Survey estimates subject the results of this survey to a scrutiny by the data users which is rare among government surveys. Measurements for given dates are verifiable by subsequent events.

APPENDIX A.6. MONTHLY REPORT OF INDUSTRIAL NATURAL GAS DELIVERIES

I. Introduction

The Energy Information Administration publishes monthly State estimates of volumes of natural gas delivered to consumers, by major type of consumer. Form EIA-857, "Monthly Report of Natural Gas Purchases and Deliveries to Consumers," is completed monthly by a sample of firms that transport natural gas to consumers. These firms include interstate and intrastate pipeline companies and local distribution companies. An important aspect of the EIA-857 data system for this discussion is that the data are used to estimate monthly deliveries of natural gas to each of three consumer sectors: residential, commercial, and industrial. In addition, estimates for total gas deliveries to consumers within each State, and within each sector for all States, are published, as is an estimated grand total for all gas delivered to consumers in the United States. These estimates are published in the Natural Gas Monthly. National estimates are also published in the Monthly Energy Review.

There have been two versions of Form EIA-857. The first version was approved for use in December 1984 by the Office of Management and Budget and was in place through 1987. It asked for volumes of natural gas sold to consumers and for revenues derived from those sales. The second version of the form has been in place since January 1988. Unlike the former version, this form requires reporting on a custody rather than an equity basis, asking for deliveries to consumers and for revenues derived from that portion of the deliveries that is sold. Figures 1 and 2 illustrate the phenomena of deliveries to industrial consumers and sales to industrial consumers. The former illustrates the physical flow of gas from the well to the buyer, while the latter shows possible flows of ownership of that gas. It can be seen that the two need not be identical.

The frame for the initial EIA-857 survey included companies responding to either of two annual surveys: Form EIA-176, "Annual Report of Natural and Supplemental Gas Supply and Disposition," a custody-based form; or Form FERC-50, "Alternate Fuel Demand Due to Natural Gas Deficiencies," an equity-based form. A company was eligible for selection if its reports on either data system indicated deliveries or sales to consumers in the residential, commercial, or industrial sectors. The Form FERC-50 was discontinued in 1986. Therefore, the frame now comprises only respondents to the Form EIA-176.

II. Problems in coverage

At the outset, EIA recognized that there was a problem in asking for sales ("gas for which consumers were billed") rather than for total deliveries. There were some obvious logical gaps in coverage of total deliveries when measured on an equity basis. Most errors from inquiries on an equity basis arose from the fact that not all the sellers were in the frame, although the frame covered all physical deliveries of natural gas. EIA knew this at the onset and expected the effect to be trivial in the residential and commercial sectors, where the two phenomena are largely coincident. However, industrial customers were large users of fuel and had a responsibility to minimize their operating costs, so it was expected that there would be some third-party sales. Indeed, Form EIA-176 had been asking about volumes delivered to industrial customers for several years, and a noticeable volume was reported in that category.

The first solution to this problem of frame undercoverage was to adjust the monthly estimates from the survey. This adjustment used State factors derived by comparing EIA's official annual volumes for total deliveries to industrial consumers, obtained through the EIA-176 data system, to the corresponding 12-month sum of reported sales to industrial consumers. Since the monthly pattern of change in the mix of gas reported by the monthly system compared to total gas was unknown, the adjustment factor was applied uniformly throughout the year. Published monthly volumes of deliveries to industrial consumers thus had two components: the weighted sample responses and an imputed difference based on known totals in the previous year.

After the end of a year, Form EIA-176 was used to determine the total volume delivered to consumers. Figures derived from this data system have been EIA's official volumes and prices for natural gas delivered to the residential, commercial, and industrial sectors. Accordingly, when the EIA-176 full-year totals became available, the monthly published estimates were revised. These revisions first took into account minor revisions to submissions from respondents during the year. Then the difference between the 12-month total of monthly estimates from weighted submissions and the value from the EIA-176 system was allocated among the months in proportion to the distribution of the weighted submissions.
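A minimal sketch of the two steps follows, with invented volumes; the factor value and the benchmark total are assumptions for illustration, not actual EIA figures.

    # Minimal sketch of the EIA-857 adjustment, with invented volumes.
    # Step 1: during the year, weighted monthly submissions are inflated by
    # a State factor = prior-year EIA-176 total / prior 12-month sales sum,
    # applied uniformly to every month.
    # Step 2: once the EIA-176 full-year total is known, the difference is
    # allocated among months in proportion to the weighted submissions.

    monthly_sales = [80, 75, 70, 60, 55, 50, 52, 58, 63, 70, 78, 84]

    prior_year_factor = 1.25                 # assumed State factor
    preliminary = [v * prior_year_factor for v in monthly_sales]

    annual_total = 1010.0                    # assumed EIA-176 benchmark
    base = sum(monthly_sales)
    revised = [v + (annual_total - base) * v / base for v in monthly_sales]

    print(round(sum(preliminary), 2))        # 993.75 -- published in-year
    print(round(sum(revised), 2))            # 1010.0 -- months sum to benchmark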
III. Trend in share of industrial gas transported for others

Responding to changes in the national regulatory environment and to opportunities afforded by the surplus of natural gas in the mid-1980's, the natural gas industry changed the way it conducted business. More and more transactions involving large consumers were conducted through the spot market. In the spot market a third party, the seller, was involved. No longer was there a firm relationship between the delivery and sale of gas to industrial customers. Rather, the relationship became quite fluid, with the potential existing for purchases from many sources, even within the same month. The role of the pipeline or, less often, the distribution company was becoming more akin to that of a common carrier. Their role increasingly was simply to move the commodity from one place to another.

As activity of this sort increased, it had the following impact on EIA's estimates:

- More of the sellers were out of the sample;
- More of the sellers were unknown to EIA as the importance of a new sort of function, the broker, increased; and
- The tenability of the assumptions underlying the adjustment of submitted volumes based on the previous year's experience was reduced.

By 1987, the situation had deteriorated substantially. EIA's estimates for industrial gas consumption indicated a substantial downward trend, even though gas was at a relative price advantage compared to residual fuel oil, with which it competes in plants and factories having fuel-switching capability. Furthermore, an increasing amount of the total gas in the national system could not be accounted for.

It was obvious that assessing the market through an equity-based data system was yielding estimates that were going increasingly awry. In reaction to these changes in the industry, EIA instituted long-term and short-term fixes. The long-term fix was to convert the monthly system from an equity to a custody basis. The short-term fix was to reevaluate the assumptions behind the monthly adjustment of weighted submissions to an estimate for publication and to change the adjustment protocol.

Examination of the patterns in the proportion of industrial gas delivered for the account of others indicated three major facts:

- The proportion of industrial gas delivered for the account of others was increasing on a national basis;
- The pattern within States was too inconsistent to deal with adequately; and
- The assumption of a consistent relationship throughout the year between reported sold volumes and true total industrial consumption, as indicated by the previous year's relationship, was probably damaging the estimates.

Confronted with an upward trend, EIA reevaluated the assumption behind the procedure for adjusting weighted submissions to an estimate for publication. The assumption of a constant relationship between the two throughout the year, with a sharp step implied between December and January, was discarded. Because of the inconsistent pattern within States between years, the adjustment procedure at the State level remained as it was.

The new national adjustment procedure involved two major changes. First, a linear trend in the change in the proportion of gas transported for the account of others was introduced. Second, the pivotal month for the change was shifted from January to June. In years for which there were end-of-year data (i.e., annual data for the succeeding year had been determined), the trend line was fit at both the beginning and end of the 12-month period. For 1987, when the problem was most acute, there was no end-of-year peg. The linear trend from 1985 to 1986 was allowed to continue through 1987 for purposes of the adjustment algorithm. Figure 3 shows the result of the new adjustment procedure compared to the old.
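A minimal sketch of the revised adjustment follows. The text does not spell out the exact form of the trend fit, so the sketch simply centers each year's annual proportion on the pivotal month of June and moves along the year-over-year slope; the proportions are invented, and the centering is an assumption based on the description above.

    # Minimal sketch of the revised national adjustment: a linear trend in
    # the proportion of gas moved for the account of others, pivoting on
    # June, replaces the constant factor with its January step.

    def monthly_proportions(p_this_year, p_next_year):
        """Center this year's annual proportion on June (month 6) and move
        along the year-over-year trend line for the other months."""
        slope = p_next_year - p_this_year      # change per 12 months
        return [p_this_year + slope * (m - 6) / 12.0 for m in range(1, 13)]

    # Year with an end-of-year peg: interpolate between the two benchmarks.
    print(monthly_proportions(0.30, 0.42))     # June = 0.30, rising thereafter

    # 1987 had no end-of-year peg: the 1985-to-1986 slope was simply
    # continued through 1987 for purposes of the adjustment algorithm.
    slope_85_86 = 0.30 - 0.22                  # invented annual proportions
    print([0.30 + slope_85_86 * (m - 6) / 12.0 for m in range(1, 13)])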
The long-term solution to the coverage problem arising from changes in the industry was a redesign of the form. As noted earlier, a new version of Form EIA-857 has been in place since January 1988. Responding to the changes in the industry, the new version has shifted from an equity to a custody basis. The change to a custody basis solved the coverage problem arising from the inability to identify companies that sold but did not deliver gas. The frame for the sample now includes respondents to the annual EIA-176 survey, which is a census of companies that deliver natural gas to end users.

EIA now believes it is more adequately monitoring the volume of natural gas being delivered to industrial consumers. Furthermore, the survey is now monitoring the relative amounts being delivered for the transporters' own accounts and for the account of others.

IV. Comparative analysis

With the introduction of the new version of Form EIA-857, reporting as it does on the same basis as required by the annual Form EIA-176, comparison of responses to identify potential coverage or respondent error problems in the monthly data system has become possible. During the summer of 1988, after 6 months of data had been submitted to the monthly system and data in the annual system had been edited and cleaned, a comparison of relative volumes of industrial deliveries reported on a company and State basis was carried out. Several outliers were identified, some of which were of sufficient magnitude to substantially affect the validity of estimates derived through the data system. Follow-up inquiries to the companies indicated that, in a number of cases, the respondents were not reporting the volumes for which EIA had been asking. In those cases, revisions to the submissions were obtained. In other cases, the differences truly reflected changes in the volume of gas being delivered in the companies' marketing or delivery areas.

V. Conclusion

This case study has described some of the types of coverage error encountered in the Energy Information Administration's efforts to monitor monthly deliveries of natural gas to industrial end users, and the measures taken to correct them. The initial version of the form, couched in terms of sales, never completely represented the population of interest because of known problems of coverage in the frame. Analytic attempts to adjust for that undercoverage came increasingly into question as the organization of the industry changed in unanticipated ways. The eventual solution was a change to the form, and to the data system, to monitor physical deliveries rather than equity transfers (sales). With this change, together with additional questions on the form asking for both sales volumes and volumes delivered for the account of others, the coverage of deliveries of gas to industrial consumers appears to have improved dramatically. An additional benefit of the change in the form is that comparative analysis between the monthly and annual data systems on a respondent level is now possible. This provides a means of checking for possible respondent error, as the two data systems now ask for directly comparable information, though for different time periods.

APPENDIX A.7. CURRENT POPULATION SURVEY (CPS)

I. Introduction

The Current Population Survey is a housing unit sample survey conducted monthly by the Bureau of the Census for the Bureau of Labor Statistics (U.S. Bureau of Labor Statistics 1989). Its primary purpose is to obtain estimates of employment, unemployment, and other characteristics of the general labor force, of the population as a whole, and of various subgroups of the population.
Although Hanson (1978) and U.S. Department of Commerce (1978a) describe the sample design based on the 1970 decennial census, this work also applies to the current sample design, which is based on the 1980 decennial census.

The Current Population Survey sample design is a multistage stratified sample of the U.S. population. It is a State-based design that reflects urban and rural areas, different types of industrial and farming areas, and the major geographic divisions of each State. It is a rotating panel design wherein sampled units are interviewed for four consecutive months, dropped for eight months, and then interviewed for another four months. Each month, a new sample panel, one-eighth of the total sample, is introduced for the first time.

The target population for the Current Population Survey includes every person in the United States who is 15 years of age or over and is not institutionalized or in military service. Since 1967, the official tabulations have been restricted to data for persons 16 years of age and over. Institutionalized persons are those in correctional and health care facilities who are in the custodial care of someone else and are not free to come and go as they choose. Military persons are those who are on active duty in the Armed Forces. The target population is not restricted to citizens of the United States. Any person residing in the United States at the time of interview is a member of the population of interest. "The United States" refers to the 50 States and the District of Columbia. However, persons who are living on the premises of embassies of foreign countries, and persons who are citizens of foreign countries and are merely visiting or traveling in the United States, are excluded.

II. Sample design

To reach the target population, a sample of residences that could be occupied by persons who are not institutionalized or in the military is first selected. The primary frame for the current sample is the file of addresses created for the 1980 Census of Population and Housing.

Primary sampling unit (PSU). Before any sampling takes place, the entire United States is partitioned into basic geographic units of sampling. Traditionally, these basic units have been counties for all areas except New England, where the township is used. In some States, boroughs and independent cities are also used as basic units. These basic units are combined to create primary sampling units. Every basic unit is contained in one and only one PSU, and the complete list of PSU's geographically encompasses the entire United States. Each large metropolitan area is considered to be a PSU, although some are split for administrative reasons. Other primary sampling units are formed by grouping one or more adjacent basic geographic units. About 2,000 PSU's are formed out of the more than 3,200 basic geographic units in the United States.

The PSU's are grouped into strata, with the largest-population PSU's being placed in strata by themselves. One PSU is selected with probability proportional to population from each stratum. The Current Population Survey sample design consists of 713 sampled PSU's.

Within-PSU sampling. The primary frame for within-PSU sampling is the list of addresses created for the 1980 decennial census. First, a sample of census enumeration districts (ED's), geographic areas containing 400 addresses on the average, is selected with probability proportional to the number of housing units or housing unit equivalents.
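Selection with probability proportional to size is commonly carried out systematically against cumulated measures of size. The Python sketch below illustrates the generic technique with invented ED sizes; it is not the Bureau of the Census' production procedure.

    # Minimal sketch of systematic PPS selection: cumulate the measures of
    # size, take a random start in (0, interval], and step through the
    # cumulative totals at a fixed interval. Sizes are invented.

    import random

    def systematic_pps(sizes, n):
        """Select n units; each unit's chance is proportional to its size.
        (A unit larger than the interval could be selected more than once.)"""
        total = sum(sizes)
        interval = total / n
        start = random.uniform(0, interval)    # random start
        points = [start + i * interval for i in range(n)]
        picks, cum, k = [], 0.0, 0
        for i, s in enumerate(sizes):
            cum += s
            while k < n and points[k] < cum:
                picks.append(i)
                k += 1
        return picks

    random.seed(1990)
    ed_sizes = [420, 380, 150, 900, 410, 640, 300, 800]   # measures of size
    print(systematic_pps(ed_sizes, 3))         # indexes of the 3 sampled ED's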
A housing unit is a group of rooms, or a single room, occupied or intended for occupancy as separate living quarters. Separate living quarters are those in which the occupants do not live and eat with any other person in the structure and that have direct access from the outside of the building or through a common hall. Housing unit equivalents are computed for the noninstitutional group quarters population as the number of persons divided by the average number of persons per housing unit. Group quarters residents are those persons who live in such places as dormitories, rooming and boarding houses, communes, hotels, motels, and convents, and can be characterized as typically sharing some living arrangements. Each group quarters and any separate housing units associated with it are considered a special place.

An ED selected for the sample may be either an address (or list) ED or an area ED. An address ED is one that is in an area that issues permits to build new residential housing and contains less than 4 percent incomplete addresses (that is, post office boxes, rural route numbers, and so forth). An area ED is either one that is not in a permit-issuing area, or one that is in a permit-issuing area but contains 4 percent or more incomplete addresses.

Address segments. If the sampled ED is an address ED, the housing units in that ED are grouped into clusters of housing units (usually four to a cluster), so that the number of clusters formed equals the number of measures in the ED. The clusters are matched to determine the specific measures selected for the Current Population Survey. The 1980 census basic addresses (house number and street name) in a cluster are entered on listing sheets and are assigned for field interviewing as address segments. At the appropriate time, a Current Population Survey field representative visits the sampled cluster of one or more basic addresses in the address segment.

At take-all basic addresses (which the 1980 census listed as having one, two, three, four, and sometimes five housing units), the Current Population Survey field representative makes a new listing of all the housing units at the basic address and interviews all of them. Listings of take-all addresses are updated prior to the fifth month in sample to account for any changes.

At non-take-all basic addresses (which the 1980 census listed as having more than four housing units), the field representative makes a new listing of all the housing units at the basic address and interviews only those listed on lines of the listing sheet predetermined for the current sampled cluster. Non-take-all addresses are updated about once a year to account for any changes.

In the 1980 Current Population Survey design, address segments constitute about 60 percent of the sample.

Special place segments. If the sampled measure in an address ED is a group quarters measure, measures of size are computed for each special place in the ED. The sampled special places are identified and assigned for field interviewing as special place segments. The field representative lists the special place the month prior to its first interview. The regional office applies a predetermined random start and a sampling interval to the listing to select sampling units for interview. The special place listing is updated at least once a year to account for changes.

In the 1980 Current Population Survey design, special place segments constitute about 1.5 percent of the sample.
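The random start and sampling interval procedure is ordinary systematic sampling from a listing, as the minimal sketch below shows; the listing, start, and interval are invented for illustration.

    # Minimal sketch of the regional office procedure: a predetermined
    # random start and sampling interval are applied to the field listing
    # to select the sampling units for interview. The listing is invented.

    def take_every(listing, start, interval):
        """Select lines start, start+interval, start+2*interval, ... (1-based)."""
        return [listing[i] for i in range(start - 1, len(listing), interval)]

    listing = [f"unit {i:02d}" for i in range(1, 21)]   # 20 listed units
    print(take_every(listing, start=3, interval=5))
    # ['unit 03', 'unit 08', 'unit 13', 'unit 18'] -- a 1-in-5 sample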
Area segments. In area ED's, blocks or chunks of land are identified which contain one or more measures. Measures of size are assigned to these blocks or chunks, which are then sampled using a random start and sampling interval approach. A map is prepared outlining the sampled block or chunk, which is then assigned as an area segment. A month before the area segment is first to be interviewed, the field representative lists all housing units and special places in the area segment. The regional office applies a predetermined random start and sampling interval to select the sampling units to be in sample. A different random start is used for each new measure (new Current Population Survey sample). Area segments are updated about once a year to account for any changes.

If the area segment is in a permit-issuing area, any new residential construction listed in the area segment is deleted from the sample, since it has a chance to be sampled from the new construction frame described below. New construction is represented in permit-issuing area segments in this way to control the variance of the size of the area segment between the 1980 decennial census and the time it is in sample. If the area segment is in a non-permit-issuing area, the new residential construction listed in the area segment is sampled, since that is its only chance to come into sample.

In the 1980 Current Population Survey design, area segments constitute about 28 percent of the sample. About half the area segments are permit-issuing and half are non-permit-issuing.

Supplemental frame for new construction. To represent, in address ED's and area ED's, the construction of residential housing that has occurred since the 1980 decennial census in areas that issue permits to build new residential housing, the building permits issued for residential housing units within the sampled PSU's are sampled. The source of building permit counts within the sampled PSU's is the Building Permits Survey conducted by the Bureau of the Census. These counts are obtained each month and form the basis for a sampling frame of new construction housing units.

Each month, Bureau of the Census field representatives visit a sample of building permit offices to copy addresses from permits for new residential construction issued that month. The addresses are added to the new construction frame with some geographic clustering to minimize interviewer travel costs. Those corresponding to current sampled measures are selected and assigned for field interviewing as permit segments.

As of April 1990, permit segments constitute about 13 percent of the CPS sample. The proportion of the sample that is located in permit segments increases with time since the 1980 decennial census. To maintain a constant sample size, reductions in the old construction part of the sample are made periodically.

III. Magnitude of coverage errors

The Bureau of the Census measures overall coverage error monthly in the CPS by age, sex, and race, and for Hispanics and non-Hispanics, by comparing survey-based estimates to estimates based upon the most recent decennial census updated for births, deaths, immigration, emigration, and aging of the population. (The procedure by which the survey-based estimates are adjusted for noncoverage is given in this appendix, section VI.)
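A coverage ratio is simply the survey-based estimate for a demographic cell divided by the independent, census-based estimate for the same cell. The sketch below uses invented counts, chosen so that the result reproduces the table 14 ratio of .874 for the total black population aged 14 and over.

    # Minimal sketch of a coverage ratio: weighted survey estimate over the
    # independent population estimate carried forward from the census.
    # The counts below are invented for illustration.

    def coverage_ratio(survey_estimate, independent_estimate):
        return survey_estimate / independent_estimate

    cps_est = 16_300_000   # hypothetical weighted CPS estimate for a cell
    indep   = 18_650_000   # hypothetical independent estimate, same cell
    r = coverage_ratio(cps_est, indep)
    print(f"{r:.3f}")      # 0.874 -- i.e., about 12.6 percent undercoverage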
Tables 14 and 15 (from Hainer, et al. 1988) show average 1986 CPS coverage ratios for selected demographic categories. The ratio .932 in the upper left cell of table 14 means that CPS coverage for the total population is 6.8 percent lower than coverage in the census. (Note that a noninterview adjustment is applied before the coverage ratios are computed, so that noninterviews do not contribute to the ratios in these tables being less than 1.0.) The coverage of males is consistently lower than that of females, except for Hispanics aged 14 to 19. The group aged 20 to 24 has the lowest coverage, for both whites and blacks. Hainer, et al. (1988) state that "... overall undercoverage for black males is 17 percent worse than the census, and males 20-24 are 27 percent worse." The data in table 15 show that the coverage of Hispanics seems to be even lower than that of blacks: twenty percent of all Hispanics are missed. Note also that these ratios do not account for undercoverage in the census.

Table 14. 1986 average coverage ratios by age, sex, and race for the CPS

               Total 14+   14-19   20-24   25-44   45-64    65+

Total             .932      .946    .887    .924    .935    .967

White
  Total           .939      .951    .902    .930    .941    .972
  Male            .925      .950    .885    .914    .936    .946
  Female          .950      .951    .919    .946    .946    .986

Black
  Total           .874      .904    .778    .856    .884    .946
  Male            .833      .884    .733    .805    .861    .927
  Female          .907      .924    .820    .910    .906    .956

Table 15. 1986 average coverage ratios for Hispanics by age and sex for the CPS

               Total 14+   14-19   20-29   30-49    50+

Total             .798      .845    .769    .808    .800
Male              .773      .870    .731    .762    .782
Female            .823      .820    .792    .853    .816

IV. Possible sources of coverage error

Census misses. In the 1980 CPS design, there is no process to account for basic addresses missed in the 1980 census in address ED's. This probably accounts for less than 1 percent of the CPS sample size. In area ED's, census misses have a chance to be sampled, since the CPS field representative makes a new listing of area segments as they come into sample.

Conversion from nonresidential to residential. Structures that were entirely nonresidential at the time of the 1980 census were not listed in the census. In address ED's, if they are converted to residential use, e.g., lofts, they have no chance to be sampled. The new construction frame includes only permits for whole-structure construction and not conversions; the use of permits for conversions is not as clearly defined and systematic across building permit offices as it is for whole-structure construction. There is no good estimate of the extent of conversion of existing structures from nonresidential to residential in address ED's.

Time lag between permit issuance and entering the CPS sample. In the 1980 CPS design, there is a 7-month lag between the time a permit is issued for new construction and the time the structure has a chance of entering the CPS sample. This is the time it takes to list, key, cluster, sample, and prepare permit segment materials for the field. Thus, for a short period of time there are units in the new construction population that may not be represented. Linebarger (1975) estimated that approximately 12 percent of the units for which building permits were issued could be interviewed 4 months after the date of issuance. This is a cumulative figure, however, so most of these units could not have been interviewed during each of the 4 months.

Permit lag at the time of the 1980 census. Some housing units for which permits were issued prior to the 1980 census were not built until after the 1980 census and, therefore, were not listed in the 1980 census. In address ED's and permit-issuing area ED's, these housing units had no chance to be sampled from the census frame.
However, net undercoverage was avoided by using a start date prior to the 1980 census for beginning the new construction frame. The start date was selected so that the expected number of housing units for which permits were issued prior to the start date but that were not built in time to be listed in the 1980 census (zero chances of selection in the 1980 CPS design) would equal the expected number of housing units for which permits were issued after the start date and that were built in time to be listed in the 1980 census (two chances of selection in the 1980 CPS design). (See Statt, et al. 1981 for details.)

Illegal new construction. About 2 percent of the new construction in address ED's and permit-issuing area ED's is built without benefit of a permit. These newly constructed units have no chance to be selected for the CPS sample. The Bureau of the Census, in a 1964 study, estimated illegal construction to be approximately 3.3 percent of all new construction (U.S. Bureau of the Census 1989). However, current illegal construction is believed to be lower than the 1964 estimate due to tighter zoning laws.

New construction of special places. Permits issued for the construction of new special places, whether entirely new or additional structures in existing special places, are not sampled. Thus, in address ED's, they have no chance to come into sample. This should contribute very little to the undercoverage, given the small proportion of the CPS sample that is in special places. In area segments, whether permit-issuing or non-permit-issuing, all special places are listed and sampled without regard to their date of construction.

Mobile homes. Individual mobile homes placed at addresses that were not listed in the 1980 census have no chance to be sampled in address ED's. Likewise, permits for new mobile home parks in address ED's, if issued, are not listed and sampled. In both permit-issuing and non-permit-issuing area segments, individual mobile homes and mobile home parks are listed and sampled.

The Survey of Mobile Home Placements during April 1980 to August 1985 revealed an undercoverage of new mobile homes of 25 percent (Schwanz 1988b). Coverage improvement for mobile homes is being investigated for the post-1990 census redesign of the Bureau of the Census' demographic surveys. State health department and county tax office data are being evaluated for use as coverage improvement frames for some geographic areas, as are data from the Survey of Mobile Home Placements. If these two approaches are not feasible or are insufficient, then an area canvass approach will be used.

Year-built determination. In permit-issuing area segments, the field representative must determine the year each residential structure listed was built. Those built after the 1980 census have a chance to be sampled from the new construction housing frame. They must be deleted from permit-issuing area segment listings so they do not have two chances of selection. If the year built is determined incorrectly, a structure built before the 1980 census may be deleted by mistake, resulting in undercoverage. Likewise, field representatives may determine the year built when they should not (for example, at a mobile home or special place) and mistakenly delete such a unit from the sample. Limited investigations, however, indicate that, due to the year-built procedure, field representatives just as often retain units in sample in error as delete them in error. The result is no net loss in coverage.
V. Past attempts to remedy undercoverage

The following coverage improvement procedures, developed by the American Housing Survey, were used in the 1970 CPS design but are not being used in the 1980 CPS design.

Successor check. This check was conducted in address ED's to improve coverage of conversions from nonresidential to residential, individual mobile homes placed at an address that did not exist in the 1970 census, and existing structures moved to an address that did not exist in the 1970 census. The field representative started at a designated sampling unit, followed a specified path of travel, listed a string of eight housing units, and determined whether they existed at that address at the time of the 1970 census. The Bureau of the Census matched the strings of housing units against the 1970 census listing of addresses. Those not found in the 1970 census listing and not built after the 1970 census (and therefore not in the new construction frame) were added to the successor check frame. The improvement in coverage was marginal. Due to matching errors, some housing units added to the CPS sample were determined in the field to duplicate housing units selected for the CPS sample from the 1970 census frame. The field successor check procedure was both expensive and very difficult for the field representatives to apply consistently. All these factors led to dropping the successor check from the 1980 design (Montie and Schwanz 1977).

The Woodall frame. This frame was a commercial list of new mobile home parks. The Woodall company stopped collecting this information in 1975, so it could not be used as a source of coverage improvement for new mobile homes in the 1980 design.

The windshield survey. In address ED's, a frame of new mobile home parks was created as follows. A probability sample of about 200 tracts was selected in those PSU's expected to be most likely to contain new mobile home parks. The field representatives canvassed the tracts, listed any mobile home parks found (presumably by driving around and spotting them through the windshield of the car), and determined when each mobile home park was created. The Bureau of the Census matched the mobile home parks listed against the 1970 census listings for the sampled tracts. Those mobile home parks not found in the 1970 census listings were added to the windshield frame (Montie and Schwanz 1977). The limited scope of the frame (only 200 tracts) and the expense of the field canvassing and clerical matching prevented this frame from being added to the 1980 design. It may, however, be used again for the post-1990 census design.

Incomplete addresses. In the 1970 design, incomplete addresses were routinely deleted from the 1970 census listings prior to selecting the CPS sample. In the 1980 design, incomplete addresses are retained and given a chance for selection. Since an address ED must have less than 4 percent incomplete addresses, and very few address ED's have any incomplete addresses, very few incomplete addresses actually are selected for the CPS sample in the 1980 design. When they are, locator materials are prepared to help the field representative locate the incomplete addresses. The locator materials consist of a copy of the 1980 census listing that includes the incomplete address and a copy of the ED map. The 1980 census enumerator was supposed to spot the incomplete address and surrounding addresses on the ED map.
Field representatives have had no difficulty locating incomplete addresses using these locator materials in the 1980 CPS design.

VI. Evaluation and adjustment methods

Area vs. list frame coverage. In the post-1980 census design, the National Health Interview Survey (NHIS) is using an all-area design (Parsons and Casady 1986). That is, in ED's that would be address ED's in other Bureau of the Census surveys, blocks or parts of blocks are selected and assigned for field listing and interviewing as block segments. In ED's that would ordinarily be area ED's, area segments are formed in the usual way. The field representative canvasses the block segment and lists all its housing units and special places. Thus, the addresses listed in the 1980 census are not used. To offset the greater cost of an all-area listing approach, block segment listings are not periodically updated. It was felt that listing errors would be lower in block segments than in traditional area segments because, in address ED's, blocks or parts of blocks could be clearly defined, easily canvassed, and accurately listed. It was expected that updating would pick up mostly new construction, which already has a chance to be sampled in the new construction frame.

To evaluate the coverage of listing in block segments, the Bureau of the Census matched the listings of a subset of block segments to the 1980 census listing of addresses for the same blocks. A preliminary report (Waite 1989) shows that, after reconciliation, NHIS block segments have an overall underlisting estimate of at least 3 percent. This compares to an underlisting estimate of 1 percent for traditional area segments, as measured by the 1985-88 coverage reinterview of NHIS area segments. It was not expected that block segment listings would be worse than area segment listings in NHIS. Perhaps the match operation does a better job of picking up underlisting than does coverage reinterview. In addition, NHIS coverage ratios in the 1980 design are no worse than in the 1970 design. Perhaps the NHIS block segment underlisting was compensated for by overlisting; the match study was a one-way match and did not measure overlisting. A new two-way field match and update study has been proposed for NHIS in 1990 to improve measurement of coverage error in the NHIS all-area design.

Adjustment methods. As stated in section III, the Bureau of the Census produces population estimates monthly by updating the last decennial census figures for births, deaths, immigration, emigration, and aging of the population. In addition to using these population estimates to measure undercoverage, as discussed in section III, they are used in a weighting adjustment in the CPS and most other housing unit surveys conducted by the Bureau of the Census. In this adjustment, the weight for each person in the sample is modified so that the CPS estimates for the population by age, sex, and race categories, and for Hispanic and non-Hispanic groups, agree with the independently determined population estimates. To the extent that the labor force characteristics of missed persons are the same as those of covered persons in the same age-sex-race group, this weighting adjustment reduces the bias caused by undercoverage. Hanson (1978) points out: "... this adjustment should be regarded as possibly ameliorating, but certainly not as removing the potential bias involved in coverage losses." Furthermore, the adjustment does not account for undercoverage in the decennial census. (See Hanson (1978) and U.S. Bureau of Labor Statistics (1989) for more details.)
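The adjustment is a cell-by-cell ratio correction, sketched below with invented weights, cells, and controls: within each age-sex-race cell, every person's weight is multiplied by the ratio of the independent population estimate to the weighted survey estimate, so the adjusted survey totals agree with the controls.

    # Minimal sketch of the ratio adjustment to independent population
    # controls. The persons, weights, cells, and control totals are all
    # invented; this is not the Bureau of the Census' production system.

    def ratio_adjust(weights, cells, controls):
        """weights: person -> weight; cells: person -> control cell;
        controls: cell -> independent population estimate."""
        cell_total = {}
        for person, w in weights.items():
            cell_total[cells[person]] = cell_total.get(cells[person], 0.0) + w
        factors = {c: controls[c] / t for c, t in cell_total.items()}
        return {p: w * factors[cells[p]] for p, w in weights.items()}

    weights  = {"p1": 1500.0, "p2": 1500.0, "p3": 2000.0}
    cells    = {"p1": "M 20-24", "p2": "M 20-24", "p3": "F 20-24"}
    controls = {"M 20-24": 3600.0, "F 20-24": 2100.0}
    print(ratio_adjust(weights, cells, controls))
    # {'p1': 1800.0, 'p2': 1800.0, 'p3': 2100.0} -- cells now hit the controls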
In preparing the updated population estimates, the Bureau of the Census uses data on births and deaths from the National Center for Health Statistics and on military population deaths from the Department of Defense and the Coast Guard. The Immigration and Naturalization Service and three other agencies provide data on immigration. Estimates of the Armed Forces and the institutionalized population are subtracted out in order to produce estimates of the civilian noninstitutional population. All these computations are done by single year of age, by race, and by sex. (For more details, see U.S. Bureau of Labor Statistics (1989).)

APPENDIX B. GLOSSARY OF ACRONYMS

ADL     Activities of daily living
ARS     Agency Reporting System
ASM     Annual Survey of Manufactures
BEL     Business Establishment List
BLS     Bureau of Labor Statistics
CE      Consumer Expenditure (Survey)
CES     Current Employment Statistics (Survey)
CHSS    Cooperative Health Statistics System
COS     Company Organization Survey
CPI     Consumer Price Index
CPP     Current Point of Purchase Survey
CPS     Current Population Survey
D&B     Dun and Bradstreet
DOD     Department of Defense
ECI     Employment Cost Index
ED      Enumeration district
EIN     Employer identification number
EIA     Energy Information Administration
FEA     Federal Energy Administration
HCFA    Health Care Financing Administration
IADL    Instrumental activities of daily living
ICT     Intercensus transfer
IPC     Institutional Population Component
IPP     International Price Program
IRS     Internal Revenue Service
LFS     Labor Force Survey (of Canada)
NASS    National Agricultural Statistics Service
NCHS    National Center for Health Statistics
NCHSR   National Center for Health Services Research
NCRF    National Census of Residential Facilities
NCS     National Crime Survey
NHIS    National Health Interview Survey
NLTCS   National Long-term Care Survey
NMCUES  National Medical Care Utilization and Expenditure Survey
NMES    National Medical Expenditure Survey
NMFI    National Master Facility Inventory
OES     Occupational Employment Statistics (Survey)
OSH     Occupational Safety and Health
PPI     Producer Price Index
PSU     Primary sampling unit
QAS     Quarterly Agricultural Surveys
RDD     Random-digit dialing
R&D     Research and Development Survey
SAIC    Science Applications International Corporation
SIC     Standard Industrial Classification
SIPP    Survey of Income and Program Participation
SSA     Social Security Administration
SSEL    Standard Statistical Establishment List
TAR     Tape Address Register
UDB     Universe Data Base
UI      Unemployment Insurance
WPI     Wholesale Price Index

APPENDIX C. GLOSSARY OF TERMS

AREA FRAMES

A sampling frame based on lists of geographical units (Groves 1989, p. 100).

AREA SAMPLING

"The entire area in which the population is located is subdivided into smaller areas, and each elementary unit ... is associated with one and only one such area ..." (Hansen, Hurwitz, and Madow 1953, Vol. I, p. 244).

"A method of sampling used when no complete frame of reference is available. The total area under investigation is divided into small sub-areas which are sampled at random or by some restricted random process. Each of the chosen sub-areas is then fully inspected and enumerated, and may form a frame for further sampling if desired. The term may also be used (but is not to be recommended) as meaning the sampling of a domain to determine area, e.g., under a crop" (Kendall and Buckland 1971).
AREA SEGMENT LISTING ERRORS

Listing errors associated with area sampling (this report, p. 39). See LISTING ERRORS and AREA SAMPLING.

BIRTHS

Units that came into existence after frame construction.

BUSINESS

See ESTABLISHMENT.

CENSUS

"The complete enumeration of a population or groups at a point in time with respect to well defined characteristics: for example, Population, Production, Traffic on particular roads. In some connection the term is associated with the data collected rather than the extent of the collection so that the term Sample Census has a distinct meaning" (Kendall and Buckland 1971).

"The modern population census may be defined as the process of collecting, compiling and publishing demographic, social and economic data about the population of a defined territory at a specified time ... either on a de facto or de jure basis" (Pollard, Yusuf, and Pollard 1974, p. 3).

CLASSIFICATION ERRORS

"Error caused by conceptual problems and misinterpretations in the application of classification systems to survey data" (Hansen, Hurwitz, and Madow 1953, Vol. I, p. 84).

An error that occurs when units that are members of the target population are misrepresented as out-of-scope units. To the extent frame or sampled units are misclassified as out of the scope of the survey, undercoverage occurs. Classification error may also occur through the misrepresentation of units as members of the target population when in truth they are not; this results in overcoverage. Classification errors are a type of rule-of-association error leading to noncoverage (this report, p. 33).

COMPANY

A company or enterprise consists of one or more establishments under common ownership or control (U.S. Executive Office of the President 1987, p. 12).

COMPLETE COVERAGE

"A survey (or census) should be called complete if virtually all of the units in the population under study are covered" (Moser and Kalton 1971, p. 54).

CONCEPTUAL ERROR

"In planning a survey the purposes of the survey are made explicit. The purposes are then translated into a set of definitions of the characteristics for which data are to be collected and into a set of specifications for collecting, processing, and publishing. The possibilities of error arise where the statistician fails to understand the purposes of the survey, where the definitions that are set up may not be pertinent to the purposes, where the specifications (for the sample, the questionnaire, the method of collecting the data, the methods of selection and training of personnel, processing methods, etc.) would lead to error even if followed exactly" (Hansen, Hurwitz, and Madow 1953, Vol. I, pp. 83-84).

CONSUMER UNITS

A consumer unit comprises either: (1) all members of a particular household who are related by blood, marriage, adoption, or other legal arrangements; (2) a person living alone or sharing a household with others or living as a roomer in a private home or lodging house or in permanent living quarters in a hotel or motel, but who is financially independent; or (3) two or more persons living together who pool their income to make joint expenditure decisions. Financial independence is determined by the three major expense categories: housing, food, and other living expenses. To be considered financially independent, at least two of the three major expense categories have to be provided by the respondent (U.S. Bureau of Labor Statistics 1986, p. 46).
CONTENT ERROR

"Errors of observation or objective measurement, of recording, ... which result in associating a wrong value of the characteristic with a specified unit. (Coverage errors are excluded from this definition.)" (U.S. Bureau of the Census (no date), p. 48).

COVERAGE ERROR

Errors in coverage occur when target population units are missed during frame construction or sample data collection (undercoverage) or when they are duplicated or enumerated in error (overcoverage). Errors in definitions, or in applying a definition, as well as errors in locating a sampled unit, may affect coverage. Coverage errors may also occur if subclasses of the population have no probability, or inappropriate probabilities, of being included in a sample. Response errors--which may occur when a respondent misunderstands a question, responds incorrectly because of a belief that an incorrect answer may increase his prestige, etc., or when an interviewer mis-asks a question or mis-records a response--may also result in errors in coverage (Hansen, Hurwitz, and Madow 1953, Vol. I, p. 84).

"Undercoverage: units (e.g., households, persons, establishments, farms) that should be in the frames (or lists) from which a sample is selected are not in those frames, or units in the sample are mistakenly classified as ineligible or are omitted from the sample or from the units interviewed" (Madow, et al. 1983, p. 3).

"The error in an estimate that results from (1) failure to include in the frame all units belonging to the defined population; failure to include specified units in the conduct of the survey (undercoverage), and (2) inclusion of some units erroneously either because of a defective frame or because of inclusion of unspecified units or inclusion of specified units more than once, in the actual survey (overcoverage)" (U.S. Bureau of the Census (no date), p. 48).

"The failure to give any chance of sample selection to some persons in the population" (Groves 1989, p. vi).

"Exists because some persons are not part of the list or frame (or equivalent materials) used to identify members of the population. Because of this they can never be measured, whether a complete census of the frame is attempted or a sample is studied" (Groves 1989, p. 11).

"Refers to the discrepancy between statistics calculated on the frame population and the same statistics calculated on the target population. Coverage error arises from failure to give some units in the target population any chance of being included in the survey, from including ineligible units in the survey, or from having some target population units appear several times in the frame population.... Coverage error is a function of both the proportion of the target population that is not covered by the frame and the difference on the survey statistic between those covered and those not covered.... Coverage error is a property of a statistic, not a survey" (Groves 1989, pp. 83-85).

See also NONCOVERAGE, OVERCOVERAGE, and UNDERCOVERAGE.

CROSS-SECTIONAL COVERAGE ERROR

An error in a sample estimate that results from unaccounted-for changes in the sample population from the time of frame establishment to the first interview. A type of temporal error that is a source of coverage error (this report, p. 36).

CROSS-SECTIONAL TEMPORAL COVERAGE ERROR

See CROSS-SECTIONAL COVERAGE ERROR.

CROSS-SECTIONAL SURVEY

A survey in which data are gathered on "a cross-section of the population at a single point in time" (Bailey 1982, p. 34).

DEATHS

Inactive frame elements (this report, p. 15).
A sampling unit which has been identified as out of business or out of scope (this report, p. 17).

DUPLICATION

See OVERCOVERAGE.

ELEMENTS

"The elements of a population are the units for which information is sought; they are the individuals, the elementary units comprising the population about which inferences are to be drawn. They are the units of analysis, and their nature is determined by the survey objectives" (Kish 1965, pp. 6-7).

"The smallest units into which the population can be divided" (Sukhatme and Sukhatme 1970, p. 222).

"Each entity from the population that is the ultimate sampling objective is called a sampling element" (Bailey 1982, p. 85).

"An object on which a measurement is taken" (Scheaffer, Mendenhall, and Ott 1986, p. 20).

ESTABLISHMENT

An establishment is "an economic unit, generally at a single physical location, where business is conducted or where services or industrial operations are performed; for example: a factory, mill, store, hotel, movie theater, mine, farm, ranch, bank, railroad depot, airline terminal, sales office, warehouse, or central administrative office" (U.S. Executive Office of the President 1987, p. 12).

FRAME

The frame is any material, device, etc., which is used to provide observational access to the population (Dalenius 1974).

A list of the sampling units which make up the population (Cochran 1963, p. 7).

"Physical lists and procedures that can account for all the sampling units without the physical effort of actually listing them" (Kish 1965, p. 53).

The frame consists of previously available descriptions of the objects or material related to the physical field in the form of maps, lists, directories, etc., from which sampling units may be constructed and a set of sampling units selected; and also information on communications, transport, etc., which may be of value in improving the design for the choice of sampling units, and in the formation of strata, etc. (United Nations 1964, p. 7).

See also SAMPLING UNITS.

FRAME POPULATION

"The materials or devices which delimit, identify, and allow access to the elements of the target population" (Wright and Tsao 1983, p. 26).

"Is the set of persons for whom some enumeration can be made prior to the selection of the survey sample" (Groves 1989, p. 82).

FRAME UNITS

See SAMPLING UNITS.

IN-SCOPE UNIT

Sampling units that, if properly classified, would be part of the population of interest and would be included in the frame.

LIST FRAMES

When the elements of the population have been numbered or otherwise identified, the population together with its identification system is called a list (Hansen, Hurwitz, and Madow 1953, Vol. II, p. 1).

Nongeographically defined units for drawing a sample (Hansen, Hurwitz, and Jabine 1963).

A list of all the sampling units in the population. This list provides the basis for the selection and identification of units in the sample (Sukhatme and Sukhatme 1970, p. 2).

LISTING ERRORS

An error in a sample estimate that occurs due to a failure to find units which should be listed, failure to classify a unit as being within the scope of the list, listing a unit which is not within the scope of the list, or listing a unit more than once. As defined in this report, a type of coverage error that occurs in surveys in which the frame sampling unit and the ultimate sampling unit for the survey are different.
Three basic listing errors are cited in this report: area segment listing errors, household listing errors, and nonhousehold listing errors (this report, p. 39).

LOCATION ERRORS

An error in a sample estimate that arises because of an incorrect association of reporting units with sampling units when the sampling units themselves are not uniquely or clearly defined or when they are difficult to locate. A type of rule-of-association error that is a source of coverage error (this report, p. 31).

LONGITUDINAL COVERAGE ERROR

An error in a sample estimate that results from unaccounted-for changes in the sample population from the first to subsequent interviews. A type of temporal error that results in coverage error (this report, p. 36).

LONGITUDINAL SURVEY

A survey in which data are gathered over an extended period of time (Bailey 1982, p. 34).

NONCOVERAGE

"Noncoverage includes the problems of 'incomplete frames,' a term that seems to imply omissions in preparing the frame, but also refers to 'missed units,' omissions due to faulty execution of survey procedures" (Kish 1965, p. 528).

"Missing elements, also called noncoverage and incomplete frame" (Kish 1965, p. 56).

"Households missing from a telephone survey sampling frame" (Groves, et al. 1988, p. 4).

See also UNDERCOVERAGE.

NONSAMPLING ERRORS

"An error in sample estimates which cannot be attributed to sampling fluctuations. Such errors may arise from many different sources such as defects in the frame, faulty demarcation of sample units, defects in the selection of sample units, mistakes in the collection of data due to personal variations or misunderstandings or bias or negligence or dishonesty on the part of the investigator or of the interviewer, mistakes at the stage of the processing of the data, etc." (Kendall and Buckland 1971).

"The error in an estimate arising at any stage in a survey from such sources as varying interpretation of questions by enumerators, unwillingness or inability of respondents to give correct answers, nonresponse, improper coverage, and other sources exclusive of sampling error. This definition includes all components of the Mean Square Error (MSE) except sampling variance" (U.S. Bureau of the Census (no date), p. 50).
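For orientation, the mean square error referenced in the definition above decomposes by a standard identity (a textbook result, not specific to this report):

    \mathrm{MSE}(\hat{\theta}) = \mathrm{Var}(\hat{\theta}) + [\mathrm{Bias}(\hat{\theta})]^2

In the survey setting the variance term is often further divided into sampling variance and nonsampling variance (e.g., response and processing variability); under the definition above, nonsampling error then comprises every component except the sampling variance, including any squared bias from coverage error.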
NONRESPONSE

See UNIT NONRESPONSE.

OBSERVATION UNITS

"The units from which the observations are obtained. In interview surveys they are called respondents" (Kish 1965, p. 8).

OBSERVATIONAL ERROR

"Observational errors are deviations of the answers of respondents from their true values on the measure" (Groves 1989, p. 11).

"Errors which are caused by obtaining and recording observations incorrectly" (Kish 1965, p. 520).

OUT-OF-SCOPE ELEMENTS

Elements that, if properly classified, would not be part of the population of interest. If properly classified, they would be dropped from the frame (this report, p. 20).

See also CLASSIFICATION ERRORS.

OUT-OF-SCOPE UNITS

Sampling units that, if properly classified, would not be part of the population of interest.

OVERCOVERAGE

"Target population units [which] ... are duplicated or enumerated in error" (Hansen, Hurwitz, and Madow 1953, Vol. I, p. 84).

Errors leading to the inclusion of units which are not members of the target population (this report, p. 1).

Some members of the population are represented more than once (this report, p. 14).

PLANT

See ESTABLISHMENT.

POPULATION OF INTEREST

See TARGET POPULATION.

RANDOM-DIGIT DIALING

"Random digit dialing (RDD) methods are based on the frame of all possible telephone numbers. The telephone number frame is commonly assembled by appending suffixes to area code-prefix combinations obtained from Bell Communications Research (BCR) for a fee" (Groves, et al. 1988, p. 81).
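To make the frame-construction procedure in the entry above concrete, the following minimal sketch (ours, not this report's; the area code-prefix combinations shown are hypothetical placeholders for the purchased BCR list) generates candidate telephone numbers by appending random four-digit suffixes:

    import random

    def generate_rdd_numbers(area_code_prefixes, n, seed=None):
        # Each candidate number is a known area code-prefix combination
        # plus a random four-digit suffix, so every possible suffix has
        # an equal chance of selection.
        rng = random.Random(seed)
        numbers = []
        for _ in range(n):
            area_code, prefix = rng.choice(area_code_prefixes)
            suffix = rng.randrange(10000)  # 0000 through 9999
            numbers.append(f"{area_code}-{prefix}-{suffix:04d}")
        return numbers

    # Hypothetical combinations; a production frame would draw on the
    # purchased list of working area code-prefix pairs noted above.
    combos = [("301", "555"), ("212", "555")]
    print(generate_rdd_numbers(combos, n=5, seed=1))

Because most randomly generated suffixes do not belong to working household numbers, such a frame overcovers nonworking and nonresidential numbers; two-stage designs such as that of Waksberg (1978, cited in the references) were developed to raise the proportion of working residential numbers dialed.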
RECORDING ERRORS

Error arising when information is correctly known but incorrectly recorded. As defined in this report, a type of nonsampling error that can cause coverage error (this report, p. 47).

RELEVANCE

"Standards of relevance are concerned with the difference between the ideal goal of a survey and the statistics called for by the survey specifications" (Hansen, Hurwitz, and Pritzker 1967).

RELEVANCE ERROR

See CONCEPTUAL ERROR.

RESPONDENT ERROR

See RESPONDENT REPORTING ERROR.

RESPONDENT REPORTING ERROR

In this report, respondent reporting error refers to all errors which occur during the interview process, whether they are caused by the interviewer, the respondent, vague concepts, faulty instructions, imprecise questions, or the combined effects of several of these. A type of household listing error that can cause coverage error (this report, p. 43).

RULES OF ASSOCIATION

Those rules which allow establishment of a linkage between a selection of listed units with known probabilities to a selection of reporting units with known probabilities (Hansen, Hurwitz, and Jabine 1963).

Delineate the relationship between sampling units and the final reporting unit (this report, p. 31).

Also known as rules of correspondence (Groves 1989, p. 99).

RULE-OF-ASSOCIATION ERROR

An error in an estimate that results when the relationship between sampling units and the final reporting unit is delineated incorrectly. Rule-of-association errors have been classified into three basic types for purposes of this report: location errors, classification errors, and temporal errors. Rule-of-association errors are a source of coverage error (this report, p. 31).

SAMPLING FRAME

See FRAME.

SAMPLING UNITS

"These units must cover the whole of the population and they must not overlap, in the sense that every element in the population belongs to one and only one unit" (Cochran 1963, p. 7).

The population is subdivided into a finite number of distinct and identifiable units called sampling units (Sukhatme and Sukhatme 1970, p. 2).

"Contain the elements, and they are used for selecting elements into the sample" (Kish 1965, p. 8).

"A sampling unit is either a single sampling element or a collection of elements" (Bailey 1982, p. 85).

"Nonoverlapping collections of elements from the population that cover the entire population" (Scheaffer, Mendenhall, and Ott 1986, p. 21).

"The elements of the population from which we select the sample" (Hansen, Hurwitz, and Madow 1953, Vol. II, p. 5).

TARGET POPULATION

"The population about which information is wanted" (Cochran 1963, p. 6).

"The set of persons of finite size which will be studied" (Groves 1989, p. 82).

TEMPORAL ERRORS

An error in an estimate that results when the frame or sample is not updated to represent the population of interest for the survey's reference period. A type of rule-of-association error that is a source of coverage error (this report, p. 36).

TYPE A NONINTERVIEW

U.S. Bureau of the Census nomenclature used to indicate a noninterview of a household that is occupied by persons eligible for interview (this report, p. 34).

See also UNIT NONRESPONSE.

TYPE B NONINTERVIEW

U.S. Bureau of the Census nomenclature used to indicate a noninterview of a household that is either unoccupied but could become occupied, or occupied by persons not eligible for interview (this report, p. 34).

TYPE C NONINTERVIEW

U.S. Bureau of the Census nomenclature used to indicate a noninterview of a household because the sampling unit is ineligible for the sample (this report, p. 34).

UNDERCOVERAGE

"Failure to include in the frame all units belonging to the defined population" (U.S. Bureau of the Census (no date), p. 48).

"The number of persons in telephone households who are not enumerated in sample households in a telephone survey" (Groves, et al. 1988, p. 4).

See also NONCOVERAGE.

UNITS

See SAMPLING UNITS.

UNIT NONRESPONSE

"Unit nonresponse occurs if a unit is selected for the sample and is eligible for the survey, but no response is obtained for the unit or the obtained response is unusable" (Madow, et al. 1983, Vol. 1, p. 18).

"The failure to elicit responses for units of analysis in a population or sample because of various reasons such as absence from home, failure to return questionnaire, refusals, omission of one or more entries in a form, vacant houses, etc." (U.S. Bureau of the Census (no date), p. 50).

"We shall use the term nonresponse to refer to the failure to measure some of the units in the selected sample" (Cochran 1963, p. 355).

"In sample surveys, the failure to obtain information from a designated individual for any reason (death, absence, refusal to reply) is often called nonresponse" (Kendall and Buckland 1971).

"Nonresponse refers to many sources of failure to obtain observations (response, measurements) on some elements selected and designated for the sample" (Kish 1965, p. 532).

"Is an error of nonobservation. Nonresponse is the failure to obtain complete measurements on the survey sample" (Groves 1989, p. 133).

As cited in this report, a source of coverage error (p. 1).

REFERENCES

Adams, D. (1989), "Recommendation to Not Consider the Half-open Interval Approach to Sample Selection (Doc. #4.2-R-2)," Unpublished memorandum to Work Group 4, U.S. Bureau of the Census.

Alexander, C. (1986), "The Present Consumer Expenditure Survey's Weighting Method," in Population Controls and Weighting Sample Visits, Washington, DC: U.S. Bureau of Labor Statistics.

Anderson, D., Schoenberg, B., and Haerer, A. (1988), "Prevalence Surveys of Neurologic Disorders: Methodologic Implications of the Copiah County Study," Journal of Clinical Epidemiology, 41, 339-345.

Armington, C., and Odle, M. (1981), "Sources of Employment Growth, 1978-1980," Mimeograph, Washington, DC: Brookings Institution, Business Microdata Project.

Bailar, B. (1984), "The Quality of Survey Data," in Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 43-52.

Bailey, K. (1982), Methods of Social Research (2nd ed.), New York: The Free Press.

Beller, N. (1979), "Error Profile for Multiple-Frame Surveys," Economic Statistics and Cooperative Service Report 63, Washington, DC: U.S. Department of Agriculture. Also in Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 221-222.

Bernhardt, M., and Helfand, S. (1980), "Reconciliation of the Economic Censuses Results and Current Survey Programs," in Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 169-174.
"A Longitudinal Analysis of Bounding; Respondent Conditioning and Mobility as Sources of Panel Bias in the National Crime Survey," Proceedings of the Social Statistics Section, American Statistical Association, pp. 708-713.   Biemer, P. (1983), "Optimal Dual Frame Sample Design: Results of a Simulation Study," in Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 630-635.   Birch, D.L. (1979), "The Job Generation Process," Mimeograph, Cambridge, MA: Institute of Technology, Program on Neighborhood and Regional Change.   Blalock, H. (1968), "The Measurement Problem: The Gap Between the Languages of Theory and Research," in Methodology in Social Research, eds. H. Blalock and A. Blalock, New York: McGraw-Hill, pp. 1-27.   Bosecker, R. (1984), "List vs. Area Overlap Determination: List Dominant and Frozen Domain Procedures," National Agricultural Statistics Service Staff Report, Washington, DC: U.S. Department of Agriculture.   _____, and Clark, M. (1988), "Modifying the Weighted Estimator to Eliminate Screening Interviews in Residential Areas," National Agricultural Statistics Service Research Report, Washington, DC: U.S. Department of Agriculture.   106           _____, and Kelly, W. (1975), "Summary of Results from Nebraska Concept Study," Statistical Reporting Service Staff Report, Washington, DC: U.S. Department of Agriculture.   Casady, R., Nathan, G., and Sirken, M. (1985)," Alternative Dual System Network Estimators," International Statistical Review, 53, 183-197.   Clogg, C., Massagli, M., and Eliason, S. (1986), "Population Undercount as an Issue in Social Research," in Proceedings of the Second Annual Research Conference, U.S. Bureau of the Census, pp. 335-343.   Cochran, W.G. (1963), Sampling Techniques, New York: John Wiley and Sons, Inc.   Cohen, S., Flyer, P., and Potter, D. (1987), "Sample Design of the Medical Expenditure Survey Institutional Population Component," Paper presented at the Annual Meeting of the American Public Health Association, New Orleans, LA.   Colledge, M. (1989), "Coverage and Classification Maintenance Issues in Economic Surveys," Panel Surveys, eds. D. Kasprzyk, G. Duncan, G. Kalton, and M.P. Singh, New York: John Wiley and Sons, pp. 80-107.   Connor, J., Heeringa, S., and Jackson, J. (1985), "Measuring and Understanding Economic Change in Michigan," Mimeograph, University of Michigan, Institute for Social Research.   Cook, P. (1985), "The Case of the Missing Victims: Gunshot Woundings in the National Crime Survey," Journal of Quantitative Criminology, 1, 91-102.   Cotter, J., and Nealon, J. (1987), "Area Frame Design for Agricultural Surveys," National Agricultural Statistics Service Report, Washington, DC: U.S. Department of Agriculture.   Coulter, R., and Mergerson, J. (1978), "An Application of a Record Linkage Theory in Constructing a List Sampling Frame," in Proceedings of the Tenth Symposium on the Interface of Computer Science and Statistics, pp. 416-420.   Cowan, C., Breakey, W., and Fischer, P. (1988), "The Methodology of Counting the Homeless," in Homelessness, Health and Human Needs, Washington, DC: National Academy Press.   Crank, K. (1979), "The Use of Current Partial Information to Adjust for Nonrespondents," Statistical Reporting Service Memorandum, Washington, DC: U.S. Department of Agriculture.   Czaja, R., Snowden, C., and Casady, R. 
(1986), "Reporting Bias and Sampling Errors in a Survey of a Rare Population Using Multiplicity Counting Rules," Journal of the American Statistical Association, 81, 411-419.   Dalenius, T. (1974), Ends and Means of Total Survey Design, Stockholm: University of Stockholm.   _____, (1985), "Elements of Survey Sampling," Notes prepared for the Swedish Agency for Research Cooperation with Developing Countries (SAREC).   Deming, W. (1960), Sample Design in Business Research, New York: John Wiley and Sons, Inc.   107           _____, (1961), "Uncertainties in Statistical Data, and Their Relation to the Design and Management of Statistical Surveys and Experiments," Bulletin of the International Statistical Institute, 38, Part IV, 365-383.   Fay, R. (1989), "An Analysis of Within-Household Undercoverage in the Current Population Survey," in Proceedings of the Fifth Annual Research Conference, U.S. Bureau of the Census, pp. 156-175.   _____, Passel, J., and Robinson, J. (1988), "The Coverage of Population in the 1980 Census," Publication PHC80-E4, Washington, DC: U.S. Government Printing Office.   Fellegi, I., and Sunter, A. (1969), "A Theory for Record Linkage," Journal of the American Statistical Association,64,1183-1210. Also in Record Linkage Techniques-1985,eds. B. Kilss and W. Alvey, Washington, DC: Internal Revenue Service, pp. 51-78.   Gleason, C., and Bosecker, R. (1978), "The Effect of Refusals and Inaccessibles in List Frame Estimates," Economic Statistics and Cooperative Service Report, Washington, DC: U.S. Department of Agriculture.   Groves, R. (1989), Survey Error and Survey Costs, New York: John Wiley and Sons, Inc.   ______, Biemer, P., Lyberg, L., Massey, J., Nicholls, W., and Waksberg, J. (eds.) (1988), Telephone Survey Methodology, New York: John Wiley and Sons, Inc.   Grzesiak, T., and Tupek, A. (1987), "Measuring Employment of New Businesses in the Cur-rent Employment Statistics Survey," Paper presented at the International Roundtable on Business Survey Frames.   Gurney, M., and Gonzalez, M. (1972), "Estimates for Samples from Frames When Some Units Have Multiple lastings," Proceedings of the Social Statistics. Section, American Statistical Association, pp. 283-288.   Hainer, P. (1987), "A Brief and Qualitative Anthropological Study Exploring the Reasons for Census Coverage Error Among Low Income Black Households," Report submitted to U.S. Bureau of the Census.   _____, Hines, C., Martin, E., and Shapiro, G. (1988), "Research on Improving Coverage in Household Surveys," Proceedings of the Fourth Annual Research Conference, US. Bureau of the Census, pp. 513-539.   Hanczaryk, P., and Sullivan, J. (1980), "Evaluation of Coverage of the Administrative Records Frame for the 1977 Economic Censuses - Employer Segment," Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 154-159.   Hansen, M., Hurwitz, W., and Jabine, T. (1963), "The Use of Imperfect Lists for Probability Sampling at the U.S. Bureau of the Census," Bulletin of the International Statistical Institute, 40, 497-517.   _____, and Madow, W. (1953), Sample Survey Methods and Theory (Vol. I, Methods and Applications), New York: John Wiley and Sons, Inc.   _____, ______, and ______, (1953), Sample Survey Methods and Theory (Vol. II, Theory), New York: John Wiley and Sons, Inc.   108           ______, ______, and Pritzker, H. 
(1967), "Standardization of Procedures for Data, Measurement Errors and Statistical Standards in the Bureau of the Census," Bulletin of the International Statistical Institute, 42, Part I, 49-64.   Hanson, R. (1978), "The Current Population Survey: Design and Methodology," Technical Paper No. 40, Washington, DC: U.S. Government Printing Office.   Hartley, H.O. (1962), "Multiple Frame Surveys," Proceedings of the Social Statistics Section, American Statistical Association, pp. 203-206.   Harwood, A. (1970), "Participant Observation and Census Data in Urban Research," Paper presented at the Annual Meeting of the American Anthropological Association, San Diego, CA.   Hauber, F., Bruininks, R., Hill, R., Lakin, K., and White, C. (1984), "National Census of Residential Facilities: Fiscal Year 1982," Report, University of Minnesota, Department of Educational Psychology.   Hawkes, W., Jr. (1985), "Census Data Quality - A User's View," Proceedings of the First Annual Research Conference, U.S. Bureau of the Census, pp. 177-192.   Hill, G., and Rockwell, D. (1977), "Associating a Reporting Unit with a List Frame Sampling Unit," Internal memorandum, U.S. Department of Agriculture.   Hirschberg, D., Yuskavage, R., and Scheuren, F. (1977), "The Impact on Personal and Family Income of Adjusting the CPS for Undercoverage," Proceedings of the Social Statistics Section, American Statistical Association, pp. 70-80.   Jacobs, C. (1986), "Interim Evaluation of Listing Process Audit," Unpublished memorandum to Housing Working Group, U.S. Bureau of Labor Statistics.   Jean, A., and McArthur, E. (1984), "Some Data Collection Issues for Panel Surveys with Application to the SIPP," Survey of Income and Program Participation Working Paper Series No. 8407, Washington, DC: U.S. Bureau of the Census.   Jessen, R. (1978), Statistical Survey Techniques, New York: John Wiley and Sons, Inc.   Johnston, D., and Wetzel, J. (1969), "Effect of the Census Undercount on Labor Force Estimates," Monthly Labor Review, March, 3-13.   Joncas, M. (1985), "Cluster Listing Check Program for the Redesigned LFS Sample," Unpublished report, Ottawa: Statistics Canada.   Kalton, G., and Anderson, D. (1986), "Sampling Rare Populations," Journal of the Royal Statistical Society - A, 149 (Pt. 1), 65-82.   _____, and Lepkowski, J. (1985), "Following Rules in SIPP," Journal of Economic and Social Measurement, 13, 319-328.   Kendall, M.G., and Buckland, W.R. (1971), A Dictionary of Statistical Terms (3rd ed.), international Statistical Institute.   King, K. (1988), "SIPP: Monitoring the Rates of Ineligible Households," Internal memorandum, U.S. Bureau of the Census, February 24.   109           ______, Petroni, R., and Singh, R. (1987), "Quality Profile for the Survey of Income and Program Participation," SEPP Working Paper Series 8708, Washington, DC: U.S. Bureau of the Census.   Kish, L. (1965), Survey Sampling, New York: John Wiley and Sons, Inc.   Konschnik, C. (1987), "Summary of the Results of the Area Sample Recheck for the Period Aug. 1986-Oct. 1986," Internal memorandum, U.S. Bureau of the Census.   Lepkowski, J. (1988), "Telephone Sampling Methods in the United States," in Telephone Survey Methodology, eds. R. Groves, et al., New York: John Wiley and Sons, Inc., pp. 73-98.   _____, and Groves, R. (1986), "A Mean Squared Error Model for Dual Frame, Mixed Mode Survey Design," Journal of the American Statistical Association, 81, 930-937.   Lessler, L. 
(1980), "Errors Associated With the Frame," Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 125-130.   Linebarger, J. (1975), "New Construction Time-Lag Study," Internal memorandum to C. Bostrum dated August 22, U.S. Bureau of the Census.   Madow, W., Nisselson, H.. and Olkin, I., (eds.) (1983), Incomplete Data in Sample Surveys, 3 Vols., New York: Academic Press.   Manton, Y, (1988),."A Longitudinal Study of Functional Change and Mortality in the United States," Journal of Gerontology, 43, 5153- 5161.   Marks, E., and Nisselson, H. (1977), "Problems of Nonsampling Error in the Survey of Income and Education: Coverage Evaluation," in Proceedings of the Social Statistics Section, American Statistical Association, pp. 414-417.   _____, Seltzer, W., and Krotki, K. (1974), Population Growth Estimation: A Handbook of Vital Statistics Measurement, New York: The Population Council.   Martin, E. (1981), "A Twist on the Heisenberg Principle: Or, How Crime Affects Its Measurement," Social Indicators Research, 9, 197- 223.   Matthews, R. (1988), "Screening Residential Tracts for Agricultural Activity," National Agricultural Statistics Service Staff Report, Washington, DC: U.S. Department of Agriculture.   McArthur, E., and Short, K. (1986), "Measurement of Attrition from SIPP Through the Fifth Wave of the 1984 Panel," Internal memorandum, U.S. Bureau of the Census.   McDonald, R. (1984), "The "Underground Economy"and BLS Statistical Data," Monthly Labor Review, January, 4-18.   McGowan, H. (1982), "Telephone Ownership in the National Crime Survey," Unpublished memorandum, U.S. Bureau of the Census.   Montie, I.C., and MacKenzie, W. (1978), "Open-Ended Segments: Variation on Area Segmenting and List Frame Supplementation," in Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 233-237.   110           Montie, I.C. and Schwanz, D.J. (1977), "Coverage Improvement in the Annual Housing Survey," .in Proceedings of the Social Statistics Section, American Statistical Association, pp. 163-172.   Moser, C.A., and Kalton, G. (1971), Survey Methods in Social Investigation, (2nd ed.), New York: Basic Books, Inc.   Nealon, J. (1984), "Review of the Multiple Frame and Area Frame Estimator," National Agricultural Statistical Service Staff Report, Washington, DC: U.S. Department of Agriculture.   Newbrough, U. (1988), "CPS Reinterview Quality Control Results for 1987," Internal memorandum, U.S. Bureau of the Census.   Parsons, V., and Casady, R. (1986), "Variance Estimation and the Redesigned National Health Interview Survey," in Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 406-411.   Pennie, K. (1990), "Coverage Comparisons Between the 1980 Census and the Current Population Survey," Internal memorandum to P. Waite, U.S. Bureau of the Census, in draft.   Pollard, A., Usef, F. and Pollard, G. (1974), Demographic Techniques, Rushcutters Bay, Australia: Pergamon Press Pty. Ltd.   Potter, D., Cohen, S., and Mueller, C. (1987), The 1986 Inventory of Long-term Care Places as a Sampling Frame," Paper presented at the Annual Meeting of the American Statistical Association, San Francisco, CA.   Research Triangle Institute (RTI) (1981), "Complementing the Survey of Hospitals and Other Health Institutions (1980 Complement Survey)," Project report, Rn Contract No. 255 U1913, Research Triangle Park, NC: Author.   Scheaffer, R., Mendenhall, W., and Ott, W. 
Scheaffer, R., Mendenhall, W., and Ott, L. (1986), Elementary Survey Sampling (3rd ed.), Boston: Duxbury Press.

Scheuren, F., and Oh, H.L. (1985), "Fiddling Around with Nonmatches and Mismatches," in Record Linkage Techniques-1985, eds. B. Kilss and W. Alvey, Washington, DC: U.S. Internal Revenue Service, pp. 79-88.

Schreiner, I. (1987), "CPS Reinterview Quality Control Results for 1986," Internal memorandum, U.S. Bureau of the Census.

Schwanz, D. (1988a), "1985 Type-A Unable-to-Locate Rates for the AHS National Unit Samples," Internal memorandum, U.S. Bureau of the Census.

_____, (1988b), "Mobile Home New Construction for 1985 AHS - National," Internal memorandum to Ed Montfort dated February 3, U.S. Bureau of the Census.

Science Applications International Corporation (SAIC) (1985), "Evaluation of the Quality and Utility of the National Master Facility Inventory," Final report to the Division of Health Care Statistics, National Center for Health Statistics Contract 282-83-2114, Vienna, VA: Author.

Shapiro, G. (1979), "Coverage Comparisons Between the Census and Current Population Survey (CPS)," Internal memorandum to Charles D. Jones, U.S. Bureau of the Census.

_____, (1986), "Second Set of Experimental Interviews," Internal memorandum, U.S. Bureau of the Census, July 16.

_____, and Kostanich, D. (1988), "High Response Error and Poor Coverage Are Severely Hurting the Value of Household Surveys," in Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 443-448.

Shimizu, I. (1983), "Identifying and Obtaining the Yellow Pages for a National Area Sample," Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 558-562.

_____, (1986), "The 1985 National Nursing Home Survey Design," Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 516-520.

Singh, R. (1989), "SIPP 85: Household Coverage," Internal memorandum to G. Shapiro, U.S. Bureau of the Census, June 22.

Sirken, M. (1970), "Household Surveys with Multiplicity," Journal of the American Statistical Association, 65, 257-266.

_____, (1983), "Handling Missing Data by Network Sampling," in Incomplete Data in Sample Surveys, eds. W. Madow, H. Nisselson, and I. Olkin, Vol. 2, New York: Academic Press, pp. 81-90.

_____, and Levy, P. (1974), "Multiplicity Estimation of Proportions Based on Ratios of Random Variables," Journal of the American Statistical Association, 69, 68-74.

_____, and Royston, P. (1976), "Effect of Selected Survey Design Factors on the Registered Deaths Reported in a Single-time Retrospective Household Survey," in Proceedings of the Social Statistics Section, American Statistical Association, pp. 773-777.

Spillman, B. (1989), Internal memorandum to A. Swell, National Center for Health Services, June 12.

Statt, R., Vacca, E., Wolters, C., and Hernandez, R. (1981), "Problems Associated with Using Building Permits as a Frame of Post-Census Construction: Permit Lag and ED Identification," Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 226-231.

Strahan, G. (1987), "Nursing Home Characteristics: Preliminary Data from the 1985 National Nursing Home Survey," Advance Data from Vital and Health Statistics, No. 131, Publication PHS-87-1250, Washington, DC: U.S. Government Printing Office.

Sukhatme, P.V., and Sukhatme, B.V. (1970), Sampling Theory of Surveys with Applications (2nd rev. ed.), Ames, IA: Iowa State University Press.
Teitz, M., Glasmeier, A., and Svensson, D. (1981), "Small Business and Employment Growth in California," Working Paper No. 348, University of California at Berkeley, Institute of Urban and Regional Development.

Thomas, A. (1986), "BLS Establishment Estimates Revised to March 1985 Benchmarks," Washington, DC: U.S. Bureau of Labor Statistics.

Thornberry, O., and Massey, J. (1988), "Trends in United States Telephone Coverage Across Time and Subgroups," in Telephone Survey Methodology, eds. R. Groves, et al., New York: John Wiley and Sons, Inc., pp. 25-51.

Tortora, R. (1987), "Quantifying Nonsampling Errors and Bias," Journal of Official Statistics (Statistics Sweden), 3, 339-342.

United Nations (1964), Recommendations for Preparation of Sampling Survey Reports (Provisional Issue), Series C, No. 1 Rev. 2, New York.

_____, (1982), Nonsampling Errors in Household Surveys: Sources, Assessment and Control, New York: Author.

U.S. Bureau of Labor Statistics (1985), Employment and Earnings, 32 (12).

_____, (1986), Consumer Expenditure Survey, Washington, DC: U.S. Government Printing Office.

_____, (1988), BLS Handbook of Methods, Bulletin 2285, Washington, DC: U.S. Government Printing Office.

_____, (1989), Employment and Earnings, 36 (12).

U.S. Bureau of the Census (no date), Course on Nonsampling Errors, Lectures 1-9, International Statistics Program Center, Washington, DC.

_____, (1968), "The Current Population Survey Reinterview Program, January 1961 through December 1966," Technical Paper 19, Washington, DC: U.S. Government Printing Office.

_____, (1971), "The Annual Survey of Manufactures: A Report on Methodology," Technical Paper 24, Washington, DC: U.S. Government Printing Office.

_____, (1973), "The Coverage of Housing in the 1970 Census," Report PHC(E)-5, Washington, DC: U.S. Government Printing Office.

_____, (1984), "Census of Agriculture, Coverage Evaluation," Vol. 2, Pt. 2, Publication AC82-SS-2, Washington, DC: U.S. Government Printing Office.

_____, (1986), "Appendix A: Source and Reliability Statement for the Long-term Care Survey," in National Long-term Care Survey and National Survey of Informal Caregivers, 1982 Report on Methods and Procedures Used in the Survey, Part 1, Documentation, Springfield, VA: U.S. National Technical Information Services, Order No. 86-161783, p. A-3.

_____, (1987), "Programs to Improve Coverage in the 1980 Census," Report PHC80-E3, Washington, DC: U.S. Government Printing Office.

_____, (1989), Current Construction Reports, Housing Starts, C20-89-4, Housing Starts Compilation, Washington, DC: U.S. Government Printing Office.

U.S. Department of Commerce (1978a), "An Error Profile: Employment as Measured by the Current Population Survey," Statistical Policy Working Paper 3, Washington, DC: U.S. Government Printing Office.

_____, (1978b), "Glossary of Nonsampling Error Terms: An Illustration of a Semantic Problem in Statistics," Statistical Policy Working Paper 4, Washington, DC: U.S. Government Printing Office.

_____, (1980), "Report on Statistical Uses of Administrative Records," Statistical Policy Working Paper 6, Washington, DC: U.S. Government Printing Office.

U.S. Executive Office of the President, Office of Management and Budget (1987), Standard Industrial Classification Manual: 1987, Order No. PB87-1000012, Springfield, VA: National Technical Information Service.
U.S. National Center for Health Statistics (1965), "Development and Maintenance of a National Inventory of Hospitals and Institutions," Vital and Health Statistics, Series 1, No. 6 (PHS Publication Number 1000), Washington, DC: U.S. Government Printing Office.

_____, (1968), "The Agency Reporting System for Maintaining the National Inventory of Hospitals and Institutions," Vital and Health Statistics, Series 1, No. 6 (PHS Publication Number 1000), Washington, DC: U.S. Government Printing Office.

_____, (1983), "Nursing and Related Care Homes as Reported from the 1980 National Master Facility Inventory Survey," by A. Sirrocco, Vital and Health Statistics, Series 14, No. 29 (PHS Publication Number 84-1824), Washington, DC: U.S. Government Printing Office.

_____, (1986), "Nursing and Related Care Homes as Reported from the 1982 National Master Facility Inventory Survey," by D. Roper, Vital and Health Statistics, Series 14, No. 32 (PHS Publication Number 86-1827), Washington, DC: U.S. Government Printing Office.

_____, (1987), "Public Use Data Tape Documentation: The 1986 Inventory of Long-term Care Places," NTIS Publication 88-110614, Springfield, VA: National Technical Information Service.

U.S. Office of Management and Budget (1986), "Federal Longitudinal Surveys," Statistical Policy Working Paper 13, Washington, DC: U.S. Government Printing Office.

Valentine, C., and Valentine, B. (1971), "Summary of Missing Men - A Comparative Methodological Study of Under-numeration and Related Problems," Unpublished paper, U.S. Bureau of the Census.

Vogel, F., and Bosecker, R. (1974), "Multiple Frame Livestock Surveys, A Comparison of Area and List Sampling," National Agricultural Statistics Service Staff Report, Washington, DC: U.S. Department of Agriculture.

_____, and Rockwell, D. (1977), "Fiddling with Area Frame Information in List Development and Maintenance," Washington, DC: U.S. Department of Agriculture.

Waite, P.J. (1989), "Listing Accuracy in HIS Blocks," Internal memorandum to E. Davey and S.D. Matchett dated October 26, U.S. Bureau of the Census.

Waksberg, J. (1978), "Sampling Methods for Random Digit Dialing," Journal of the American Statistical Association, 73, 40-46.

Williams, L., and Chakrabarty, R. (1983), "The Michigan State Random Digit Dialing Survey of Sportsmen and Wildlife Associated Recreation," in Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 648-653.

Wright, T., and Tsao, H. (1983), "A Frame on Frames: An Annotated Bibliography," in Statistical Methods and the Improvement of Data Quality, ed. T. Wright, New York: Academic Press, pp. 25-72.

Reports Available in the Statistical Policy Working Paper Series

1. Report on Statistics for Allocation of Funds (NTIS Document Sales, PB86-211521/AS)
2. Report on Statistical Disclosure and Disclosure-Avoidance Techniques (NTIS Document Sales, PB86-211539/AS)
3. An Error Profile: Employment as Measured by the Current Population Survey (NTIS Document Sales, PB86-214269/AS)
4. Glossary of Nonsampling Error Terms: An Illustration of a Semantic Problem in Statistics (NTIS Document Sales, PB86-211547/AS)
5. Report on Exact and Statistical Matching Techniques (NTIS Document Sales, PB86-215829/AS)
6. Report on Statistical Uses of Administrative Records (NTIS Document Sales, PB86-214285/AS)
7. An Interagency Review of Time-Series Revision Policies (NTIS Document Sales, PB86-232451/AS)
8. Statistical Interagency Agreements (NTIS Document Sales, PB86-230570/AS)
9. Contracting for Surveys (NTIS Document Sales, PB83-233148)
10. Approaches to Developing Questionnaires (NTIS Document Sales, PB84-105055/AS)
11. A Review of Industry Coding Systems (NTIS Document Sales, PB84-135276)
12. The Role of Telephone Data Collection in Federal Statistics (NTIS Document Sales, PB85-105971)
13. Federal Longitudinal Surveys (NTIS Document Sales, PB86-139730)
14. Workshop on Statistical Uses of Microcomputers in Federal Agencies (NTIS Document Sales, PB87-166393)
15. Quality in Establishment Surveys (NTIS Document Sales, PB88-232921)
16. A Comparative Study of Reporting Units in Selected Employer Data Systems (NTIS Document Sales, PB90-205238)
17. Survey Coverage (NTIS Document Sales, PB90-205246)
18. Data Editing in Federal Statistical Agencies (NTIS Document Sales, PB90-205253)
19. Computer Assisted Survey Information Collection (NTIS Document Sales, PB90-205261)

Copies of these working papers may be ordered from NTIS Document Sales, 5285 Port Royal Road, Springfield, VA 22161; (703) 487-4650.
