U.S. Department of Commerce

Survey of Business Owners (SBO)


2007 Methodology

SOURCES OF THE DATA

The 2007 Survey of Business Owners (SBO) questionnaire, Form SBO-1, was mailed to a random sample of businesses selected from a list of all firms operating during 2007 with receipts of $1,000 or more, except those classified in the following NAICS industries:

  • Crop and Animal Production (NAICS 111, 112)
  • Scheduled Passenger Air Transportation (NAICS 481111)
  • Rail Transportation (NAICS 482)
  • Postal Service (NAICS 491)
  • Funds, Trusts, and Other Financial Vehicles (NAICS 525)
  • Religious, Grantmaking, Civic, Professional, and Similar Organizations (NAICS 813)
  • Private Households (NAICS 814)
  • Public Administration (NAICS 92)

The list of all firms (or universe) was compiled from a combination of business tax returns and data collected on other economic census reports. The Census Bureau obtained electronic files from the Internal Revenue Service (IRS) for all companies reporting any business activity on any one of the following 2007 IRS tax forms:

  • 1040 Schedule C, "Profit or Loss from Business" (Sole Proprietorship)
  • 1065, "U.S. Return of Partnership Income"
  • any one of the 1120 corporation tax forms
  • 941, "Employer's Quarterly Federal Tax Return"
  • 944, "Employer's Annual Federal Tax Return"

The IRS provided certain identification, classification, and measurement data for businesses filing those forms.

For most firms with paid employees, the Census Bureau also collected employment, payroll, receipts, and kind of business for each plant, store, or physical location during the 2007 Economic Census.

For the 2007 SBO, firms could report either electronically, through Census Taker, the Census Bureau's secure online interactive application, or by returning the completed form by mail. Follow-up forms were mailed to all delinquent respondents at one-month intervals: three remailings to employer firms and two to nonemployer firms. The returned forms underwent extensive review and computer processing. All reports were geographically coded, data-keyed, and edited. The editing process identified records with significant problems, which were then corrected interactively using standard procedures.

The data were then tabulated by 2007 NAICS industry and subjected to further analysis, and the resulting corrections were applied to individual computer records. Corrected tabulations were then produced for the final published results, available through American FactFinder (AFF), the Census Bureau's online, self-service data access tool.

The 2007 SBO-1 report form is available at http://www2.census.gov/econ/sbo/sample_forms/sbo1_2007.pdf.

A more detailed examination of census methodology is presented in the History of the 2007 Economic Census at http://www.census.gov/econ/census07/www/methodology/history.html.

INDUSTRY CLASSIFICATION OF FIRMS

A firm is a business organization or entity consisting of one domestic establishment (location) or more under common ownership or control. All establishments are included as part of the owning or controlling firm. For the economic census, the terms "firm" and "company" are synonymous.

The classifications for all firms are based on the North American Industry Classification System, United States, 2007 manual. Changes between 2002 and 2007 are discussed in the introductory text of that manual and are published online at http://www.census.gov/eos/www/naics/.

Firms with more than one domestic establishment are counted in each industry and geographic area in which they operate, but only once in the total for all sectors and in the national and state totals. The method of assigning classifications, and the level of detail at which single- and multi-unit firms were classified, depended on whether an economic census report form was obtained at the establishment level.

  1. Establishments that returned an economic census report form were classified on the basis of their self-designation, product sales or shipments, and responses to other industry-specific inquiries.
  2. Establishments without an economic census report form:
    1. Small employers not sent a form were, where possible, classified on the basis of the most current kind-of-business or industry classification available from one of the Census Bureau's current sample surveys or the 2007 Economic Census. Otherwise, the classification was obtained from administrative records of other federal agencies. If the census or administrative record classifications proved inadequate (that is, none corresponded to a 2007 Economic Census classification at the level of detail required for employers), the firm was sent a brief inquiry requesting the information needed to assign a kind-of-business or industry code.
    2. Nonemployers were classified on the basis of information obtained from administrative records of other federal agencies.

PRECAUTIONS IN ANALYZING AND INTERPRETING DATA

The SBO covers both firms with paid employees and firms with no paid employees. Although firms with no paid employees are included in this survey, they are omitted from many of the economic census reports. Because the 2007 SBO includes firms with no paid employees, caution should be exercised in comparing 2007 SBO data with published or unpublished data from other reports of the 2007 Economic Census.

All survey and census results contain measurement errors and may contain sampling errors. Information about these potential errors is provided or referenced with the data or the source of the data. The Census Bureau recommends that data users incorporate this information into their analyses as these errors could impact inferences. Researchers analyzing data to create their own estimates are responsible for the validity of those estimates and should not cite the Census Bureau as the source of the estimates but only as the source of the core data.

Please contact the Census Bureau for more detailed information and interpretation of the sampling and nonsampling errors.

BASIS OF REPORTING

The Economic Census is conducted on an establishment basis. A company operating at more than one location is required to file a separate report for each store, factory, shop, or other location. Each establishment is assigned a separate industry classification based on its primary activity and not that of its parent company. (For selected industries, only payroll, employment, and classification are collected for individual establishments, while other data are collected on a consolidated basis.)

The Survey of Business Owners (SBO) is conducted on a company or firm basis rather than an establishment basis. A company or firm is a business consisting of one or more domestic establishments that the reporting firm specified under its ownership or control at the end of 2007.

SAMPLING AND ESTIMATION METHODOLOGIES

Sampling. To design the 2007 SBO sample, the Census Bureau used the following sources of information to estimate the probability that a business was minority- or women-owned:

  • Administrative data from the Social Security Administration.
  • Lists of minority- and women-owned businesses published in syndicated magazines, located on the Internet, or disseminated by trade or special interest groups.
  • Word strings in the company name indicating possible minority ownership (derived from 2002 survey responses).
  • Racial distributions for various state-industry classes (derived from 2002 survey responses) and racial distributions for various ZIP Codes.
  • Gender, ethnicity, race, and veteran status responses of a single-owner business to a previous SBO or to the 2000 Decennial Census.

These probabilities were then used to place each firm in the SBO universe in one of nine frames for sampling:

  • American Indian
  • Asian
  • Black or African American
  • Hispanic
  • Non-Hispanic white men
  • Native Hawaiian and Other Pacific Islander
  • Other (a different race was supplied as a write-in to another source)
  • Publicly owned
  • Women

The SBO universe was stratified by state, industry, frame, and whether the company had paid employees in 2007. The Census Bureau selected large companies, including those operating in more than one state, with certainty, based on their volume of sales, payroll, or number of paid employees. Certainty cases had a selection probability of one and a sampling weight of one, so each represented only itself. Certainty cutoffs and sampling rates varied by stratum, depending on the number of firms in a particular industry in a particular state. The remaining universe was subjected to stratified systematic random sampling.
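As a rough illustration of the design just described, the Python sketch below samples a single stratum: firms at or above a certainty cutoff are taken with probability one, and the remaining firms are drawn by systematic sampling from a random start. The `Firm` type, the use of receipts as the size measure, the cutoff and rate values, and sorting by firm ID are all hypothetical assumptions; the actual SBO cutoffs and sort order are not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Firm:
    firm_id: str
    receipts: float  # hypothetical size measure used for the certainty cutoff


def sample_stratum(firms, certainty_cutoff, rate, start):
    """Return (firm, sampling_weight) pairs for a single stratum.

    Firms at or above certainty_cutoff are certainty cases (weight 1).
    The rest are sampled systematically at `rate` (0 < rate <= 1),
    beginning at `start`, a random number in [0, 1).
    """
    certainty = [f for f in firms if f.receipts >= certainty_cutoff]
    # Hypothetical sort order for the systematic pass.
    rest = sorted((f for f in firms if f.receipts < certainty_cutoff),
                  key=lambda f: f.firm_id)
    selected = [(f, 1.0) for f in certainty]
    if rest and rate > 0:
        interval = 1.0 / rate           # sampling interval
        pos = start * interval          # random start within the first interval
        while pos < len(rest):
            selected.append((rest[int(pos)], interval))  # weight = 1 / rate
            pos += interval
    return selected


# Ten firms in one stratum; the two largest are certainty cases.
firms = [Firm(f"F{i:02d}", receipts=i * 100_000.0) for i in range(1, 11)]
sample = sample_stratum(firms, certainty_cutoff=900_000.0, rate=0.5, start=0.0)
```

Here the two certainty firms carry weight 1 and the four systematically sampled firms carry weight 2, so the weights sum to the stratum size (2 × 1 + 4 × 2 = 10). That property is what lets the weighted sample stand in for the whole stratum.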

Each firm selected into the sample was mailed a questionnaire asking, for up to four persons owning the largest percentages of the business, their percentage of ownership, gender, ethnicity, race, and veteran status, along with several other characteristics (e.g., age, education level).

Tabulation. Business ownership is defined as having 51 percent or more of the stock or equity in the business and is categorized by:

  • All firms classifiable by gender, ethnicity, race, and veteran status
    • Gender
      • Female-owned
      • Male-owned
      • Equally male-/female-owned
    • Ethnicity
      • Hispanic
        • Mexican, Mexican American, Chicano
        • Puerto Rican
        • Cuban
        • Other Hispanic, Latino, or Spanish origin
      • Equally Hispanic/non-Hispanic
      • Non-Hispanic
    • Race
      • White
      • Black or African American
      • American Indian and Alaska Native
      • Asian
        • Asian Indian
        • Chinese
        • Filipino
        • Japanese
        • Korean
        • Vietnamese
        • Other Asian
      • Native Hawaiian and Other Pacific Islander
        • Native Hawaiian
        • Samoan
        • Guamanian or Chamorro
        • Other Pacific Islander
      • Some other race
      • Minority
      • Equally minority/nonminority
      • Nonminority
    • Veteran status
      • Veteran-owned
      • Equally veteran-/nonveteran-owned
      • Nonveteran-owned
  • Publicly held and other firms not classifiable by gender, ethnicity, race, and veteran status
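To make the 51 percent rule concrete, here is a minimal Python sketch of the gender dimension only. It assumes the inputs are the (gender, percent owned) pairs reported for up to four owners, and it treats a firm as equally male-/female-owned only when women and men each hold exactly 50 percent. How other unresolved combinations are handled is not stated above, so this hypothetical function simply returns `None` for them.

```python
def classify_gender(owners):
    """Classify a firm by gender of majority ownership.

    owners: list of (gender, percent_owned) pairs, gender being "F" or "M",
    for up to four owners with the largest shares.
    """
    female = sum(pct for gender, pct in owners if gender == "F")
    male = sum(pct for gender, pct in owners if gender == "M")
    if female >= 51:
        return "Female-owned"
    if male >= 51:
        return "Male-owned"
    if female == 50 and male == 50:
        return "Equally male-/female-owned"
    return None  # not resolved by this simplified rule


print(classify_gender([("F", 100)]))                       # Female-owned
print(classify_gender([("M", 30), ("M", 30), ("F", 40)]))  # Male-owned
print(classify_gender([("F", 50), ("M", 50)]))             # Equally male-/female-owned
```

The same majority test, applied to ethnicity, race, or veteran status shares, would give the other ownership categories in the list.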

Businesses could be tabulated in more than one racial group because:

  1. the sole owner was reported to be of more than one race;
  2. the majority owner was reported to be of more than one race;
  3. a majority combination of owners was reported to be of more than one race.

The detail may not add to the total or subgroup total because a Hispanic or Latino firm may be of any race, and because a firm could be tabulated in more than one racial group. For example, if a firm responded as both Chinese and Black majority owned, the firm would be included in the detailed Asian and Black estimates but would be counted only once in the higher-level all-firms estimates.
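The counting rule can be sketched in a few lines of Python: each firm adds one to every race group it qualifies for, but only one to the all-firms total, so the group counts can sum to more than the total. The firm records and the function name are made-up illustrations.

```python
from collections import Counter


def tabulate_by_race(firm_races):
    """firm_races maps a firm ID to the set of race groups it is tabulated in."""
    by_group = Counter()
    for groups in firm_races.values():
        for group in groups:
            by_group[group] += 1    # counted in every qualifying group
    all_firms = len(firm_races)     # but only once in the all-firms total
    return by_group, all_firms


firm_races = {
    "A": {"Asian", "Black or African American"},  # Chinese and Black majority owned
    "B": {"White"},
    "C": {"Asian"},
}
by_group, all_firms = tabulate_by_race(firm_races)
# Asian: 2, Black or African American: 1, White: 1; all firms: 3
```

The group counts here sum to 4 while the all-firms total is 3, which is exactly why the detail need not add to the total.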

The detailed Hispanic origin subgroups may not add to the Hispanic total because a firm can be Hispanic-owned even when no single Hispanic subgroup (i.e., Mexican, Puerto Rican, Cuban, or Other Hispanic, Latino, or Spanish origin) owns a majority, provided a combination of these subgroups does. Such a firm was included in the Hispanic estimate but not in any of the subgroup estimates. For example, if a firm had two owners with equal shares, one responding Puerto Rican and the other responding Cuban, no single subgroup held a majority, but the firm was Hispanic-owned. This firm would be tabulated in the Hispanic estimate but would not appear in any of the subgroup estimates.

Likewise, the subgroup detail for Asian firms and for Native Hawaiian and Other Pacific Islander firms may not add to the group totals, for the same reasons.
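A simplified Python sketch of this subgroup rule follows. It sums the shares held by owners in each subgroup, counts the firm in the group when the group's combined share reaches 51 percent, and assigns a subgroup only when one subgroup alone reaches 51 percent. The same function applies to Asian or Native Hawaiian and Other Pacific Islander subgroups. The input format and function name are assumptions, and the equally owned (50/50) categories are deliberately left out.

```python
def tabulate_group(owners, threshold=51):
    """owners: list of (subgroup, percent_owned); subgroup is None for owners
    outside the group. Returns (in_group, subgroup_or_None)."""
    shares = {}
    for subgroup, pct in owners:
        if subgroup is not None:
            shares[subgroup] = shares.get(subgroup, 0) + pct
    if sum(shares.values()) < threshold:
        return False, None          # group does not hold a majority
    majority = [name for name, pct in shares.items() if pct >= threshold]
    return True, (majority[0] if majority else None)


# Two equal owners, one Puerto Rican and one Cuban: Hispanic, but no subgroup.
print(tabulate_group([("Puerto Rican", 50), ("Cuban", 50)]))  # (True, None)
print(tabulate_group([("Cuban", 60), (None, 40)]))             # (True, 'Cuban')
```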

For the tabulations by gender, ethnicity, race, and veteran status, the data for each firm in the SBO sample were weighted by the reciprocal of the firm's probability of selection.
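Weighting by the reciprocal of the selection probability can be sketched as follows in Python (the cell contents and function name are invented). Certainty firms have probability 1 and so count once, while a firm sampled with probability 0.25 stands for four firms. Passing 1 as each firm's value estimates a firm count; passing receipts instead would estimate total sales.

```python
def weighted_total(records):
    """records: (value, selection_probability) pairs for sampled firms in one cell."""
    return sum(value / prob for value, prob in records)


# Two certainty firms plus three firms sampled with probability 0.25.
cell = [(1, 1.0), (1, 1.0), (1, 0.25), (1, 0.25), (1, 0.25)]
print(weighted_total(cell))  # 14.0
```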

RELIABILITY OF ESTIMATES

The figures shown in these datasets are, in part, estimated from a sample and will differ from the figures that would have been obtained from a complete census. Two types of possible errors are associated with estimates based on data from sample surveys: sampling errors and nonsampling errors. The accuracy of a survey result depends not only on the sampling errors and nonsampling errors measured, but also on the nonsampling errors not explicitly measured. For particular estimates, the total error may considerably exceed the measured error. The following is a description of the sampling and nonsampling errors associated with this tabulation.

Sampling variability. The particular sample used for this survey is one of a large number of all possible samples of the same size that could have been selected using the same sample design. Estimates derived from the different samples would differ from each other. The relative standard error and standard error are measures of the variability among the estimates from all possible samples. The estimated relative standard errors and estimated standard errors presented in the tables estimate the sampling variability, and thus measure the precision with which an estimate from the particular sample selected for this survey approximates the average result of all possible samples. Relative standard errors and standard errors are applicable only to those published cells in which sample cases are tabulated. A relative standard error is an expression of the standard error as a percent of the quantity being estimated.

The sample estimate and an estimate of its relative standard error can be used to estimate the standard error and then to construct interval estimates with a prescribed level of confidence that the interval includes the average result of all possible samples. To illustrate, if all possible samples were surveyed under essentially the same conditions, and an estimate and its standard error were calculated from each sample, then:

  1. Approximately 68 percent of the intervals from one standard error below the estimate to one standard error above the estimate would include the average value of all possible samples.
  2. Approximately 90 percent of the intervals from 1.6 standard errors below the estimate to 1.6 standard errors above the estimate would include the average value of all possible samples.

Thus, for a particular sample, one can say with specified confidence that the average of all possible samples is included in the constructed interval.

Example of a confidence interval. Suppose the estimate is 51,707 and the estimated relative standard error is 2 percent. The standard error is then 2 percent of 51,707, or 1,034. An approximate 90-percent confidence interval is found by multiplying the standard error by 1.6, then adding that result to and subtracting it from the estimate to obtain the upper and lower bounds. Since 1.6 x 1,034 = 1,654, the confidence interval in this example is 51,707 plus or minus 1,654, or the range 50,053 to 53,361.
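The arithmetic of this example can be reproduced with a short Python function. It follows the text in using a multiplier of 1.6 (the standard normal 90-percent value is closer to 1.645) and in rounding the standard error and margin to whole numbers; the function name is hypothetical.

```python
def interval_from_rse(estimate, rse_percent, z=1.6):
    """Approximate confidence interval from an estimate and its relative standard error."""
    se = round(estimate * rse_percent / 100)  # standard error, whole units
    margin = round(z * se)                    # z standard errors
    return se, margin, (estimate - margin, estimate + margin)


print(interval_from_rse(51_707, 2))  # (1034, 1654, (50053, 53361))
```

Passing `z=1.0` instead gives the approximate 68-percent interval of one standard error on either side.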

For the Characteristics of Businesses and Characteristics of Business Owners datasets, some data are expressed as percentages and are published with standard errors rather than the relative standard errors described above. The following example illustrates how to construct the confidence interval in that case.

Example of a confidence interval for percentage data. Suppose the estimate is 76.9 percent and the estimated standard error is 0.4 percentage points. An approximate 90-percent confidence interval is found by multiplying the standard error by 1.6, then adding that result to and subtracting it from the estimate to obtain the upper and lower bounds. Since 1.6 x 0.4 = 0.64, the confidence interval in this example is 76.9 plus or minus 0.64, or the range 76.26 to 77.54.
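For percentage data the published standard error is used directly, with no conversion from a relative standard error. A Python sketch of the same calculation, again using the text's multiplier of 1.6 and rounding to two decimals to hide floating-point noise (the function name is hypothetical):

```python
def interval_from_se(estimate, se, z=1.6):
    """Approximate confidence interval when the standard error is published directly."""
    margin = z * se
    return round(estimate - margin, 2), round(estimate + margin, 2)


print(interval_from_se(76.9, 0.4))  # (76.26, 77.54)
```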

Nonsampling errors. All surveys and censuses are subject to nonsampling errors. Nonsampling errors are attributable to many sources, including the inability to obtain information for all cases in the universe, imputation for missing data, data errors and biases, mistakes in recording or keying data, errors in collection or processing, and coverage problems.

While explicit measures of the effects of these nonsampling errors are not available, adjustments are made to the published relative standard errors to account for error associated with imputation of missing data. It is believed that most of the important operational and data errors were detected and corrected through an automated data edit designed to review the data for reasonableness and consistency. Quality control techniques were used to verify that operating procedures were carried out as specified.

Unpublished estimates. Some unpublished estimates can be derived directly from datasets by subtracting published estimates from their respective totals. However, the estimates obtained by such subtraction would be subject to poor response, high sampling variability, or other factors that may make them potentially misleading.

Individuals who use estimates in these datasets to create new estimates should cite the Census Bureau only as the source of the original estimates.

TREATMENT OF NONRESPONSE

Approximately 62 percent of the 2.3 million businesses in the SBO sample responded to the survey, compared with 75 percent for the 2002 survey. For the 2007 survey, 72 percent of the companies in the SBO sample returned a questionnaire, but returns amounting to about 10 percent of the sample did not contain enough information to be counted as responses for the estimates by gender, ethnicity, race, or veteran status. Many of these respondents were sole proprietors who answered "No" to Item 8, "In 2007, did any individual own 10% or more of the rights, claims, interests, or stock in this business?" Because a sole proprietor owns the entire business, this answer is inconsistent with sole-ownership status and indicates a possible problem with question wording that will be addressed in the questionnaire design for the 2012 SBO.

About 4 percent of the 2007 nonrespondents had been selected for and had responded to the 2002 SBO; for these firms, data from the 2002 survey were used in place of the missing 2007 responses. For the remaining nonrespondents, gender, ethnicity, race, and veteran status were imputed from donor respondents in the same sampling frame with similar characteristics (state, industry, employment status, and size). Because the assignment of businesses to sampling frames relies heavily on administrative data, and there is a high level of agreement between sampling frame assignment and tabulated race or ethnicity for responding firms, the donor imputations are considered reliable. Estimates of sampling variability are adjusted to account for nonresponse. Estimates with high error (a relative standard error for sales or receipts of 50 percent or more) are suppressed.
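The two-step treatment of nonrespondents can be outlined in Python. The sketch first reuses a 2002 SBO response when one exists and otherwise copies the owner characteristics of a randomly chosen respondent that matches exactly on sampling frame, state, industry, employer status, and size class. Exact matching, random donor choice, the record layout, and the `source` flag are assumptions for illustration; the text says only that donors were in the same frame with similar characteristics.

```python
import random

DEMOGRAPHICS = ("gender", "ethnicity", "race", "veteran")


def cell_key(firm):
    # Hypothetical matching cell: frame, state, industry, employer status, size class.
    return (firm["frame"], firm["state"], firm["industry"],
            firm["employer"], firm["size_class"])


def fill_nonrespondent(firm, responses_2002, respondents, rng):
    """Return a copy of `firm` with owner characteristics filled in, or None."""
    if firm["id"] in responses_2002:              # responded to the 2002 SBO
        return {**firm, **responses_2002[firm["id"]], "source": "2002 SBO"}
    donors = [r for r in respondents if cell_key(r) == cell_key(firm)]
    if not donors:
        return None  # a production system would collapse cells; omitted here
    donor = rng.choice(donors)
    return {**firm, **{k: donor[k] for k in DEMOGRAPHICS}, "source": "donor"}


base = {"frame": "Women", "state": "MD", "industry": "54",
        "employer": False, "size_class": "small"}
donor = {**base, "id": "R1", "gender": "Female-owned", "ethnicity": "Non-Hispanic",
         "race": "White", "veteran": "Nonveteran-owned"}
nonrespondent = {**base, "id": "N1"}
filled = fill_nonrespondent(nonrespondent, {}, [donor], random.Random(0))
```

Only the owner characteristics are copied from the donor; the nonrespondent keeps its own identifier and frame, state, industry, and size attributes.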

Overall, imputed data accounted for approximately 47 percent of the firm count estimates by gender, ethnicity, race, and veteran status and approximately 20 percent of the estimates of sales.

FIRM SIZE CATEGORIES

The firm size categories, both by receipts and employment, are based on the total nationwide receipts and/or employment of the firm.

The receipts and employment of a multi-unit firm are determined by summing the receipts and employment, respectively, of all its establishments, and these firm-level totals determine the firm's receipts size and employment size. The employment size group "0" includes firms whose establishments reported no paid employees in the mid-March pay period but that had paid employees at some time during the year.

Receipts size and employment size are determined for the entire company, not for its activity in a single industry. Hence, counterintuitive results are possible: for example, the category of firms with 500 employees or more in a particular industry may show only 100 employees in that industry.

Data by receipts size of firm are presented by the following receipts size categories:

  • All firms
  • Firms with sales/receipts of less than $5,000
  • Firms with sales/receipts of $5,000 to $9,999
  • Firms with sales/receipts of $10,000 to $24,999
  • Firms with sales/receipts of $25,000 to $49,999
  • Firms with sales/receipts of $50,000 to $99,999
  • Firms with sales/receipts of $100,000 to $249,999
  • Firms with sales/receipts of $250,000 to $499,999
  • Firms with sales/receipts of $500,000 to $999,999
  • Firms with sales/receipts of $1,000,000 or more

Data by employment size of firm are presented by the following employment size categories:

  • All firms
  • Firms with no employees
  • Firms with 1 to 4 employees
  • Firms with 5 to 9 employees
  • Firms with 10 to 19 employees
  • Firms with 20 to 49 employees
  • Firms with 50 to 99 employees
  • Firms with 100 to 499 employees
  • Firms with 500 to 999 employees
  • Firms with 1,000 or more employees

Employer firms include firms with payroll at any time during 2007. Employment reflects the number of paid employees during the March 12 pay period.
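A Python sketch of how a firm might be placed in the size categories above: receipts and March-pay-period employment are summed over the firm's establishments, and the firm totals are binned with `bisect`. The establishment record layout and names are hypothetical. A firm with zero March employees lands in "No employees" even if it had payroll at other times, consistent with the employment size group "0" described above.

```python
import bisect

RECEIPTS_BOUNDS = [5_000, 10_000, 25_000, 50_000, 100_000,
                   250_000, 500_000, 1_000_000]
RECEIPTS_LABELS = ["Less than $5,000", "$5,000 to $9,999", "$10,000 to $24,999",
                   "$25,000 to $49,999", "$50,000 to $99,999",
                   "$100,000 to $249,999", "$250,000 to $499,999",
                   "$500,000 to $999,999", "$1,000,000 or more"]
EMPLOYMENT_BOUNDS = [1, 5, 10, 20, 50, 100, 500, 1_000]
EMPLOYMENT_LABELS = ["No employees", "1 to 4", "5 to 9", "10 to 19", "20 to 49",
                     "50 to 99", "100 to 499", "500 to 999", "1,000 or more"]


def firm_size_classes(establishments):
    """Sum establishment values to the firm level, then bin the firm totals."""
    receipts = sum(e["receipts"] for e in establishments)
    employees = sum(e["march_employees"] for e in establishments)
    return (RECEIPTS_LABELS[bisect.bisect_right(RECEIPTS_BOUNDS, receipts)],
            EMPLOYMENT_LABELS[bisect.bisect_right(EMPLOYMENT_BOUNDS, employees)])


firm = [{"receipts": 300_000, "march_employees": 3},
        {"receipts": 250_000, "march_employees": 4}]
print(firm_size_classes(firm))  # ('$500,000 to $999,999', '5 to 9')
```

Neither establishment alone would fall in these classes; only the firm-wide totals do, which is the source of the counterintuitive industry results noted above.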

DISCLOSURE

Confidentiality. In accordance with federal law governing census reports (Title 13 of the United States Code), no data are published that would disclose the operations of an individual establishment or business. However, the number of firms in a kind-of-business or industry classification is not considered a disclosure; therefore, this information may be released even though other information is withheld. Techniques employed to limit disclosure are discussed at http://www.census.gov/econ/census07/www/methodology/disclosure.html.

The information and data obtained from the Internal Revenue Service, the Social Security Administration, and other sources are also treated as confidential and can be seen only by Census Bureau employees sworn to protect the data from disclosure.

Disclosure avoidance. Disclosure is the release of data that have been deemed confidential. It generally reveals information about a specific individual or firm or permits deduction of sensitive information about a particular individual or establishment. Disclosure avoidance is the process used to protect the confidentiality of the survey data provided by an individual or firm. Using disclosure avoidance procedures, the Census Bureau modifies or removes the characteristics that put confidential information at risk of disclosure. Although it may appear that a table shows information about a specific individual or business, the Census Bureau has taken steps to disguise or suppress the original data while making sure the results are still useful. The techniques used by the Census Bureau to protect confidentiality in tabulations vary, depending on the type of data.

Noise infusion. For the 2007 SBO, the primary method of disclosure avoidance is noise infusion in which values are perturbed prior to tabulation by applying a random noise multiplier to the magnitude data, such as the sales and receipts for all firms. Disclosure protection is accomplished in a manner that causes the vast majority of cell values to be perturbed by at most a few percentage points. For sample-based tabulations, such as SBO, the estimated relative standard error for a published cell includes both the estimated sampling error and the amount of perturbation in the estimated cell value due to noise.
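The idea of multiplicative noise infusion can be sketched in Python. Each firm is assigned one multiplier before tabulation and keeps it for every cell it contributes to; a cell total is then the sum of perturbed firm values. The multiplier distribution used here (a uniform draw between 2 and 5 percent above or below 1) and the function names are purely assumptions for illustration; the actual SBO noise parameters are not given in this section.

```python
import random


def assign_multipliers(firm_ids, rng, low=0.02, high=0.05):
    """Give each firm a fixed multiplier in [1-high, 1-low] or [1+low, 1+high]."""
    multipliers = {}
    for fid in firm_ids:
        size = rng.uniform(low, high)
        multipliers[fid] = 1 + size if rng.random() < 0.5 else 1 - size
    return multipliers


def noisy_cell_total(sales, multipliers):
    """sales: firm ID -> sales; each value is perturbed before summing into the cell."""
    return sum(value * multipliers[fid] for fid, value in sales.items())


rng = random.Random(2007)
sales = {"A": 120_000.0, "B": 80_000.0, "C": 2_500_000.0}
multipliers = assign_multipliers(sales, rng)
true_total = sum(sales.values())
published = noisy_cell_total(sales, multipliers)
```

Under these assumed parameters, a cell dominated by one firm is always moved by at least 2 percent, so that firm's exact sales cannot be read off the cell, while no cell is moved by more than 5 percent, and cells with many firms tend to stay closer to the true total because upward and downward noise partly cancel.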

In certain circumstances, individual cells may be suppressed on a case-by-case basis for additional disclosure avoidance, and the data replaced by one of the following characters:

  • D - Withheld to avoid disclosing data for individual companies; data are included in higher level totals
  • S - Estimates are suppressed when publication standards are not met, such as when the relative standard error of sales and receipts is 50 percent or more
  • X - Not applicable

To provide meaningful information for cells in which sensitive employment data are suppressed, the following characters are used to indicate the employment size range:

  • a - 0 to 19 employees
  • b - 20 to 99 employees
  • c - 100 to 249 employees
  • e - 250 to 499 employees
  • f - 500 to 999 employees
  • g - 1,000 to 2,499 employees
  • h - 2,500 to 4,999 employees
  • i - 5,000 to 9,999 employees
  • j - 10,000 to 24,999 employees
  • k - 25,000 to 49,999 employees
  • l - 50,000 to 99,999 employees
  • m - 100,000 employees or more
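The employment-size characters amount to a lookup on the lower bound of each range. A Python sketch (the names are hypothetical; note that the list above uses no letter d):

```python
import bisect

# Lower bound of each employment range, paired with its character.
LOWER_BOUNDS = [0, 20, 100, 250, 500, 1_000, 2_500,
                5_000, 10_000, 25_000, 50_000, 100_000]
SYMBOLS = ["a", "b", "c", "e", "f", "g", "h", "i", "j", "k", "l", "m"]


def employment_symbol(employees):
    """Map a suppressed cell's employment to its range character."""
    if employees < 0:
        raise ValueError("employment cannot be negative")
    return SYMBOLS[bisect.bisect_right(LOWER_BOUNDS, employees) - 1]


print(employment_symbol(19), employment_symbol(250), employment_symbol(150_000))  # a e m
```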

The character r is used to indicate that data have been revised.

Source: U.S. Census Bureau | Survey of Business Owners | (301) 763-3316 | csd.sbo@census.gov Last Revised: June 28, 2012