U.S. National Institutes of Health National Cancer Institute

How do I define an analysis cohort based on one or more variables?

In SEER*Stat, to specify the subset of records to be used as your analysis cohort:

  1. Select a database on the Data Tab containing the records of interest.
  2. Make “selections” on the Selection Tab to specify an analysis cohort of all records or a subset of records in the chosen database.

Sample Analysis

Calculate the number of malignant lung and bronchus tumors diagnosed in Los Angeles from 1992 through 2009.

Step 1: Open a new Frequency session.

Step 2: Selecting the Appropriate Database

Each SEER database contains data for a specific set of registries and diagnosis years. This information is noted in the database name (registry terms are described in SEER Registry Groupings for Analyses). You can use SEER*Stat to view the list of registries in a database by 1) selecting the database on the Data Tab; 2) opening the dictionary; and 3) double-clicking the registry variable.

To analyze cancer data for Los Angeles, select the "SEER 18 Regs Research Data, Nov 2011 Sub (1973-2009 varying)" database.

Step 3: Using the Selection Tab

  • Move to the Selection Tab.
  • There are two basic mechanisms for making selections: adding selection statements and checking standard options. Selections reduce the number of records included in an analysis based on specific variables. If no selections are made, then your analysis cohort will include every case in the database (all cases in the 18 SEER registries for 1973-2009). In this exercise, we want the frequency of malignant lung cancer cases in Los Angeles for 1992-2009. As you will see, the SEER 18 registry database only contains Los Angeles data for 1992-2009.
  • Use the default options in the Select Only box, ensuring that the Malignant Behavior, Known Age, and Cases in Limited-Use Database options are checked.
  • Select Edit.
  • Using the controls at the top of the Case Selection window, you will create a selection statement. The variables are grouped into categories in the Variable box at the top left of the window.
  • In the Variable box, use the "+" to expand the "Race, Sex, Year Dx, Registry, County" category.
  • Select "SEER registry".
  • Moving to the center of the window, check to see that "is = to" is selected as the Operator.
  • Scroll through the items in the Values box until you find and select "Los Angeles - 1992+" (each registry's years of data are noted in the labels).
  • Next select "Site rec with Kaposi and mesothelioma" from the "Site and Morphology" category. Note that a new line joined by the operator "AND" has been added to your selection statement.
  • Using "is = to" as the Operator, find and select "Lung and Bronchus" as the value.
  • At this time, the following should appear in the Selection Statement box at the bottom of the window:
    {Race, Sex, Year Dx, Registry, County.SEER registry} = 'Los Angeles - 1992+'
    AND {Site and Morphology.Site rec with Kaposi and mesothelioma} = ' Lung and Bronchus'
  • Use the OK button to close the Case Selection window.
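
For readers who think in code, the selection built above behaves like a boolean filter: each checked Select Only option and each line of the selection statement is one condition, and all conditions are joined by AND. The sketch below is only an illustration of that logic, not how SEER*Stat is implemented. The `Case` record, its field names, and the sample records are all hypothetical, and the Cases in Limited-Use Database option is not modeled.

```python
from dataclasses import dataclass


# Hypothetical record layout; field names are illustrative,
# not SEER*Stat's internal schema.
@dataclass
class Case:
    registry: str
    site: str
    behavior: str    # e.g. "Malignant" or "In situ"
    age_known: bool


def in_cohort(case: Case) -> bool:
    """Apply the Select Only options plus the two ANDed selection lines."""
    return (
        case.behavior == "Malignant"                 # Malignant Behavior option
        and case.age_known                           # Known Age option
        and case.registry == "Los Angeles - 1992+"   # first selection line
        and case.site == "Lung and Bronchus"         # second line, joined by AND
    )


# Made-up toy records, for illustration only.
cases = [
    Case("Los Angeles - 1992+", "Lung and Bronchus", "Malignant", True),
    Case("Los Angeles - 1992+", "Lung and Bronchus", "In situ", True),
    Case("Los Angeles - 1992+", "Breast", "Malignant", True),
    Case("Some other registry", "Lung and Bronchus", "Malignant", True),
    Case("Los Angeles - 1992+", "Lung and Bronchus", "Malignant", False),
    Case("Los Angeles - 1992+", "Lung and Bronchus", "Malignant", True),
]

cohort = [c for c in cases if in_cohort(c)]
print(len(cohort))  # number of toy records that satisfy every condition
```

Because every condition is joined by AND, a record is dropped as soon as it fails any one of them, which is why each added selection line can only shrink the cohort.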

Step 4: Execute SEER*Stat

  • Use the Execute button on the toolbar or select Execute from the Session menu to execute the session.
  • A dialog will display the progress of the job. When the job completes, a new window will open containing the output table or matrix. In this exercise, we calculated one statistic (the number of malignant lung and bronchus cases of known age in Los Angeles, diagnosed from 1992 through 2009). To make a more complex table, see the topic on stratifying results by one or more variables.
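
As a rough illustration of the difference between a single statistic and a more complex table, the sketch below counts a handful of made-up cohort records once as a total and once per diagnosis year. The year values are invented for the example and the code only mimics the idea of a frequency table; it does not reproduce SEER*Stat output.

```python
from collections import Counter

# Made-up diagnosis years for cases already in the cohort (illustrative only).
cohort_years = [1992, 1992, 1995, 2009, 2009, 2009]

total = len(cohort_years)        # single statistic: one cell
by_year = Counter(cohort_years)  # stratified: one cell per diagnosis year

print("All years", total)
for year in sorted(by_year):
    print(year, by_year[year])
```

The stratified counts always sum to the single total, so the one-cell result from this exercise is a useful check on any more detailed table built from the same cohort.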

Verify Results:  72,624 Cases