U.S. National Institutes of Health National Cancer Institute

How to use SEER*Prep to Create a Database

To analyze your cancer data in SEER*Stat, follow these steps:

  1. Download and install SEER*Prep.
  2. Using your own data management software and resources, create text or compressed text files containing your data according to the specifications outlined in Input File Formats.
  3. Use the Verify Data function in SEER*Prep to identify inconsistencies, formatting errors, or invalid values in your data files. Note: This verification process will not identify all problems in the data and is not intended to replace quality control measures taken in step 2 when creating your data files.
  4. Use the Create File function in SEER*Prep to create the SEER*Stat database.
  5. Closely examine the report generated by SEER*Prep. Correct any problems and repeat steps 3 and 4, if necessary.
  6. Start SEER*Stat. Your new database will be available in the list of databases on the Data Tab (for the appropriate sessions).

Example:  Create a SEER*Stat Database containing Incidence and Population Data

Many users are interested in using SEER*Stat to calculate age-adjusted incidence or mortality rates. SEER*Stat requires case and population data to calculate rates. Therefore, you must supply two types of files: incidence or mortality data files and population data files. These files must meet the requirements discussed in Input File Formats.
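
To illustrate why both kinds of files are needed, the following is a minimal Python sketch of direct age adjustment. It is not SEER*Stat's implementation: the function name is hypothetical, and the counts, populations, and standard weights are made-up illustrative numbers. Each age-specific rate is a case count (from the incidence file) divided by a population (from the population file), and the adjusted rate is the weighted sum of those rates.

```python
def age_adjusted_rate(cases, populations, std_weights, per=100_000):
    """Directly age-adjusted rate: weighted sum of age-specific rates.

    cases, populations, and std_weights each hold one entry per age group.
    All names and numbers here are illustrative assumptions.
    """
    if not (len(cases) == len(populations) == len(std_weights)):
        raise ValueError("all inputs must have one entry per age group")
    total_weight = sum(std_weights)
    rate = 0.0
    for count, population, weight in zip(cases, populations, std_weights):
        if population <= 0:
            raise ValueError("population must be positive")
        # Age-specific rate, scaled by this group's share of the standard.
        rate += (weight / total_weight) * (count / population)
    return rate * per


# Three made-up age groups; a real analysis would use all 19.
cases = [2, 10, 30]
populations = [100_000, 200_000, 150_000]
std_weights = [0.5, 0.3, 0.2]
rate = age_adjusted_rate(cases, populations, std_weights)
print(f"{rate:.2f} per 100,000")
```

Without the population file there would be no denominator for any age group, which is why SEER*Prep requires both inputs for rate sessions.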

Let us assume that you have incidence records for malignant cervical cancer cases diagnosed in 1995-1997 in the state of Maryland. In addition, you have population data for 19 age groups (< 1, 1-4, 5-9, ..., 85+) and for the races White, Black, and Other.
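
As a sketch of how single years of age fall into the 19 groups, the Python below assigns each age to a group index. The function names and the 0-18 index values are assumptions made for illustration; the valid values that SEER*Prep expects for the age variable are listed in the Database Description file.

```python
def age_group_index(age_years):
    """Map an age in whole years to one of 19 groups.

    Illustrative indices: 0 is < 1, 1 is 1-4, 2 is 5-9, ..., 18 is 85+.
    """
    if age_years < 0:
        raise ValueError("age must be non-negative")
    if age_years < 1:
        return 0
    if age_years < 5:
        return 1
    if age_years >= 85:
        return 18
    # Five-year groups from 5-9 (index 2) through 80-84 (index 17).
    return age_years // 5 + 1


def age_group_label(index):
    """Return a readable label for a group index produced above."""
    if index == 0:
        return "< 1"
    if index == 1:
        return "1-4"
    if index == 18:
        return "85+"
    low = (index - 1) * 5
    return f"{low}-{low + 4}"


labels = [age_group_label(i) for i in range(19)]
print(labels)
```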

Step 1:  Get Detailed Descriptions of the Case and Population Input Files

  • Start SEER*Prep.
  • Open the Database Description file. Starting with SEER*Prep 2.4.6, only NAACCR 12.1 format is distributed with the software. Select Open from the File menu to select the NAACCR 12.1 Database Description file. The name of this file is "naaccr3339.ver12_1.d05242012.dd" or something similar (if an update is released, the date embedded in the filename will differ).
  • Once you select the Database Description file, SEER*Prep will load information for each variable into the box on the right side of the window. Initially, the list will be sorted by the variable location in the incidence data file (note the Case Start Col column). Click the Pop Start Col column header. The list should now be sorted by the variable location in the population data file (case-only variables will have a blank entry in this column).
  • Set the variables used to link the case data with the population data. Since this example assumes your data are for 19 age groups, the age variable should be "Age recode with < 1 year olds." Since this example assumes your data are for 3 races (White, Black, and Other), the race variable should be changed to "Race recode A."
  • Select a specific variable and open the Edit Variable window by double-clicking the variable or using the Edit button. The Edit Variable window contains a complete description of the variable, including its valid values. Edit the "Race" variable to view an example.
  • Select Generate Input File Description from the File menu to create a text file containing detailed format information for the case and population files.

Step 2:  Prepare your Incidence Data Files

  • Using software other than SEER*Prep, create an incidence data file according to the NAACCR version 12.1 rules specified in the documentation created in Step 1. The name of the file, the record length, and the variable formats must adhere to the rules described in Input File Formats. Note: you may store the data in more than one file. SEER*Prep will process the data files sequentially and combine the data into one SEER*Stat database.
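
As one way to produce such a file, here is a minimal Python sketch that formats fixed-width text records. The field names, column positions, widths, and record length are hypothetical placeholders, not the NAACCR 12.1 layout; take the real values from the Input File Description generated in Step 1.

```python
# HYPOTHETICAL layout: (field name, 1-based start column, width).
# Replace with the positions from your Input File Description.
LAYOUT = [
    ("year_dx", 1, 4),
    ("sex", 5, 1),
    ("age_group", 6, 2),
    ("race", 8, 1),
]
RECORD_LENGTH = 20  # hypothetical; the real record length is much longer


def format_record(values, layout=LAYOUT, record_length=RECORD_LENGTH):
    """Build one fixed-width line; unused columns are left blank."""
    line = [" "] * record_length
    for name, start, width in layout:
        text = str(values[name])
        if len(text) > width:
            raise ValueError(f"{name}={text!r} exceeds width {width}")
        # Zero-pad numeric codes; left-justify anything else.
        text = text.rjust(width, "0") if text.isdigit() else text.ljust(width)
        line[start - 1:start - 1 + width] = list(text)
    return "".join(line)


record = format_record({"year_dx": 1996, "sex": 2, "age_group": 9, "race": 1})
print(repr(record))
```

Raising an error on an over-long value, rather than truncating it, keeps a bad value from silently shifting into a neighboring field.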

Step 3:  Prepare your Population Data Files

  • Create population data file(s) that meet the criteria documented in the report created in Step 1. The filename(s), record length, and variable formats must also adhere to the rules described in Input File Formats.
  • When making this file, be sure to include only the appropriate populations. Because the example incidence data are cervical cancer cases, the population data file should contain only records for females in the state of Maryland for the years 1995-1997. If male populations or populations for additional years are included, extra care will be required when using SEER*Stat with this database to prevent the generation of misleading statistics.
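
The restriction described above can be sketched in Python as a simple filter applied before the population file is written. The record structure, field names, and default codes (sex "2" for female, state "24" for Maryland) are assumptions for this example; confirm the codes your files use against the Database Description.

```python
def keep_population_record(rec, sex="2", state="24",
                           first_year=1995, last_year=1997):
    """Return True if a population record belongs in the example database.

    Field names and default codes are illustrative assumptions.
    """
    return (
        rec["sex"] == sex
        and rec["state"] == state
        and first_year <= int(rec["year"]) <= last_year
    )


# Made-up records: only the first and last match all three criteria.
records = [
    {"year": "1995", "state": "24", "sex": "2", "population": 1200},
    {"year": "1995", "state": "24", "sex": "1", "population": 1150},
    {"year": "1998", "state": "24", "sex": "2", "population": 1250},
    {"year": "1997", "state": "51", "sex": "2", "population": 900},
    {"year": "1997", "state": "24", "sex": "2", "population": 1230},
]
kept = [r for r in records if keep_population_record(r)]
print(len(kept), "of", len(records), "records kept")
```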

Step 4:  Create a Database Description File for Your Database

The Database Description files supplied with the software are meant to be used as templates. Follow these steps to create one containing the exact specifications for your database:

  • Start SEER*Prep.
  • Reopen the Database Description file used in Step 1.
  • Add your incidence file or files to the Input Case Files control.
  • Add your population file or files to the Input Population Files control.
  • Provide a name for the SEER*Stat database to be created. Edit the text in the Database Name control. The name entered here will be shown in the list of databases on SEER*Stat's Data Tab.
  • Press the Edit button next to the label "Study Cutoff Date for Survival". Enter the month and year when the study ended. This date is used to create several variables that SEER*Stat needs to perform survival analysis. If your incidence data do not contain follow-up information, or if you are not interested in survival analysis with SEER*Stat, enter December of the latest year of diagnosis in your input incidence file (for this example, December 1997).
  • Use the Save As function on the File menu to save the Database Description with a new name. If you accidentally overwrite the file supplied with the system, you may download a new copy from Input File Formats.

Step 5:  Verify your Data Files

  • Using the checkmark on the toolbar or Verify Data from the Execute menu, create a Verify Report. SEER*Prep will generate a one-way frequency of every variable in your incidence and population files.
  • Review the Verify Report and resolve any issues identified in the report.
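
If you want a comparable check in your own tooling before running Verify Data, a one-way frequency of a single fixed-width field can be sketched in Python as follows. This does not reproduce the Verify Report; the column position, the sample lines, and the set of expected codes are hypothetical.

```python
from collections import Counter


def one_way_frequency(lines, start, width):
    """Count each distinct value found in one fixed-width field.

    start is the 1-based column where the field begins.
    """
    return Counter(line[start - 1:start - 1 + width] for line in lines)


# Made-up records with a hypothetical one-character sex field in column 5.
lines = ["19962091", "19952101", "19962092", "1997 111"]
sex_counts = one_way_frequency(lines, start=5, width=1)

# Flag values outside the codes you expect (assumed here to be "1" and "2").
unexpected = {v: n for v, n in sex_counts.items() if v not in {"1", "2"}}
print(dict(sex_counts), unexpected)
```

A blank or out-of-range value showing up in a tabulation like this usually points to a misaligned column or a missing code in the source data.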

Step 6:  Create Database

  • Click the lightning bolt on the toolbar or select Create Database from the Execute menu.
  • Use the default for record exclusions (see SEER*Prep help system for more information).
  • Enter a name for the Create Report. SEER*Prep will now create a SEER*Stat database. This database will contain your data converted to a binary format, indices, and dictionaries (format libraries) for SEER*Stat.
  • Review the Create Report, paying particular attention to any Notes/Warnings. These identify potential mismatches between your incidence and population files. For example, if your population file contained records for males, you would get a warning, since your incidence data are only for females.

Step 7:  Use Your New Database in SEER*Stat

  • Exit SEER*Prep and start SEER*Stat. Your new database will be available in the list of databases on the Data Tab (for the appropriate sessions).

For more information, review the answers to the SEER*Prep Frequently Asked Questions and other materials provided in Getting Help.
