# Importing SEER*Stat Data into DevCan: Exercise 2

In this exercise, you will import data to calculate the probability, by race, of a female developing or dying of malignant cancer of the breast or genital areas between 2000 and 2002.

## Key Points and Reminders

• This exercise is like the previous one, except that it includes multiple cancer sites.

## Step 1: Prepare the Cancer Incidence data

1. Start SEER*Stat.
2. Start a new Rate Session.
3. On the Data tab, select the database "Incidence - SEER 13 Regs Research Data, Nov 2004 Sub for Expanded Races (1992-2002)".
4. On the Statistic tab, select Rates (Crude) as your type of statistic.
5. Go to the Selection tab.
6. Open the File menu and click Dictionary.
7. Open the Race, Sex, Year Dx, Registry, County folder.
8. Create a User-Defined variable based on "Race Recode Y". (You may have already created this variable in Exercise 1. If you saved it then, you can reuse it now.) It should have three groupings: "All Races" (which includes all available values), "White", and "Black". (See the note on naming groupings to be imported into DevCan at the end of this step.) Call this variable "Race recode Y (All, White, Black)".
9. Open the Site Specific Sequence Numbers folder.
10. Create a User-Defined variable based on "Site - malignant (most detail)". It should include the following groupings:
• Breast - mal
• Cervix Uteri - mal
• Corpus Uteri - mal
• Uterus, NOS - mal
• Ovary - mal
• Vagina - mal
• Vulva - mal
• Other Female Genital Organs - mal
These groupings should already exist; instead of creating them anew, you can simply delete all of the variable's other groupings.

Call the new variable "Site - mal (most detail) - Female genital".
11. When you are done creating the variables, Close the dictionary.
12. On the Selection tab, Edit the Race, Sex, Year Dx, Registry, County (Pop, Case Files) selection statement to read:
{Race, Sex, Year Dx, Registry, County.Sex} = 'Female'
AND {Race, Sex, Year Dx, Registry, County.Year of diagnosis} = '2000','2001','2002'
13. Edit the Other (Case Files) selection statement to read:
{Site Specific Sequence Numbers.SS seq # - mal (most detail)} = 1
AND {User-Defined.Site - mal (most detail) - Female genital} = 'Breast - mal','Cervix Uteri - mal','Corpus Uteri - mal','Uterus, NOS - mal','Ovary - mal','Vagina - mal','Vulva - mal','Other Female Genital Organs - mal'
14. Do not check the "Select Only Malignant Behavior" or "Select Only the First Matching Record for Each Person" boxes. (See the note on the "Select Only..." checkboxes at the end of this step.)
15. On the Table tab, arrange the variables as follows:
• Page: Site - mal (most detail) - Female genital
• Row: Race recode Y (All, White, Black)
• Column: Age Recode with <1 year olds
Do not arrange the variables in any other order.
16. Go to the Output tab.
17. Enter a title for the matrix.
18. Choose to Display Rates as Cases Per 100,000.
19. Execute the session. Your matrix will be calculated and displayed in a new window.
20. Save the matrix with a filename that identifies it as the Cancer Incidence matrix. Compare it to ssdc2_cancer_incidence.sim if necessary.
21. Open the Matrix menu. Select Export, then Text File.
22. Set up the options as follows:
• Output Variables as: Numeric Representation
• Line Delimiter: DOS/Windows (CR/LF)
• Missing Character: Space
• Field Delimiter: Tab
• Check the boxes to Remove All Thousands Separators (Commas) and Remove Flags (Footnote), Prefix and Suffix Characters. Leave the other checkboxes unmarked.
23. Export the matrix with a filename that identifies it as the Cancer Incidence data.
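
The export options in step 22 produce a plain tab-delimited text file with CR/LF line endings, in which a missing value appears as a single space. As a rough illustration of what those settings mean for anyone inspecting the file outside DevCan, the sketch below parses such text in Python. The function name `read_export` and the column layout of the sample are assumptions made for illustration only; the layout of a real export depends on your session.

```python
import csv
import io


def read_export(text):
    """Parse tab-delimited export text; blank or space-only fields become None."""
    # newline="" lets the csv module handle the CR/LF line endings itself.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t")
    rows = []
    for fields in reader:
        rows.append([None if f.strip() == "" else f for f in fields])
    return rows


# Hypothetical sample: site code, race code, age code, rate, count, population.
# The second line has a missing rate, written as a single space.
sample = "0\t0\t0\t12.5\t25\t200000\r\n0\t0\t1\t \t0\t150000\r\n"
rows = read_export(sample)
```

Because thousands separators and flag characters were removed at export time, the numeric fields can be converted with `float()` or `int()` without further cleaning.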

• Naming groupings to be imported into DevCan: DevCan expects the first characters in the name of any grouping in an age variable to be the starting age of that grouping. For example, a grouping containing the ages 65 - 69 could be named "65-69" or "65 and up", but if it were named "Ages 65-69" or ">= 65", DevCan would not be able to import the variable. Note that this will cause a problem in the case of invalid data, since SEER*Stat uses the label "Invalid value(s)" for a grouping containing data in an invalid format. Non-age variables, such as the ones created in these tutorials, are not affected by this restriction.
• The "Select Only..." checkboxes: In Exercise 1, you marked the "Select Only Malignant Behavior" and "Select Only the First Matching Record for Each Person" boxes on the Selection tab. In this exercise, leave them unmarked, because the selection statements already establish the criteria you need. In particular, marking "Select Only the First Matching Record for Each Person" would limit your results to each person's first cancer of any of the types you specified, whereas you want each person's first cancer of each type, which you have already requested by specifying a sequence number of 1.
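
The grouping-name rule in the first note above can be checked mechanically before exporting. The sketch below is a minimal Python approximation, assuming that "starting age" means a run of leading digits; DevCan's actual parsing may differ, and the helper name `starts_with_age` is hypothetical.

```python
import re


def starts_with_age(label):
    """Return the leading integer of a grouping label, or None if there is none."""
    match = re.match(r"\d+", label)
    return int(match.group()) if match else None


# Labels taken from the note above.
labels = ["65-69", "65 and up", "Ages 65-69", ">= 65", "Invalid value(s)"]
importable = {label: starts_with_age(label) is not None for label in labels}
```

Under this assumption, "65-69" and "65 and up" pass, while "Ages 65-69", ">= 65", and "Invalid value(s)" are rejected.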

## Step 2: Prepare the Cancer Mortality data

1. Start a new Rate Session.
2. On the Data tab, select the database "Mortality - All COD, Public-Use With State, Total U.S. for Expanded Races (1990-2002)".
3. On the Statistic tab, select Rates (Crude) as your type of statistic.
4. Go to the Selection tab.
5. Edit the Race, Sex, Year Dth, State, Cnty, Reg (Pop, Case Files) selection statement to read:
{Race, Sex, Year Dth, State, Cnty, Reg.Sex} = 'Female'
AND {Race, Sex, Year Dth, State, Cnty, Reg.Year of death} = '2000','2001','2002'
AND ({Race, Sex, Year Dth, State, Cnty, Reg.SEER registry} = 'San Francisco-Oakland SMSA','Connecticut','Detroit (Metropolitan)','Hawaii','Iowa','New Mexico','Seattle (Puget Sound)','Utah','Atlanta (Metropolitan)','San Jose-Monterey','Los Angeles','Rural Georgia'
OR ({Race, Sex, Year Dth, State, Cnty, Reg.SEER registry} = 'Alaska'
AND {Race, Sex, Year Dth, State, Cnty, Reg.Race recode Y} = 'American Indian/Alaska Native'))
Note that the SEER 13 registries are not listed consecutively; double-check that you have selected the right ones. Also note the order of parentheses in the last three lines of the statement. Since the Alaska Native Tumor Registry only collects data on cancer incidence in patients whose race is "American Indian/Alaska Native", you must narrow the mortality selection statement so that cancer deaths among other races in that registry are not included in the analysis.
6. Edit the Other (Case Files) selection statement to read:
{Site and Morphology.Cause of death recode} = 'Breast','Cervix Uteri','Corpus Uteri','Uterus, NOS','Ovary','Vagina','Vulva','Other Female Genital Organs'
Now your analysis will include all females in the SEER 13 Registries who died from any of these types of cancer between 2000 and 2002.
7. Go to the Table tab.
8. Open the File menu and click Dictionary.
9. Open the Race, Sex, Year Dth, State, Cnty, Reg folder.
10. Just as you did in the Incidence session, Create a User-Defined variable based on "Race Recode Y". (If you saved this variable to the dictionary in Exercise 1, you can simply reuse it now.) It should have three groupings: "All Races" (which includes all available values), "White", and "Black". Call it "Race recode Y (All, White, Black)". As before, it is up to you whether you save this variable to the dictionary.
11. Open the Site and Morphology folder and Create a User-Defined variable based on "Cause of death recode". This variable must correspond to the "Site - mal (most detail) - Female genital" variable you created in the Incidence session, so delete each grouping which does not match a grouping in that variable. You should be left with these groupings:
• Breast
• Cervix Uteri
• Corpus Uteri
• Uterus, NOS
• Ovary
• Vagina
• Vulva
• Other Female Genital Organs
Each of these groupings should have one value, the one that shares its name. Call this variable "Cause of death recode - Female genital".
12. Close the dictionary.
13. Arrange the variables as follows:
• Page: Cause of death recode - Female genital
• Row: Race recode Y (All, White, Black)
• Column: Age Recode with <1 year olds
Do not arrange the variables in any other order.
14. Go to the Output tab.
15. Enter a title for the matrix.
16. Choose to Display Rates as Cases Per 100,000.
17. Execute the session. Your matrix will be calculated and displayed in a new window.
18. Save the matrix, in the same place as the previous one, with a filename that identifies it as the Cancer Mortality matrix. Compare it to ssdc2_cancer_mortality.sim if necessary. Do not close the session window.
19. Open the Matrix menu. Select Export, then Text File.
20. Set the export options the same way as in Step 1 (step 22).
21. Export the matrix with a filename that identifies it as the Cancer Mortality data.
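
The grouping of parentheses in the step 5 selection statement is easy to get wrong, so it may help to read it as a predicate of the form A AND B AND (C OR (D AND E)). The Python sketch below restates that logic for a single death record. It is an illustration only, not something SEER*Stat runs; the function and parameter names are hypothetical.

```python
# The twelve SEER 13 registries that are selected without a race restriction.
SEER13_NON_ALASKA = {
    "San Francisco-Oakland SMSA", "Connecticut", "Detroit (Metropolitan)",
    "Hawaii", "Iowa", "New Mexico", "Seattle (Puget Sound)", "Utah",
    "Atlanta (Metropolitan)", "San Jose-Monterey", "Los Angeles",
    "Rural Georgia",
}


def in_mortality_selection(sex, year_of_death, registry, race):
    """Return True if a death record satisfies the step 5 selection statement."""
    # Alaska counts only for American Indian/Alaska Native decedents.
    registry_ok = registry in SEER13_NON_ALASKA or (
        registry == "Alaska" and race == "American Indian/Alaska Native"
    )
    return sex == "Female" and year_of_death in {2000, 2001, 2002} and registry_ok
```

For example, `in_mortality_selection("Female", 2001, "Alaska", "White")` is false, while the same call with race "American Indian/Alaska Native" is true. If the inner parentheses were dropped, the race restriction would no longer be tied to the Alaska registry alone.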

## Step 3: Prepare the All Causes of Mortality data

1. Return to the session window you used to create the Cancer Mortality matrix. If you have closed this window, you can still retrieve it:
   1. If the Cancer Mortality Matrix is not open in SEER*Stat, re-open it. If it is open, click on it to be sure it is the active window.
   2. Open the Matrix menu and select Retrieve Session.
2. Go to the Selection tab.
3. Clear the Other (Case Files) selection statement.
4. Go to the Table tab.
5. Remove the "Cause of death recode - Female genital" variable from the Page section. Now your analysis will include all deaths of females in the SEER 13 Registries between 2000 and 2002, instead of only deaths from cancer of the breast and genital areas.
6. Go to the Output tab and enter a new title for the matrix.
7. Execute the session. Your matrix will be calculated and displayed in a new window.
8. Save the matrix, with a filename that identifies it as the All Causes of Mortality matrix. Compare it to ssdc2_all_causes_of_mortality.sim if necessary.
9. Open the Matrix menu. Select Export, then Text File.
10. Set the export options the same way as in Step 1 (step 22).
11. Export the matrix with a filename that identifies it as the All Causes of Mortality data.

## Step 4: Import the data into DevCan

1. Start DevCan.
2. Open the Database menu and click Import SEER*Stat Data.
3. Use the Browse buttons to locate the ".dic" files you exported from SEER*Stat. Click Execute.
4. You will be prompted to enter a new database name with which to save this data. Choose any name that is not already in use by another DevCan database. Click OK when done.
5. DevCan will ask whether you want to import Counts and Populations from the exported SEER*Stat files. Click Yes. If you choose to import Rates instead, you will not be able to calculate confidence intervals in DevCan.
6. DevCan will display a summary of the variables being imported, with a warning that the number of variables in your files is inconsistent. This is because the Cancer Incidence and Cancer Mortality files need an extra variable to represent the different types of cancer you specified, whereas the All Causes of Mortality file does not, since it simply lists all deaths in the chosen time period. The warning message will tell you not to continue unless you are importing data on multiple cancer sites. You are, so click OK. The data will be imported and loaded automatically.
7. The "Site - mal (most detail) - Female genital" and "Race recode Y (All, White, Black)" cohort variables you created in SEER*Stat will be listed in the Parameters section. In turn, click to highlight each cohort variable, then click and drag in the Items Selected section to highlight the values for which you want to generate reports.
8. Use the drop-down list on the DevCan toolbar to select how you want the output displayed.
9. Open the Session menu and click Execute.
10. Your probabilities will be calculated and displayed. Click the tabs above the output window to switch between different sets of statistics. If you selected multiple values for any of the variables, click the combinations of values in the Cohort section to see each set of statistics.
11. If you want to Save and/or Print the reports, use the appropriate commands on the File menu.
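
Step 5 above asks for counts and populations rather than rates. One way to see why the former carry more information is that a crude rate per 100,000 can always be recomputed from a count and a population, but very different count and population pairs can yield the same rate. The Python sketch below shows the arithmetic with made-up numbers; it is not a description of how DevCan computes its statistics.

```python
def crude_rate_per_100k(count, population):
    """Crude rate expressed as cases per 100,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


# Two hypothetical cells of very different size share the same crude rate,
# so the rate alone does not reveal how many cases it is based on.
small_cell = crude_rate_per_100k(25, 200_000)
large_cell = crude_rate_per_100k(250, 2_000_000)
```

Both cells evaluate to 12.5 cases per 100,000, even though one rests on ten times as many cases as the other.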