**1. Extract SEER info using the index variable CAINDX**;
array _mon(*) modx1-modx10;
array _yr(*) yrdx1-yrdx10;
array _hist(*) hist1-hist10;
do i=1 to 10;
  if i=caindx then do; **found the selected cancer**;
    ca_month = _mon(i); **month of diagnosis for the selected cancer**;
    ca_year = _yr(i); **year of diagnosis for the selected cancer**;
    ca_hist = _hist(i); **histology for the selected cancer**;
    i=11; **force the loop to end**;
  end;
end;

**2. Keep surveys before any cancer**;
If INSEER = 1;
If NUMCABEF = 0;

**3. Keep all surveys before the selected cancer and the cancer is the first primary (seq no. = 00 or 01)**;
If INSEER = 1;
If FIRSTCA=1;
If NUMCABEF = 0;

**4. Keep most recent or last survey before the selected cancer and the cancer is the first primary**;
**a. select all surveys before the cancer as in #3**;
If INSEER = 1;
If FIRSTCA=1;
If NUMCABEF = 0;
**b. take the most recent or last survey before the cancer**;
proc sort data=input_data; **replace input_data with actual dataset name**;
  by linkid srvdate srvseq;
run;
data input_data;
  set input_data;
  by linkid srvdate srvseq;
  if last.linkid;
run;

**5. Keep all surveys after the selected cancer, and patient did not have any other cancer, only one primary with SEQ = "00"**;
If INSEER = 1;
If ONLYPRIM=1;
If NUMCABEF = 1 and NUMCAAFT = 0;

**6. Keep the most recent or first survey after the selected cancer**;
**a. select all surveys where the selected cancer is the most recent cancer before the survey**;
If INSEER = 1;
If MostRectCA=1; *most recent cancer before the survey*;
**b. take the most recent or first survey after the selected cancer**;
proc sort data=input_data; **replace input_data with actual dataset name**;
  by linkid srvdate srvseq;
run;
data input_data;
  set input_data;
  by linkid srvdate srvseq;
  if first.linkid;
run;

**7. Patient must have lived in SEER area at time of survey**;
If SEERAREA = 1;

**8. Delete duplicate survey records for patients in more than one cohort.**;
** This occurs when the same survey completed by a patient is used as the**;
** follow-up survey in one cohort and the baseline survey in another cohort.**;
proc sort nodupkey data=input_data; **replace input_data with actual dataset name**;
  by linkid srvdate;
run;

**9. Identify patients with surveys before and after cancer diagnosis.**;
** In this example, select patients with no cancer before their first survey,**;
** and their first cancer must = cancer of interest**;
title2 "SEER patients with no cancer before the first survey and that the first cancer is the cancer of interest";

**cancer specific (ex. breast cancer) survey level file for cohorts 1-8**;
filename in pipe "gunzip -c mhos.analysis.ch1to8.requests.breast.v8x.gz";
proc cimport infile=in data=in;
run;

**cancer of interest is the first cancer**;
data fstca;
  set in;
  where firstca=1;
run;

**survey records before the cancer**;
data before;
  set fstca;
  if numcabef=0;
run;

**to get the most recent (last) survey before the cancer diagnosis**;
proc sort data=before;
  by linkid srvdate cohort srvtype;
run;
data mrc_bef;
  set before;
  by linkid srvdate cohort srvtype;
  if last.linkid;
run;

**survey records after the cancer**;
data after;
  set fstca;
  if 0