Chapter 2b: Creation of New Race-Ethnicity Codes and Socioeconomic Status (SES) Indicators for Medicare Beneficiaries

2. Methods and Data (continued)

2.5.2 Running the GeoCode Program

In testing the GeoCode program, we found that its performance was erratic, and the help staff at GeoLytics were unable to explain the variation. The primary problem was a lookup error, "failed to open data member" (eFOM), which was returned for between two and six percent of the addresses we tested. On examination, we could find no syntax errors that would have prevented these records from being coded, and the technical support staff at GeoLytics could not explain why the errors occurred. However, when we ran the addresses that received the eFOM error code back through the GeoCode CD program a second time by themselves, they were matched at a 100 percent success rate.

The GeoLytics GeoCode CD product allows the user to choose a variety of options that alter the balance between completeness of address coverage and speed of processing. To obtain maximum coverage, and thereby match as many addresses as possible, we ran the GeoCode CD program with the following options turned on:

- Allow phonetic match of state name. The geocoder phonetically matches the full state name in an address (but not an abbreviation).
- Allow place-based ZIP code match. If a street is not found in a ZIP code, the geocoder scans other ZIP codes associated with the place (typically a city or a town) for a match.
- Allow phonetic match of street name. The geocoder uses a phonetic match for street names (e.g., an input address with the street name "Maine St." is considered a match with Main St. in the database).
- Disregard parity for address match. Normally, the geocoder matches even/odd addresses with even/odd address ranges. This option disables that practice.
- Allow closest address match. The geocoder finds the closest address range to match the house number (rather than an exact one).
- Allow fuzzy street type match. The geocoder matches addresses with the same street name even if the street types are different (e.g., Greenwood Drive is considered a match with Greenwood Road).
- Geocode no matter what. If it cannot find an exact match, the geocoder assigns to the address the census coordinates of the center of a ZIP code (ZIP centroid; see footnote 12) or the center of a state (state centroid).

The GeoCode program outputs two files as it runs: a text file (*.txt) summarizing the geocoder performance, the accuracy codes, and the error codes; and a database file (*.dbf) containing the fields selected by the user. For each database file, we selected the following fields (see footnote 13):

Field | Description
SEQNO | Sequential Number
ADDRESS | Input Address
ACCURACY | Accuracy and Error Codes
BLOCK | Matched Block Code
PLACE | Place FIPS Code
MCD | MCD (Minor Civil Division) Code
STATE | State FIPS Code
ZIP | ZIP Code for 2003
PLACENAME | Matched Place Name
AreaKey | Block Group Code

The sequential number field contains a number between 1 and n, where n is the total number of records processed by the program. The input address is the address in the STREET, CITY, STATE ZIP format constructed and output by the address cleaning SAS program. Accuracy and error codes are explained below. The matched block code is a string of fifteen digits that indicates, in order, an individual's state (2-digit FIPS code), county (3-digit FIPS code), census tract (6-digit FIPS code), and block (4-digit FIPS code, whose first digit indicates the block group). The full string constitutes a unique block-level identifier. All persons living within the same block have the same matched block code. Place indicates the city or town FIPS code, and MCD indicates the Minor Civil Division code.
The area key is a substring of the matched block code that contains the first twelve digits, rather than the full fifteen, and constitutes a unique block group-level identifier.

2.5.3 Summary of GeoCode program accuracy codes

Failure details. The geocoding process can fail for a number of reasons, including setup or programmatic errors, a missing database entry, or an invalid input address. Failures fall into two general categories: syntax/lookup errors and programmatic/setup errors. Failed GeoCode results are indicated by error codes, which are summarized in Tables 2.6 and 2.7.

Table 2.6 GeoCode program syntax and lookup errors

Error Code | Error Message
eIHN | Missing or invalid house number*
eISt | Missing or invalid street name*
eITy | Missing or invalid street type
eINa | Missing or invalid city name
eISN | Missing or invalid state name/abbrev*
eIZI | Missing or invalid ZIP code*
eIAd | Incomplete or malformed address*
eUAF | Unknown address format
eMiA | Missing address
eNZI | Failed to lookup ZIP code
eANF | Address not found
eSNF | Street not found

*Errors encountered while geocoding EDB addresses.
Source: GeoLytics Incorporated of East Brunswick, New Jersey; GeoCode CD program 2003, Version 1.02.

Table 2.7 GeoCode program programmatic and setup errors

Error Code | Error Message
eGNO | GeoCode has not been opened
eFOD | Failed to open database
eFOF | Failed to open data file NAME
eFOM | Failed to open data member NAME*
eMiF | Missing file NAME
eGOF | General open failure, file NAME
eFA1 | Failed to allocate memory
eNAS | No address data for state NAME*
eNSZ | No data for state-zip NAME
eSSO | String size overflow
eOKI | Output file kind invalid NAME
eOF1 | Output failure NAME
eOLI | Output field list invalid NAME

*Errors encountered while geocoding EDB addresses.
Source: GeoLytics Incorporated of East Brunswick, New Jersey; GeoCode CD program 2003, Version 1.02.

Success details. The GeoCode program also indicates how successful it has been in matching addresses to FIPS codes.
In addition to indicating accurate or exact matches, it indicates what kinds of "adjustments" it made to match the address to a place with a FIPS code. Successful match details are presented in Table 2.8. Some successful results generate accuracy codes indicating that the geocoder could code the address only by using some of the fallback matching options described above. It is worth noting that GeoCode CD may employ more than one of these fallback matching options to find a match for a particular address.

Table 2.8 GeoCode program accuracy codes and messages

Accuracy | Accuracy Message
aNP1 | Place not found*
aNPa | Address match with no parity*
aCAd | Closest address match*
aFTy | Fuzzy street type match*
aPhM | Phonetic match*
aNMa | No match found
aNMP | No match performed
aPBZ | Place-based ZIP match*
aSpC | Spelling corrected*
aStC | State centroid used*
aSEn | Street end used*
aZIC | ZIP centroid used*
aInD | Inaccurate direction*

*Accuracy options encountered while geocoding EDB addresses.
Source: GeoLytics Incorporated of East Brunswick, New Jersey; GeoCode CD program 2003, Version 1.02.

Test results using the GeoCode program on the CAHPS sample addresses. Table 2.9 below summarizes the error and accuracy results from the CAHPS sample test file. It indicates that 8.4 percent of the 830,728 CAHPS sample addresses taken from the EDB were dropped because the GeoCode program could not code them, very often because they had a box number instead of a street address. It also shows that of the remaining 760,961 addresses (91.6 percent of the original total), all but 0.4 percent were successfully geocoded. The process we followed in this test yielded an overall successful match of 91.2 percent of the EDB addresses to Census block group-level FIPS codes.
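The percentages quoted above can be reproduced from the record counts reported for the CAHPS test file. The following is a minimal sketch in Python; the function name, argument names, and dictionary keys are illustrative assumptions and are not part of the GeoCode CD program or of the SAS programs used in this project. Only the counts come from the report.

```python
def geocode_rates(total, dropped, first_pass, second_pass):
    """Return summary percentages for a geocoding run.

    All names are hypothetical; they simply label the arithmetic
    behind the summary rows of the results table.
    """
    processed = total - dropped          # addresses sent to the geocoder
    matched = first_pass + second_pass   # matched on either iteration
    return {
        "pct_dropped": round(100 * dropped / total, 1),
        "pct_processed": round(100 * processed / total, 1),
        "pct_failed": round(100 * (processed - matched) / processed, 1),
        "success_rate": round(100 * matched / processed, 1),
        "pct_total_matched": round(100 * matched / total, 1),
    }


# Counts reported for the CAHPS test file.
cahps = geocode_rates(total=830_728, dropped=69_767,
                      first_pass=719_220, second_pass=38_322)
print(cahps)
```

Note that the success rate (99.6 percent) uses the processed addresses as its denominator, while the overall match rate (91.2 percent) uses the original number of records, which is why the two figures differ.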
Table 2.9 Summary of GeoCode error and accuracy results for the CAHPS test file

CAHPS/EDB Test File Results | Number | Percent
Original number of records | 830,728 | 100.0
Number of records dropped (uncodeable) | 69,767 | 8.4
Addresses processed | 760,961 | 91.6
  Successfully geocoded (first iteration) | 719,220 | 94.5
  Successfully geocoded eFOM records (second iteration) | 38,322 | 5.0
  Total failed | 3,419 | 0.4
GeoCode success rate | 757,542 | 99.6
Percent total test file records matched | | 91.2
Success details*
Accurate Match | 477,746 | 62.8
Place Not Found | 77,273 | 10.2
Address match with no parity | 5,931 | 0.8
Closest address match | 37,984 | 5.0
Fuzzy street type match | 86,701 | 11.4
Phonetic match | 37,847 | 5.0
Place-based ZIP match | 16,519 | 2.2
Spelling corrected | 0 | 0.0
State centroid used | 905 | 0.1
Street end used | 3,871 | 0.5
ZIP centroid used | 63,031 | 8.3
Inaccurate direction | 20,525 | 2.7
Failure details
Failed due to syntax error | 3,418 | 0.4
  Missing or invalid house number | 3,367 | 0.4
  Missing or invalid state name/abbreviation | 0 | 0.0
  Missing or invalid ZIP code | 47 | 0.0
  Incomplete or malformed address | 4 | 0.0
Failed due to lookup error | 38,323 | 5.0
  Failed to open data member (eFOM) | 38,322 | 5.0
  No address data for state | 1 | 0.0

*Note: Success detail categories reflect distribution of accuracy codes. These codes are NOT mutually exclusive. Some addresses can have up to four accuracy codes associated with them.
Source: Result of running GeoCode CD program 2003 Version 1.02 on addresses from Medicare EDB from mid-2003 for respondents to the Medicare CAHPS fee-for-service, managed care enrollee, and disenrollee surveys for 2000-2002.

2.5.4 Application of the GeoCode Program Processing to the Full EDB

We obtained the 10 segments of the full unloaded EDB from CMS in mid-2003.
Because each segment of the EDB contained more than four million beneficiary records, we processed each segment separately: we first extracted the addresses and other necessary identification variables from the EDB, then corrected the addresses using the SAS programs we developed, and finally ran them through the GeoCode program. The program took from 16 to 36 hours to process and match the more than four million records in each segment. As in the test on the CAHPS sample addresses described above, it was necessary to rerun the addresses that returned an eFOM error on the first iteration, and virtually all of them were successfully matched on the second iteration through the GeoCode program.

Run EDB segments through the GeoCode program. The results of the GeoCode program processing are summarized in Table 2.10 for all 10 segments of the unloaded EDB combined. The results were very similar across the 10 segments. Overall, 87.5 percent of the 41,742,407 addresses of Medicare beneficiaries were processed through the GeoCode program. Of the addresses that were processed, 99.2 percent (36,223,053) were successfully matched to a FIPS code that included the block group; this is 86.8 percent of all EDB records. As Table 2.10 shows, 61 percent of the matches were exact matches to the input addresses.

Import GeoCode output files and merge with EDB records. We used PROC IMPORT in SAS 8.2 to transform the database (*.dbf) files produced by the GeoCode program into SAS data files (*.sas7bdat).
Using the ADDRESS field that we prepared from the EDB as input to the GeoCode program as the common key (common to the EDB and the GeoCode output), we merged the output files (containing Census-based geographic identifiers, including the AreaKey string that identifies block groups) onto the EDB records.

2.5.5 Results of Geocoding the Sample of 1.96 Million Medicare Beneficiaries

The sample of 1.96 million Medicare fee-for-service beneficiaries is a subset of the beneficiaries geocoded from the mid-2003 EDB. The results of the geocoding for the 1.96 million are presented in Table 2.11. While the table indicates that 81 percent (1,588,607 of 1,960,121) of the addresses for the sample members were successfully geocoded, this figure allows the use of ZIP code and state centroids when there was no other way to match the input address to a Census-listed address. It should be noted that we did rerun through the GeoCode CD program the unmatched addresses from the mid-2003 EDB, as well as those that had changed since the mid-2003 EDB, in the hope of more completely and correctly geocoding sample members.

We know from analyses performed in sub-task one of this task order that most of the state centroid matches (4,090) are not true matches at all; the GeoCode CD program forced them to the state centroid because the addresses are foreign. The same may be true of some of the ZIP centroid matches (159,217) as well. Based on our validation of address block group matching against the Census, however, we are confident that the true match rate at the block group level for the sample is most likely at least 75 percent.
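The merge on the ADDRESS key, and the relationship between the fifteen-digit matched block code and the twelve-digit area key described in Section 2.5.2, can be illustrated with a short sketch. It is written in Python rather than SAS, and the function names, record layouts, and sample values are hypothetical. Only the field names (ADDRESS, BLOCK, AreaKey) and the digit positions come from the text.

```python
def split_block_code(block):
    """Split a 15-digit matched block code into its components."""
    if len(block) != 15 or not block.isdigit():
        raise ValueError(f"expected 15 digits, got {block!r}")
    return {
        "state": block[0:2],       # 2-digit state FIPS code
        "county": block[2:5],      # 3-digit county FIPS code
        "tract": block[5:11],      # 6-digit census tract code
        "block": block[11:15],     # 4-digit block code
        "block_group": block[11],  # first digit of the block code
        "area_key": block[:12],    # block group-level identifier
    }


def merge_on_address(edb_records, geocode_rows):
    """Attach geocoder output to EDB records, using ADDRESS as the key."""
    by_address = {row["ADDRESS"]: row for row in geocode_rows}
    merged = []
    for rec in edb_records:
        geo = by_address.get(rec["ADDRESS"], {})
        # Records with no geocoder row keep None for the geographic fields.
        merged.append({**rec,
                       "BLOCK": geo.get("BLOCK"),
                       "AreaKey": geo.get("AreaKey")})
    return merged


# Made-up values, for illustration only.
parts = split_block_code("370630020071003")
edb = [{"ID": "A1", "ADDRESS": "1 EXAMPLE ST, DURHAM, NC 27701"},
       {"ID": "A2", "ADDRESS": "PO BOX 0"}]
geo = [{"ADDRESS": "1 EXAMPLE ST, DURHAM, NC 27701",
        "BLOCK": "370630020071003", "AreaKey": parts["area_key"]}]
merged = merge_on_address(edb, geo)
print(merged)
```

Because the area key is simply the first twelve digits of the block code, all blocks in the same block group share one area key, which is what makes it usable as a block group-level join field.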
Table 2.10 Summary of GeoCode error and accuracy codes for the 10 segments of the EDB combined

Results | Sums | Percent
Original number of records | 41,742,407 | 100.0
Number of records dropped (uncodeable) | 5,223,766 | 12.5
Addresses processed | 36,518,641 | 87.5
  Successfully geocoded (first iteration) | 35,108,329 | 96.1
  Successfully geocoded eFOM records (second iteration) | 1,114,724 | 3.1
  Total failed | 295,588 | 0.8
GeoCode success rate | 36,223,053 | 99.2
Percent total EDB records matched | | 86.8
Success details*
Accurate Match | 20,028,633 | 61.0
Place Not Found | 3,216,868 | 9.8
Address match with no parity | 281,554 | 0.9
Closest address match | 1,821,893 | 5.5
Fuzzy street type match | 3,919,792 | 11.9
Phonetic match | 1,752,858 | 5.3
Place-based ZIP match | 799,836 | 2.4
Spelling corrected | 10 | 0.0
State centroid used | 47,252 | 0.1
Street end used | 181,270 | 0.6
ZIP centroid used | 2,972,274 | 9.0
Inaccurate direction | 1,027,377 | 3.1
Failure details
Failed due to syntax error | 262,176 | 0.8
  Missing or invalid house number | 175,561 | 0.5
  Missing or invalid state name/abbreviation | 4 | 0.0
  Missing or invalid ZIP code | 86,335 | 0.3
  Incomplete or malformed address | 276 | 0.0
Failed due to lookup error | 1,022,267 | 3.4
  Failed to open data member (eFOM) | 1,018,483 | 3.4
  No address data for state | 3,784 | 0.0

*Note: Success detail categories reflect distribution of accuracy codes. These codes are NOT mutually exclusive. Some addresses can have up to four accuracy codes associated with them.
Source: Result of running GeoCode CD program 2003 Version 1.02 on addresses from Medicare EDB from mid-2003 for respondents to the Medicare CAHPS fee-for-service, managed care enrollee, and disenrollee surveys for 2000-2002.
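Tables 2.9 and 2.10 distinguish records geocoded on the first iteration from eFOM records geocoded on a second iteration. The control flow of that two-pass procedure can be sketched as follows. This is an illustration in Python only: GeoCode CD was run as a desktop program, not called as a function, so `run_geocoder` is a hypothetical stand-in, and the assumption that a failed record carries exactly the string "eFOM" as its ACCURACY value is ours. The "ok" value used by the fake geocoder is likewise a placeholder, not a real accuracy code.

```python
def geocode_with_efom_retry(addresses, run_geocoder):
    """Geocode addresses, rerunning eFOM failures once by themselves.

    run_geocoder is a hypothetical callable that takes a list of address
    strings and returns a dict mapping each address to its ACCURACY value.
    """
    first = run_geocoder(addresses)
    retry = [a for a in addresses if first[a] == "eFOM"]
    second = run_geocoder(retry) if retry else {}
    final = {**first, **second}  # second-pass results replace eFOM entries
    return final, len(retry)


# A fake geocoder for demonstration: it returns eFOM the first time it
# sees an address ending in "9" and succeeds on every later attempt.
seen = set()


def fake_geocoder(batch):
    out = {}
    for a in batch:
        out[a] = "eFOM" if a.endswith("9") and a not in seen else "ok"
        seen.add(a)
    return out


results, n_retried = geocode_with_efom_retry(["ADDR 1", "ADDR 9"], fake_geocoder)
print(results, n_retried)
```

Rerunning only the eFOM records, rather than the whole segment, keeps the second pass short relative to the 16 to 36 hours reported for a full segment.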
Table 2.11 Success with Geocoding of the Medicare Beneficiaries Included in the RTI Sample of 1.96 Million

Sample | Number
Total Sample | 1,960,121
Successfully geocoded | 1,588,607
GeoCoding Success Rate | 81.0%
Success Details
  Exact Match | 920,390
  Other Accuracy Code | 504,910
  Zip Centroid | 159,217
  State Centroid | 4,090

Source: Result for sample of 1.96 million of running GeoCode CD program 2003, Version 1.02 on addresses from Medicare EDB from mid-2003.

12. The centroid of a 5-digit ZIP code area is the balance point of the polygon formed by its boundaries. The centroid is calculated based on the coordinate extremes of the polygon.

13. One field we did not include, the MATCH field, contained the full address that the GeoCode search engine determined to be the closest match to the input address. We had intended to include this field, but during the testing phase we discovered problems with the MATCH field that caused major difficulties when we tried to transform the *.dbf files into SAS files.

Current as of January 2008

Internet Citation: Chapter 2b: Creation of New Race-Ethnicity Codes and Socioeconomic Status (SES) Indicators for Medicare Beneficiaries - Chapter 2b. January 2008. Agency for Healthcare Research and Quality, Rockville, MD. http://www.ahrq.gov/research/findings/final-reports/medicareindicators/medicareindicators2b.html