Catalog API

The HealthData.gov API is used to provide software developers with programmatic access to the contents of our data catalog. The API can be used to find recently added datasets, to search the catalog, to download the contents of the catalog for analysis, or to build a new data catalog tool.

HealthData.gov uses CKAN for its API. We are running CKAN version 1.7. Documentation for CKAN’s API can be found at http://docs.ckan.org/en/latest/api-v2.html, and CKAN’s support for RDF is described at http://wiki.ckan.org/RDF_and_CKAN.

Questions about the data catalog API can be sent to the HealthDataGov Google Group.

The base URL for the HealthData.gov API is http://hub.healthdata.gov/api.


Accessing the Complete Data Catalog Listing

A JSON listing of every dataset in the catalog can be accessed at http://hub.healthdata.gov/api/2/rest/dataset.

The response is a JSON list of dataset IDs (GUIDs). It looks like this:

["0056861d-28cd-4f8d-97b3-6205517637c3", "00aada73-a456-4547-ac5a-e5ffdc6b4847", "02588273-41d6-4ae5-a90a-1e336d0f129e", "03edc320-4eb7-4089-b66a-a54760a44b28", "0477da33-0795-4669-bba5-cc494604b022", "05457387-7ab6-4c1a-9dba-b1e5bdd5f2ad", "05b7319c-20a1-43f5-a01a-3847933d4ccf", "0660d0f4-b600-4d1e-a0be-228fc2857a12", "067da109-762f-4417-acea-521f227aea42", "088c4f1b-b266-40e8-a12b-cda2b97670eb", "08d78f4d-40c0-4691-948c-a4f17df65e59", "09bda462-ef6b-43ee-955f-b3e40d288eec", . . .

You can also get a list of slugs (i.e. the name that goes into the URL for each dataset) rather than GUIDs using http://hub.healthdata.gov/api/1/rest/dataset. The response is a JSON list of strings:

["dietary-supplements-labels", "nursing-home-compare", "child-growth-charts", "home-health-compare", "genetics-home-reference", "renal-dialysis-facility-medicare-cost-report-data-1996", "renal-dialysis-facility-medicare-cost-report-data-2001", "health-resources-county-comparison", "home-health-agency-medicare-cost-report-data", "omha-appeals-listed-state", "renal-dialysis-facility-medicare-cost-report-data", "hospital-medicare-cost-report-data-fy1995", "2008-basic-stand-alone-hospice", "find-shortage-areas-hpsas-eligible", "part-national-summary-data-file-cy2004", "departmental-appeals-board-decisions", . . .
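
If you want to pull the full listing from a script, the following Python sketch builds the two listing URLs described above and checks the shape of the response. It assumes the endpoints return a bare JSON list of strings as shown. The helper names (listing_url, parse_listing, is_guid, fetch_listing) are hypothetical, and only fetch_listing makes a network request.

```python
import json
import re
from urllib.request import urlopen

# Hypothetical helper names; the endpoints are the ones documented above.
API_BASE = "http://hub.healthdata.gov/api"
GUID_PATTERN = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)


def listing_url(api_version: int = 2) -> str:
    """Version 2 lists dataset GUIDs; version 1 lists slugs."""
    if api_version not in (1, 2):
        raise ValueError("only API versions 1 and 2 are described here")
    return f"{API_BASE}/{api_version}/rest/dataset"


def parse_listing(body: str) -> list[str]:
    """Decode a response body, checking that it is a JSON list of strings."""
    items = json.loads(body)
    if not isinstance(items, list) or not all(isinstance(i, str) for i in items):
        raise ValueError("expected a JSON list of strings")
    return items


def is_guid(value: str) -> bool:
    """True for lowercase GUIDs like the ones in the example listing."""
    return bool(GUID_PATTERN.match(value))


def fetch_listing(api_version: int = 2, timeout: float = 30.0) -> list[str]:
    """Download and parse the complete listing (performs a network request)."""
    with urlopen(listing_url(api_version), timeout=timeout) as resp:
        return parse_listing(resp.read().decode("utf-8"))
```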

To get the complete metadata record for each dataset rather than only its identifier, use http://hub.healthdata.gov/api/search/dataset?all_fields=1&rows=500. The fields are described in the JSON Schema section below.

Searching the Data Catalog

The data catalog can be searched using URLs such as:

http://hub.healthdata.gov/api/search/dataset?q=medicare&start=0&rows=20

Use the q parameter to specify the search term. Results are paged: use start to set the offset of the first result to return and rows to set the number of results per page. See the CKAN search API documentation for details. The response for the above search is:

{"count": 164,
 "results":
  ["medicare-enrollment-dashboard", "medicare-tools-downloadable",
   "medicare-appeals-council-decisions", "medicare-appeals-council-decisions-1",
   "medicare-medicaid-statistical", "medicare-geographic-variation", "data-compendium",
   "chronic-conditions-chart-book", "helpful-contacts", "plans-quality-compare",
   "2008-chronic-conditions", "2008-basic-stand-alone-hospice",
   "2008-basic-stand-alone-durable", "2008-basic-stand-alone-prescription",
   "active-project-reports", "2008-basic-stand-alone-home",
   "2008-basic-stand-alone-skilled", "2008-basic-stand-alone-carrier", "claims-listed-state",
   "omha-appeals-listed-state"]
}
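
A minimal Python sketch of paging through a search, assuming start is a zero-based offset into the results and count is the total number of matches, as the response above suggests. The helper names are hypothetical, and fetch can be replaced by any function that returns decoded JSON for a URL, which also lets you test the paging logic offline.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "http://hub.healthdata.gov/api/search/dataset"


def search_url(q: str, start: int = 0, rows: int = 20) -> str:
    """Build a paged search URL like the example above."""
    return SEARCH_URL + "?" + urlencode({"q": q, "start": start, "rows": rows})


def fetch_json(url: str) -> dict:
    """GET a URL and decode its JSON body (performs a network request)."""
    with urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def iter_results(
    q: str, rows: int = 20, fetch: Callable[[str], dict] = fetch_json
) -> Iterator[str]:
    """Yield every result for q, advancing start by the page size received."""
    start = 0
    while True:
        page = fetch(search_url(q, start, rows))
        results = page.get("results", [])
        yield from results
        start += len(results)
        # Stop on an empty page or once the reported count has been reached.
        if not results or start >= page.get("count", 0):
            break
```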

The primary fields (see below) support filtering. Use author=___ to filter the results by agency. For instance, http://hub.healthdata.gov/api/search/dataset?author=Centers%20for%20Medicare%20%26%20Medicaid%20Services returns only datasets submitted by the Centers for Medicare & Medicaid Services:

{"count": 149,
 "results":
  ["2008-basic-stand-alone-carrier", "2008-basic-stand-alone-durable",
  "2008-basic-stand-alone-home", "2008-basic-stand-alone-hospice",
  "2008-basic-stand-alone-inpatient", "2008-basic-stand-alone-outpatient",
  "2008-basic-stand-alone-prescription", "2008-basic-stand-alone-skilled",
  "2008-chronic-conditions", "active-project-reports"]
}
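
Agency names such as “Centers for Medicare & Medicaid Services” contain spaces and an ampersand, so filter values have to be percent-encoded. The sketch below (the function name is hypothetical) reproduces the encoding used in the example URL above with Python's standard urllib.parse; other primary fields can be passed as keyword arguments in the same way.

```python
from urllib.parse import quote, urlencode

SEARCH_URL = "http://hub.healthdata.gov/api/search/dataset"


def filtered_search_url(**filters: str) -> str:
    """Build a search URL from field filters such as author=...

    quote_via=quote encodes spaces as %20 and "&" as %26, matching the
    example URL above (the default, quote_plus, would use "+" for spaces).
    """
    return SEARCH_URL + "?" + urlencode(filters, quote_via=quote)
```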

You can also search for recently revised entries using URLs such as http://hub.healthdata.gov/api/search/revision?since_time=2012-05-05. The result is a list of revision GUIDs. You can find the dataset GUID from a revision GUID by appending the revision GUID to http://hub.healthdata.gov/api/2/rest/revision/, such as http://hub.healthdata.gov/api/2/rest/revision/b1dae0c1-10d6-4c4d-8f2b-e9eb46d59d7d, which gives this output:

{
 "id": "b1dae0c1-10d6-4c4d-8f2b-e9eb46d59d7d",
 "timestamp": "2012-05-30T22:16:35.228513",
 ...
 "packages": ["e5784720-a9a5-407e-bc36-84420289f1a9"],
 "groups": []
}

The dataset GUID is the GUID in the packages element (e5784720-a9a5-407e-bc36-84420289f1a9). You can plug it into the dataset details API described next.
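
The two-step lookup from revisions to datasets can be scripted. This Python sketch assumes the revision search returns a JSON list of revision GUIDs and that each revision record has a packages list, as in the output above. changed_datasets and its fetch argument (any function that GETs a URL and returns decoded JSON) are hypothetical names.

```python
from typing import Callable
from urllib.parse import urlencode

API_BASE = "http://hub.healthdata.gov/api"


def revisions_since_url(since_time: str) -> str:
    """Search for revisions made since a date such as "2012-05-05"."""
    return f"{API_BASE}/search/revision?" + urlencode({"since_time": since_time})


def revision_url(revision_guid: str) -> str:
    return f"{API_BASE}/2/rest/revision/{revision_guid}"


def changed_datasets(since_time: str, fetch: Callable[[str], object]) -> list[str]:
    """Map recent revision GUIDs to the dataset GUIDs in their packages lists."""
    seen: list[str] = []
    for revision_guid in fetch(revisions_since_url(since_time)):
        revision = fetch(revision_url(revision_guid))
        for dataset_guid in revision.get("packages", []):
            # Several revisions may touch the same dataset; keep each once.
            if dataset_guid not in seen:
                seen.append(dataset_guid)
    return seen
```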

Accessing Individual Datasets

Dataset metadata is available in machine-readable form in JSON, RDF/XML, and Notation 3.

JSON

To access a particular dataset in JSON, append the dataset GUID to http://hub.healthdata.gov/api/2/rest/dataset/. The response is a JSON object containing the dataset's metadata. For instance, the URL http://hub.healthdata.gov/api/2/rest/dataset/e5784720-a9a5-407e-bc36-84420289f1a9 gives:

{
 "id": "e5784720-a9a5-407e-bc36-84420289f1a9",
 "metadata_created": "2012-05-30T22:16:35.228513",
 "metadata_modified": "2012-05-30T22:16:35.228513",
 "author": "Centers for Medicare & Medicaid Services",
 "tags": ["claims", "enrollment", "expenditures", "inpatient", "managed care",
    "medicaid", "prescription drug"],
 "name": "validation-reports",
 "notes_rendered": "<p>Medicaid Analytic eXtract (MAX) Validation Reports ...",
 "url": "http://www.cms.gov/MedicaidDataSourcesGenInfo/MVR/list.asp",
 "notes": "Medicaid Analytic eXtract (MAX) Validation Reports These documents contain ...",
 "title": "MAX Validation Reports",
 "extras": {
  "Unit of Analysis": "Person",
  "hd2-workflow-id": "753",
  "Agency": "Department of Health & Human Services",
  "Geographic Granularity": "State",
  "Technical Documentation": "http://www.cms.gov/MedicaidDataSourcesGenInfo/MVR/list.asp",
  "Collection Frequency": "Annually",
  "Agency Program URL": "http://www.cms.gov/MedicaidDataSourcesGenInfo/MVR/",
  "Date Updated": "2011-10-19",
  "Date Released": "2003-01-01",
  "author_id": "http://healthdata.gov/id/agency/cms",
  "Subject Area 1": "Medicaid",
  "Geographic Scope": "State"
 },
 "revision_id": "b1dae0c1-10d6-4c4d-8f2b-e9eb46d59d7d"
}
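
Once decoded, a dataset record is an ordinary JSON object, so primary fields and extras can be read by key. A minimal Python sketch with hypothetical helper names; it treats the extras as optional, which matches the field descriptions below.

```python
from datetime import date
from typing import Optional

API_BASE = "http://hub.healthdata.gov/api"


def dataset_url(guid: str) -> str:
    return f"{API_BASE}/2/rest/dataset/{guid}"


def summarize(record: dict) -> dict:
    """Pull a few primary fields and extras out of a decoded dataset record."""
    extras = record.get("extras", {})
    updated: Optional[str] = extras.get("Date Updated")
    return {
        "id": record["id"],
        "title": record.get("title"),
        "agency": record.get("author"),  # the submitting agency
        "tags": record.get("tags", []),
        # Extras are optional, so a missing date becomes None.
        "date_updated": date.fromisoformat(updated) if updated else None,
        "granularity": extras.get("Geographic Granularity"),
    }
```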

To find the URL for a dataset, you can also look for the link in the “Metadata API” field on the dataset page on www.healthdata.gov.

CKAN has three types of fields: primary fields, “extras” (general metadata), and “resources” (downloadable files). All but the primary fields are optional. Field definitions are documented at the end of this page.

RDF XML and Notation 3 (N3)

You can also access the dataset metadata in RDF, in either RDF/XML or Notation 3 format. To build the URL, concatenate http://hub.healthdata.gov/dataset/, the dataset GUID or name, and either “.rdf” or “.n3”. (This is the public page for the dataset on our CKAN site plus a file extension. Alternatively, request the public page URL with the HTTP Accept header set to application/rdf+xml or text/n3.)

Taking the same dataset as above, the RDF metadata can be accessed at http://hub.healthdata.gov/dataset/e5784720-a9a5-407e-bc36-84420289f1a9.rdf. We use Dublin Core, DCAT, and other vocabularies as appropriate.

You can also find the URL in the Metadata API field on www.healthdata.gov's dataset pages.
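
In Python, either approach comes down to building a URL or setting one request header. This sketch (hypothetical helper names, standard-library urllib only) constructs both forms without sending anything; pass the resulting URL or Request object to urllib.request.urlopen to fetch the metadata.

```python
from urllib.request import Request

PUBLIC_BASE = "http://hub.healthdata.gov/dataset/"
# File extension -> media type for HTTP content negotiation.
MEDIA_TYPES = {"rdf": "application/rdf+xml", "n3": "text/n3"}


def metadata_url(guid_or_name: str, fmt: str = "rdf") -> str:
    """Public dataset page URL plus a ".rdf" or ".n3" extension."""
    if fmt not in MEDIA_TYPES:
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{PUBLIC_BASE}{guid_or_name}.{fmt}"


def negotiated_request(guid_or_name: str, fmt: str = "rdf") -> Request:
    """Request for the public page itself, selecting the format via Accept."""
    return Request(PUBLIC_BASE + guid_or_name, headers={"Accept": MEDIA_TYPES[fmt]})
```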

JSON Schema

The JSON output for datasets uses the following schema:

Primary Fields

| field | type | description |
|---|---|---|
| id | GUID | The unique identifier for the dataset in the HealthData.gov API. |
| title | plain text | The display name for the dataset. |
| notes | plain text | The description of the dataset. |
| notes_rendered | HTML text | The description of the dataset, rendered from Markdown to HTML. |
| author | plain text | The name of the federal agency that submitted the dataset to HealthData.gov. |
| url | url | The URL of the home page for the dataset, which may link to downloadable files. |
| tags | array of strings | Tags associated with the dataset. |
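
If you consume the JSON from typed code, the primary fields map naturally onto a small Python dataclass. This is a sketch, not a published client: the class name is hypothetical, extra keys in the record are ignored, and id and title are treated as required because the primary fields are the ones that are not optional.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetPrimary:
    """The primary fields from the table above (class name is hypothetical)."""

    id: str
    title: str
    notes: str = ""
    notes_rendered: str = ""
    author: str = ""
    url: str = ""
    tags: list[str] = field(default_factory=list)

    @classmethod
    def from_json(cls, record: dict) -> "DatasetPrimary":
        # Other keys (extras, revision_id, ...) are ignored; a missing
        # id or title raises KeyError.
        return cls(
            id=record["id"],
            title=record["title"],
            notes=record.get("notes") or "",
            notes_rendered=record.get("notes_rendered") or "",
            author=record.get("author") or "",
            url=record.get("url") or "",
            tags=list(record.get("tags") or []),
        )
```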

Extras Fields

| field | type | description |
|---|---|---|
| author_id | uri | A URI uniquely identifying the agency submitting the data. The URI is in the http://healthdata.gov/id/agency space and, while it does not currently resolve to a resource, it can be used as a canonical identifier for the agency. |
| Group Name | plain text | A display name shared across related datasets. |
| Agency | plain text | The name of the federal department submitting the data, generally Health and Human Services (for example, “Department of Health & Human Services”). |
| Subject Area 1 | string | A subject area. Subjects come from a fixed vocabulary, currently: Administrative, Biomedical Research, Children's Health, Epidemiology, Health Care Cost, Health Care Providers, Medicaid, Medicare, Other, Population Statistics, Quality Measurement, Safety, Treatments. |
| Subject Area 2 | string | An additional subject area from the same vocabulary as Subject Area 1. |
| Subject Area 3 | string | An additional subject area from the same vocabulary as Subject Area 1. |
| Date Released | date | The date the dataset was first made available to the public (possibly before it was posted on HealthData.gov). Format: YYYY-MM-DD. |
| Date Updated | date | The date the dataset was last changed, i.e. the last change to the data itself, not necessarily to the metadata record. Format: YYYY-MM-DD. |
| Agency Program URL | url | The URL of the agency program responsible for the data. |
| Collection Frequency | string | The frequency with which the data was collected, which is sometimes different from the frequency at which it is published. Possible values are Annually, Semi-Annually, Quarterly, Monthly, Weekly, and Daily. |
| Coverage Period Start | date | The start of the coverage period, i.e. the date range that the data pertains to. Format: YYYY-MM-DD. |
| Coverage Period End | date | The end of the coverage period. If omitted, the dataset may cover the period from the start date to the present. Format: YYYY-MM-DD. |
| Coverage Period Fiscal Year Start | year | For coverage periods based on fiscal years rather than calendar years, the starting fiscal year of the coverage period. Format: YYYY. |
| Coverage Period Fiscal Year End | year | For coverage periods based on fiscal years rather than calendar years, the ending fiscal year of the coverage period. If omitted, the dataset may cover the period from the starting fiscal year to the present. Format: YYYY. |
| Unit of Analysis | plain text | The unit of analysis, i.e. the object of study. Examples are “recalled food items” and “renal dialysis facility”. |
| Geographic Scope | plain text | The geographic region covered by the dataset. If omitted, the dataset is typically national in scope. |
| Geographic Granularity | string | The granularity of the geographic coverage. Possible values are Latitude/Longitude Coordinate, Street Address, Census Tract, City, MSA (metropolitan statistical area), ZIP Code, County, State, Sub-National Region, and Country. |
| Technical Documentation | url | The URL of technical documentation for the dataset. |
| Data Dictionary | url | The URL of a data dictionary for the dataset. |
| Collection Instrument | url | The URL of information about the data collection instrument. |
| License Agreement Required | integer | Whether a license agreement must be accepted before using the data: 1 if yes, 0 if no, omitted if not known. |
| License Agreement | url | The URL of a license agreement that must be accepted before using the data. |
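
Because every extras field is optional, client code has to cope with missing values. The following Python sketch (hypothetical function names) parses the date fields, the License Agreement Required flag, and the subject areas according to the table above. It assumes extras values may arrive as strings, as they do in the example record earlier on this page.

```python
from datetime import date
from typing import Optional


def parse_date(extras: dict, key: str) -> Optional[date]:
    """Parse a YYYY-MM-DD extra; None if the field is omitted or empty."""
    value = extras.get(key)
    return date.fromisoformat(value) if value else None


def coverage_period(extras: dict) -> tuple[Optional[date], Optional[date]]:
    """(start, end); an end of None may mean the data runs to the present."""
    return (
        parse_date(extras, "Coverage Period Start"),
        parse_date(extras, "Coverage Period End"),
    )


def license_required(extras: dict) -> Optional[bool]:
    """True for 1, False for 0, None when omitted (not known).

    Both "1" and 1 are accepted, since extras may be stored as strings.
    """
    raw = extras.get("License Agreement Required")
    if raw is None or raw == "":
        return None
    return int(raw) == 1


def subject_areas(extras: dict) -> list[str]:
    """Collect Subject Area 1 to 3, skipping any that are omitted."""
    keys = ("Subject Area 1", "Subject Area 2", "Subject Area 3")
    return [extras[k] for k in keys if extras.get(k)]
```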

Resource Fields

A dataset may have one or more resource records, each of which represents a downloadable file or a query tool interface. Multiple files are often specified when the dataset is available in multiple formats. Each resource record uses these fields:

| field | type | description |
|---|---|---|
| url | url | The URL of the downloadable file or the query interface. |
| name | plain text | The display name of the media format, e.g. CSV. Currently the same as the format attribute. |
| format | string | The media format. Possible values are API, CSV, ESRI, Feed, KML, Map, Query Tool, RDF, Text, Widget, XLS, XML. |
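
To pick out downloadable files in a particular format, you can filter a dataset's resources on the format field. This Python sketch assumes the resources are returned as a list under a “resources” key in the dataset JSON (the example record earlier on this page does not show that key), and the function names are hypothetical.

```python
def resource_urls(record: dict, fmt: str) -> list[str]:
    """Return URLs of resources whose format matches fmt (case-insensitive).

    Assumes a "resources" list in the decoded dataset record; records
    without one simply yield an empty list.
    """
    wanted = fmt.casefold()
    return [
        res["url"]
        for res in record.get("resources", [])
        if str(res.get("format", "")).casefold() == wanted and res.get("url")
    ]


def formats_available(record: dict) -> set[str]:
    """The distinct format values offered by a dataset's resources."""
    return {res["format"] for res in record.get("resources", []) if res.get("format")}
```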