Catalog API

The HealthData.gov API gives software developers programmatic access to the contents of our data catalog. You can use it to find recently added datasets, search the catalog, download the catalog's contents for analysis, or build a new data catalog tool.

HealthData.gov uses CKAN for its API. We are running CKAN version 1.7. Documentation for CKAN’s API can be found at http://docs.ckan.org/en/latest/api-v2.html, and CKAN’s support for RDF is described at http://wiki.ckan.org/RDF_and_CKAN.

Questions about the data catalog API can be sent to the HealthDataGov Google Group.

The base URL for the HealthData.gov API is http://hub.healthdata.gov/api.

In this article:

Accessing the Complete Data Catalog Listing
Searching the Data Catalog
Accessing Individual Datasets
JSON Schema

Accessing the Complete Data Catalog Listing

A JSON listing of every dataset in the catalog can be accessed at http://hub.healthdata.gov/api/2/rest/dataset.

The response is a JSON list of dataset IDs (GUIDs). It looks like this:

["0056861d-28cd-4f8d-97b3-6205517637c3", "00aada73-a456-4547-ac5a-e5ffdc6b4847", "02588273-41d6-4ae5-a90a-1e336d0f129e", "03edc320-4eb7-4089-b66a-a54760a44b28", "0477da33-0795-4669-bba5-cc494604b022", "05457387-7ab6-4c1a-9dba-b1e5bdd5f2ad", "05b7319c-20a1-43f5-a01a-3847933d4ccf", "0660d0f4-b600-4d1e-a0be-228fc2857a12", "067da109-762f-4417-acea-521f227aea42", "088c4f1b-b266-40e8-a12b-cda2b97670eb", "08d78f4d-40c0-4691-948c-a4f17df65e59", "09bda462-ef6b-43ee-955f-b3e40d288eec", . . .

You can also get a list of slugs (i.e. the name that goes into the URL for each dataset) rather than GUIDs using http://hub.healthdata.gov/api/1/rest/dataset. The response is a JSON list of strings:

["dietary-supplements-labels", "nursing-home-compare", "child-growth-charts", "home-health-compare", "genetics-home-reference", "renal-dialysis-facility-medicare-cost-report-data-1996", "renal-dialysis-facility-medicare-cost-report-data-2001", "health-resources-county-comparison", "home-health-agency-medicare-cost-report-data", "omha-appeals-listed-state", "renal-dialysis-facility-medicare-cost-report-data", "hospital-medicare-cost-report-data-fy1995", "2008-basic-stand-alone-hospice", "find-shortage-areas-hpsas-eligible", "part-national-summary-data-file-cy2004", "departmental-appeals-board-decisions", . . .

Alternatively, you can get the complete metadata record for each dataset, rather than just its identifier, using http://hub.healthdata.gov/api/search/dataset?all_fields=1&rows=500. The fields are described in more detail below.
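The three listing URLs above can be built programmatically. The following is a minimal Python sketch using only the standard library. The names listing_url and fetch_json are illustrative and not part of the API, and fetch_json assumes the service is reachable when it is called.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://hub.healthdata.gov/api"  # base URL given in this article


def listing_url(kind="guid", rows=500):
    """Return the URL for one of the three catalog listings.

    kind is "guid" (dataset IDs), "slug" (dataset names), or
    "full" (complete metadata records). These labels are hypothetical.
    """
    if kind == "guid":
        return BASE_URL + "/2/rest/dataset"
    if kind == "slug":
        return BASE_URL + "/1/rest/dataset"
    if kind == "full":
        query = urlencode({"all_fields": 1, "rows": rows})
        return BASE_URL + "/search/dataset?" + query
    raise ValueError("unknown listing kind: %r" % (kind,))


def fetch_json(url, timeout=30):
    """Fetch a URL and decode its JSON body (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (requires network access):
# slugs = fetch_json(listing_url("slug"))
```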

Searching the Data Catalog

The data catalog can be searched using URLs such as:

http://hub.healthdata.gov/api/search/dataset?q=medicare&start=0&rows=20

Use the q parameter to specify the search term. Note that the results are paged. Use start and rows to specify the page of results to load. See the CKAN search API documentation for details. The response for the above search is:

{"count": 164,
 "results":
  ["medicare-enrollment-dashboard", "medicare-tools-downloadable",
   "medicare-appeals-council-decisions", "medicare-appeals-council-decisions-1",
   "medicare-medicaid-statistical", "medicare-geographic-variation", "data-compendium",
   "chronic-conditions-chart-book", "helpful-contacts", "plans-quality-compare",
   "2008-chronic-conditions", "2008-basic-stand-alone-hospice",
   "2008-basic-stand-alone-durable", "2008-basic-stand-alone-prescription",
   "active-project-reports", "2008-basic-stand-alone-home",
   "2008-basic-stand-alone-skilled", "2008-basic-stand-alone-carrier", "claims-listed-state",
   "omha-appeals-listed-state"]
}
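Because the results are paged, a client usually needs to request several pages to collect every match. The sketch below, in Python, builds the search URL and computes the start value for each page from the count field. It assumes start is a zero-based offset into the result set; the function names are illustrative only.

```python
from urllib.parse import urlencode

SEARCH_URL = "http://hub.healthdata.gov/api/search/dataset"


def search_url(term, start=0, rows=20):
    """Build a search URL for one page of results."""
    return SEARCH_URL + "?" + urlencode({"q": term, "start": start, "rows": rows})


def page_starts(count, rows=20):
    """Return the start offset of every page needed to cover count results.

    Assumes start is a zero-based offset, not a page number.
    """
    return list(range(0, count, rows))


# For the sample response above ("count": 164) with 20 rows per page,
# nine requests are needed, the last one starting at offset 160.
urls = [search_url("medicare", start) for start in page_starts(164)]
```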

The primary fields (see below) support filtering. Use author=___ to filter the results by agency. For instance, http://hub.healthdata.gov/api/search/dataset?author=Centers%20for%20Medicare%20%26%20Medicaid%20Services returns only datasets submitted by the Centers for Medicare & Medicaid Services:

{"count": 149,
 "results":
  ["2008-basic-stand-alone-carrier", "2008-basic-stand-alone-durable",
  "2008-basic-stand-alone-home", "2008-basic-stand-alone-hospice",
  "2008-basic-stand-alone-inpatient", "2008-basic-stand-alone-outpatient",
  "2008-basic-stand-alone-prescription", "2008-basic-stand-alone-skilled",
  "2008-chronic-conditions", "active-project-reports"]
}
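Agency names contain spaces and ampersands, so the author value must be percent-encoded before it goes into the URL. A short Python sketch of one way to do that follows; author_filter_url is a hypothetical helper name.

```python
from urllib.parse import quote, urlencode

SEARCH_URL = "http://hub.healthdata.gov/api/search/dataset"


def author_filter_url(author):
    """Build a search URL that filters datasets by submitting agency.

    quote_via=quote encodes spaces as %20 (instead of +) and "&" as %26,
    matching the example URL in this article.
    """
    return SEARCH_URL + "?" + urlencode({"author": author}, quote_via=quote)


url = author_filter_url("Centers for Medicare & Medicaid Services")
```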

You can also search for recently revised entries using URLs such as http://hub.healthdata.gov/api/search/revision?since_time=2012-05-05. The result is a list of revision GUIDs. You can find the dataset GUID from a revision GUID by appending the revision GUID to http://hub.healthdata.gov/api/2/rest/revision/, such as http://hub.healthdata.gov/api/2/rest/revision/b1dae0c1-10d6-4c4d-8f2b-e9eb46d59d7d, which gives this output:

{
 "id": "b1dae0c1-10d6-4c4d-8f2b-e9eb46d59d7d",
 "timestamp": "2012-05-30T22:16:35.228513",
 ...
 "packages": ["e5784720-a9a5-407e-bc36-84420289f1a9"],
 "groups": []
}

The dataset GUID is the GUID in the packages element (e5784720-a9a5-407e-bc36-84420289f1a9). You can plug that into the dataset details API explained next.
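The step from a revision record to dataset detail URLs can be sketched in Python as follows. The sample record contains only the fields shown in the output above (the elided ones are left out), and dataset_urls_for_revision is an illustrative name, not part of the API.

```python
import json

DATASET_URL = "http://hub.healthdata.gov/api/2/rest/dataset/"


def dataset_urls_for_revision(revision):
    """Map a decoded revision record to the detail URLs of its datasets.

    A revision may list zero or more dataset GUIDs in "packages".
    """
    return [DATASET_URL + guid for guid in revision.get("packages", [])]


# Abridged copy of the revision output shown above.
sample_revision = json.loads("""
{
 "id": "b1dae0c1-10d6-4c4d-8f2b-e9eb46d59d7d",
 "timestamp": "2012-05-30T22:16:35.228513",
 "packages": ["e5784720-a9a5-407e-bc36-84420289f1a9"],
 "groups": []
}
""")

detail_urls = dataset_urls_for_revision(sample_revision)
```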

Accessing Individual Datasets

Dataset metadata is available in machine-readable form in JSON, RDF/XML, and Notation 3.

JSON

To access a particular dataset in JSON, append the dataset GUID to http://hub.healthdata.gov/api/2/rest/dataset/. The response is a JSON object containing information about the dataset. For instance, the URL http://hub.healthdata.gov/api/2/rest/dataset/e5784720-a9a5-407e-bc36-84420289f1a9 gives:

{
 "id": "e5784720-a9a5-407e-bc36-84420289f1a9",
 "metadata_created": "2012-05-30T22:16:35.228513",
 "metadata_modified": "2012-05-30T22:16:35.228513",
 "author": "Centers for Medicare & Medicaid Services",
 "tags": ["claims", "enrollment", "expenditures", "inpatient", "managed care",
    "medicaid", "prescription drug"],
 "name": "validation-reports",
 "notes_rendered": "<p>Medicaid Analytic eXtract (MAX) Validation Reports ...",
 "url": "http://www.cms.gov/MedicaidDataSourcesGenInfo/MVR/list.asp",
 "notes": "Medicaid Analytic eXtract (MAX) Validation Reports These documents contain ...",
 "title": "MAX Validation Reports",
 "extras": {
  "Unit of Analysis": "Person",
  "hd2-workflow-id": "753",
  "Agency": "Department of Health & Human Services",
  "Geographic Granularity": "State",
  "Technical Documentation": "http://www.cms.gov/MedicaidDataSourcesGenInfo/MVR/list.asp",
  "Collection Frequency": "Annually",
  "Agency Program URL": "http://www.cms.gov/MedicaidDataSourcesGenInfo/MVR/",
  "Date Updated": "2011-10-19",
  "Date Released": "2003-01-01",
  "author_id": "http://healthdata.gov/id/agency/cms",
  "Subject Area 1": "Medicaid",
  "Geographic Scope": "State"
 },
 "revision_id": "b1dae0c1-10d6-4c4d-8f2b-e9eb46d59d7d"
}

To find the URL for a dataset, you can also look for the link in the “Metadata API” field on the dataset page on www.healthdata.gov.

CKAN has three types of fields: primary fields, “extras” (general metadata), and “resources” (downloadable files). All but the primary fields are optional. Field definitions are documented at the end of this page.
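Since extras are optional, client code should not assume that any particular extras key is present. A minimal Python sketch of reading both kinds of field from a dataset record follows. The record is an abridged copy of the sample response above, and get_extra is a hypothetical helper.

```python
import json

# Abridged copy of the sample dataset response shown above.
record = json.loads("""
{
 "id": "e5784720-a9a5-407e-bc36-84420289f1a9",
 "author": "Centers for Medicare & Medicaid Services",
 "name": "validation-reports",
 "title": "MAX Validation Reports",
 "extras": {
  "Unit of Analysis": "Person",
  "Collection Frequency": "Annually",
  "Date Updated": "2011-10-19",
  "Subject Area 1": "Medicaid"
 }
}
""")


def get_extra(dataset, key, default=None):
    """Return an extras value, or default when the key (or extras) is absent."""
    return dataset.get("extras", {}).get(key, default)


title = record["title"]                             # primary field
frequency = get_extra(record, "Collection Frequency")
dictionary = get_extra(record, "Data Dictionary")   # not set on this record
```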

RDF XML and Notation 3 (N3)

You can also access the dataset metadata in RDF, in either XML or Notation 3 format. To build the URL, concatenate http://hub.healthdata.gov/dataset/, the dataset GUID or name, and either “.rdf” or “.n3”. In other words, the URL is the public page for the dataset on our CKAN site plus the file extension. Alternatively, you can request the public page URL with the HTTP Accept header set to application/rdf+xml or text/n3.

Taking the same dataset as above, the RDF metadata can be accessed at http://hub.healthdata.gov/dataset/e5784720-a9a5-407e-bc36-84420289f1a9.rdf. We use Dublin Core, DCAT, and other vocabularies as appropriate.
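Both ways of requesting RDF (file extension or Accept header) can be sketched in Python with the standard library. The helper names rdf_url and rdf_request are illustrative. The code only constructs the URL and request object; nothing is sent until the request is passed to urllib.request.urlopen.

```python
from urllib.request import Request

DATASET_PAGE = "http://hub.healthdata.gov/dataset/"

# File extension -> media type, as listed in this article.
RDF_TYPES = {"rdf": "application/rdf+xml", "n3": "text/n3"}


def rdf_url(dataset, fmt="rdf"):
    """Build the RDF URL from a dataset GUID or name plus a file extension."""
    if fmt not in RDF_TYPES:
        raise ValueError("format must be 'rdf' or 'n3'")
    return DATASET_PAGE + dataset + "." + fmt


def rdf_request(dataset, fmt="rdf"):
    """Build a request for the public page URL using the Accept header instead."""
    if fmt not in RDF_TYPES:
        raise ValueError("format must be 'rdf' or 'n3'")
    return Request(DATASET_PAGE + dataset, headers={"Accept": RDF_TYPES[fmt]})


guid = "e5784720-a9a5-407e-bc36-84420289f1a9"
xml_url = rdf_url(guid)
n3_request = rdf_request(guid, "n3")
```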

You can also find the URL in the Metadata API field on www.healthdata.gov's dataset pages.

JSON Schema

The JSON output for datasets uses the following schema:

Primary Fields

field	type	description
id	GUID	The unique identifier for the dataset in the HealthData.gov API.
title	plain text	The display name for the dataset.
notes	plain text	The description of the dataset.
notes_rendered	HTML text	The description of the dataset, rendered from Markdown into HTML.
author	plain text	The name of the federal agency that submitted the dataset to HealthData.gov.
url	url	The URL to the home page for the dataset, which may link to downloadable files.
tags	array of strings	Tags associated with the dataset.

Extras Fields

field	type	description
author_id	uri	A URI uniquely identifying the agency submitting the data. The URI is in the http://healthdata.gov/id/agency space and while it does not currently resolve to a resource it can be used as a canonical identifier for the agency.
Group Name	plain text	A display name shared across datasets that are related.
Agency	plain text	The name of the federal department submitting the data, generally “Department of Health & Human Services”.
Subject Area 1	string	A subject area. Subjects come from a fixed vocabulary, currently: Administrative, Biomedical Research, Children's Health, Epidemiology, Health Care Cost, Health Care Providers, Medicaid, Medicare, Other, Population Statistics, Quality Measurement, Safety, Treatments.
Subject Area 2	string	A subject area. See above.
Subject Area 3	string	A subject area. See above.
Date Released	date	The date the dataset was first made available to the public (possibly before it was posted on HealthData.gov). Format: YYYY-MM-DD.
Date Updated	date	The date the dataset was last changed, i.e. the last change to the data itself and not necessarily the metadata record. Format: YYYY-MM-DD.
Agency Program URL	url	The URL of the agency program responsible for the data.
Collection Frequency	string	The frequency with which the data was collected, which is sometimes different from the frequency at which the data is published. Possible values are Annually, Semi-Annually, Quarterly, Monthly, Weekly, Daily.
Coverage Period Start	date	The start of the coverage period, i.e. the date range that the data pertains to. Format: YYYY-MM-DD.
Coverage Period End	date	The end of the coverage period, i.e. the date range that the data pertains to. If the coverage period end date is omitted, the dataset may cover the period from the start date to the present time. Format: YYYY-MM-DD.
Coverage Period Fiscal Year Start	year	For coverage periods that are based on fiscal years rather than calendar years, the starting fiscal year of the coverage period. Format: YYYY.
Coverage Period Fiscal Year End	year	For coverage periods that are based on fiscal years rather than calendar years, the ending fiscal year of the coverage period. If the coverage period end fiscal year is omitted, the dataset may cover the period from the starting fiscal year to the present time. Format: YYYY.
Unit of Analysis	plain text	The unit of analysis, i.e. the object of study. Examples are “recalled food items” and “renal dialysis facility”.
Geographic Scope	plain text	The geographic region covered by the dataset. If omitted, the dataset is typically national in scope.
Geographic Granularity	string	The granularity of the geographic coverage. Possible values are Latitude/Longitude Coordinate, Street Address, Census Tract, City, MSA (metropolitan statistical area), ZIP Code, County, State, Sub-National Region, and Country.
Technical Documentation	url	The URL to technical documentation for the dataset.
Data Dictionary	url	The URL to a data dictionary for the dataset.
Collection Instrument	url	The URL to information about the data collection instrument.
License Agreement Required	integer	Whether a license agreement must be agreed to before using the data (1 if yes, 0 if no, omitted if not known).
License Agreement	url	The URL to a license agreement that must be agreed to before using the data.
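Several extras values need conversion before use: dates follow YYYY-MM-DD, and License Agreement Required is 1, 0, or omitted. The Python sketch below shows one way to convert them. The helper names are hypothetical, and whether the license flag arrives as a JSON number or a string is an assumption, so the code accepts both.

```python
from datetime import date, datetime


def parse_extra_date(value):
    """Convert a YYYY-MM-DD extras value to a date; None if absent or empty."""
    if not value:
        return None
    return datetime.strptime(value, "%Y-%m-%d").date()


def license_required(extras):
    """Return True, False, or None (not known) for the license flag.

    Accepts 1/0 as either a number or a string (an assumption about
    how the value is serialized).
    """
    raw = extras.get("License Agreement Required")
    if raw is None or raw == "":
        return None
    return int(raw) == 1


# Values taken from the sample dataset record in this article.
extras = {"Date Updated": "2011-10-19", "Date Released": "2003-01-01"}
updated = parse_extra_date(extras.get("Date Updated"))
coverage_end = parse_extra_date(extras.get("Coverage Period End"))
needs_license = license_required(extras)
```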

Resource Fields

A dataset may have one or more resource records, each of which represents a downloadable file or a query tool interface. Multiple files are often specified when the dataset is available in multiple formats. Each resource record uses these fields:

field	type	description
url	url	The URL of the downloadable file or the query interface.
name	plain text	The display name of the media format, e.g. CSV. Currently the same as the format attribute.
format	string	The media format. Possible values are API, CSV, ESRI, Feed, KML, Map, Query Tool, RDF, Text, Widget, XLS, XML.
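When a dataset is offered in several formats, a client can pick out the resource records it can handle. The Python sketch below assumes the resource records appear as a list under a "resources" key in the dataset JSON; the sample JSON in this article does not show that key, so treat it as an assumption. The sample dataset and its URLs are made up for illustration.

```python
def resources_by_format(dataset, fmt):
    """Return the resource records whose format matches fmt (case-insensitive).

    Assumes resource records are listed under the "resources" key.
    """
    wanted = fmt.lower()
    return [
        resource
        for resource in dataset.get("resources", [])
        if resource.get("format", "").lower() == wanted
    ]


# Hypothetical dataset record; the URLs are placeholders, not real files.
dataset = {
    "name": "example-dataset",
    "resources": [
        {"url": "http://example.org/data.csv", "name": "CSV", "format": "CSV"},
        {"url": "http://example.org/data.xls", "name": "XLS", "format": "XLS"},
    ],
}

csv_urls = [resource["url"] for resource in resources_by_format(dataset, "csv")]
```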
