Catalog API

The API gives software developers programmatic access to the contents of our data catalog. It can be used to find recently added datasets, search the catalog, download the contents of the catalog for analysis, or build a new data catalog tool. The catalog uses CKAN for its API. We are running CKAN version 1.7. CKAN’s own documentation covers both its API and its support for RDF.

Questions about the data catalog API can be sent to the HealthDataGov Google Group.

The base URL for the API is

Accessing the Complete Data Catalog Listing

A JSON listing of every dataset in the catalog can be accessed at

The response is a JSON list of dataset IDs (GUIDs). It looks like this:

["0056861d-28cd-4f8d-97b3-6205517637c3", "00aada73-a456-4547-ac5a-e5ffdc6b4847", "02588273-41d6-4ae5-a90a-1e336d0f129e", "03edc320-4eb7-4089-b66a-a54760a44b28", "0477da33-0795-4669-bba5-cc494604b022", "05457387-7ab6-4c1a-9dba-b1e5bdd5f2ad", "05b7319c-20a1-43f5-a01a-3847933d4ccf", "0660d0f4-b600-4d1e-a0be-228fc2857a12", "067da109-762f-4417-acea-521f227aea42", "088c4f1b-b266-40e8-a12b-cda2b97670eb", "08d78f4d-40c0-4691-948c-a4f17df65e59", "09bda462-ef6b-43ee-955f-b3e40d288eec", . . .

You can also request a list of slugs (i.e., the name that goes into the URL for each dataset) rather than GUIDs. The response is a JSON list of strings:

["dietary-supplements-labels", "nursing-home-compare", "child-growth-charts", "home-health-compare", "genetics-home-reference", "renal-dialysis-facility-medicare-cost-report-data-1996", "renal-dialysis-facility-medicare-cost-report-data-2001", "health-resources-county-comparison", "home-health-agency-medicare-cost-report-data", "omha-appeals-listed-state", "renal-dialysis-facility-medicare-cost-report-data", "hospital-medicare-cost-report-data-fy1995", "2008-basic-stand-alone-hospice", "find-shortage-areas-hpsas-eligible", "part-national-summary-data-file-cy2004", "departmental-appeals-board-decisions", . . .

Alternatively, you can get the complete metadata record for each dataset in the response. The fields are described in more detail below.
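Because some listings return GUIDs and others return slugs, client code may need to tell the two apart. The following is a minimal sketch using Python's standard uuid module; the helper name is_guid is hypothetical and not part of the API. A slug made up of exactly 32 hexadecimal characters would be misclassified, which seems unlikely for the dataset names shown above.

```python
import uuid


def is_guid(identifier):
    """Return True if the identifier parses as a GUID, False otherwise (e.g. a slug)."""
    try:
        uuid.UUID(identifier)
    except ValueError:
        return False
    return True


print(is_guid("0056861d-28cd-4f8d-97b3-6205517637c3"))
print(is_guid("nursing-home-compare"))
```
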

Searching the Data Catalog

The data catalog can be searched using URLs such as:

Use the q parameter to specify the search term. The results are paged: use start and rows to specify the page of results to load. See the CKAN search API documentation for details. The response for the above search is:

{"count": 164,
 "results":
  ["medicare-enrollment-dashboard", "medicare-tools-downloadable",
   "medicare-appeals-council-decisions", "medicare-appeals-council-decisions-1",
   "medicare-medicaid-statistical", "medicare-geographic-variation", "data-compendium",
   "chronic-conditions-chart-book", "helpful-contacts", "plans-quality-compare",
   "2008-chronic-conditions", "2008-basic-stand-alone-hospice",
   "2008-basic-stand-alone-durable", "2008-basic-stand-alone-prescription",
   "active-project-reports", "2008-basic-stand-alone-home",
   "2008-basic-stand-alone-skilled", "2008-basic-stand-alone-carrier", "claims-listed-state",
   . . . ]}

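As an illustration of the q, start, and rows parameters, here is a small sketch that builds a search URL. The base URL is a placeholder and the /search/dataset path is an assumption based on CKAN's search API; substitute the real values for this catalog.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: replace with the real base URL of the API.
BASE_URL = "https://catalog.example.org/api/2"


def search_url(term, start=0, rows=20, base_url=BASE_URL):
    """Build the URL for one page of search results."""
    query = urlencode({"q": term, "start": start, "rows": rows})
    # The "/search/dataset" path is assumed, not taken from this page.
    return f"{base_url}/search/dataset?{query}"


# Third page of ten results for the term "medicare".
print(search_url("medicare", start=20, rows=10))
```

Using urlencode rather than string concatenation means search terms containing spaces or ampersands are escaped correctly.
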
The primary fields (see below) support filtering. Use author=___ to filter the results by agency. For instance, filtering on the author value for the Centers for Medicare & Medicaid Services returns only datasets submitted by that agency:

{"count": 149,
 "results":
  ["2008-basic-stand-alone-carrier", "2008-basic-stand-alone-durable",
   "2008-basic-stand-alone-home", "2008-basic-stand-alone-hospice",
   "2008-basic-stand-alone-inpatient", "2008-basic-stand-alone-outpatient",
   "2008-basic-stand-alone-prescription", "2008-basic-stand-alone-skilled",
   "2008-chronic-conditions", "active-project-reports"]}

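Since a filtered search can report more matches in count than it returns in one page, a client has to step through the pages. The sketch below shows one way to do that. It assumes the response is a JSON object with a count and a results list, as in CKAN's search API, and it takes the HTTP call as a caller-supplied function (fetch_page, a hypothetical name) so the paging logic can be shown without a real endpoint.

```python
def iter_results(fetch_page, rows=10, **filters):
    """Yield every matching dataset name, requesting one page at a time.

    fetch_page is any callable that takes a dict of query parameters and
    returns the decoded JSON response: {"count": ..., "results": [...]}.
    """
    start = 0
    while True:
        page = fetch_page({**filters, "start": start, "rows": rows})
        results = page["results"]
        yield from results
        start += len(results)
        # Stop when the server has nothing more or everything was seen.
        if not results or start >= page["count"]:
            break


def fake_fetch(params):
    """Stand-in for a real HTTP request, serving 25 made-up dataset names."""
    names = [f"dataset-{i}" for i in range(25)]
    start, rows = params["start"], params["rows"]
    return {"count": len(names), "results": names[start:start + rows]}


names = list(iter_results(fake_fetch, rows=10,
                          author="Centers for Medicare & Medicaid Services"))
print(len(names))
```

In real use, fetch_page would issue the HTTP request with the given parameters and decode the JSON body.
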
You can also search for recently revised entries. The result is a list of revision GUIDs. You can find the dataset GUID from a revision GUID by appending the revision GUID to the revision endpoint URL, which gives output like this:

{
 "id": "b1dae0c1-10d6-4c4d-8f2b-e9eb46d59d7d",
 "timestamp": "2012-05-30T22:16:35.228513",
 "packages": ["e5784720-a9a5-407e-bc36-84420289f1a9"],
 "groups": []
}

The dataset GUID is the GUID in the packages element (e5784720-a9a5-407e-bc36-84420289f1a9). You can plug that into the dataset details API explained next.
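To make the revision lookup concrete, the following sketch parses a revision record like the one above and pulls out the dataset GUIDs and the revision time. The function name dataset_guids is hypothetical; the record is pasted in as a string rather than fetched.

```python
import json
from datetime import datetime

# A revision record as returned by the API (copied from the example above).
revision_json = """{
 "id": "b1dae0c1-10d6-4c4d-8f2b-e9eb46d59d7d",
 "timestamp": "2012-05-30T22:16:35.228513",
 "packages": ["e5784720-a9a5-407e-bc36-84420289f1a9"],
 "groups": []
}"""


def dataset_guids(revision):
    """Return the GUIDs of the datasets touched by a revision record."""
    return list(revision.get("packages", []))


revision = json.loads(revision_json)
revised_at = datetime.fromisoformat(revision["timestamp"])
print(dataset_guids(revision), revised_at.date())
```
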

Accessing Individual Datasets

Dataset metadata is available in machine-readable form in JSON, RDF/XML, and Notation 3.


To access a particular dataset in JSON, append the dataset GUID to the dataset endpoint URL. The response is a JSON object containing information about the dataset. For instance, the record for dataset e5784720-a9a5-407e-bc36-84420289f1a9 is:

{
 "id": "e5784720-a9a5-407e-bc36-84420289f1a9",
 "metadata_created": "2012-05-30T22:16:35.228513",
 "metadata_modified": "2012-05-30T22:16:35.228513",
 "author": "Centers for Medicare & Medicaid Services",
 "tags": ["claims", "enrollment", "expenditures", "inpatient", "managed care",
    "medicaid", "prescription drug"],
 "name": "validation-reports",
 "notes_rendered": "<p>Medicaid Analytic eXtract (MAX) Validation Reports ...",
 "url": "",
 "notes": "Medicaid Analytic eXtract (MAX) Validation Reports These documents contain ...",
 "title": "MAX Validation Reports",
 "extras": {
  "Unit of Analysis": "Person",
  "hd2-workflow-id": "753",
  "Agency": "Department of Health & Human Services",
  "Geographic Granularity": "State",
  "Technical Documentation": "",
  "Collection Frequency": "Annually",
  "Agency Program URL": "",
  "Date Updated": "2011-10-19",
  "Date Released": "2003-01-01",
  "author_id": "",
  "Subject Area 1": "Medicaid",
  "Geographic Scope": "State"
 },
 "revision_id": "b1dae0c1-10d6-4c4d-8f2b-e9eb46d59d7d"
}

To find the URL for a dataset, you can also look for the link in the “Metadata API” field on the dataset's page in the catalog.

CKAN has three types of fields: primary fields, “extras” (general metadata), and “resources” (downloadable files). All but the primary fields are optional. Field definitions are documented at the end of this page.
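The three kinds of fields can be separated in client code along these lines. This is only a sketch: split_record is a hypothetical helper, and it assumes that extras is a JSON object and resources a JSON list, both of which may be absent.

```python
def split_record(record):
    """Split a dataset record into (primary fields, extras, resources)."""
    extras = record.get("extras", {})
    resources = record.get("resources", [])
    primary = {key: value for key, value in record.items()
               if key not in ("extras", "resources")}
    return primary, extras, resources


# Abbreviated version of the example record shown earlier.
record = {
    "id": "e5784720-a9a5-407e-bc36-84420289f1a9",
    "name": "validation-reports",
    "title": "MAX Validation Reports",
    "extras": {"Unit of Analysis": "Person", "Technical Documentation": ""},
}

primary, extras, resources = split_record(record)
# Optional fields are often present but empty, so drop blank values.
filled = {key: value for key, value in extras.items() if value}
print(primary["title"], filled, resources)
```
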

RDF XML and Notation 3 (N3)

You can also access the dataset metadata in RDF, in either XML or Notation 3 format. The URL for these resources is formed by concatenating the URL prefix of the public dataset pages, the dataset GUID or name, and either “.rdf” or “.n3”. (That is, it is the public page for the dataset on our CKAN site plus the file extension. Alternatively, you can set the HTTP Accept header to application/rdf+xml or text/n3 on the public page URL.)

Taking the same dataset as above, the RDF metadata can be accessed at that dataset's public page URL with “.rdf” appended. We use Dublin Core, DCAT, and other vocabularies as appropriate.

You can also find the URL in the Metadata API field on the catalog's dataset pages.
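Both ways of asking for RDF, the file extension and the Accept header, can be sketched with Python's standard library. The dataset page prefix below is a hypothetical placeholder, and nothing is sent over the network: the code only builds the URL and the request object.

```python
from urllib.request import Request

# Hypothetical placeholder: replace with the real prefix of the public dataset pages.
DATASET_PAGE_BASE = "https://catalog.example.org/dataset/"

# File extension -> media type, as described above.
RDF_MEDIA_TYPES = {"rdf": "application/rdf+xml", "n3": "text/n3"}


def rdf_url(dataset, fmt="rdf"):
    """Public page URL plus the file extension (dataset is a GUID or name)."""
    if fmt not in RDF_MEDIA_TYPES:
        raise ValueError(f"unsupported RDF format: {fmt}")
    return f"{DATASET_PAGE_BASE}{dataset}.{fmt}"


def rdf_request(dataset, fmt="rdf"):
    """Request for the public page URL, asking for RDF via the Accept header."""
    return Request(DATASET_PAGE_BASE + dataset,
                   headers={"Accept": RDF_MEDIA_TYPES[fmt]})


print(rdf_url("validation-reports", "n3"))
request = rdf_request("validation-reports")
print(request.full_url, request.get_header("Accept"))
```

The request object can then be passed to urllib.request.urlopen once the placeholder prefix is replaced.
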

JSON Schema

The JSON output for datasets uses the following schema:

Primary Fields

field | type | description
id |  | The unique identifier for the dataset in the API.
title | plain text | The display name for the dataset.
notes | plain text | The description of the dataset.
notes_rendered | HTML text | The description of the dataset rendered in HTML using Markdown.
author | plain text | The name of the federal agency that submitted the dataset.
url |  | The URL to the home page for the dataset, which may link to downloadable files.
tags | array of strings | Tags associated with the dataset.

Extras Fields

field | type | description
author_id |  | A URI uniquely identifying the agency submitting the data. The URI does not currently resolve to a resource, but it can be used as a canonical identifier for the agency.
Group Name | plain text | A display name shared across datasets that are related.
Agency | plain text | The name of the federal department submitting the data. Generally “Health and Human Services.”
Subject Area 1 |  | A subject area. Subjects come from a fixed vocabulary, currently: Administrative, Biomedical Research, Children's Health, Epidemiology, Health Care Cost, Health Care Providers, Medicaid, Medicare, Other, Population Statistics, Quality Measurement, Safety, Treatments.
Subject Area 2 |  | A subject area. See above.
Subject Area 3 |  | A subject area. See above.
Date Released |  | The date the dataset was first made available to the public (possibly before it was posted to the catalog). Format: YYYY-MM-DD.
Date Updated |  | The date the dataset was last changed, i.e. the last change to the data itself and not necessarily the metadata record. Format: YYYY-MM-DD.
Agency Program URL |  | The URL of the agency program responsible for the data.
Collection Frequency |  | The frequency with which the data was collected, which is sometimes different from the frequency at which the data is published. Possible values are Annually, Semi-Annually, Quarterly, Monthly, Weekly, Daily.
Coverage Period Start |  | The start of the coverage period, i.e. the date range that the data pertains to. Format: YYYY-MM-DD.
Coverage Period End |  | The end of the coverage period, i.e. the date range that the data pertains to. If the coverage period end date is omitted, the dataset may cover the period from the start date to the present time. Format: YYYY-MM-DD.
Coverage Period Fiscal Year Start |  | For coverage periods that are based on fiscal years rather than calendar years, the starting fiscal year of the coverage period. Format: YYYY.
Coverage Period Fiscal Year End |  | For coverage periods that are based on fiscal years rather than calendar years, the ending fiscal year of the coverage period. If the coverage period end fiscal year is omitted, the dataset may cover the period from the starting fiscal year to the present time. Format: YYYY.
Unit of Analysis | plain text | The unit of analysis, i.e. the object of study. Examples are “recalled food items” and “renal dialysis facility”.
Geographic Scope | plain text | The geographic region covered by the dataset. If omitted, the dataset is typically national in scope.
Geographic Granularity |  | The granularity of the geographic coverage. Possible values are Latitude/Longitude Coordinate, Street Address, Census Tract, City, MSA (metropolitan statistical area), ZIP Code, County, State, Sub-National Region, and Country.
Technical Documentation |  | The URL to technical documentation for the dataset.
Data Dictionary |  | The URL to a data dictionary for the dataset.
Collection Instrument |  | The URL to information about the data collection instrument.
License Agreement Required |  | Whether a license agreement must be agreed to before using the data (1 if yes, 0 if no, omitted if not known).
License Agreement |  | The URL to a license agreement that must be agreed to before using the data.
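Several extras fields carry conventions that client code has to interpret: dates in YYYY-MM-DD form, an open-ended coverage period, and a three-state license flag. The sketch below shows one possible reading of those rules. The helper names are hypothetical, and treating a missing end date as "today" is a simplification, since the definition above only says the dataset may extend to the present. Whether the license flag arrives as a string or a number is not stated here, so the code accepts both.

```python
from datetime import date


def parse_date(value):
    """Parse a YYYY-MM-DD string; empty or missing values become None."""
    return date.fromisoformat(value) if value else None


def coverage_period(extras, today=None):
    """Return (start, end); an omitted end is read as running to the present."""
    start = parse_date(extras.get("Coverage Period Start"))
    end = parse_date(extras.get("Coverage Period End"))
    if start is not None and end is None:
        end = today or date.today()
    return start, end


def license_required(extras):
    """True for 1, False for 0, None when the flag is omitted (not known)."""
    flag = extras.get("License Agreement Required")
    if flag is None or flag == "":
        return None
    return str(flag) == "1"


extras = {"Date Released": "2003-01-01", "Coverage Period Start": "2003-01-01"}
print(parse_date(extras["Date Released"]))
print(coverage_period(extras, today=date(2012, 5, 30)))
print(license_required(extras))
```
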

Resource Fields

A dataset may have one or more resource records, each of which represents a downloadable file or a query tool interface. Multiple files are often specified when the dataset is available in multiple formats. Each resource record uses these fields:

field | type | description
url |  | The URL of the downloadable file or the query interface.
 | plain text | The display name of the media format, e.g. CSV. Currently the same as the format attribute.
format |  | The media format. Possible values are API, CSV, ESRI, Feed, KML, Map, Query Tool, RDF, Text, Widget, XLS, XML.
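When a dataset is offered in several formats, a client will usually want to pick one. Here is a minimal sketch of that choice. It assumes each resource record is a JSON object with url and format keys; the function name, the preference order, and the example URLs are all hypothetical.

```python
def pick_resource(resources, preferred=("CSV", "XLS", "XML")):
    """Return the first resource whose format is highest in the preference order."""
    by_format = {}
    for resource in resources:
        # Keep the first resource seen for each format.
        by_format.setdefault(resource.get("format"), resource)
    for fmt in preferred:
        if fmt in by_format:
            return by_format[fmt]
    return None


# Made-up resource records for illustration only.
resources = [
    {"url": "https://example.org/data/query", "format": "Query Tool"},
    {"url": "https://example.org/data/file.xls", "format": "XLS"},
    {"url": "https://example.org/data/file.csv", "format": "CSV"},
]

best = pick_resource(resources)
print(best["url"] if best else "no downloadable file in a preferred format")
```
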