How do I use Data.gov?

Data.gov includes searchable catalogs that provide access to "raw" datasets and various tools. In the "raw" data catalog, you may access data in XML, Text/CSV, KML/KMZ, Feeds, XLS, or ESRI Shapefile formats. To better understand these file formats and other information about Data.gov, please view the Glossary of Terms. The catalog of tools links you to sites that include data mining and extraction tools and widgets. Datasets and tools available on Data.gov are searchable by category, agency, keyword, and/or data format.

Once in the catalog, click on the "name" (i.e., the name of the dataset or tool of interest) and you will be taken to a page with more details and metadata on that specific dataset or tool. Please note that by accessing datasets or tools offered on Data.gov, you agree to the Data Policy, which you should read before accessing any data.

If there are additional datasets that you would like to see included on this site, please suggest more datasets here.

What is the purpose of the Dialogue link?

The "DIALOGUE" link guides visitors to a collaborative site that enables two-way conversation with the public throughout the evolution of Data.gov, e.g., posting draft documents for the public's review, obtaining feedback on specific posted questions, and obtaining suggestions in general. Documents will be posted for review and comment, and comments can be ranked using an up or down arrow icon.

What is data?

Data are values or sets of values representing a specific concept or concepts. Data become "information" when analyzed and possibly combined with other data in order to extract meaning, and to provide context. The meaning of data can vary according to its context (Source: Federal Enterprise Architecture Data Reference Model).

What is the difference between the three catalogs on Data.gov?

  • "Raw" Data Catalog: Data.gov features a catalog with instant view/download of platform-independent, machine-readable data (e.g., XML, CSV, KMZ/KML, or shapefile formats), as well as a link to a metadata page specific to the respective dataset. The metadata page will have additional links to authoritative source information from the sponsoring agency's website, including any pertinent agency technical documentation regarding the dataset.
  • Tools Catalog: Data.gov features a tool catalog to provide the public with simple, application-driven access to Federal data with hyperlinks. This catalog features widgets, data mining and extraction tools, applications, and other services.
      • Widgets: Interactive virtual tools that provide single-purpose services such as showing the user the latest news, the current weather, the time, a calendar, a dictionary, a map program, a calculator, desktop notes, photo viewers, or even a language translator, among other things.
      • Data Mining and Extraction Tools: Applications that allow users either to produce maps, tables, or charts of the subset of data specific to their interests, or to build their own dataset extracted from a data source. Note that many of these sites also offer downloadable data. Of particular interest are the data mining tools that provide access to data not available in the raw data catalog.
      • RSS Feeds: RSS (also known as Really Simple Syndication) is a data feed format that some agencies use to provide frequently updated content. When an agency feed is syndicated, an individual can subscribe to it and automatically receive updated content through a feed aggregator.
  • Geodata Catalog: Data.gov features a geodata catalog that includes trusted, authoritative Federal geospatial data. This catalog includes links to download the datasets and a metadata page with details on the datasets, as well as links to more detailed Federal Geographic Data Committee (FGDC) metadata information.

How were the datasets in Data.gov selected?

Data.gov was initially launched in May 2009 with a limited number of Federal datasets and tools. These initial entries in the catalog were nominated by Executive Branch agencies as examples of raw datasets that already enjoy a high degree of consensus around definitions, are in formats that are readily usable, include the availability of metadata, and provide support for machine-to-machine data transfer. In addition to raw datasets, some agencies offered data extraction and mining tools, and widgets. Shortly after its launch, Data.gov was integrated with Federal geospatial datasets that can be accessed using the Data.gov geodata catalog. The Open Government Directive required agencies to register at least three new high-value datasets on Data.gov by January 22, 2010. Many of the datasets submitted before the deadline, to meet the deadline, and in continuing submissions since then are high-value in accordance with Open Government Directive provisions.

What is the plan for expanding the number of datasets made available through Data.gov?

Additional datasets and tools will be added to Data.gov regularly as a result of agency submissions and specific requests made by users of this site; the content, structure, and scope of the site will evolve over time and the catalogs will continue to grow as datasets are added.

Who developed Data.gov?

Data.gov was developed by the Federal CIO Council as an interagency Federal initiative and is hosted by the General Services Administration.

What standards were used to develop the metadata displayed on Data.gov?

Data.gov uses an adaptation of the Dublin Core metadata standard.

What are metadata?

Metadata are "data about data." Metadata include data associated with either an information system or an information object for purposes of description, administration, legal requirements, technical functionality, use and usage, and preservation (Source: Dublin Core Metadata Initiative (DCMI)).

Will my comments be read?

Emails that you send through the 'contact us' page of Data.gov will be sent to the relevant agency for review. You may also participate by rating the usefulness of the datasets currently available; to do so, vote on the metadata page for the respective dataset.

What are some resources for viewing geospatial datasets?

The geospatial datasets available on Data.gov are provided in up to three open file formats: Keyhole Markup Language (KML), Compressed Keyhole Markup Language (KMZ) and ESRI Shapefile. These datasets are all viewable in many commercial and freely available applications. More information about Geographic Information System (GIS) software can be found by doing a web search.
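For readers who would rather inspect a file programmatically than open it in a GIS application, the following is a minimal Python sketch. It relies on the fact that a KMZ file is a ZIP archive containing a KML (XML) document. The sample data, the helper names `make_kmz` and `placemark_names`, and the KML 2.2 namespace are assumptions made for illustration; actual Data.gov files may use a different KML version or layout.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumption: the file uses the KML 2.2 namespace; older files may differ.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Made-up sample data, not taken from any Data.gov dataset.
SAMPLE_KML = """<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Station A</name>
      <Point><coordinates>-77.03,38.89,0</coordinates></Point>
    </Placemark>
    <Placemark>
      <name>Station B</name>
      <Point><coordinates>-122.42,37.77,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""


def make_kmz(kml_text):
    """Build an in-memory KMZ (a ZIP archive holding one KML file)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("doc.kml", kml_text)
    return buffer.getvalue()


def placemark_names(kmz_bytes):
    """Return the name of every Placemark in the first KML file of a KMZ."""
    with zipfile.ZipFile(io.BytesIO(kmz_bytes)) as archive:
        kml_member = next(
            name for name in archive.namelist() if name.lower().endswith(".kml")
        )
        root = ET.fromstring(archive.read(kml_member))
    return [el.text for el in root.iterfind(".//kml:Placemark/kml:name", KML_NS)]


if __name__ == "__main__":
    for name in placemark_names(make_kmz(SAMPLE_KML)):
        print(name)
```

To try this on a downloaded file, read the file's bytes (for example with `open(path, "rb").read()`) and pass them to `placemark_names`. ESRI Shapefiles are a binary format and are not covered by this sketch.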

What if I am having difficulty downloading a dataset from the catalog?

Some web browser configurations, particularly those that are designed for high security computing environments, can interfere with access to certain datasets from the catalog. This is most commonly related to government websites that use security certificates, and end user browsers that are not configured to recognize those certificates as being authoritative. If you are having difficulty downloading one or more datasets from the Data.gov catalog, please contact your local IT support staff to determine whether browser configuration issues can be addressed for your workstation.

How does Data.gov maintain accessibility?

The commitment to accessibility for all is reflected on this site in our efforts to ensure all functionality and all content are accessible to all Data.gov users. The Data.gov site is routinely tested for compliance with Section 508 of the Rehabilitation Act using a technical standards checklist and in-depth testing with screen readers, policy experts, and persons with disabilities. For more information on Section 508 technical standards, please visit www.Section508.gov.

In addition, the Data.gov site is routinely reviewed for alignment with the latest W3C Web Accessibility Initiative guidelines. The Web Accessibility Initiative guidelines at www.W3.org/WAI/ define how browsers, media players, and other "user agents" support people with disabilities and work with assistive technologies.

Images on the site contain 'alt tags,' which aid users who listen to the content of the site by using a screen reader, rather than reading the site. Likewise, a 'skip to' link provides these users with a method for bypassing the header and going directly to the main content each time a page is accessed. Text transcripts accompany audio clips, and closed captioning is available on videos.

Users can get information regarding the accessibility of Adobe Portable Document Format (PDF) files from the Access Adobe website.

The Data.gov website is being updated frequently to make it as accessible as possible. If you use assistive technology (such as a screen reader, eye tracking device, voice recognition software, etc.) and have difficulty accessing information on Data.gov, please contact us and provide the URL (web address) of the material you tried to access, the problem you experienced, and your contact information. A Data.gov team member will contact you and attempt to provide the information you're seeking.

What is the "Developers Corner" section of Data.gov used for?

The "Developers Corner" is a place where an agency, organization, or individual can share mashups created using datasets from the catalog with the Data.gov community. Since the launch of Data.gov, we have encouraged programmers and developers to review the datasets in the catalog and create web applications and/or programs to further harness the value of the data.

What information is in the METRICS area?

The METRICS area reports on the following: Federal Agency Participation as part of the Open Government Directive, Datasets published by Agency per month, Visitor Statistics, and User Suggested Dataset Statistics.

How did you determine which datasets are the highest rated?

Data.gov uses the Bayesian Rating (BR) to determine which datasets are the highest ranked. The Bayesian Rating is based on the Bayesian average, a calculation that rates an item according to the "believability" of its votes. The greater the certainty based on the number of votes, the more the Bayesian rating approximates the plain, unweighted rating. When there are very few votes, the Bayesian rating of a dataset will be closer to the average rating of all datasets that were voted on.

Use this equation:

BR = ((AV * AR) + (V * R)) / (AV + V)

Legend:

  • AV: The average number of votes across all items that have number of votes > 0
  • AR: The average rating across all items (again, of those that have number of votes > 0)
  • V: The number of votes for this item
  • R: The rating of this item

Note: The AV (average number of votes) is used as the "fairness" weight in this formula. The higher this value, the more votes it takes to influence the Bayesian rating value.
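
The formula above can be sketched in Python as follows. This is an illustration only, not the code Data.gov runs: the `Dataset` class, the function name `bayesian_ratings`, and the sample votes are hypothetical, and AR is assumed to be the simple mean of the ratings of all items that have at least one vote.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str      # hypothetical label for the item
    votes: int     # V: number of votes for this item
    rating: float  # R: plain (unweighted) rating of this item


def bayesian_ratings(datasets):
    """Return {name: BR} for every item with at least one vote."""
    voted = [d for d in datasets if d.votes > 0]
    if not voted:
        return {}
    # AV: average number of votes over items with votes > 0
    av = sum(d.votes for d in voted) / len(voted)
    # AR: average rating over the same items (assumed to be a simple mean)
    ar = sum(d.rating for d in voted) / len(voted)
    # BR = ((AV * AR) + (V * R)) / (AV + V)
    return {
        d.name: (av * ar + d.votes * d.rating) / (av + d.votes) for d in voted
    }


if __name__ == "__main__":
    sample = [
        Dataset("A", votes=2, rating=5.0),
        Dataset("B", votes=100, rating=4.0),
        Dataset("C", votes=48, rating=3.0),
    ]
    for name, score in bayesian_ratings(sample).items():
        print(f"{name}: {score:.2f}")
```

With this sample data, AV = 50 and AR = 4.0. Item A has a perfect plain rating of 5.0 but only 2 votes, so its Bayesian rating is (50 * 4.0 + 2 * 5.0) / 52, about 4.04, close to the overall average. Item B, with 100 votes, keeps a rating of 4.0, and item C, with 48 votes, moves from 3.0 to about 3.51.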