Domain Challenge 1: Metadata

By George Thomas | On Sat, 06/02/2012 - 12:13pm





Overview

The Metadata domain challenge requests the application of existing voluntary consensus standards for metadata common to all open government data, and invites new designs for health domain specific metadata to classify datasets in our growing catalog, creating entities, attributes and relations that form the foundations for better discovery, integration and liquidity. This $35K round one HDP domain challenge runs for four months and is open to submitters from June 5 through October 2, 2012 on challenge.gov. First place wins $20K, second $10K and third $5K.

Metadata tags that have HTTP URIs are globally unique across the Web. Since their specifications are RESTfully dereferenceable, you can link to them. The strong identity, ease of resolution, and linkability that we get from HTTP URIs provide a powerful mechanism to disambiguate and integrate decentralized data from disparate publishers. Help us realize the Web of Health Data!

Details

The W3C has a number of standard vocabulary recommendations for Linked Data publishers that define cross domain semantic metadata for open government data. These cover concept schemes, provenance, statistics, organizations, people, data catalogs and their holdings, linked data assets, geospatial data, and more, in addition to the foundational standards of the Web of Data (such as HTTP, XML, RDF and its various serializations, SPARQL, and OWL). Other voluntary consensus standards development organizations are also making valuable contributions to open standards for Linked Data publishers, such as the emerging GeoSPARQL standard from the Open Geospatial Consortium.

In some cases, the entities and relations in these vocabulary standards are expressed using UML class diagrams as an abstract syntax, then automatically translated into various concrete syntaxes like XML Schemas and RDF Schemas. This also makes many of the standards from the Object Management Group easy to express as RDF Schemas, such as those that describe business motivation (including but not limited to vision, mission, strategies, tactics, goals, objectives), service orientation, process automation, systems integration, and other government specific standards. Often there are domain specific standards organizations whose standards products express domain specific entities and relations, such as those for the health or environmental sectors. The Data.gov PMO recently stood up a site to collect these standards when expressed as RDF Schemas for use by the growing community of Government Linked Data publishers, which includes HHS/CMS, EPA, DOE/NREL, USDA, and the Library of Congress.

The challenge winner will demonstrate the application of voluntary consensus and de facto cross domain and domain specific standards to as many of the HHS datasets available on HealthData.gov as possible. There are two objectives:

  1. Apply existing RDF Schema and OWL ontology standards (such as those listed above), and adapt other voluntary consensus and de facto standards for expressing cross domain metadata that is common to all open government data.
  2. Design new RDF Schemas (and RDFS++) for HHS domain specific metadata based on the data made available on HealthData.gov where no RDF Schema is otherwise given or available.
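As a rough illustration of the first objective, the sketch below emits Turtle that describes one dataset using existing cross domain vocabularies. It assumes DCAT and Dublin Core Terms as the vocabularies (the post does not name specific ones), and the dataset URI, publisher URI, title and keywords are invented for the example. The helper names are hypothetical.

```python
# Namespaces are the published DCAT and Dublin Core Terms namespaces.
# Choosing these two vocabularies is an assumption made for this sketch.
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def turtle_literal(value):
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def describe_dataset(uri, title, publisher_uri, keywords):
    """Return a Turtle description of one dataset using DCAT and DC Terms."""
    statements = [
        "a dcat:Dataset",
        f"dct:title {turtle_literal(title)}",
        f"dct:publisher <{publisher_uri}>",
    ]
    if keywords:
        # Several objects for the same predicate are separated by commas.
        joined = ", ".join(turtle_literal(k) for k in keywords)
        statements.append(f"dcat:keyword {joined}")
    header = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    body = f"<{uri}>\n    " + " ;\n    ".join(statements) + " ."
    return "\n".join(header + ["", body])


if __name__ == "__main__":
    # All values below are placeholders, not real HealthData.gov identifiers.
    print(describe_dataset(
        "http://example.org/dataset/quality-measures",
        "Example hospital quality measures",
        "http://example.org/organization/example-agency",
        ["quality", "hospitals"],
    ))
```

Because the dataset and publisher are identified by HTTP URIs, the same description can be linked from other catalogs without ambiguity.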

When designing new metadata expressed as RDF Schemas, designers should:

  • Leverage existing data dictionaries expressed as natural language in the creation of new conceptual schemas, as provided by domain authorities;
  • Observe best practices for URI schemes, consistent with prior health.data.gov work (such as the Clinical Quality Linked Data release from HDI 2011); and
  • Organize related concepts into small, composable component vocabularies.

    Turtle syntax for RDFS and RDF is preferred. The contributed code will be given an open source license and managed by HHS on github.com, with copyright and attribution to the developer(s) as appropriate, and will ideally be used to populate vocab.data.gov.
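To make the design guidance above concrete, here is a minimal sketch of a small component vocabulary serialized as RDFS in Turtle. The `hosp` prefix, its `example.org` namespace, and the `Hospital` and `readmissionRate` terms are hypothetical; in practice the labels and comments would be taken from a domain authority's data dictionary, and the URI scheme would follow the prior health.data.gov conventions.

```python
# Standard W3C namespaces. The vocabulary namespace passed in later is hypothetical.
STANDARD_PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def quote(text):
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def class_to_turtle(prefix, name, label, comment):
    """Declare an rdfs:Class; the comment carries the data dictionary definition."""
    return (
        f"{prefix}:{name} a rdfs:Class ;\n"
        f"    rdfs:label {quote(label)} ;\n"
        f"    rdfs:comment {quote(comment)} ."
    )


def property_to_turtle(prefix, name, label, domain, range_):
    """Declare an rdf:Property; domain and range are given as prefixed names."""
    return (
        f"{prefix}:{name} a rdf:Property ;\n"
        f"    rdfs:label {quote(label)} ;\n"
        f"    rdfs:domain {domain} ;\n"
        f"    rdfs:range {range_} ."
    )


def vocabulary_to_turtle(prefix, namespace, terms):
    """Combine prefix declarations and term declarations into one Turtle document."""
    prefixes = dict(STANDARD_PREFIXES)
    prefixes[prefix] = namespace
    header = "\n".join(f"@prefix {p}: <{ns}> ." for p, ns in prefixes.items())
    return header + "\n\n" + "\n\n".join(terms) + "\n"


if __name__ == "__main__":
    print(vocabulary_to_turtle(
        "hosp",
        "http://example.org/vocab/hospital#",  # placeholder namespace
        [
            class_to_turtle("hosp", "Hospital", "Hospital",
                            "A facility that provides inpatient care."),
            property_to_turtle("hosp", "readmissionRate", "readmission rate",
                               "hosp:Hospital", "xsd:decimal"),
        ],
    ))
```

Keeping each vocabulary this small makes it easier to compose with cross domain standards and with other component vocabularies.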

    Timeline

    • Submission Period Begins: June 5, 2012
    • Submission Period for Entries Ends: October 2, 2012
    • Evaluation Process for Entries Begins: October 5, 2012
    • Evaluation Process for Entries Ends: October 19, 2012
    • Winners notified: October 26, 2012
    • Winners Announced: Industry conference TBD

      $35,000 in Prizes!

      1. First Place: $20,000 plus conference exhibition opportunity
      2. Second Place: $10,000
      3. Third Place: $5,000

      Submission Requirements

      In order for an entry to be eligible to win this Challenge, it must meet the following requirements:

      1. No ONC logo – The app must not use ONC’s logo or official seal in the Submission, and must not claim endorsement.
      2. Functionality/Accuracy – A Submission may be disqualified if the software application fails to function as expressed in the description provided by the user, or if the software application provides inaccurate or incomplete information.
      3. Security – Submissions must be free of malware. Contestant agrees that the ONC may conduct testing on the app to determine whether malware or other security threats may be present. ONC may disqualify the app if, in ONC’s judgment, the app may damage government or others’ equipment or operating environment.

      Review Criteria

      The ONC review panel will make selections based upon the following criteria:

      • Metadata: the number of cross domain and domain specific voluntary consensus and de facto standard schemas, vocabularies or ontologies that are (re)used or designed and applied to HHS data on HealthData.gov
      • Data: the number of datasets to which the standards based cross domain metadata and the schema designed domain specific data are applied
      • Linked Data: the solution should use best practices for the expression of metadata definitions and instance data identification, leveraging the relevant open standards, including but not limited to foundational standards (RDF, RDFS, SPARQL, OWL), and other de facto vocabularies and ontologies such as those listed here as required, with the expectation that existing standards will be reused to the fullest extent possible
      • Components: leveraging software components that are already a part of the HDP is preferable, but other open source solutions may be used
      • Tools: use of automation and round trip engineering that enable multiple concrete syntax realization from abstract syntax of cross domain and/or domain specific metadata is desirable, with no expectation that the tools must be open source or otherwise contributed to HDP as part of this challenge submission. Only newly designed domain specific RDF Schemas, their composition with cross domain standards based RDF Schemas, and their application to various datasets are expected to be submitted for this challenge. Tool functionality may be highlighted to explain implementations as desired.
      • Best practices: where any new schemas and software code are created, they should exemplify Linked Data design best practices and known software patterns, or otherwise establish them.
      • Documentation: articulation of design using well known architecture artifacts
      • Engagement: willingness to participate in the community as a maintainer/committer after award

        Organizational Partners

        The General Services Administration, the World Wide Web Consortium, the Open Geospatial Consortium, and the Object Management Group are all organizational partners for this i2 challenge. Discuss with HHS on challenge.gov.

         
