Genomic Science Program
Department of Energy Office of Science

Why create a Systems Biology Knowledgebase?

Driven by the ever-increasing wealth of data generated by new generations of genomics-based technologies, systems biology demands a computational environment for comparing and integrating large, heterogeneous datasets and for using this information to develop predictive models. Beyond the need to access and manage these data, several other aspects of Genomic Science research compel development of the DOE Systems Biology Knowledgebase (KBase) to facilitate open sharing of the program’s data, information, and analytical software. These aspects are described below.

Integrating Data from Diverse Biological Systems Relevant to DOE Missions. Living systems possessing capabilities important to addressing DOE missions in energy production, carbon cycling, and environmental remediation represent a vast range of biological, environmental, and biochemical diversity not observed in more traditional research targeting model organisms. Biological systems for DOE missions include microbial communities from the deep subsurface, thermal springs, cow rumen, guts of wood-eating insects, and other environments; root fungi and bacteria that influence carbon accumulation in plants; and trees and grasses for bioenergy feedstocks. The heterogeneous mix of data emanating from these investigations spans diverse environmental conditions and wide-ranging scales of time (nanoseconds to decades) and space (nanometers to kilometers).

Capturing the Environmental Context of DOE Biological Research. Not only are the biological systems diverse, but the environments with which these organisms intimately interact are even more varied. Understanding the environmental conditions that influence an organism's biological function requires knowing and describing the specific microenvironment immediately surrounding that organism; average conditions over larger scales of space and time are not sufficient. To enable comparisons among data and experimental results, each dataset from an environmental sample must be accompanied by metadata that provide contextual information. Having a common resource such as KBase for collectively gathering data and experimental results will stimulate a community-wide effort to establish the guidelines and standards needed to adequately capture environmental metadata.

Accessing the Torrent of Data from High-Throughput Analyses. The large-scale genomic methods and high-throughput instrumentation used to study microbial and plant systems are generating enormous amounts of data and information, much of which is archived in individual laboratories, where it is often inaccessible to the larger research community and impossible to search collectively. The rate of data production is rapidly outpacing analysis, and much of the information already generated could be used more fully to maximize biological discovery and reveal higher-level insights and trends across results from multiple labs.

Sharing Data and Information Across Large, Distributed Research Collaborations. The biological challenges addressed by DOE are complex and often require large multidisciplinary teams of researchers who approach similar problems from different directions to accelerate scientific progress. Several large research collaborations supported by the DOE Genomic Science program exemplify this team approach to biology, which requires well-coordinated sharing and use of large datasets among scientists in diverse locations.

Benefits of the DOE Systems Biology Knowledgebase

When fully deployed, KBase will shift the role of biological data management systems from one traditionally perceived as bioinformatics support for mainstream experimental research to one in which computational analysis, modeling, and simulation capabilities drive a new era of in silico experimentation and hypothesis testing. As a unified framework linking otherwise disparate systems, KBase will be an important tool for accelerating biological discovery for DOE missions and will provide insights and benefits that can ultimately serve numerous application areas.

To support the conceptual design and implementation planning necessary to develop KBase, the Genomic Science program in 2010 completed the DOE Systems Biology Knowledgebase R&D project. This effort, carried out with funds provided by the American Recovery and Reinvestment Act, included a series of planning workshops that brought together the systems biology and computer science communities as well as five pilot projects aimed at identifying computational problems and solutions in the context of KBase. Together, these workshops and pilot projects informed the scientific objectives, software requirements, and design approaches detailed in the DOE Systems Biology Knowledgebase Implementation Plan, the final product of the R&D project. Additional output from the community workshops is available from the DOE Systems Biology Knowledgebase Wiki site.

Democratizing Access to Experimental Data and Computational Capabilities. Biological research efforts (large and small) would gain access to dramatically more data and robust analytical and modeling tools that may not be available to smaller, individual projects. Scientists could integrate knowledge from their own research and also draw upon data generated from the entire research community. The open community science facilitated by KBase thus would advance systems biology research and accelerate the pace toward predictive understanding (see figure below).

A Faster Track to Predictive Biology. Knowledgebase-enabled integration of experimental data with models will accelerate the scientific advancements needed to improve inferences and achieve predictive biology. Building on the wealth of data being generated across many laboratories, KBase will put biology on a new trajectory within the next decade.

Leveraging New Biological Insights to Advance Multiple Applications. The power of the systems approach to biology is rooted in the fact that at the molecular level all life is based on similar sets of fundamental processes and principles. Knowledge gained about one biological system, therefore, can advance the understanding of other systems when information is readily available in an integrated and transparent format. For example, the discovery of new regulatory pathways that influence plant biomass accumulation in bioenergy crops could also shed light on how these pathways affect carbon cycling in terrestrial vegetation or impact the productivity of agricultural crops.

Establishing the Foundation for Predictive Modeling of Biological Systems. For the first time, genomic sequence will be directly linked to the many downstream, multimodal analytical measurements of biochemical, cellular, and organismal activities. Only by developing an open infrastructure for mining, comparing, and interconnecting large biological and environmental datasets will we begin to build the comprehensive understanding needed to predict how the complex interplay between genomes and environments controls the behavior of biological systems.

For a more detailed rationale of why a Systems Biology Knowledgebase is necessary, see the DOE Systems Biology Knowledgebase for a New Era in Biology: A Genomics:GTL Report from the May 2008 Workshop (published March 2009).