
The emergence of systems biology as a research paradigm and approach for DOE missions has resulted in dramatic increases in data flow from a new generation of genomics-based technologies. The heterogeneous mix of data and information emanating from the Genomic Science program includes functional descriptions assigned to DNA sequence, molecular interactions, images of molecules or physical structures within a microbe or plant, and details about the environment in which these organisms live. The Genomic Science program's ultimate goal of achieving a predictive understanding of biological systems will require integrating and comparing this immense body of data, which spans diverse environmental conditions, spatial scales (nanometers to kilometers), and temporal scales (nanoseconds to decades).
To address these data-intensive computing challenges and serve the research community, DOE is developing the Systems Biology Knowledgebase (Kbase). A knowledgebase is a cyberinfrastructure consisting of a collection of data, organizational methods, standards, analysis tools, and interfaces representing a dynamic body of knowledge. Kbase will support open community science by serving as a freely available computational environment for sharing and integrating diverse biological data types, accessing and developing software for data analysis, and providing resources for modeling and simulation.
Kbase will leverage community-wide capabilities, experimental results, and modeling efforts and bring together research products from many different projects and laboratories to create an extensible, comprehensive cyberinfrastructure focused on DOE scientific objectives related to microbes, plants, and metacommunities (complex communities of organisms). Several recently completed and ongoing projects are contributing to Kbase development (see figure above and the sidebar, Knowledgebase Progress in 2010).
The fully functional Kbase is envisioned not only to store, retrieve, manage, and integrate systems biology data, but also to enable new knowledge acquisition and management through free and open access to data, analytical software, modeling tools, and information for the research community. The vision and justification for Kbase are described in detail in the Systems Biology Knowledgebase for a New Era in Biology workshop report (see also the DOE Systems Biology Knowledgebase brochure).
The success of Kbase will rely largely on its ability to meet the dynamic information needs of different user communities and the willingness of these communities to support open sharing of data, science, and software (see figure above, The Community-Driven DOE Systems Biology Knowledgebase). When research data and information are not publicly available to the scientific community, a corresponding price is paid in missed opportunities, barriers to innovation and collaboration, and lost productivity resulting from inadvertent repetition of similar work.
Leading the collaboration will be principal investigator Adam Arkin of Lawrence Berkeley National Laboratory (LBNL), with co-principal investigators Rick Stevens of Argonne National Laboratory (ANL), Robert Cottingham of Oak Ridge National Laboratory (ORNL), and Sergei Maslov of Brookhaven National Laboratory. Also participating as investigators in the multi-institutional program are Pamela Ronald of the University of California, Davis; Matthew DeJongh of Hope College in Michigan; Gary Olsen of the University of Illinois at Urbana-Champaign; Doreen Ware of the Cold Spring Harbor Laboratory; and Mark Gerstein of Yale University.
Previously, the DOE Office of Biological and Environmental Research (BER) initiated or completed several key research projects and activities underpinning development of the DOE Systems Biology Knowledgebase (Kbase).
First, with funding from the American Recovery and Reinvestment Act, a year-long R&D project was carried out to support the conceptual design and implementation planning necessary to develop Kbase. Completed in September 2010, this effort included a series of community planning workshops and five pilot projects. Together, these workshops and pilots informed the scientific objectives, software requirements, and design approaches detailed in the DOE Systems Biology Knowledgebase Implementation Plan, the final product of the R&D project.
Also in late 2010, BER awarded funding to 11 university-led projects for research to develop new computational biology and bioinformatic methods to enable Kbase. These awards were given in response to Funding Opportunity Announcement DE-FOA-0000143.