University-Led Research Projects for Developing Methods to Enable KBase [2010]

In late 2010, the U.S. Department of Energy (DOE) Office of Biological and Environmental Research (BER) awarded 11 university-led projects in response to Funding Opportunity Announcement DE-FOA-0000143: Computational Biology and Bioinformatic Methods to Enable a Systems Biology Knowledgebase. Under this funding call, applications were solicited for basic research in computational systems biology that both addressed DOE missions in energy and the environment and supported the development of a DOE Systems Biology Knowledgebase (KBase). Accepted proposals exhibited strong collaboration among experimental data generators, bioinformaticians and computational biologists, and computer scientists in four areas:

  • “Omics” Data Integration. New computational methods are desired for integrating multiple types of data such as genomic, metagenomic, proteomic, metabolomic, transcriptomic, expression, and phenotypic. These methods include developing data standards, ontologies, and controlled vocabularies as well as assessing data quality. Also encouraged are methods that significantly improve data visualization and analysis, including new methods for complex Web interfaces and third-party tool development. Methods for analyzing across different data types are priorities.
  • Genomic Annotation. Also sought are new methods for computational gene annotation that include integrating data and information into gene functional assignments. New annotation methods are needed for capturing information such as cDNA, clustering and neighborhood gene analysis, expression and phenotypic data, protein folds and structures, and phylogenetic profiling data. Methods for estimating and embedding uncertainty and confidence levels in annotation assignments are priorities.
  • Integrated Pathway Reconstructions. Significant improvements are needed in methodologies to couple metabolic and regulatory pathways and integrate associated data and information. These improvements include new methods in correlational and iterative analysis that would dynamically link data to model development. New methods in dynamical pathway reconstruction for on-the-fly pathway analysis also are being encouraged. Improvements supporting the integration of expression data (e.g., transcription and protein association and localization) with pathway simulations are priorities.
  • Whole Cellular Simulations. New methods to model complex cellular processes are being encouraged. These methods include integrating multiple data types such as two- and three-dimensional imaging and spectroscopic data with cellular models or simulations.

Summaries of the Projects Awarded Under this Funding Opportunity

Enabling a Systems Biology Knowledgebase with Gaggle and Firegoose

  • Principal Investigator: Nitin Baliga (Institute for Systems Biology)

This project will extend the existing Gaggle and Firegoose systems to develop an open-source technology that runs over the web and links desktop applications with many databases and software applications. Researchers will incorporate workflows for data analysis that can be executed from this interface to other online applications. Four specific aims are to (1) provide one-click mapping of genes, proteins, and complexes across databases and species; (2) enable multiple simultaneous workflows; (3) expand sophisticated data analysis for online resources; and (4) enhance open-source development of the Gaggle-Firegoose infrastructure. Gaggle is an open-source Java software system that integrates existing bioinformatics programs and data sources into a user-friendly, extensible environment to allow interactive exploration, visualization, and analysis of systems biology data. Firegoose is an extension to the Mozilla Firefox web browser that enables data transfer between websites and desktop tools including Gaggle.

Development of a Knowledgebase to Integrate, Analyze, Distribute, and Visualize Microbial Community Systems Biology Data

  • Principal Investigator: Jill Banfield (University of California, Berkeley)

This project will develop a web-based knowledgebase that integrates metagenomic data with metaproteomic and metabolomic data from microbial communities. Although the knowledgebase will include several communities, an emphasis will be on microbes from acid mine drainage, a research area in which the principal investigator is experienced and has collected data. The new system will let a larger scientific community layer gene sequence data with analyzed and predicted peptide sequence and metabolite data in a visual and queryable format. The general microbial research community likely will find this work useful. However, investigators also note that current applications to simple systems may pose interesting challenges when scaled to much larger communities. The project aims to develop three specific resources and capabilities: (1) a centralized database to integrate various omics datasets, (2) tools for mapping and representing proteomic and genomic datasets comprising orthologous genes in the presence of genomic variation, and (3) a metabolite atlas of the acid mine drainage microbial community.

Tools and Models for Integrating Multiple Cellular Networks

  • Principal Investigator: Mark Gerstein (Yale University)

This application will develop computational tools to link metabolic pathways with regulatory pathways and physical (protein-protein) interaction data. This work uses the principal investigator’s methods for the ENCODE project (Encyclopedia of DNA Elements) and applies these to prokaryotes of interest to DOE. (ENCODE identifies all functional elements in the human genome sequence.) This project will go beyond ENCODE and modENCODE (Model Organism ENCODE) by also developing topological analysis tools and dynamical modeling of integrated networks. The three specific aims are to (1) develop computational tools for analyzing integrated networks; (2) conduct correlative and topological analysis using these tools, in combination with other genomic information; and (3) carry out dynamical and evolutionary modeling of the integrated network.

Curation and Computational Design of Bioenergy-Related Metabolic Pathways

  • Principal Investigator: Peter Karp (SRI International)

This project will develop an enhancement in the MetaCyc Pathway Tools aimed specifically at bioenergy-related processes. Pathway Tools are a set of metabolic pathway and enzyme tools generally created on an organism-by-organism basis. This application first will push out these tools to enable greater use in bioengineering for bioenergy-related processes and second will produce new graphical visualizations of metabolic pathways that can allow users to manipulate, rank, and visualize pathways. Two specific aims are to (1) enhance MetaCyc data and generate a bioenergy-related pathway and genome database and (2) develop computational tools for engineering metabolic pathways that satisfy specified design goals.

Computational Modeling of Fluctuations in Energy and Metabolic Pathways of Methanogenic Archaea
(Jointly funded with the DOE Office of Advanced Scientific Computing Research)

  • Principal Investigator: Zaida Luthey-Schulten (University of Illinois, Urbana-Champaign)

This project will develop methodology and corresponding computational tools to simulate how a population of microbes responds to environmental fluctuations. Aimed particularly at methanogenic archaea of the genus Methanosarcina, the work begins with genome-scale modeling of the microbe’s metabolic and regulatory pathways. This method then will be integrated into a cellular modeling method that takes into account environmental fluctuations. Investigators will work in collaboration with ongoing experimental studies on Methanosarcina by William Metcalf (University of Illinois, Urbana-Champaign). Specific aims are to (1) construct an integrated stochastic and systems model of Methanosarcina, (2) investigate how an in silico population of the microbe’s cells responds to environmental fluctuations, and (3) validate the computational methodology and demonstrate its applicability to other biological systems.

A Systems Biology Knowledgebase: Context for Content

  • Principal Investigator: Bernhard Palsson (University of California, San Diego)

This project will develop a portal and the computational tools to integrate multiple omics data to reconstruct transcriptional regulatory networks of microbes of interest to DOE (e.g., Escherichia coli, Geobacter, and Thermotoga). The data include protein binding (ChIP-chip), gene expression (microarrays and RNA-Seq), transcriptional start sites (sequencing), peptide (LC-FTICRMS), and gene annotations. The application will also develop a formal mathematical framework for modeling transcriptional regulatory networks in these species. The framework captures gene-protein-reaction associations, condition-specific transcriptional basic unit structure, functional regulation of each transcriptional unit in the expression context, and structural constraints that govern transcription factor–promoter binding. Three specific aims for the project are to (1) develop computational tools to integrate omics data for genome annotation and transcription, (2) develop a genome-scale knowledgebase to provide operational constraints on cellular function, and (3) formulate in silico models to enable genome-scale queries.

Integrated Approach to Reconstruction of Regulatory Networks

  • Principal Investigator: Dmitry Rodionov (Burnham Institute)

This project will extend research to identify regulons for regulatory network reconstruction and develop a method for comparing regulatory networks across microbial species. This will be accomplished by developing new clustering algorithms for cross-species comparisons, integrating known data and information from other resources and databases, and developing a platform for users to analyze experimental data. Specific aims of this application are to (1) develop an integrative platform for genome-scale regulon reconstruction, (2) infer regulatory annotations for several groups of bacteria related to DOE missions, and (3) develop a knowledgebase for microbial transcriptional regulation data and analysis. The final goal will be to develop a platform that integrates the experimental and computational data on transcriptional regulation in microbes. Another end goal is to allow any user to upload data (public or private), perform analyses with the data, and compare them to the analysis work conducted by the researcher who generated the data for a particular experiment.

An Open-Source Platform for Multiscale Spatially Distributed Simulations of Microbial Ecosystems

These snapshots show a simulation of microbial growth and were generated using a spatially distributed dynamic flux balance analysis approach. The software platform, called COMETS (Computation Of Microbial Ecosystems in Time and Space), is being developed by the Segrè laboratory at Boston University. Here, in a first version of the software built by graduate student William Riehl, a central carbon metabolism model of Escherichia coli is used to simulate colony growth. [Image courtesy of Daniel Segrè.]

  • Principal Investigator: Daniel Segrè (Boston University)

This project will develop an open-source platform for simulating microbial ecosystems. A simulation package will be developed based on a spatially distributed and time-dependent flux balance analysis program. One unique feature of this work will be the ability to bridge spatial and temporal scales, thus enabling simulation of microbial growth given environmental settings, including nutrient availabilities and metabolite exchange. Specific aims are to (1) modify a current dynamic flux balance analysis (dFBA) program to include spatially structured interacting metabolite dynamics of the microbial system and (2) study interactions in terms of dynamically changing colony morphology by modeling the simultaneous growth of mutualistic pairs of microbes. This work will draw on corresponding experimental data made available through project collaborators.
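
The idea of coupling local growth to spatial nutrient dynamics can be illustrated with a minimal sketch. This is not the COMETS code, and it is not a real flux balance analysis: the per-cell optimization is reduced to a single uptake reaction, so its optimum is simply the uptake bound. The function names, the Michaelis-Menten form of the bound, and all parameter values are assumptions chosen for illustration.

```python
# Toy stand-in for spatially distributed dynamic FBA (illustrative only).
# Each grid cell holds biomass and one nutrient. The per-cell "FBA" is reduced
# to one uptake reaction, so the optimal flux equals the uptake bound.

def uptake_bound(nutrient, vmax=10.0, km=0.5):
    """Upper bound on uptake flux (Michaelis-Menten form is an assumption)."""
    return vmax * nutrient / (km + nutrient)


def step(biomass, nutrient, dt=0.01, yield_coeff=0.1, diff=0.2):
    """Advance the grid one time step: local growth, then nutrient diffusion.

    diff must stay at or below 0.25 for this explicit scheme to remain stable.
    """
    rows, cols = len(biomass), len(biomass[0])
    new_b = [row[:] for row in biomass]
    new_n = [row[:] for row in nutrient]

    # Local growth: consume nutrient and convert it to biomass.
    for i in range(rows):
        for j in range(cols):
            flux = uptake_bound(nutrient[i][j])
            # A cell cannot consume more nutrient than it contains.
            consumed = min(flux * biomass[i][j] * dt, nutrient[i][j])
            new_n[i][j] -= consumed
            new_b[i][j] += yield_coeff * consumed

    # Nutrient diffusion with no-flux boundaries. Each neighbor pair exchanges
    # an equal and opposite amount, so total nutrient is conserved.
    diffused = [row[:] for row in new_n]
    for i in range(rows):
        for j in range(cols):
            for di, dj in ((1, 0), (0, 1)):
                ni, nj = i + di, j + dj
                if ni < rows and nj < cols:
                    flow = diff * (new_n[i][j] - new_n[ni][nj])
                    diffused[i][j] -= flow
                    diffused[ni][nj] += flow
    return new_b, diffused


if __name__ == "__main__":
    # Seed one colony in the center of a uniform nutrient field.
    biomass = [[0.0] * 5 for _ in range(5)]
    biomass[2][2] = 1.0
    nutrient = [[1.0] * 5 for _ in range(5)]
    for _ in range(100):
        biomass, nutrient = step(biomass, nutrient)
    print("center biomass:", biomass[2][2])
```

In this sketch only the nutrient moves between grid cells; biomass stays where it was seeded. A platform of the kind described above would replace `uptake_bound` with a genome-scale flux balance calculation per cell and would track many exchanged metabolites rather than one.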

Phylogenomic Tools and Web Resources for the Systems Biology Knowledgebase

  • Principal Investigator: Kimmen Sjölander (University of California, Berkeley)

This project will develop new methods to functionally annotate microbial species based on phylogenomic relationships and using the hidden Markov model (HMM) methodology based on the structural information of families of homologous genomes. The principal investigator will work collaboratively with a Harvard biologist to analyze dataset(s) containing sequence data from environmental samples of marine invertebrate-bacterial symbionts. The project also will involve collaborating with the National Institute of Advanced Industrial Science and Technology and University of Tokyo computational biologists on multispecies cooperative pathway analysis. Three primary objectives for the project are to (1) extend the PhyloFact annotation method to include new microbial data and related database information such as the Kyoto Encyclopedia of Genes and Genomes (KEGG), PFAM, Gene Ontology (GO), experimental evidence codes, and structural information; (2) develop a new HMM algorithm to create novel gene trees; and (3) apply the PhyloFact annotation pipeline to collaborative marine microbial systems.

Development of an Extensible Computational Framework for Centralized Storage and Distributed Curation and Analysis of Genomic Data and Genome-Scale Metabolic Models

  • Principal Investigator: Rick Stevens (University of Chicago)

This work will develop a computational framework that combines a centralized extensible database for integrating omics and sequence data with a distributed pipeline for using these data to annotate genomes and to reconstruct and analyze new genome-scale metabolic models. The proposed framework will be interfaced with the SEED. Three significant components of this interface will be (1) enhancing the backend of SEED to support new data types and queries, (2) integrating this backend into a model-building application for whole genome–scale networks (regulatory and metabolic), and (3) developing an application programming interface (API) for KBase to utilize this work.

Specific project objectives are (1) an improved infrastructure to enhance the framework’s extensibility, accessibility, and scalability; (2) an extended database to accommodate new predicted and experimental biological data types such as microbial transcriptional regulatory networks, genome-scale metabolic models, experimental evidence (e.g., microarray data, ChIP-chip data, and equilibrium constants), eukaryote genomes, and growth phenotype data (e.g., biology array data, culture conditions, growth rates, and gene essentiality); and (3) a new API to provide remote access to the database and tools, including RAST annotation of raw genome sequences, automated reconstruction of draft genome-scale metabolic models, flux balance analysis of such models, and querying of all data.

Gene Ontology Terms and Automated Annotation for Energy-Related Microbial Genomes

  • Principal Investigators: Biswarup Mukhopadhyay, Brett Tyler, and João Carlos Setubal (Virginia Polytechnic Institute and State University)

This effort will develop a set of GO terms for describing energy-related microbial processes. GO is one of the more widely used functional ontologies for annotating genes, and this project will address a gap known to the community: GO has few terms for microbial processes, which makes it much more relevant to human systems than to microbes. Two specific aims are to (1) develop MENGO terms (ontologies for microbial energy processes) and host a series of tutorials and workshops at key meetings to inform and train microbiologists on these terms and (2) develop a database and web interface for storing and displaying these terms and microbial annotations.



