Genomic Science Program
U.S. Department of Energy | Office of Science | Biological and Environmental Research Program

OMEGGA: A Computationally Efficient Omics-Guided Global Gapfilling Algorithm for Phenotype-Consistent Metabolic Network Reconstruction


Hyun-Seob Song1*, Firnaaz Ahamed1, David M. L. Brown Jr.2, Christopher S. Henry3, Janaka N. Edirisinghe3, Aimee K. Kessell1, William C. Nelson4, Jason E. McDermott4, and Kirsten S. Hofmockel4


1University of Nebraska-Lincoln; 2Pacific Northwest National Laboratory (PNNL); 3Argonne National Laboratory; and 4Pacific Northwest National Laboratory



PNNL’s Phenotypic Response of Soil Microbiomes Science Focus Area (SFA) aims to achieve a systems-level understanding of the soil microbiome’s phenotypic response to changing moisture. Researchers perform multi-scale examinations of molecular and ecological interactions occurring within and between members of microbial consortia during organic carbon decomposition, using chitin as a model compound. Integrated experiments address spatial and inter-kingdom interactions among bacteria, fungi, viruses, and plants that regulate community functions throughout the soil profile. Data are used to parametrize individual- and population-based models for predicting interspecies and inter-kingdom interactions. Predictions are tested in laboratory and field experiments to reveal individual and community microbial phenotypes. The knowledge gained provides a fundamental understanding of how soil microbes interact to decompose organic carbon and enables prediction of how biochemical reaction networks shift in response to changing moisture regimes.


Genome-scale metabolic networks are a valuable tool for gaining a mechanistic understanding of microbial metabolism and predicting trophic interactions within microbial communities. Metabolic network models are constructed in three key steps: draft model construction based on genome annotations, filling knowledge gaps in biochemical pathways by adding missing reactions (gapfilling), and further refinement and curation. The DOE Systems Biology Knowledgebase (KBase) automates this process by providing a suite of computational apps and modules. Draft networks constructed from genome annotations alone do not contain all of the key reactions necessary for robust predictions, and therefore fail to predict biomass production or cell growth as experimentally observed. Gapfilling (i.e., identifying and adding those missing reactions) is a critical next step for enhancing the predictive power of metabolic networks. Current gapfilling algorithms follow a parsimonious approach, seeking a minimal number of reactions to add, and repeat this process sequentially for each condition in a given set of phenotypic growth data. However, the reactions added in this way are not always biologically relevant, causing model predictions to be inconsistent.
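The limitation of sequential, parsimonious gapfilling can be illustrated with a toy example. The sketch below is not the KBase gapfilling code: it replaces flux-based optimization with a simple reachability test and brute-force enumeration, and every reaction, metabolite, and medium name is hypothetical. It assumes that ties between equally small solutions are broken alphabetically, standing in for an arbitrary solver choice.

```python
from itertools import combinations

# Toy network (all names hypothetical): name -> (substrates, products).
DRAFT = {
    "R1": ({"sugar"}, {"int"}),
    "R2": ({"acid"}, {"int"}),
    "R3": ({"pre"}, {"biomass"}),
}
CANDIDATES = {
    "C1": ({"sugar"}, {"pre"}),  # restores growth on the sugar medium only
    "C2": ({"int"}, {"pre"}),    # restores growth on both media
}
MEDIA = {"sugar_medium": {"sugar"}, "acid_medium": {"acid"}}


def grows(reactions, medium):
    """Reachability test: can 'biomass' be produced from the medium?"""
    available = set(medium)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return "biomass" in available


def fill_one(model, candidates, medium):
    """Smallest candidate subset that restores growth on one medium."""
    names = sorted(candidates)  # alphabetical tie-breaking (assumption)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            trial = {**model, **{n: candidates[n] for n in subset}}
            if grows(trial, medium):
                return set(subset)
    raise ValueError("no candidate subset restores growth")


def sequential_gapfill(draft, candidates, media, order):
    """Gapfill one medium at a time, keeping reactions added earlier."""
    model, added = dict(draft), set()
    for name in order:
        new = fill_one(model, candidates, media[name])
        model.update({n: candidates[n] for n in new})
        added |= new
    return added


print(sorted(sequential_gapfill(DRAFT, CANDIDATES, MEDIA,
                                ["sugar_medium", "acid_medium"])))  # ['C1', 'C2']
print(sorted(sequential_gapfill(DRAFT, CANDIDATES, MEDIA,
                                ["acid_medium", "sugar_medium"])))  # ['C2']
```

In this toy case the set of added reactions depends on the order in which media are processed, and one ordering adds a reaction that a joint fit over both media would not need.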

To address these issues, researchers designed a new gapfilling algorithm termed OMEGGA (OMics-Enabled Global GApfilling). As indicated by its name, OMEGGA uses diverse data sources (such as amplicon, transcriptomic, proteomic, and metabolomic data) to simultaneously fit a draft metabolic model to all available phenotype data. In this work, researchers demonstrate the two major capabilities of OMEGGA: global and omics-guided gapfilling.

For global (or simultaneous) gapfilling, researchers developed a linear programming (LP)-based algorithm to identify a minimal set of reactions meeting all experimentally observed growth conditions, without iterative fitting. In a case study using Escherichia coli, the LP-based algorithm performed far better than existing mixed integer linear programming (MILP)-based algorithms. Although the computational burden grows exponentially as the number of media conditions increases, the actual computational time remained acceptable, indicating that the algorithm can be extended to more complex datasets by leveraging higher-performance computing resources in KBase. Importantly, the LP-based design allows researchers to use non-proprietary LP solvers, avoiding potential licensing issues. In parallel, researchers also developed an algorithm that incorporates (1) gene annotations from multiple complementary pipelines (e.g., RAST, Prokka, Koala, DeepEC) and (2) additional omics data (e.g., transcriptomic profiles). In the E. coli case study, the resulting gapfilling solutions showed much stronger genomic and experimental consistency than typical parsimonious gapfilling. Including biologically relevant reactions is critical for avoiding false positives, which traditionally requires manual curation.
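The two ideas in this paragraph, fitting all growth conditions at once and weighting candidate reactions by supporting evidence, can be sketched on a toy network. This is an illustration only and not the OMEGGA algorithm: it uses brute-force enumeration with a reachability test rather than an LP, and all reaction names, metabolite names, and cost values are hypothetical. The assumption is that a candidate reaction with annotation or omics support receives a low cost and an unsupported one a high cost.

```python
from itertools import combinations

# Toy network (all names hypothetical): name -> (substrates, products).
DRAFT = {
    "R1": ({"sugar"}, {"s_int"}),
    "R2": ({"acid"}, {"a_int"}),
    "R3": ({"pre"}, {"biomass"}),
}
CANDIDATES = {
    "C1": ({"s_int"}, {"pre"}),    # assumed to have omics support
    "C2": ({"a_int"}, {"pre"}),    # assumed to have omics support
    "C3": ({"salt"}, {"biomass"}), # unsupported one-step shortcut
}
MEDIA = {
    "sugar_medium": {"sugar", "salt"},
    "acid_medium": {"acid", "salt"},
}
UNIFORM = {"C1": 1, "C2": 1, "C3": 1}  # plain parsimony
OMICS = {"C1": 1, "C2": 1, "C3": 5}    # penalize the unsupported reaction


def grows(reactions, medium):
    """Reachability test: can 'biomass' be produced from the medium?"""
    available = set(medium)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return "biomass" in available


def global_gapfill(draft, candidates, media, cost):
    """Cheapest candidate subset restoring growth on every medium at once."""
    best, best_cost = None, float("inf")
    names = sorted(candidates)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            total = sum(cost[n] for n in subset)
            if total >= best_cost:
                continue  # cannot improve on the best solution found so far
            model = {**draft, **{n: candidates[n] for n in subset}}
            if all(grows(model, medium) for medium in media.values()):
                best, best_cost = set(subset), total
    return best, best_cost


solution, total = global_gapfill(DRAFT, CANDIDATES, MEDIA, UNIFORM)
print(sorted(solution), total)  # ['C3'] 1
solution, total = global_gapfill(DRAFT, CANDIDATES, MEDIA, OMICS)
print(sorted(solution), total)  # ['C1', 'C2'] 2
```

With uniform costs the single shortcut reaction wins on parsimony alone, while the evidence-weighted costs select the two supported reactions instead. Enumeration scales poorly and is used here only to keep the sketch dependency-free.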

Researchers will extend the test cases to include non-model organisms by using condition-specific multiomics and phenotype data from the Model Soil Consortia-2 (MSC-2) and associated isolated organisms developed through PNNL’s Soil Microbiome SFA. Researchers are working with the KBase team to incorporate the OMEGGA algorithm into KBase. The team will build an external application library for gapfilling and incorporate it into existing KBase apps, which can then leverage the multiomics data to derive more biologically relevant and realistic gapfilling solutions. To improve the quality and supportability of the software, testing and documentation will also be incorporated into automated processes. The new optimization algorithms and KBase apps will greatly facilitate the construction of high-quality metabolic networks by simultaneously incorporating molecular and phenotypic observations, eliminating the need for time-consuming manual troubleshooting.

Funding Information

PNNL is a multi-program national laboratory operated by Battelle for the DOE under Contract DE-AC05-76RLO 1830. This program is supported by the U.S. Department of Energy, Office of Science, through the Genomic Science program, Office of Biological and Environmental Research, under FWP 70880 and FWP 78749.