Genomic Science Program
U.S. Department of Energy | Office of Science | Biological and Environmental Research Program

Computational Tools for Multiomic Data Standardization and Integration to Represent Whole Microbial Communities

Authors:

Kostas Konstantinidis1* (kostas@ce.gatech.edu), Caitlin Petro1, Katherine Duchesneau1, Malak Tfaily2, Rachel Wilson3, Jeffrey P. Chanton3, Christopher W. Schadt4, Spencer Roth4, and Joel E. Kostka1

Institutions:

1Georgia Institute of Technology; 2University of Arizona; 3Florida State University; and 4Oak Ridge National Laboratory

Goals

The goals of this research are to: i) standardize methods for detecting the relative abundance of molecular features (e.g., genes, pathways or species) in various omics datasets (e.g., metagenomics, metatranscriptomics, metaproteomics and metabolomics) for use by the scientific community; ii) extend the previously developed dynamic mathematical models for water-based ecosystems by integrating the standardized data from (i) and additional omics data, such as metabolomics, as parameters of the model towards identifying microbe-microbe and microbe-environment interactions within a microbial community; and iii) apply the advanced models to appropriate multiomic data from the DOE’s Spruce and Peatland Responses Under Changing Environments (SPRUCE) project to provide insights into the microbial interaction networks that mediate belowground carbon cycling in these peatland soils as well as how these interaction networks may be altered by climate change drivers (e.g., elevated temperature and CO2).

Abstract

Microbial species, especially in soils, engage in highly complex interactions based on their physiological responses to the environment and on chemical communication via a wide range of molecules at low concentrations. Deciphering the multidimensional causes and consequences of such interactions during environmental transitions is challenging because traditional methods reveal only the numerically dominant community members associated with the flow of the major carbon and nitrogen sources, and/or are typically limited to static correlation networks of abundances that capture the dynamic environment poorly. A predictive understanding of how soil (and other) ecosystems respond to future environmental perturbations is limited by the inability to elucidate the physiological interactions within complex soil microbial communities and the effect of the physicochemical environment on those interactions. Further, the identification of key microbial guilds, i.e., groups of microbial species that exploit the same resource(s) related to carbon turnover, remains elusive.

To address these challenges, researchers have recently developed mathematical models that represent microbe-microbe and microbe-environment interactions within a community and can predict how these interactions will change when environmental parameters such as temperature or precipitation change, i.e., dynamic models of whole microbial communities. The models are based on the ecological Lotka-Volterra (LV) differential equations and require time-series omic data (Dam et al. 2016). Applying these models to available metagenomic data from freshwater lakes led to a number of insights and predictions, some of which are supported by biological evidence. For instance, an interaction cluster was identified in Lake Lanier (Atlanta, GA) that included cyanobacterial primary producers and proteobacterial heterotrophs that live on the exudates of the cyanobacteria (Dam et al. 2020). Notably, ~46% of all species-species interactions in Lake Lanier were negative, indicating competition, while others were positive, suggesting cooperation. These results contrasted with those for Lake Mendota (Madison, WI), a lake that freezes in winter, where the models indicated a higher level of competition (~66%); the difference is presumably driven by the much milder weather fluctuations that Lake Lanier experiences (Dam et al. 2020). Even though some of these findings may appear to be anticipated based on existing knowledge, the biologically agnostic mathematical approach is able to quantify these effects on population abundance dynamics and interactions, which is essential for forecasting future behavior. The modeling framework therefore provides a new strategy for integrating omics data to summarize the functionality and species-species interactions within natural habitats, and its application to soil data as part of this project represents a novel contribution. Researchers will report on the efforts to adapt these LV models to the soil multiomic data available from the DOE’s SPRUCE project.
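
To make the modeling idea concrete, the following is a minimal sketch of a generic generalized Lotka-Volterra system, dx_i/dt = x_i (r_i + sum_j a_ij x_j), integrated with fixed-step Runge-Kutta. It is not the formulation of Dam et al.; the function names, the two-population example, and all parameter values are hypothetical and chosen only for illustration. The last function shows how a competition percentage like those quoted above could be computed, assuming it is defined as the share of nonzero between-population coefficients that are negative.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def glv_rates(x: Vector, growth: Vector, interactions: Matrix) -> Vector:
    """Return dx_i/dt = x_i * (r_i + sum_j a_ij * x_j) for each population i."""
    n = len(x)
    return [
        x[i] * (growth[i] + sum(interactions[i][j] * x[j] for j in range(n)))
        for i in range(n)
    ]


def rk4_step(x: Vector, growth: Vector, interactions: Matrix, dt: float) -> Vector:
    """Advance the abundances by one classical fourth-order Runge-Kutta step."""
    def shifted(k: Vector, scale: float) -> Vector:
        return [xi + scale * ki for xi, ki in zip(x, k)]

    k1 = glv_rates(x, growth, interactions)
    k2 = glv_rates(shifted(k1, dt / 2), growth, interactions)
    k3 = glv_rates(shifted(k2, dt / 2), growth, interactions)
    k4 = glv_rates(shifted(k3, dt), growth, interactions)
    return [
        xi + dt / 6 * (a + 2 * b + 2 * c + d)
        for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
    ]


def simulate_glv(x0: Vector, growth: Vector, interactions: Matrix,
                 dt: float = 0.01, steps: int = 5000) -> Vector:
    """Integrate the system for `steps` steps and return the final abundances."""
    x = list(x0)
    for _ in range(steps):
        x = rk4_step(x, growth, interactions, dt)
    return x


def negative_interaction_fraction(interactions: Matrix) -> float:
    """Share of nonzero off-diagonal coefficients that are negative."""
    off_diagonal = [
        a
        for i, row in enumerate(interactions)
        for j, a in enumerate(row)
        if i != j and a != 0.0
    ]
    if not off_diagonal:
        return 0.0
    return sum(1 for a in off_diagonal if a < 0) / len(off_diagonal)


# Hypothetical two-population example: population 0 is inhibited by
# population 1, while population 1 benefits from population 0.
growth = [1.0, 0.5]
interactions = [[-1.0, -0.5],
                [0.2, -1.0]]
final_abundances = simulate_glv([0.1, 0.1], growth, interactions)
competition_share = negative_interaction_fraction(interactions)
```

In practice the growth rates and interaction coefficients would be estimated from time-series abundance data rather than set by hand, and environmental drivers such as temperature would enter as additional terms; both steps are outside the scope of this sketch.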

An essential part of mathematical modeling is the accurate estimation of the in situ abundance of molecular features (e.g., genes, pathways or species). However, precisely measuring the abundance of features in metagenomic or other omic datasets remains challenging because the available methods have not yet been standardized, and it is not clear how the data from different approaches can be compared and interpolated. Estimation of in situ abundances is also the cornerstone of several downstream analyses, such as identifying differentially abundant taxa between samples and metabolic modeling. Researchers will present approaches to standardize the methods for detecting the relative abundance of features in various omics datasets (e.g., metagenomics, metatranscriptomics, metaproteomics and metabolomics) for use by the scientific community. As a representative example, researchers have recently advanced a tool for metagenomic (or metatranscriptomic) read recruitment plotting that provides precise estimates of whole-genome and individual gene abundances, as well as the extent of intra-population gene-content and sequence diversity (Gerhardt et al. 2021). Further, by analyzing publicly available genome and metagenome data, researchers show that the diversity within species is organized in distinct 99.5% Average Nucleotide Identity (ANI) clusters that can be used to consistently describe genomovars and strains (Rodriguez-R et al. 2022). Using these standards and concepts, researchers will also present efforts to quantify how individual species, and strains within species, respond to the temperature and CO2 treatments applied at the DOE’s SPRUCE project.
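
The two quantities discussed above can be illustrated with a short sketch. It is not the implementation of the cited tools: the first function assumes relative abundance is taken as length-normalized sequencing depth scaled to sum to one, and the second assumes that genomes are grouped by single linkage at the 99.5% ANI threshold, which is one simple choice among several clustering strategies. All names and example values are hypothetical.

```python
from typing import Dict, List, Tuple


def relative_abundance(mapped_bases: Dict[str, float],
                       genome_length: Dict[str, int]) -> Dict[str, float]:
    """Length-normalized sequencing depth per genome, scaled to sum to 1."""
    depth = {g: mapped_bases[g] / genome_length[g] for g in mapped_bases}
    total = sum(depth.values())
    if total == 0:
        return {g: 0.0 for g in depth}
    return {g: d / total for g, d in depth.items()}


def ani_clusters(genomes: List[str],
                 ani_pairs: Dict[Tuple[str, str], float],
                 threshold: float = 99.5) -> List[List[str]]:
    """Group genomes linked by any pairwise ANI at or above the threshold."""
    parent = {g: g for g in genomes}

    def find(g: str) -> str:
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), ani in ani_pairs.items():
        if ani >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for g in genomes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical inputs: bases of reads mapped to each genome, genome sizes,
# and pairwise ANI values (in percent) between three genomes of one species.
abundances = relative_abundance(
    {"genomeA": 3000.0, "genomeB": 1000.0},
    {"genomeA": 1000, "genomeB": 1000},
)
clusters = ani_clusters(
    ["g1", "g2", "g3"],
    {("g1", "g2"): 99.8, ("g2", "g3"): 98.0, ("g1", "g3"): 97.9},
)
```

Normalizing by genome length keeps large genomes from appearing more abundant simply because they recruit more reads; comparisons across samples would additionally need to account for sequencing effort, which this sketch does not do.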

References

Dam, P., et al. 2016. “Dynamic Models of the Complex Microbial Metapopulation of Lake Mendota.” npj Systems Biology and Applications 2, 16007.

Dam, P., et al. 2020. “Model-based Comparisons of the Abundance Dynamics of Bacterial Communities in Two Lakes.” Scientific Reports 10, 2423.

Gerhardt, K., et al. 2021. “RecruitPlotEasy: An Advanced Read Recruitment Plot Tool for Assessing Metagenomic Population Abundance and Genetic Diversity.” Frontiers in Bioinformatics 1, 826701.

Rodriguez-R, L. M., et al. 2022. “A Natural Definition for a Bacterial Strain and Clonal Complex.” bioRxiv https://doi.org/10.1101/2022.06.27.497766.