Community-Driven Data Science System to Advance Microbiome Research
The DOE’s National Microbiome Data Collaborative (NMDC) is developing an open-access framework that facilitates more efficient use of microbiome data for applications in energy, environment, health, and agriculture. Microbiome data have grown exponentially over the past few decades, and the sheer volume of data now available presents a significant bottleneck for analysis and interpretation.
To tackle this data integration challenge, NMDC leverages DOE’s existing data-science resources and high-performance computing systems. The initiative’s guiding principles are (1) making data findable, accessible, interoperable, and reusable (FAIR); (2) connecting data and compute resources; and (3) engaging the community to support open science and shared ownership.
Capabilities being enabled by NMDC include:
- Aggregating and viewing both taxonomic and functional profiles of unassembled and assembled metagenome sequence data to gain new insights into microbiome composition and function.
- Accessing, analyzing, and integrating multiomics datasets (metagenome, metatranscriptome, metaproteome, metabolome, and environmental data) to discover community dynamics, metabolic networks, and other microbe-microbe, microbe-host, and microbe-environment interactions.
- Accelerating searches across linked data by using existing and enhanced ways to describe microbiome datasets, broadening the range and depth of samples available for new discoveries.