Combining GWAS of Metabolomic and Transcriptomic Datasets to Accelerate Discovery of Genes Regulating the Effect of Drought on Plant Growth and Metabolism in Sorghum and Setaria

Authors:

Allen Hubbard*, Philip Ozersky, Louis Connelly, Xiaoping Li, Hui Jiang, Jennifer Barrett, Collin Lubbert, Maddison Pope, Shrikaar Kambhampati, and Ivan Baxter

Institutions:

Donald Danforth Plant Science Center

Goals

Bioenergy feedstocks need to be deployed on marginal soils with minimal inputs to be economically viable and have a low environmental impact. Currently, crop water supply is a key limitation to production. The yields of C₄ bioenergy crops such as Sorghum bicolor have increased through breeding and improved agronomy. Still, the amount of biomass produced for a given amount of water use (water-use efficiency, or WUE) remains unchanged. Therefore, this project aims to develop novel technologies and methodologies to redesign the bioenergy feedstock Sorghum for optimal WUE. Within this broader context, this subproject is leveraging the sorghum pangenome and large phenotypic datasets in Setaria viridis and S. bicolor to discover metabolically important genes for the regulation of WUE in the C₄ grasses. This project aims to develop and demonstrate novel methods and resources that accelerate both the production of genetic variants and phenotyping in reverse-genetics and forward-genetics approaches, leading to the discovery of genes that control metabolic regulators of WUE.

Abstract

Plants make an amazing array of metabolites to grow and respond to environmental change. The many compounds that plants produce are poorly characterized, and the genetic programs controlling them are largely unknown. To better understand the metabolomic response of C₄ plants to drought stress, researchers conducted parallel experiments in Sorghum and Setaria using diversity panels. Plants were grown in a controlled-environment phenotyping system at two watering levels, and samples were harvested 6 days after the watering levels were set.

Metabolites in each sample were quantified in an untargeted fashion via liquid chromatography–mass spectrometry (LC-MS), using two different columns in both positive and negative ionization modes to capture a large number of compound classes. A third of the samples were also profiled for RNA transcripts. Running approximately 3,800 metabolomics samples on two columns in two modes each created an immense informatics challenge.

To improve the sensitivity and accuracy of metabolite detection in such large datasets, the team has developed a suite of three computational tools to overcome the challenges of unreliable algorithms and inefficient validation protocols: isolock, autoCredential, and anovAlign (IAA). Isolock uses metabolite-isotopologue pairs (isopairs) to calculate and correct for mass drift across LC-MS runs. AutoCredential leverages statistical features of LC-MS data to amplify naturally present ¹³C isotopologues and validate metabolites through isopairs. AnovAlign, an ANOVA-derived algorithm, aligns retention time windows across samples, improving their delineation for mass features.

Using the IAA suite, researchers have quantified thousands of mass features across the approximately 3,800 metabolomics samples. Genome-wide association study (GWAS) analysis has identified a large number of loci affecting these metabolites, including several loci in syntenic regions of the Setaria and Sorghum genomes that affect the same metabolite. Using informatics tools to harness the sorghum pangenome, researchers are combining these loci with transcriptomic and genomic data to identify candidate genes and alleles underlying the metabolomic response to water deficit, and are using tandem mass spectrometry to better characterize promising mass features.
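The isopair concept used by isolock and autoCredential can be illustrated with a short Python sketch. The abstract does not describe how either tool is implemented, so this is not the IAA code: the function names, the tolerances, the feature tuple layout, and the median-based drift estimate are all assumptions chosen for illustration. The only fixed quantity is the mass difference between ¹³C and ¹²C (about 1.003355 Da), which separates a singly charged metabolite ion from its M+1 ¹³C isotopologue.

```python
from statistics import median

# Mass difference between 13C and 12C in daltons (physical constant).
C13_C12_DELTA = 1.003355


def find_isopairs(features, ppm_tol=5.0, rt_tol=0.05):
    """Return (i, j) index pairs where feature j looks like the M+1
    13C isotopologue of feature i.

    features: list of (mz, rt_minutes, intensity) tuples, assumed to be
    singly charged ions. The tolerances are illustrative assumptions.
    """
    pairs = []
    for i, (mz_i, rt_i, int_i) in enumerate(features):
        target = mz_i + C13_C12_DELTA
        for j, (mz_j, rt_j, int_j) in enumerate(features):
            if i == j:
                continue
            ppm_err = abs(mz_j - target) / target * 1e6
            # The isotopologue should co-elute and be less intense than
            # the monoisotopic peak.
            if ppm_err <= ppm_tol and abs(rt_j - rt_i) <= rt_tol and int_j < int_i:
                pairs.append((i, j))
    return pairs


def estimate_drift_ppm(reference_mz, observed_mz):
    """Median signed ppm offset of observed m/z values from their matched
    reference values (hypothetical drift estimator, not isolock's)."""
    return median((obs - ref) / ref * 1e6
                  for ref, obs in zip(reference_mz, observed_mz))


def correct_mz(mz, drift_ppm):
    """Remove a multiplicative ppm drift from a single m/z value."""
    return mz / (1 + drift_ppm * 1e-6)


# Toy features: a monoisotopic peak, its M+1 partner, and an unrelated peak.
features = [
    (180.0634, 5.00, 1.0e6),
    (181.0668, 5.01, 6.5e4),
    (250.1000, 7.20, 5.0e5),
]
isopairs = find_isopairs(features)

# Toy drift: every m/z in a later run reads 2 ppm high.
reference = [100.0, 200.0, 300.0]
observed = [m * (1 + 2e-6) for m in reference]
drift = estimate_drift_ppm(reference, observed)
corrected = [correct_mz(m, drift) for m in observed]
```

In this sketch the m/z values of features confirmed as isopairs would serve as the matched reference points for the drift estimate, so that each run is corrected against signals known to come from real metabolites. A pairwise scan like this is quadratic in the number of features; a production tool handling thousands of samples would more plausibly sort by m/z and search within a window.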

Funding Information

This research was supported by the DOE Office of Science, Biological and Environmental Research (BER) Program, grant nos. DE-SC0023160 and DE-SC0018277.