Improving Candidate Gene Discovery by Combining Multiple Genetic Mapping Datasets

Authors:

Nirwan Tandukar, Fausto Rodríguez-Zapata, Jung-Ying Tzeng, and Rubén Rellán-Álvarez* (rrellan@ncsu.edu)

Institutions:

North Carolina State University

Goals

(1) Perform an environmental GWAS in a panel of ~2000 sorghum accessions that have already been genotyped and georeferenced, using phosphorus availability and early-season cold stress as the phenotypes.

(2) Characterize the genetic architecture of lipid content during the early stages of sorghum development using the Sorghum Association Panel (SAP). The team will sequence the SAP accessions at 10–15X and make these data available. Researchers will perform a GWAS on lipid content under both stress conditions (low temperature and low phosphorus).

(3) Develop algorithms that incorporate all the different types of information collected (i.e., metabolite levels, GWAS candidate genes, selection signals) to improve the ability to detect signals of small effects and increase confidence in the selection of candidate genes. The algorithms and pipelines developed here will be made available to the community as R packages.

Abstract

The growing wealth of genetic datasets generated by next-generation sequencing, coupled with the advent of large plant phenotyping datasets, has opened new avenues for understanding complex traits shaped by the environment. Genome-wide association studies (GWAS) identify associations between single nucleotide polymorphisms (SNPs) and a phenotype. Complex biological processes, however, involve multiple phenotypes. Researchers have previously identified lipid variation associated with the adaptation of maize to the Mexican highlands, where it has adapted to low phosphorus and cold. The team is now using high-dimensional Sorghum bicolor genetic datasets to (i) perform environmental GWAS for several soil phosphorus phenotypes (availability, concentration, and solubility) across Africa, (ii) measure Fst in the same panel between accessions adapted to high and low phosphorus, and (iii) profile lipid concentrations by LC/MS under low and high phosphorus in the Sorghum Association Panel (SAP) and run the subsequent metabolomics GWAS. Comprehensive research on the complete genetic architecture is laborious, costly, and time-intensive because of the overwhelming number of genes and their regulatory networks, the different phenotypes that explain the same adaptive process, and the multidimensional nature of genomics datasets. There is therefore a need for a robust statistical framework that combines information from different experiments and genomics datasets at the level of individual p-values, aggregates multiple small and large gene effects, and reprioritizes candidate genes accordingly. To this end, researchers use the Cauchy distribution to define a test statistic as a weighted sum of Cauchy transformations of individual p-values, which can be used to combine p-values across different datasets. Researchers are working towards implementing this framework in R packages that will be publicly available.
Finally, researchers hope to obtain an accurate description of the molecular mechanisms involved in adaptation to phosphorus availability. Team members also hope to test whether lipids in sorghum play a similar adaptive role in Africa, and whether sorghum and maize have converged on the same plausible molecular mechanisms for such an adaptation.
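
The Cauchy combination of p-values described in the abstract can be sketched as follows. This is a minimal illustration in Python, not the team's R implementation: the function name cauchy_combine, the equal default weights, the small-p cutoff of 1e-15, and the example p-values are all assumptions made for the sketch. Each p-value p_i is transformed to tan((0.5 - p_i) * pi), the transformed values are summed with non-negative weights that sum to 1, and the combined p-value is the upper tail probability of that sum under a standard Cauchy distribution.

```python
import math


def cauchy_combine(pvalues, weights=None):
    """Combine p-values with a weighted Cauchy combination statistic.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    pvalues = list(pvalues)
    if not pvalues:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p < 1.0) for p in pvalues):
        raise ValueError("p-values must lie strictly between 0 and 1")

    if weights is None:
        weights = [1.0] * len(pvalues)  # assumption: equal weights by default
    weights = list(weights)
    if len(weights) != len(pvalues):
        raise ValueError("weights and pvalues must have the same length")
    total = sum(weights)
    if any(w < 0 for w in weights) or total <= 0:
        raise ValueError("weights must be non-negative with a positive sum")

    statistic = 0.0
    for p, w in zip(pvalues, weights):
        if p < 1e-15:
            # tan((0.5 - p) * pi) = cot(p * pi), which is close to
            # 1 / (p * pi) for tiny p; this avoids losing precision.
            term = 1.0 / (p * math.pi)
        else:
            term = math.tan((0.5 - p) * math.pi)
        statistic += (w / total) * term

    # Upper tail of the standard Cauchy: 0.5 - atan(t) / pi.
    # atan2(1, t) / pi is the same quantity, but stays accurate for large t.
    return math.atan2(1.0, statistic) / math.pi


if __name__ == "__main__":
    # Made-up p-values for one gene from three kinds of evidence:
    # environmental GWAS, Fst scan, and lipid metabolomics GWAS.
    gene_pvalues = [0.004, 0.03, 0.20]
    print(cauchy_combine(gene_pvalues))
    # Up-weighting the first dataset (weights are illustrative only).
    print(cauchy_combine(gene_pvalues, weights=[2.0, 1.0, 1.0]))
```

In a gene prioritization setting, such a function would be applied gene by gene to the p-values obtained from each dataset, and genes would then be re-ranked by the combined p-value. How the weights should be chosen for each dataset is left open here.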

Funding Information

This research was supported by the DOE Office of Science, Office of Biological and Environmental Research (BER), grant no. DE-SC0021889.