144 prioritized genes of flooding-tolerance (FTgenes) in soybean
Data files
Sep 15, 2022 version files (12.81 KB)
- FTgenes_144.xlsx
- README.txt
Abstract
Soybean [Glycine max (L.) Merr.] is one of the world's most important legume crops and a major source of edible protein and oil. Owing to drastic climate change, flooding, drought, and unevenly distributed rainfall have gradually increased in frequency and intensity worldwide. In particular, severe flooding has caused extensive losses in soybean production. Given this situation, there is an urgent need to breed soybean varieties with high flooding tolerance. We collected and integrated genetic data relevant to flooding-tolerant responses in soybean from multidimensional data sources. A step-function adjusted factor prioritization algorithm was proposed to prioritize these integrated genetic data. Using a cut-off threshold of 42 on the combined score, a total of 144 candidate genes of flooding tolerance (FTgenes) were prioritized from 36,705 test genes collected from multidimensional genomic features linked to soybean flooding tolerance. Several validations using independent samples from SoyNet, GWAS, SoyBase, the GO database, and transcriptome databases all showed excellent agreement, suggesting that these 144 FTgenes are significantly superior to other genes. Our results provide valuable information and insight for soybean variety selection. The FTgenes show potential for uncovering important insights into the flooding-tolerant response of soybean in systems biology studies.
Methods
We collected genetic data from nine data platforms, including genome-wide association studies (GWAS), association mapping, linkage mapping, gene expression, pathway regulation, protein-protein interaction networks (PPIN), network analysis, proteomes, text mining, and functional genomic data from model plants. An evidence-based scoring scheme and a step-function adjusted factor weighting system were proposed to integrate these genomic data across multidimensional data sources and to prioritize the integrated genetic data. To avoid false-positive results, all positive and negative results were considered and integrated to assign a real value to every genetic locus, from which an evidence-based gene pool (denoted as test genes) was constructed. For each test gene, we computed a weighted combined score summarized from multiple data sources. Similarly, a set of core genes was established for prioritizing the test genes. A clear separation between the test genes and the core genes was determined and used to identify the prioritized FTgenes.
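The weighted scoring idea described above can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each data source contributes one signed evidence magnitude per gene (positive for supporting evidence, negative for contrary results), that a step function maps the absolute magnitude to a discrete adjustment factor, and that each source carries a fixed weight. The source names, weights, step breakpoints, and toy genes are invented for illustration and are not the authors' values; only the cut-off of 42 on the combined score comes from the abstract.

```python
import math

# Hypothetical per-source weights (illustrative only, not from the dataset).
SOURCE_WEIGHTS = {"GWAS": 10.0, "expression": 8.0, "PPIN": 5.0, "text_mining": 2.0}

# Toy evidence: gene -> {source: signed evidence magnitude} (invented data).
GENES = {
    "gene_A": {"GWAS": 3.5, "expression": 2.4, "PPIN": 1.2},
    "gene_B": {"GWAS": 1.1, "expression": -2.2, "text_mining": 3.0},
    "gene_C": {"expression": 3.1, "PPIN": 2.0, "text_mining": 1.0},
}


def step_factor(magnitude: float) -> int:
    """Hypothetical step function on |magnitude|: 0 below 1, then 1, 2, 3."""
    m = abs(magnitude)
    if m >= 3.0:
        return 3
    if m >= 2.0:
        return 2
    if m >= 1.0:
        return 1
    return 0


def combined_score(evidence: dict[str, float]) -> float:
    """Sum of weight * step factor * sign over all sources for one gene."""
    total = 0.0
    for source, value in evidence.items():
        weight = SOURCE_WEIGHTS.get(source, 0.0)  # unknown sources contribute 0
        # Negative evidence lowers the score instead of being ignored.
        total += weight * step_factor(value) * math.copysign(1.0, value)
    return total


def prioritize(genes: dict[str, dict[str, float]], threshold: float = 42.0) -> list[tuple[str, float]]:
    """Return (gene, score) pairs at or above the threshold, highest score first."""
    scored = [(gene, combined_score(ev)) for gene, ev in genes.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for gene, score in prioritize(GENES):
        print(gene, score)
```

With these toy values, gene_A scores 51, gene_B scores 0 (its negative expression evidence cancels the rest), and gene_C scores 36, so only gene_A passes the cut-off of 42. Summing signed contributions is one simple way to let negative results offset positive ones, in the spirit of the Methods; the actual algorithm may combine evidence differently.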
Usage notes
Each FTgene has a combined score, computed by summing the magnitude of its association with, or response change to, the stress across multiple data sources. Gene versions follow Glyma v2.0.