Convergent evolution of desiccation tolerance in grasses
Data files
Dec 11, 2023 version files 25.14 MB
- Convergent_DT_in_grasses.tar.gz
- README.md
Abstract
Desiccation tolerance has evolved repeatedly in plants as an adaptation to survive extreme environments. Plants use similar biophysical and cellular mechanisms to survive life without water, but convergence at the molecular, gene, and regulatory levels remains to be tested. Here, we explore the evolutionary mechanisms underlying the recurrent evolution of desiccation tolerance across grasses. We present genomes of three resurrection grasses native to Sub-Saharan Africa. We leveraged comparative genomic and transcriptomic approaches to identify patterns of convergence and divergence across these species. We observed substantial overlap in gene duplication and expression associated with desiccation, and syntenic genes of shared origin are activated across species, indicative of parallel evolution. In other cases, similar metabolic pathways are induced, but using different gene sets, pointing towards phenotypic convergence. Species-specific mechanisms supplement these shared core mechanisms, underlining the complexity and diversity of evolutionary adaptations. Our findings provide insight into the evolutionary processes driving desiccation tolerance and highlight the roles of parallel mutation and complementary pathway adaptation in response to environmental challenges.
README: Convergent evolution of desiccation tolerance in grasses
https://doi.org/10.5061/dryad.kh18932c4
Here, we provide physiological metadata, annotation information, and other useful intermediate files from our analyses.
Description of the data and file structure
These data are organized in directories. In the parent directory, we provide physiological data in the file 3_grasses_timecourse_metadata.csv. Plants were sampled at targeted hydration states during the process of dehydration, including well watered (WW), partially dehydrated (D1, D2, and D3), fully desiccated (D4), and rehydrated (R1 and R2). We validated the physiological status of tissues by measuring relative water content (RWC) and photosynthetic efficiency (Fv/Fm); these measurements are summarized in the same file. The parent directory also contains a matrix of all the syntenic orthologs for each study species in the file Syntenic_orthogroups.tsv. We identified syntenic orthologs with the MCScan toolkit (v1.1) implemented in Python [https://github.com/tanghaibao/jcvi/wiki/MCscan-(Python-version)], using the chromosome-scale Oropetium thomaeum genome as an anchor. Gene models were aligned with LAST, and syntenic blocks were required to contain a minimum of five overlapping syntenic genes. A set of 18,428 conserved syntenic orthologs (syntelogs) across five desiccation-tolerant grasses was created and used for downstream comparative genomic and cross-species transcriptomic analyses.
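As a starting point for working with the physiological metadata, the sketch below averages RWC per species and hydration state. It is only an illustration: the column names (`species`, `timepoint`, `RWC`) and the toy values are assumptions and are not taken from 3_grasses_timecourse_metadata.csv, so adjust them to the actual header before use.

```python
import csv
import io
from collections import defaultdict

# Toy stand-in for 3_grasses_timecourse_metadata.csv.
# Column names and values are hypothetical, not taken from the real file.
TOY_METADATA = """species,timepoint,RWC
Microchloa caffra,WW,0.95
Microchloa caffra,WW,0.91
Microchloa caffra,D4,0.08
Microchloa caffra,R1,0.80
"""


def mean_rwc_by_timepoint(handle):
    """Return {(species, timepoint): mean RWC} from an open CSV handle."""
    values = defaultdict(list)
    for row in csv.DictReader(handle):
        values[(row["species"], row["timepoint"])].append(float(row["RWC"]))
    return {key: sum(v) / len(v) for key, v in values.items()}


summary = mean_rwc_by_timepoint(io.StringIO(TOY_METADATA))
for (species, timepoint), rwc in sorted(summary.items()):
    print(f"{species}\t{timepoint}\t{rwc:.2f}")
```

To run this on the real file, replace `io.StringIO(TOY_METADATA)` with `open("3_grasses_timecourse_metadata.csv", newline="")` and update the column names.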
The subdirectory Gene_expression contains expression matrices for three species (Microchloa caffra, Oropetium capense, and Tripogon minimus) as well as the combined expression matrix for the syntelog expression across all three species. Trimmed reads were pseudo-aligned to reference genomes using Salmon (v 1.9.0), and the resulting quantification files were processed with tximport (v 3.18) and the DESeq2 R package (v 1.42.0) to generate normalized expression matrices. We computed the expression of syntenic orthologs for each species by summing the expression of all genes assigned to that syntelog. The subdirectory DEGs within Gene_expression contains lists of differentially expressed genes (DEGs) for each of the three species under both dehydration and rehydration conditions. DEGs were identified independently for each species with DESeq2 using RWC as a covariate. These analyses produced species-specific lists of DEGs during dehydration and rehydration with significant (FDR-adjusted p-value < 0.05) associations with RWC. DEG files include the gene name, the reference expression level (baseMean), the log2FC, the standard error, the p-value, and the FDR-adjusted p-value. Syntelog (synt) IDs are also included. Empty cells indicate that no syntelog was identified for that gene.
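The syntelog-level summation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the gene IDs, syntelog IDs, and expression values are hypothetical, and an empty string stands in for genes with no identified syntelog.

```python
def sum_by_syntelog(expression, gene_to_syntelog):
    """Sum per-sample expression over all genes assigned to each syntelog.

    expression: {gene_id: [value per sample]}
    gene_to_syntelog: {gene_id: syntelog_id, or "" if none was identified}
    """
    totals = {}
    for gene, values in expression.items():
        synt = gene_to_syntelog.get(gene, "")
        if not synt:
            continue  # genes without a syntelog do not contribute
        if synt not in totals:
            totals[synt] = [0.0] * len(values)
        totals[synt] = [a + b for a, b in zip(totals[synt], values)]
    return totals


# Hypothetical gene IDs, syntelog IDs, and normalized expression values
# for two samples.
expression = {
    "geneA": [10.0, 2.0],
    "geneB": [5.0, 1.0],
    "geneC": [7.0, 7.0],
    "geneD": [3.0, 3.0],
}
gene_to_syntelog = {
    "geneA": "synt_1",
    "geneB": "synt_1",  # second copy assigned to the same syntelog
    "geneC": "synt_2",
    "geneD": "",        # no syntelog identified
}

syntelog_expression = sum_by_syntelog(expression, gene_to_syntelog)
print(syntelog_expression)
```

Summing, rather than averaging, means that a syntelog with several retained copies in one species reports the combined expression of those copies, which keeps the values comparable across species with different copy numbers.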
The subdirectory Co-expression_modules contains subdirectories for each of the three species with lists of the genes and syntelogs in each co-expression module. We generated co-expression networks using the Weighted Gene Co-expression Network Analysis (WGCNA) R package (v1.7).
The subdirectory Gene_ontology contains the GO annotations for each of the three species as well as lists of enriched GO terms in each co-expression module and DEG set. GO terms were assigned through homology with the well-annotated genome of the sister species Oropetium thomaeum. This was done through a BLASTP (v 2.14.0) search of all O. thomaeum protein sequences against the protein sequences of each study species. Parameters were set to return the single best match for each peptide, with an e-value cutoff of 1e-10. We assigned the GO terms from O. thomaeum to the homologous genes in our target species. The resulting files list the GO terms assigned to each gene. We then used the topGO R package (v 2.54.0) to identify significantly enriched GO terms (p-value < 0.05) within co-expression modules and sets of DEGs for up- and down-regulated genes during dehydration and rehydration in each target species. GO enrichment files show the p-values for each gene in the different conditions.
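The homology-based GO transfer can be illustrated with the sketch below. It assumes BLASTP results in the standard 12-column tabular format (`-outfmt 6`), with O. thomaeum proteins as queries and target-species proteins as subjects; the README does not state the output format used, and "best" is taken here to mean the highest bit score among hits passing the e-value cutoff. All sequence IDs and the gene-to-GO pairings are hypothetical.

```python
import csv
import io

EVALUE_CUTOFF = 1e-10


def best_hits(blast_tab, evalue_cutoff=EVALUE_CUTOFF):
    """Keep one best hit per query from BLAST tabular (outfmt 6) output.

    "Best" is the highest bit score among hits with e-value <= cutoff.
    Returns {query_id: subject_id}.
    """
    best = {}
    for fields in csv.reader(blast_tab, delimiter="\t"):
        if not fields:
            continue
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > evalue_cutoff:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def transfer_go(hits, reference_go):
    """Assign each reference gene's GO terms to its best-hit target gene."""
    target_go = {}
    for ref_gene, target_gene in hits.items():
        terms = reference_go.get(ref_gene, set())
        if terms:
            target_go.setdefault(target_gene, set()).update(terms)
    return target_go


# Hypothetical BLASTP output: query, subject, pident, length, mismatch,
# gapopen, qstart, qend, sstart, send, evalue, bitscore.
TOY_BLAST = (
    "OtA\tMc1\t90.0\t200\t20\t0\t1\t200\t1\t200\t1e-50\t300\n"
    "OtA\tMc2\t60.0\t200\t80\t0\t1\t200\t1\t200\t1e-20\t120\n"
    "OtB\tMc3\t30.0\t50\t35\t0\t1\t50\t1\t50\t1e-3\t40\n"
)
# Illustrative pairings of hypothetical reference genes with GO terms.
reference_go = {"OtA": {"GO:0009414"}, "OtB": {"GO:0006950"}}

hits = best_hits(io.StringIO(TOY_BLAST))
target_go = transfer_go(hits, reference_go)
print(hits)
print(target_go)
```

In this toy input, the OtB hit fails the e-value cutoff, so Mc3 receives no GO terms, and OtA keeps only its higher-scoring hit to Mc1.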
The subdirectory KEGG contains the KEGG annotations for each of the three species. KEGG annotations were generated for each species using BlastKOALA (https://www.kegg.jp/blastkoala/) on the complete set of annotated peptide sequences.
Sharing/Access information
Other data associated with this study, including genome assemblies of three resurrection grasses native to Sub-Saharan Africa and comprehensive gene expression datasets, are deposited in NCBI BioProject PRJNA1044305.
Methods
These data were used to identify core mechanisms of desiccation tolerance shared across grasses. We subjected replicated sets of the resurrection grasses Oropetium capense, Tripogon minimus, and Microchloa caffra to a controlled dehydration-rehydration timecourse. Tissues were collected at six comparable timepoints (four during drying and two during recovery) for each of the three species. Plants were sampled at targeted hydration states: well-watered, partially dehydrated, fully desiccated, and 24 and 48 hours post rehydration. The hydration status of tissues was validated by measuring relative water content. To generate gene expression data (RNAseq), total RNA was extracted using the Spectrum Plant Total RNA kit according to the manufacturer's instructions and then cleaned to remove impurities and contaminants using the Zymo Clean & Concentrator kit. RNAseq libraries were constructed by Novogene following a standard polyA+ enrichment strategy and sequenced on an Illumina HiSeq 4000 with 150 bp paired-end reads. We leveraged comparative genomic and transcriptomic approaches to identify patterns of convergence and divergence across these species.