Leaf gene expression trajectories during the growing season are consistent between sites and years in American beech
Data files
Jan 09, 2023 version files (628.29 MB)
- fagr_polished_shasta_assembly_default_minlen_3kb.fasta (477.79 MB)
- FAGRfinal_annotated.faa (10.54 MB)
- Fg-All-4197.fa (1.02 MB)
- Fg-sort.vcf.gz (138.94 MB)
- README.md (1.93 KB)
Oct 16, 2023 version files (676.31 MB)
- fagr_polished_shasta_assembly_default_minlen_3kb.fasta (477.79 MB)
- FAGR.gfacs.genes.fasta.faa (14.76 MB)
- FAGRfinal_annotated.faa (10.54 MB)
- Fg-All-4197.fa (1.02 MB)
- Fg-sort.vcf.gz (138.94 MB)
- README.md (1.58 KB)
- SI-Sezen_Beech_2023_SupplementaryData.xlsx (33.26 MB)
Feb 02, 2024 version files (676.31 MB)
- fagr_polished_shasta_assembly_default_minlen_3kb.fasta (477.79 MB)
- FAGR.gfacs.genes.fasta.faa (14.76 MB)
- FAGRfinal_annotated.faa (10.54 MB)
- Fg-All-4197.fa (1.02 MB)
- Fg-sort.vcf.gz (138.94 MB)
- README.md (1.87 KB)
- SI-Sezen_Beech_2023_SupplementaryData.xlsx (33.26 MB)
Abstract
Transcriptomics, the quantification of gene expression, provides a versatile tool for ecological monitoring. Here, we show that through genome-guided profiling of transcripts mapping to 33,042 gene models, expression differences can be discerned among multi-year and seasonal leaf samples collected from American beech trees at two latitudinally separated sites. Despite a genetic bottleneck imposed by large-scale post-Columbian deforestation, SNP-based analysis of the population genetic background revealed sufficient variation to account for differences between populations and among individuals. Our time series of expression analyses across the spring-summer and summer-fall transitions of two consecutive years identified 4,197 differentially expressed protein-coding genes. Using Populus orthologs of the differentially expressed genes, we reconstructed a protein-protein interactome representing the leaf physiological states of trees during the seasonal transitions. Gene set enrichment analysis revealed GO terms highlighting molecular functions and biological processes possibly influenced by abiotic forcings such as recovery from drought and response to excess precipitation. Further, based on 324 co-regulated transcripts, we focused on a subset of terms that could putatively be attributed to phenological shifts caused by a late spring. Our conservative results indicate that extended transcriptome-based monitoring of forests can capture the range of responses arising from other factors, including air quality, chronic disease, and herbivore outbreaks, that require the activation and/or downregulation of genes collectively tuning the reaction norms needed for the survival of long-lived trees such as the American beech.
Description of the Data and file structure
The two beech genomes assembled by the University of Connecticut and Cornell University are publicly accessible at https://treegenesdb.org/jbrowse
There are six items in this submission.
[1] A multi-tab Excel spreadsheet (SI-Sezen_Beech_2023_SupplementaryData.xlsx) containing the sample list (gray shading denotes missing or low-quality samples not included in the study), genome alignment rates, raw transcript counts, ordination results, the top 20 differentially expressed genes, g:Profiler output permalinks, the kinship matrix, Kendall's correlation values, and an annotated list of 324 leaf phenology-related gene orthologs.
[2] A compressed Variant Call Format (VCF) file (Fg-sort.vcf.gz) containing SNPs from 40 trees (20 from Harvard Forest and 20 from the SERC ForestGEO forest dynamics plots), identified by alignment to the draft American beech genome assembled by the University of Connecticut.
[3] FASTA-formatted amino acid sequences of the protein models predicted from the draft genome assembly.
[4] A FASTA-formatted subset of 4,197 protein sequences (Fg-All-4197.fa) corresponding to the differentially expressed transcripts. This set was used to construct the interactome network with the STRING database.
[5] The draft American beech genome assembly used in this study (fagr_polished_shasta_assembly_default_minlen_3kb.fasta), in FASTA format. The assembly was computed with the Shasta assembler.
[6] A text file compiling representative snippets from the command line and code used in the manuscript.
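A minimal Python sketch (standard library only) for sanity-checking the downloaded FASTA and VCF files, for example confirming record counts, contig lengths, and the list of sampled trees. The helper names `fasta_lengths`, `n50`, and `vcf_summary` are hypothetical and are not part of this submission or its code snippets; the sketch assumes standard FASTA formatting and a gzip/bgzip-compressed VCF with a `#CHROM` header line.

```python
"""Hypothetical sanity checks for the FASTA and VCF files in this submission.

Assumptions (not from the dataset README): plain or gzip-compressed FASTA,
and a gzip/bgzip-compressed VCF with a standard '#CHROM' header line.
"""
import gzip


def _open_text(path):
    """Open a plain or gzip-compressed file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def fasta_lengths(path):
    """Return {record_id: sequence_length} for every FASTA record."""
    lengths = {}
    current = None
    with _open_text(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Record ID = first whitespace-delimited token of the header.
                current = line[1:].split()[0]
                lengths[current] = 0
            elif current is not None:
                lengths[current] += len(line)
    return lengths


def n50(lengths):
    """Smallest length L such that records of length >= L cover half the total."""
    sizes = sorted(lengths, reverse=True)
    half = sum(sizes) / 2
    running = 0
    for size in sizes:
        running += size
        if running >= half:
            return size
    return 0  # empty input


def vcf_summary(path):
    """Return (sample_names, number_of_variant_records) for a VCF file."""
    samples, n_records = [], 0
    with _open_text(path) as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # meta-information lines
            if line.startswith("#CHROM"):
                # Columns 10 onward of the header line are sample IDs.
                samples = line.rstrip("\n").split("\t")[9:]
            elif line.strip():
                n_records += 1
    return samples, n_records
```

As a usage example, `len(fasta_lengths("Fg-All-4197.fa"))` should match the 4,197 protein sequences described in item [4], and `len(vcf_summary("Fg-sort.vcf.gz")[0])` should match the 40 trees described in item [2]. For the assembly, `min(fasta_lengths("fagr_polished_shasta_assembly_default_minlen_3kb.fasta").values())` offers a quick check of the 3 kb minimum contig length that the file name suggests. Sequences are streamed line by line, so the large assembly file is never loaded into memory at once.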
Sharing/access Information
Links to other publicly accessible locations of the data:
https://treegenesdb.org/jbrowse
RNA-seq derived raw FASTQ read sequences are uploaded to the NCBI Sequence Read Archive (SRA) with the bioproject number PRJNA630305.
Methods
American beech leaves were collected from the canopy in spring, summer, and fall of 2017 and 2018; for subcanopy trees, we used a 12 m pole pruner. Samples were collected in the morning to minimize diurnal changes in gene expression. For each sample, 10 leaf discs were punched into 2 ml extraction vials containing three 3 mm steel balls and flash-frozen in liquid nitrogen in the field (SI Fig. S2). Frozen samples were transferred to a -80 °C freezer at the end of each day. RNA was extracted using the E.Z.N.A. Plant RNA Kit (Omega Bio-Tek, Norcross, GA, USA). RNA quality metrics were determined on an Agilent Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA), and samples containing genomic DNA contamination were treated with DNase I (ThermoFisher Scientific, Waltham, MA). Illumina TruSeq libraries were prepared from each RNA sample (Illumina Inc., San Diego, CA, USA). Libraries were indexed, pooled, and sequenced on the Illumina NovaSeq 6000 platform as 150-nucleotide paired-end reads, targeting ~14 million total reads per sample (Illumina Inc., San Diego, CA, USA). Sequences are uploaded to the NCBI Sequence Read Archive (SRA) under BioProject PRJNA630305.