Data from: Gene expression differences between western redcedar seedlings resistant and susceptible to cedar leaf blight
Data files
Feb 13, 2024 version files 137.77 MB
- edgeR_DE-SRA_PRJNA994939.matrix (210.58 KB)
- README.md (7.11 KB)
- Transcriptome-SRA_PRJNA994939.fasta (137.55 MB)
Abstract
Western redcedar (T. plicata) is an economically and culturally important member of the Cupressaceae in the Pacific Northwest of North America. In adult trees, the species produces one of the most weathering-resistant heartwoods among conifers, making it one of the preferred species for outdoor applications. However, young T. plicata plants are susceptible to infection with cedar leaf blight (D. thujina), an important foliar pathogen that can be devastating in nurseries and closely spaced plantations. Nevertheless, variability in resistance against D. thujina has been documented in T. plicata, and this variability can be used to breed T. plicata for resistance against the pathogen. This investigation aimed to discern the phenotypic and gene expression differences between resistant and susceptible T. plicata seedlings to shed light on the potential constitutive resistance mechanisms against cedar leaf blight in western redcedar. The study consisted of two parts. First, the histological differences between four resistant and four susceptible families that were never infected with the pathogen were investigated. Second, the differences between one resistant and one susceptible family, each with infected and uninfected seedlings, were analyzed at the chemical (C, N, mineral nutrients, lignin, fiber, starch, and terpenes) and gene expression (RNA-Seq) levels. The histological part showed that T. plicata seedlings resistant to D. thujina had constitutively thicker cuticles and lower stomatal densities than susceptible plants. The chemical and gene expression analyses revealed that, regardless of their infection status, resistant plants had higher foliar concentrations of sabinene and α-thujene, and higher levels of expression of transcripts that code for leucine-rich repeat receptor-like protein kinases and for bark storage proteins.
In conclusion, the data collected in this study show that constitutive differences at the phenotypic (histological and chemical) and gene expression levels exist between T. plicata seedlings susceptible and resistant to D. thujina. Such differences have potential use for marker-assisted selection and breeding for resistance against cedar leaf blight in western redcedar in the future.
https://doi.org/10.5061/dryad.m905qfv80
The datasets deposited here include 1) the transcriptome assembly, and 2) the differential expression data produced from an investigation that aimed to detect constitutive differences between Thuja plicata (western redcedar) seedlings that were susceptible or resistant to Didymascella thujina (cedar leaf blight, CLB). The study also included histological and foliar chemical composition components, but those data are part of the manuscript itself.
The assembled transcriptome can be considered a metagenome, as sequences annotated as belonging to organisms other than T. plicata or D. thujina were not removed unless they did not meet the minimum nucleotide length. That approach was taken in case organisms other than T. plicata account for resistance against D. thujina in this pathosystem (e.g., endosymbionts), which could have been overlooked if only a T. plicata genome had been used.
The experimental design used in this part of the study was as follows (see the manuscript as well): Twenty one-year-old seedlings from each resistance category were exposed to natural D. thujina inoculum in an infected 10-year-old progeny trial in Jordan River (British Columbia, Canada). After inoculation, these CLB+ plants (see table below) were placed in a greenhouse and maintained until symptoms developed. A similar number of seedlings per resistance category that had never been exposed to D. thujina (i.e., the CLB- treatment, see below) were maintained at a different facility while the CLB+ seedlings were being inoculated. They were then moved to the same greenhouse as the infected seedlings, but kept in different rooms, for the same length of time. Only three randomly selected seedlings per resistance category and infection condition were used for RNA extraction and RNA-Seq analysis (for a total of 12 RNA-Seq samples). Here is a summary of the experimental design:
Item | Details |
---|---|
Resistance categories and infection conditions | 2 resistance categories (resistant, susceptible); 2 treatments (with and without disease symptoms; CLB+, CLB-); 3 replicates per resistance category × treatment combination. |
Seedlings’ age | Older than a year when infected. Close to 2 years old when sampled (symptoms had developed in the exposed seedlings). |
Inoculation technique | Exposure to Didymascella thujina in a Thuja plicata progeny trial site in Jordan River (British Columbia, Canada). |
RNA-Seq analyses | Illumina HiSeq 100 bp paired-end technology. Libraries made in house for each sample; sequencing outsourced to Génome Québec. |
RNA-Seq data from the 12 seedlings sampled (i.e. 2 × 2 × 3 in item “Resistance categories and infection conditions” in the above table) were used to assemble the reference transcriptome that was uploaded here. The RNA-Seq data of those 12 seedlings can be found at the Sequence Read Archive of the National Center for Biotechnology Information (NCBI) (BioProject PRJNA994939, entries SRR25280989-SRR25281000).
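Because the accession range quoted above is contiguous, the 12 run identifiers can be enumerated programmatically, for example to build an input list for a download tool. The following is a minimal sketch; the function name `sra_run_accessions` is illustrative and not part of the dataset.

```python
def sra_run_accessions(first="SRR25280989", last="SRR25281000"):
    """Expand an inclusive SRR accession range into individual run IDs."""
    prefix = "SRR"
    start = int(first[len(prefix):])
    stop = int(last[len(prefix):])
    return [f"{prefix}{number}" for number in range(start, stop + 1)]


runs = sra_run_accessions()
print(len(runs))  # 12, one run per RNA-Seq sample
```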
Description of the data and file structure
The data uploaded here are as follows:
- Transcriptome_SRA_PRJNA994939.fasta:
This is a de novo transcriptome (RNA-Seq based) assembly produced in Trinity to analyze the gene expression data produced in this study. The sequence names match those used in the manuscript that this assembly is associated with.
- edgeR_DE-SRA_PRJNA994939.matrix:
This matrix contains the differential expression data produced after reads from the RNA-Seq samples were mapped to the transcriptomic assembly (note that these are the raw count data, which have not been log2-centered). File edgeR_DE-SRA_PRJNA994939.matrix is the result of the following procedure:
1) Mapping of reads in BioProject PRJNA994939 (NCBI entries SRR25280989-SRR25281000) to Transcriptome_SRA_PRJNA994939.fasta using RSEM v. 1.2.20.
2) Calculation of the fragments per kilobase of transcript per million mapped (FPKMs).
3) Normalization of the FPKMs by calculating their trimmed mean of M-values (TMM).
4) Pairwise differential expression analyses using edgeR, and compilation of such data from sequences that had a minimum fold-change of four and a maximum false discovery rate (FDR) of 0.001.
The names of the sequences in file edgeR_DE-SRA_PRJNA994939.matrix match those in the transcriptome (Transcriptome_SRA_PRJNA994939.fasta). Refer to the following table to identify the respective sample in the columns of file edgeR_DE-SRA_PRJNA994939.matrix:
Column | Seedling ID | CLB resistance class | Condition |
---|---|---|---|
HI.2078.008.Index_2.U_Filtered.RSEM | 583-23 | Susceptible | Uninfected |
HI.2078.008.Index_3.V_Filtered.RSEM | 583-27 | Susceptible | Uninfected |
HI.2078.008.Index_4.W_Filtered.RSEM | 583-33 | Susceptible | Uninfected |
HI.2533.001.Index_7.Z_Filtered.RSEM | 685-24 | Resistant | Uninfected |
HI.2078.008.Index_6.Y_Filtered.RSEM | 685-27 | Resistant | Uninfected |
HI.2078.008.Index_5.X_Filtered.RSEM | 685-34 | Resistant | Uninfected |
HI.2078.008.Index_1.T_Filtered.RSEM | 583-2 | Susceptible | Infected |
HI.2533.001.Index_12.Q_Filtered.RSEM | 583-12 | Susceptible | Infected |
HI.2533.001.Index_10.CC_Filtered.RSEM | 583-17 | Susceptible | Infected |
HI.2533.001.Index_8.AA_Filtered.RSEM | 685-1 | Resistant | Infected |
HI.2533.001.Index_9.BB_Filtered.RSEM | 685-4 | Resistant | Infected |
HI.2533.001.Index_11.R_Filtered.RSEM | 685-18 | Resistant | Infected |
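The column-to-sample mapping above can be applied when reading the matrix. The sketch below is only an illustration: it assumes the file is tab-delimited, has a single header row of column names, and carries the sequence name in the first field of each data row. None of this is stated in this README, so check the file before relying on it. The names `SAMPLES`, `parse_matrix`, and `group_means` are hypothetical helpers.

```python
import csv

# Column -> (seedling ID, CLB resistance class, condition), copied from the table above.
SAMPLES = {
    "HI.2078.008.Index_2.U_Filtered.RSEM": ("583-23", "Susceptible", "Uninfected"),
    "HI.2078.008.Index_3.V_Filtered.RSEM": ("583-27", "Susceptible", "Uninfected"),
    "HI.2078.008.Index_4.W_Filtered.RSEM": ("583-33", "Susceptible", "Uninfected"),
    "HI.2533.001.Index_7.Z_Filtered.RSEM": ("685-24", "Resistant", "Uninfected"),
    "HI.2078.008.Index_6.Y_Filtered.RSEM": ("685-27", "Resistant", "Uninfected"),
    "HI.2078.008.Index_5.X_Filtered.RSEM": ("685-34", "Resistant", "Uninfected"),
    "HI.2078.008.Index_1.T_Filtered.RSEM": ("583-2", "Susceptible", "Infected"),
    "HI.2533.001.Index_12.Q_Filtered.RSEM": ("583-12", "Susceptible", "Infected"),
    "HI.2533.001.Index_10.CC_Filtered.RSEM": ("583-17", "Susceptible", "Infected"),
    "HI.2533.001.Index_8.AA_Filtered.RSEM": ("685-1", "Resistant", "Infected"),
    "HI.2533.001.Index_9.BB_Filtered.RSEM": ("685-4", "Resistant", "Infected"),
    "HI.2533.001.Index_11.R_Filtered.RSEM": ("685-18", "Resistant", "Infected"),
}


def parse_matrix(lines):
    """Parse tab-delimited lines into (sample_columns, {sequence_name: values}).

    Assumption: one header row, sequence name in the first field of each row.
    """
    rows = [row for row in csv.reader(lines, delimiter="\t") if row]
    header, data = rows[0], rows[1:]
    # If the header has a cell above the name column, drop it; R-style
    # tables sometimes omit that cell, in which case the header is kept as is.
    if data and len(header) == len(data[0]):
        header = header[1:]
    values = {row[0]: [float(x) for x in row[1:]] for row in data}
    return header, values


def group_means(columns, row_values):
    """Average one sequence's values within each (resistance class, condition) group."""
    groups = {}
    for column, value in zip(columns, row_values):
        key = SAMPLES[column][1:]
        groups.setdefault(key, []).append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}
```

To use it on the deposited file, open the file in text mode (for example `open(path, newline="")`) and pass the handle to `parse_matrix`; the result of `group_means` is keyed by tuples such as `("Resistant", "Infected")`.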
Sharing/Access information
The transcriptomic assembly and differential expression data shared here are linked to NCBI’s BioProject PRJNA994939 (entries SRR25280989-SRR25281000). Refer to NCBI’s Sequence Read Archive (https://www.ncbi.nlm.nih.gov/sra) to access the raw RNA-Seq data produced from the biological material sampled in this investigation.
This dataset includes transcriptomic and gene expression data from an investigation on the impact of Didymascella thujina (cedar leaf blight) on Thuja plicata (western redcedar). The datasets included in this submission were collected and produced as described below. The information shown here was extracted from the manuscript:
RNA extraction, mRNA enrichment, library production and sequencing
RNA extraction from foliage of three CLB+ and three CLB- seedlings was done using a modified version of the protocol of Rajakani et al. (2013) [...]. mRNA enrichment was done using protocol C of the Thermo Scientific™ MagJET mRNA Enrichment Kit (Life Technologies Inc., Burlington ON, Canada). Libraries were made using the NEB Next® Ultra™ RNA Library Prep Kit for Illumina® v. 1.2 (New England BioLabs® Inc., Ipswich MA, USA). DNA was purified as required using the Thermo Scientific GeneJET NGS Cleanup Kit (Life Technologies Inc.), and size selection (∼450 bp fragment size) was completed with the Thermo Scientific MagJET NGS Cleanup and Size Selection Kit (Life Technologies Inc.). Libraries were barcoded using the NEB Next® Multiplex Oligos for Illumina® - Index Primers Set 1 (New England BioLabs® Inc.). Quality control and quantification of the individual libraries were done with a DNA 1K Analysis Kit (Bio-Rad Laboratories, Mississauga ON, Canada) in an Experion™ Automated Electrophoresis Station (Bio-Rad Laboratories). The final pool consisted of 40 ng of DNA per library. Paired-end 100-base sequencing was completed in a single lane of an Illumina® HiSeq 2000 sequencer at Genome Quebec Innovation Centre (Montreal QC, Canada).
Assembling and annotation of the reference transcriptome
[...] All of the processes described below were completed on the WestGrid Hermes cluster (https://www.westgrid.ca/) hosted at the University of Victoria using customized shell, Python and R scripts. HPC GridRunner was used to enhance annotation searches such as BLAST and HMMER.
Paired-end FASTQ Illumina® 1.9 (Phred-33 ASCII) compressed files were produced for each sample after sequencing. Each file was checked for quality before and after trimming using FastQC v. 0.11.2 (Andrews, 2014). Trimming was done in Trimmomatic v. 0.33 (Bolger et al. 2014; [...]). The reference transcriptome was built using Trinity v. 2.0.6 (Grabherr et al., 2011) with the default settings for paired-end data, and its statistics were calculated in PRINSEQ v. 0.20.1 (Schmieder and Edwards, 2011). Annotation was completed using Trinotate v. 2.0.2 (http://trinotate.github.io; [...]).
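Basic assembly statistics of the kind mentioned above (number of sequences, total length, N50) can be recomputed directly from the FASTA file. The sketch below uses plain Python and is not PRINSEQ's implementation; it assumes standard FASTA formatting, and the function names are illustrative.

```python
def read_fasta_lengths(lines):
    """Yield (name, length) for each record in an iterable of FASTA lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            parts = line[1:].split()
            # Keep only the identifier, i.e. the text before the first space.
            name, length = (parts[0] if parts else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def n50(lengths):
    """Return the length L such that sequences of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0
```

Passing an open file handle for the transcriptome to `read_fasta_lengths` streams the records without loading the whole 137.55 MB file into memory; the resulting lengths can then be given to `n50`.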
Differential expression analyses
The downstream analyses (Haas et al., 2013) were conducted using the assembled contigs and contig variants from Trinity v. 2.0.6 (Grabherr et al., 2011) instead of the smaller number of corresponding deduced genes. Trinity refers to the contigs and variants as “transcripts” [...]. Reads were mapped to the assembly with RSEM v. 1.2.20 (Li and Dewey, 2011) and fragments per kilobase of transcript per million mapped (FPKMs, Trapnell et al. 2010) were calculated. Normalization was achieved in edgeR (Robinson et al., 2010) by computing the trimmed mean of M-values (TMM; Dillies et al. 2013, Robinson and Oshlack 2010). The differential expression (DE) analysis was completed by comparing all samples in pairs using the default settings in edgeR, and then extracting and merging the sequences that had a minimum fold-change of four and a maximum false discovery rate of 0.001 from all the samples into a single matrix.
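The two numeric criteria in this procedure can be illustrated with a short sketch. It is not the authors' script. It assumes the usual FPKM definition computed from the plain transcript length (RSEM itself works with an effective length), and it assumes that the fold-change threshold applies in either direction on edgeR's log2 scale. The function names are hypothetical.

```python
import math


def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def passes_thresholds(log2_fold_change, fdr, min_fold_change=4.0, max_fdr=0.001):
    """Apply the fold-change and FDR cut-offs to one pairwise comparison.

    Assumption: up- and down-regulation both count, so the absolute
    log2 fold-change is compared against log2(min_fold_change).
    """
    return abs(log2_fold_change) >= math.log2(min_fold_change) and fdr <= max_fdr


# 500 fragments on a 2,000 bp transcript in a library of 20 million mapped fragments.
print(fpkm(500, 2000, 20_000_000))    # 12.5
print(passes_thresholds(-3.1, 0.0005))  # True
print(passes_thresholds(1.9, 1e-6))     # False: fold-change below four
```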