Influence of RNA-Seq library construction, sampling methods, and tissue harvesting time on gene expression estimation
Chiari, Ylenia et al. (2023), Influence of RNA-Seq library construction, sampling methods, and tissue harvesting time on gene expression estimation, Dryad, Dataset, https://doi.org/10.5061/dryad.ns1rn8ptb
RNA sequencing (RNA-Seq) is popular for measuring gene expression in non-model organisms, including wild populations. While RNA-Seq can detect gene expression variation among wild-caught individuals and yield important insights into biological function, sampling methods may influence gene expression estimates. We examined the influence of multiple technical variables on estimated gene expression in a non-model fish, the westslope cutthroat trout (Oncorhynchus clarkii lewisi), using two RNA-Seq library types: 3’ RNA-Seq (QuantSeq) and whole mRNA-Seq (NEB). We evaluated effects of dip netting versus electrofishing, and of harvesting tissue immediately versus 5 minutes after euthanasia on estimated gene expression in blood, gill, and muscle. We found no significant differences in gene expression between sampling methods or tissue collection times with either library type. When library types were compared using the same blood samples, 58% of genes detected by both NEB and QuantSeq showed significantly different expression between library types, and NEB detected 31% more genes than QuantSeq. Although QuantSeq and NEB recovered different numbers of genes and expression levels, there were no differences in gene expression between sampling methods and tissue harvesting time for either library type. Our study suggests that researchers can safely rely on different fish sampling strategies in the field. In addition, while QuantSeq is more cost-effective, NEB detects more expressed genes. Therefore, when it is crucial to detect as many genes as possible (especially lowly expressed genes), when alternative splicing is of interest, or when working with an organism lacking good genomic resources, whole mRNA-Seq is more powerful.
Different tissues (gill, muscle, and blood) were harvested from westslope cutthroat trout fry either immediately after death or 5 min after death. Tissues were preserved in RNAlater, and RNA was extracted with QIAzol (Qiagen). Transcriptome data were obtained following the 3' RNA Tag-Seq and whole mRNA-Seq library preparation and sequencing procedures. Differential gene expression was analyzed among sampling techniques and tissue harvesting times.
Table S1: Sample, RNA quality, gene counts, and library information. Sheet “Samples All” lists all samples collected (sample ID and Admera Health ID for QuantSeq and for NEB) with information about the treatment group they belong to, tissue type, sampling method, length and weight of the fish, RIN value, and RNA concentration. The sample size used for each comparison, broken down by tissue type, treatment group, and library preparation, is also indicated. Sheet “QuantSeq” lists all samples used for the QuantSeq library with the following information for each sample: treatment group, sample ID, Admera Health IDs, tissue type, sampling method, RIN value, concentration, raw read count, read count after mapping the randomly selected 11 million reads, and percentage of uniquely mapped genes on the reference genome. Sheet “NEB” lists all samples used for the NEB library with the following information for each sample: treatment group, sample ID, Admera Health IDs (and new Admera Health ID, if one exists), tissue type, sampling method, RIN value, concentration, raw read count (PE and single), read count after mapping the randomly selected 40 million reads, and percentage of uniquely mapped genes on the reference genome. “Null” in some cells indicates an empty cell: either the sample was not processed because its RIN was below the cutoff and it therefore has no Admera Health ID, or the fish weight and length are the same as in the row above because both rows refer to the same individual. “na” in some cells indicates that the RIN value could not be obtained.
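The QuantSeq and NEB sheets report mapping statistics after each library was reduced to a fixed number of randomly selected reads (11 million and 40 million, respectively). The following is a minimal sketch of that kind of equal-depth random subsampling, not the pipeline used to produce this dataset. The function name `subsample_reads`, the seed, the toy read identifiers, and the toy depth of 20 are all hypothetical and chosen only for illustration.

```python
import random


def subsample_reads(read_ids, depth, seed=0):
    """Draw `depth` reads without replacement; keep all reads if fewer exist.

    `seed` is a hypothetical parameter that makes the draw reproducible.
    """
    reads = list(read_ids)
    if len(reads) <= depth:
        return reads
    rng = random.Random(seed)
    return rng.sample(reads, depth)


# Toy libraries of unequal size (hypothetical read identifiers).
libraries = {
    "sample_1": [f"read_{i}" for i in range(50)],
    "sample_2": [f"read_{i}" for i in range(30)],
}

# Toy target depth standing in for the 11 or 40 million reads in Table S1.
target_depth = 20

subsampled = {
    name: subsample_reads(reads, target_depth)
    for name, reads in libraries.items()
}

for name, reads in subsampled.items():
    print(name, len(reads))
```

Bringing every library to the same depth before mapping keeps differences in sequencing effort from being mistaken for differences in the number of detected genes.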
Table S2: Output results of the differential expression analysis. Results of the differential expression analysis performed with DESeq2 are given for all comparisons, each on a separate sheet. Groups 1, 2, and 3 refer to sampling by netting, electrofishing, and electrofishing with processing 5 min after euthanasia, respectively.
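For readers who want to tally significant genes from one of these sheets, the sketch below shows one possible summary. It assumes each sheet has been exported to rows carrying the standard DESeq2 result columns `log2FoldChange` and `padj`; the column layout of the actual sheets, the 0.05 threshold, the function name `summarize_deseq2`, and the example rows are assumptions made for illustration and are not taken from the dataset.

```python
import math


def summarize_deseq2(rows, alpha=0.05):
    """Count tested and significant genes in DESeq2-style result rows.

    `alpha` is an assumed adjusted p-value cutoff, not one stated in the
    dataset description.
    """
    tested = up = down = 0
    for row in rows:
        padj = row.get("padj")
        lfc = row.get("log2FoldChange")
        # DESeq2 can report NA for padj (e.g. genes removed by filtering);
        # such rows are skipped rather than counted as non-significant.
        if padj is None or lfc is None or math.isnan(padj) or math.isnan(lfc):
            continue
        tested += 1
        if padj < alpha:
            if lfc > 0:
                up += 1
            elif lfc < 0:
                down += 1
    significant = up + down
    return {
        "tested": tested,
        "up": up,
        "down": down,
        "significant": significant,
        "proportion": significant / tested if tested else 0.0,
    }


# Hypothetical rows standing in for one comparison sheet.
example_rows = [
    {"gene": "gene_a", "log2FoldChange": 2.1, "padj": 0.001},
    {"gene": "gene_b", "log2FoldChange": -0.4, "padj": 0.2},
    {"gene": "gene_c", "log2FoldChange": 0.1, "padj": float("nan")},
    {"gene": "gene_d", "log2FoldChange": -1.5, "padj": 0.03},
]

summary = summarize_deseq2(example_rows)
print(summary)
```

The sign of `log2FoldChange` depends on which group DESeq2 treated as the reference level, so the direction of each comparison should be checked before labeling genes as higher in one group.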
Table S3: Summary of gene expression patterns for different sampling methods and tissue types. The total numbers of genes with detectable expression for each sampling/tissue comparison are indicated along with the number and proportion of genes with significantly higher gene expression in one of the two tissues being compared for each sampling method.