The strength and form of natural selection on transcript abundance in the wild

Cite this dataset

Ahmad, Freed et al. (2020). The strength and form of natural selection on transcript abundance in the wild [Dataset]. Dryad.


Gene transcription variation is known to contribute to disease susceptibility and adaptation, but we currently know very little about how contemporary natural selection shapes transcript abundance. Here, we propose a novel analytical framework to quantify the strength and form of ongoing natural selection at the transcriptome level in a wild vertebrate. We estimated selection on transcript abundance in a cohort of a wild salmonid fish (Salmo trutta) affected by an extracellular myxozoan parasite (Tetracapsuloides bryosalmonae) through mark-recapture field sampling and the integration of RNA-seq with classical regression-based selection analysis. We show, based on fin transcriptomes of the host, that infection by the parasite and subsequent host survival are linked to upregulation of the mitotic cell cycle process. We also detect a widespread signal of disruptive selection on transcripts linked to host immune defence, host-pathogen interactions, cellular repair and maintenance. Our results provide insights into how selection can be measured at the transcriptome level to dissect the molecular mechanisms of contemporary natural selection driven by climate change and emerging anthropogenic threats. We anticipate that the approach described here will enable critical information on the molecular mechanisms and targets of natural selection to be obtained in real time.

Usage notes

About raw read counts (stored in ReadCountsGenes_Union.rda)

The raw read counts are provided as an R object in input_files/ReadCountsGenes_Union.rda. It is a RangedSummarizedExperiment object, as returned by the GenomicAlignments::summarizeOverlaps function, and contains the read count data along with annotation features. The object was created using the R script read_counts_in_alignments.R. Loading it in R requires the package GenomicAlignments, which can be installed with e.g. BiocManager::install("GenomicAlignments"), and all of its dependencies (See for details). After the required packages are installed and loaded, the R command load("ReadCountsGenes_Union.rda") should load the RangedSummarizedExperiment object "count_genes2" into R. If no error occurred, check the dimensions of the object by typing:

> dim(count_genes2)

#[1] 57639  1191 #good to go
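
The loading and dimension check described above could be wrapped as follows. This is only a minimal sketch: the helper names load_counts and check_dims are hypothetical and not part of the dataset's scripts, and the accessors in the last step (assay, rowRanges, colData) assume that GenomicAlignments and its dependencies are installed.

```r
# Hypothetical helper (not part of the dataset's scripts): load the .rda file
# into a private environment and return the object stored under `name`.
load_counts <- function(path, name = "count_genes2") {
  if (!file.exists(path)) stop("File not found: ", path)
  env <- new.env()
  loaded <- load(path, envir = env)  # load() returns the names it restored
  if (!name %in% loaded) {
    stop("Expected object '", name, "' not found; file contains: ",
         paste(loaded, collapse = ", "))
  }
  get(name, envir = env)
}

# Hypothetical helper: compare the object's dimensions with the values
# quoted in the usage notes (57639 rows, 1191 columns).
check_dims <- function(obj, expected = c(57639L, 1191L)) {
  identical(as.integer(dim(obj)), as.integer(expected))
}

path <- file.path("input_files", "ReadCountsGenes_Union.rda")
if (file.exists(path) && requireNamespace("GenomicAlignments", quietly = TRUE)) {
  library(GenomicAlignments)        # also attaches SummarizedExperiment
  count_genes2 <- load_counts(path)
  stopifnot(check_dims(count_genes2))
  counts   <- assay(count_genes2)     # raw read count matrix
  features <- rowRanges(count_genes2) # annotation features (row ranges)
  samples  <- colData(count_genes2)   # per-column metadata, if any
}
```

Loading into a separate environment avoids silently overwriting an existing object of the same name in the workspace; calling load() directly, as in the usage notes, works equally well.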

About Reproducibility:

The R session information, e.g. the versions of R and of the loaded packages, is provided in the text file Final_sessionInfo.txt and should be consulted when reproducing, replicating or re-analysing this data analysis.
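
One possible way to compare a local setup against Final_sessionInfo.txt is to write the current session information to a text file and inspect the two files side by side. This is a sketch only: the helper write_session_info and the output file name are hypothetical, and the exact layout of Final_sessionInfo.txt is not assumed.

```r
# Hypothetical helper: save the current R session information as plain text
# so it can be compared by eye (or with a diff tool) to Final_sessionInfo.txt.
write_session_info <- function(path) {
  info <- capture.output(print(sessionInfo()))
  writeLines(info, con = path)
  invisible(info)
}

# The output file name below is an arbitrary choice for illustration.
out_file <- file.path(tempdir(), "my_sessionInfo.txt")
info <- write_session_info(out_file)
cat(head(info, 3), sep = "\n")  # first lines: R version, platform, OS
```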

Linux commands:

The Linux commands used for the generation of trout fin-specific splice sites, reference genome modifications and the final genome alignments are provided in the text file LINUX_commands.txt.