Dryad

Diapause vs. reproductive programs: transcriptional phenotypes in Calanus finmarchicus

Cite this dataset

Lenz, Petra H. et al. (2021). Diapause vs. reproductive programs: transcriptional phenotypes in Calanus finmarchicus [Dataset]. Dryad. https://doi.org/10.5061/dryad.12jm63xw7

Abstract

Many arthropods undergo a seasonal dormancy termed “diapause” to optimize the timing of reproduction in highly seasonal environments. In the North Atlantic, the copepod Calanus finmarchicus completes one to three generations annually. Some individuals mature into adults, while others interrupt their development to enter diapause. It is unknown which individuals enter the diapause program, and why and when they do so. Transcriptomic data from copepods on known programs were analyzed using dimensionality reduction of gene expression and functional analyses to identify program-specific genes and biological processes. These analyses elucidated physiological differences and established protocols that distinguish between programs. Differences in gene expression were associated with the maturation of individuals on the reproductive program, whereas individuals on the diapause program showed little change in expression over time. Only two of six filters effectively separated copepods by developmental program. The first filter included all genes annotated to RNA metabolism, and this separation was confirmed using differential gene expression analysis. The second filter identified 54 differentially expressed genes that were consistently up-regulated in individuals on the diapause program in comparison with those on the reproductive program. These genes are annotated to oogenesis, RNA metabolism and fatty acid biosynthesis. They are both indicators of diapause preparation and good candidates for functional studies.
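The selection criterion behind the second filter (genes consistently up-regulated on the diapause program) can be sketched as a set intersection. This is a minimal sketch under stated assumptions and not the authors' pipeline. It assumes that per-gene log2 fold changes and adjusted p-values have already been computed for each field-versus-culture contrast. The contrasts (EF vs EC, EF vs LC, LF vs EC, LF vs LC), the thresholds (log2 fold change ≥ 1, adjusted p < 0.05), the function names, the gene IDs and the values are all hypothetical.

```python
# Hypothetical sketch: genes up-regulated on the diapause program in every
# field-vs-culture contrast. Contrasts, thresholds and values are assumptions.
from typing import Dict, Set, Tuple

# contrast name -> gene id -> (log2 fold change diapause/reproductive, adjusted p)
Results = Dict[str, Dict[str, Tuple[float, float]]]

def upregulated(res: Dict[str, Tuple[float, float]],
                min_lfc: float = 1.0, alpha: float = 0.05) -> Set[str]:
    """Genes significantly higher in the diapause group of one contrast."""
    return {g for g, (lfc, padj) in res.items() if lfc >= min_lfc and padj < alpha}

def consistently_upregulated(results: Results, min_lfc: float = 1.0,
                             alpha: float = 0.05) -> Set[str]:
    """Keep only genes that pass the thresholds in every contrast."""
    sets = [upregulated(res, min_lfc, alpha) for res in results.values()]
    return set.intersection(*sets) if sets else set()

# Toy values for illustration only
results: Results = {
    "EF_vs_EC": {"g1": (2.1, 0.001), "g2": (1.5, 0.010), "g3": (-1.2, 0.020)},
    "EF_vs_LC": {"g1": (1.8, 0.003), "g2": (0.4, 0.300), "g3": (-2.0, 0.001)},
    "LF_vs_EC": {"g1": (2.5, 0.001), "g2": (1.1, 0.040), "g3": (0.2, 0.900)},
    "LF_vs_LC": {"g1": (1.2, 0.020), "g2": (1.3, 0.010), "g3": (-0.5, 0.500)},
}
print(sorted(consistently_upregulated(results)))  # ['g1']
```

Requiring agreement across every contrast trades sensitivity for robustness. A gene that is up-regulated at only one time point, such as g2 above, is excluded.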

Methods

The study used an existing RNA-Seq dataset generated by Tarrant and colleagues for pre-adult (stage CV) copepodids of the calanoid copepod Calanus finmarchicus. Short-read sequences for 16 samples were downloaded from the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI). The reads were generated on an Illumina HiSeq2000 (50 bp, paired-end, ≥30M spots per sample; BioProject: PRJNA231164) (Tarrant et al., 2014). The dataset included a laboratory-cultured population and a field-collected wild population, each sampled at two time points (early and late). The two populations differed in developmental program: the laboratory-cultured group was on the “reproductive program”, while the field-collected group was on the “diapause program”. Briefly, the laboratory-cultured samples consisted of recently molted (≤ 24 h) stage CV copepodids. These had been isolated and incubated separately until harvest at 3 days (early culture, “EC”) and 10 days (late culture, “LC”) post-molt. The diapause-program copepodids had been collected from the field (Trollet Station in Trondheimsfjord, Norway) on May 28, 2013 (early field, “EF”) and 14 days later on June 10, 2013 (late field, “LF”). Additional details on the experiments can be found in previous studies (Tarrant et al., 2014 and 2016). Our goal was to determine whether the two programs could be separated by their respective gene expression (transcriptomic) phenotypes. We also asked whether this difference would lead to new insights into the physiological basis of the diapause program.
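The sample design described in the Methods can be captured in a small lookup table. Such a table is convenient for labelling expression columns in downstream analyses. This is a minimal sketch. The group codes, programs, time points and harvest/collection details come from the text. The sample-naming scheme "<GROUP>_<replicate>" and the helper by_program are hypothetical.

```python
# Sketch of the sample design; the "<GROUP>_<replicate>" naming is hypothetical.
from collections import defaultdict
from typing import Dict, List, NamedTuple

class Group(NamedTuple):
    population: str  # "laboratory" or "field"
    program: str     # "reproductive" or "diapause"
    time_point: str  # "early" or "late"
    timing: str      # harvest/collection detail from the Methods

GROUPS: Dict[str, Group] = {
    "EC": Group("laboratory", "reproductive", "early", "3 days post-molt"),
    "LC": Group("laboratory", "reproductive", "late", "10 days post-molt"),
    "EF": Group("field", "diapause", "early", "collected 2013-05-28"),
    "LF": Group("field", "diapause", "late", "collected 2013-06-10"),
}

def by_program(sample_ids: List[str]) -> Dict[str, List[str]]:
    """Group sample IDs such as 'EC_1' by developmental program."""
    out: Dict[str, List[str]] = defaultdict(list)
    for sid in sample_ids:
        code = sid.split("_", 1)[0]  # group code before the first underscore
        out[GROUPS[code].program].append(sid)
    return dict(out)

print(by_program(["EC_1", "LF_2", "LC_1", "EF_3"]))
# {'reproductive': ['EC_1', 'LC_1'], 'diapause': ['LF_2', 'EF_3']}
```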

The bioinformatic analyses of the 16 RNA-Seq libraries involved three strategies. In the first strategy, we applied the dimensionality-reduction algorithm t-distributed stochastic neighbor embedding (t-SNE) to cluster samples agnostically by similarity in gene expression patterns. In the second strategy, we applied a gene expression workflow to identify differentially expressed genes (DEGs) between the two programs. This was followed by downstream weighted correlation network analysis (WGCNA) and examination of predicted gene function. Lastly, the third strategy focused on functional analysis of expression differences and comparison with expected physiological and transcriptional differences. Using a targeted approach, we identified sets of genes associated with relevant biological processes. These gene sets separated the samples into reproductive and diapause categories and were unaffected or only minimally affected by environment and/or time.
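One way to ask whether a gene set (a “filter”) separates the two programs can be sketched as follows. This is a hypothetical, simplified stand-in for visually inspecting t-SNE embeddings and not the procedure used in the study. It restricts the expression matrix to the genes in one set. It then reports the fraction of samples whose nearest neighbor, by Euclidean distance on log-scale expression, is on the same program. The function name nn_agreement, the sample names, the gene IDs and the values are invented for illustration.

```python
# Hypothetical sketch: score a gene-set filter by nearest-neighbor agreement.
import math
from typing import Dict, Set

def nn_agreement(expr: Dict[str, Dict[str, float]], gene_set: Set[str],
                 program: Dict[str, str]) -> float:
    """Fraction of samples whose nearest neighbor shares their program.

    expr maps sample -> gene -> log-scale expression; program maps
    sample -> "reproductive" or "diapause".
    """
    samples = list(expr)
    # keep only genes from the set that are measured in every sample
    shared = set.intersection(*(set(expr[s]) for s in samples))
    genes = sorted(gene_set & shared)
    if not genes:
        raise ValueError("no genes from the set are present in the matrix")
    vecs = {s: [expr[s][g] for g in genes] for s in samples}
    hits = 0
    for s in samples:
        # nearest neighbor by Euclidean distance among the other samples
        nn = min((t for t in samples if t != s),
                 key=lambda t: math.dist(vecs[s], vecs[t]))
        hits += program[nn] == program[s]
    return hits / len(samples)

# Toy matrix: g1-g3 track program, g4 does not
expr = {
    "EC_1": {"g1": 1.0, "g2": 2.0, "g3": 3.0, "g4": 9.0},
    "LC_1": {"g1": 1.2, "g2": 2.1, "g3": 3.3, "g4": 0.0},
    "EF_1": {"g1": 3.0, "g2": 2.0, "g3": 1.0, "g4": 9.1},
    "LF_1": {"g1": 3.1, "g2": 1.8, "g3": 0.9, "g4": 0.2},
}
program = {"EC_1": "reproductive", "LC_1": "reproductive",
           "EF_1": "diapause", "LF_1": "diapause"}
print(nn_agreement(expr, {"g1", "g2", "g3"}, program))  # 1.0
print(nn_agreement(expr, {"g4"}, program))              # 0.0
```

A score near 1.0 means the gene set groups samples by program rather than by time point or environment. This mirrors the criterion used to judge the filters, although the study itself relied on t-SNE and differential expression analysis.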

Funding

National Science Foundation, Award: OCE-1459235

National Science Foundation, Award: OCE-1756767

National Science Foundation, Award: OPP-1746087