SNPs derived from a common garden experiment across the biogeographic range of Kelletia kelletii
Data files (Sep 18, 2024 version; 499.06 MB total)
kw_allsnps.vcf (458.60 MB)
kw_degsnps.vcf (40.46 MB)
README.md (1.06 KB)
Abstract
Signals of natural selection can be quickly eroded in high-gene-flow systems, severely challenging efforts to understand how and when genetic adaptation occurs in the ocean. This long-standing, unresolved topic in ecology has renewed importance because rapidly changing environmental conditions are driving range expansions that, in many cases, necessitate rapid evolutionary responses. Kellet’s whelk expanded its biogeographic range in the 1980s and is readily adapting to novel conditions in its expanded range. To test for genetic adaptation in a coastal marine species with high dispersal potential, we performed a series of crosses on Kellet's whelk (Kelletia kelletii) collected from its historical and recently colonized range, and conducted RNA-Seq on offspring that we reared in a common garden environment. Across 54 samples, we identified 2,770 genes that were differentially expressed between samples with historical-range and expanded-range ancestry. Using SNPs called directly from the differentially expressed genes, we revealed parental population structure that enabled us to assign “unknown” samples back to their range of origin with unprecedented accuracy for a marine species (92.6 to 94.5%). The SNP with the highest predictive importance occurred on triosephosphate isomerase (TPI), an essential enzyme for glycolysis and gluconeogenesis, which also plays a role in the cold stress response. In the expanded range, where ocean temperatures are colder than in the historical range, TPI is both highly upregulated and carries a non-synonymous mutation. Our findings pave the way for accurately identifying patterns of dispersal, gene flow, and population connectivity in the ocean by demonstrating that rapid genetic adaptation can occur even in high-gene-flow species and that experimental transcriptomics can reveal mechanisms for how marine organisms respond to changing environmental conditions.
README: VCFs for "Genetic adaptation despite high gene flow in a range-expanding population"
https://doi.org/10.5061/dryad.qbzkh18s3
Included here are the unfiltered VCF files associated with the publication "Genetic adaptation despite high gene flow in a range-expanding population" in Molecular Ecology. SNPs were called from RNA-Seq reads of samples reared in a common garden.
Description of the data and file structure
We have included VCF files for both the "All-SNPs" and "DEG-SNPs" datasets associated with the study. All-SNPs includes SNPs called across the transcriptome, and DEG-SNPs includes only the SNPs found on genes that are differentially expressed (DEGs) between Kellet's whelks from the expanded and historical ranges.
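As a rough sketch, both VCFs can be inspected with the Python standard library alone, assuming they follow the standard tab-delimited VCF layout (meta-information lines beginning with ##, a #CHROM header row, and sample columns after FORMAT). The helper names read_vcf and subset_by_contig, and the contig and sample names in the toy example, are hypothetical. subset_by_contig only illustrates restricting SNPs to a set of DEG contigs; it is not necessarily how the DEG-SNPs file was produced.

```python
import io
from typing import Iterable, List, Set, Tuple


def read_vcf(lines: Iterable[str]) -> Tuple[List[str], List[List[str]]]:
    """Return (sample names, data records) from plain-text VCF lines."""
    samples: List[str] = []
    records: List[List[str]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and ## meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow the FORMAT column
        else:
            records.append(fields)  # one variant record per line
    return samples, records


def subset_by_contig(records: List[List[str]], contigs: Set[str]) -> List[List[str]]:
    """Keep only records whose CHROM field (here, a transcriptome contig) is in `contigs`."""
    return [rec for rec in records if rec[0] in contigs]


# Toy three-SNP VCF with two samples; contig and sample names are made up.
toy_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\n"
    "contig_1\t10\t.\tA\tG\t.\t.\t.\tGT\t0/1\t1/1\n"
    "contig_2\t55\t.\tC\tT\t.\t.\t.\tGT\t0/0\t0/1\n"
    "contig_3\t7\t.\tG\tA\t.\t.\t.\tGT\t0/1\t0/0\n"
)
samples, records = read_vcf(toy_vcf)
deg_records = subset_by_contig(records, {"contig_2"})  # hypothetical DEG contig set
print(samples, len(records), len(deg_records))  # ['s1', 's2'] 3 1
```

For the real files, pass an open file handle instead of the toy string, e.g. read_vcf(open("kw_degsnps.vcf")). Note that this loads every record into memory, which is manageable for kw_degsnps.vcf but may be slow for the larger kw_allsnps.vcf.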
Sharing/Access information
Raw reads can be found in SRA: https://www.ncbi.nlm.nih.gov/sra/PRJNA1000198
Code/Software
Methods
Using SCUBA, we collected adult Kellet’s whelks by hand from subtidal (approximately 15 m depth) locations across California (Figure 1A, CDFW SCP #8018 to C.W.). Adult whelks from three locations, Monterey (MON, north of Point Conception; 36.6181670 N, 121.89 W), Naples (NAP, just south of Point Conception; 34.4219670 N, 119.952283 W), and Point Loma (POL, Southern California; 32.665333 N, 117.261517 W), were used in our main experimental cross (described below). Additionally, adult whelks from Diablo Canyon (DIC, in the expanded range; 35.22445 N, 120.877483 W) were collected and used to validate the results of the main experimental cross.
We acclimated 40 individuals each from MON, NAP, and POL in common garden aquaria using the flow-through filtered seawater system at the California Polytechnic State University Research Pier in Avila, California (35.169817 N, 120.740838 W). Additionally, an experimental reciprocal cross between MON and NAP was created with 7 individuals from each site. The wild-collected adults were acclimated for 7 to 10 months, fed identical diets of mixed seafood (scallops, shrimp, blue mussels, and squid), and then cross-bred within 92-liter common aquaria. Because all offspring were reared in a common garden under identical environmental conditions, differences in their gene expression can largely be attributed to genetic and/or multi-generational epigenetic differences between populations (Christie et al., 2016; Roberge et al., 2008). A small subset of these differentially expressed genes may also be driven by maternal or paternal effects (Mousseau & Fox, 1998; Wolf & Wade, 2009). In total, these four experimental crosses used 134 adult whelks.
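Assuming that all 40 acclimated adults from each of MON, NAP, and POL, plus the 7 adults from each site in the MON × NAP reciprocal cross, were used, the stated total of 134 adults follows directly:

$$
\underbrace{3 \times 40}_{\text{MON, NAP, POL}} \;+\; \underbrace{2 \times 7}_{\text{MON} \times \text{NAP}} \;=\; 120 + 14 \;=\; 134
$$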
From each egg capsule, we separated roughly 500 to 1,000 veliger larvae (F1 offspring) from the maternal capsule tissue for subsequent RNA extraction and sequencing. Thus, each “sample” in our analyses represents a pool of larvae from a single egg capsule.
SNP variants on reads aligned to the de novo transcriptome (Daniels et al., 2023) were identified using ANGSD (Korneliussen et al., 2014), with a maximum likelihood approach to identify major and minor alleles (Skotte et al., 2012), the GATK genotyping algorithm (Van der Auwera & O’Connor, 2020), and a uniform prior. Only bases with a Phred-scaled quality score of at least 20 were retained. Genotypes with a posterior probability below 95%, or with a depth below 20 in an individual or below 100 summed across all individuals, were removed, and only loci genotyped in every individual were retained. Lastly, we removed any loci whose minor allele frequency was below 0.05 in every population, using the R package snpR/1.2.79 (Hemstrom & Jones, 2022).
Because selection at putatively adaptive loci can often manifest as downstream changes in expression, we also identified SNP variants on contigs that were differentially expressed between MON and NAP. Although genetic changes in expression arise from changes at nearby cis- or distant trans-regulatory elements, we identified SNPs within the DEGs to search for functional (structural) differences in the genes themselves.
We called 122,599 SNPs directly from the RNA-seq data using ANGSD (Korneliussen et al., 2014), of which 4,168 were located on DEG contigs (see SI methods and Table S1 for filtering details). After SNP calling and filtering (Hemstrom et al., 2024), 94,654 SNP loci remained across the transcriptome (hereafter “all-SNPs”), 3,118 of which were located on DEG contigs (hereafter “DEG-SNPs”).
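The genotype and locus filters described above can be sketched in plain Python to make the thresholds and their logic concrete. This illustrates the stated cutoffs only and is not the authors' pipeline, which used ANGSD and snpR. The Genotype dataclass, the function names, and the toy values are hypothetical. Read-level base-quality filtering (Phred at least 20) happens upstream and is not modeled. Per-population minor allele frequency is computed here from called diploid genotypes coded as 0, 1, or 2 copies of the alternate allele, which is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List

MIN_POSTERIOR = 0.95   # minimum genotype posterior probability
MIN_DEPTH_IND = 20     # minimum read depth in each individual
MIN_DEPTH_TOTAL = 100  # minimum read depth summed across all individuals
MIN_MAF = 0.05         # minor allele frequency threshold


@dataclass
class Genotype:
    dosage: int       # copies of the alternate allele (0, 1, or 2)
    posterior: float  # posterior probability of the called genotype
    depth: int        # read depth for this individual at this locus


def passes_call_filters(genotypes: List[Genotype]) -> bool:
    """True if total depth is sufficient and every individual keeps a confident,
    well-covered genotype (i.e., the locus stays genotyped in every individual)."""
    if sum(g.depth for g in genotypes) < MIN_DEPTH_TOTAL:
        return False
    return all(g.posterior >= MIN_POSTERIOR and g.depth >= MIN_DEPTH_IND for g in genotypes)


def minor_allele_frequency(dosages: List[int]) -> float:
    """MAF for diploid dosages: the rarer of the two allele frequencies."""
    p = sum(dosages) / (2 * len(dosages))
    return min(p, 1 - p)


def passes_maf(genotypes: List[Genotype], populations: List[str]) -> bool:
    """Drop a locus only if its MAF is below MIN_MAF in every population."""
    by_pop: Dict[str, List[int]] = {}
    for g, pop in zip(genotypes, populations):
        by_pop.setdefault(pop, []).append(g.dosage)
    return any(minor_allele_frequency(d) >= MIN_MAF for d in by_pop.values())


def keep_locus(genotypes: List[Genotype], populations: List[str]) -> bool:
    """Apply the genotype-call filters first, then the per-population MAF filter."""
    return passes_call_filters(genotypes) and passes_maf(genotypes, populations)


# Toy loci for four individuals from two populations (values are invented).
pops = ["MON", "MON", "NAP", "NAP"]
polymorphic = [Genotype(0, 0.99, 30), Genotype(1, 0.97, 25), Genotype(0, 0.98, 40), Genotype(0, 0.96, 22)]
low_depth = [Genotype(1, 0.99, 30), Genotype(1, 0.99, 15), Genotype(0, 0.99, 40), Genotype(0, 0.99, 30)]
monomorphic = [Genotype(0, 0.99, 30) for _ in range(4)]

print(keep_locus(polymorphic, pops), keep_locus(low_depth, pops), keep_locus(monomorphic, pops))
# True False False
```

The MAF rule keeps a locus if at least one population reaches 0.05, so a variant that is common in only one range (the kind of signal relevant to range-of-origin assignment) is not discarded just because it is rare elsewhere.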