Data from: Seascape genomics of red abalone: Limited range-wide population structure and evidence for local adaptation
Data files
Jan 29, 2025 version files 1.79 GB
Abstract
Characterizing patterns of genetic diversity including evidence of local adaptation is relevant for predicting and managing species recovering from over-exploitation in the face of climate change. Red abalone (Haliotis rufescens) is a species of conservation concern due to recent declines from over-harvesting, disease, and climate change, resulting in the closure of commercial and recreational fisheries. We hypothesized that the environmental mosaic that defines nearshore habitats in the California current ecosystem, including variable pH and temperature, has enriched some regions for locally adapted genotypes that may be important for species persistence in changing environments. Using whole genome re-sequencing data from 23 populations spanning their entire range (southern Oregon, USA, to Baja California, MEX) we investigated patterns of population connectivity and local adaptation. We discovered high genetic diversity that is shared within and among populations, suggesting high historical range-wide gene flow. Using multiple layers of environmental metadata, we tested for genotype-environment associations that would reveal local adaptation across the mosaic of coastal environments that define the California Current ecosystem. We found little evidence for large selective sweeps between populations that occupy local habitats that vary by pH, strength of upwelling, chlorophyll, salinity, and sea surface temperature. This is consistent with a broad range of species with similar life histories that show limited neutral or adaptive genetic variation across the same region and the same environments, suggesting that the mosaic of environmental variation across the CCS is insufficient to drive local adaptation in the face of high gene flow. Given the high genetic connectivity across their range, state-mandated regulatory actions would be most effective if aligned across jurisdictional boundaries (i.e., Mexico, California, and Oregon).
README: VCF file of seascape genomics of red abalone
https://doi.org/10.5061/dryad.z612jm6m8
In this study, we used whole genome re-sequencing data from 23 populations spanning the entire range of red abalone (southern Oregon, USA, to Baja California, MEX) to reveal patterns of population connectivity and signatures of natural selection. We generated a VCF file from all 264 samples, sequenced at a depth of ~10x, using snpArcher (https://snparcher.readthedocs.io/en/latest/index.html). This pipeline employs a Snakemake workflow that performs quality control, read mapping, and variant calling. We mapped reads to the Haliotis rufescens reference genome assembly (GCA_023055435.1; Griffiths et al., 2022) and called SNPs using GATK within the snpArcher pipeline. The data files included in this dataset are as follows:
NoSibs.NoLowDP.filtered.vcf.missing75maf5.min10.maxDP40.plink.LDfiltered_0.8.vcf
NoSibs.NoLowDP.filtered.vcf.missing75maf5.min10.maxDP40.contigremoved3.plink.LDfiltered_0.8.vcf
Description of the data and file structure
The initial VCF file contained 43 million biallelic SNPs. We conducted all filtering steps using VCFtools (Danecek et al., 2011) and bcftools (Li & Durbin, 2009). We first identified related individuals using the KING relatedness matrix in plink v1.90 (Purcell et al., 2007). We then removed 8 individuals with a relatedness >0.125, indicating second-degree relatedness (i.e., half-siblings or cousins). Next, we removed 8 individuals with poor sequencing depth (< 3x) because sequencing depth explained the greatest amount of variance in the PCA. Seven of these samples were from Monterey, the only samples for which DNA had already been extracted for a previous project (De Wit & Palumbi, 2013). Next, we retained variants that had been successfully genotyped in 75% of individuals, with a minimum depth of 10, maximum depth of 40, minimum quality of 30, and minor allele frequency of 5%. We also removed variants with significant deviations from Hardy-Weinberg expectation (p < 0.00001) using the INFO field generated by GATK's HaplotypeCaller. Finally, we removed variants that were in linkage disequilibrium (LD) above a correlation coefficient of 0.8 using plink v1.90. This filtered dataset retained a total of 236 individuals and 718,179 SNPs.
We discovered genomic clustering of two geographically mixed groups that were sorting along PC axis 2 (Fig. 1A; see manuscript). We hypothesized that individuals may be sorting by genetic sex, which follows a heterogametic system (XX and XY) in the Haliotis genus (Luo et al., 2021; Weng et al., 2022). To ensure the sex-determining region was not obscuring discovery of population structure and local adaptation, we identified SNPs associated with PC axis 2 (see methods on Sex-Determining Region) and removed these SNPs from our final dataset. After removal of the putative sex-determining region, our final SNP dataset contained 881,814 SNPs.
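To make the site-level thresholds above concrete, the following is a minimal Python sketch of the call-rate, minor allele frequency, and quality checks applied to a single biallelic site. It is an illustration only, not the pipeline used to produce these files (that filtering was done with VCFtools and bcftools). The function names, the constant names, and the choice of inclusive comparisons are assumptions made for this example; the depth, Hardy-Weinberg, and LD filters are left out.

```python
from typing import Optional, Sequence, Tuple

# Thresholds taken from the description above; names are hypothetical.
MIN_CALL_RATE = 0.75   # genotyped in at least 75% of individuals
MIN_MAF = 0.05         # minor allele frequency of at least 5%
MIN_QUAL = 30.0        # minimum site quality


def parse_gt(gt: str) -> Optional[Tuple[int, int]]:
    """Return the two allele indices of a diploid GT string, or None if missing."""
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return None
    return int(alleles[0]), int(alleles[1])


def site_stats(gts: Sequence[str]) -> Tuple[float, float]:
    """Return (call rate, minor allele frequency) for one biallelic site."""
    called = [g for g in map(parse_gt, gts) if g is not None]
    call_rate = len(called) / len(gts) if gts else 0.0
    n_alleles = 2 * len(called)
    if n_alleles == 0:
        return call_rate, 0.0
    # Count non-reference alleles among called genotypes.
    alt_count = sum(1 for g in called for a in g if a != 0)
    alt_freq = alt_count / n_alleles
    return call_rate, min(alt_freq, 1.0 - alt_freq)


def keep_site(gts: Sequence[str], qual: float) -> bool:
    """Apply the quality, call-rate, and MAF thresholds to one site."""
    call_rate, maf = site_stats(gts)
    return qual >= MIN_QUAL and call_rate >= MIN_CALL_RATE and maf >= MIN_MAF
```

For example, `keep_site(["0/0", "0/1", "1/1", "./."], 50.0)` evaluates a site with three of four genotypes called and an alternate allele frequency of 0.5. Exact boundary behaviour in VCFtools may differ from the inclusive comparisons assumed here.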
Our LD-thinned dataset was used for population structure and genetic analyses, while the non-LD-thinned SNP dataset was used for selection analyses (outlier and genome-environment associations) which contained 710,420 SNPs.
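As a quick sanity check after download, the number of individuals and SNPs in either VCF file can be compared against the counts reported above. The sketch below uses only the Python standard library and assumes a standard VCF layout (nine fixed columns before the sample columns); the function names are hypothetical and are not part of the project's scripts.

```python
import gzip
from typing import Iterable, Tuple


def summarize_vcf_lines(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of samples, number of variant records) from VCF lines."""
    n_samples = 0
    n_records = 0
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        if line.startswith("#CHROM"):
            # Columns after the nine fixed fields (CHROM..FORMAT) are samples.
            fields = line.rstrip("\n").split("\t")
            n_samples = max(len(fields) - 9, 0)
        elif line.strip():
            n_records += 1
    return n_samples, n_records


def summarize_vcf(path: str) -> Tuple[int, int]:
    """Open a plain or gzip-compressed VCF and summarize it."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return summarize_vcf_lines(handle)
```

Calling `summarize_vcf` with the path to one of the two files listed above returns a `(samples, records)` tuple that can be checked against the numbers of individuals and SNPs given in this README.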
Sharing/Access information
Access to raw sequencing data:
- DNA sequencing data are deposited in the National Center for Biotechnology Information's Sequence Read Archive (BioProject PRJNA867688). All bioinformatics scripts and pipelines are available on GitHub (https://github.com/JoannaGriffiths/CCGP_Red_Abalone).
Code/Software
For scripts and pipelines for generating this VCF from the raw sequencing data and for any downstream analyses performed with the VCF, see https://github.com/JoannaGriffiths/CCGP_Red_Abalone