Data from: Contrasting effect of hybridization on genetic differentiation in three rockfish species with similar life history
Data files
Apr 17, 2025 version files 128.53 GB
-
brown_raw.tar.gz
17.07 GB
-
brown_rockfish_intraspecific_AEW.recode.vcf
13.02 MB
-
copper_raw.tar.gz
64.82 GB
-
copper_rockfish_intraspecific_AEW.recode.vcf
54.50 MB
-
copperquillbackbrown_rockfish_interspecific_AEW.recode.vcf
87.87 MB
-
Metadata.xlsx
29.58 KB
-
quillback_rockfish_intraspecific_AEW.recode.vcf
27.15 MB
-
quillback.tar.gz
46.46 GB
-
README.md
1.63 KB
Abstract
Hybridization can provide evolutionary benefits (e.g., population resilience to climate change) through the introduction of adaptive alleles and increase of genetic diversity. Nevertheless, management strategies may be designed based only on the parental species within a hybrid zone, without considering the hybrids. This can lead to ineffective spatial management of species, which can directly harm population diversity and negatively impact food webs. Three species of rockfish (Brown Rockfish (Sebastes caurinus), Copper Rockfish (S. auriculatus), and Quillback Rockfish (S. maliger)) are known to hybridize within Puget Sound, Washington, but genetic data from these species are used to infer population structure in the entire genus, including in species that do not hybridize. The goal of this project was to estimate the hybridization rates within the region and determine the effect of hybridization on geographic patterns of genetic structure. We sequenced 290 Brown, Copper, and Quillback rockfish using restriction‐site associated DNA sequencing (RADseq) from four regions within and outside Puget Sound, Washington. We show that (i) hybridization within Puget Sound was asymmetrical, not recent, widespread among individuals, and relatively low level within the genome, (ii) hybridization affected population structure in Copper and Brown rockfish, but not in Quillback Rockfish and (iii) after taking hybridization into account we found limited directional dispersal in Brown and Copper rockfish, and evidence for two isolated populations in Quillback Rockfish. Our results suggest that rockfish population structure is species‐specific, dependent on the extent of hybridization, and cannot be inferred from one species to another despite similar life history.
https://doi.org/10.5061/dryad.j0zpc86mv
This repository contains the raw fastq files (gzipped tarballs) and end product files (in VCF format) for a study on the inter- and intraspecific population structure between three species of rockfish common in Puget Sound, WA, using RADseq. We show that (i) hybridization within Puget Sound was asymmetrical, ancient, widespread, and relatively low level, (ii) hybridization affected population structure in Copper and Brown rockfish, but not in Quillback Rockfish and (iii) after taking hybridization into account, there were two populations in Quillback, one in Copper, and three populations in Brown Rockfish.
Description of the data and file structure
This repository contains:
- All final VCF files for the inter- and intraspecific analyses
- All VCF files were filtered for both high quality SNPs and samples (see methods for filtering details). Since the inter- and intraspecific analyses included slightly different filtering steps, the samples and individuals in the VCF files may not be the same.
- Raw sequence files for all samples used in this study (compressed into tarballs and separated by species)
- A metadata sheet
- includes location, length/weight, and DNA quality. Some metadata was not collected due to inconsistent sampling procedures and is labeled with 'Unknown'.
Code/Software
All relevant code is provided in this github repository.
Sampling procedure
We used 290 individuals from three species of rockfish (Brown, Copper, and Quillback) that were previously collected by the Washington Department of Fish and Wildlife (WDFW), the National Marine Fisheries Service’s Northwest Fisheries Science Center (NWFSC), and the Department of Fisheries and Oceans (DFO Canada) from 1999-2021 (Fig. 1, Table 1). Tissue was clipped from individual fins and either preserved in 95% ethanol or dried. Individuals were collected from four regions: 1) British Columbia (Canadian Salish Sea north of US/Canada border, BC), 2) northern Puget Sound (US Salish Sea north of Admiralty Inlet, south of the US/Canada border and east of the Victoria sill, NPS), 3) southern Puget Sound (Puget Sound proper, south of Admiralty Inlet, SPS), and 4) the US West Coast (US Pacific Coast west of Victoria sill, WC) (Fig. 1). We also re-sampled 62 individuals from a study on rockfish hybridization that were identified as ‘pure’ parental species, including 26 Brown, eight Copper, and 28 Quillback rockfish (Schwenke et al., 2018, detailed in Suppl. Table 1). Due to differences in the abundance and distribution of species across this geographic range, Brown Rockfish could only be sampled in three of these locations and the WC individuals were collected in Mexico and southern California.
DNA extraction, library preparation and sequencing
Genomic DNA was extracted using the Nexttec DNA isolation kit (Nexttec Incorporated, Middlebury, VT, USA) following the manufacturer’s protocol and quantified using a Qubit Fluorometer (ThermoFisher Scientific, Waltham, MA, USA). DNA concentration was normalized to 125 ng in 10 µL of molecular grade water. Restriction site-associated DNA sequencing (RADseq) libraries were prepared using a version of the Ali et al. (2016) protocol without the targeted bait capture step, referred to in the literature as BestRAD (https://github.com/merlab-uw/Protocols/blob/main/bestRAD). Briefly, genomic DNA was digested using the SbfI enzyme. An adapter (P1) containing a forward amplification primer site, an Illumina sequencing primer site, and an individual 6 bp barcode was ligated to each fragment at the restriction site end. Fragments were then randomly sheared using sonication and size-selected to 300-500 bp in length. Subsequently, P2 adapters were ligated to the reverse end and libraries were amplified by PCR. Each library was assessed for quality on a 1% agarose gel and a Bioanalyzer DNA 1000 kit (Agilent Technologies, Santa Clara, CA). Libraries were pooled in equimolar amounts and sequenced on a NovaSeq at the University of Oregon. Individual samples were sequenced paired end (300 cycles) on either a S4 or SP run type. Individuals were assigned to one of six RADseq libraries randomly to avoid any lane effect.
Initial filtering
Raw sequence data were quality checked using FastQC v0.11.9 (Andrews, 2010) and visualized in MultiQC (Ewels et al., 2016). Prior to SNP calling and genome alignment, the raw sequences were demultiplexed using process_radtags in Stacks v2.60 (Catchen et al., 2011; Rochette et al., 2019). Here, sequences were trimmed to 104 bases and filtered for quality. Individuals with fewer than 250,000 total reads were excluded from downstream analysis. Our paired-end sequences were then aligned to the Honeycomb Rockfish (S. umbrosus) genome from GenBank (NCBI Accession Number: PRJNA562243) with Bowtie 2 v2.4 using the ‘very-sensitive’ option (Langmead & Salzberg, 2012). At the time of data analysis for this study, the Honeycomb Rockfish genome was one of only two annotated full genomes and was chosen due to its closer phylogenetic relationship to our species (Hyde & Vetter, 2007). Following genome alignment, SNP calling and basic population genetics statistics were calculated using the gstacks (marukilow model) and populations modules from the Stacks pipeline. SNPs were called if they had a minimum mapping quality of 40. SNPs were filtered following published recommendations (O’Leary et al., 2018) requiring that loci meet the following criteria: minimum genotype depth ≥ 5, mean minimum read depth ≥ 15, genotype call rate ≥ 80%.
Interspecific analyses
For the interspecific analyses, the first SNP on each RADtag was chosen using the –write-single-snp option in populations. Only one SNP per RADtag was chosen to minimize the potential for highly linked loci. We did not select diagnostic SNPs (FST = 1) because those loci would likely primarily distinguish between Brown Rockfish and the other two species, which are more closely related to each other than to Brown Rockfish (Hyde et al. 2007). Additionally, we did not filter based on departure from Hardy-Weinberg proportions (HWE) as introgression may cause a Wahlund effect (a reduction of heterozygosity due to population structure within a sample) that would influence FIS and HWE p-values. To visualize interspecific population structure patterns and identify individuals with mixed ancestry, we used principal components analyses (PCA) and STRUCTURE analysis, the two most commonly used approaches to describe population structure (Liu et al., 2020). First, to identify evidence of recent hybridization, we conducted PCA using the R package adegenet v2.1.8 (Jombart, 2008) and visualized the results using ggplot2 v3.3.6 (Wickham, 2016). Second, we used STRUCTURE v2.3.4 to estimate admixture (Pritchard et al., 2000). Two replicates were run for K = 1-10 clusters with a burn-in period of 10,000 iterations and 100,000 MCMC repetitions. STRUCTURE was run without a priori population knowledge and using the admixture model. We identified the range of likely K groups with Structure Harvester (Earl & vonHoldt, 2012), using the ΔK statistic (Evanno et al., 2005) and using the mean L(K). Lastly, pairwise and overall FST values (Weir & Cockerham, 1984) were estimated between all four sampling locations with 1,000 bootstraps using the R package hierfstat v0.5-11 (Goudet, 2005). Values were considered statistically significant if the lower limit of the 95% bootstrap confidence interval did not overlap with zero.
Fastsimcoal (v2.8, Excoffier et al., 2021) was used to distinguish between patterns of hybridization and incomplete lineage sorting. Models with different admixture rates were compared, including (1) no gene flow, (2) unidirectional admixture (i.e., admixture from Quillback into Brown and Copper but not vice versa), and (3) bidirectional admixture (i.e., admixture between Quillback and Brown and Quillback and Copper) (Suppl. Fig. 1). For each model, fastsimcoal was run with 40 ECM optimization cycles and 150,000 coalescent simulations. This was repeated 100 times, with random starting parameter values. The best fit model was determined based on the lowest AIC value and highest likelihood. The distribution of the best likelihood values was calculated by running the model with the best parameter values 100 times. Mutation rate and generation time were taken from Kolora et al. (2021). The divergence time for the three species was taken from Hyde & Vetter (2007). SFS files were generated from the interspecific analysis, using easySFS (Gutenkunst et al., 2009). We only included individuals from SPS in our models, as previous studies suggest they are more likely to hybridize in that region (Schwenke et al., 2018).
Admixture f3 statistics were calculated on the empirical data to assess whether intermediate individuals were due to admixture (Patterson et al., 2012). The R package AdmixTools2 2.0.0 (Maier et al., 2023) was used to test whether a population (target) is the result of admixture between two other source populations (source X and source Y). Individuals with suspected admixture (target) were identified from STRUCTURE analysis as having ≤ 95% ancestry from one species. These individuals were analyzed against individuals with ≥ 95% ancestry from either species (source populations). A negative f3 value and Z score provided evidence of admixture in the target population between the two parental/source populations.
Intraspecific Analyses
For the intraspecific analyses, we retained one SNP on each RADtag with the highest minor allele frequency (MAF). Choosing the highest MAF SNP ensured that intraspecific datasets were different from the interspecific data so results could be compared and correlated to each other. Overall, 27%, 17%, and 32% of SNPs were shared between the interspecific data and the Brown, Copper, and Quillback intraspecific datasets, respectively. SNPs with genotype frequencies that deviated significantly from Hardy-Weinberg Equilibrium (HWE) expectations were removed using the following procedure: p values were calculated across individuals for each population using an exact test based on Monte Carlo permutations of alleles within the R package pegas v1.1 (Paradis, 2010). P values were then combined for each locus using Fisher’s combination of probabilities and adjusted to q values for the false discovery rate (Benjamini & Hochberg, 1995). Loci with q values below 0.05 were considered significantly out of HWE and removed from downstream analysis. For both the inter- and intraspecific datasets, genetic relatedness was calculated using an identity by descent estimate on all pairs of individuals within PLINK v1.07 (Purcell et al., 2007) to evaluate if our dataset included highly related individuals. FIS was also calculated using vcftools v0.1.16 (Danecek et al., 2011). Final datasets were converted from vcf to other formats (GenePop and STRUCTURE) using PGDSpider v2.1.1.5 (Lischer & Excoffier, 2012).
We analyzed intraspecific population structure using the above three methods (PCA, FST, and STRUCTURE). Additionally, we compared results from the STRUCTURE and PCA analyses with percent admixture ancestry to determine if there was a correlation between admixed ancestry and intraspecific clustering in both analyses. In the STRUCTURE analysis, we correlated percent admixture ancestry with each of the genetic groups in the intraspecific analysis. In the PCA, we correlated percent admixture ancestry with PC1. Finally, to rule out the effect of chromosomal inversions on population structure, we estimated linkage disequilibrium (r2) within each chromosome using PLINK v1.07 (Purcell et al., 2007). R2 values were then mapped on each chromosome to identify blocks of highly linked loci using the R function LDheatmap v1.0-6 (Shin et al., 2006). Chromosomes with loci in strong LD (r2 > 0.5) over extended blocks (distance greater than 1 Mb) were analyzed separately using PCAs in adegenet v2.1.8 (Jombart, 2008) to determine whether individuals clustered in the three-stripe patterns consistent with chromosomal inversions.
- Wray, Anita; Petrou, Eleni; Nichols, Krista M. et al. (2024). Contrasting effect of hybridization on genetic differentiation in three rockfish species with similar life history. Evolutionary Applications. https://doi.org/10.1111/eva.13749
