Rare, long-distance dispersal underpins genetic connectivity in the pink sea fan, Eunicella verrucosa
Data files
Feb 27, 2024 version files 2.61 GB
Abstract
Characterising patterns of genetic connectivity in marine species is of critical importance given the anthropogenic pressures placed on the marine environment. For sessile species, population connectivity can be shaped by many processes, such as pelagic larval duration, oceanographic boundaries, and currents. This study combines restriction-site associated DNA sequencing (RADseq) and passive particle dispersal modelling to delineate patterns of population connectivity in the pink sea fan, Eunicella verrucosa, a temperate octocoral. Individuals were sampled from 20 sites covering most of the species’ northeast Atlantic range, and a site in the northwest Mediterranean Sea to inform on connectivity across the Atlantic-Mediterranean transition. Using 7,510 neutral SNPs, a geographic cline of genetic clusters was detected, partitioning into: Ireland, Britain, France, Spain (Atlantic), and Portugal and Spain (Mediterranean). Evidence of significant inbreeding was detected at all sites, a finding not detected in a previous study of this species based on microsatellite loci. Genetic connectivity was characterised by an isolation by distance pattern (IBD) (r2 = 0.78, p<0.001), which persisted across the Mediterranean-Atlantic boundary. In contrast, exploration of ancestral population assignment using the program ADMIXTURE indicated genetic partitioning across the Bay of Biscay, which we suggest represents a natural break in the species’ range, possibly linked to a lack of suitable habitat. As the pelagic larval duration (PLD) is unknown, passive particle dispersal simulations were run for 14 and 21 days. For both modelled PLDs, inter-annual variations in particle trajectories suggested that in a long-lived, sessile species, range-wide IBD is driven by rare, longer dispersal events which act to maintain gene flow. These results suggest that oceanographic patterns may facilitate range-wide stepping-stone genetic connectivity in E. verrucosa, and highlight that both oceanography and natural breaks in a species’ range should be considered in the designation of ecologically coherent MPA networks.
README: Rare, long-distance dispersal underpins genetic connectivity in the pink sea fan, *Eunicella verrucosa*
https://doi.org/10.5061/dryad.xwdbrv1m9
This repository contains relevant raw and filtered data, along with annotated R scripts to run the analyses within this manuscript (or adapt to your own datasets of the same format).
Description of the data and file structure
Method Types and Data Generation
--Next RADseq derived SNP data obtained from rangewide samples of pink sea fan (Eunicella verrucosa) colonies.
--Passive particle tracking trajectories obtained using an Atlantic-Iberian Biscay Irish-Ocean Physics analysis and a Forecast oceanographic model from CMEMS in R.
--Raw ASCI and NC files of marine environmental data obtained via BioOracle (https://bio-oracle.org/) and Copernicus (https://data.marine.copernicus.eu/) needed for Redundancy Analysis.
--R scripts to run all major analyses within the manuscript to address questions of population structure, genetic diversity and differentiation, spatial and environmental selection.
File and Data Overview
--this README file
--Macleod_etal2024_EvoApps_Everrucosa_GeneticAnalysis.zip
--Macleod_etal2024_EvoApps_Everrucosa_ParticleDispersalModelling.zip
Each Macleod_*.zip folder contains all the necessary information and raw files to perform the analytical investigation of the Next RADseq derived SNP genotypes and particle tracking spatial data performed in the Macleod et al., 2024 manuscript, via:
The Macleod_etal2024_EvoApps_Everrucosa_GeneticAnalysis.zip folder contains the following:
- sf246_7650snps_* -- the raw SNP genotype data following initial filtering in Stacks v2.0 and VCF tools for missing loci/individuals, minor allele frequency, heterozygosity and linkage disequilibrium (see paper for filtering parameters). This file is saved as a VCF (Variant Call Format) format and a genind (R data format for analysis in R) so the data can be readily loaded in R.
- sf246_7510snps_* the filtered neutral SNP genotypes from the above file from performing outlier analysis (OutFLANK/pcadapt) via a HPC/R. This file is saved as a VCF (Variant Call Format) format and a genind (R data format for analysis in R) so the data can be readily loaded in R.
- 01_Population_structure_analysis folder -- this folder contains three annotated R scripts (DAPC.R, snapclust.R and ADMIXTURE_analysis.R) to perform ADMIXTURE analysis, DAPC and snapclust analysis using the neutral SNP file described above (sf246_7510snps_vcf.vcf). For ADMIXTURE analysis via a HPC, output S files for runs for K 1-9, the cross_validation.txt output file are provided. The input file (sf246_7510.bed) file to perform ADMIXTURE via a HPC is also included. Lastly, sf246_popmap_fullnames_plotting.txt provides individual names for plotting ADMIXTURE results in R.
- 02_Heterozygosity_Fst_differentiation folder -- this folder contains an R script (02_Het_Fst.R) to calculate Observed and Expected Heterozygosity statistics, Global and Pairwise *F*ST (Weir and Cockerham) and Inbreeding coefficients (*F*IS). This uses the sf246_7510snps_vcf.vcf neutral SNP file as input.
- 03_IBD_analysis folder -- this folder contains an annotated R script (03_IBD_analysis.R) to perform Isolation By Distance (IBD) analysis (Mantel test) using the *F*ST matric output from the R script in 02. It details how to produce least-cost marine distances needed to compute IBD (output file from this is provided - see lc_distances_km) and contains the file Sampling_Coordinates.txt needed to compute least-cost distances. Two separate folders (IBD_Tar_REM and IBD_TAR_Irl_REM) are given detailing the relevant FST matrices and least-cost distances to perform further IBD tests with Tarragona (Mediterranean Sea site) excluded and the Nortwest Ireland sites excluded, respectively.
- 04_Prep_spatial_data -- this folder contains an annotated R script (04_Prep_dbMEMs_AEMs.R) to perform spatial eigenfunction analysis to produce Distance-Based Moran's Eigenvectors Maps (dbMEMs) and Asymmetric Eigenvector Maps (AEMs). This script uses least-cost distances output file from the R script in 03 to produce the dbMEM file (dbMEM_InW_Jul22_MAForder). Connectivity matrices representing a 14-day and 21-day PLD are provided as input files to produce the AEMs file (AEM2010-2020_weight_21PLD.txt) this (details of how these were produced can be found in the Supplementary Material accompanying the paper and reproduced using the raw modelling output files in Macleod_etal2024_EvoApps_Everrucosa_ParticleDispersalModelling.zip folder).
- 05_Prep_Environmental_data -- this folder contains the annotated R script (05_Prep_environmental_data.R) to prepare the relevant environmental data contained within the Raw_data.zip file. This file contains the raw data for: sea surface/bottom temperature, salinity, chlorophyll, current velocity and topography downloaded from BioOracle (https://bio-oracle.org/) and Copernicus (https://data.marine.copernicus.eu/) and will be used in the Redundancy Analysis. Outputs from prepared environmental data R script (05_Prep_environmental_data.R) are provided in folder Processed_envir_data.zip. A custom_functions.R file is provided for preparation of environmental data.
- 06_RDA_analysis -- this folder contains an annotated R script (06_RDA_analysis.R) that uses outputs from 04 (dbMEM_InW_Jul22_MAForder and AEM2010-2020_weight_21PLD.txt) and 05 (EnvMatrix_celcius_MAF_order.txt) to perform Redundancy analysis on the sf246_7510snps.vcf file. For this, the SNP allele frequencies are calculated (output file is sf246_ComNeu.frq.strat and table version sf246_ComNeu_allelefreq.txt). Sampling_Coordinates.txt file containing reordered sites to match the allele frequency file is provided, alongside sf246_ID.txt file detailing individual names for RDA plotting.
The Macleod_etal2024_EvoApps_Everrucosa_ParticleDispersalModelling.zip folder contains the following:
- The folder Particle_Tracking_Simulations_Data.zip contains all raw particle tracking .gpkg (GeoPackage files) produced from the particle tracking model written in R. There is a file for each of the 20 sampling release sites across each individually modelled year 2010-2020 inclusive (see Supplementary Materials for full details on how the passive dispersal modelling was performed). There is an annotated R script (Particle_Tracking_Simulations_Data_analysis.R) detailing how to extract and plot particle trajectories and calculate distance metrics shown in Table 2 of Macleod et al., 2024.
Sharing/Access information
--Underlying RADseq reads are available at the European Nucleotide Archive (ENA) under the study accession PRJEB71722.
--All analyses were run in R v4.2.3, using R Studio. For version numbers used of individual R software packages, please see the Methods sections of the published manuscript. R and R packages are all open-source.
--For any queries regarding data analyses accompanying the published manuscript, please contact Kirsty Macleod - k.macleod@exeter.ac.uk
Methods
Pink sea fan tissue samples (n = 285) were collected via SCUBA (under sampling license L/2019/00143 granted by the Marine Management Organisation) from depths ranging between 5 and 35 m between 2007-2019. In brief, apical 5 cm cuttings were removed from each sea fan colony underwater and preserved in >95% ethanol immediately after surfacing to prevent DNA degradation. Samples were stored long-term at 4 oC. Genomic DNA was extracted from polyp tissue using a salting-out protocol optimised for gorgonin-protein tissue (see Supplementary Material associated with the manuscript for full protocol).
Sequencing and SNP isolation were performed through nextRAD, a reduced representation sequencing method. Genomic DNA was first fragmented with Nextera reagent (Illumina, Inc), which also ligates short adapter sequences to the ends of the fragments. The Nextera reaction was scaled for fragmenting 5 ng of genomic DNA, although 7.5 ng of genomic DNA was used for input to compensate for degraded DNA in the samples and to increase fragment sizes. Fragmented DNA was then amplified for 26 cycles at 73 oC, with one of the primers matching the adapter and extending 9 nucleotides into the genomic DNA with the selective sequence GTGTAGAGG. Thus, only fragments starting with a sequence that can be hybridized by the selective sequence of the primer will be efficiently amplified. The nextRAD libraries were sequenced on an Illumina HiSeq 4000 with one lane of 2x150 bp reads (University of Oregon)
Raw reads were cleaned, de-multiplexed and trimmed to 140 bp using the process_radtags pipeline in Stacks v2.54 (Rochette and Catchen, 2017) and aligned to the pink sea fan reference genome (Macleod et al., 2023) using Bwa-mem2 v2.2.1 (Vasimuddin et al., 2019). Loci were built using gstacks from Stacks with a minimum stack depth of 3 (Paris et al., 2017).
SNPs were filtered using the populations pipeline of Stacks was used to retain SNPs with a minimum stack depth of 9, present in at least 16 sampling locations, minor allele frequency of 0.05, and maximum loci heterozygosity of 0.6. Further filtering using VCFtools (Danecek et al., 2011) removed genotypes containing less than 90% of SNPs, which were identified with the missingno function in R (R Core Team, 2023) using the package Poppr v2.8.3 (Kamvar et al., 2014), and a depth coverage <9. Loci in linkage disequilibrium were filtered in VCFtools (Danecek et al., 2011) using a correlation (r2) threshold of >0.7 between all marker pairs. Finally, the function mlg from the R package Poppr was used to identify potential clones.
After initial filtering in populations, 23,111 SNPs were identified in 285 individuals. After filtering out individuals and loci with greater than 10% missing data and monomorphic SNPs in VCFtools, 8906 SNPs in 246 individuals were retained. Additionally, 1,256 SNPs were removed from each pair of SNPs identified in linkage disequilibrium. No duplicated genotypes were detected. The final dataset consists 7,650 SNPs in 246 individuals.
In brief, for all 20 sampling sites neutral SNPs were identified using outlier detection programs OutFLANK and pcadapt, only neutral SNPs common to both programs were retained. Genetic diversity was assessed via observed/expected heterozygosity and inbreeding coefficient (Fis). Genetic differentiation was computed using Pairwise FST (Weir and Cockerham) and global FST. Genetic population structure was explored using three approaches: a DAPC (discriminant analysis of principal components), ADMIXTURE v1.3, and snapclust.
To compare oceanographic particle dispersal with patterns of genetic connectivity, passive oceanographic modelling was conducted in R using an Atlantic-Iberian Biscay Irish-Ocean Physics analysis and a Forecast oceanographic model from CMEMS (see Supplementary Material associated with manuscipt for full model details). For each site, raw modelled particle trajectories are contained within a geopackage file (.gpkg).
Distance-based Moran's eigenvector maps (dbMEMs), asymmetric eigenvector maps (AEMs) were produced on marine least-cost distances and on the dispersal estimates between sampling sites, respectively. Environmental variables describing sea surface and bottom temperature, surface and bottom salinity, surface current velocity, were obtained from Copernicus (see Supplementary Material associated with manuscipt for full details). Redundancy analysis was performed to explore the effect of environmental variables, dbMEMs, and AEMs on modelled trajectories.