VCF for neutral data set and potential connectivity matrices of Harpagifer antarcticus, along the Western Antarctic Peninsula
Data files
May 03, 2024 version files 16.29 MB
Abstract
Connectivity is a fundamental process of population dynamics in marine ecosystems. In the last decade, with the emergence of new methods, combining different approaches to understand the patterns of connectivity among populations and their regulation has become increasingly feasible. The Western Antarctic Peninsula (WAP) is characterized by complex oceanographic dynamics, where local conditions could act as barriers to population connectivity. Here, the notothenioid fish Harpagifer antarcticus, a demersal species with a complex life cycle (adults with poor swim capabilities and pelagic larvae), was used to assess connectivity along the WAP by combining biophysical modeling and population genomics methods. Both approaches showed congruent patterns. Areas of larvae retention and low potential connectivity, observed in the biophysical model output, coincide with four genetic groups within the WAP: (1) South Shetland Islands, (2) Bransfield Strait, (3) the central, and (4) the southern area of WAP (Marguerite Bay). These genetic groups exhibited limited gene flow between them, consistent with local oceanographic conditions, which would represent barriers to larval dispersal. The joint effect of geographic distance and larval dispersal by ocean currents, had a greater influence on the observed population structure, than each variable evaluated separately. The combined effect of geographic distance and a complex oceanographic dynamic would be generating limited levels of population connectivity in the fish H. antarcticusalong the WAP. Based on this population connectivity estimations, priority areas for conservation were discussed, considering the Marine Protected Area proposed for this threatened region of the Southern Ocean.
Methods
Biophysical model
An implementation of the Regional Ocean Modeling System (ROMS; Haidvogel et al. 2008) was used to simulate ocean circulation along the WAP. To estimate the potential role of the ocean as an advective mechanism, Lagrangian particle tracking simulations were implemented to determine the potential dispersal of H. antarcticus larvae. The implementation of ROMS for the WAP (Hudson et al., 2021) has 1.5 km horizontal resolution and 24 vertical sigma layers. It includes a dynamic sea ice model (Budgell, 2005) and the interaction between floating ice shelves and oceanographic conditions (Holland and Jenkins, 1999; Dinniman et al., 2011). Atmospheric forcing is from archived forecasts from the Antarctic Mesoscale Prediction System (Powers et al., 2012), tidal forcing was added at the model lateral boundaries using tidal sea surface height and velocity from the CATS2008 regional Antarctic tidal model (Padman et al., 2002).
The model was run from November to May of years 2008-2009 (the oldest available for which the hydrodynamic model was calibrated), 2009-2010, 2010-2011 and 2011-2012, during the austral summer, because this is the season when hatching of H. antarcticus larvae take place (White and Burren, 1992). Besides, La Mesa et al. (2017) described that larvae of H. antarcticus develop into a demersal juvenile stage before facing their first winter and taking this information into account, Lagrangian simulations were run only during the summer seasons. For each year in which the model was run, the procedure was the same. Each season, 1000 neutrally buoyant particles (100 per site) were released every 10 days, on 12 sites along the model domain (Fig. 1), starting from November 15 until February 15 (giving a total of 10 runs of the model for each sampling season). A total of 10,000 particles were released and tracked for 100 days, following the estimated pelagic larval duration for H. antarcticus (La Mesa et al., 2017). This means that any particle that entered a release grid cell of any site at day 100 was considered as a potential successful settled individual. Considering that there is limited knowledge about larval behavior for H. antarcticus, such as swimming capabilities or vertical migration patterns, the simulated larvae drifted as passive particles in the model. These particles were released at 0, 20, 40 and 60 m depth, and they were advected by the model circulation at every model time step (50 s) using the full 3D velocity fields (at the time and position of each particle) plus a random walk in the vertical direction, which is a function of the parameterized model vertical diffusion (Hunter et al., 1993; Visser, 1997). Particle positions were saved hourly. The vertical random walk was included for all particle releases, except for surface releases (Visser, 1997; van Sebille et al., 2018).
With the output of the Lagrangian simulations, for each year, three matrices were created. First, the mean trajectory matrix T, which provides information about the path traveled by the particles. The cell Tij shows the number of particles released from site i (origin, at x axis) that passed through (but not necessarily settled) in site j. It is a mean value of every day of the simulation to estimate the route the particles follow from their origin point during the 100 days of simulation. This matrix was made for each of the 10 runs of the model, then the 10 matrices were averaged to obtain the mean trajectory matrix. The second was a dispersal matrix D, considering all 12 release points. The cell Dij shows the number of particles that were released from site i (origin, at x axis) and potentially settled in site j (destination, at y axis) at day 100. The diagonal of this matrix (Dii) represents particles released from site i that return to site i (or their origin point). The D matrix was constructed for each of the 10 runs of the model, obtaining 10 matrices, which were averaged to construct the final dispersal matrix. One final dispersal matrix was constructed for each of the four seasons modeled. Then, these final dispersal matrices were transformed to connectivity matrices C (one per season), where Cijshows the proportion of total particles that potentially settled in site j that originated from site i. These C matrices showed the results of potential dispersal of H. antarcticus larvae from each year included in the model. Finally, the four C matrices were averaged to create the final connectivity matrix, which summarizes the dispersal patterns of all seasons modeled, this last connectivity matrix was used to disentangle the influence of larval dispersal on genetic structure.
A total of 12 release points were selected for the Lagrangian simulations, 9 of them corresponded to localities from which there were also available samples, used for genomic analyses: Fildes Bay (FIB), Chile Bay (CHB), Deception Island (DIS), Bransfield Strait (BST), Foyn Harbour (FHA), Green Reef (GRE), Doumer Island (DOI), Adelaide Island (AIS) and Horseshoe Island (HOS). Port Lockroy (PLO) was not included in the model due to its position, very close to other localities, which interfered with dispersal calculations. The Signy Island (SIG) is outside the model domain. The other 3 points included in the simulation: North King George Island (NKG), inshore and offshore Marguerite Bay (MIN and MOT, respectively) were selected as potential habitats for H. antarcticus and to test possible stepping-stone connectivity with the biophysical model. The T matrix provides information about the particle's trajectory during the 100 days of simulation. While the C matrix represents the final scenario at day 100, the potentially settled individuals after being transported to another location or retained at the origin.
SNP calling and filtering
For genomic analyses, 143 individuals of H. antarcticus were collected by hand, from intertidal zone (on rubble bottom habitats) and through SCUBA diving in shallow waters of 11 localities along the Western Antarctic Peninsula (WAP), between 60.72ºS 45.66ºW and 67.89ºS 67.40ºW (Fig. 1; Supplementary material, Table S1), during austral summer conditions from 2004 to 2021. All specimens were sacrificed following a bioethical protocol, prior preservation in 95% ethanol. DNA extractions were done using the DNeasy Blood® and Tissue Kit (QIAGEN®, USA). The quantity and integrity of DNA were measured using Qubit 4 (Thermo, USA). The Genotyping-by-Sequencing (GBS) method was performed, at the Biotechnology Center of the University of Wisconsin, using (after optimization) the ApeKI restriction enzyme. After enzyme digestion, each DNA fragment was linked to a barcode adaptor to recognize it in silico. Libraries were prepared using a HiSeq2000 (Illumina, USA) platform.
For quality checks, reads were visualized in FastQC 0.10.1. SNP-calling was carried out with the pipeline Universal Network-Enabled Analysis Kit (UNEAK) in Tassel v. 3 software (Lu et al., 2013). Using this pipeline, many of the challenges that arise working with de novo assembly can be handled. The general procedure of UNEAK pipeline involves a trim of the reads to 64 bps, avoiding the ends of the reads that could be enriched by sequencing errors. Then, identical 64-bp reads are collapsed into tags and tag pairs having a single base pair mismatch are candidates for SNP-calling. Also, the pipeline includes a filter where complex networks of tag pairs are identified, which would be discarded because they could represent mixture of repeats, paralogs and error tags (see Lu et al., 2013 for more details of UNEAK pipeline). The pipeline includes a parameter called the error tolerance rate (ETR) to improve the network filter, in this study the default value was used.
The SNPs were filtered in Tassel v. 5 software, using a site minimum call rate of 0.75 (to handle missing data per individual), a minimum proportion of sites present of 0.7 (to kept loci present in at least 70% of individuals) and a minor allele frequency of 0.01, as recommended by Benestan et al. (2016). After these filters, deviations from Hardy–Weinberg equilibrium (HWE) were tested, per locus and per population, using 10,000 permutations in Arlequin 3.5.2.2 (Excoffier and Lischer, 2010). The p-values were corrected using a false discovery rate correction (q-value = 0.05). Following the guidelines from Pearman et al. (2022) for RRS data filtering, we applied a modified 'out all' criterion in our study, excluding loci that departed from Hardy-Weinberg equilibrium (HWE) in more than half of the analyzed populations. This approach ensures the reliability of our inferences regarding population structure, avoiding sequencing artifacts, in alignment with recommended practices for population genetics in non-model organisms.
SNPs potentially under diversifying selection (hereafter outliers) were detected using population differentiation analyses. Two approaches were used. First, was the pcadapt v.4.3.3 (Luu et al., 2016) R package. This approach uses a Principal Component Analysis (PCA) to detect population structure, then each SNP is regressed at the principal components (PCs) retained. Here 10 PCs were retained based on their eigenvalues according to Cattell’s recommendation (Cattell, 1966). Then, a statistical test is applied to the PCA when regressing SNPs with the PCs and a cut-off of q-values = 0.05 was selected to assign the outliers. Because pcadapt does not require grouping individuals into populations, this package is not impacted by admixed individuals (Luu et al., 2016). The second was the software BayeScan 2.1, which implements an FST outlier approach to detect loci highly differentiated that exacerbate the genetic structure (Foll and Gaggiotti, 2008). BayeScan uses a Bayesian method to estimate the probability of each locus to be under the effect of selection. This method assumes that allele frequencies follow a Dirichlet distribution and uses locus-specific and population information from FST coefficients. Although BayeScan is impacted by the presence of admixed individuals (Luu et al., 2016), it does not assume equal differentiation between pairs of populations. BayeScan is robust when dealing with complex demographic scenarios for neutral genetic differentiation. To perform Bayescan run, 500,000 iterations, 10% burn-in period and a prior odd of 1,000 were used. To avoid the occurrence of false positives on outlier loci detection, using both pcadapt and BayeScan, a false discovery rate correction of q-values = 0.05 was used. The outliers detected by both approaches were eliminated (filtered) from the final dataset that aims to highlight demographic isolation and assess connectivity patterns. After filtering, a final data set of 20,778 non-outlier SNPs genotyped at 143 individuals was obtained.