Bridging the Scotia Arc: Climate-driven shifts in connectivity of the freshwater crustacean Branchinecta gaini in sub-Antarctic and Antarctic ecosystems
Data files
Dec 02, 2025 version files 7.18 MB
- Bgaini_16S_101Seq.nex (43.11 KB)
- Bgaini_cox1_294seq.nex (203.74 KB)
- Bran_186_8912_2plates_mC75_mAF001.vcf (6.92 MB)
- OccurrenceBgaini_ENM.xlsx (12.14 KB)
- README.md (3.21 KB)
Abstract
The datasets presented here correspond to the three methodologies combined in the study: first, mitochondrial DNA sequences (cox1 and 16S); second, the SNPs retained after the filtering described in the methodology; and third, the occurrence records used for the niche models. All data correspond to Branchinecta gaini, an anostracan crustacean, throughout its distribution in southern South America, South Georgia, and maritime Antarctica.
Dataset Overview
This dataset contains the data and code required to replicate the analyses of Branchinecta gaini in the associated paper (in review), which tests the hypothesis of a range shift due to climate change using historical and contemporary genetic structure and niche modeling. Data cover 20 sampled lakes and freshwater ponds in southern South America, the subantarctic island of South Georgia, and maritime Antarctic islands and archipelagos. Genetic data include a dataset of single-nucleotide polymorphisms (SNPs) and two mitochondrial DNA loci.
Dates of Data Collection
A. Branchinecta sampling: 2017-2022
B. Branchinecta DNA extraction: 2021-2023
C. Genetic sequencing data: 2021-2024
Ethics Approval
Sampling and collection of material in Antarctic regions were conducted under INACH environmental regulations. Collection activities outside Antarctic territory were performed following the regulations of the Chilean Ministerio de Economía, Fomento y Turismo under permit number Nº E-2021-261, granted by the Subsecretary of Fishery and Aquaculture.
Files and Folders
Bgaini_16S_101Seq.nex
This file contains 101 sequences of 16S rRNA in NEXUS (.nex) format, used to calculate genetic diversity indices.
Bgaini_cox1_294seq.nex
This file contains 294 cox1 sequences of Branchinecta gaini from South America, subantarctic islands, and maritime Antarctica. The file is in NEXUS (.nex) format and was used to calculate genetic diversity indices and to run DIYABC analyses, demographic inferences, and phylogenetic reconstructions.
Bran_186_8912_2plates_mC75_mAF001.vcf
This Variant Call Format (VCF) file contains Single Nucleotide Polymorphism (SNP) data from 186 individuals across subantarctic and Antarctic sampling locations. It comprises 8912 SNPs obtained from SNP calling in Tassel.
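As a quick sanity check of the description above, the number of individuals and SNP records in a VCF file can be counted directly from its text. The following is a minimal sketch, assuming a standard tab-delimited VCF layout with nine fixed columns before the sample columns; the function name `summarize_vcf` and the example records are hypothetical and are not taken from the dataset.

```python
def summarize_vcf(lines):
    """Return (n_samples, n_variants) for an iterable of VCF text lines."""
    n_samples = 0
    n_variants = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        if line.startswith("#CHROM"):
            # Columns 1-9 are fixed (CHROM..FORMAT); the rest are sample IDs.
            n_samples = max(len(line.split("\t")) - 9, 0)
            continue
        n_variants += 1  # every remaining line is one variant record
    return n_samples, n_variants


# Tiny in-memory example (made-up records, not from the dataset).
example = [
    "##fileformat=VCFv4.0",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2",
    "1\t10\tTP1\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1",
    "1\t20\tTP2\tC\tT\t.\tPASS\t.\tGT\t0/0\t./.",
]
print(summarize_vcf(example))
```

Passing an open file handle for Bran_186_8912_2plates_mC75_mAF001.vcf instead of the example list would be expected to report 186 samples and 8912 records if the file matches the description above.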
Supplementary_material_TableS1.xlsx
List of Branchinecta gaini records detailing the type of analysis for which each record was used, following FAIR criteria. Population identifier (IDpop): six biogeographic regions are presented: (1) SSA: Southern South America, comprising (i) ARG: Argentinean Patagonia and (ii) MSE: Magellanic Subantarctic Ecoregion; (2) SG: South Georgia; (3) SOI: South Orkney Islands; (4) SSI: South Shetland Islands; (5) EAP: East Antarctic Peninsula; (6) WAP: West Antarctic Peninsula. ENM: Ecological Niche Modelling. Three biogeographic provinces are presented: (1) South America, (2) subantarctic island, (3) maritime Antarctica.
OccurrenceBgaini_ENM.xlsx
List of Branchinecta gaini records used in the ENM analysis containing ID, decimalLatitude, and decimalLongitude data only.
Supplementary_material_S3.docx
Molecular protocol for cox1 and 16S PCR. Details of the mix preparation, thermal cycling parameters, and primer sequences.
Supplementary_material_S5.reftableHeader
This file contains the values of the different scenarios and prior distribution parameters employed in DIYABC v2.1.
SupplementaryMaterialBgaini.docx
Supplementary tables and figures cited throughout the paper (S2, S4, S6, S7, S8, S9, S10 & S11).
Environmental data
We extracted 19 bioclimatic variables and the bioscd (snow cover days) variable from the CHELSA database (www.chelsa-climate.org), with a spatial resolution of 30 arc-seconds (~1 km) (Karger, et al. 2017). These variables were clipped to the study area which, in the case of Antarctica, corresponds to ice-free areas. The current ice-free layer was defined using the rock_outcrop_high_res_polygon (Burton-Johnson, et al. 2016) from the Antarctic Digital Database (ADD version 7; http://www.add.scar.org). For future projections, we used ice-free area data provided by Lee, et al. (2017). To assess multicollinearity among predictor variables, we calculated pairwise Pearson correlation coefficients and, for pairs with r > 0.75, excluded one of the variables to ensure the independence of the predictors. The final set of variables included: BIO1 (mean annual air temperature); BIO5 (mean daily maximum air temperature of the warmest month); BIO6 (mean daily minimum air temperature of the coldest month); BIO12 (annual precipitation); BIO15 (precipitation seasonality); and BIO SCD (snow cover days).
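The correlation screening described above can be illustrated with a short sketch. It assumes a greedy rule in which variables are visited in a priority order and a variable is kept only if its absolute Pearson correlation with every variable already kept does not exceed 0.75; the priority order, the use of the absolute value, and the names `pearson`, `drop_correlated`, and `var_a` to `var_c` are assumptions for illustration, since the text above states only the 0.75 threshold.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Assumes neither variable is constant (otherwise sxx or syy is zero).
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(variables, threshold=0.75):
    """Keep variables greedily; dict insertion order sets the priority."""
    kept = []
    for name, values in variables.items():
        if all(abs(pearson(values, variables[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy values sampled at five hypothetical locations (not real CHELSA data).
toy = {
    "var_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "var_b": [2.1, 4.0, 6.2, 8.1, 9.9],  # almost a linear function of var_a
    "var_c": [5.0, 1.0, 4.0, 2.0, 3.0],  # weakly correlated with var_a
}
print(drop_correlated(toy))
```

In this toy case `var_b` is dropped because it is nearly collinear with `var_a`, while `var_c` is retained. Which member of a correlated pair is kept depends on the priority order, which in practice would reflect ecological relevance rather than dictionary order.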
DNA sequences, SNP calling and filtering
To address historical and contemporary connectivity across the distribution of B. gaini, we conducted analyses using two datasets. For the historical perspective, we used partial fragments of two mitochondrial genes (cox1 and 16S rRNA), and for more recent genetic connectivity, we used a Reduced Representation Sequencing (RRS) technique to produce Single Nucleotide Polymorphisms (SNPs). Mitochondrial genes were amplified using PCR (electronic supplementary material, table S2). Amplicons were purified and sequenced in both directions by Macrogen Inc. (Santiago, Chile). Alignments were obtained using Geneious R10 (https://www.geneious.com). For RRS, samples were sequenced through a genotyping-by-sequencing (GBS) method at the Biotechnology Center of the University of Wisconsin using, after optimization, the PstI/MspI restriction enzymes. After enzyme digestion, each DNA fragment was ligated to a barcode adaptor so that it could be identified in silico, and libraries were prepared for sequencing on a HiSeq2000 (Illumina, USA) platform. Reads were visualized in FastQC 0.10.1 for quality checking. SNP calling was carried out with the Universal Network-Enabled Analysis Kit (UNEAK) pipeline in Tassel v. 3 (Lu, et al. 2013). We used a minor allele frequency of 0.05, a minimum proportion of sites present of 0.7, and a site minimum call rate of 0.75, to ensure that at least 75% of the individuals at each SNP were covered by at least one tag. After filtering, we estimated Hardy–Weinberg equilibrium (HWE) deviations per locus and per population with Arlequin 3.5.2.2 (Excoffier and Lischer 2010) using 10 000 permutations. p-values were corrected with a false discovery rate (FDR) correction (q-value = 0.05), and SNPs that appeared in HW disequilibrium in at least 60% of the populations were removed from the dataset.
This approach ensures the reliability of our inferences regarding population structure, avoiding sequencing artefacts, in alignment with recommended practices for population genetics in non-model organisms (Pearman, et al. 2022).
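The per-SNP filters described above (minor allele frequency, site call rate, and the 60% HWE rule) can be sketched as follows. This is an illustration only and is not the UNEAK or Arlequin implementation: genotypes are assumed to be coded as the count of the alternate allele (0, 1, or 2) with `None` for a missing call, and the function names are hypothetical. The per-individual filter (minimum proportion of sites present) is not covered here.

```python
def call_rate(genotypes):
    """Fraction of individuals with a non-missing genotype at one SNP."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele among called diploid genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def passes_site_filters(genotypes, min_maf=0.05, min_call_rate=0.75):
    """Apply the call-rate and MAF thresholds quoted in the text."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)


def fails_hwe_rule(significant_by_pop, min_fraction=0.6):
    """True when the SNP deviates from HWE (after FDR correction) in at
    least `min_fraction` of the populations and should be removed."""
    return sum(significant_by_pop) / len(significant_by_pop) >= min_fraction


# Made-up genotype vectors, one per SNP.
snp_ok = [0, 1, 2, 1, None, 0, 1, 0]              # well covered, common allele
snp_rare = [0] * 19 + [1]                         # MAF = 1/40 = 0.025
snp_sparse = [0, None, None, 1, None, 2, None, None]  # call rate 3/8

for name, snp in [("ok", snp_ok), ("rare", snp_rare), ("sparse", snp_sparse)]:
    print(name, passes_site_filters(snp))
print(fails_hwe_rule([True, True, True, True, False]))
```

Only `snp_ok` passes both thresholds in this toy example; the last line shows a SNP flagged in four of five populations, which the 60% rule would remove.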
To retain only neutral markers, we used two population differentiation analyses to identify SNPs potentially under diversifying selection. These SNPs were eliminated from the final dataset. The first analysis was performed using the pcadapt v.4.3.3 (Luu, et al. 2017) R package. This approach uses a Principal Component Analysis (PCA) to detect population structure. Then, each SNP is regressed against the retained principal components (PCs). Here, 10 PCs were retained based on their eigenvalues, following Cattell (1966). We applied a statistical test to the regression of SNPs on the PCs, and a cut-off of q-value = 0.05 was selected to assign the outliers. This analysis is not impacted by admixed individuals, because pcadapt does not require grouping individuals into populations (Luu, et al. 2017). Second, we used an FST outlier approach implemented in Bayescan 2.1 (Foll and Gaggiotti 2008), which uses a Bayesian method to estimate the probability of each locus being under the influence of selection. Because such loci tend to be highly differentiated and exaggerate the genetic structure, those identified were excluded from the analyses. A total of five separate runs were performed, each with 500,000 iterations, a 10% burn-in period, and prior odds of 1000. To limit false positives in both approaches, an FDR correction of q-value = 0.05 was applied. Outliers detected by either or both approaches were eliminated (filtered) from the final dataset, with the aim of highlighting demographic isolation and assessing connectivity patterns. After filtering, a final dataset of 7446 non-outlier SNPs genotyped from 186 individuals was obtained.
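The final step above, removing every SNP flagged by either method after FDR control, can be sketched as below. The Benjamini-Hochberg procedure is used here as an assumption; the text states only a q-value cut-off of 0.05, and pcadapt and Bayescan each have their own ways of producing q-values. The function names and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def flag_outliers(p_values, q_cutoff=0.05):
    """Indices of loci whose adjusted p-value is at or below the cut-off."""
    q_values = benjamini_hochberg(p_values)
    return {i for i, q in enumerate(q_values) if q <= q_cutoff}


def neutral_loci(n_loci, outliers_a, outliers_b):
    """Loci flagged by neither method (outliers from either are removed)."""
    flagged = outliers_a | outliers_b
    return [i for i in range(n_loci) if i not in flagged]


# Made-up per-locus p-values standing in for the two outlier scans.
p_method_a = [0.001, 0.20, 0.04, 0.80, 0.004]
p_method_b = [0.50, 0.002, 0.60, 0.70, 0.90]

out_a = flag_outliers(p_method_a)
out_b = flag_outliers(p_method_b)
print(sorted(out_a), sorted(out_b), neutral_loci(5, out_a, out_b))
```

Taking the union of the two outlier sets is the conservative choice described in the text: a locus is dropped if either scan flags it, which trades some loss of neutral markers for a lower risk of retaining loci under selection.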
