Skip to main content

VCF for neutral data set in Harpagifer bispinis along the Magellan Province

Cite this dataset

Segovia, Nicolás (2022). VCF for neutral data set in Harpagifer bispinis along the Magellan Province [Dataset]. Dryad.


Quaternary glacial cycles shaped the current distribution of polar and cold-temperate biotas. In the Magellan province of South America, ice covering during the last glacial maximum radically altered the landscape/seascape, speciation rate, and the distribution of species. Here we studied nototheniid fishes Harpagifer spp. along the Magellan province reported as two nominal species: H. bispinis in Patagonia and H. palliolatus, endemic to the Falkland/Malvinas Islands. Previous molecular analyses in Harpagifer showed that the genus may have recently colonized southern South America ~ 1 million years ago. The extensive use in systematics of molecular markers to determine evolutionary units has been improved due to the advances in NGS. Combining traditional DNA sequences and non-targeted GBS-SNPs we evaluated both, the presence of effective evolutionary units and contemporary patterns of genetic structure across the Magellan province. DNA sequences consistently showed an absence of phylogeographic structure, with shared dominant haplotypes between nominal species, pointing towards the presence of a single evolutionary unit. In contrast, SNPs identified three groups in Patagonia, two located north and south of the Strait of Magellan, and a third well-differentiated one in the Falkland/Malvinas Islands. Connectivity analyses using SNPs suggest limited and asymmetric gene flow from Patagonia to the Falkland/Malvinas. Contrasting rough- and fine-scale genetic evolutionary patterns recorded in Harpagifer enhance the relevance in the use of combined methodologies for species delimitation analyses. Depending on the question to be addressed, we could discriminate among phylogeographic structure discarding incipient speciation, and contemporary spatial differentiation processes linked to drift-migration equilibrium models.


Populations of Harpagifer bispinis were collected from the intertidal of 12 localities along Pacific Patagonia between 48.73°S; 74.05°W to 55.84°S; 67.37°W, and H. palliolatus specimens were collected at Hookers Point (51.03°S; 57.7°W) in the Falklands/Malvinas Islands (Naretto et al., 2020). All specimens were preserved in 95% ethanol for molecular analyses. Nucleic acids for traditional DNA assays (mtDNA and nucDNA) were prepared using a salting out methodology (Aljanabi & Martinez 1997) while DNA extractions for Genotyping-By-Sequencing (GBS) (DeDonato, Peters, Mitchell, Hussain & Imumorin, 2013) were done using the DNeasy Blood® and Tissue Kit (QIAGEN®, USA). The quantity and integrity of DNA was measured using both Nanodrop 2 (Thermo, USA) and Qubit 4 (Thermo, USA).

Partial fragments of two mitochondrial and one nuclear gene were amplified through PCR procedures. We obtained a fragment of ~ 650 base pairs (bp) of the mitochondrial cytochrome c oxidase subunit I (COI) using specific primers FishF2 (Ward, Zemlak, Innes, Last & Herbert, 2005) and HCO2198 (Folmer, Black, Hoeh, Lutz & Vrijenhoek, 1994). Specific primers CR-HarpaF and CR-HarpaR (Hüne et al., 2015) were used to amplify a fragment of ~580 bp of the mitochondrial D-loop. Finally, primers RhF192 and RhR1039 (Dettai et al., 2012) were used to amplify a fragment of ~670 bp of the nuclear rhodopsin gene. PCR protocols were done following the specification of each marker (Table S2). Amplification products were purified and sequenced in both directions at Macrogen Inc. (Seoul, South Korea). Alignments were done independently for each marker in Geneious R10 ( The haplotype phases of the sequence data of rhodopsin were inferred using PHASE 2.1 implemented in DnaSP 6.0 (Rozas et al., 2017). For this, several independent runs were performed using 10,000 iterations, a thinning interval of 10 and a burn-in period of 10%. Final haplotypes were determined according to the overall goodness of fit and haplotypes with posterior probabilities ≥ 90%.

For Reduced Representation Sequencing (RRS), samples were sequenced through a GBS method at the Biotechnology Center in the University of Wisconsin using, after optimization, the ApeKI restriction enzyme. After enzyme digestion, each individual DNA fragment was linked to a unique barcode adaptor to recognize it in silico and libraries were prepared using a HiSeq2000 (Illumina, USA) platform. Millions of 100pb reads were generated and visualized in FastQC 0.10.1 (Andrew 2010) for quality checks. SNP-calling was carried out with the pipeline Universal Network-Enabled Analysis Kit (UNEAK) in TASSEL v.3 (Lu et al., 2013). This pipeline is a network-based SNP detection algorithm designed for data with no reference genome. After filtering, identical reads were aligned as tags using an error tolerance rate of 0.03 to reduce the chance that real tags be discarded as sequencing errors and to remove potential paralogs before the final SNP calling. We used a minor allele frequency of 0.05 and a site minimum call rate of 0.75 to ensure that the 75% of the individuals in each SNP were covered for at least 1 tag. Finally, we used a HWE filter in PLINK (Purcel et al., 2007) to extract those loci with significant evidence of deviations in at least 60% of the localities after a False Discovery Rate (FDR) correction.


Agencia Nacional de Investigación y Desarrollo, Award: 3190482