Single nucleotide polymorphism (SNP) data for Scurria scurra, Scurria variabilis, Scurria ceciliana and Scurria araucana
Data files
Apr 15, 2024 version files 543 MB
- metadata_scurrias.xlsx (214.53 KB)
- README.md (2.21 KB)
- Saraucana-All.vcf (87.71 MB)
- Saraucana-NoOuts.vcf (87.60 MB)
- Saraucana-Outs.vcf (112.92 KB)
- Sceciliana-All.vcf (35.02 MB)
- Sceciliana-NoOuts.vcf (35 MB)
- Sceciliana-Outs.vcf (27.44 KB)
- Sscurra-All.vcf (103.40 MB)
- Sscurra-NoOuts.vcf (103.27 MB)
- Sscurra-Outs.vcf (130.37 KB)
- Svariabilis-All.vcf (45.25 MB)
- Svariabilis-NoOuts.vcf (45.23 MB)
- Svariabilis-Outs.vcf (23.09 KB)
Aug 04, 2025 version files 542.83 MB
- metadata_scurrias.csv (41.06 KB)
- README.md (2.74 KB)
- Saraucana-All.vcf (87.71 MB)
- Saraucana-NoOuts.vcf (87.60 MB)
- Saraucana-Outs.vcf (112.92 KB)
- Sceciliana-All.vcf (35.02 MB)
- Sceciliana-NoOuts.vcf (35 MB)
- Sceciliana-Outs.vcf (27.44 KB)
- Sscurra-All.vcf (103.40 MB)
- Sscurra-NoOuts.vcf (103.27 MB)
- Sscurra-Outs.vcf (130.37 KB)
- Svariabilis-All.vcf (45.25 MB)
- Svariabilis-NoOuts.vcf (45.23 MB)
- Svariabilis-Outs.vcf (23.09 KB)
Abstract
The distribution of genetic diversity is often heterogeneous in space, and it usually correlates with environmental transitions or historical processes that affect demography. The coast of Chile encompasses two biogeographic provinces and spans a broad environmental gradient, together with oceanographic processes linked to coastal topography that can affect species' genetic diversity. Here, we evaluated the genetic connectivity and historical demography of four Scurria limpets, S. scurra, S. variabilis, S. ceciliana, and S. araucana, between ca. 19° S and 53° S in the Chilean coast using genome-wide SNP markers. Genetic structure varied among species, which was evidenced by species-specific breaks together with two shared breaks. One of the shared breaks was located at 22–25° S and was observed in S. araucana and S. variabilis, while the second break around 31–34° S was shared by three Scurria species. Interestingly, the identified genetic breaks are also shared with other low-dispersing invertebrates. Demographic histories show bottlenecks in S. scurra and S. araucana populations and recent population expansion in all species. The shared genetic breaks can be linked to oceanographic features acting as soft barriers to dispersal and also to historical climate, evidencing the utility of comparing multiple and sympatric species to understand the influence of a particular seascape on genetic diversity.
VCF files used in the analyses of the article "Comparative population genetics of congeneric limpets across a biogeographic transition zone reveals common patterns of genetic structure and demographic history", published in Molecular Ecology (https://doi.org/10.1111/mec.16978).
Description of the data and file structure
This dataset contains VCF files for the four species of limpets analyzed, Scurria scurra, Scurria variabilis, Scurria ceciliana and Scurria araucana.
VCF files containing all loci: Saraucana-All.vcf, Sceciliana-All.vcf, Sscurra-All.vcf, Svariabilis-All.vcf
VCF files containing only outlier loci: Saraucana-Outs.vcf, Sceciliana-Outs.vcf, Sscurra-Outs.vcf, Svariabilis-Outs.vcf
VCF files containing all loci except the outliers: Saraucana-NoOuts.vcf, Sceciliana-NoOuts.vcf, Sscurra-NoOuts.vcf, Svariabilis-NoOuts.vcf
Note that for the demographic history analyses, only a subset of the "NoOuts.vcf" files was used, since only a few sites were considered (see the Materials and Methods section of the article for more information).
The metadata file gives, for each sample, the species identity, the sampling location and the accession number for the raw reads.
METADATA
File: metadata_scurrias.csv
sample: the sample name used in the VCF file for each individual of each species.
morfoID: the morphological species assignment given in the field while sampling.
geneticID: the genetic species assignment given after phylogenetic analyses of samples from many Scurria species. A paper presenting the phylogeny in detail is currently in preparation.
dateSampling: the year and month in which each specimen was sampled.
siteAbbr: the abbreviation of the name of the site in Chile where the sample was collected. This abbreviation is also included in the sample names.
siteNames: the name of the site in Chile where the sample was collected.
coordinates: coordinates of the sampling site, given in degrees, minutes and seconds (WGS84).
ncbiRaw: accession number for the sample's raw reads deposited in the NCBI database.
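As an illustration of how these columns can be used, the following Python sketch tallies samples per genetic species and site. It assumes that metadata_scurrias.csv is comma-delimited with a header row that uses exactly the column names listed above; the function name count_samples and the example rows are invented for demonstration and are not real records.

```python
import csv
import io
from collections import Counter


def count_samples(handle):
    """Count samples per (geneticID, siteAbbr) pair in a metadata table."""
    reader = csv.DictReader(handle)
    counts = Counter()
    for row in reader:
        key = (row["geneticID"].strip(), row["siteAbbr"].strip())
        counts[key] += 1
    return counts


# Invented rows for demonstration only; the real values are in
# metadata_scurrias.csv (assumed here to be comma-delimited).
example = io.StringIO(
    "sample,morfoID,geneticID,dateSampling,siteAbbr,siteNames,coordinates,ncbiRaw\n"
    "X_01,Scurria scurra,Scurria scurra,2019-01,AAA,Site A,,\n"
    "X_02,Scurria scurra,Scurria scurra,2019-01,AAA,Site A,,\n"
    "Y_01,Scurria araucana,Scurria araucana,2019-02,BBB,Site B,,\n"
)
sample_counts = count_samples(example)
for (species, site), n in sorted(sample_counts.items()):
    print(species, site, n)
```

To run it on the real file, pass an open file handle instead of the in-memory example, for instance `open("metadata_scurrias.csv", newline="")`.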
Sharing/Access information
Raw sequence reads for each sample can be retrieved from the NCBI database under the BioProject number PRJNA944965.
The accession number for the raw sequence reads of each sample is given in the metadata_scurrias.csv file.
Versioned changes: The previous metadata file was an .xlsx file in which each tab held the information for one species; a message in the submission section said it could not be read. I therefore saved it as .csv and made it one long list covering all species.
These data were generated by restriction-site associated DNA sequencing of muscle (foot) tissue using the enzyme PstI.
Raw reads were demultiplexed using Stacks. Scurria scurra, Scurria ceciliana, and Scurria araucana RAD loci were mapped to the reference genome of Scurria scurra (Saenz-Agudelo P., unpublished) using the ref_map.pl pipeline of Stacks, while for Scurria variabilis loci were called de novo using the denovo_map.pl pipeline of Stacks.
VCF files were generated with the populations program of Stacks, keeping only loci present in at least 80% of all individuals. Further filtering was done in VCFtools: a minimum mean read depth per locus of 15, a maximum mean depth per site of 62-73 (depending on the species), and a final genotype call rate of 90%. Individuals with more than 20% missing data were removed. The final filter removed loci showing evidence of linkage disequilibrium, estimated with the function snpgdsLDpruning from the R package SNPRelate.
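To make the depth, call-rate and missing-data thresholds concrete, here is a minimal Python sketch that applies them to a small depth matrix. It is not the VCFtools procedure used for the dataset: the function name, the order of the steps, the use of 62 as the maximum mean depth, and the choice to average depth over non-missing genotypes only are all assumptions made for illustration.

```python
def filter_sites_and_individuals(depths, min_mean_dp=15, max_mean_dp=62,
                                 min_call_rate=0.90, max_ind_missing=0.20):
    """Apply site-level and individual-level filters to a depth matrix.

    depths[i][j] is the read depth of individual i at site j,
    or None when the genotype is missing.
    Returns (kept_site_indices, kept_individual_indices).
    """
    n_ind = len(depths)
    n_sites = len(depths[0])

    # Site filters: mean depth within bounds and genotype call rate high enough.
    kept_sites = []
    for j in range(n_sites):
        called = [depths[i][j] for i in range(n_ind) if depths[i][j] is not None]
        if not called:
            continue
        mean_dp = sum(called) / len(called)  # assumption: mean over called genotypes
        call_rate = len(called) / n_ind
        if min_mean_dp <= mean_dp <= max_mean_dp and call_rate >= min_call_rate:
            kept_sites.append(j)

    # Individual filter: drop individuals missing too many of the kept sites.
    kept_inds = []
    for i in range(n_ind):
        if not kept_sites:
            break
        missing = sum(1 for j in kept_sites if depths[i][j] is None)
        if missing / len(kept_sites) <= max_ind_missing:
            kept_inds.append(i)
    return kept_sites, kept_inds


# Toy matrix: 10 individuals x 5 sites (values invented for demonstration).
depths = [[20, 10, 100, 30, 30] for _ in range(10)]
depths[0][3] = None  # site 3 ends up with a call rate of 0.8
depths[1][3] = None
depths[9][4] = None  # individual 9 is missing one of the kept sites

kept_sites, kept_inds = filter_sites_and_individuals(depths)
print(kept_sites)
print(kept_inds)
```

In this toy case site 1 fails the minimum mean depth, site 2 exceeds the maximum mean depth, and site 3 falls below the 90% call rate. Individual 9 is then dropped because it is missing at half of the retained sites.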
To detect putative outlier loci, BayeScan and pcadapt were used. The loci detected by both analyses were removed from the "NoOuts" datasets, and only these outlier loci were kept in the "Outs" datasets.
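The relationship between the "All", "NoOuts" and "Outs" datasets can be sketched with set operations in Python. The function name and the locus identifiers below are hypothetical; the sketch only shows the logic of keeping the loci flagged by both methods as outliers and assigning everything else to the non-outlier set.

```python
def split_outliers(all_loci, bayescan_hits, pcadapt_hits):
    """Split loci into (no_outs, outs) using the loci flagged by both methods."""
    outliers = set(bayescan_hits) & set(pcadapt_hits)
    # Preserve the original locus order in both outputs.
    outs = [locus for locus in all_loci if locus in outliers]
    no_outs = [locus for locus in all_loci if locus not in outliers]
    return no_outs, outs


# Hypothetical locus identifiers for demonstration only.
all_loci = ["loc1", "loc2", "loc3", "loc4", "loc5"]
bayescan_hits = ["loc2", "loc4"]
pcadapt_hits = ["loc4", "loc5"]

no_outs, outs = split_outliers(all_loci, bayescan_hits, pcadapt_hits)
print(outs)
print(no_outs)
```

Only loc4 is flagged by both methods, so it is the single locus in the outlier set. loc2 and loc5, each flagged by only one method, stay in the non-outlier set.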
To open the VCF files, any program that reads this format can be used, such as VCFtools or the snpR package in R.
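For a quick look without specialised software, a VCF file can also be read as tab-delimited text. The Python sketch below is a minimal example that lists the sample names, counts the variant records and reports the fraction of missing genotypes per sample. It assumes an uncompressed VCF in which GT is the first FORMAT field; the function name and the tiny in-memory example are invented for demonstration.

```python
import io


def summarize_vcf(handle):
    """Return (sample_names, n_variants, missing_fraction_per_sample)."""
    samples = []
    missing = []
    n_variants = 0
    for line in handle:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start after FORMAT
            missing = [0] * len(samples)
            continue
        if not line.strip():
            continue
        n_variants += 1
        for k, call in enumerate(fields[9:]):
            genotype = call.split(":")[0]  # GT is assumed to be the first field
            if "." in genotype:
                missing[k] += 1
    fractions = {s: (m / n_variants if n_variants else 0.0)
                 for s, m in zip(samples, missing)}
    return samples, n_variants, fractions


# Tiny invented VCF for demonstration only.
example_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2\n"
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1\n"
    "1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t0/0\t./.\n"
)
samples, n_variants, fractions = summarize_vcf(example_vcf)
print(samples, n_variants, fractions)
```

To apply it to one of the dataset files, pass an open file handle, for instance `open("Sscurra-All.vcf")`, in place of the in-memory example.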