SNP datasets obtained with ddRADseq from four contact zones between Podarcis carbonelli and four other Podarcis species
Caeiro-Dias, Guilherme et al. (2020), SNP datasets obtained with ddRADseq from four contact zones between Podarcis carbonelli and four other Podarcis species, Dryad, Dataset, https://doi.org/10.5061/dryad.k0p2ngf51
We used double digest restriction-site associated DNA (ddRAD) sequencing to discover SNPs in samples from four contact zones between Podarcis carbonelli and four other Podarcis species. For each contact zone we obtained two datasets: a complete panel of SNPs for the contact zone and its reference populations, and a dataset of SNPs that are diagnostic between the reference populations, excluding private alleles from the references, i.e. alleles that are not present in the contact-zone populations. The final datasets (complete and diagnostic) were obtained after removing loci with depth of coverage <8 or missing data >20%, and removing individuals with more than 35% missing data. Across complete and diagnostic datasets, mean coverage per individual ranged from 28 to 47 and per locus from 28 to 44. The analysis of replicate samples (about 6% of samples were replicated, i.e. amplified and sequenced in independent libraries, with SNP calling performed independently) showed high levels (>99%) of multilocus genotype replicability.
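The selection of diagnostic SNPs described above can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that "diagnostic" means the two reference populations are fixed for different alleles, and that a SNP is retained only if both of those alleles are also observed in the contact zone. The function names `observed_alleles` and `is_diagnostic` and the genotype encoding (a pair of allele strings, or `None` for missing data) are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical encoding: a diploid genotype is a pair of alleles, None if missing.
Genotype = Optional[Tuple[str, str]]


def observed_alleles(genotypes: List[Genotype]) -> Set[str]:
    """Return the set of alleles seen in a group, ignoring missing genotypes."""
    return {allele for g in genotypes if g is not None for allele in g}


def is_diagnostic(ref_a: List[Genotype],
                  ref_b: List[Genotype],
                  contact: List[Genotype]) -> bool:
    """Assumed criterion: each reference is fixed for a different allele,
    and both alleles occur in the contact-zone samples."""
    alleles_a = observed_alleles(ref_a)
    alleles_b = observed_alleles(ref_b)
    # Each reference must carry exactly one allele, and the two must differ.
    if len(alleles_a) != 1 or len(alleles_b) != 1 or alleles_a == alleles_b:
        return False
    # Exclude SNPs with an allele private to a reference (absent from the contact zone).
    return (alleles_a | alleles_b) <= observed_alleles(contact)
```

Under these assumptions, a SNP fixed for A in one reference and G in the other is kept only when both A and G appear among the contact-zone individuals.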
Samples were collected between spring and autumn of 2013 in four contact zones between P. carbonelli and one of four other Podarcis species (P. bocagei, P. virescens, P. guadarramae lusitanicus and P. vaucheri). We collected 115 specimens in the contact zone between P. bocagei and P. carbonelli, 61 between P. virescens and P. carbonelli, 77 between P. guadarramae lusitanicus and P. carbonelli and 69 between P. vaucheri and P. carbonelli. In all contact zones the sampling scheme aimed at capturing all the individuals encountered, avoiding bias towards species, sex or age. As references we used 19 P. bocagei, 7 P. virescens, 18 P. g. lusitanicus, 9 P. vaucheri and 33 P. carbonelli samples from nearby populations outside the contact zones. We obtained SNP datasets from ddRAD sequencing by preparing two libraries (both include more samples than those described here) following the same protocol, which were sequenced on both Illumina® HiSeq 2000 and HiSeq 1500. Individual raw reads were demultiplexed using the process_radtags module of Stacks version 2.2 (Catchen et al., 2013). SNP calling was performed using the Stacks pipeline, following the recommendations of Rochette and Catchen (2017), by running consecutively the units ustacks (build loci), cstacks (create a catalogue of loci), sstacks (match individual samples against the catalogue), tsv2bam (transpose data) and gstacks (align each read to a locus and call SNPs). SNP filtering was done with the populations unit of Stacks, VCFtools 0.1.15 (Danecek et al., 2011) and a custom Python script (available at https://github.com/catpinho/filter_RADseq_data).
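As an illustration of the missing-data thresholds used for the final datasets (loci with >20% missing data and individuals with >35% missing data removed), the following Python sketch applies both filters to a small genotype table. It is not the custom script linked above. The function name `filter_missing`, the data layout (a dictionary mapping sample names to lists of genotypes, with `None` for missing calls) and the order of operations (loci first, then individuals) are assumptions made for the example. The depth-of-coverage filter is not shown.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: sample name -> one genotype per locus, None if missing.
GenotypeTable = Dict[str, List[Optional[int]]]


def filter_missing(table: GenotypeTable,
                   max_locus_missing: float = 0.20,
                   max_individual_missing: float = 0.35
                   ) -> Tuple[GenotypeTable, List[int]]:
    """Drop loci, then individuals, whose missing-data fraction exceeds the thresholds.

    Returns the filtered table and the indices of the loci that were kept.
    """
    samples = list(table)
    if not samples:
        return {}, []
    n_loci = len(table[samples[0]])

    # Step 1 (assumed to come first): keep loci with at most 20% missing genotypes.
    kept_loci = [
        j for j in range(n_loci)
        if sum(table[s][j] is None for s in samples) / len(samples) <= max_locus_missing
    ]

    # Step 2: on the retained loci, keep individuals with at most 35% missing genotypes.
    filtered: GenotypeTable = {}
    for s in samples:
        genotypes = [table[s][j] for j in kept_loci]
        missing = sum(g is None for g in genotypes)
        if not genotypes or missing / len(genotypes) <= max_individual_missing:
            filtered[s] = genotypes
    return filtered, kept_loci
```

Filtering loci before individuals is only one possible choice; reversing the order can change which samples and loci are retained.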