Dryad

Podarcis bocagei vs P. carbonelli hybrid zone SNP datasets from ddRADseq

Abstract

We used double-digest restriction site-associated DNA (ddRAD) sequencing to discover SNPs in samples across a transect including a hybrid zone between Podarcis bocagei and Podarcis carbonelli. We used P. bocagei and P. carbonelli samples from the locations at the extremes of the transect as references. We obtained a dataset including all SNPs retained after removing loci with depth of coverage <8, missing data >20%, more than five SNPs, or more than 70% heterozygosity (complete dataset; 6905 SNPs, 329 individuals). Additionally, we derived two other datasets from the complete dataset before applying the missing data filter. One dataset contained loci with allele frequencies higher than 0.8 in the reference population containing only parental individuals of one species and lower than 0.2 in the reference population of the other species ("80/20" dataset; 2300 SNPs, 329 individuals); the other dataset comprised diagnostic SNPs between reference populations (diagnostic dataset; 1241 SNPs, 236 individuals) but excluding private alleles from references, i.e. excluding alleles that are not present in the populations of contact. Individuals with missing data >35% were removed from all datasets (the number of individuals reported for each dataset is the number remaining after this filter, but note that the 80/20 and diagnostic datasets were obtained before this filter was applied to the complete dataset). Across datasets, average depth of coverage per individual was 28 (median = 26.8, min = 12.5, max = 85.8) and per locus was 29 (median = 28.8, min = 15.6, max = 48.6). The analysis of replicate samples (four samples were replicated twice, i.e. they were amplified and sequenced in independent libraries and SNP calling was performed independently) showed high levels (99.87%) of multilocus genotype replicability.
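
The threshold rules described above can be illustrated with a minimal Python sketch. It is not the pipeline used to build these datasets: the function names, the input representation (per-locus allele frequencies in each reference population, and genotype calls with None for missing data) and the reading of "diagnostic" as a fixed difference between references are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def passes_80_20(freq_ref_a: float, freq_ref_b: float,
                 high: float = 0.8, low: float = 0.2) -> bool:
    """True if the allele is common (> high) in one reference
    population and rare (< low) in the other, in either direction."""
    return ((freq_ref_a > high and freq_ref_b < low) or
            (freq_ref_b > high and freq_ref_a < low))


def is_diagnostic(freq_ref_a: float, freq_ref_b: float) -> bool:
    """Assumed definition: the allele is fixed in one reference
    population (frequency 1) and absent from the other (frequency 0)."""
    return {freq_ref_a, freq_ref_b} == {0.0, 1.0}


def missing_fraction(genotypes: Sequence[Optional[int]]) -> float:
    """Fraction of genotype calls that are missing (None)."""
    if not genotypes:
        return 1.0  # no calls at all is treated as fully missing
    return sum(g is None for g in genotypes) / len(genotypes)


def keep_individual(genotypes: Sequence[Optional[int]],
                    max_missing: float = 0.35) -> bool:
    """Keep an individual unless its missing data exceed max_missing."""
    return missing_fraction(genotypes) <= max_missing
```

Under these assumptions, a locus with reference frequencies 0.9 and 0.1 would qualify for the 80/20 set but not the diagnostic set, and an individual with one missing call out of four (25%) would pass the 35% missing data threshold.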