Population genetics for conservation of spadefoot toads, Pelobates fuscus, in Western and Central Europe
Data files
Jul 23, 2025 version files 41.35 KB
Abstract
Despite extensive legislation aimed at conservation, European amphibians such as the common spadefoot toad (Pelobates fuscus (Laurenti, 1768)) exhibit alarming declines across large parts of its range. As knowledge about the genetic background of a species has become crucial for species conservation assessment, this study used newly developed microsatellite loci to investigate the genetic diversity and structure of P. fuscus. Clustering revealed major groups that reflect postglacial colonization patterns. Additionally, we found contrasting patterns of genetic diversity across the 66 studied populations, reflecting the conservation status of the species in Europe and with marked genetic impoverishment across most populations in the northwestern part of the species’ distribution. Our findings provide genetic information on the conservation genetic status of the common spadefoot toad across Europe, emphasizing the need for urgent conservation efforts. We recommend further genetic monitoring, habitat restoration, and potential translocations to safeguard the species in the face of ongoing challenges.
https://doi.org/10.5061/dryad.gmsbcc2wq
How was the data retrieved: In this study we used P. fuscus samples from 66 populations across the species’ range, with 18 of these populations previously analysed as part of an academic qualification (20,21)(Figure 1; Figure S1; Table S1). The majority of DNA samples were collected from adult specimens between 1986 and 2020, primarily through tissue samples from deceased individuals (such as toe clips, liver tissue, or tail clips from tadpoles) and buccal swabs from live adults.
DNA was extracted using the DNeasy Tissue Kit (Qiagen) with or without the BioSprint robotic work station with the 96 DNA Plant Kit (Qiagen), according to the manufacturer's protocols. For this study, both newly sampled samples, as well as DNA extract of Pelobates fuscus used in previously published and unpublished studies, were analysed. For more information about the methodology and methods, we would like to refer to the manuscript.
Description of the data and file structure
This genetic dataset is compiled into a ready-to-use CSV file (Moutonetal_Microsatellite_data_European_common_spadefoot_toad.csv) containing the filtered dataset analysed within this study. The file uses GenAlEx (Peakall R, Smouse PE., 2012) formatting and includes the genetic data of 334 European common spadefoot toads (ind) across 66 European populations (Pop), based on a microsatellite marker set of 12 loci (Pf10635s, Pf14239s, Pf16076s, Pf9847s, Pf5754s, Pf9791sB, Pf3505s, Pf4451s, Pf8461s, Pf6194, Pf981 and Pf4582). The second, corresponding allele column header is left blank.
Other valuable information (e.g., PCR conditions and locus sequences) can be found in the Supplementary information. This supplementary material is available through doi:10.6084/m9.figshare.c.7921368 and contains the names of the sample providers, geographic coordinates of the sampling locations, an overview of the used microsatellite loci, results of a linear regression analysis, detailed results of the pairwise value calculations & corresponding figures, individual based PCA-plot & PCA-plot legend, supplementary Bayesian clustering results, results of the BOTTLENECK analysis and a map from the Habitats Directive displaying the status of common spadefoot toads in Europe.
Study area and sampling methods
In this study, we used P. fuscus samples from 66 populations across the species’ range, with 18 of these populations previously analysed as part of an academic qualification (20,21)(Figure 1; Figure S1; Table S1). The majority of DNA samples were collected from adult specimens between 1986 and 2020, primarily through tissue samples from deceased individuals (such as toe clips, liver tissue, or tail clips from tadpoles) and buccal swabs from live adults. In addition, a smaller number of samples were obtained from freshly hatched tadpoles or embryos from clutches. For the latter, the possible introduced bias in allele frequencies by sampling larvae (22) was addressed by removing full siblings from the dataset (see also further below). To account for the fact that data were gathered over a wide geographical range and over a long period of time, we carried out a linear regression analysis (both a full model and one based on AIC) to analyse whether year of sampling and geographical identity had significant effects on the final results. Tissue samples were preserved in 100% ethanol at -20°C, while cotton swabs were stored at -20°C within a few hours or days after collection. All parties involved possessed adequate sampling permits that conform to national or regional law.
DNA-extraction, DNA-amplification, and Microsatellite analysis
DNA from tissue samples was extracted using the DNeasy Tissue Kit (Qiagen) with or without the BioSprint robotic workstation with the 96 DNA Plant Kit (Qiagen, Germany), according to the manufacturer’s protocols. DNA extracts of Pelobates fuscus analysed in previously published and unpublished studies were also used (13,17,22). Total genomic DNA of these samples was extracted from tissue samples using an overnight Proteinase K digestion (concentration 10 mg/mL), followed by a standard high-salt extraction method (23). For gDNA-extraction of swabs, the protocol by Broquet et al. (2007)(24) was used. Initially, test-sequencing of the mitochondrial control region (D-loop) was carried out. The non-coding D-loop comprises 1.4 kb in spadefoot toads (25), and is expected to exhibit a higher substitution rate than cytochrome b, which is barely variable in Central Europe (17). However, presumably due to pronounced secondary structures and repetitive sections, the D-loop could not be sequenced successfully with affordable techniques, and thus appeared unsuitable for further population genetic analyses. The microsatellites used in this study originated from two sources: a transcriptome made from pooled organs prepared and screened by the Leibniz Institute of Freshwater Ecology and Inland Fisheries (IGB, GenBank PRJNA357879), and a classical enriched microsatellite library, prepared by the Swiss company Ecogenics (21). For the design at the IGB, the program Primer3 v.0.4.0 (http://bioinfo.ut.ee/primer3-0.4.0/) was used. Only primers with a repeat motif of 4bp and more than five repetitions were selected. Following thorough assessments of their variability (differences in lengths, similar annealing temperatures (Ta), and no loops or primer dimers being formed between loci), 14 markers were selected and amplified (20). We genotyped PCR amplicons (for PCR conditions see Table S2-S3) on a capillary sequencer ABI 3500 or 3500xL Genetic Analyser (Applied Biosystems) and used the GeneMapper® software package to score the samples.
Genetic data analysis
FILTERING – Before filtering, the dataset comprised 570 genotypes from 66 sampling sites, containing information for 14 loci. First, we tested for missing data, removing samples or loci with more than 20% missing data. Second, one monomorphic locus was removed. Next, we addressed possible sibling relationships. When sampling larvae from the same pond and/or clutches, a bias in allele frequencies can be introduced (22). We therefore removed full siblings from the dataset. Full-sib relationships were assessed using Best Maximum Likelihood Sibship Assignment with Colony v2.0.6.8. (27). The following settings for this assignment were derived from Cox et al. (2017)(28): polygamous males, monogamous females, inbreeding included, medium run length, and 3 runs. Using the Best (ML) sibship assignment, members of the same full-sib family, except one, were removed from the dataset.
In addition, we applied Genepop v4.3 (29) to assess possible locus-specific deviations from Hardy-Weinberg equilibrium and to explore linkage disequilibrium for each pair of loci per location, then corrected for multiple testing using the Bonferroni method (30). A Markov chain method and default parameters were used. Decisions based on these results also took the number of samples per site into account. We also used Genepop v4.3 to explore the maximum likelihood estimation of null allele frequency. Based on Hardy-Weinberg disequilibrium, one locus was omitted from the dataset. Finally, after filtering, 334 individual genotypes originating from the 66 populations were retained for population genetic analysis (Figure 1) based on genetic information from 12 loci (Pf10635s, Pf14239s, Pf16076s, Pf9847s, Pf5754s, Pf9791sB, Pf3505s, Pf4451s, Pf8461s, Pf6194, Pf981, and Pf4582; Table S2).
GENETIC DIVERSITY – The following measures of genetic diversity were calculated for each population over all loci: mean allelic richness (AR), observed and unbiased expected heterozygosity (Ho and uHe). The genetic measures Ho and uHe were calculated using GenAlEx 6.5 (31). Allelic richness (AR) was calculated and adjusted for sample size, based on a minimum sample size of 5, using HP-Rare v1.1 (32). The calculations of genetic diversity were therefore based on information obtained from all 66 populations.
GENETIC DIVERSITY – The following measures of genetic diversity were calculated for each population over all loci: mean allelic richness (AR), observed and unbiased expected heterozygosity (Ho and uHe). The genetic measures Ho and uHe were calculated using GenAlEx 6.5 (31). Allelic richness (AR) was calculated and adjusted for sample size (rarefaction 5) using HP-Rare v1.1 (32). The calculations of genetic diversity were based on information obtained from all 66 populations.
GENETIC STRUCTURE – Calculation of pairwise FST-values (measuring fixation) and DEST values (measuring differentiation) (33) was carried out in R v. 4.3.0 (34) using R-packages DiveRsity v1.9.90 (35) and hierfstat v0.5-10 (36). Confidence intervals of 95% are based on 1000 bootstraps. For the analysis of these indices of genetic structure, we excluded locations with sample sizes smaller than 5, thereby reducing the number of populations available for interpretation from 66 to 30. Genetic PCA-ordination and the k-means non-parametric DAPC-clustering, including all genotypes (66 sampling sites), was performed using the adegenet v2.1.5 package in R (37). We tested for isolation-by-distance (IBD) by performing a Mantel test between the genetic and the geographical distance matrices using the ade4 v. 1.7-22 package in R (38). Next, we explored Bayesian clustering assignment through the programs BAPS (Bayesian Analysis of Population Structure) v6 (39) and STRUCTURE v2.3.4 (40). In BAPS, we used the spatial model for mixture clustering of individuals. With K-values ranging from K = 1 to K = 20, ten runs per K were run. Clustering analysis in STRUCTURE was performed for K = 1 to K = 20 clusters (ten runs per cluster), using an admixture ancestry model with a burn-in period of 100,000 iterations and 500,000 Markov Chain Monte Carlo (MCMC) replications. The best clustering option for the STRUCTURE results was determined using Structure Selector (41) and CLUMPAK (42). Ultimately, the optimal number of clusters was compared between the ΔK ‘Evanno method’ (43), the ‘Puechmaille method’(44), the results of the BAPS analysis, and the k-means analysis. For visualization of the individual membership values (Qi), based on the STRUCTURE results, we constructed barplots in R v. 4.3.0 (34) and CLUMPAK (42); the mean population membership values (Qmp) are used for visualization in QGIS 3.28.0-Firenze (45). Following the main STRUCTURE results, we approached subclustering in a hierarchical manner. After determination of the optimal number of global STRUCTURE-clusters (ΔK-based), each of these clusters underwent an individual clustering assignment analysis on its own to explore the structure within. For these analyses, parameters stayed the same as those used for the initial clustering analysis in STRUCTURE. Populations with mean assignment scores around 0.50 ± 0.05 across two different clusters were subsequently included within those two clusters. E.g., when a sampling location shows a mean assignment score of 0.45 to cluster A and a 0.55 assignment score to cluster B, the location will be analysed within both clusters separately. The range of explored K’s per subcluster depends on the number of sampling sites grouped into each subcluster and ranges from K = 1 to K equals the number of populations in a studied subcluster. Finally, to investigate a genetic signature of recent population declines, we utilized the IAM (Infinite Allele Model), SMM (Stepwise Mutation Model), and TPM (Two-Phase Model) methods as implemented in BOTTLENECK v1.2.02. (46) with default settings. Given nearly all populations have knowingly undergone population declines, BOTTLENECK could answer to what extent we are able to detect these events. Six out of 66 sites (Zonhoven (BE), Muldenaue (DE), Torino Ivrea, Maceratoio Cascinette d'Ivrea (IT), Gorssel and Valthe (NL), and Hrastovača (SR)) were deemed sufficient (N>13) for bottleneck analysis.
Corresponding references are provided in the manuscript.