Skip to main content

Glacial cycles drive rapid divergence of cryptic field vole species

Cite this dataset

Fletcher, Nicholas et al. (2019). Glacial cycles drive rapid divergence of cryptic field vole species [Dataset]. Dryad.


Understanding the factors that contribute to the generation of reproductively isolated forms is a fundamental goal of evolutionary biology. Cryptic species are an especially interesting challenge to study in this context since they lack obvious morphological differentiation that provides clues to adaptive divergence that may drive reproductive isolation. Geographical isolation in refugial areas during glacial cycling is known to be important for generating genetically divergent populations, but its role in the origination of new species is still not fully understood and likely to be situation dependent. We combine analysis of 35,434 single nucleotide polymorphisms (SNPs) with environmental niche modelling (ENM) to investigate genomic and ecological divergence in three cryptic species formerly classified as the field vole (Microtus agrestis). The SNPs demonstrate high genomic divergence (pairwise FST values of 0.45 - 0.72) and little evidence of gene flow among the three field vole cryptic species, and we argue that genetic drift may have been a particularly important mechanism for divergence in the group. The ENM reveals three areas as potential glacial refugia for the cryptic species and differing climatic niches, although with spatial overlap between species pairs. This evidence underscores the role that glacial cycling has in promoting genetic differentiation and reproductive isolation by subdivision into disjunct distributions at glacial maxima in areas relatively close to ice sheets. Future investigation of the intrinsic barriers to gene flow between the field vole cryptic species is required to fully assess the mechanisms that contribute to reproductive isolation. In addition, the Portuguese field vole (M. rozianus) shows high inbreeding coefficient and a restricted climatic niche, and warrants investigation into its conservation status.


Genomic DNA was extracted from ethanol-preserved vole tissues (ears or digits) from 83 specimens, collected between 1995 and 2009 in western Europe,  using the DNeasy Blood and Tissue kit (Qiagen, Valencia, CA) following standard protocols. GBS was conducted on extracted DNA by the Cornell Institute for Genomic Diversity (Elshire et al 2011). The enzyme PstI (CTGCAG) was used for digestion, and the fragmented DNA was then ligated to a barcoded adaptor and a common adaptor with the appropriate ‘sticky ends’. Each individual was given a unique barcode combination and, after ligation, all individuals were pooled into a single Eppendorf tube. The libraries were then subjected to PCR, using primers that matched the barcoded and common adaptors, to amplify appropriately sized sequence fragments and to add sequencing primers to the libraries. Libraries were cleaned using a Qiaquick PCR Purification Kit (Qiagen, Valencia, CA). They were sequenced using single-end 100 bp reads on two separate lanes of an Illumina HiSeq 2000 at Cornell University Life Sciences Core Laboratories Center.

SNP genotyping and filtering

Raw FASTQ files from the Illumina run were converted to individual SNP genotypes using the TASSEL GBS pipeline, as part of the TASSEL 5.0 software (Bradbury et al 2007). We implemented the standard TASSEL pipeline with the following parameters. First, we found all reads with barcodes that match the index file and trimmed them to 64 bp to create tags. Only tags with a minimum number of five reads were retained and subsequently merged into a master tag list for each individual. Only tags with one SNP per fragment were included in this analysis. All reads were aligned to the Microtus ochrogaster reference genome (McGraw et al 2011) using BWA (Li and Durbin 2009). The error rate for SNP calling was set to 0.03, with a genotypic mismatch rate set to 0.1.  To minimise sequencing error, triallelic minor alleles were excluded, as well as SNPs with a minor allele frequency < 0.05. Only SNPs that were present in > 60 of 83 individuals (~72%), and therefore shared across all three cryptic species, were included. It should be noted that changing this missing data threshold to 80%, 90%, and 100% did not change the pattern of our results (data not shown). Individuals with a proportion of missing SNP data > 0.25% were excluded and the average proportion of missing SNP data per individual was 0.066% across all individuals after filtering. Data were converted to .vcf files for subsequent analyses. To filter out SNPs from paralogous loci, SNPs were excluded if they showed significant (P < 0.05) excess heterozygosity and deviation from Hardy-Weinberg equilibrium as calculated using vcftools (Danecek et al 2011).

Structure Plots R Code

To examine patterns of genetic differentiation and admixture, we used STRUCTURE version 2.3 with a 50,000 cycle run burn-in period followed by 50,000 cycles using an admixture model and correlated allele frequencies among groups (Pritchard et al 2000). This was repeated for K = 1 – 5. STRUCTURE HARVESTER (Earl et al 2012) was used to determine the K with highest log-likelihood.

This is R code for visualization of the STRUCTURE plots.