Glacial cycles drive rapid divergence of cryptic field vole species
Cite this dataset
Fletcher, Nicholas et al. (2019). Glacial cycles drive rapid divergence of cryptic field vole species [Dataset]. Dryad. https://doi.org/10.5061/dryad.zcrjdfn6n
Abstract
Methods
Genomic DNA was extracted from ethanol-preserved vole tissues (ears or digits) of 83 specimens collected between 1995 and 2009 in western Europe, using the DNeasy Blood and Tissue kit (Qiagen, Valencia, CA) following standard protocols. Genotyping-by-sequencing (GBS) was conducted on the extracted DNA by the Cornell Institute for Genomic Diversity (Elshire et al. 2011). The enzyme PstI (CTGCAG) was used for digestion, and the fragmented DNA was then ligated to a barcoded adaptor and a common adaptor with the appropriate ‘sticky ends’. Each individual was given a unique barcode combination and, after ligation, all individuals were pooled into a single Eppendorf tube. The libraries were then subjected to PCR, using primers that matched the barcoded and common adaptors, to amplify appropriately sized sequence fragments and to add sequencing primers to the libraries. Libraries were cleaned using a QIAquick PCR Purification Kit (Qiagen, Valencia, CA) and sequenced as single-end 100 bp reads on two separate lanes of an Illumina HiSeq 2000 at the Cornell University Life Sciences Core Laboratories Center.
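As a rough illustration of the digestion step, the Python sketch below cuts a sequence at every PstI recognition site (CTGCAG). It is not part of the laboratory protocol or the dataset. The function name, the toy sequence, and the cut offset (PstI is taken to cut after the A, CTGCA^G, on the top strand) are assumptions made for this example.

```python
PSTI_SITE = "CTGCAG"
PSTI_CUT_OFFSET = 5  # assumed top-strand cut position: CTGCA^G


def digest(sequence, site=PSTI_SITE, cut_offset=PSTI_CUT_OFFSET):
    """Return the top-strand fragments produced by cutting at every site."""
    sequence = sequence.upper()
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    # Whatever follows the last cut is the final fragment.
    fragments.append(sequence[start:])
    return fragments


# Hypothetical toy sequence containing two PstI sites.
toy_sequence = "AAACTGCAGTTTTCTGCAGGG"
toy_fragments = digest(toy_sequence)
```

In a real GBS library only fragments within a suitable size range are amplified and sequenced, so the fragment lengths from an in silico digest like this give a first impression of which parts of a genome are likely to be sampled.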
SNP genotyping and filtering
Raw FASTQ files from the Illumina run were converted to individual SNP genotypes using the TASSEL GBS pipeline, part of the TASSEL 5.0 software (Bradbury et al. 2007). We implemented the standard TASSEL pipeline with the following parameters. First, we identified all reads with barcodes matching the index file and trimmed them to 64 bp to create tags. Only tags supported by at least five reads were retained and subsequently merged into a master tag list for each individual. Only tags with one SNP per fragment were included in this analysis. All reads were aligned to the Microtus ochrogaster reference genome (McGraw et al. 2011) using BWA (Li and Durbin 2009). The error rate for SNP calling was set to 0.03, with a genotypic mismatch rate of 0.1. To minimise sequencing error, triallelic minor alleles were excluded, as were SNPs with a minor allele frequency < 0.05. Only SNPs that were present in > 60 of 83 individuals (~72%), and therefore shared across all three cryptic species, were included. Changing this missing data threshold to 80%, 90%, and 100% did not change the pattern of our results (data not shown). Individuals with a proportion of missing SNP data > 0.25% were excluded, and the average proportion of missing SNP data per individual was 0.066% across all individuals after filtering. Data were converted to .vcf files for subsequent analyses. To filter out SNPs from paralogous loci, SNPs were excluded if they showed significant (P < 0.05) excess heterozygosity and deviation from Hardy-Weinberg equilibrium, as calculated using vcftools (Danecek et al. 2011).
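The minor allele frequency and presence filters described above can be illustrated with a short Python sketch. This is not the TASSEL or vcftools implementation. The genotype coding (0, 1, or 2 copies of the alternate allele, with None for a missing call), the function names, and the toy genotype vectors are assumptions made for illustration.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency of one SNP, ignoring missing (None) calls."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    # Each diploid individual carries two alleles.
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def passes_filters(genotypes, min_maf=0.05, min_present=60):
    """Keep a SNP called in more than min_present individuals with MAF >= min_maf."""
    n_called = sum(g is not None for g in genotypes)
    return n_called > min_present and minor_allele_frequency(genotypes) >= min_maf


# Hypothetical SNPs, each scored across 83 individuals.
common_snp = [0] * 70 + [1] * 10 + [None] * 3   # 80 calls, MAF 10/160
rare_snp = [0] * 79 + [1] * 1 + [None] * 3      # 80 calls, MAF 1/160
sparse_snp = [1] * 60 + [None] * 23             # only 60 calls

kept = [passes_filters(s) for s in (common_snp, rare_snp, sparse_snp)]
```

Here only the first SNP is kept: the second fails the frequency threshold and the third is present in exactly 60 individuals, which does not exceed the cutoff.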
Structure Plots R Code
To examine patterns of genetic differentiation and admixture, we used STRUCTURE version 2.3 with a 50,000-cycle burn-in period followed by 50,000 cycles, using an admixture model and correlated allele frequencies among groups (Pritchard et al. 2000). This was repeated for K = 1 to 5. STRUCTURE HARVESTER (Earl et al. 2012) was used to determine the K with the highest log-likelihood.
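The selection of K by highest log-likelihood can be sketched in Python as follows. This is a simplified stand-in for the summary STRUCTURE HARVESTER produces, not its actual code. It assumes several replicate runs per K whose log-likelihood estimates are averaged. The function name and every number in the example are invented for illustration and are not results from this dataset.

```python
from statistics import mean


def best_k(ln_prob_by_k):
    """Return the K with the highest mean log-likelihood, plus all the means."""
    means = {k: mean(values) for k, values in ln_prob_by_k.items()}
    return max(means, key=means.get), means


# Invented log-likelihood values for two hypothetical replicate runs per K.
toy_runs = {
    1: [-5200.0, -5210.0],
    2: [-4800.0, -4790.0],
    3: [-4500.0, -4510.0],
    4: [-4600.0, -4620.0],
    5: [-4700.0, -4710.0],
}

chosen_k, mean_ln_prob = best_k(toy_runs)
```

With these toy values the mean log-likelihood peaks at K = 3, so that value would be the one carried forward to plotting.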
This is R code for visualization of the STRUCTURE plots.