Data for: Phylogeography and population genetics of a widespread cold-adapted ant, Prenolepis imparis

Abstract

Historical climate fluctuations have left genetic signatures on species and populations across North America. Here, we used phylogenetic and population genetic analyses from 1,402 orthologous sequences of 75 individuals obtained through sequencing of Ultraconserved Elements (UCEs) to identify population genetic structure and historical demographic patterns across the range of a widespread, cold-adapted ant, the winter ant, Prenolepis imparis. We relate the genomic patterns to those expected as a result of in situ diversification, maintained connectivity, or recent migration. We recovered five well-supported, genetically isolated clades across the distribution: 1) a basal lineage located in Florida, 2) populations across the southern United States, 3) populations that span the midwestern and northeastern United States, 4) populations from the western United States, and 5) populations in southwestern Arizona and Mexico. Using Bayesian clustering analysis in STRUCTURE and k-means clustering in ADEGENET, we investigated gene flow between these major genetic clades and did not find evidence of gene flow between clades. We did find evidence of localized structure with migration in the western United States clade. High support for five major geographic lineages and lack of evidence of contemporary gene flow indicate in situ diversification across the species’ range, probably influenced by glacial cycles of the late Quaternary.

Methods

For ants that were destructively sampled, genomic DNA was extracted from whole ants using a Qiagen DNeasy Blood & Tissue kit (Valencia, CA). The kit protocol was followed as specified, with the following modifications: samples were first ground in 1.5 mL tubes with a stainless-steel grinding ball; 50 μg RNase A and 10 μL DTT were added at the lysis step; and samples were eluted in 300 μL RNase- and DNase-free water, then evaporated down to 100 μL in a vacuum heater. For ants that were non-destructively sampled, the same modifications were used, except that the ants were not ground but instead placed whole in the lysis buffer and put in a rotating oven for 48 hours, with an additional 20 μL proteinase K added after the first 24 hours. These specimens were then soaked in 70% ethanol before being re-mounted.

Following extraction, DNA was sheared using a Bioruptor sonicator (Diagenode) with 1 or 4 rounds of sonication (1 min per round on low, 90 s on, and 90 s off). If a sample was collected before the year 2000 or was pinned prior to DNA extraction, we assumed it was partially degraded and sonicated it for only 1 minute in total. All other samples were sheared for 4 minutes. Because of low starting concentrations (often less than 100 ng total DNA), the sheared DNA was not visualized on a gel. As the majority of the sequence variation is usually found farther away from the target sites of the UCE probes, we wanted longer fragments (approximately 400-1000 bp). However, because we could not visualize the sheared DNA, we did not know the fragment sizes in our extractions, so we purified the reaction after shearing using a low ratio (0.7x) of Solid Phase Reversible Immobilization (SPRI) beads to remove smaller fragments.

Library preparation and array capture

Following sonication and purification, the DNA library preparation and array capture protocols were used as described in Meyer and Kircher (2010), with minor modifications. Most of those modifications were introduced only during the reaction clean-up steps. For all purifications, we used 80% ethanol instead of the 70% ethanol specified in Meyer and Kircher (2010). In addition, a low ratio (0.7x) of SPRI beads was used for purification after blunt-end repair and after the indexing reactions in order to remove DNA fragments that had been sheared too short. Blunt ends were repaired as in Meyer and Kircher (2010), except that our mix had a final concentration of 0.05 U/µL of T4 DNA polymerase and 0.25 U/µL of T4 polynucleotide kinase in 20 µL of master mix. In addition, we ran three separate indexing PCRs (12 cycles) for each sample, with every sample receiving a unique index, and combined the reactions from the same sample into one final indexed library eluted in 22 µL water (instead of EB) before enrichment. We assessed the success of library preparation by measuring DNA concentrations with the Qubit fluorometer and visualizing the libraries on an agarose gel.

For UCE enrichment, we pooled eight uniquely indexed samples at equimolar concentrations so that each pool contained 500 ng total DNA. We performed enrichments using a custom UCE bait set developed for Hymenoptera (‘hym-v2’), which has custom-designed probes targeting 2,590 UCE loci (Branstetter et al. 2017). We followed the library enrichment procedures for the MYcroarray MYBaits kit (MYcroarray, Inc.), except that we used 0.1X of the standard MYBaits concentration and added 5 µL of the Roche Developer Reagent and 1.0 µL of 10 mM custom blocking oligos designed for our custom tags (Meyer and Kircher 2010). The enrichment was performed at 65°C for 22 hours. We then used 10 µL of the enriched library as template for the post-enrichment PCR, which ran for 18 cycles. Following post-enrichment PCR, we purified the reaction with 1.2X SPRI beads and eluted in 22 µL EB.
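The arithmetic behind an equimolar pool of fixed total mass can be sketched as follows. This is an illustration only, not the calculation used in the study: the function names, the example concentrations, and the 660 g/mol per base pair approximation for double-stranded DNA are assumptions introduced here.

```python
# Hypothetical sketch: volumes needed to pool libraries at equal molar amounts
# while reaching a fixed total DNA mass (500 ng per pool in the text above).

AVG_BP_MASS = 660.0  # assumed average mass of one dsDNA base pair, in g/mol


def molarity_nM(conc_ng_per_ul, mean_len_bp):
    """Convert a dsDNA concentration (ng/uL) to nanomolar."""
    return conc_ng_per_ul / (AVG_BP_MASS * mean_len_bp) * 1e6


def equimolar_volumes(libraries, total_ng=500.0):
    """libraries maps name -> (concentration in ng/uL, mean fragment length in bp).

    Equal moles per library means each library's mass share is proportional
    to its mean fragment length. Returns name -> volume in uL.
    """
    total_len = sum(length for _, length in libraries.values())
    volumes = {}
    for name, (conc, length) in libraries.items():
        mass_ng = total_ng * length / total_len
        volumes[name] = mass_ng / conc
    return volumes


# Made-up example values for two libraries.
example = {"lib_A": (50.0, 600.0), "lib_B": (25.0, 600.0)}
volumes = equimolar_volumes(example)
```

With equal fragment lengths, as in the example, each library contributes the same mass, so the more dilute library simply needs a proportionally larger volume.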

To verify enrichment, we performed qPCR on both our post-enrichment and unenriched libraries using a DyNAmo™ Flash SYBR® Green qPCR kit (Thermo Fisher Scientific). We checked that our positive controls, UCE82, UCE591, and UCE1481, showed greater fold enrichment in the post-enrichment libraries than in the unenriched libraries (Faircloth et al. 2015). We quantified each enriched pool using a Qubit fluorometer and checked peak quality and peak library size using a Bioanalyzer. We then diluted each pool to less than 100 nM and pooled them all at equimolar concentrations into a single sequencing lane. Sequencing was performed on an Illumina HiSeq4000 (Illumina, Inc., San Diego, CA, USA; Vincent J. Coates Genomic Sequencing Laboratory at UC Berkeley).

Bioinformatic processing

We used a custom Perl workflow to process the UCE sequence capture data, following published methods (Bi et al. 2012; Portik et al. 2016). The pipeline for processing de novo target capture data is available on GitHub (https://github.com/CGRL-QB3-UCBerkeley/denovoTargetCapturePhylogenomics). Briefly, raw FASTQ reads were filtered using Cutadapt (Martin 2011) and Trimmomatic (Bolger et al. 2014) to remove low-quality reads and adapter sequences. Exact duplicates were eliminated using Super Deduper (https://github.com/dstreett/Super-Deduper). We used FLASH (Magoc & Salzberg 2011) to merge overlapping paired-end reads. We then used SPAdes (Bankevich et al. 2012) to assemble the cleaned reads with a multi-kmer approach, generating raw assemblies for each sample. We used BLASTn (Altschul et al. 1990; e-value cutoff = 1e-10, similarity cutoff = 75) to compare the raw SPAdes assemblies of each individual to the UCE baits and identify assembled contigs derived from UCE loci. The resulting non-redundant UCE assemblies from each individual sample were used as a raw reference that included the targeted UCE and its flanking sequences (+/- 500 bp of the targeted UCE region). Paired-end and merged cleaned reads from each individual were then aligned to the individual-specific assemblies using Novoalign (Li & Durbin 2009), and we retained only reads that mapped uniquely to the reference. We used Picard (http://broadinstitute.github.io/picard/) to add read groups and GATK (McKenna et al. 2010) to perform re-alignment around insertions/deletions. We used SAMtools/BCFtools (Li et al. 2009) to generate individual consensus sequences by calling genotypes and incorporating ambiguous sites in the individual-specific assemblies. Sites were masked as ‘N’s if the read depth was lower than 5x or if they were within 5 bp of an indel. RepeatMasker (Smit et al. 2015) was used to mask (with Ns) putatively repetitive elements and short repeats, using the “ants” database.
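The site-masking rule (depth below 5x, or within 5 bp of an indel) can be illustrated with a short sketch. This is not the Perl pipeline's code: the function name, the 0-based coordinates, and the choice to treat "within 5 bp" as an inclusive distance are assumptions made for the example.

```python
# Hypothetical sketch of consensus masking by read depth and indel proximity.

def mask_consensus(seq, depth, indel_positions, min_depth=5, indel_window=5):
    """Return seq with low-depth sites and sites near indels replaced by 'N'.

    seq             -- consensus sequence as a string
    depth           -- per-site read depth, same length as seq
    indel_positions -- 0-based positions of called indels (assumed input format)
    """
    masked = list(seq)
    for i in range(len(masked)):
        low_depth = depth[i] < min_depth
        # Assumption: a site exactly indel_window bases away is still masked.
        near_indel = any(abs(i - p) <= indel_window for p in indel_positions)
        if low_depth or near_indel:
            masked[i] = "N"
    return "".join(masked)


# Made-up example: one low-depth site at position 0, one indel at position 12,
# and a narrow window of 2 so the effect is easy to see.
example = mask_consensus(
    "ACGTACGTACGTACGT",
    [3] + [10] * 15,
    indel_positions=[12],
    indel_window=2,
)
```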
For each individual, we retained a resulting consensus contig if no more than 20% of its nucleotides were Ns after the above masking. Multi-sample alignments of each locus were generated with MAFFT (Katoh & Standley 2013), and ambiguously aligned regions were then trimmed using Trimal (Capella-Gutierrez et al. 2009). We retained alignments in which at least 30% of the samples contained less than 40% missing data (Ns or gaps). We also calculated the average read depth of each alignment and removed alignments that fell outside the 2nd and 98th percentiles of the distribution. To further control for potential paralogs, we also removed any entire alignment containing a site at which the proportion of shared heterozygosity was above 0.3. Capture efficiency was evaluated by average per-site depth for the target and flanking regions of each locus (at least 500 bp upstream and downstream), sensitivity (the percentage of bases within a target sequence that are recovered in one or more reads), and specificity (the percentage of cleaned reads mapped to the target and +/- 500 bp flanking sequences).
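The contig and alignment filters described here can be sketched as a set of small predicates. This is a hedged illustration, not the authors' implementation: all function names are hypothetical, the denominator for the 30% rule is passed in explicitly because the text does not say how it was counted, heterozygous calls are assumed to be encoded as two-base IUPAC ambiguity codes, and the percentile method is whatever Python's statistics.quantiles uses by default.

```python
# Hypothetical sketch of the contig- and alignment-level filters.
import statistics

MISSING = set("N-?")      # characters counted as missing data (assumption)
HET_CODES = set("RYSWKM")  # two-base IUPAC codes, assumed to mark heterozygotes


def keep_contig(seq, max_n_frac=0.20):
    """Keep a consensus contig if no more than 20% of its bases are N."""
    return seq.upper().count("N") / len(seq) <= max_n_frac


def missing_fraction(seq):
    return sum(ch in MISSING for ch in seq.upper()) / len(seq)


def passes_missing_data(alignment, n_samples, min_sample_frac=0.30, max_missing=0.40):
    """At least 30% of samples must have less than 40% missing data."""
    complete = sum(missing_fraction(s) < max_missing for s in alignment.values())
    return complete / n_samples >= min_sample_frac


def depth_outliers(mean_depth_by_locus, low_pct=2, high_pct=98):
    """Loci whose mean depth lies outside the 2nd-98th percentile range."""
    cuts = statistics.quantiles(list(mean_depth_by_locus.values()), n=100)
    low, high = cuts[low_pct - 1], cuts[high_pct - 1]
    return {locus for locus, d in mean_depth_by_locus.items() if d < low or d > high}


def max_shared_heterozygosity(alignment):
    """Largest per-site fraction of called samples that are heterozygous."""
    seqs = [s.upper() for s in alignment.values()]
    worst = 0.0
    for column in zip(*seqs):
        called = [b for b in column if b not in MISSING]
        if called:
            worst = max(worst, sum(b in HET_CODES for b in called) / len(called))
    return worst


def passes_paralog_filter(alignment, max_het=0.3):
    return max_shared_heterozygosity(alignment) <= max_het
```

Keeping each rule as its own predicate makes the thresholds (20%, 30%, 40%, 2nd/98th percentile, 0.3) easy to vary when exploring how sensitive the retained locus set is to each one.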

We created one alignment that contained all the UCE loci recovered in our analyses. We also filtered the individual UCE alignments to those with 100% and 90% of taxa present. This folder contains the alignments by UCE. Please refer to the README.txt file for an explanation of the files.
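A taxon-occupancy filter of this kind might look like the sketch below. It is illustrative only: the function names are hypothetical, and it assumes a taxon counts as "present" in a locus when its sequence contains at least one character that is not N, a gap, or "?".

```python
# Hypothetical sketch of filtering per-locus alignments by taxon occupancy.

MISSING = set("Nn-?")  # characters treated as missing data (assumption)


def taxa_present(alignment):
    """Count taxa with at least one non-missing character in this locus."""
    return sum(any(ch not in MISSING for ch in seq) for seq in alignment.values())


def filter_by_occupancy(alignments, n_taxa, min_fraction):
    """Keep loci in which at least min_fraction of all n_taxa are present.

    alignments -- {locus name: {taxon name: aligned sequence}}
    """
    return {
        locus: aln
        for locus, aln in alignments.items()
        if taxa_present(aln) / n_taxa >= min_fraction
    }


# Made-up example with 10 taxa: locus A is complete, B lacks one taxon,
# and C has two taxa that are entirely missing data.
taxa = [f"t{i}" for i in range(10)]
example = {
    "A": {t: "ACGT" for t in taxa},
    "B": {t: "ACGT" for t in taxa[:9]},
    "C": {**{t: "ACGT" for t in taxa[:8]}, "t8": "NNNN", "t9": "----"},
}
complete = filter_by_occupancy(example, n_taxa=10, min_fraction=1.0)
ninety = filter_by_occupancy(example, n_taxa=10, min_fraction=0.9)
```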

We combined all filtered individual alignments in PHYLIP format and made a partitioned file ready for RAxML analysis. We ran RAxML using several different datasets and analysis methods to account for the effects of missing data and data partitioning (Branstetter et al. 2017).
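Concatenating per-locus alignments and recording where each locus falls in the combined matrix can be sketched as follows. This is a minimal illustration, not the script used to build the deposited files: the function names are hypothetical, absent taxa are assumed to be padded with gaps, and the output uses a relaxed PHYLIP layout with partition lines in the "DNA, name = start-end" form that RAxML accepts.

```python
# Hypothetical sketch: build a concatenated matrix plus a partition list.

def concatenate(alignments, taxa):
    """alignments -- {locus: {taxon: aligned sequence}}; taxa -- ordered names.

    Returns (matrix, partitions), where matrix maps taxon -> concatenated
    sequence and partitions is a list of RAxML-style partition lines.
    """
    pieces = {t: [] for t in taxa}
    partitions = []
    start = 1
    for locus in sorted(alignments):
        aln = alignments[locus]
        length = len(next(iter(aln.values())))
        for t in taxa:
            # Assumption: a taxon missing from a locus is filled with gaps.
            pieces[t].append(aln.get(t, "-" * length))
        partitions.append(f"DNA, {locus} = {start}-{start + length - 1}")
        start += length
    matrix = {t: "".join(parts) for t, parts in pieces.items()}
    return matrix, partitions


def to_phylip(matrix):
    """Relaxed PHYLIP text: a header line, then one 'name sequence' line per taxon."""
    n_char = len(next(iter(matrix.values())))
    lines = [f"{len(matrix)} {n_char}"]
    lines += [f"{taxon} {seq}" for taxon, seq in matrix.items()]
    return "\n".join(lines) + "\n"


# Made-up example with two loci and two taxa.
example_loci = {
    "uce-1": {"a": "ACGT", "b": "AC-T"},
    "uce-2": {"a": "GG"},
}
matrix, partitions = concatenate(example_loci, ["a", "b"])
phylip_text = to_phylip(matrix)
```

The partition list can be written one line per locus to a text file and supplied alongside the PHYLIP file for a partitioned analysis.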
