Phylogenomics indicates Amazonia as the major source of Neotropical swarm-founding social wasp diversity
Data files
Jun 03, 2020 version files 27.38 MB
- Beast files.zip (1.11 MB)
- mafft-nexus-min-50per-taxa.phylip (26.27 MB)
Methods
Library preparation, UCE enrichment, and sequencing
Library preparation and target enrichment followed Faircloth et al. (2015). For all samples, we ran 10 μL of each DNA extract for 60 min at 100 V on 1.5% agarose SB (sodium borate) gels to estimate the size of the genomic DNA. We also measured DNA concentration with a Qubit 2.0 fluorometer (High Sensitivity or Broad Range kit; Life Technologies Inc., Carlsbad, CA), and 2–500 ng of DNA was sheared for 15–60 s (amp = 25, pulse = 10) to an average fragment size of 500–600 base pairs (bp), as verified on an agarose gel, by sonication using a Qsonica Q800R sonicator (Qsonica LLC, Newton, CT). Degraded DNA from pinned museum specimens was not sheared.
The sheared DNA was used as input for a modified genomic DNA library preparation protocol (Kapa Hyper Prep Library Kit, Kapa Biosystems), incorporating “with-bead” cleanup steps (Fisher et al., 2011) and a generic SPRI substitute (Rohland et al., 2012; “speedbeads” hereafter), as described by Faircloth et al. (2015). For adapter ligation, we used TruSeq-style adapters (Faircloth et al., 2012) and PCR-amplified 50% of the resulting library volume (15 μL) with a reaction mix of 25 μL HiFi HotStart polymerase (Kapa Biosystems), 2.5 μL each of Illumina TruSeq-style i5 and i7 primers (5 μM each), and 5 μL double-distilled water (ddH2O). We used the following thermal protocol: 98°C for 45 s; 13 cycles of 98°C for 15 s, 65°C for 30 s, and 72°C for 60 s; and a final extension at 72°C for 5 min. After purifying the reactions using 1.0X speedbeads and rehydrating them in 23 μL of pH 8 Elution Buffer (EB hereafter), we combined 8–10 libraries at equimolar ratios into enrichment pools with final concentrations of 139–168 ng/μL.
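As a quick arithmetic check on the reaction mix described above, the listed components sum to a 50 μL reaction, and each 5 μM primer is diluted to 0.25 μM. The sketch below only restates the volumes given in the text; the dictionary keys and function names are illustrative labels, not part of any published protocol.

```python
# Component volumes in μL, taken from the text; the keys are illustrative labels.
PRE_ENRICHMENT_PCR = {
    "library": 15.0,
    "HiFi HotStart polymerase": 25.0,
    "i5 primer (5 uM)": 2.5,
    "i7 primer (5 uM)": 2.5,
    "ddH2O": 5.0,
}


def total_volume(mix):
    """Return the total reaction volume in μL."""
    return sum(mix.values())


def final_concentration(stock_conc, stock_volume, reaction_volume):
    """Dilution formula C1 * V1 = C2 * V2, solved for C2."""
    return stock_conc * stock_volume / reaction_volume


reaction_volume = total_volume(PRE_ENRICHMENT_PCR)
primer_final_uM = final_concentration(5.0, 2.5, reaction_volume)
print(f"reaction volume: {reaction_volume} uL; each primer: {primer_final_uM} uM")
```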
We enriched each pool using the updated Hymenoptera-V2 bait design, which targets a total of 2590 UCE loci in Hymenoptera (see Branstetter et al., 2017) and was synthesized by MYcroarray (Ann Arbor, MI). We followed library enrichment procedures for the MYcroarray MYBaits kit (Blumenstiel et al., 2010), except that we used 0.1X of the standard MYBaits concentration and added 0.7 μL of 500 μM custom blocking oligos designed against our custom sequence tags. We ran the hybridization reaction for 24 h at 65°C, subsequently bound all pools to streptavidin beads (MyOne C1; Life Technologies), and washed bound libraries according to a standard target enrichment protocol (Blumenstiel et al., 2010). We used the with-bead approach for PCR recovery of enriched libraries, as described by Faircloth et al. (2015). We combined 15 μL of streptavidin bead-bound, enriched library with 25 μL HiFi HotStart Taq (Kapa Biosystems), 5 μL of Illumina TruSeq primer mix (5 μM forward and reverse primers), and 5 μL of ddH2O. We ran post-enrichment PCR using the following thermal conditions: 98°C for 45 s; 18 cycles of 98°C for 15 s, 60°C for 30 s, and 72°C for 60 s; and a final extension at 72°C for 5 min. We purified the resulting reactions using 1.0X speedbeads and rehydrated the enriched pools in 22 μL EB. We quantified 2 μL of each enriched pool using a Qubit 2.0 fluorometer (broad sensitivity kit).
We quantified post-enrichment library concentration with qPCR using a SYBR® FAST qPCR kit (Kapa Biosystems) on a ViiA™ 7 (Life Technologies). Based on the size-adjusted concentrations estimated by qPCR, we pooled libraries at equimolar concentrations and size-selected for 250–800 bp with a BluePippin (SageScience). We sent the size-selected pools to the University of Utah’s High Throughput Genomics Core Facility for sequencing on a single lane of an Illumina HiSeq 2500 (125-bp paired-end reads).
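The idea behind size-adjusted, equimolar pooling can be sketched as follows. This is a minimal illustration and not the authors' calculation: it assumes the common approximation of 660 g/mol per base pair of double-stranded DNA to convert a mass concentration to molarity, and the function names, library names, and input values are hypothetical.

```python
def ng_per_ul_to_nM(conc_ng_per_ul, mean_fragment_bp, g_per_mol_per_bp=660.0):
    """Convert a dsDNA mass concentration (ng/μL) to molarity (nM).

    Assumes an average mass of ~660 g/mol per base pair (an approximation).
    """
    return conc_ng_per_ul * 1e6 / (g_per_mol_per_bp * mean_fragment_bp)


def equimolar_volumes(libraries_nM, fmol_per_library):
    """Return the volume (μL) of each library that contributes the same amount.

    Uses the identity 1 nM = 1 fmol/μL, so volume = amount / concentration.
    """
    return {name: fmol_per_library / nM for name, nM in libraries_nM.items()}


# Hypothetical pools: (concentration in ng/μL, mean fragment size in bp).
pools = {"pool_A": (13.2, 500), "pool_B": (6.6, 500)}
molarity = {name: ng_per_ul_to_nM(c, bp) for name, (c, bp) in pools.items()}
volumes = equimolar_volumes(molarity, fmol_per_library=100.0)
for name in pools:
    print(f"{name}: {molarity[name]:.1f} nM -> pipette {volumes[name]:.2f} uL")
```

Because molarity depends on fragment length, two libraries with the same ng/μL reading but different size distributions need different volumes, which is why the concentrations are size-adjusted before pooling.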
Bioinformatics and matrix preparation
We performed all bioinformatics steps, including read cleaning, assembly, and alignment, using the Phyluce v1.5 software package. Raw reads were cleaned and trimmed using Illumiprocessor (Faircloth, 2013), which is based on Trimmomatic (Bolger et al., 2014). The cleaned reads were assembled using a Phyluce wrapper script (assemblo_trinity.py) around Trinity (Grabherr et al., 2011). After assembly, we calculated N50 values for contigs using a script (fasta_stats.pl) and mapped contigs to UCE loci (match_contigs_to_probes.py) with min-coverage = 50 and min-identity = 80. Next, we used two scripts (get_match_counts.py and get_fastas_from_match_counts.py) to create a fasta file containing all taxa and UCE loci. During these steps we added previously published contigs from nine taxa (Rhopalosoma nearcticum, Pachodynerus alayoi, Parancistrocerus bacu, Pseudomasaris vespoides, Metapolybia cingulata, Mischocyttarus flavitarsis, Mischocyttarus mexicanus, Polistes poeyi, and Vespa sp.) to obtain their UCE data using the Hymenoptera-V2 bait set.
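For readers unfamiliar with the N50 statistic mentioned above, the following minimal sketch shows the standard definition: the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the total assembly length. It is an independent illustration and does not reproduce fasta_stats.pl; the function name and example lengths are hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the cumulative length first reaches half of the total.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs supplied")


# Hypothetical contig lengths (bp).
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```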
We aligned each locus with MAFFT v7.221 (Katoh et al., 2013) through a script (align_seqcap_align.py; min-length = 20, no-trim). After alignment, we trimmed all alignments using a script (align_get_gblocks_trimmed_alignments_from_untrimmed.py) that runs Gblocks (Castresana, 2000) with the following settings: b1 = 0.5, b2 = 0.5, b3 = 12, and b4 = 7. Next, to account for the amount of missing data, we performed several rounds of matrix filtering, each followed by phylogenetic inference and assessment. We applied two strategies. First, because we recovered few loci for some pinned museum specimens, we retained only taxa with a minimum number of UCE loci (>10 loci, >200 loci, and >700 loci). Second, we used a script (get_only_loci_with_min_taxa.py) that filters loci by minimum taxon occupancy (the percentage of samples required to be present in each locus). We filtered all loci at 90, 80, 70, 50, and 25% minimum taxon occupancy, and also retained an unfiltered data set. For each data set we generated alignment statistics using two scripts (get_align_summary_data.py and get_informative_sites.py) and generated a concatenated matrix (format_nexus_files_for_raxml.py). Finally, we analyzed all concatenated matrices using RAxML v8.2.11 (Stamatakis, 2014).
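The two filtering strategies described above can be sketched as follows. This is a simplified illustration of the logic only, not the Phyluce implementation: the function names, the in-memory data structure (a mapping from locus name to the set of taxa present), and the example data are all hypothetical, and the handling of borderline cases (a locus exactly at the threshold is kept) is an assumption.

```python
from collections import Counter


def taxa_with_min_loci(locus_to_taxa, min_loci):
    """Keep taxa recovered in more than `min_loci` loci (e.g. >10, >200, >700)."""
    counts = Counter(taxon for taxa in locus_to_taxa.values() for taxon in taxa)
    return {taxon for taxon, n in counts.items() if n > min_loci}


def loci_with_min_occupancy(locus_to_taxa, n_taxa, min_fraction):
    """Keep loci present in at least `min_fraction` of the `n_taxa` samples."""
    return {
        locus
        for locus, taxa in locus_to_taxa.items()
        if len(taxa) / n_taxa >= min_fraction
    }


# Hypothetical example: three loci sampled across four taxa.
loci = {
    "uce-1": {"A", "B", "C", "D"},
    "uce-2": {"A", "B"},
    "uce-3": {"A", "B", "C"},
}
print(sorted(loci_with_min_occupancy(loci, n_taxa=4, min_fraction=0.75)))
print(sorted(taxa_with_min_loci(loci, min_loci=2)))
```

Raising the occupancy threshold yields a smaller but more complete matrix, while lowering it keeps more loci at the cost of more missing data, which is the trade-off the repeated filtering rounds are meant to assess.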