
Phylogenomics indicates Amazonia as the major source of Neotropical swarm-founding social wasp diversity

Cite this dataset

Menezes, Rodolpho S.T.; Lloyd, Michael W.; Brady, Seán G. (2020). Phylogenomics indicates Amazonia as the major source of Neotropical swarm-founding social wasp diversity [Dataset]. Dryad.


The Neotropical realm harbors unparalleled species richness and has therefore challenged biologists to explain the causes of its high biotic diversity. Empirical studies of the processes underlying biological diversification in the Neotropics have focused mainly on vertebrates and plants, with little attention to the hyperdiverse insect fauna. Here, we use phylogenomic data from ultraconserved element (UCE) loci to reconstruct, for the first time, the evolutionary history of Neotropical swarm-founding social wasps (Hymenoptera, Vespidae, Epiponini). Using maximum likelihood, Bayesian, and species tree approaches, we recovered a highly resolved phylogeny for epiponine wasps. We also estimated divergence dates, diversification rates, and the biogeographic history of these insects to test whether the group followed a “museum” model of diversification (speciation events occurring gradually over many millions of years) or a “cradle” model (lineages evolving rapidly over a short time period). Many genera and all sampled extant Epiponini species originated during the Miocene and Plio-Pleistocene. Moreover, we detected no major shifts in the estimated diversification rate during the evolutionary history of Epiponini, suggesting a relatively gradual accumulation of lineages with low extinction rates. Several lines of evidence suggest that the Amazonian region played a major role in the evolution of Epiponini wasps. This spatio-temporal diversification pattern, most likely concurrent with climatic and landscape changes in the Neotropics during the Miocene and Pliocene, establishes the Amazonian region as the major source of Neotropical swarm-founding social wasp diversity.


Library preparation, UCE enrichment, and sequencing

The library preparation and target enrichment steps followed Faircloth et al. (2015). To estimate the size of the genomic DNA, we ran 10 μL of each DNA extract on a 1.5% agarose SB (sodium borate) gel for 60 min at 100 V. We measured DNA concentration with a Qubit 2.0 fluorometer (High Sensitivity or Broad Range kit; Life Technologies Inc., Carlsbad, CA). We then sheared 2–500 ng of DNA by sonication with a Qsonica Q800R sonicator (Qsonica LLC, Newton, CT) for 15–60 s (amp = 25, pulse = 10), producing an average fragment size of 500–600 base pairs (bp) as verified on an agarose gel. Degraded DNA from all pinned museum specimens was not sheared.

The sheared DNA was used as input for a modified genomic DNA library preparation protocol (Kapa Hyper Prep Library Kit, Kapa Biosystems) incorporating “with-bead” cleanup steps (Fisher et al., 2011) and a generic SPRI substitute (Rohland et al., 2012; “speedbeads” hereafter), as described by Faircloth et al. (2015). For adapter ligation, we used TruSeq-style adapters (Faircloth et al., 2012). We then PCR-amplified 50% of the resulting library volume (15 μL) in a reaction mix of 25 μL HiFi HotStart polymerase (Kapa Biosystems), 2.5 μL each of Illumina TruSeq-style i5 and i7 primers (5 μM each), and 5 μL double-distilled water (ddH2O). We used the following thermal protocol: 98°C for 45 s; 13 cycles of 98°C for 15 s, 65°C for 30 s, and 72°C for 60 s; and a final extension at 72°C for 5 min. After purifying reactions with 1.0X speedbeads and rehydrating the purified DNA in 23 μL of pH 8 Elution Buffer (EB hereafter), we combined 8–10 libraries at equimolar ratios into enrichment pools with final concentrations of 139–168 ng/μL.
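The library PCR volumes listed above sum to 50 μL per reaction. The following minimal sketch tabulates that mix and scales the shared reagents into a master mix for a batch of libraries. The 10% pipetting overage, the batch size, and the helper names (`reaction_volume`, `master_mix`) are illustrative assumptions and are not part of the published protocol.

```python
from typing import Dict, Tuple

# Per-reaction volumes (uL) for the pre-enrichment library PCR, as listed in the text.
PER_REACTION_UL: Dict[str, float] = {
    "library template": 15.0,
    "HiFi HotStart polymerase": 25.0,
    "i5 primer (5 uM)": 2.5,
    "i7 primer (5 uM)": 2.5,
    "ddH2O": 5.0,
}


def reaction_volume(mix: Dict[str, float]) -> float:
    """Total volume of one reaction in microliters."""
    return sum(mix.values())


def master_mix(
    mix: Dict[str, float],
    n_reactions: int,
    overage: float = 0.10,
    exclude: Tuple[str, ...] = ("library template",),
) -> Dict[str, float]:
    """Scale shared reagents for n reactions plus a fractional overage.

    Template is excluded because each library is added to its own tube.
    The 10% default overage is a hypothetical choice, not from the protocol.
    """
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in mix.items() if name not in exclude}


if __name__ == "__main__":
    print(f"Per-reaction volume: {reaction_volume(PER_REACTION_UL)} uL")
    for reagent, vol in master_mix(PER_REACTION_UL, n_reactions=10).items():
        print(f"{reagent}: {vol} uL")
```

For ten libraries, the sketch scales the shared reagents by 11 (10 reactions plus 10%), giving 275 μL of polymerase mix and 55 μL of ddH2O.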

We enriched each pool using the updated Hymenoptera-V2 bait design, which targets a total of 2,590 UCE loci for Hymenoptera (see Branstetter et al., 2017), synthesized by MYcroarray (Ann Arbor, MI). We followed the library enrichment procedures for the MYcroarray MYBaits kit (Blumenstiel et al., 2010), with two exceptions: we used baits at 0.1X of the standard MYBaits concentration, and we added 0.7 μL of 500 μM custom blocking oligos designed against our custom sequence tags. We ran the hybridization reaction for 24 h at 65°C, then bound all pools to streptavidin beads (MyOne C1; Life Technologies) and washed bound libraries according to a standard target enrichment protocol (Blumenstiel et al., 2010). We used the with-bead approach for PCR recovery of enriched libraries, as described by Faircloth et al. (2015). We combined 15 μL of streptavidin bead-bound, enriched library with 25 μL HiFi HotStart Taq (Kapa Biosystems), 5 μL of Illumina TruSeq primer mix (5 μM forward and reverse primers), and 5 μL of ddH2O. We ran post-enrichment PCR using the following thermal conditions: 98°C for 45 s; 18 cycles of 98°C for 15 s, 60°C for 30 s, and 72°C for 60 s; and a final extension at 72°C for 5 min. We purified the resulting reactions using 1.0X speedbeads and rehydrated the enriched pools in 22 μL EB. We quantified 2 μL of each enriched pool using a Qubit 2.0 fluorometer (Broad Range kit).
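The pre-enrichment and post-enrichment PCR programs share the same step structure. They differ only in annealing temperature (65°C vs. 60°C) and cycle number (13 vs. 18). The sketch below encodes both programs as data and sums the programmed hold times. It ignores ramp rates and lid heating, so actual run times on a given thermal cycler will be longer. The `ThermalProgram` class is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ThermalProgram:
    name: str
    initial_denature_s: int
    cycles: int
    cycle_steps: List[Tuple[float, int]]  # (temperature in degrees C, hold in s)
    final_extension_s: int

    def hold_time_s(self) -> int:
        """Sum of programmed hold times; ramping between temperatures is ignored."""
        per_cycle = sum(seconds for _, seconds in self.cycle_steps)
        return self.initial_denature_s + self.cycles * per_cycle + self.final_extension_s


# Programs as described in the text (98 C 45 s; N cycles; 72 C 5 min final extension).
PRE_ENRICHMENT = ThermalProgram("library amplification", 45, 13, [(98, 15), (65, 30), (72, 60)], 300)
POST_ENRICHMENT = ThermalProgram("post-enrichment", 45, 18, [(98, 15), (60, 30), (72, 60)], 300)

if __name__ == "__main__":
    for program in (PRE_ENRICHMENT, POST_ENRICHMENT):
        total = program.hold_time_s()
        print(f"{program.name}: {total} s ({total / 60:.2f} min)")
```

Under these assumptions, the library-amplification program holds for 1,710 s (28.5 min) and the post-enrichment program for 2,235 s (about 37 min).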

We quantified post-enrichment library concentration with qPCR using a SYBR® FAST qPCR kit (Kapa Biosystems) on a ViiA™ 7 (Life Technologies). Based on the size-adjusted concentrations estimated by qPCR, we pooled libraries at equimolar concentrations and size-selected the pools for 250–800 bp with a BluePippin (SageScience). We sent size-selected pools to the University of Utah’s High Throughput Genomics Core Facility for sequencing on a single lane of an Illumina HiSeq 2500 (125-bp paired-end reads).
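Equimolar pooling from qPCR output reduces to simple arithmetic, because a size-adjusted concentration in nM equals femtomoles per microliter. The sketch below computes the volume of each library to add. It assumes hypothetical library names and concentrations and a target of 50 fmol per library. The optional conversion from ng/μL uses the common approximation of 660 g/mol per base pair of double-stranded DNA.

```python
from typing import Dict


def ng_per_ul_to_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Approximate dsDNA molarity (nM) from mass concentration, using 660 g/mol per bp."""
    return conc_ng_per_ul * 1e6 / (660.0 * mean_fragment_bp)


def equimolar_volumes(molarity_nm: Dict[str, float], fmol_each: float) -> Dict[str, float]:
    """Volume (uL) of each library that contributes `fmol_each` femtomoles.

    Uses 1 nM == 1 fmol/uL, so volume = fmol / nM.
    """
    volumes: Dict[str, float] = {}
    for library, nm in molarity_nm.items():
        if nm <= 0:
            raise ValueError(f"{library}: concentration must be positive, got {nm}")
        volumes[library] = fmol_each / nm
    return volumes


# Hypothetical size-adjusted qPCR concentrations (nM).
EXAMPLE_NM = {"lib_A": 10.0, "lib_B": 4.0, "lib_C": 25.0}

if __name__ == "__main__":
    for library, ul in equimolar_volumes(EXAMPLE_NM, fmol_each=50.0).items():
        print(f"{library}: {ul:.2f} uL")
```

With these example values, the sketch adds 5.0, 12.5, and 2.0 μL of lib_A, lib_B, and lib_C. Each volume contributes 50 fmol to the pool.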

Bioinformatics and matrix preparation

We performed all bioinformatics steps, including read cleaning, assembly, and alignment, with the Phyluce v1.5 software package. Raw reads were cleaned and trimmed with Illumiprocessor (Faircloth, 2013), which is built around Trimmomatic (Bolger et al., 2014). The cleaned reads were assembled using a Phyluce wrapper script around Trinity (Grabherr et al., 2011). After assembly, we calculated N50 values for contigs using a Phyluce script and mapped contigs to UCE loci with min-coverage = 50 and min-identity = 80. Next, we used two Phyluce scripts to create a FASTA file containing all taxa and UCE loci. During these steps, we added previously published contigs from nine taxa (Rhopalosoma nearcticum, Pachodynerus alayoi, Parancistrocerus bacu, Pseudomasaris vespoides, Metapolybia cingulata, Mischocyttarus flavitarsis, Mischocyttarus mexicanus, Polistes poeyi, and Vespa sp.) to obtain UCE data for these taxa using the Hymenoptera-V2 bait set.
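N50 summarizes assembly contiguity. It is the contig length L such that contigs of length L or longer contain at least half of all assembled bases. A minimal Python sketch of this standard definition follows. It is not the Phyluce implementation, whose edge-case handling may differ.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length  # cumulative bases in the longest contigs so far
        if running >= half:
            return length
    raise AssertionError("unreachable: cumulative sum always reaches half the total")
```

For example, contigs of 100, 200, 300, 400, and 500 bp total 1,500 bp. The two longest contigs already cover 900 bp, which is at least 750 bp, so the N50 is 400 bp.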

We aligned each locus with MAFFT v7.221 (Katoh et al., 2013) using a Phyluce script (min-length = 20, no-trim). After alignment, we trimmed all alignments with Gblocks (Castresana, 2000) through a Phyluce wrapper script, using the following settings: b1 = 0.5, b2 = 0.5, b3 = 12, and b4 = 7. Next, to account for missing data, we performed several rounds of matrix filtering, each followed by phylogenetic inference and assessment. We used two filtering strategies. First, because we recovered few loci from some pinned museum specimens, we retained only taxa with more than a minimum number of UCE loci (>10, >200, and >700 loci). Second, we used a Phyluce script to filter loci by minimum taxon occupancy (the percentage of samples required to be present in each locus), at thresholds of 90%, 80%, 70%, 50%, and 25%, plus an unfiltered set. For each data set, we generated alignment statistics using two Phyluce scripts and produced a concatenated matrix. Finally, we analyzed all concatenated matrices using RAxML v8.2.11 (Stamatakis, 2014).
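The two filtering strategies can be sketched as set operations over a mapping from each locus to the taxa recovered for it. This sketch rests on several assumptions. The data layout, function names, and toy loci are hypothetical. Occupancy is computed over whichever taxa are retained. Thresholds are integer percentages so that comparisons are exact. It does not reproduce the Phyluce scripts.

```python
from typing import Dict, List, Set

LocusTaxa = Dict[str, Set[str]]  # locus name -> taxa recovered for that locus


def taxa_above_min_loci(locus_taxa: LocusTaxa, min_loci: int) -> Set[str]:
    """Taxa recovered in more than `min_loci` loci (the text uses >10, >200, >700)."""
    counts: Dict[str, int] = {}
    for taxa in locus_taxa.values():
        for taxon in taxa:
            counts[taxon] = counts.get(taxon, 0) + 1
    return {taxon for taxon, n in counts.items() if n > min_loci}


def loci_above_occupancy(locus_taxa: LocusTaxa, taxa: Set[str], min_percent: int) -> List[str]:
    """Loci in which at least `min_percent`% of `taxa` are present.

    Integer arithmetic avoids floating-point rounding at the threshold;
    min_percent = 0 keeps every locus (the unfiltered set).
    """
    total = len(taxa)
    return sorted(
        locus
        for locus, present in locus_taxa.items()
        if len(present & taxa) * 100 >= min_percent * total
    )


# Toy data set with hypothetical loci and taxa.
TOY_LOCI: LocusTaxa = {
    "uce-1": {"A", "B", "C", "D"},
    "uce-2": {"A", "B", "C"},
    "uce-3": {"A"},
    "uce-4": {"A", "B"},
}

if __name__ == "__main__":
    retained = taxa_above_min_loci(TOY_LOCI, min_loci=1)  # drops taxon D (1 locus)
    for pct in (90, 80, 70, 50, 25, 0):
        print(pct, loci_above_occupancy(TOY_LOCI, retained, pct))
```

Applying the taxon filter first and then the occupancy filter mirrors the order described in the text. Each resulting locus list would then be aligned, summarized, and concatenated separately.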

Usage notes


National Science Foundation, Award: DEB-1555905

São Paulo Research Foundation, Award: 2015/02432-0

São Paulo Research Foundation, Award: 2016/21098-7

National Council for Scientific and Technological Development, Award: 431249/2018-0


South America